Projects

ViT Explainability and Token Analysis for Image Classification
Attention maps generated using the Chefer method and rollout technique illustrate how ViTs attend to different image regions. Introducing register tokens into a ViT-B model enhances feature representation, achieving over 94% accuracy on ImageNet-100. Token norm distributions are analyzed across transformer layers to assess their impact on classification performance. A comparative study between standard and modified ViT models highlights the role of register tokens in improving feature map smoothness and classification reliability.
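
As a rough illustration of the rollout technique mentioned above, the sketch below follows the commonly described recipe: average each layer's attention over heads, mix in the identity to account for residual connections, and multiply the layers together. It assumes head-averaged, row-normalized attention matrices given as plain Python lists. The function names are illustrative and not taken from the project, and the gradient-weighted Chefer method is not shown.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def attention_rollout(layers):
    """Combine per-layer attention into one token-to-token relevance matrix.

    layers: list of square matrices, first layer first; each is assumed to be
    averaged over heads with rows summing to 1.
    """
    n = len(layers[0])
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for attn in layers:
        # 0.5 * (A + I) models the residual connection and keeps rows normalized.
        mixed = [[0.5 * (attn[i][j] + (1.0 if i == j else 0.0))
                  for j in range(n)]
                 for i in range(n)]
        # Later layers multiply on the left of the accumulated product.
        rollout = matmul(mixed, rollout)
    return rollout
```

Under the usual ViT convention that token 0 is the class token, row 0 of the result can be reshaped into the patch grid to form an attention map.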

Autonomous Navigation of a Differential Drive Robot
This project focuses on the development and programming of the MBot, a mobile robot equipped with sensors and a Jetson Nano compute module. The robot's perception capabilities were enhanced through AprilTag detection and SLAM, enabling mapping and interaction with its environment.
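
For context on the differential drive platform, here is a minimal sketch of wheel odometry, the kind of motion estimate a SLAM pipeline typically consumes. It is not the MBot's actual code: the function name, the wheel-speed inputs, and the midpoint integration scheme are all assumptions made for illustration.

```python
import math


def integrate_odometry(pose, v_left, v_right, wheel_base, dt):
    """Advance a planar pose (x, y, theta) by one time step.

    v_left, v_right: wheel linear speeds in m/s (hypothetical inputs).
    wheel_base: distance between the two wheels in metres.
    """
    x, y, theta = pose
    v = 0.5 * (v_left + v_right)              # forward speed of the robot centre
    omega = (v_right - v_left) / wheel_base   # yaw rate, counter-clockwise positive
    theta_mid = theta + 0.5 * omega * dt      # heading at the middle of the step
    x += v * math.cos(theta_mid) * dt
    y += v * math.sin(theta_mid) * dt
    # Wrap the new heading into [-pi, pi).
    theta = (theta + omega * dt + math.pi) % (2.0 * math.pi) - math.pi
    return (x, y, theta)
```

Equal wheel speeds move the robot straight ahead, while equal and opposite speeds rotate it in place.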

Denoising Diffusion on Two-Pixel Images
This project explores the fundamentals of Denoising Diffusion Probabilistic Models (DDPM) in a simplified two-pixel image space, which makes the learned generative distribution fully visualizable. A lightweight conditional UNet is trained to predict the noise in the reverse diffusion process, incorporating sinusoidal beta scheduling and classifier-free guidance for controlled image generation.
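
Two of the ingredients named above can be sketched in a few lines, treating a two-pixel image as a list of two floats. The forward step uses the standard DDPM closed form, and the guidance step uses one common convention for classifier-free guidance. The function names and the guidance-scale convention are assumptions, and the noise schedule itself (which would supply `alpha_bar`) is left out.

```python
import math


def q_sample(x0, eps, alpha_bar):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps.

    alpha_bar is the cumulative product of (1 - beta) up to step t.
    """
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def guided_noise(eps_cond, eps_uncond, guidance_scale):
    """Blend conditional and unconditional noise predictions.

    A scale of 0 returns the unconditional prediction, 1 returns the
    conditional one, and values above 1 extrapolate past the conditional one.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]
```

In a sampler, the blended prediction would take the place of the network's raw noise estimate at each reverse step.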

Vision-Guided Robotic Manipulation
This project explores the development of a robotic arm capable of detecting and manipulating objects using a combination of kinematic control and computer vision. The control system is built on forward and inverse kinematics, utilizing the product of exponentials approach for precise motion planning.
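
To make the product of exponentials idea concrete, the sketch below applies it to a planar arm with revolute joints, where each joint's exponential reduces to a rotation about that joint's home position. This is a simplified 2D stand-in, not the project's arm model: the function names, the 3x3 homogeneous-matrix representation, and the joint layout are assumptions.

```python
import math


def rotation_about_point(theta, qx, qy):
    """Planar exponential of a revolute joint: rotate by theta about (qx, qy)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, qx - c * qx + s * qy],
            [s, c, qy - s * qx - c * qy],
            [0.0, 0.0, 1.0]]


def matmul3(a, b):
    """Multiply two 3x3 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def fk_poe(thetas, joint_points, home):
    """Forward kinematics T = exp(S1*t1) * ... * exp(Sn*tn) * M.

    joint_points: joint axis locations in the zero configuration, base first.
    home: end-effector pose M in the zero configuration (3x3 homogeneous).
    """
    T = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for theta, (qx, qy) in zip(thetas, joint_points):
        T = matmul3(T, rotation_about_point(theta, qx, qy))
    return matmul3(T, home)
```

For a two-link arm with link lengths 1.0 and 0.5, the joints sit at (0, 0) and (1.0, 0) and the home pose is a pure translation to (1.5, 0). The end-effector position is then read from the last column of the result.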

Point Cloud Processing and Segmentation with PointNet
PointNet is a deep learning architecture designed for processing and segmenting 3D point cloud data. In this project, a custom dataset loader efficiently handles LiDAR-based point clouds, implementing preprocessing techniques such as random downsampling and batch collation. The network extracts feature representations using a PointNet encoder and a segmentation module, integrating local and global features for improved accuracy.
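
The preprocessing steps named above can be sketched without any deep learning framework. The version below assumes each sample is a list of (x, y, z) tuples with one label per point, and that clouds smaller than the target size are padded by sampling with replacement. The function names and that padding choice are assumptions, not details from the project.

```python
import random


def random_downsample(points, labels, num_points, rng):
    """Pick num_points points at random, keeping each point with its label."""
    n = len(points)
    if n >= num_points:
        idx = rng.sample(range(n), num_points)               # without replacement
    else:
        idx = [rng.randrange(n) for _ in range(num_points)]  # pad by resampling
    return [points[i] for i in idx], [labels[i] for i in idx]


def collate(samples):
    """Turn a list of (points, labels) pairs into two parallel batch lists.

    Because every cloud was resampled to the same size, the batch is
    rectangular and could be converted to a tensor directly.
    """
    batch_points = [pts for pts, _ in samples]
    batch_labels = [lbl for _, lbl in samples]
    return batch_points, batch_labels
```

Passing a seeded `random.Random` instance as `rng` keeps the sampling reproducible across runs.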

Obstacle-Aware Planning on BEVFormer-Generated Environments with MPC-CBF
This work builds on BEVFormer, a state-of-the-art framework for generating a 2D bird's-eye-view representation of the environment around a vehicle. We use its outputs as input to our planner, a model predictive controller augmented with control barrier functions (hence MPC-CBF) for obstacle avoidance.
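
To illustrate how a control barrier function constrains a planned trajectory, the sketch below checks one common discrete-time CBF condition, h(x_{k+1}) >= (1 - gamma) * h(x_k), for a circular obstacle. The exact formulation used in this work is not stated here, so the barrier function, the parameter names, and the condition itself should be read as assumptions.

```python
def barrier(state, obstacle, radius):
    """h(x) = squared distance to the obstacle centre minus radius squared.

    h > 0 outside the obstacle, h = 0 on its boundary, h < 0 inside.
    """
    dx = state[0] - obstacle[0]
    dy = state[1] - obstacle[1]
    return dx * dx + dy * dy - radius * radius


def satisfies_cbf(trajectory, obstacle, radius, gamma):
    """Check h(x_{k+1}) >= (1 - gamma) * h(x_k) for every consecutive pair.

    gamma in (0, 1] limits how fast the safety margin may shrink per step;
    smaller values are more conservative.
    """
    for current, following in zip(trajectory, trajectory[1:]):
        h_now = barrier(current, obstacle, radius)
        h_next = barrier(following, obstacle, radius)
        if h_next < (1.0 - gamma) * h_now:
            return False
    return True
```

Inside an MPC, the same inequality would appear as a constraint at each step of the prediction horizon rather than as an after-the-fact check.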