Projects

ViT Explainability and Token Analysis for Image Classification

Attention maps generated with the Chefer method and attention rollout illustrate how ViTs attend to different image regions. Adding register tokens to a ViT-B model improves its feature representations, and the modified model achieves over 94% accuracy on ImageNet-100. Token norm distributions are analyzed across transformer layers to assess their impact on classification performance. A comparative study of the standard and modified ViT models highlights the role of register tokens in producing smoother feature maps and more reliable classification.
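Attention rollout composes the head-averaged attention matrices of successive layers, adding the identity to account for residual connections. The sketch below is a minimal NumPy version under stated assumptions, not the project's implementation: attention weights arrive per layer as `(heads, tokens, tokens)` arrays, the CLS token sits at index 0, and the helper names (`attention_rollout`, `cls_patch_map`, `token_norms`) are hypothetical. The Chefer method, which additionally weights attention by its gradients, is not shown.

```python
import numpy as np


def attention_rollout(attentions):
    """Attention rollout over per-layer attention arrays.

    attentions: sequence of arrays shaped (heads, tokens, tokens), first layer first.
    Returns a (tokens, tokens) matrix relating output tokens to input tokens.
    """
    num_tokens = attentions[0].shape[-1]
    identity = np.eye(num_tokens)
    rollout = identity.copy()
    for attn in attentions:
        a = np.mean(attn, axis=0)                # average over heads
        a = a + identity                         # model the residual connection
        a = a / a.sum(axis=-1, keepdims=True)    # keep rows summing to 1
        rollout = a @ rollout                    # compose with the layers below
    return rollout


def cls_patch_map(rollout, grid_size, patch_start=1):
    """CLS-to-patch relevance as a (grid_size, grid_size) map scaled to [0, 1].

    Assumes the CLS token is index 0 and patch tokens are contiguous from patch_start
    (e.g. patch_start = 1 + num_registers if registers precede the patches).
    """
    num_patches = grid_size * grid_size
    scores = rollout[0, patch_start:patch_start + num_patches]
    heat = scores.reshape(grid_size, grid_size)
    return heat / heat.max()


def token_norms(hidden_states):
    """L2 norm of every token at every layer: (layers, tokens, dim) -> (layers, tokens)."""
    return np.linalg.norm(hidden_states, axis=-1)
```

With register tokens, the register positions are simply left out of the CLS-to-patch slice; `patch_start` should point at the first patch token in whatever token layout the model uses.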

github

Autonomous Navigation of a Differential Drive Robot

This project focuses on developing and programming the MBot, a mobile robot equipped with onboard sensors and a Jetson Nano compute module. AprilTag detection and SLAM extend the robot's perception, enabling it to map and interact with its environment.
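SLAM on a differential drive base usually takes wheel odometry as its motion model. As a hedged illustration (the function name and the midpoint-arc approximation are assumptions, not taken from the MBot codebase), a pose update from left and right wheel travel can be written as:

```python
import math


def diff_drive_odometry(x, y, theta, d_left, d_right, wheel_base):
    """Update a planar pose (x, y, theta) from left/right wheel travel distances.

    Distances and wheel_base share one length unit; theta is in radians.
    Uses the midpoint approximation: translate along the average heading of the step.
    """
    d_center = 0.5 * (d_left + d_right)          # distance travelled by the axle midpoint
    d_theta = (d_right - d_left) / wheel_base    # heading change over the step
    x += d_center * math.cos(theta + 0.5 * d_theta)
    y += d_center * math.sin(theta + 0.5 * d_theta)
    theta = math.atan2(math.sin(theta + d_theta), math.cos(theta + d_theta))  # wrap to (-pi, pi]
    return x, y, theta
```

Equal wheel travel gives a straight step, while opposite travel rotates the robot in place; real encoders add slip and noise, which is what the SLAM correction step compensates for.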

Report

Denoising Diffusion on Two-Pixel Images

This project explores the fundamentals of Denoising Diffusion Probabilistic Models (DDPMs) in a simplified two-pixel image space, where the learned generative distribution can be visualized in full. A lightweight conditional UNet is trained to predict the noise in the reverse diffusion process, using a sinusoidal beta schedule and classifier-free guidance for controlled image generation.
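In a two-pixel space each image is a point in the plane, so the forward process and guided noise prediction fit in a few lines. The NumPy sketch below assumes that "sinusoidal beta scheduling" refers to the widely used cosine schedule of Nichol and Dhariwal (2021); the function names are hypothetical, and the trained UNet is stood in for by precomputed noise predictions.

```python
import numpy as np


def cosine_alpha_bar(T, s=0.008):
    """Cumulative signal level alpha_bar_t for t = 0..T under a cosine schedule."""
    t = np.arange(T + 1) / T
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    return f / f[0]                          # alpha_bar_0 = 1 (clean data)


def betas_from_alpha_bar(alpha_bar, max_beta=0.999):
    """Per-step noise levels beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}, for t = 1..T."""
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    return np.clip(betas, 0.0, max_beta)     # avoid beta = 1 at the final step


def q_sample(x0, t, alpha_bar, rng):
    """Forward process x_t = sqrt(alpha_bar_t) x0 + sqrt(1 - alpha_bar_t) eps.

    x0: (batch, 2) two-pixel images; t: (batch,) integer steps in 1..T.
    Returns the noisy sample and the noise the network is trained to predict.
    """
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t][:, None]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps


def guided_eps(eps_cond, eps_uncond, w):
    """Classifier-free guidance: move the unconditional prediction toward the conditional one."""
    return eps_uncond + w * (eps_cond - eps_uncond)
```

Some codebases write the guidance rule as `(1 + w) * eps_cond - w * eps_uncond`; that is the same rule with `w` shifted by one.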

github

Vision-Guided Robotic Manipulation

This project develops a robotic arm that detects and manipulates objects by combining kinematic control with computer vision. The control system is built on forward and inverse kinematics, using the product of exponentials formulation for precise motion planning.
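The product of exponentials writes the end-effector pose as a chain of screw-axis exponentials applied to the home configuration M, `T = exp([S1]θ1) ... exp([Sn]θn) M`. The sketch below uses the standard closed-form exponential for a unit screw axis and checks it on a hypothetical planar two-link arm; the project's actual screw axes and link lengths are not given here, so `L1`, `L2`, and the function names are assumptions.

```python
import numpy as np


def skew(w):
    """3x3 skew-symmetric matrix [w] such that [w] @ p == np.cross(w, p)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def screw_exp(S, theta):
    """4x4 transform exp([S] theta) for a screw axis S = (w, v)."""
    w, v = np.asarray(S[:3], float), np.asarray(S[3:], float)
    T = np.eye(4)
    if np.allclose(w, 0.0):                # prismatic joint: pure translation
        T[:3, 3] = v * theta
        return T
    W = skew(w)                             # revolute joint: assumes ||w|| == 1
    R = np.eye(3) + np.sin(theta) * W + (1 - np.cos(theta)) * W @ W
    G = np.eye(3) * theta + (1 - np.cos(theta)) * W + (theta - np.sin(theta)) * W @ W
    T[:3, :3] = R
    T[:3, 3] = G @ v
    return T


def fk_space(M, screws, thetas):
    """Space-frame forward kinematics: exp([S1]t1) @ ... @ exp([Sn]tn) @ M."""
    T = np.eye(4)
    for S, th in zip(screws, thetas):
        T = T @ screw_exp(S, th)
    return T @ M


# Hypothetical planar 2R arm: both joints rotate about z, link lengths L1 and L2.
L1, L2 = 1.0, 0.5
S1 = np.array([0, 0, 1, 0, 0, 0], float)      # joint 1 at the origin
S2 = np.array([0, 0, 1, 0, -L1, 0], float)    # joint 2 at (L1, 0, 0): v = -w x q
M = np.eye(4)
M[0, 3] = L1 + L2                             # home pose: arm stretched along x
```

Inverse kinematics can reuse `fk_space` inside an iterative solver (for example Newton-Raphson on the pose error), which is one reason the PoE form is convenient.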

Point Cloud Processing and Segmentation with PointNet

PointNet is a deep learning architecture designed for processing and segmenting 3D point cloud data. In this project, a custom dataset loader efficiently handles LiDAR-based point clouds, implementing preprocessing steps such as random downsampling and batch collation. The network extracts hierarchical feature representations using a PointNet encoder and a segmentation module, integrating local and global features for improved accuracy.
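The loader steps and the local/global feature fusion can be sketched in NumPy as follows. This is an illustration under assumptions (clouds stored as `(N, C)` arrays with per-point integer labels, hypothetical function names), not the project's loader; in PyTorch, a function like `collate` would typically be passed to `DataLoader` as its `collate_fn`. The last helper shows the PointNet-style step of joining each point's feature with a max-pooled global feature before the segmentation head.

```python
import numpy as np


def random_downsample(points, num_points, rng, labels=None):
    """Keep num_points random points; sample with replacement if the cloud is smaller."""
    n = points.shape[0]
    idx = rng.choice(n, size=num_points, replace=n < num_points)
    if labels is None:
        return points[idx], None
    return points[idx], labels[idx]            # same indices keep labels aligned


def collate(batch):
    """Stack equal-size (points, labels) pairs into (B, N, C) and (B, N) arrays."""
    points = np.stack([p for p, _ in batch])
    labels = np.stack([l for _, l in batch])
    return points, labels


def concat_global(per_point):
    """(B, N, F) per-point features -> (B, N, 2F) with the max-pooled global feature appended."""
    global_feat = per_point.max(axis=1, keepdims=True)      # symmetric over point order
    return np.concatenate([per_point, np.broadcast_to(global_feat, per_point.shape)], axis=-1)
```

Downsampling to a fixed size is what makes simple stacking possible; max pooling is used for the global feature because it is invariant to the order of the points.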

github

Obstacle-Aware Planning on BEVFormer-Generated Environments with MPC-CBF

This work builds on BEVFormer, a state-of-the-art framework for generating a 2D bird's-eye-view (BEV) representation of the environment around a vehicle. We use its outputs as inputs to our planner, a model predictive controller augmented with control barrier functions (MPC-CBF) for obstacle avoidance.
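A common way to embed a CBF in MPC is the discrete-time condition `h(x_{k+1}) >= (1 - gamma) * h(x_k)` with `0 < gamma <= 1`, imposed at each step of the horizon so that the safe set `h >= 0` stays forward invariant. The sketch below only evaluates that condition along a predicted trajectory; it assumes obstacles extracted from the BEV output are approximated as circles, and the function names are hypothetical. In a full MPC-CBF planner the same inequality would be a constraint inside the optimizer rather than a post-hoc check.

```python
import numpy as np


def h_circle(p, center, radius):
    """Barrier value for a circular obstacle: positive outside, zero on the boundary."""
    d = np.asarray(p, float) - np.asarray(center, float)
    return float(d @ d - radius ** 2)


def dcbf_satisfied(traj, center, radius, gamma):
    """Check h(x_{k+1}) >= (1 - gamma) * h(x_k) at every step of a predicted 2D trajectory."""
    h = [h_circle(p, center, radius) for p in traj]
    return all(h[k + 1] >= (1.0 - gamma) * h[k] for k in range(len(h) - 1))
```

A smaller `gamma` forces the barrier value to shrink more slowly, so the vehicle starts reacting to an obstacle earlier; `gamma = 1` reduces the condition to staying inside the safe set at the next step.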

github

© 2025
