Projects
Selected Projects
- LLaMA 3.2-Vision Object Detection System: Built an object detection system using LLaMA 3.2-Vision served via Ollama, with OpenCV for image handling and FastAPI with asyncio for streaming responses.
- LLaMA 3-Based Intelligent Search and RAG: Developed a retrieval-augmented search system with LLaMA 3, LoRA, LanceDB, Tavily, and FastAPI.
- Automatic Video Hand-Tracking for Stroke Recovery: Built a computer vision pipeline using SAM2, Mediapipe, OpenCV, and PyTorch for hand-tracking in recovery videos.
- Brain Tumor Detection using Vision Transformer: Implemented a ViT-based medical image classification model with PyTorch and hyperparameter tuning.
- AI-Powered Web Content Extraction and Scraping: Developed web extraction workflows using LangChain and ReAct agents.
- Magnetic Tile Surface Defect Detection for Motors: Built a YOLOv8-based defect detection system with CBAM, Focal Loss, PyTorch, and LabelImg.
- Image Classification with GCViT: Implemented image classification with the Global Context Vision Transformer (GCViT), Grad-CAM, and TensorFlow.
- Stable Diffusion Image-to-Prompt Generation System: Built an image-to-prompt system using BLIP, CLIP, Sentence-Transformers, and PyTorch.
- Attention U-Net Breast Cancer Segmentation: Developed a breast cancer segmentation pipeline using TensorFlow/Keras, Attention U-Net, and Grad-CAM.
- Stable Diffusion-Based Text-to-Image Generation System: Implemented a text-to-image generation system with Stable Diffusion, CLIPTextModel, AutoencoderKL, and UNet.
- Multiclass Sentiment Classification for Mental Health: Built a multiclass mental health sentiment classifier using TF-IDF, Logistic Regression, and GridSearchCV.
- Full Stack Movie Review Application: Developed a full-stack review application with Java, Spring Boot, MongoDB, React, and JavaScript.
Reproduced Research Works
- Attention Is All You Need
- Deep Residual Learning for Image Recognition
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- AdapterFusion: Non-Destructive Task Composition for Transfer Learning
- LoRA: Low-Rank Adaptation of Large Language Models
- CLIP-Adapter: Better Vision-Language Models with Feature Adapters
- Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables