Data Analyst, Indiana University (May 2024 - Current)
School of Medicine
Implemented Sparse CCA, Procrustes Analysis, and Lasso Regression to analyze correlations between gene expression and microbiome data using RStudio and Excel; built interactive Tableau dashboards to visualize the results.
Engineered ETL (Extract, Transform, Load) pipelines using AWS Glue to process and migrate over 30 GB of data into Amazon Redshift, improving data accessibility and workflow efficiency.
Built Power BI dashboards to make results easier to interpret. Recognized as Runner-Up for Best Project Presentation.
Leveraged PySpark with MLlib and integrated advanced Excel features (PowerPivot, custom functions, and VBA) to enhance predictive analytics capabilities, achieving a 60% increase in data interpretation accuracy.
Machine Learning Researcher (May 2024 - Current)
Indiana University
Led a team of six to develop a multi-modal pipeline integrating genetic and imaging data with ResNet-50, transformers, and foundation models such as DINOv2, improving Alzheimer’s disease characterization and progression analysis by 50%.
Designed a normalized MySQL database, wrote queries using window functions and pivoting techniques, and developed a custom API for data retrieval with optimized query performance.
Developed an automated pipeline to preprocess over 100 GB of Alzheimer’s disease-related image data, reducing manual effort by 70%. Leveraged SPM12, FSL, FSLNets, and Nilearn to handle fMRI, MRI, and DTI image data.
Devised an architecture using DINOv2 and BEiT for image classification and segmentation, and performed ICA with Dual Regression on rs-fMRI images, enabling new applications in medical imaging.
Teaching Assistant - CSCI-B 351: INTRODUCTION TO ARTIFICIAL INTELLIGENCE
Luddy School of Informatics, Computing and Engineering
Global Study Program, University of California Davis
Courses:
Elementary Statistics (STA 013)
Fundamentals of Statistical Data Science (STA 141A)
Data Structures and Algorithm (ECS 032B)
Location: Davis, CA Duration: 3 months (June 2022 - Sept 2022)
Academic Internship, National University of Singapore
Topic: Data Analytics using Deep Learning
Location: Singapore Duration: 1 month (June 2021 - July 2021)
Led a project to predict optimal sleep time from health factors, employing ML and DL models (Random Forest, XGBoost, LSTM, and GRU) with Amazon EMR for model building and S3 for data storage, advancing personalized health insights.
Conducted exploratory data analysis to visualize trends and relationships, and optimized model parameters using Grid Search on Amazon SageMaker, boosting performance by 15% for the study on optimal sleep prediction.
Streamlined 3 major ETL pipelines using Azure Data Factory, resulting in $100,000 cost savings for potential clients through enhanced data processing efficiency and reduced operational expenses.
Summer Training at Hewlett Packard Enterprise
Topic: Applied Machine Learning
Location: Singapore Duration: 1 month (July 2021 - August 2021)
Led deployment by orchestrating the project’s migration to Azure, using Azure Web Apps and Docker for containerization, and conducted API testing across 6 endpoints using Azure API Management and Postman.
Collaborated on a team project to develop a brain tumor segmentation model using the U-Net architecture, achieving 93% accuracy in localizing tumors within brain imaging data.