Hey, I’m

Varun Deepak Gudhe

The Wizard of Data Science

About Me

I’m Varun Deepak Gudhe, a graduate student at North Carolina State University pursuing a Master’s in Computer Science. I’m passionate about data science: I have built numerous machine learning projects and like deploying them to the cloud so they reach a wider audience. Currently, I’m exploring machine learning in bioinformatics and diving into web development.

Here's a peek at the programming languages and tools I've been working with:

Python, R, Java, C++, HTML, CSS, JavaScript, Bootstrap, Rails, Ruby, React, TensorFlow, Git, Bash, Linux, PyTorch, Regex, Postman, AWS, Kubernetes, MySQL, VS Code, Docker, Netlify

Experience

Bioinformatics Software Developer (RA) - Guerrero Lab
Sep 2024 - present

I am currently working as a research assistant under the supervision of Dr. Rafael Guerrero.

  • Leading the project: implementing ancestry simulation models using msprime in C++ and Python to analyze and predict genetic variation across populations, following Agile practices for efficient team collaboration.
Data Scientist (RA) - Ashes Lab
Sep 2023 - present

I am currently working as a research assistant under the supervision of Dr. Carter Clinton.

  • Analyzing metagenomic data from the historic New York African Burial Ground using Next Generation Sequencing and tools like Qiime2 and Kraken to reconstruct the history of the enslaved African population.
  • Categorizing DNA sequences (human, bacterial, animal, etc.) using Bowtie2 and SAMtools, and comparing human DNA with public databases for genealogical links and disease markers, using Bash, CUDA, and HPC.
Bioinformatics Data Scientist (RA) - Guerrero Lab
Sep 2023 - Aug 2024

I worked as a research assistant under the supervision of Dr. Rafael Guerrero.

  • Developed a bioinformatics pipeline for protein thermal stability analysis, including UID fetching, sequence mapping, quality filtering, and multiple sequence alignment (MAFFT). Constructed phylogenetic trees (FastTree) and generated correlation matrices (vcv_matrix) and heat maps to visualize relationships for regression analysis.
  • Integrated temperature data and optimized sequence validation using pairwise alignment scores. Conducted logistic regression and Bayesian modeling (brms), and improved data accuracy by eliminating non-variable positions.
Data Science Mentor - NC State Data Science Academy
May 2023 - Aug 2023 and May 2024 - Aug 2024
  • Mentored 25+ students during internships on data science projects for rural organizations, giving them hands-on experience with real data using tools such as Python, R, and SQL, and improving nonprofit operations by 20%.
  • Guided students in delivering impactful presentations with data visualization tools such as Power BI and Tableau, showcasing their findings and providing actionable recommendations to stakeholders.
Data Science Expert - North Carolina School of Science and Mathematics (NCSSM)
Jun 2024 - Jul 2024
  • Guided high school students in analyzing complex HR and medical datasets from UNC Health Blue Ridge, leveraging advanced data science and machine learning techniques.
  • Instructed weekly classes on Data Cleaning, Preprocessing, Clustering, Regression, Model Fitting, Supervised and Unsupervised Learning, and Visualizations.
  • Mentored students in developing interactive visualizations (using tools like Tableau and Matplotlib) and predictive models, presenting their findings to UNC Health Blue Ridge, resulting in actionable insights for the hospital.
Course Collaborative Leader - NC State Data Science Academy
Aug 2023 - present
  • Developed and maintained self-updating datasets sourced from real-time public data sources. Used APIs to connect with public data and create dynamic datasets.
  • Documented codebooks for each dataset and maintained workflow notes so others can continue the work.
  • Proposed ideas for extracting insights and projects from the datasets to guide research and match course goals.
  • Led regular meetings to report on project status, milestones, and next steps.
  • Published these datasets and their insights for the wider academic community.
Teaching Assistant - NC State Data Science Academy
Oct 2023 - present
  • Assisted NC DHHS employees in the course “Data at Work: Data Analytics in Excel and Beyond,” covering ETL tools, data warehousing, Microsoft Excel, SQL, Power BI, statistics, and data visualization.
  • Held regular office hours, providing guidance on course concepts and lab techniques and assisting students with their capstone projects.
  • Graded lab assignments and student capstone projects.
Research Assistant - IEC Lab NCSU
Dec 2022 - May 2023
  • Collaborated with Tasmia Shahriar on SimStudent, an AI-based application, applying data analysis skills.
  • Improved model accuracy by coding data to reflect middle school students’ perspectives.
  • Improved Simstudent’s performance, benefiting many middle school students.
Teaching Assistant - SRM University AP
Jul 2019 - Jun 2020
  • Tutored and evaluated 60+ students in a Python course using the Minerva platform.
  • Served as Teaching Assistant for Probability and Statistics: answered student queries and graded assignments and final projects to support the professor.

Education

2022 - 2024
Master of Science in Computer Science
North Carolina State University
GPA: 3.78 out of 4.0

Course Work:

  • Design and Analysis of Algorithms, Neural Networks and Deep Learning, Automated Learning and Data Analysis, Experimental Statistics for Engineers, Cloud Computing, DBMS, Software Engineering, Object Oriented Programming.
2018 - 2022
Bachelor of Science in Computer Science
SRM University AP
GPA: 9.54 out of 10

Course Work:

  • Artificial Intelligence, Machine Learning, Big Data, Data Mining, DBMS, Data Structures, Software Engineering, Computer Networks, Operating Systems, Object Oriented Programming.

Extracurricular Activities:

  • Volunteered as infra-tech team head for the tech fest and organized events for the cultural fest.
  • Served as a Campus Ambassador for SmartKnower at SRM AP.

Projects

Ancestry Simulator
Python, C++, CI/CD, Algorithm Optimization, Debugging Techniques, Memory Management, GitHub, GitHub Actions
A C++ and Python integrated program simulating the coalescent process of genetic sequences in polymorphic chromosomal inversions, based on migration-selection balance.
AnkiGPT-4
Python, OpenAI, Flask, CSS, JavaScript, LangChain, Travis CI, Kubernetes, API, GitHub, GitHub Actions, Testing
A Dockerized flashcard web application that processes documents and URLs using LangChain and OpenAI, with a Flask backend and a CSS/JavaScript frontend. Implemented DevOps practices with Travis CI, GitHub Actions, and Kubernetes, achieving 90% test coverage; the app generates 5-100 flashcards per input.
Sync-Ends Library
Python, API, GitHub, GitHub Actions
A Python library that detects changes across Postman Collection APIs and instantly sends notifications via Slack, Microsoft Teams, and email.
Prot_pgls
Python, R, Bioinformatics, Phylogenetics, ete3, Data Wrangling, Statistical Analysis, Parallel Processing, Large-scale Data Processing, Database Querying, GLM (Generalized Linear Models), Data Visualization
Bioinformatics-driven optimization of protein thermostability through sequence analysis, phylogenetic tree processing, and predictive modeling using data from NCBI and PDB databases.
NYABG Metagenomic Analysis
Metagenomic Analysis, Next Generation Sequencing, Qiime2, GWAS, Kraken, Bowtie2, SAMtools, Bash, Python, HPC
Metagenomic analysis of the historic New York African Burial Ground using Next Generation Sequencing, with tools like Qiime2, Kraken, Bowtie2, and HPC assisting in DNA categorization and genealogical exploration.
IMU Terrain Classification
Python, TensorFlow, Keras, CNN, Time Series Analysis
A deep learning model that identifies and classifies different terrains from an IMU time-series dataset.
Expertiza
Ruby, Rails, JavaScript, Open-Source Contribution
Expertiza is a web application through which students can submit and peer-review learning objects (articles, code, websites, etc.). The Expertiza project is supported by the National Science Foundation.
Histopathologic Cancer Detection
Python, Transfer Learning, TensorFlow, CNN, VGG19
A deep learning model (CNNs, VGG19) that identifies metastatic cancer in color image patches using transfer learning, neural networks, and image segmentation.
Road Accident Dashboard
Tableau, KPIs, Data Analysis, Data Visualization
An interactive Tableau dashboard that delivers a visual exploration of road accident trends, providing immediate insights into casualty statistics through dynamic KPIs, trend analyses, and geographic data representations for informed decision-making.
Covid-19 India Dashboard
Tableau, KPIs, Data Analysis, Data Visualization
An interactive Tableau dashboard offering a comprehensive visualization of COVID-19 trends in India, integrating multiple data sources and blending techniques. It provides key insights into cases, vaccination details, demographics, and testing statistics through dynamic maps, line charts, bar charts, donut charts, and stacked bar charts for an in-depth analysis of the pandemic's impact.

Achievements

Get in Touch

Open to exciting roles and collaborations! Got an opportunity? I’m all ears. Let’s connect and discuss the possibilities!