Bioinformatics Software Developer (RA)
-
Guerrero Lab
Sep 2024 - Present
I am currently working as a research assistant under the supervision of Dr. Rafael Guerrero.
- As project lead, implementing ancestry simulation models using msprime in C++ and Python to analyze and predict genetic
variation across populations, following Agile methods for efficient team collaboration.
Data Scientist (RA)
-
Ashes Lab
Sep 2023 - Present
I am currently working as a research assistant under the supervision of Dr. Carter Clinton.
- Analyzing metagenomic data from the historic New York African Burial Ground using next-generation sequencing
and tools like QIIME 2 and Kraken to reconstruct the history of the enslaved African population.
- Categorizing DNA sequences (human, bacterial, animal, etc.) using Bowtie2 and SAMtools; comparing human DNA
with public databases for genealogical links and disease markers, utilizing bash, CUDA, and HPC.
Bioinformatics Data Scientist (RA)
-
Guerrero Lab
Sep 2023 - Aug 2024
I worked as a research assistant under the supervision of Dr. Rafael Guerrero.
- Developed a bioinformatics pipeline for protein thermal stability analysis, including UID fetching, sequence mapping,
quality filtering, and multiple sequence alignment (MAFFT). Constructed phylogenetic trees (FastTree) and generated
correlation matrices (vcv_matrix) and heat maps to visualize relationships for regression analysis.
- Integrated temperature data and optimized sequence validation using pairwise alignment scores. Conducted logistic
regression and Bayesian modeling (brms), enhancing data accuracy by eliminating non-variable positions.
- Mentored 25+ students during internships on Data Science projects for rural organizations, giving them hands-on experience with real data using tools such as Python, R, and SQL, and improving nonprofit operations by 20%.
- Guided students in delivering impactful presentations using data visualization tools like PowerBI and Tableau, showcasing their findings and providing actionable recommendations to stakeholders.
- Guided high school students in analyzing complex HR and medical datasets from UNC Health Blue Ridge, leveraging advanced data science and machine learning techniques.
- Instructed weekly classes on Data Cleaning, Preprocessing, Clustering, Regression, Model Fitting, Supervised and Unsupervised Learning, and Visualizations.
- Mentored students in developing interactive visualizations (using tools like Tableau and Matplotlib) and predictive models, presenting their findings to UNC Health Blue Ridge, resulting in actionable insights for the hospital.
- Developed and maintained self-updating datasets sourced from real-time public data sources. Used APIs to connect with public data and create dynamic datasets.
- Documented codebooks for each dataset and maintained workflow notes to support future work by others.
- Proposed ideas for extracting insights and projects from the datasets to guide research and match course goals.
- Led regular meetings to report on project status, milestones, and future plans of action.
- Published these datasets and their insights for the wider academic community.
- Assisted NC DHHS employees in the course “Data at Work: Data Analytics in Excel and Beyond” with key topics including ETL Tools, Data Warehousing, Microsoft Excel, SQL, PowerBI, Statistics, and Data Visualization.
- Held regular office hours, providing guidance on course concepts and lab techniques and assisting with capstone projects.
- Graded lab assignments and student capstone projects.
- Collaborated with Tasmia Shahriar on the AI-based application Simstudent, using data analysis skills.
- Enhanced model accuracy through data coding, mirroring middle school perspectives.
- Improved Simstudent’s performance, benefiting many middle school students.
- Tutored and evaluated 60+ students in a Python course using the Minerva platform.
- Served as Teaching Assistant for the course Probability and Statistics. Answered student queries and graded students' assignments and final projects to support the professor.