
Data Science in Genomics: Some References

Learning ATGC seemed too different from the rest of the cash cows of data science. Fast forward half a decade, and we see viral RNA and reverse transcription polymerase chain reaction (RT-PCR) flying around in the news.

What’s a data scientist to do? Learn!

I’ve been taking some biotech courses on Coursera over the past few months. It’s a field I have wanted to get into since graduate school but never did, since there is a niche quality to it. Learning the language of ATGC and AUGC seemed too different from the rest of the cash cows of data science.

Fast forward half a decade, and we find ourselves right in the middle of a pandemic. Suddenly, we see viral RNA and reverse transcription polymerase chain reaction (RT-PCR) flying around in the news. The classical methods of clinical trials and wet lab experiments are now complemented by massive amounts of data, statistics, and computation. It is a testament to the state of genomics that the SARS-CoV-2 virus was sequenced within a few short weeks.

With my non-biology background, I wanted to immerse myself in this field even more. I’ll share the things I have tried out.

Genomic Course Work


There are a lot of courses and specializations on Coursera. There’s the excellent Johns Hopkins Genomic Data Science Specialization and the State University of New York’s Big Data, Genes, and Medicine. They have the standard quizzes and hands-on exercises in R and Galaxy. There’s a price tag for getting certified, and if that’s your jam, you should definitely get one.

I only audit the courses though. Open for sponsorship. : )


Healthcare Course Work

Stanford’s AI in Healthcare Specialization is a holistic take on healthcare and how AI can play a role in its diverse facets. From the economics of healthcare to the datasets one might expect when working in this domain, the specialization can elevate your understanding of genomics in healthcare.

Bonus: Educational Series

Crash Course Anatomy and Physiology is perfect for-dummies content. For manga fans, Cells at Work works great to both entertain and inform.

Kaggle Competitions

There is a very timely competition on Kaggle: OpenVaccine: COVID-19 mRNA Vaccine Degradation Prediction. Given sequence data for mRNA molecules, competitors must predict features describing the degradation rates of these molecules. mRNA molecules, which are a central component of mRNA vaccines, are notoriously fragile. With better knowledge of these degradation rates, vaccines can be more easily manufactured, stabilized, and shipped around the world.
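To make "sequence data" a bit more concrete, here is a minimal sketch of how an mRNA string might be turned into numbers a model can consume. It is only an illustration under my own assumptions: the helper names `transcribe` and `one_hot_encode` and the example sequence are made up, and this is not taken from the competition's data format or from any particular notebook.

```python
# Illustrative sketch only: helper names and the example sequence are hypothetical.
RNA_BASES = "ACGU"


def transcribe(dna):
    """Convert a DNA coding-strand string (ATGC) to its mRNA form (AUGC)."""
    return dna.upper().replace("T", "U")


def one_hot_encode(sequence):
    """Turn an RNA string into a list of 4-element one-hot rows (A, C, G, U)."""
    index = {base: i for i, base in enumerate(RNA_BASES)}
    rows = []
    for base in sequence.upper():
        if base not in index:
            raise ValueError(f"unexpected base: {base!r}")
        row = [0] * len(RNA_BASES)
        row[index[base]] = 1
        rows.append(row)
    return rows


example = transcribe("GGAAAT")  # made-up sequence, not real competition data
encoded = one_hot_encode(example)
for base, row in zip(example, encoded):
    print(base, row)
```

Each position in the molecule becomes one row, so a model can learn a per-position output from the encoded sequence. Real solutions would likely use richer inputs than this, but the idea of mapping letters to numeric features is the same.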

I started late in this competition but I learned from a few notebooks. I’ll share these starter ones. It’s quite surreal that coursework components are now being applied in the timeliest of real-world problems.

By krsnewwave

I'm a software engineer and a data science guy on recommender systems, natural language processing, and computer vision.
