Deploying a Containerized Streamlit Application in 13(ish) Steps

One of the biggest challenges in data science is deploying a machine learning (ML) application to production. This phase is often complicated because the code base is sensitive to the operating system it runs on, and because there can be a number of other complex dependencies needed for a machine learning application…

Understanding Customer Churn

Note: This blog post is for Udacity’s Data Scientist Nanodegree Program. Introduction: There are two main challenges when running a subscription-based business: getting customers and retaining them. The goal of this project was to analyze customer churn data for an online Software as a Service (SaaS) company to understand how attributes about the customer’s enrollment…

How I Passed the AWS Certified Cloud Practitioner Exam

Introduction: There is a natural progression in an early-career data professional’s abilities. They typically begin by learning basic data wrangling/munging skills, including SQL, and eventually progress to building machine learning models using libraries like scikit-learn. Once the foundational skills are in place, more advanced programming paradigms are used, including object-oriented programming, custom modules, version…

Heart Disease Model Deployment on the Google Cloud

TL;DR: You can check out my end-to-end machine learning code (including model deployment) here. All feedback welcome! Introduction: I’ve spent the last few weeks exploring the Google Cloud Platform and its role in the Cross-Industry Standard Process for Data Mining (CRISP-DM) lifecycle. Based on my experience, most data scientists (including myself) have experience all the…

The Core Pillars of Reproducible Data Science

In 2016, the multidisciplinary journal Nature polled over 1,500 scientists on their thoughts about scientific reproducibility [1]. The results were astounding. Seventy percent of researchers tried and failed to reproduce other scientific work. Furthermore, 50% were unable to reproduce their own research! Irreproducibility is likely due to a number of causes, including inconsistent…

One of the Best Data Science Books You’ll Ever Read

When I first started my journey into data science in 2015, there were a lot of resources. Today, it’s just downright overwhelming. It seems that everyone everywhere has written an article or a book, or is selling a course, telling you how to go from zero to data scientist in <insert ridiculously short time period here>. Overthinking…