
Deep Dive Wednesdays Week 1

Working on week 1 y’all.

Week 1’s goal is to lay out the foundations for data science, mainly data exploration and feature engineering. We worked with IPython, wrangled with NumPy matrices, tussled with Pandas’ indexing and gazed at Matplotlib with amazement. A lot of it was syntactic sugar, and it does take some time getting used to, but I think the students did well to answer some questions and draw their own conclusions.

We used machine learning’s “hello world” dataset, the Iris dataset (shoutout to S.O.). After exploring it in Pandas and Matplotlib, we scaled up to another dataset, the board games dataset from Board Game Geek. It’s a real-world, beginner-friendly dataset about board games and how different factors may affect their ratings. From the visualizations and correlations, the students drew their own conclusions about the data and our newly engineered features.
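For anyone who wants a feel for that workflow, here is a minimal sketch in Pandas. It is not the class notebook: the six Iris rows are typed in by hand so it runs without any download, and the column names and the `petal_area` feature are my own assumptions for illustration.

```python
import pandas as pd

# A handful of Iris rows typed in by hand (two per species) so the
# sketch is self-contained. The column names are an assumption.
rows = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
]
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
iris = pd.DataFrame(rows, columns=columns)

# Hypothetical engineered feature: a rough petal "area".
iris["petal_area"] = iris["petal_length"] * iris["petal_width"]

# Summary statistics for the numeric columns only.
numeric = iris.drop(columns="species")
print(numeric.describe())

# Pairwise Pearson correlations; show how each column relates to the new feature.
corr = numeric.corr()
print(corr["petal_area"].sort_values(ascending=False))

# Does the new feature separate the species? Compare the group means.
species_means = iris.groupby("species")["petal_area"].mean()
print(species_means)
```

With Matplotlib installed, something like `iris.plot.scatter(x="petal_length", y="petal_width")` gives the visual counterpart to the correlation table. Six rows are far too few to conclude anything, so treat the printed numbers as a demo of the steps, not a result.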

I think we segued well with the assertion that our human intuitions about the data may prove incomplete. After all, thinking in more than two dimensions is already difficult. How about thousands of dimensions? Millions of samples? To completely understand the data and bring $$$ into the business, we turn to how machines support or shoot down our hypotheses. We turn to some learning algorithms next week.

Link to GitHub for the student’s copy.

By krsnewwave

I'm a software engineer and a data science guy on recommender systems, natural language processing, and computer vision.
