Working on week 1 y’all.
Week 1’s goal is to lay the foundations for data science, mainly data exploration and feature engineering. We worked with IPython, wrangled NumPy matrices, tussled with Pandas’ indexing and gazed at Matplotlib with amazement. A lot of it was syntactic sugar, and it does take some time getting used to, but I think the students did well to answer some questions and draw their own conclusions.
We used machine learning’s “hello world” dataset, the Iris dataset (shoutout to S.O.). After exploring it in Pandas and Matplotlib, we scaled up to another dataset, the board games dataset from Board Game Geek. It’s a real-world, beginner-friendly dataset about board games and how different factors may affect their ratings. Working through the visualizations and correlations, the students drew their own conclusions about the data and our newly engineered features.
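For a flavor of what that exploration looks like, here is a minimal sketch in Pandas. It is not the exact notebook from class. To keep it self-contained, it types in just the first two rows of each species from the classic Iris data instead of loading the full file. The `petal_area` column is a made-up feature for illustration only, not something the dataset ships with.

```python
import pandas as pd

# First two rows of each species from the classic Iris data, typed in by hand
# so the sketch runs without downloading anything.
iris = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
        "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
        "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
        "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
        "species": ["setosa", "setosa", "versicolor",
                    "versicolor", "virginica", "virginica"],
    }
)

# Hypothetical engineered feature: a rough "petal area" from length x width.
iris["petal_area"] = iris["petal_length"] * iris["petal_width"]

# Exploration step 1: per-species averages of every numeric column.
species_means = iris.groupby("species").mean()

# Exploration step 2: pairwise correlations between the numeric columns.
corr = iris.select_dtypes("number").corr()

print(species_means)
print(corr.round(2))
```

With only six rows the numbers are not worth much on their own, but the same three moves (add a feature, group, correlate) are what we repeated on the much larger board games data.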
I think we segued well into the assertion that our human intuitions about the data may prove incomplete. After all, thinking in more than 2 dimensions is already difficult. How about thousands of dimensions? Millions of samples? To completely understand the data and bring $$$ into the business, we turn to how machines support or shoot down our hypotheses. Next week, we move on to some learning algorithms.