Learning the machine.

Jun 12, 2017

It's a long way to accuracy.

Learning by doing: this is a summary of my own self-teaching experience in the field of data science, in chronological order. I hope it helps.

Andrew Ng’s “Machine Learning” on Coursera

As for lots of people, this was my first ML course.

  • Pros : good overview of ML principles, you implement the different algorithms yourself.
  • Cons : assignments in Matlab, Decision Trees are not covered.
  • Workload : 6 hours/week for 2-3 months.

CS109 courses from Harvard.

It’s a kind of 360° view of data science : stats, programming, infrastructure (AWS EC2), data visualization (Tableau).

  • Pros : Blitzstein (for the stats part) is really cool to watch, an interesting focus on the “storytelling” aspect, assignments as Jupyter notebooks.
  • Cons : N/A.
  • Workload : 8-12 hours/week for 5 months.

The Analytics Edge

The course I should have started with. A very gentle introduction to data analysis. The theoretical part is light, and the focus is on practical cases, lots of them.

  • Pros : perfect to learn R, tons of exercises with various datasets.
  • Cons : the R code does not use (as of the 2016 version) the tidyverse libraries. Thus I strongly recommend the SWIRL tutorial on dplyr and tidyr before starting the course.
  • Workload : 8 hours/week for 1 month.

Stats in R

Brilliant introduction/refresher on inferential statistics, frequentist/Bayesian stats and linear models.

  • Pros : excellent material to grasp the concepts of confidence intervals, p-values and feature selection. It also really helped me get used to RStudio.
  • Cons : the new Bayesian part (as of March 2017) is quite hard to follow, so I switched to Bayesian Methods for Hackers.
  • Workload : 10 hours/week for 3 months.

Udacity Machine Learning Engineer

A kind of recap of the ML concepts, with 6 Jupyter notebook projects to complete. As of today I’m still on it, and I find it an interesting low-cost substitute for anyone who did not make it to MIT or write a PhD thesis in the field.

Readings

  • Seth Stephens-Davidowitz - Everybody Lies : written in an anecdotal way (remember storytelling ?), a funny introduction to data science by an economist who worked at Google and writes nice articles in the NYT. Then try Google Correlate.

  • Sebastian Raschka : cool introduction to ML with well-commented Jupyter notebooks.

  • PyData conferences : I went to PyData Berlin 2016, and will attend the 2017 session. I really enjoyed the exotic talks, like measuring neutrinos from an Antarctic station, but in my opinion too many talks were focused on the “Spark Big data cloud” trinity.