Learning by doing: this is a summary of my own self-teaching experience in data science, in chronological order. I hope it helps.
As for lots of people, this was my first ML course.
- Pros : good overview of ML principles; you implement the different algorithms yourself.
- Cons : assignments are in MATLAB; decision trees are not covered.
- Workload : 6 hours/week for 2-3 months.
It’s a kind of 360° view of data science : stats, programming, infrastructure (AWS EC2), data visualization (Tableau).
- Pros : Blitzstein (for the stats part) is really cool to watch, an interesting focus on the “storytelling” aspect, assignments as Jupyter notebooks.
- Cons : N/A.
- Workload : 8-12 hours/week for 5 months.
The course I should have started with. A very gentle introduction to data analysis. The theoretical part is light, and the focus is on practical cases, lots of them.
- Pros : perfect to learn R, tons of exercises with various datasets.
- Cons : the R code does not use (as of the 2016 version) the tidyverse libraries. So I strongly recommend the SWIRL tutorial on dplyr and tidyr before starting the course.
- Workload : 8 hours/week for 1 month.
Brilliant introduction/refresher on inferential statistics, frequentist/Bayesian stats and linear models.
- Pros : excellent material to grasp the concepts of confidence intervals, p-values and feature selection. It also really helped me get used to RStudio.
- Cons : the new Bayesian part (as of March 2017) is quite hard to follow; I switched to Bayesian Methods for Hackers.
- Workload : 10 hours/week for 3 months.
A kind of recap of the ML concepts, with 6 Jupyter notebook projects to complete. As of today I’m still on it, and I find it an interesting low-cost substitute for anyone who did not make it to MIT or write a PhD thesis in the field.
Seth Stephens-Davidowitz - Everybody Lies : written in an anecdotal way (remember storytelling ?), a funny introduction to data science by an economist who worked at Google and writes nice articles for the NYT. Try Google Correlate afterwards.
Sebastian Raschka : cool introduction to ML with well-commented Jupyter notebooks.
PyData conferences : I went to PyData Berlin 2016, and will attend the 2017 session. I really enjoyed the exotic talks, like measuring neutrinos from an Antarctic station, but in my opinion too many talks were focused on the “Spark Big data cloud” trinity.