Head First Data Analysis
A solid introductory book on Data Analysis
I just finished reading “Head First Data Analysis”, by Michael Milton. It was the first book I’ve ever read from the “Head First” series.
The book is very approachable. Concepts are introduced only when they are needed, and always to make a valid, practical point. The whole book is structured around practice and “real-life” examples (greatly simplified, of course), showing how methodical, logical steps naturally lead to a sound analysis.
Each example presents a scenario with a problem to solve and some uncertainties to work against. It starts with a small exercise in figuring out the best course of action, which usually involves gathering relevant data and analyzing the data we already have. Then it weighs certainties against uncertainties and reaches a conclusion. After that, it usually throws in a “plot twist”: something unexpected that exposes common pitfalls and shows ways to adapt quickly.
So the book’s claim holds true: it doesn’t aim to be “serious” or “deep”, but rather to offer an experience we can actually learn from. And I liked that a lot!
The contents are pretty solid too. It covers:
- Mental models (assumptions, beliefs, facts, uncertainties)
- Confounders
- Data segmentation
- Control groups in experiments
- Randomization in experiments
- Optimization problems (objective functions, constraints)
- Data visualization (bar charts, scatter plots, histograms, multivariate plotting)
- Evidence testing, hypothesis testing, falsification
- Bayesian statistics (conditional probabilities, base rates)
- Subjective probabilities
- Averages, standard deviation
- Heuristics
- R (installing, loading data, plotting, regression)
- Linear regression
- Linear correlation
- Linear model errors (residuals, RMSE)
- Interpolation, extrapolation
- Relational databases (the concept, and “joining” in a spreadsheet program)
- Data cleanup (pattern finding, regexes)
- Excel (formulas, solver)
Above all, the book stays easy to read and easy to keep up with.
If you’re just starting out in Data Analysis or Data Science, this book is definitely for you. If you need to brush up on some essentials, it is still well worth checking out.