Over the past few months I have spent some time learning more about data analysis. After some research I found a very interesting book titled “An Introduction to Statistical Learning (ISL)” by James, Witten, Hastie and Tibshirani. Two of the authors, Hastie and Tibshirani, are leading figures in the field: they co-wrote “The Elements of Statistical Learning (ESL)” and developed important models for statistical learning. As I read the book I’ll write up some notes and share them on this blog.
The book is organized into 10 chapters and includes labs implemented in R, the popular statistical software package. The chapter list follows:
1) Introduction: covers the history of statistical learning, along with the content and premises of the book
2) Statistical Learning: introduces the basic terminology and concepts behind statistical learning
3) Linear Regression: reviews linear regression
4) Classification: discusses two of the most important classical classification methods, logistic regression and linear discriminant analysis
5) Resampling Methods: introduces cross-validation and the bootstrap, which can be used to estimate the accuracy of a number of different methods in order to choose the best one
6) Linear Model Selection and Regularization: considers a host of linear methods, both classical and more modern, which offer potential improvements over standard linear regression. These include stepwise selection, ridge regression, principal components regression, partial least squares, and the lasso
7) Moving Beyond Linearity: discusses a number of non-linear methods that work well for problems with a single input variable. These methods can also be used to fit non-linear additive models for which there is more than one input
8) Tree-Based Methods: investigates tree-based methods, including bagging, boosting, and random forests
9) Support Vector Machines: describes a set of approaches for performing both linear and non-linear classification
10) Unsupervised Learning: describes a setting with input variables but no output variable, and presents principal components analysis, K-means clustering, and hierarchical clustering
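To give a flavour of what chapter 5 is about, here is a minimal sketch of k-fold cross-validation used to choose between two candidate models. It is my own illustration, not code from the book: the labs in the book use R, while this sketch is in plain Python, and the data, the helper names (`k_fold_indices`, `fit_mean`, `fit_line`, `cv_mse`) and the choice of 5 folds are all assumptions made for the example.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training responses."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Simple linear regression fitted by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def cv_mse(fit, xs, ys, k=5, seed=0):
    """Estimate test mean squared error by averaging over k held-out folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        mse = sum((ys[i] - model(xs[i])) ** 2 for i in held_out) / len(held_out)
        fold_errors.append(mse)
    return sum(fold_errors) / k


# Synthetic data (an assumption for this sketch): a noisy straight line.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

mse_mean = cv_mse(fit_mean, xs, ys)
mse_line = cv_mse(fit_line, xs, ys)
best = "line" if mse_line < mse_mean else "mean"
print(f"CV MSE, mean model: {mse_mean:.3f}")
print(f"CV MSE, linear model: {mse_line:.3f}")
print(f"Model chosen by cross-validation: {best}")
```

Each model is always scored on observations it did not see during fitting, which is why the averaged error can be used to compare methods and pick one.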
The book and other additional information can be downloaded at:
A Statistical Learning MOOC was also offered by Trevor Hastie and Rob Tibshirani in January 2014.
It is interesting to note that the book's preface ends with this quote from Yogi Berra: “It’s tough to make predictions, especially about the future.” Do you agree?