A McKinsey Global Institute report highlights that the U.S. economy will face a shortage of 250,000 data scientists by 2024, as demand for data scientists continues to grow by 12% every year. The best indicator of the demand and supply crunch for the data scientist job role is the 16% increase in the average data scientist salary from 2012 to 2014. There are multiple options for gaining a data science education: data science degrees, data science MOOCs, free data science tutorials and data science books. However, the option that excites most people is MOOCs through sites like Coursera, DeZyre, Udacity and others. You might not get an official data science degree by taking these MOOCs, but these online data science courses will give you much of the practical knowledge required to land a top gig as a data scientist. People enrolling in data science courses often ask, “What are the best books for data science?” or “Which data science books must I read while pursuing an online data science course?”
With the help of DeZyre industry experts, we have put together a list of the best data science books that every aspiring data scientist must read.
Data science is multidisciplinary in nature. To become a great enterprise data scientist, one must have knowledge of statistics, mathematics and machine learning, along with hands-on experience working with popular data science programming languages like Python and R.
OpenIntro Statistics by David Diez, Christopher Barr, and Mine Çetinkaya-Rundel
For people who do not come from a statistics background, this is the first data science book to start reading. It begins with the basic concepts of statistics and moves on to a detailed introduction to regression and other statistical methods. The book contains several real-world examples, and every concept is explained through an example so that readers can easily relate theory to practical applications. The only drawback is that the book is not mathematically demanding, i.e. there are no mathematical proofs for the equations.
An Introduction to Statistical Learning with Applications in R by Daniela Witten, Gareth James, Robert Tibshirani, and Trevor Hastie
There is no better book than this for understanding the introductory concepts of statistical learning. With practical explanations of the common machine learning methods and when to use them, it is a must-read for anyone who wants to analyse complex datasets. The book highlights practical issues encountered when building regression and classification models. For people from a less mathematical background, the formulas are explained in a simple manner that makes the concepts easy to understand. The book provides excellent R programming examples that demonstrate the mathematical and theoretical basis of machine learning and help the concepts sink in. The best thing about this book is the practice exercises, which test your understanding of the concepts once you have read them.
Think Stats by Allen B. Downey
This book requires basic knowledge of Python programming, which it uses to teach the concepts of probability and statistics. The book is based on a Python library for probability distributions. Bayesian statistics is an important concept for data science that many books do not cover, but Think Stats emphasizes how important it is. The best thing about the book is that it follows a case study throughout, using a dataset from the National Institutes of Health to explain the entire data analysis process. This encourages readers to work on projects and develop their understanding of probability and statistics concepts.
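To give a flavour of the Bayesian reasoning mentioned above, here is a minimal sketch in plain Python. It is not taken from Think Stats and does not use the book's library; the coin hypotheses, the `bayes_update` helper and all the numbers are assumptions made up for illustration.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Return the posterior over hypotheses, given a prior and the
    likelihood of the observed data under each hypothesis."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical setup: a coin is either fair or biased toward heads.
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
# Probability of seeing heads under each hypothesis (assumed values).
likelihood_heads = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}

posterior = prior
for _ in range(3):  # observe three heads in a row
    posterior = bayes_update(posterior, likelihood_heads)

print(posterior)
```

Each observed head shifts belief toward the biased coin. Using exact fractions keeps the arithmetic easy to check by hand.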
Doing Data Science: Straight Talk from the Frontline by Cathy O'Neil and Rachel Schutt
Referred to as the roadmap to data science, this book offers more breadth than depth, with some math and code. It has contributed chapters on Hadoop, machine learning, financial modelling, recommendation systems, statistical inference, and more, which makes it worth a read for all aspiring data scientists. Doing Data Science is less a source for learning data science than a historical account of the maturing data science industry. Readers will learn about the technologies that have built the data science arena, which is essential knowledge if you want to become a data scientist.
Practical Data Science with R by John Mount and Nina Zumel
“A unique and important addition to any data scientist’s library.” - From the Foreword by Jim Porzak, Cofounder, Bay Area R Users Group
This book lives up to its name by providing examples of using data science methods in the real world, based on open source software like RStudio, SQuirreL SQL, H2 DB, etc. It uses examples from BI, decision support and marketing to help readers understand how to design A/B tests, build efficient predictive models and present insights to audiences at different levels. Readers must have
What makes Practical Data Science with R a must read?
- It is often said that data scientists spend 90% of their time on data preparation, and this book explains these steps in detail.
- The book comes with free code and data to reproduce all the graphs and analyses presented in it.
- It focuses on helping readers understand how to collect and load non-trivial datasets.
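Since A/B tests come up in this book, a small sketch of how one might be evaluated could help. The example below is written in Python with only the standard library rather than in R, and it is not taken from the book. It uses a two-proportion z-test, which is one common choice; the function name and the conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    Returns the z statistic and a two-sided p-value based on the
    normal approximation with a pooled conversion rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical data: 100 of 1000 visitors convert on A, 150 of 1000 on B.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference in conversion rates is unlikely to be due to chance alone. The normal approximation assumes reasonably large samples in both groups.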
R for Data Science by Garrett Grolemund and Hadley Wickham
This book explores various features of the R programming language and highlights why it was built for data science. Readers will learn the entire data science workflow, from data mining to analysis and data visualization. The book covers best practices for cleaning data and visualizing it through graphs and plots. R for Data Science will help readers learn about literate programming and reproducible research and understand the grammar of graphics, all of which will save them time when doing data science.
Machine Learning with R by Brett Lantz
If you are an R junkie, then this is a must-read data science book for you. It is definitely not for R beginners, but if you have already taken an online data science course in R and have some knowledge of R programming, you will find it to be one of the best books for data science with R. You can think of this book as “The Big Big Data Book for R nerds”. The book focusses on the practical aspects of machine learning rather than on the why. It offers high-level algorithm descriptions and worked examples in R, and the best thing about it is the boxes that summarize usage information for the machine learning algorithms and other important data science techniques.
R Cookbook, O'Reilly Media
This recipe-style data science book provides a step-by-step process for tackling various data problems in practice. The sections are organized as recipes, each with a problem, a solution and a discussion. If Google lets you down when doing data science with R, this book can be a great companion. You will find some discussions of and introductions to statistical topics, but do not expect detailed information on statistical analysis. For people looking for a data science book in R with more focus on visualization, this might not be the perfect reading partner.
Python Machine Learning by Sebastian Raschka
There is no getting away from math and equations when it comes to data science, and this book makes the equations easy to follow for people who do not have a strong mathematical background. Most of the examples in the book use the scikit-learn library, but it also contains implementations of a few machine learning algorithms that are not part of scikit-learn. This book is a complete bible for machine learning and contains everything needed for real-world problem solving, such as dealing with missing data, evaluating models via cross-validation, transforming data into desired formats, extracting features and more.
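To illustrate what evaluating a model via cross-validation means, here is a minimal k-fold sketch using only the Python standard library. It is not the book's code and deliberately avoids scikit-learn; the helper names, the baseline "model" (always predict the training mean) and the data are assumptions chosen for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield train_idx, test_idx


def cross_validate_mean_model(y, k=5):
    """Score a baseline that always predicts the training mean.

    Returns one mean absolute error per fold.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        prediction = mean(y[j] for j in train_idx)
        scores.append(mean(abs(y[j] - prediction) for j in test_idx))
    return scores


# Hypothetical target values.
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
scores = cross_validate_mean_model(y, k=5)
print(f"{len(scores)} folds, average MAE = {mean(scores):.2f}")
```

Every sample is held out exactly once, so the averaged score reflects performance on data the model did not see during fitting. In practice a library such as scikit-learn provides this out of the box.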
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney
Once you have learnt the basics of Python, reading this book will lift your data science skills to the next level by helping you understand the data analysis capabilities of the Pandas library. The book does not explain in detail how to do data analysis; instead, it emphasizes how to use Pandas. It presents sophisticated analyses of data from sources like Yahoo Finance, USA.gov, USDA and the Federal Election Commission. The most interesting and engaging parts of this data science book are the examples of Pandas code used to analyse US baby names and the examples from the financial sector.
Disclaimer: This is not an exhaustive list of popular data science books, and there are many more that data scientists can add to their reading list. Reading the ten data science books mentioned in this article will set you on the path to further knowledge in a growing field like data science.
With increasing demand for data scientists and a shortage of talent, this is the best time to retool your skillset with data science skills. Regardless of your skill level, put these data science books on your reading list to find some guiding principles for doing data science the best way.
“Keep reading books, but remember that a book is only a book, and you should learn to think for yourself.” - Maxim Gorky
If you know of any other good free data science educational resources, like free data science tutorials, blogs or free data science books, let us know in the comments to help the data science community at large.