## My Collection of Books for Being a Data Scientist

- 30/07/2014

**Published In**

- Big Data
- Analytics
- Business Intelligence

I have been asked whether I can suggest some books for learning data science. That motivated me to take a good look at my bookshelf and do an (incomplete and still ongoing) inventory check. Even better, I sorted my books into a few categories to make it easier to find what I need.

Here I'm sharing a partial list of my book collection for being a proud data scientist. I admit I haven't read each and every book, and I usually don't read a book from cover to cover; I don't think that is necessary or effective. My reading style is question-driven and purpose-driven.

Here is the list. It took me quite an effort to type the book names, so please pardon the formatting (it seems that posting on LinkedIn also messes up the format a bit).

Data Science and Machine Learning

- Practice

  - A First Course in Machine Learning - Rogers {Matlab}

  - Applied Predictive Modeling 2013 {R}

  - Collective Intelligence in Action (Manning 2008)

  - Data Mining and Market Intelligence for Optimal Marketing Returns (2008)

  - Data Mining Cookbook: Modeling Data for Marketing, Risk, and Customer Relationship Management

  - Data Mining: Practical Machine Learning Tools and Techniques (3rd) (2011)

  - Data Mining Techniques for Marketing Sales and Customer Support (2004) 2Ed

  - Handbook of Statistical Analysis and Data Mining Applications (SPSS)

  - Machine Learning in Action

  - O'Reilly Programming Collective Intelligence

  - Data Mining: Concepts and Techniques, 2006

  - Data Mining with R

  - Data Preparation for Data Mining

  - Intro Statistical Learning using R

  - Introduction to Neural Networks for Java, 2nd Edition

- Theories

  - General

    - A Probabilistic Theory of Pattern Recognition

    - Artificial Intelligence - A Modern Approach (3e)

    - Bootstrap Methods and their Applications

    - Elements of Statistical Learning

    - Introduction to Machine Learning

    - Machine Learning for Hackers

    - Machine Learning {Tom Mitchell}

    - Pattern Classification

    - Pattern Recognition and Machine Learning {Bishop}

    - Principles and Theory for Data Mining and Machine Learning

    - Foundations of Machine Learning

    - Machine Learning: A Probabilistic Perspective

    - Data Mining Fundamental Concepts and Algorithms

    - MIT - Principles of Data Mining

    - Principles of Data Mining (Springer, 2007)

    - The Top Ten Algorithms in Data Mining (CRC Press 2009)

    - Springer - The Nature of Statistical Learning Theory - 2nd Edition - 2000

    - [2011 Webb] Statistical Pattern Recognition 3rd Edition

    - An Elementary Introduction to Statistical Learning Theory 2010

  - Probability

    - Probability for Statistics and Machine Learning: Fundamentals and Advanced Topics

    - A Probabilistic Theory of Pattern Recognition

  - Bayesian

    - Bayesian Reasoning and Machine Learning 2013

    - Bayesian Framework for Concept Learning

    - PAC-Bayesian Pattern Classification with Kernels

    - Bayesian Statistics and Marketing

    - Bayesian Data Analysis 2nd

  - Neural Network

    - Brief Introduction to Neural Network

    - Neural Networks Foundation

    - Pattern Recognition and Neural Networks (B D Ripley)

  - Decision Trees

    - Data Mining with Decision Trees: Theory and Applications

  - Information Theory

    - Information Theory, Inference, and Learning Algorithms

  - Kernel Methods and SVM

    - MIT Press Learning with Kernels (2002)

    - An Introduction to Support Vector Machines and Other Kernel-based Learning Methods

    - Support Vector Machines for Pattern Classification, Shigeo Abe, 2ed, Springer, 2010

  - Boosting

    - Boosting: Foundations and Algorithms

    - Ensemble Methods: Foundations and Algorithms

  - Web Mining

    - Web Mining and Social Networking

    - Web Data Mining - Exploring Hyperlinks, Contents, and Usage Data (Springer, 2007)

    - Web Data Mining {Liu Bing} (Second Edition)

    - Web Mining Applications and Techniques (Idea Group 2004)

  - Clustering

    - Cluster Analysis for Data Mining and System Identification (2000)

    - Cluster Analysis, 5th Edition

    - Data Clustering: Theory, Algorithms and Applications

    - Finding Groups in Data: An Introduction to Cluster Analysis

Statistics

- Statistics for High-dimensional Data

- An Introduction to Categorical Data Analysis (2007)

- Correlated Data Analysis Modeling, Analytics and Applications (2007)

- Statistical Analysis with Missing Data, Second Edition

- Statistical Decision Theory and Bayesian Analysis

- Applied Multivariate Statistical Analysis

Visualization

- Beautiful Visualization: Looking at Data through the Eyes of Experts

- Data Visualization with D3.js Cookbook (2013-10) by Nick Qi Zhu

- Multivariate Data Visualization with R

- Python Data Visualization Cookbook (2013)

Text Mining and NLP

- Text Mining: Classification, Clustering, and Applications (2009)

- Text Mining Infrastructure in R

Optimizations

Numerical Methods and Algorithms

- Data Analysis Using the Method of Least Squares (2008)

- Introduction to Algorithms 3rd Edition

- Applied Numerical Linear Algebra {Demmel}

- Elementary Numerical Analysis - An Algorithmic Approach 3rd Ed

- Iterative Methods for Optimization

- Numerical Methods in Engineering with Python (2ed Jaan Kiusalaas)

- Numerical Optimization Theoretical and Practical Aspects

Big Data

- Mining of Massive Datasets

- Scaling Up Machine Learning: Parallel and Distributed Approaches

- Structured Parallel Programming - Patterns for Efficient Computation

- Hadoop in Action

- Hadoop The Definitive Guide 2nd Edition

- HBase The Definitive Guide

- MapReduce Design Patterns

- Programming Hive

- Programming Pig

#### Richard Xie

Principal Data Scientist & Innovator at ThreatTrack Security Inc.

Opinions expressed by Gladwin Analytics members are their own.
