Description
Book Synopsis
Rather than teaching Python as if it were Java or C, this textbook focuses on the essential Python programming skills for data scientists and the advanced methods used by big data analysts.
Unlike conventional textbooks, it is based on Markdown and combines full-color printing with a code-centric approach to highlight the 3C principles of data science: creative design of data solutions, curiosity about the data lifecycle, and critical thinking about data insights. Q&A-based knowledge maps, tips and suggestions, notes, and warnings and cautions explain the key points, difficulties, and common mistakes in Python programming for data science. The book also includes suggestions for further reading.
The textbook is supported by an open-source community on GitHub, and the course materials are licensed for free use under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0).
Additional teaching materials, including code, datasets, slides, and a syllabus, are available at https://github.com/LemenChao/PythonDataScience
Table of Contents
1. Python and Data Science
Q&A
1.1 From data analysis to data science
1.2 Python language and its characteristics
1.3 Precautions for data analysis based on Python
1.4 Python development environment and how to build it
Exercises
2. Basic Python Programming for Data Science
Q&A
2.1 Variables and their definition methods
2.2 Operators, expressions, statements
2.3 Data type and data structure
2.4 Packages and modules
2.5 Built-in functions, module functions and custom functions
Exercises
3. Advanced Python Programming for Data Science
Q&A
3.1 Iterators and iterable objects
3.2 Decorators and generators
3.3 Help and Docstrings
3.4 Exception handling, assertion and debugging
3.5 Search path, current working directory
3.6 Object-oriented programming
Exercises
4. Data Preprocessing and Wrangling
Q&A
4.1 Random numbers and Random/Sklearn
4.2 Vectorized computing and NumPy
4.3 Data frame calculation and Pandas
4.4 Data visualization and Matplotlib/Seaborn and others
Exercises
5. Data Analysis Algorithms and Models
Q&A
5.1 Statistical modelling with statsmodels
5.2 Machine learning with scikit-learn
Exercises