Part 0: Course Overview (2 hours)
Introduction to data science
Part 1: Python Programming (12 hours)
Basics of Python: data types; flow control; input/output; functions and modules
Python standard library; built-in functions
Numerical computing with NumPy and SciPy
Regular expressions
Scraping data from the web
Analyzing tabular data using pandas
Data wrangling and cleaning
Data visualization with matplotlib
Part 2: Foundational Mathematics for Programming and Data Science (12 hours)
Basic probability
Single variable analysis
Normal distributions
Data relationships
Simulation and top-down design
Object-oriented programming
Code optimization
Skewed data
Basic graph and network theory; graph algorithms; adjacency matrices
Part 3: Data Analysis and Visualization (12 hours)
Exploratory data analysis and effective visualization: pandas/matplotlib/seaborn
Time series
Text analysis
Image data
Dimensionality reduction: PCA/MDS/LLE
Network analysis and visualization: Gephi
Interactive visualization: Plotly/Bokeh/D3
Part 4: Introduction to Machine Learning (10 hours)
Regression analysis
Bayesian analysis
Introduction to scikit-learn: learning a model
Decision trees and random forests