第二章:数据预处理:数据收集、提取和清理,python 语言程序简介(4 学时)
Chapter 2: Data preprocessing: Data collection, data extraction, and data cleansing; introduction to Python programming (4 hours)
第三章:回归模型:线性回归和正则化方法(4 学时)
Chapter 3: Regression models: Linear regression and regularization methods (Ridge and Lasso) (4 hours)
第四章:分类模型:k-近邻,决策树,支持向量机,逻辑回归和朴素贝耶斯法则(8 学时)
Chapter 4: Classification models: k-nearest neighbours, decision trees, support vector machines, logistic regression, and naive Bayes (8 hours)
第五章:集成算法:袋装,随机森林,提升,AdaBoost 算法,梯度提升决策树(4 学时)
Chapter 5: Ensemble learning: Bagging, random forests, boosting, AdaBoost, gradient boosting decision trees (GBDT) (4 hours)
第六章:聚类模型:k 平均方法,层次聚类(4 学时)
Chapter 6: Clustering models: k-means, hierarchical clustering, association rules (4 hours)
第七章:特征与模型选择:偏差-方差分解,评价指标,交叉核实(2 学时)
Chapter 7: Feature and model selection: Bias-variance decomposition, evaluation metrics, cross-validation (2 hours)
第八章:降维:线性判别分析,主成分分析(4 学时)
Chapter 8: Dimensionality reduction: Linear discriminant analysis (LDA), principal component analysis (PCA) (4 hours)
第九章:EM 算法和高斯混合模型(2 学时)
Chapter 9: Expectation-Maximization (EM) algorithm and Gaussian mixture models (2 hours)
第十章:概率图模型,图算法与社交网络分析(4 学时)
Chapter 10: Probabilistic graphical models, graph algorithms and social network analysis (4 hours)
第十一章:神经网络与深度学习(4 学时)
Chapter 11: Neural networks and deep learning (4 hours)
第十二章:自然语言处理和文本分析(4 学时)
Chapter 12: Natural language processing (NLP) and text analysis (4 hours)
第十三章:推荐系统(2 学时)
Chapter 13: Recommender systems (2 hours)
第十四章:其他专题:在线学习,大规模数据与分布式计算(如有时间)
Chapter 14: Other topics: Online learning, large-scale data and distributed computing (if time permits)