课程详述
COURSE SPECIFICATION
以下课程信息可能根据实际授课需要或在课程检讨之后产生变动。如对课程有任何疑问,请
联系授课教师。
The course information as follows may be subject to change, either during the session because of unforeseen
circumstances, or following review of the course at the end of the session. Queries about the course should be
directed to the course instructor.
1.
课程名称 Course Title
工程机器学习基础
Machine Learning for Engineering
2.
授课院系
Originating Department
机械与能源工程系
Department of Mechanical and Energy Engineering
3.
课程编号
Course Code
ME338
4.
课程学分 Credit Value
3
5.
课程类别
Course Type
专业核心课
Major Core Courses
6.
授课学期
Semester
春季
Spring
7.
授课语言
Teaching Language
英文
English
8.
授课教师、所属学系、联系方式(如属团队授课,请列明其他授课教师)
Instructor(s), Affiliation & Contact
(For team teaching, please list all instructors)
责任教师:
张巍
机械与能源工程系
zhangw3@sustech.edu.cn
2021 年春季学期由潘佳(外聘教师)老师授课
Responsible instructor: Wei Zhang
Department of Mechanical and Energy Engineering
zhangw3@sustech.edu.cn
Spring 2021 instructor: Jia Pan (external instructor)
jpan@cs.hku.hk
9.
实验员/所属学系、联系方式
Tutor/TA(s), Contact
待公布
To be announced
10.
选课人数限额(可不填)
Maximum Enrolment
Optional
11.
授课方式
Delivery Method
习题/辅导/讨论
Tutorials
实验/实习
Lab/Practical
其它(请具体注明)
Other (Please specify)
总学时
Total
学时数
Credit Hours
48
12.
先修课程、其它学习要求
Pre-requisites or Other
Academic Requirements
线性代数 A(MA107A)
概率论与数理统计(MA212)
计算机程序设计基础 B(CS102B)
Linear Algebra A (MA107A)
Probability and Statistics (MA212)
Introduction to Computer Programming (CS102B)
13.
后续课程、其它学习规划
Courses for which this
course is a pre-requisite
14.
其它要求修读本课程的学系
Cross-listing Dept.
教学大纲及教学日历 SYLLABUS
15.
教学目标 Course Objectives
本课程将介绍机器学习相关的基本知识,为日后学生在机器学习相关方面的深入学习与应用打下基础。本课程将教学生
如何解决线性回归问题、如何用逻辑回归的方法制定和解决线性分类问题、如何使用 k-means 来制定和解决聚类问题,还
将让学生对过拟合、交叉验证、模型阶数选择、特征选择、神经网络和 PCA等概念和方法有所了解,同时让学生积累一
定的优化以及概率模型相关的经验。为保障学生的实际动手与应用能力,本课程还将教学生如何用 Python/Scikit-Learn/PyTorch
实现基本的机器学习和深度学习任务。通过上述教学方式使学生掌握基础的统计与机器学习相关理论及动
手实践能力。
This course teaches the basics of machine learning and lays a foundation for further study
and application in the field. Students will learn how to formulate and solve linear regression
problems, how to formulate and solve linear classification problems using logistic regression, and how to formulate and
solve clustering problems using k-means. Students will also learn the basic concepts and methods of over-fitting,
cross-validation, model-order selection, feature selection, neural networks, and PCA, and gain experience with optimization
and probabilistic models. To build hands-on and practical skills, the course also teaches
students how to implement basic machine learning and deep learning tasks in Python/sklearn/PyTorch. Through these
activities, students acquire foundational theory in statistics and machine learning together with practical skills.
16.
预达学习成果 Learning Outcomes
完成本课程的学习后,学生应能够完成:
1)制定并解决线性回归问题;
2)用逻辑回归的方法制定和解决线性分类问题;
3)使用 k-means 来制定和解决聚类问题;
4)了解过拟合,交叉验证,模型阶数选择,特征选择,神经网络,主成分分析;
5)获得一些与优化和概率模型相关的经验;
6)在 Python/sklearn/PyTorch 中实现基本的机器学习和深度学习任务。
After this course, students should be able to:
(1) Formulate and solve linear regression problems.
(2) Formulate and solve linear classification problems using logistic regression.
(3) Formulate and solve clustering problems using k-means.
(4) Understand over-fitting, cross-validation, model-order selection, feature selection, neural networks, and PCA.
(5) Gain experience with optimization and probabilistic models.
(6) Implement basic machine learning and deep learning tasks in Python/sklearn/PyTorch.
17.
课程内容及教学日历(如授课语言以英文为主,则课程内容介绍可以用英文;如团队教学或模块教学,教学日历须注明
主讲人)
Course Contents (in Parts/Chapters/Sections/Weeks. Please indicate the name of the instructor for each course section if
this is a team-taught or module course.)
课程内容 Contents
教学要求 Requirements
学时分配 Hours
绪论
机器学习的应用背景
本课程的性质、任务和主要内容
机器学习的基本概念
Introduction
Application background of
Machine Learning
The nature, objectives, and main
content of this course
Concept of Machine Learning
了解机器学习的发展历程及应用背景
了解机器学习的基本概念
了解机器学习任务的分类及其特点
Understand the history, application, and importance of
Machine Learning
Understand the basic concept of Machine Learning
Understand the classification of different machine learning
tasks
2h
线性回归
例子:了解糖尿病患者的血糖水平
多变量线性模型
最小二乘解
最小二乘解的几何解释
Python 实现多元线性回归
特殊情况:简单线性回归
特征转换
Linear Regression
Motivating Example:
Understanding Glucose Levels in
Diabetics
The Multiple Variable Linear
Model
The Least-Squares Solution
Understanding the LS Solution
Multiple Linear Regression in
Python
Special Case: Simple Linear
Regression
Feature Transformations
将一个机器学习任务描述为多元线性回归:识别特征和
目标变量;识别特征转换的可能性,例如独热码
以矩阵/向量的形式描述回归模型
了解最小二乘解的模型参数
通过 R² 评估拟合情况
了解如何使用 numpy 和 sklearn 包在 Python 中实现线性
回归
Formulate a machine learning task as multiple linear
regression: understand the advantage over simple linear
regression; identify feature and target variables; recognize
possibilities for feature transformation, such as one-hot
coding
Describe the regression model in matrix/vector form
Understand the least-squares solution for the model
coefficients
Assess goodness-of-fit via R²
Know how to implement linear regression in Python using
the numpy and sklearn packages
4h
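The least-squares and R² requirements of the linear regression unit above can be illustrated with a minimal sketch. It is not course material: the synthetic data and the helper names `fit_least_squares` and `r_squared` are assumptions made for illustration. The sketch solves the least-squares problem in matrix form with `numpy.linalg.lstsq` and computes R² as 1 - RSS/TSS.

```python
import numpy as np

def fit_least_squares(X, y):
    """Least-squares coefficients for y ~ [1, X] (hypothetical helper)."""
    A = np.column_stack([np.ones(len(X)), X])  # column of ones models the intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def r_squared(X, y, beta):
    """Goodness of fit: 1 - (residual sum of squares) / (total sum of squares)."""
    A = np.column_stack([np.ones(len(X)), X])
    rss = np.sum((y - A @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - rss / tss

# Synthetic data (an assumption): 3 features, known coefficients, small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_beta = np.array([1.0, 2.0, -1.0, 0.5])  # intercept first
y = true_beta[0] + X @ true_beta[1:] + 0.1 * rng.normal(size=200)

beta = fit_least_squares(X, y)
r2 = r_squared(X, y, beta)
print("coefficients:", beta)
print("R^2:", r2)
```

A comparable fit is available through `sklearn.linear_model.LinearRegression`, whose `score` method also reports R².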
验证及偏差-方差权衡
不同误差的类型
交叉验证
偏差-方差权衡
机器学习中的模型阶数选择及
Python 示例
Validation and Bias-Variance Trade-off
Types of error
Cross-validation
Bias-variance trade-off
Python examples of model-order
selection in machine
learning
了解机器学习中模型阶数选择问题
了解不同误差类型
理解偏差-方差权衡
掌握交叉验证方法
掌握多项式拟合的阶数选择
了解如何用 Python 实现机器学习的模型阶数选择
Understand the model-order selection problem in machine
learning
Understand the different types of error in data fitting
Understand the bias-variance trade-off
Understand cross-validation
Understand order selection for polynomial fitting
Know how to perform model-order selection in Python
2h
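Model-order selection for polynomial fitting by cross-validation, as listed in the unit above, could look like the following sketch. The synthetic cubic data, the fold count, and the helper name `kfold_mse` are assumptions made for illustration, not the course's own example.

```python
import numpy as np

def kfold_mse(x, y, degree, n_folds=5, seed=0):
    """Mean K-fold test MSE of a polynomial fit of the given degree (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    errs = []
    for i in range(n_folds):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        coef = np.polyfit(x[train], y[train], degree)  # fit on the training folds only
        pred = np.polyval(coef, x[test])               # evaluate on the held-out fold
        errs.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errs))

# Synthetic data (an assumption): a cubic trend plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=100)
y = 1.0 - 2.0 * x + 3.0 * x**3 + 0.1 * rng.normal(size=100)

cv_errors = {d: kfold_mse(x, y, d) for d in range(1, 9)}
best_degree = min(cv_errors, key=cv_errors.get)
print(cv_errors)
print("selected degree:", best_degree)
```

Low orders underfit (high bias) and show a large held-out error. Orders at or above the true one fit well, and very high orders tend to drift upward again as variance grows.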
正则化最小二乘与特征选择
例子:预测前列腺癌
岭回归
LASSO 回归
两种回归方式与特征权重的关系
正则化回归的概率解释
Regularized Least-Squares and Feature
Selection
Motivating Example: Predicting
Prostate Cancer
Ridge Regression
LASSO
Probabilistic Interpretation of
Regularized Regression
理解特征选择背后的动机和思想
了解一些基础的特征选择方法:穷举搜索,逐步选择,
目标互相关,正则化
理解岭回归和 LASSO 以及它们的系数路径
理解它们与 MLE 和 MAP 之间的联系
了解如何使用 sklearn 实现 LASSO
理解如何使用交叉验证选择正则化强度
Understand the motivation and idea behind feature selection
Understand feature selection methods based on exhaustive
search, stepwise selection, target cross-correlation, and
regularization
Understand ridge regression and LASSO, and
interpret their coefficient paths
Know how to implement LASSO using sklearn
Know how to select the regularization strength using
cross-validation
Understand connections to maximum-likelihood (ML) estimation and MAP
estimation
4h
逻辑回归
逻辑回归与分类问题的关联
二分类问题
多分类问题
分类误差分析
机器学习中的逻辑回归分类问题
Python 示例
Logistic Regression
Relationship between Logistic
Regression and Classification
Problems
Binary classification problems
Multiclass classification problems
Classification error metrics
Python examples of classification
with logistic regression
了解逻辑回归与分类问题的关联
了解逻辑函数表达式、交叉熵、正则化等概念
掌握使用逻辑回归进行二分类和多分类的方法
掌握分类问题的误差分析方法
了解如何在 Python 中实现用逻辑回归进行数据分类
Understand the relationship between Logistic Regression
and Classification problems
Know the logistic function, cross-entropy, ML fitting, and
regularization
Learn how to use logistic regression in classification
problems
Understand the error metrics of classification problems
Know how to implement and assess classification using
sklearn
4h
非线性优化与梯度下降
例子:为逻辑回归构建优化器
多变量函数的梯度
梯度下降
自适应步长
凸性
Nonlinear Optimization & Gradient Descent
Motivating Example: Build an
Optimizer for Logistic Regression
Gradients of Multi-Variable
Functions
Gradient Descent
Adaptive Stepsize
Convexity
确定优化问题中的损失函数、参数和约束条件
计算损失函数的梯度
了解如何在 Python 中高效计算梯度
掌握如何编写梯度下降的迭代表达式
了解步长对收敛性的影响
熟悉自适应步长方案,如 Armijo 规则
判断损失函数是否为凸函数
理解凸性对梯度下降的影响
Identify the cost function, parameters, and constraints in an
optimization problem
Compute the gradient of a cost function for scalar, vector,
or matrix parameters
Know how to efficiently compute a gradient in Python
Write the gradient-descent update
Understand the effect of the stepsize on convergence
Be familiar with adaptive stepsize schemes like the Armijo
rule
Determine if a loss function is convex
Understand the implications of convexity for gradient
descent
4h
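The gradient-descent update with an Armijo backtracking stepsize, applied to the logistic-regression loss named in this unit, might be sketched as follows. The synthetic data, the default parameters (`step0`, `c`, `shrink`), and all function names are assumptions made for illustration.

```python
import numpy as np

def logistic_loss(w, X, y):
    """Mean negative log-likelihood for labels y in {0, 1}."""
    z = X @ w
    return np.mean(np.logaddexp(0.0, z) - y * z)  # log(1 + e^z) computed stably

def logistic_grad(w, X, y):
    """Gradient of logistic_loss with respect to w."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y) / len(y)

def gradient_descent_armijo(X, y, n_iter=200, step0=1.0, c=1e-4, shrink=0.5):
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        f = logistic_loss(w, X, y)
        g = logistic_grad(w, X, y)
        step = step0
        # Shrink the step until the Armijo sufficient-decrease condition holds.
        while (logistic_loss(w - step * g, X, y) > f - c * step * (g @ g)
               and step > 1e-12):
            step *= shrink
        w = w - step * g  # gradient-descent update
    return w

# Synthetic, non-separable data (an assumption): intercept plus 2 features.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
w_true = np.array([0.5, 2.0, -1.0])
y = (rng.uniform(size=500) < 1.0 / (1.0 + np.exp(-(X @ w_true)))).astype(float)

w_hat = gradient_descent_armijo(X, y)
loss_start = logistic_loss(np.zeros(3), X, y)
loss_end = logistic_loss(w_hat, X, y)
grad_norm = np.linalg.norm(logistic_grad(w_hat, X, y))
print(w_hat, loss_start, loss_end, grad_norm)
```

Because the logistic loss is convex, any point where the gradient vanishes is a global minimizer, which is why a small final gradient norm is a meaningful stopping check here.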
最大间隔分类和支持向量机
例子:识别手写数字
最大间隔分类
支持向量分类器
支持向量机
Maximum-Margin Classification and the
Support Vector Machine
Motivating Example: Recognizing
Handwritten Digits
Maximum-Margin Classification
The Support Vector Classifier
The Support Vector Machine
理解线性分类边界的几何性质
掌握间隔最大化分类器
理解支持向量机(SVM)
了解如何使用 sklearn 实现 SVC 和 SVM
Understand the geometry of the linear classification
boundary
Understand the margin-maximizing classifier
Understand the support vector classifier (SVC)
Understand the support vector machine (SVM)
Know how to implement the SVC and SVM with sklearn
4h
神经网络
例子:学习特征变换
前馈神经网络
随机梯度下降法
使用 PyTorch 实现和训练神经网络
通过反向传播计算梯度
Neural Networks
Motivating Example: Learning a
Feature Transformation
Feed-Forward Neural Networks
Training via Stochastic Gradient
Descent
Implementing and Training Neural
Nets with PyTorch
Gradient Computation via
Backpropagation
理解 2 层前馈神经网络的特征变换学习、网络结构、激
活函数的选择以及训练损失
理解小批量训练和随机梯度下降
理解梯度计算的反向传播方法
了解如何使用 PyTorch 实现神经网络
Understand 2-layer
feedforward neural networks, including learned feature
transformations, network architecture, choice of activation
functions, and training loss
Understand mini-batch training and stochastic gradient
descent
Understand the backpropagation approach to gradient
computation
Know how to implement a neural network using PyTorch
6h
卷积和深度神经网络
例子:ImageNet 大型视觉识别挑战
深度网络和特征层次结构
二维卷积基础知识
卷积神经网络
在 PyTorch 中创建和可视化卷积层
训练 CNN 网络:反向传播、Batch-Norm、
Dropout 等
基于著名预训练网络的迁移学习
Convolutional and Deep Neural Networks
Motivation: ImageNet Large-Scale
Visual Recognition Challenge
Deep Networks and Feature
Hierarchies
2D Convolution Basics
Convolutional Neural Networks
Creating and Visualizing
Convolutional Layers in PyTorch
Training CNNs: Backpropagation,
Batch-Norm, Dropout, Etc.
Transfer Learning from Famous
Pre-Trained Networks
认识到 CNN 网络可用来学习和挖掘层次特征
二维卷积:
将其理解为局部模式匹配
理解卷积的边界条件以及卷积与相关之间的关系
知道如何使用 scipy.signal 和 PyTorch 实现卷积
卷积神经网络:
理解卷积层、稠密层、子采样、池化
理解反向传播训练
识别训练技巧,如批归一化、Dropout、数据增强
迁移学习和预训练网络:
熟悉 AlexNet、VGG、Inception、ResNet 等著名网络
知道如何在 PyTorch 中使用预先训练好的网络
Recognize CNNs as learning and exploiting hierarchical
features
2D convolution:
Understand 2D convolution as local pattern matching
Understand boundary conditions and relation between
convolution and correlation
Know how to implement convolution with scipy.signal
and PyTorch
Convolutional neural networks:
Understand convolutional layers, dense layers,
subsampling, pooling
Understand backpropagation training
Recognize training tricks like batch-norm, dropout, data
augmentation
Transfer learning and pre-trained networks:
Be familiar with AlexNet, VGG, Inception, ResNet, and
other famous networks
Know how to work with pre-trained networks in PyTorch
6h
主成分分析
降维
主成分分析
用于数据可视化的 PCA
通过 SVD 计算 PCA
Python 示例:特征面和基于 PCA
的分类
Principal Component Analysis
Dimensionality Reduction
Principal Component Analysis
(PCA)
PCA for Data Visualization
Computing PCA via the SVD
Python Example: Eigenfaces and
PCA-based Classification
了解特征降维的需求
将主成分分析理解为 RSS-极小化线性逼近
理解正交投影
将 PCA 理解为子空间拟合
理解数据协方差特征向量在 PCA 中的作用
了解如何使用 PoV 度量 PCA 的性能
了解如何使用 SVD 计算 PCA
了解如何将 PCA 用于数据可视化
了解 PCA 系数在监督学习任务中的应用
Recognize the need
for feature dimensionality reduction
Understand PCA as RSS-minimizing linear approximation
Understand orthogonal projection
Recognize PCA as subspace fitting
Understand the role of the data-covariance eigenvectors in
PCA
Know how to measure PCA performance using the proportion of variance (PoV)
Understand how to compute PCA using the SVD
Understand how PCA can be used for data visualization
Understand how the PCA coefficients can be used in
supervised learning tasks
4h
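Computing PCA via the SVD and measuring it with the proportion of variance (PoV), as listed in the unit above, can be sketched as follows. The synthetic low-rank data and the helper name `pca_svd` are assumptions made for illustration. The final check uses the fact that the residual fraction RSS/TSS of the rank-k approximation equals 1 - PoV.

```python
import numpy as np

def pca_svd(X, k):
    """Top-k PCA of the rows of X via the SVD of the centered data (hypothetical helper)."""
    mu = X.mean(axis=0)
    Xc = X - mu                                  # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                          # principal directions, one per row
    coeffs = Xc @ components.T                   # PCA coefficients of each sample
    pov = np.sum(s[:k] ** 2) / np.sum(s ** 2)    # proportion of variance retained
    return mu, components, coeffs, pov

# Synthetic data (an assumption): 5 features lying near a 2-dimensional subspace.
rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 2))
W = rng.normal(size=(2, 5))
X = Z @ W + 0.05 * rng.normal(size=(300, 5))

mu, components, coeffs, pov = pca_svd(X, k=2)
X_hat = mu + coeffs @ components                 # rank-2 approximation of the data
rss = np.sum((X - X_hat) ** 2)
tss = np.sum((X - mu) ** 2)
print("PoV:", pov, "RSS/TSS:", rss / tss)
```

The rows of `components` are eigenvectors of the data covariance matrix, and `coeffs` could serve as reduced features in a supervised learning task.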
聚类、K-Means、NMF 和 EM
例子:文档聚类
聚类和 k-means
基于 Bag-of-Words、TF-IDF
和 K-means 的文本挖掘
低秩模型:LSA 和 NMF
高斯混合模型(GMMs)
GMMs 的期望最大化(EM)拟合
其他聚类方法
Clustering, K-Means, NMF, and EM
Motivating Example: Document
Clustering
Clustering and K-Means
Text Mining with Bag-of-Words,
TF-IDF, and K-Means
Low-Rank Models: LSA and NMF
Gaussian Mixture Models (GMMs)
Expectation-Maximization (EM)
Fitting of GMMs
Other Clustering Methods
了解 k-means 聚类目标和 Lloyd 算法
理解用于文本挖掘的词项-文档矩阵和 TF-IDF 分数
理解 NMF、它与 PCA 的关系,以及它在聚类中的应用
了解 GMMs 及其在聚类中的应用
了解 EM 算法及其在 GMM 参数拟合中的应用
Understand the k-means clustering objective and Lloyd’s
algorithm
Understand term-document matrices and TF-IDF scores for
text-mining
Understand NMF, its relation to PCA, and application to
clustering
Understand GMMs and their application to clustering
Understand the EM algorithm and its application to GMM
parameter fitting
4h
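The k-means objective and Lloyd's algorithm from the unit above can be illustrated with a minimal sketch. The two-blob synthetic data, the random initialization scheme, and the function name `lloyd_kmeans` are assumptions made for illustration.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and center updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init from random samples
    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center moves to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and objective: within-cluster sum of squared distances.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    objective = d2[np.arange(len(X)), labels].sum()
    return centers, labels, objective

# Synthetic data (an assumption): two well-separated blobs of 50 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=10.0, scale=0.5, size=(50, 2)),
])
centers, labels, objective = lloyd_kmeans(X, k=2)
print(centers, objective)
```

Lloyd's algorithm only guarantees a local minimum of the objective, so practical implementations such as `sklearn.cluster.KMeans` run several initializations and keep the best result.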
随机森林和其他集成方法
并行集成方法:装袋和粘贴
决策树和随机森林
顺序集成方法:Boosting
Random Forests and other Ensemble Methods
Parallel Ensemble Methods:
Bagging and Pasting
Decision Trees and Random Forests
Sequential Ensemble Methods:
Boosting
了解集成方法:装袋、粘贴、以及随机特征选择
理解决策树:特征阈值和决策区域、自上而下的树归纳
训练、同质性指标、随机森林
了解顺序集成方法,如 AdaBoost 和 Gradient Boosting
Understand parallel ensemble methods:
bagging, pasting,
random feature selection
Understand decision trees:
feature thresholding and decision regions,
training via top-down tree induction,
homogeneity metrics (variance reduction, Gini impurity),
random forests
Understand sequential ensemble methods like AdaBoost and
gradient boosting
4h
18.
教材及其它参考资料 Textbook and Supplementary Readings
Textbook: James, Witten, Hastie, and Tibshirani, An Introduction to Statistical Learning, 2013.
Supplementary Readings: Soroush Nasiriany, Garrett Thomas, William Wang, and Alex Yang, A Comprehensive Guide to
Machine Learning, 2018.
Ethem Alpaydin, Introduction to Machine Learning, 3rd edition, 2015.
Sebastian Raschka, Python Machine Learning, 2015.
课程评估 ASSESSMENT
19.
评估形式 Type of Assessment | 评估时间 Time | 占考试总成绩百分比 % of final score | 违纪处罚 Penalty | 备注 Notes
出勤 Attendance | | | |
课堂表现 Class Performance | | | |
小测验 Quiz | | 10% | |
课程项目 Projects | | 25% | |
平时作业 Assignments | | 35% | |
期中考试 Mid-Term Test | | 30% | |
期末考试 Final Exam | | | |
期末报告 Final Presentation | | | |
其它(可根据需要改写以上评估方式) Others (The above may be modified as necessary) | | | |
20.
记分方式 GRADING SYSTEM
A. 十三级等级制 Letter Grading
B. 二级记分制(通过/不通过) Pass/Fail Grading
课程审批 REVIEW AND APPROVAL
21.
本课程设置已经过以下责任人/委员会审议通过
This Course has been approved by the following person or committee of authority