课程大纲
COURSE SYLLABUS
1.
课程代码/名称
Course Code/Title
高维统计分析
High Dimensional Statistics
2.
课程性质
Compulsory/Elective
Elective
3.
课程学分/学时
Course Credit/Hours
3/48
4.
授课语言
Teaching Language
中英双语 English & Chinese
5.
授课教师
Instructor(s)
授课教师:李曾
Zeng Li
所属学系:统计与数据科学系 Department of Statistics and Data Science
联系方式:liz9@sustech.edu.cn
6.
是否面向本科生开放
Open to Undergraduates or Not
7.
先修要求
Pre-requisites
(如面向本科生开放,请注明区分内容。If the course is open to undergraduates, please indicate the difference.)
统计线性模型(MA329),多元统计分析(MA304)
Statistical Linear Models (MA329), Multivariate Statistical Analysis (MA304)
8.
教学目标
Course Objectives
(如面向本科生开放,请注明区分内容。If the course is open to undergraduates, please indicate the difference.)
本课程旨在引导学生学习学科前沿的高维数据统计分析方法,帮助学生加深对惩罚最小二乘此类方法的理解,达到
让学生学会使用适当的统计方法来处理高维数据分析中的问题。
This course aims to guide students in learning frontier modeling methods in high-dimensional statistical analysis and data science. It helps students deepen their understanding of penalized least squares methods and reach the goal of solving practical high-dimensional problems using advanced statistical methods and software.
9.
教学方法
Teaching Methods
(如面向本科生开放,请注明区分内容。If the course is open to undergraduates, please indicate the difference.)
教师授课,课堂讨论
Lectures and In-class Discussion
10.
教学内容
Course Contents
(如面向本科生开放,请注明区分内容。If the course is open to undergraduates, please indicate the difference.)
Section 1
1 介绍 (1 Hour)
1.1 高维问题的出现
1.2 维数问题的影响
1 Introduction (1 Hour)
1.1 Rise of Dimensionality
1.2 Impact of Dimensionality
Section 2
2 多元与非参数回归分析 (7 Hours)
2.1 多元线性回归
2.2 加权最小二乘
2.3 Box-Cox 变换
2.4 模型建立与基展开
2.5 岭回归
2.6 Reproducing Kernel Hilbert Space 中的回归
2.7 交叉验证
2 Multiple and Nonparametric Regression (7 Hours)
2.1 Multiple Linear Regression
2.2 Weighted Least Squares
2.3 Box-Cox transformation
2.4 Model Building and Basis Expansions
2.5 Ridge Regression
2.6 Regression in Reproducing Kernel Hilbert Space
2.7 Leave-One-Out and Generalized Cross-Validation
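As an illustrative supplement to topics 2.5 and 2.7 (not part of the formal syllabus): for ridge regression, the leave-one-out cross-validation score can be computed from a single fit via the hat matrix, since the LOO residual is e_i / (1 - h_ii). A minimal NumPy sketch on hypothetical simulated data (all names and sizes are illustrative):

```python
import numpy as np

# Hypothetical simulated data
rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

def loocv_ridge(X, y, lam):
    # Ridge smoother matrix H = X (X'X + lam I)^{-1} X';
    # leave-one-out residual e_i / (1 - h_ii), so no refitting is needed
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
    resid = y - H @ y
    return np.mean((resid / (1.0 - np.diag(H))) ** 2)

lams = [0.01, 0.1, 1.0, 10.0, 100.0]
scores = [loocv_ridge(X, y, lam) for lam in lams]
best_lam = lams[int(np.argmin(scores))]
```

The same shortcut extends to generalized cross-validation by replacing each h_ii with the average trace(H)/n.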
Section 3
3 Lasso 线性模型 (4 Hours)
3.1 Lasso 估计量
3.2 交叉验证和推断
3.3 Lasso 估计量的计算
3.4 Lasso 解的唯一性
3 The Lasso for Linear Models (4 Hours)
3.1 The Lasso estimator
3.2 Cross Validation and Inference
3.3 Computation of Lasso Solution
3.4 Uniqueness of Lasso Solutions
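Topic 3.3 (computation of the lasso solution) is commonly introduced through cyclic coordinate descent with soft-thresholding; a minimal NumPy sketch on hypothetical simulated data (variable names and the (1/2n) loss scaling are illustrative choices, not the textbook's fixed notation):

```python
import numpy as np

def soft_threshold(z, t):
    # S(z, t) = sign(z) * max(|z| - t, 0)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    # Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]  # partial residual
            rho = X[:, j] @ r_j / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

rng = np.random.default_rng(1)
n, p = 100, 20
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:2] = [3.0, -2.0]
y = X @ true_beta + 0.1 * rng.standard_normal(n)
beta_hat = lasso_cd(X, y, lam=0.5)
```

Because each coordinate update passes through the soft-threshold, inactive coefficients are exactly zero, which is the sparsity property emphasized in 3.1.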
Section 4
4 广义线性模型 (5 Hours)
4.1 多类别 Logistic 回归
4.2 对数线性模型与泊松广义线性模型
4.3 Cox 比例风险模型
4.4 支持向量机
4.5 算法
4 Generalized Linear Models (5 Hours)
4.1 Multiclass Logistic Regression
4.2 Log-linear Models and Poisson GLM
4.3 Cox Proportional Hazards Models
4.4 Support Vector Machine
4.5 Computational Details
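A small sketch in the spirit of 4.1 and 4.5 (binary rather than multiclass, to keep it short): ridge-penalized logistic regression fitted by plain gradient descent on hypothetical simulated data. All values below are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gd(X, y, lam=0.1, lr=0.1, n_iter=500):
    # Minimize -(1/n) * loglik(beta) + (lam/2) * ||beta||^2
    # by gradient descent; grad of the smooth loss is X'(p - y)/n + lam*beta
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ beta) - y) / n + lam * beta
        beta -= lr * grad
    return beta

rng = np.random.default_rng(2)
n, p = 200, 5
X = rng.standard_normal((n, p))
true_beta = np.array([2.0, -1.0, 0.0, 0.0, 0.0])
y = (rng.random(n) < sigmoid(X @ true_beta)).astype(float)
beta_hat = logistic_gd(X, y)
acc = np.mean((sigmoid(X @ beta_hat) > 0.5) == (y == 1))
```

The multiclass case of 4.1 replaces the sigmoid with a softmax and the coefficient vector with a matrix, but the gradient structure is the same.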
Section 5
5 用于变量选择的惩罚最小二乘方法 (14 Hours)
5.1 传统变量选择方法
5.2 折凹惩罚最小二乘方法
5.3 Lasso L1 惩罚
5.4 贝叶斯变量选择
5.5 数值算法
5.6 惩罚参数选择
5.7 残差方差和重拟合交叉验证
5 Penalized Least Squares Methods for Variable Selection (14 Hours)
5.1 Classical Variable Selection Criteria
5.2 Folded concave penalized least squares
5.3 Lasso and L1-regularization
5.4 Bayesian Variable Selection Procedures
5.5 Numerical Algorithms
5.6 Regularization Parameter Selection
5.7 Residual Variance and Refitted Cross-Validation
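Topic 5.2 centers on folded concave penalties such as SCAD (Fan and Li, 2001). For an orthonormal design, the SCAD-penalized least squares solution is given coordinate-wise by a thresholding rule: soft-thresholding near zero, a linear segment in the middle, and no shrinkage for large inputs (near-unbiasedness). A sketch, with a = 3.7 as suggested by Fan and Li and the test values purely illustrative:

```python
import numpy as np

def scad_threshold(z, lam, a=3.7):
    # SCAD thresholding rule (orthonormal design):
    #   |z| <= 2*lam      : soft-thresholding
    #   2*lam < |z| <= a*lam : linear interpolation
    #   |z| > a*lam       : identity (no shrinkage)
    z = np.asarray(z, dtype=float)
    abs_z = np.abs(z)
    soft = np.sign(z) * np.maximum(abs_z - lam, 0.0)
    mid = ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)
    return np.where(abs_z <= 2 * lam, soft,
                    np.where(abs_z <= a * lam, mid, z))

vals = scad_threshold(np.array([0.5, 1.5, 3.0, 10.0]), lam=1.0)
```

The three regimes match at the boundaries: the middle segment equals lam at |z| = 2*lam and equals z at |z| = a*lam, so the rule is continuous, unlike hard thresholding.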
Section 6
6 Lasso 惩罚项的推广 (4 Hours)
6.1 Elastic Net
6.2 Group Lasso
6.3 稀疏可加模型
6.4 Fused lasso
6 Generalizations of the Lasso Penalty (4 Hours)
6.1 Elastic Net
6.2 Group Lasso
6.3 Sparse Additive Models
6.4 Fused Lasso
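Topic 6.1's elastic net changes only the coordinate-descent update of the plain lasso: the soft-threshold keeps the L1 part, and the ridge part enters the denominator. A hypothetical NumPy sketch; the parameterization lam*(alpha*||b||_1 + (1-alpha)/2*||b||^2) and all data values are illustrative assumptions:

```python
import numpy as np

def enet_cd(X, y, lam, alpha=0.5, n_iter=200):
    # Coordinate descent for
    # (1/2n)||y - X b||^2 + lam*(alpha*||b||_1 + 0.5*(1-alpha)*||b||^2)
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]   # partial residual
            rho = X[:, j] @ r_j / n
            num = np.sign(rho) * max(abs(rho) - lam * alpha, 0.0)
            # ridge part only shifts the denominator
            beta[j] = num / (col_sq[j] + lam * (1.0 - alpha))
    return beta

rng = np.random.default_rng(3)
n, p = 100, 30
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = 2.0
y = X @ true_beta + 0.1 * rng.standard_normal(n)
beta_hat = enet_cd(X, y, lam=0.4)
```

Setting alpha=1 recovers the lasso update, alpha=0 recovers ridge, which is why the elastic net is covered as a generalization of the lasso penalty.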
Section 7
7 优化方法 (5 Hours)
7.1 凸优化条件
7.2 梯度下降与坐标下降
7.3 最小角度回归
7.4 ADMM
7.5 MM 算法
7 Optimization Methods (5 Hours)
7.1 Convex Optimality Conditions
7.2 Gradient Descent and Coordinate Descent
7.3 Least Angle Regression
7.4 Alternating Direction Method of Multipliers
7.5 Minorization-Maximization Algorithms
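Topic 7.4's ADMM is commonly demonstrated on the lasso, splitting the smooth and nonsmooth terms via the constraint x = z; the x-update reuses a cached Cholesky factor and the z-update is a soft-threshold. A hypothetical sketch of the standard scaled-dual updates (rho, lam, and the simulated data are illustrative assumptions):

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, b, lam, rho=10.0, n_iter=500):
    # ADMM for (1/2)||Ax - b||^2 + lam*||z||_1  subject to  x = z
    n, p = A.shape
    Atb = A.T @ b
    # Factor (A'A + rho I) once; every x-update reuses it
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(p))
    x = np.zeros(p)
    z = np.zeros(p)
    u = np.zeros(p)  # scaled dual variable
    for _ in range(n_iter):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        z = soft(x + u, lam / rho)
        u = u + x - z
    return z

rng = np.random.default_rng(4)
n, p = 80, 15
A = rng.standard_normal((n, p))
x_true = np.zeros(p)
x_true[0] = 4.0
b = A @ x_true + 0.1 * rng.standard_normal(n)
x_hat = admm_lasso(A, b, lam=20.0)
```

The same splitting handles the generalized lasso penalties of Section 6 by replacing the constraint x = z with Dx = z for a structure matrix D.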
Section 8
8 用于变量选择的惩罚似然方法 (8 Hours)
8.1 广义线性模型
8.2 惩罚似然方法
8.3 数值算法
8.4 调节参数选择
8 Penalized Likelihood Methods for Variable Selection (8 Hours)
8.1 Generalized Linear Models
8.2 Variable Selection via Penalized Likelihood
8.3 Numerical Algorithms
8.4 Tuning Parameter Selection
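Bringing 8.2 and 8.3 together in one picture: L1-penalized logistic likelihood can be solved by proximal gradient descent (ISTA), a gradient step on the smooth negative log-likelihood followed by soft-thresholding. The step size, penalty level, and simulated data below are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l1_logistic_ista(X, y, lam, lr=0.5, n_iter=1000):
    # Proximal gradient (ISTA) for -(1/n)*loglik(beta) + lam*||beta||_1:
    # gradient step on the smooth part, then the L1 proximal map
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ beta) - y) / n
        b = beta - lr * grad
        beta = np.sign(b) * np.maximum(np.abs(b) - lr * lam, 0.0)
    return beta

rng = np.random.default_rng(5)
n, p = 300, 10
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[0] = 3.0
y = (rng.random(n) < sigmoid(X @ true_beta)).astype(float)
beta_hat = l1_logistic_ista(X, y, lam=0.05)
```

Substituting a Poisson or Cox partial log-likelihood for the logistic one changes only the gradient, which is why the penalized likelihood framework of this section covers all the GLMs of Section 4.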
11.
课程考核
Course Assessment
1.
考核形式
Form of Examination
2.
分数构成
Grading Policy
3.
如面向本科生开放,请注明区分内容。
If the course is open to undergraduates, please indicate the difference.
平时作业 Assignments 40% + 期中考试 Midterm Exam 20% + 期末报告 Final Report 40%(考查 non-exam-based assessment)
12.
教材及其它参考资料
Textbook and Supplementary Readings
1. Fan, J., Li, R., Zhang, C.-H., and Zou, H. (2020). Statistical Foundations of Data Science. Chapman and Hall/CRC.
2. Hastie, T., Tibshirani, R., and Wainwright, M. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman and Hall/CRC.
3. Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory, and Applications. Springer.
4. Rigollet, P. and Hütter, J.-C. (2019). High-Dimensional Statistics. MIT Lecture Notes.