Section 10 (4 credit hours): Text Mining
- text preprocessing
- word embeddings
- transformers
Lecture (48 hours)
Section 1 (5 credit hours): Overview of Data Science
- definition and characteristics of data science, big data, and statistics
- difference between statistics and machine learning
- difference between artificial intelligence and machine learning
- different categories of data
- opportunities and pitfalls of big data analysis
Section 2 (4 credit hours): Data Exploration
- data visualization
- dimensionality reduction
Section 3 (5 credit hours): Probability Distributions
- basics of probability distributions
- the importance of heavy-tailed distributions
Section 4 (5 credit hours): Statistical Testing
- how to formulate a statistical hypothesis
- how to evaluate a statistical hypothesis
- beware of p-value hacking
Section 5 (5 credit hours): Numerical Optimization