History of Machine Learning (4 hours)
Week 1 (lecture 1): History of machine learning in computer science: Early
days, milestones, current state of the art, the main categories of machine learning
methods. Introduction to platforms to be used in class, such as TensorFlow, Keras,
Jupyter notebooks, etc., with syllabus description [Install R, Python and
dependencies; Run a simple neural network on the famous MNIST computer vision
dataset]
Week 1 (lecture 2): History of machine learning in geosciences: Developments
from the 1980s to the present in domains such as seismology, remote sensing,
tomography, rock mechanics, geodesy, natural hazard assessment and early
warning systems [Explore available machine learning geoscience code projects on
GitHub in preparation for future exercises; start getting familiar with
common computational platforms used in those repos, such as TensorFlow, Keras,
other R/Python packages, as well as with common interface platforms such as
TensorBoard and Jupyter notebooks].
Basics of Probability Theory & Maximum Likelihood Estimation (10 hours)
All the machine learning techniques to be explored in this course derive from
simple probabilistic concepts. These include the basic rules of probability theory,
from which the main probability distributions can be derived. Maximum
likelihood estimation (MLE) and Bayes' theorem then provide the principal tools
needed to fit a model to some data. We will generate some simple data
representative of geophysical processes and use available low-dimensional
tabulated geo-data such as earthquake catalogues and results from rock laboratory
experiments to illustrate all concepts.
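
As a minimal sketch of what maximum likelihood estimation means in practice, the Python snippet below estimates the rate of a Poisson process from event counts in two ways: a brute-force search over candidate rates, and the closed-form result (the sample mean). The counts are made-up numbers for illustration only, and the names `counts`, `poisson_log_likelihood`, `rate_grid` and `rate_analytic` are hypothetical, not taken from the course material.

```python
import math

# Made-up event counts per time window, for illustration only (not real data).
counts = [2, 0, 3, 1, 4, 2, 1, 3, 2, 2]


def poisson_log_likelihood(rate, data):
    """Log-likelihood of independent Poisson counts for a given rate (rate > 0)."""
    return sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in data)


# Brute-force search over candidate rates from 0.01 to 10.00 in steps of 0.01.
grid = [i / 100 for i in range(1, 1001)]
rate_grid = max(grid, key=lambda r: poisson_log_likelihood(r, counts))

# Closed-form MLE of the Poisson rate: the sample mean of the counts.
rate_analytic = sum(counts) / len(counts)

print(f"grid-search MLE: {rate_grid:.2f}, analytic MLE: {rate_analytic:.2f}")
```

Both estimates should agree to within the grid spacing, which is the point of the exercise: the likelihood function peaks where the model best explains the data.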
Week 2 (lecture 3): Probability theory axioms, derivation of probability
distributions (Binomial, Poisson, Normal, etc.), stochastic methods [how to call
probability distributions in R/Python and how to create a geo-data sample from any
given distribution]
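
One possible way to create a synthetic geo-data sample is inverse transform sampling, sketched below in Python with the standard library only. The example assumes earthquake magnitudes above a minimum magnitude follow an exponential (Gutenberg-Richter) law; the function name `sample_magnitudes` and the parameter values (`b_value=1.0`, `m_min=2.0`) are illustrative assumptions, not course settings.

```python
import math
import random


def sample_magnitudes(n, b_value=1.0, m_min=2.0, rng=random):
    """Draw n magnitudes >= m_min from an exponential (Gutenberg-Richter) law.

    Uses inverse transform sampling: m = m_min - ln(1 - U) / beta,
    with U uniform on [0, 1) and beta = b_value * ln(10).
    """
    beta = b_value * math.log(10.0)
    # 1 - rng.random() lies in (0, 1], which keeps the logarithm finite.
    return [m_min - math.log(1.0 - rng.random()) / beta for _ in range(n)]


rng = random.Random(42)  # fixed seed so the sample is reproducible
magnitudes = sample_magnitudes(10_000, rng=rng)

# Built-in samplers for two other common distributions.
normal_draws = [rng.gauss(0.0, 1.0) for _ in range(5)]     # Normal(0, 1)
waiting_times = [rng.expovariate(0.5) for _ in range(5)]   # Exponential, rate 0.5

sample_mean = sum(magnitudes) / len(magnitudes)
expected_mean = 2.0 + 1.0 / math.log(10.0)  # m_min + 1 / beta for b_value = 1
print(f"sample mean: {sample_mean:.3f}, theoretical mean: {expected_mean:.3f}")
```

The same recipe works for any distribution whose cumulative distribution function can be inverted.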
Week 2 (lecture 4): Bayes' theorem, likelihood function, prior and posterior
distributions [Run a Bayesian inference R code that estimates seismicity
parameters during an underground reservoir stimulation, discover how probability
distributions evolve as more data come in]
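
The snippet below is a simplified Python sketch of sequential Bayesian updating, not the R code used in class. It assumes a Poisson model for the number of induced events per hour with a conjugate Gamma prior on the rate, so each new count updates the posterior in closed form. The counts, the prior parameters and the function name `update_gamma_poisson` are all hypothetical.

```python
import math


def update_gamma_poisson(shape, rate, count, exposure=1.0):
    """Conjugate update: Gamma(shape, rate) prior + Poisson count over `exposure`."""
    return shape + count, rate + exposure


# Weakly informative prior on the event rate (an assumption for this sketch).
shape, rate = 1.0, 1.0

# Made-up hourly event counts, for illustration only (not real data).
hourly_counts = [3, 5, 4, 6, 2]

history = []
for hour, count in enumerate(hourly_counts, start=1):
    shape, rate = update_gamma_poisson(shape, rate, count)
    posterior_mean = shape / rate
    posterior_sd = math.sqrt(shape) / rate
    history.append((posterior_mean, posterior_sd))
    print(f"hour {hour}: posterior mean = {posterior_mean:.2f}, sd = {posterior_sd:.2f}")
```

As more counts arrive, the posterior mean moves toward the observed average rate and the posterior generally narrows.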
Week 3 (lecture 5): Basics of regression: Linear regression, non-linear
regression, loss functions & regularisation (Lasso and Ridge), training, validation
and test set definition, performance metrics [Fit a simple 1-dimensional geo-data
set with regression tools and investigate underfitting versus overfitting; possible
dataset: number of acoustic emissions in a rock sample as a function of time]
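
A minimal Python sketch of regularised regression on a 1-dimensional data set is given below. It fits a straight line with a Ridge penalty on the slope (intercept unpenalised) in closed form and compares training and validation errors for several penalty strengths. The data are synthetic (a linear trend plus Gaussian noise standing in for acoustic emission counts), and `fit_ridge_1d`, `mse` and the penalty values are illustrative assumptions.

```python
import random


def fit_ridge_1d(x, y, lam=0.0):
    """Closed-form Ridge fit of y = slope * x + intercept (intercept unpenalised)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / (sxx + lam)  # lam = 0 gives ordinary least squares
    intercept = y_mean - slope * x_mean
    return slope, intercept


def mse(x, y, slope, intercept):
    """Mean squared error of the fitted line on the given points."""
    return sum((slope * xi + intercept - yi) ** 2 for xi, yi in zip(x, y)) / len(x)


# Synthetic data: linear trend plus noise (illustrative, not real measurements).
rng = random.Random(0)
times = [float(t) for t in range(40)]
emissions = [5.0 + 2.0 * t + rng.gauss(0.0, 3.0) for t in times]

# Simple split: even-indexed points for training, odd-indexed for validation.
train_x, train_y = times[0::2], emissions[0::2]
val_x, val_y = times[1::2], emissions[1::2]

results = {}
for lam in (0.0, 100.0, 10000.0):
    slope, intercept = fit_ridge_1d(train_x, train_y, lam)
    results[lam] = (slope, intercept,
                    mse(train_x, train_y, slope, intercept),
                    mse(val_x, val_y, slope, intercept))
    print(f"lambda={lam:>8}: slope={slope:.2f}, "
          f"train MSE={results[lam][2]:.1f}, validation MSE={results[lam][3]:.1f}")
```

A very large penalty shrinks the slope toward zero and underfits; overfitting would be explored the same way with a more flexible model, such as a high-degree polynomial.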
Week 3 (lecture 6): Basics of classification: Logistic regression, performance
metrics derived from the confusion matrix, application examples in geosciences
[Develop a logistic regression model to classify geographic cells as
aftershock/no-aftershock on post-mainshock data, select suitable features,
calculate performance metrics]
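
The following Python sketch shows one way such a classifier and its confusion-matrix metrics could look. It uses a single hypothetical feature (distance of a cell from the mainshock, in km), synthetic labels generated from an assumed logistic relationship, and a plain batch gradient descent fit; none of the numbers or names come from the course data.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Synthetic cells: aftershocks are more likely close to the mainshock (assumption).
rng = random.Random(1)
distances = [rng.uniform(0.0, 50.0) for _ in range(600)]
labels = [1 if rng.random() < sigmoid(2.0 - 0.15 * d) else 0 for d in distances]

train_d, test_d = distances[:400], distances[400:]
train_y, test_y = labels[:400], labels[400:]

# Standardise the feature using training statistics only.
mean_d = sum(train_d) / len(train_d)
std_d = math.sqrt(sum((d - mean_d) ** 2 for d in train_d) / len(train_d))
train_x = [(d - mean_d) / std_d for d in train_d]
test_x = [(d - mean_d) / std_d for d in test_d]

# Batch gradient descent on the mean log-loss.
weight, bias, learning_rate = 0.0, 0.0, 0.5
for _ in range(1000):
    grad_w = grad_b = 0.0
    for x, y in zip(train_x, train_y):
        error = sigmoid(weight * x + bias) - y
        grad_w += error * x
        grad_b += error
    weight -= learning_rate * grad_w / len(train_x)
    bias -= learning_rate * grad_b / len(train_x)

# Confusion matrix on the held-out cells, using a 0.5 probability threshold.
predictions = [1 if sigmoid(weight * x + bias) >= 0.5 else 0 for x in test_x]
tp = sum(p == 1 and y == 1 for p, y in zip(predictions, test_y))
fp = sum(p == 1 and y == 0 for p, y in zip(predictions, test_y))
fn = sum(p == 0 and y == 1 for p, y in zip(predictions, test_y))
tn = sum(p == 0 and y == 0 for p, y in zip(predictions, test_y))

accuracy = (tp + tn) / len(test_y)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Because aftershock cells are the minority class in this synthetic set, precision and recall are more informative than accuracy alone.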
Week 4 (lecture 7): Gradient descent fundamentals: Parameter space
exploration, beware of local minima, analytical solutions, general algorithms,