First Part: Introduction to distributed storage and parallel computing [4 hours]
> Basic concepts, motivation, current state and trends, and application prospects [2 hours]
> Examples of parallel computing using Linux and Python [2 hours]
Second Part: Methods for distributed storage and parallel computing [18 hours]
> Infrastructure of distributed systems [2 hours]
> Map and Reduce for parallel computing [4 hours]
> Load balancing and scheduling [2 hours]
> Communication and synchronisation in parallel computing [4 hours]
> Transactions and locks [2 hours]
> Fault tolerance, Byzantine faults, and the Paxos/Raft protocols [2 hours]
> Distributed file systems for distributed storage (e.g. HDFS) [2 hours]
Third Part: Parallel computing in practice [14 hours]
> Data processing with multiple threads/processes [4 hours]
>> Data crawling, cleaning, and preprocessing [2 hours]
>> Experiments [2 hours]
> Hadoop and PySpark [4 hours]
>> Basic concepts and usage of Hadoop and PySpark [2 hours]
>> Experiments [2 hours]
> Machine learning with multiple GPUs [6 hours]
>> Clustering, regression, classification, and collaborative filtering [4 hours]
>> Experiments [2 hours]
Fourth Part: Distributed storage and parallel computing in the future [6 hours]
> Training LARGE neural networks: data parallelism, model parallelism, and beyond [2 hours]
> Blockchain: a decentralized distributed system [4 hours]