ELEN0062 - Introduction to machine learning

Random ML quote

We chose it because we deal with huge amounts of data. Besides, it sounds really cool.

Larry Page, Google



Project 29 Sep. 2016

Statement for project 1
Antonio's presentation about Python, Numpy and Sklearn
Sklearn cheatsheet

Deadline 25 Oct. 2016 Don't forget to submit your project 1 on the submission platform
Project 27 Oct. 2016

Statement for project 2


22 Nov. 2016

23 Nov. 2016

Don't forget to submit your project 2 on the submission platform
Project 24 Nov. 2016

Project 3

Deadline 01 Dec. 2016 The toy example must have run on the Kaggle platform
Deadline 17 Dec. 2016

End of the challenge

Deadline 19 Dec. 2016

Don't forget to submit your report and code for the third project on the submission platform

Deadline TBA Presentations

Supplementary material

Here is a very short list of supplementary material related to the field of machine learning. I tend to update this section when I come across interesting material, but if you feel you need more on some topic, do not hesitate to ask!

Machine learning in general

A great deal of accessible material on machine learning is available online:

Classification and regression trees

Linear regression

  • The geometry of Least Squares (1 variable)
  • Note that ANOVA is a special case of linear models where the input variables are dummy one-hot class variables. Consequently, the basis vectors of the column space are orthogonal and the problem reduces to many one-variable least squares problems.
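The ANOVA remark can be checked numerically. The following is a minimal sketch, assuming a single factor with three classes, a design matrix without an intercept column, and made-up toy responses (none of this data comes from the course). It compares the full least squares solution with the one-variable solutions computed column by column.

```python
import numpy as np

# Hypothetical toy data: three classes with a few responses each.
groups = np.array([0, 0, 1, 1, 1, 2, 2])
y = np.array([1.0, 3.0, 4.0, 5.0, 6.0, 10.0, 12.0])

# One-hot design matrix without an intercept column (an assumption of this sketch):
# X[i, k] = 1 if sample i belongs to class k, else 0.
X = np.eye(3)[groups]

# The columns are mutually orthogonal, so X^T X is diagonal (it holds the class sizes).
gram = X.T @ X

# Full multi-variable least squares solution.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# One-variable least squares, one column at a time: <x_k, y> / <x_k, x_k>.
beta_1d = (X.T @ y) / np.diag(gram)

# Both coincide with the per-class means of y.
group_means = np.array([y[groups == k].mean() for k in range(3)])

print(beta, beta_1d, group_means)
```

With an intercept column added, the one-hot columns are no longer orthogonal to it, so the decomposition above would not apply as written.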

    Nearest neighbor(s)

    Artificial neural networks

    There have been three waves of hype about ANNs. The first was about perceptrons in the 60s, until it was shown that a single perceptron cannot solve the XOR problem. The second started with the discovery of backpropagation, but it soon became clear that large and/or deep neural nets were very hard to train. We are in the midst of the third one right now with `deep learning`: neural nets with several (many) hidden layers.
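The XOR limitation is easy to illustrate. The sketch below is only an illustration, assuming a step activation, a coarse grid of candidate weights for the single unit, and hand-picked (not trained) weights for the two-layer network. It searches for a single threshold unit that fits XOR, then shows that one hidden layer of two units is enough.

```python
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR targets


def step(z):
    """Threshold activation: 1 where z > 0, else 0."""
    return (z > 0).astype(int)


# Single threshold unit: try every (w1, w2, b) on a coarse grid (illustrative only)
# and record the largest number of correctly classified points.
grid = np.linspace(-2, 2, 17)
best_single = max(
    int((step(X @ np.array([w1, w2]) + b) == y).sum())
    for w1, w2, b in itertools.product(grid, repeat=3)
)

# Two-layer network with hand-picked weights (assumption: chosen by hand, not learned).
# Hidden unit 1 computes OR, hidden unit 2 computes AND; the output is OR AND NOT AND.
W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])
w2 = np.array([1.0, -1.0])
b2 = -0.5

hidden = step(X @ W1 + b1)
out = step(hidden @ w2 + b2)

print("best single unit:", best_single, "of 4 correct")
print("two-layer output:", out)
```

The grid search is not a proof, but no choice of weights can do better than 3 out of 4, because XOR is not linearly separable.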

    Learning theory (Bias/Variance...)

    Model assessment and selection

    Support Vector Machines

    Ensemble methods

    Feature selection

    Unsupervised learning


    There are many YouTube channels about ML. Here are a few:

    Third project: the challenge

    The third project is organized in the form of a challenge, where you will compete against each other. This year, the challenge is about predicting the rating that a given user would give to a particular film. All the relevant information can be found on the Kaggle platform, which hosts the challenge.

    The project is divided into four parts. All the deadlines can be found in the schedule section above.

    1. Setup for the project
      • Create an account on the Kaggle platform
      • Form groups of two. Concatenate your sXXXXXX IDs to form the group name.
      • Test the toy example
    2. Propose the best model you can
    3. Submit an archive in tar.gz format on the submission platform, containing a report that describes the steps of your approach and your main results, along with your source code. Use the same IDs as on the Kaggle platform. The report must contain the following information:
      • A detailed description of all the approaches that you have used to win the challenge. The Kaggle winning model guideline should be followed for each approach.
      • A detailed description of your hyperparameter optimization approach and your model validation technique.
      • A table summarizing the performance of your different approaches, giving for each approach at least its name, its validation score, and its scores on the public and private leaderboards.
      • Any complementary information or figures that you want to mention.
    4. Briefly present your approach to the rest of the class. (More information coming soon)
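The validation score and hyperparameter optimization requested in the report can be organized along the following lines. This is a minimal sketch under several assumptions: the ratings are synthetic stand-ins (not the Kaggle data), the model is a simple bias baseline, RMSE is used as the metric (check the challenge page for the actual one), and every name (`fit_baseline`, `cv_score`, `lam`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for (user, film, rating) triples; NOT the challenge data.
n_users, n_films, n_ratings = 50, 40, 2000
users = rng.integers(0, n_users, n_ratings)
films = rng.integers(0, n_films, n_ratings)
user_effect = rng.normal(0, 0.5, n_users)
film_effect = rng.normal(0, 0.5, n_films)
noise = rng.normal(0, 0.7, n_ratings)
ratings = np.clip(3.5 + user_effect[users] + film_effect[films] + noise, 1, 5)


def fit_baseline(u, f, r, lam):
    """Global mean plus shrunken user and film offsets; lam is the shrinkage strength."""
    mu = r.mean()
    res = r - mu
    bu = np.bincount(u, weights=res, minlength=n_users) / (
        np.bincount(u, minlength=n_users) + lam)
    res = res - bu[u]
    bf = np.bincount(f, weights=res, minlength=n_films) / (
        np.bincount(f, minlength=n_films) + lam)
    return mu, bu, bf


def predict(model, u, f):
    mu, bu, bf = model
    return np.clip(mu + bu[u] + bf[f], 1, 5)


def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))


def cv_score(lam, k=5):
    """Mean validation RMSE over k folds (the synthetic rows are already in random order)."""
    folds = np.arange(n_ratings) % k
    scores = []
    for i in range(k):
        train, valid = folds != i, folds == i
        model = fit_baseline(users[train], films[train], ratings[train], lam)
        scores.append(rmse(predict(model, users[valid], films[valid]), ratings[valid]))
    return float(np.mean(scores))


# Hyperparameter search: keep the shrinkage value with the lowest validation RMSE.
candidates = [0.1, 1.0, 5.0, 25.0]
cv_scores = {lam: cv_score(lam) for lam in candidates}
best_lam = min(cv_scores, key=cv_scores.get)

for lam, score in cv_scores.items():
    print(f"lam={lam:<5} validation RMSE={score:.4f}")
print("selected lam:", best_lam)
```

The per-candidate validation scores printed here are the kind of figure that would go in the "validation score" column of the summary table, next to the public and private leaderboard scores read from Kaggle.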

    Have fun!

    Last modified on January 19 2017 19:51