Research
I carry out research in machine learning. I'm mainly interested in the design of (computationally and statistically efficient) supervised and semi-supervised learning algorithms in order to exploit structured input and output spaces (sequences, images, time-series, graphs), with applications in bioinformatics, computer vision, and computer networks.
Below is a (non-exhaustive) list of representative research themes with main references (more or less in chronological order):
(last update: January, 2010. See here for a more complete list of publications)
- Machine learning:
- Decision trees and ensemble methods
During my PhD thesis and afterwards, I developed several tree-based ensemble methods, among them the dual perturb and combine method, which simulates the averaging effect of ensembles with only one model, and the extremely randomized trees algorithm, a random forest-like method that goes further in terms of randomization. The latter has been used quite extensively by our group and others, notably in the context of image classification (see below).
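To make "goes further in terms of randomization" concrete, here is a minimal sketch of how a single node split can be chosen when the cut-point is drawn at random instead of being optimized. It is an illustration under stated assumptions, not the reference implementation: the function names `gini` and `random_split`, the parameter `k` (number of candidate features tried), and the use of Gini impurity as the score are all choices made for this example.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def random_split(X, y, k, rng):
    """Return the best of up to k random (feature, threshold) candidates.

    X is a list of numeric feature rows, y the matching class labels.
    Hypothetical helper for illustration; returns None when every
    candidate feature is constant in the sample.
    """
    n_features = len(X[0])
    best, best_score = None, float("inf")
    for j in rng.sample(range(n_features), min(k, n_features)):
        values = [row[j] for row in X]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # constant feature: no split possible
        # The threshold is drawn uniformly at random, not searched for.
        t = rng.uniform(lo, hi)
        left = [yi for row, yi in zip(X, y) if row[j] < t]
        right = [yi for row, yi in zip(X, y) if row[j] >= t]
        # Weighted impurity of the two children (lower is better).
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best, best_score = (j, t), score
    return best


if __name__ == "__main__":
    X = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
    y = [0, 0, 1, 1]
    print(random_split(X, y, k=2, rng=random.Random(0)))
```

A full tree would apply this choice recursively to each child node, and an ensemble would average many such trees. In practice a library implementation (for example the Extra-Trees estimators in scikit-learn) is the natural choice over a hand-written version like this one.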
- Time series classification and structured inputs
During my PhD thesis, I developed several techniques for time series classification, among them the "segment and combine" approach. This method has since been generalized to image classification and other structured input problems.
- Computer vision
Together with Raphaël Marée, Justus Piater, and Louis Wehenkel, I have developed an original method for image classification based on the extraction of random subwindows from the images and their classification with ensembles of extremely randomized trees.
This method has been subsequently extended for image annotation and image retrieval:
- Reinforcement learning
I participated in the development of the fitted Q iteration algorithm, which uses extremely randomized trees as function approximators.
- Structured outputs
During a postdoc in Florence d'Alché-Buc's group, we proposed an extension of standard regression trees to kernelized output spaces. The approach can be used to learn an approximation of a kernel as a function of some input features, as well as to handle structured output problems.
- Feature selection
We are currently developing methods to improve the interpretability of feature ranking techniques and thereby help determine a relevance threshold in these rankings.
- Parallel and large-scale machine learning
Since 2010, we have been working on the development of parallel and large-scale machine learning algorithms:
- Bioinformatics
- Mass spectrometry
We have developed an approach based on tree-based ensemble methods for identifying protein biomarkers and building predictive models from mass spectrometry data. The paper about the methodology:
and some biomedical applications:
- Supervised inference of biological networks
During a postdoc in Florence d'Alché-Buc's group, we applied the output kernel tree approach to the inference of protein-protein interaction and metabolic networks in yeast. We are currently working on applying these techniques to other kinds of networks.
- Gene regulatory network inference
We have developed a method called GENIE3 for the inference of gene regulatory networks from expression data. This method was the best performer in the DREAM4 (multifactorial track) and DREAM5 network inference challenges.
- Genome-wide association studies
With Vincent Botta and Louis Wehenkel, I am working on extending decision tree-based methods to deal with SNP data in the context of genome-wide association studies.
- Other applications:
- Networking
Since 2004, I have collaborated with the RUN team (Prof. Guy Leduc) on the application of machine learning techniques to networking. Two recent references:
- Power systems
In the past, I worked on the application of machine learning techniques to power systems.
- Review papers
I wrote two review papers: one about the bias/variance tradeoff, as part of a handbook on data mining and knowledge discovery, and one, with Alexandre Irrthum and Louis Wehenkel, about decision tree-based methods and their application in computational and systems biology.