Emilie Morvant
Assistant Professor (Maître de Conférences) in Machine Learning
Title:
Learning Majority Vote for Supervised Classification and Domain Adaptation: PAC-Bayesian Approaches and Similarity Combination
Abstract:
Nowadays, due to the expansion of the web, a wealth of data is available, and many applications need supervised machine learning methods able to take into account different information sources. For instance, multimedia semantic indexing applications have to efficiently take advantage of the color, text, texture, or sound information of a document. Most existing methods try to combine this multimodal information, either by directly fusing the descriptors or by combining similarities or classifiers, in order to produce a classification model that is more reliable for the considered task. These multimodal facets usually raise two main issues. On the one hand, one has to be able to correctly make use of all the a priori information available. On the other hand, the data on which the model will be applied does not come from the same probability distribution as the data used during the learning step. In this context, the model has to be adapted to the new data, a problem known as domain adaptation. In this thesis, we propose several theoretically founded contributions for tackling these issues.
A first series of contributions studies the problem of learning a weighted majority vote over a set of voters in a supervised classification setting. These results fall within the PAC-Bayesian theory, which makes it possible to derive generalization guarantees for such a vote by assuming an a priori belief on the relevance of the voters.
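As a rough sketch in standard PAC-Bayesian notation (assumed here, not quoted from the thesis), with a posterior distribution \rho learned over a set of voters H and a prior \pi encoding the a priori belief, the \rho-weighted majority vote B_\rho and the associated Gibbs classifier G_\rho satisfy
B_\rho(x) = \operatorname{sign}\big[\mathbb{E}_{h\sim\rho}\, h(x)\big], \qquad R(G_\rho) = \mathbb{E}_{h\sim\rho}\, R(h), \qquad R(B_\rho) \le 2\, R(G_\rho),
where R(\cdot) denotes the true risk; PAC-Bayesian theorems then bound R(G_\rho) by its empirical counterpart plus a complexity term involving the Kullback-Leibler divergence KL(\rho \,\|\, \pi).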
Our first contribution extends a recent algorithm, MinCq, which minimizes a bound on the error of the majority vote in binary classification. The extension can take into account an a priori belief on the performance of the voters, expressed as an aligned distribution. We illustrate its usefulness for combining nearest-neighbor classifiers [1] and for classifier fusion on a multimedia semantic indexing task [2]. We then propose a theoretical contribution for multiclass classification tasks, based on an original PAC-Bayesian analysis that considers the operator norm of the confusion matrix as an error measure [3][4].
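For intuition, the operator norm used as an error measure in [3][4] is, in its standard definition (assumed here rather than quoted from the papers),
\|C\|_{\mathrm{op}} = \max_{x \neq 0} \frac{\|C x\|_2}{\|x\|_2},
i.e., the largest singular value of the confusion matrix C; roughly speaking, controlling it limits the worst-case combination of per-class confusions rather than only an averaged error.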
Our second series of contributions relates to domain adaptation. In this setting, we present our third result: a method for combining similarities in order to infer a representation space in which the learning (source) distribution and the testing (target) distribution are brought closer together. This contribution is based on the theory of learning from good similarity functions and is justified by the minimization of a usual domain adaptation bound [5]. As our last contribution, we propose the first PAC-Bayesian analysis for domain adaptation. It relies on a consistent divergence measure between distributions, which allows us to derive a generalization bound for learning majority votes in binary classification. Moreover, we propose a first algorithm, specialized to linear classifiers, that directly minimizes our bound [6].
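For intuition, a typical domain adaptation bound of the kind alluded to above (given here in the spirit of classical analyses, not as the exact bound of [5] or [6]) has the form
R_{T}(h) \le R_{S}(h) + \operatorname{div}(D_S, D_T) + \lambda,
where R_S(h) and R_T(h) are the risks of a hypothesis h on the source and target domains D_S and D_T, \operatorname{div}(\cdot,\cdot) is a divergence between the two marginal distributions, and \lambda measures the error of the best hypothesis on both domains; the contributions in [5] and [6] aim at jointly controlling terms of this kind.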
Associated Publications:
- [1]
Learning A Priori Constrained Weighted Majority Votes
Aurélien Bellet ; Amaury Habrard ; Emilie Morvant ; Marc Sebban
Machine Learning Journal (MLJ), 97(1-2):129-154, 2014, DOI: 10.1007/s10994-014-5462-z
[pdf] [published version] [bibtex]
- [2]
Majority Vote of Diverse Classifiers for Late Fusion
Emilie Morvant ; Amaury Habrard ; Stéphane Ayache
S+SSPR 2014 - IAPR Joint International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR), Joensuu, Finland.
[pdf] [bibtex] [research report arXiv:1207.1019]
- [3]
PAC-Bayesian Generalization Bound on Confusion Matrix for Multi-Class Classification
Emilie Morvant ; Sokol Koço ; Liva Ralaivola
International Conference on Machine Learning, 2012, Edinburgh, United Kingdom, pp. 815-822
[pdf] [bibtex] [video] [discussion] [research report arXiv:1202.6228]
- [4]
On Generalizing the C-Bound to the Multiclass and Multi-label Settings
François Laviolette ; Emilie Morvant ; Liva Ralaivola ; Jean-Francis Roy
NIPS 2014 Workshop on Representation and Learning Methods for Complex Outputs, Montréal, Canada.
[pdf] [research report arXiv:1408.1336]
- [5]
Parsimonious Unsupervised and Semi-Supervised Domain Adaptation with Good Similarity Functions
Emilie Morvant ; Amaury Habrard ; Stéphane Ayache
Knowledge and Information Systems (KAIS), 33(2):309-349, 2012, DOI: 10.1007/s10115-012-0516-7
[pdf] [published version] [bibtex]
- [6]
PAC-Bayesian Approach for Domain Adaptation with Specialization to Linear Classifiers
Pascal Germain ; Amaury Habrard ; François Laviolette ; Emilie Morvant
International Conference on Machine Learning, 2013, Atlanta, USA
[pdf] [bibtex] [PBDA code]