Counting objects in images is within the reach of everyone. When the number of objects to count becomes large, the task is more difficult and we tend to make errors because of some oversights. For a computer, counting or detecting objects in images is hard. However, the repetitive characteristic of this task does not worry it.
In histopathology, the study of pathology combined to histology (the study of cells and tissues), scientists frequently have to count cells from images of tissues to address hypotheses about pathological or developmental processes. The rise of digital imaging services and the power of computers and networks led to the emergence of digital histopathology, a new field where histopathologists analyze cells and tissues on computer screens. Therefore, it would be interesting to have automatic counting methods available to lighten the work of life scientists.
It this thesis, the problem of cell counting and the closely related problem of cell detection are studied. Supervised machine learning workflows are proposed in that purpose. Five methods are presented, (re-)implemented and tried on four databases of histopathological images, whose two are newly introduced for this kind of problem. We set up the software modules allowing to use these algorithms directly on databases from Cytomine, a web-based application for collaborative analysis of multi-gigapixel images.
Basically, two ways of counting are studied. Firstly, counting can be done by cell detection, the counting trivially being the number of detections. Secondly, counting can be estimated by density estimation, knowing the density for each image’s pixel.
An important aspect of this work is the assessment and the comparison of these approaches. Performance results in machine learning literature are sometimes too optimistic due to weak model validation. Robust metrics are thus defined in order to make comparisons possible. Each machine learning algorithm has a set of parameters that can be tuned. They have a large influence on the performance. In order to choose the best combination of parameters, a systematic model selection protocol is employed. It is the first time that such a set of methods has been evaluated on several datasets of such a large size. Our study has helped to better understand the functioning of these methods and to establish their limits.
Results are encouraging. In general, at least one algorithm is able to reach a counting error less than 10% for each dataset.