[Seminar] Clustering high-dimensional count data through a mixture of multinomial PCA

Published on October 13, 2020. Updated on June 28, 2022.
Date(s)

January 21, 2021

Location(s)
Seminar held remotely via Microsoft Teams

Seminar held by Nicolas Jouvin (Université Paris 1) on January 21, 2021 at 10:00

Speaker: Nicolas Jouvin (PhD student at Université Paris 1, SAMM Laboratory).

Abstract: Count data arise in many scientific fields as frequency counts, for instance as the occurrences of distinct words in a bag-of-words model for text analysis, or as read counts in genomics. This presentation addresses the problem of clustering count data with a mixture model. The model builds on latent Dirichlet allocation, also known as multinomial PCA, and integrates clustering and dimension reduction in order to handle high-dimensional datasets. We present a new variational EM algorithm for this model, combined with a greedy heuristic. We illustrate the qualitative relevance of the proposed methodology on a real-world application, the clustering of anatomopathological medical reports, carried out in partnership with expert practitioners from the Institut Curie hospital.

Due to the current pandemic, this seminar will be held remotely via Microsoft Teams.

To register, please send an email to