GLMM Based Clustering of Multivariate Mixed Type Longitudinal Data

Publication at the Faculty of Mathematics and Physics

2022

Abstract

Modern studies usually follow subjects repeatedly over a long period of time, which results in longitudinal or panel data. The monitored outcomes may be of varied nature, ranging from simple binary indicators to specific numeric values; we call such data mixed type data.

Methods for modelling such a palette of potentially dependent outcomes are very scarce in the literature. Using generalized linear mixed model (GLMM) methodology, a statistical model is proposed for repeatedly observed binary (logistic regression), ordinal (ordinal logit), general categorical (multinomial logit) and numeric (classical linear mixed model) outcomes.

Potential relationships among the outcomes are captured by jointly distributed random effects. Moreover, latent heterogeneity within the data is assumed to be caused by the division of the observed subjects into several a priori unknown heterogeneous groups.
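The joint structure can be sketched in formulas as follows. The notation is an assumption made here for illustration and is not taken from the publication: subject i, visit j, outcome r, covariate vectors x and z, fixed effects beta and random effects b. Only the numeric and binary cases are written out; the ordinal and multinomial outcomes would use cumulative and baseline-category logits of the same kind of linear predictor.

```latex
% Illustrative notation only (assumed, not the authors' own):
% subject i, visit j, outcome r = 1, ..., R.
\begin{align*}
\eta^{r}_{ij} &= \mathbf{x}^{r\top}_{ij}\boldsymbol{\beta}^{r} + \mathbf{z}^{r\top}_{ij}\mathbf{b}^{r}_{i}
  && \text{(linear predictor of outcome } r\text{)} \\
Y^{r}_{ij} \mid \mathbf{b}_{i} &\sim \mathcal{N}\!\left(\eta^{r}_{ij},\, \sigma_{r}^{2}\right)
  && \text{(numeric outcome)} \\
\operatorname{logit} \mathsf{P}\!\left(Y^{r}_{ij} = 1 \mid \mathbf{b}_{i}\right) &= \eta^{r}_{ij}
  && \text{(binary outcome)} \\
\mathbf{b}_{i} = \left(\mathbf{b}^{1}_{i}, \ldots, \mathbf{b}^{R}_{i}\right) &\sim \mathcal{N}\!\left(\mathbf{0},\, \boldsymbol{\Sigma}\right)
  && \text{(joint random effects)}
\end{align*}
```

In such a formulation, the off-diagonal blocks of the covariance matrix Sigma are what carry the dependence between different outcomes of the same subject.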

Following the principles of model based clustering (MBC), a mixture of the proposed models is created, with the clusters designed to differ in parameters pre-specified by the analyst. A Bayesian approach is taken for model estimation, and inference is based on Markov chain Monte Carlo (MCMC) methodology.
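In generic terms, a finite mixture of this kind and the resulting classification rule might be written as below. The symbols are assumptions introduced for illustration: U_i is the latent cluster label of subject i, w_g are the mixture weights, G is the number of mixture components, and theta_g collects the parameters of the model in cluster g (those chosen to be cluster-specific together with those shared across clusters).

```latex
% Generic finite-mixture sketch; symbols are illustrative assumptions.
\begin{align*}
\mathsf{P}(U_i = g) &= w_g, \qquad g = 1, \ldots, G, \qquad \sum_{g=1}^{G} w_g = 1, \\
p(\mathbf{y}_i \mid \boldsymbol{\theta}, \mathbf{w}) &= \sum_{g=1}^{G} w_g \, p\!\left(\mathbf{y}_i \mid \boldsymbol{\theta}_{g}\right), \\
\mathsf{P}(U_i = g \mid \mathbf{y}_i, \boldsymbol{\theta}, \mathbf{w}) &=
  \frac{w_g \, p\!\left(\mathbf{y}_i \mid \boldsymbol{\theta}_{g}\right)}
       {\sum_{h=1}^{G} w_h \, p\!\left(\mathbf{y}_i \mid \boldsymbol{\theta}_{h}\right)}.
\end{align*}
```

The last line is Bayes' rule for the cluster label; in an MCMC setting such probabilities would typically be evaluated at each sampled parameter value.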

An appropriate choice of the prior distribution even allows us to address the problem of an a priori unknown number of latent clusters by inducing a sparse finite mixture. The capabilities of the introduced method are demonstrated by an analysis of the still ongoing EU-SILC study targeting European households.
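The idea behind a sparse finite mixture can be illustrated with a small prior simulation. This is a minimal sketch, not the authors' sampler: it assumes a symmetric Dirichlet prior on the mixture weights, and the function names, the concentration values 0.05 and 5.0, and the sample sizes are all hypothetical choices made here. It shows that a small concentration parameter tends to leave most of the available components empty, so the number of occupied clusters is effectively driven by the data rather than fixed in advance.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw mixture weights from a symmetric Dirichlet(alpha, ..., alpha)
    by normalizing independent Gamma(alpha, 1) variates."""
    gammas = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(gammas)
    return [g / total for g in gammas]


def count_occupied(alpha, k, n_subjects, rng):
    """Allocate subjects to components given prior weights and
    return how many components received at least one subject."""
    weights = sample_dirichlet(alpha, k, rng)
    labels = rng.choices(range(k), weights=weights, k=n_subjects)
    return len(set(labels))


def mean_occupied(alpha, k=10, n_subjects=200, n_draws=500, seed=1):
    """Average number of occupied components over repeated prior draws."""
    rng = random.Random(seed)
    total = sum(count_occupied(alpha, k, n_subjects, rng) for _ in range(n_draws))
    return total / n_draws


if __name__ == "__main__":
    # Assumed illustrative values: 0.05 (sparse) versus 5.0 (non-sparse).
    print("sparse prior:    ", mean_occupied(0.05))
    print("non-sparse prior:", mean_occupied(5.0))
```

Under the sparse setting the average number of occupied components stays well below the ten available, whereas the larger concentration fills nearly all of them.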

Keywords

longitudinal data, GLMM, MBC, clustering, MCMC