TY - JOUR
T1 - Probabilistic PARAFAC2
AU - Jørgensen, Philip J H
AU - Nielsen, Søren F
AU - Hinrich, Jesper L
AU - Schmidt, Mikkel N
AU - Madsen, Kristoffer H
AU - Mørup, Morten
PY - 2024/8/17
Y1 - 2024/8/17
N2 - The Parallel Factor Analysis 2 (PARAFAC2) is a multimodal factor analysis model suitable for analyzing multi-way data when one of the modes has incomparable observation units, for example, because of differences in signal sampling or batch sizes. A fully probabilistic treatment of the PARAFAC2 is desirable to improve robustness to noise and provide a principled approach for determining the number of factors, but challenging because direct model fitting requires that factor loadings be decomposed into a shared matrix specifying how the components are consistently co-expressed across samples and sample-specific orthogonality-constrained component profiles. We develop two probabilistic formulations of the PARAFAC2 model along with variational Bayesian procedures for inference: In the first approach, the mean values of the factor loadings are orthogonal, leading to closed-form variational updates, and in the second, the factor loadings themselves are orthogonal using a matrix von Mises-Fisher distribution. We contrast our probabilistic formulations to the conventional direct fitting algorithm based on maximum likelihood on synthetic data and real fluorescence spectroscopy and gas chromatography-mass spectrometry data, showing that the probabilistic formulations are more robust to noise and model order misspecification. The probabilistic PARAFAC2, thus, forms a promising framework for modeling multi-way data accounting for uncertainty.
AB - The Parallel Factor Analysis 2 (PARAFAC2) is a multimodal factor analysis model suitable for analyzing multi-way data when one of the modes has incomparable observation units, for example, because of differences in signal sampling or batch sizes. A fully probabilistic treatment of the PARAFAC2 is desirable to improve robustness to noise and provide a principled approach for determining the number of factors, but challenging because direct model fitting requires that factor loadings be decomposed into a shared matrix specifying how the components are consistently co-expressed across samples and sample-specific orthogonality-constrained component profiles. We develop two probabilistic formulations of the PARAFAC2 model along with variational Bayesian procedures for inference: In the first approach, the mean values of the factor loadings are orthogonal, leading to closed-form variational updates, and in the second, the factor loadings themselves are orthogonal using a matrix von Mises-Fisher distribution. We contrast our probabilistic formulations to the conventional direct fitting algorithm based on maximum likelihood on synthetic data and real fluorescence spectroscopy and gas chromatography-mass spectrometry data, showing that the probabilistic formulations are more robust to noise and model order misspecification. The probabilistic PARAFAC2, thus, forms a promising framework for modeling multi-way data accounting for uncertainty.
KW - multi-way modeling
KW - orthogonality constraint
KW - PARAFAC2
KW - tensor decomposition
KW - variational inference
UR - http://www.scopus.com/inward/record.url?scp=85202599971&partnerID=8YFLogxK
M3 - Journal article
C2 - 39202167
SN - 1099-4300
VL - 26
JO - Entropy (Basel, Switzerland)
JF - Entropy (Basel, Switzerland)
IS - 8
M1 - 697
ER -