TY - JOUR
T1 - Linking glycemic dysregulation in diabetes to symptoms, comorbidities, and genetics through EHR data mining
AU - Kirk, Isa Kristina
AU - Simon, Christian
AU - Banasik, Karina
AU - Holm, Peter Christoffer
AU - Haue, Amalie Dahl
AU - Jensen, Peter Bjødstrup
AU - Juhl Jensen, Lars
AU - Rodríguez, Cristina Leal
AU - Pedersen, Mette Krogh
AU - Eriksson, Robert
AU - Andersen, Henrik Ullits
AU - Almdal, Thomas
AU - Bork-Jensen, Jette
AU - Grarup, Niels
AU - Borch-Johnsen, Knut
AU - Pedersen, Oluf
AU - Pociot, Flemming
AU - Hansen, Torben
AU - Bergholdt, Regine
AU - Rossing, Peter
AU - Brunak, Søren
N1 - © 2019, Kirk et al.
PY - 2019/12/10
Y1 - 2019/12/10
AB - Diabetes is a diverse and complex disease, with considerable variation in phenotypic manifestation and severity. This variation hampers the study of etiological differences and reduces the statistical power of analyses of associations to genetics, treatment outcomes, and complications. We address these issues through deep, fine-grained phenotypic stratification of a diabetes cohort. Text mining the electronic health records of 14,017 patients, we matched two controlled vocabularies (ICD-10 and a custom vocabulary developed at the clinical center Steno Diabetes Center Copenhagen) to clinical narratives spanning a 19-year period. The two matched vocabularies comprise over 20,000 medical terms describing symptoms, other diagnoses, and lifestyle factors. The cohort is genetically homogeneous (Caucasian diabetes patients from Denmark), so the resulting stratification is not driven by ethnic differences, but rather by inherently dissimilar progression patterns and lifestyle-related risk factors. Using unsupervised Markov clustering, we defined 71 clusters of at least 50 individuals within the diabetes spectrum. The clusters display both distinct and shared longitudinal glycemic dysregulation patterns, temporal co-occurrences of comorbidities, and associations to single nucleotide polymorphisms in or near genes relevant for diabetes comorbidities.
U2 - 10.7554/eLife.44941
DO - 10.7554/eLife.44941
M3 - Journal article
C2 - 31818369
VL - 8
JO - eLife
JF - eLife
SN - 2050-084X
M1 - e44941
ER -