TY - JOUR
T1 - Automatic detection of n-degree family members
AU - Pedersen, Emil M.
AU - Steinbach, Jette
AU - Pedersen, Carsten B.
AU - Schork, Andrew J.
AU - Krebs, Morten D.
AU - Vilhjálmsson, Bjarni J.
AU - Privé, Florian
N1 - Publisher Copyright: Copyright © 2025 Pedersen, Steinbach, Pedersen, Schork, Krebs, Vilhjálmsson and Privé.
PY - 2025
Y1 - 2025
N2 - Summary: Family-based genetic studies often require the identification of relatives up to a specified degree, but existing tools are restricted to second-degree relatives, return entire connected pedigrees, or require multiple pre- or post-processing steps. We implemented five new functions, namely, prepare_graph, get_kinship, graph_to_trio, get_relations, and Relation_per_proband_plot, in the R package LTFHPlus to address these limitations. prepare_graph constructs a directed graph from population-level trio data using the igraph package and supports attaching additional attributes to individuals. From this graph, relatives of arbitrary degree can be identified efficiently. get_kinship calculates a kinship matrix for all individuals in a (sub)graph, and graph_to_trio reconstructs trio information from identified families, enabling downstream use with other pedigree tools. In addition, familial relations can be labelled from the graph using the function get_relations, and the total and average number of each relation type per proband can be plotted using Relation_per_proband_plot. Using the publicly available minnbreast dataset, we constructed a graph containing 28,081 individuals and 30,720 familial edges. Across 1,000 repetitions, the median run-time for identifying all relatives up to the third degree for 500 randomly selected individuals was 0.03 s, and kinship matrix calculation had a median run-time of 1.57 s (single-threaded execution). These functions provide a reproducible, scalable, and interoperable solution for integrating family information into genetic analyses.
KW - family-based studies
KW - genetic epidemiology
KW - graph theory
KW - kinship matrix
KW - pedigree analysis
KW - R package
KW - trio data
UR - http://www.scopus.com/inward/record.url?scp=105026817608&partnerID=8YFLogxK
U2 - 10.3389/fgene.2025.1708315
DO - 10.3389/fgene.2025.1708315
M3 - Journal article
C2 - 41458211
AN - SCOPUS:105026817608
SN - 1664-8021
VL - 16
JO - Frontiers in Genetics
JF - Frontiers in Genetics
M1 - 1708315
ER -