Statistical Genetics in Association Studies and Prediction

Department of Genetic Epidemiology

Association studies are an important tool for identifying genetic risk factors for complex diseases.
To this end, researchers often conduct case-control studies that compare the frequencies of gene variants between healthy and diseased individuals. A higher frequency of a variant among the cases suggests that this particular variant is involved in the development of the disease.
There are two types of association studies: candidate gene studies, which examine only a small number of genes, and genome-wide association studies (GWAS), which cover the whole genome.
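As a minimal sketch of the case-control comparison described above, the snippet below applies the classical allelic chi-square test to a single biallelic marker. It assumes the allele counts (two alleles per individual) have already been tabulated for cases and controls. The helper `allelic_chi2` and the counts are hypothetical illustrations, not the Department's software or data.

```python
import math


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 allele table.

    Rows: cases, controls. Columns: alternative allele, reference allele.
    Returns (chi-square statistic, p-value, allelic odds ratio).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


# Invented counts: 1,000 cases and 1,000 controls, i.e. 2,000 alleles per group.
chi2, p, odds_ratio = allelic_chi2(case_alt=600, case_ref=1400, ctrl_alt=500, ctrl_ref=1500)
print(f"chi2={chi2:.2f}, p={p:.1e}, OR={odds_ratio:.2f}")
```

Here the alternative allele is more frequent among cases (30% vs 25%), giving an odds ratio of about 1.29 and p ≈ 4e-4 for this one marker. In a GWAS, the same kind of test is repeated for every marker.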

The Department focuses on developing novel statistical methods for candidate gene studies and GWAS, and on identifying genetic risk factors. We are particularly interested in problems of scale, gene-gene and gene-environment interaction, pathway analysis, and prediction.

Our methodological research focus

Problems of Scale in Statistical Genetics

One of the central theoretical and practical problems of genome-wide association studies (GWAS) is the sheer volume of data. Standard genotyping arrays today start at 500,000 genetic markers. After imputation using publicly accessible reference-population databases such as HapMap or the 1000 Genomes Project, this number often increases to around 9 million markers. Sequencing studies produce even larger amounts of data. This creates problems both in data preparation and in the subsequent association analysis. We develop statistical methods for such high-dimensional genetic data.
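The back-of-the-envelope calculation below illustrates why scale matters. It rests on two assumptions: a Bonferroni correction at a family-wise error rate of 0.05, and a hypothetical study of 5,000 individuals. Storage is shown for hard-called genotypes packed at 2 bits each (as in PLINK's binary .bed format) and for imputed dosages stored as 32-bit floats. The helper names are illustrative only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-marker significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def genotype_storage_gb(n_markers, n_individuals, bits_per_genotype):
    """Raw genotype-matrix size in gigabytes (1 GB = 1e9 bytes), ignoring file headers."""
    return n_markers * n_individuals * bits_per_genotype / 8 / 1e9


N_INDIVIDUALS = 5_000  # hypothetical sample size, not taken from the text

for label, n_markers in [("array", 500_000), ("imputed", 9_000_000)]:
    threshold = bonferroni_threshold(0.05, n_markers)
    packed_gb = genotype_storage_gb(n_markers, N_INDIVIDUALS, bits_per_genotype=2)
    dosage_gb = genotype_storage_gb(n_markers, N_INDIVIDUALS, bits_per_genotype=32)
    print(f"{label:8s} markers={n_markers:>9,}  alpha/m={threshold:.1e}  "
          f"2-bit calls={packed_gb:.2f} GB  float32 dosages={dosage_gb:.1f} GB")
```

Under these assumptions, the per-marker threshold drops from 1e-7 for 500,000 markers to about 5.6e-9 for 9 million markers. Over the same range, dosage storage grows from 10 GB to 180 GB. Because neighbouring markers are correlated, Bonferroni is conservative at this scale.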

Gene-Gene and Gene-Environment Interaction

Biological processes that can lead to disease usually involve both gene products and environmental factors. Interactions between several genes, or between genes and the environment, therefore also play an important role in the development of disease.

However, interactions are very difficult to detect. The existing statistical tests either have too little power, especially in a genome-wide context, or fail to control the type I error rate.

We are therefore also developing better methods for detecting interactions. Examples include gene-radon interaction (see lung cancer and radon), gene-time interaction (see psychosis), and gene-gene interaction in pathways (see GWAS pathways).
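For reference, the snippet below sketches the kind of standard interaction test discussed above. It is a likelihood-ratio test for a gene-environment interaction in a logistic model, comparing a main-effects-only model with one that adds a G x E product term. It assumes an additively coded genotype (0/1/2) and a binary exposure. The data are simulated, and the NumPy helpers are hypothetical illustrations, not the Department's methods.

```python
import math

import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    X must already contain an intercept column. Returns (coefficients, log-likelihood).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    # log-likelihood = sum(y * eta - log(1 + exp(eta)))
    loglik = float(np.sum(y * eta - np.logaddexp(0.0, eta)))
    return beta, loglik


def gxe_interaction_lrt(g, e, y):
    """Likelihood-ratio test of H0: no G x E interaction.

    Null:        logit P(y=1) = b0 + b1*g + b2*e
    Alternative: logit P(y=1) = b0 + b1*g + b2*e + b3*g*e
    Returns (LR statistic, p-value from a chi-square with 1 df).
    """
    ones = np.ones_like(g, dtype=float)
    _, ll0 = fit_logistic(np.column_stack([ones, g, e]), y)
    _, ll1 = fit_logistic(np.column_stack([ones, g, e, g * e]), y)
    lr = max(2.0 * (ll1 - ll0), 0.0)
    return lr, math.erfc(math.sqrt(lr / 2.0))


# Simulated toy data: additive genotype g, binary exposure e,
# and a true interaction of log(2) on the logit scale.
rng = np.random.default_rng(1)
n = 5000
g = rng.binomial(2, 0.3, size=n).astype(float)
e = rng.binomial(1, 0.4, size=n).astype(float)
logit = -1.0 + 0.1 * g + 0.3 * e + math.log(2.0) * g * e
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit))).astype(float)

lr, p = gxe_interaction_lrt(g, e, y)
print(f"LR={lr:.2f}, p={p:.2e}")
```

The p-value relies on the asymptotic chi-square approximation for the likelihood-ratio statistic. This approximation can be unreliable for rare variants or sparse genotype-exposure cells. Repeating the test for every marker in a GWAS adds the multiple-testing burden that erodes power.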

Genome-Wide Association Study (GWAS) Pathways

In complex diseases, numerous genetic and environmental factors interact within biological pathways. Pathway information should therefore be used to improve the results of genome-wide association studies (GWAS). Gene set analysis methods aim to identify entire significant pathways rather than individual markers.

When a pathway is analysed as a whole, individual gene variants that each have only a small effect can reinforce one another, so that their combined signal can become detectable.
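One simple way to see how weak signals add up is a sum statistic. The 1-df chi-square statistics of all SNPs in a pathway are added, and the total is compared with a chi-square distribution whose degrees of freedom equal the number of SNPs. This sketch assumes the SNPs are independent (no linkage disequilibrium), which rarely holds in practice. It is a plain illustration rather than one of the Department's gene set methods, and all numbers are invented.

```python
import math


def chi2_sf(x, df):
    """Survival function P(X > x) of a chi-square distribution with integer df >= 1.

    Uses the recurrence Q(x; k+2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = exp(-x/2).
    """
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1
    else:
        q, k = math.exp(-x / 2.0), 2
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def pathway_sum_test(snp_chi2):
    """Combine 1-df per-SNP chi-square statistics into one pathway-level test.

    Assumes independent SNPs. With correlated SNPs, the sum no longer follows
    a chi-square distribution with len(snp_chi2) degrees of freedom.
    """
    total = sum(snp_chi2)
    return total, chi2_sf(total, len(snp_chi2))


# Invented statistics: four SNPs in one pathway, each with a modest signal.
snp_chi2 = [3.0, 3.0, 3.0, 3.0]
print([round(chi2_sf(s, 1), 3) for s in snp_chi2])
total, p = pathway_sum_test(snp_chi2)
print(round(total, 1), round(p, 4))
```

Four SNPs that are individually unremarkable (p ≈ 0.08 each) yield a pathway-level p of about 0.017. When SNPs are correlated, the null distribution is usually obtained by permutation instead.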

We use kernel machine learning, a methodology based on reproducing kernel Hilbert spaces that combines two statistical subfields: mixed models and geostatistics. This makes it possible, for example, to investigate gene-gene interactions within the network of a pathway. Further developments for longitudinal data, the integration of additional -omics data, and Bayesian approaches are also of interest.
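The snippet below is a minimal sketch of a kernel machine association test for a set of SNPs. It assumes a binary outcome without covariates. The test statistic is the variance-component score statistic Q = r'Kr, where r holds the outcome residuals under the null model and K is an identity-by-state (IBS) kernel between individuals. Because the null model contains only an intercept, a permutation p-value is used here in place of the mixture-of-chi-squares approximation common in practice. The genotypes and the toy two-SNP interaction effect are simulated, and the function names are hypothetical.

```python
import numpy as np


def ibs_kernel(G):
    """Identity-by-state kernel for an (n individuals x m SNPs) genotype matrix coded 0/1/2.

    K[i, j] = sum_m (2 - |G[i, m] - G[j, m]|) / (2m), so K[i, i] = 1.
    """
    diff = np.abs(G[:, None, :] - G[None, :, :])  # shape (n, n, m)
    return (2.0 - diff).sum(axis=2) / (2.0 * G.shape[1])


def kernel_score_test(G, y, n_perm=999, seed=0):
    """Score statistic Q = r' K r with r = y - mean(y) (intercept-only null model).

    Permuting the residuals is a valid null reference because there are no covariates.
    Returns (Q, permutation p-value).
    """
    K = ibs_kernel(G)
    r = y - y.mean()
    q_obs = float(r @ K @ r)
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        rp = rng.permutation(r)
        if rp @ K @ rp >= q_obs:
            exceed += 1
    return q_obs, (exceed + 1) / (n_perm + 1)


# Simulated pathway of 5 SNPs; risk rises only when SNP 0 and SNP 1 both carry a minor allele.
rng = np.random.default_rng(42)
n, m = 200, 5
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
logit = -1.0 + 2.0 * ((G[:, 0] > 0) & (G[:, 1] > 0))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit))).astype(float)

q, p = kernel_score_test(G, y)
print(f"Q={q:.2f}, permutation p={p:.3f}")
```

Replacing `ibs_kernel` with another positive semi-definite kernel, such as the linear kernel `G @ G.T`, changes which kinds of genetic effects the test is most sensitive to. The rest of the procedure stays the same.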

Prediction

In everyday clinical practice, it would be desirable to use genetic associations to better predict disease risk, disease progression, or the course of therapy.

High-quality risk models can be used to assess a patient's individual risk and thus inform decisions on preventive measures or treatment options. In general, prediction accuracy is assessed on new data that are independent of the data used to fit the model. For genetic data, however, such independent validation data sets are rarely available.
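As an illustration of checking prediction accuracy on independent data, the sketch below computes the area under the ROC curve (AUC) of risk scores on a validation set, using the Mann-Whitney formulation. The scores and labels are invented. AUC measures discrimination only and is one of several accuracy measures; calibration is another.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen case has a higher risk score
    than a randomly chosen control; ties count one half.
    """
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Risk scores from a model fitted elsewhere, evaluated on an independent validation set.
val_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
val_labels = [1, 1, 0, 1, 0, 0, 0, 1]
print(auc(val_scores, val_labels))  # 0.625
```

Here 10 of the 16 case-control pairs are ranked correctly, so the AUC is 0.625. The double loop is quadratic in sample size, which is acceptable for a sketch but not for large cohorts.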

We are therefore developing methods for validating models without additional independent data. Genetic risk models for patient survival are of particular interest here. We also continue to work on polygenic risk scores and kernel methods, including methods for studying the longitudinal course of a trait.
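When no independent data set is available, one standard internal-validation approach is k-fold cross-validation. The sketch below builds a simple polygenic risk score, PRS_i = sum_j w_j g_ij. The weights w_j are set to per-SNP allelic log odds ratios, and the AUC is estimated by 5-fold cross-validation on simulated genotypes. Both the weighting scheme and the helper names are illustrative assumptions. Practical PRS methods typically also account for linkage disequilibrium and shrink effect sizes.

```python
import numpy as np


def allelic_log_or(G, y):
    """Per-SNP allelic log odds ratio from 0/1/2 genotypes, with a 0.5 continuity correction."""
    case_alt = G[y == 1].sum(axis=0)
    case_ref = 2 * (y == 1).sum() - case_alt
    ctrl_alt = G[y == 0].sum(axis=0)
    ctrl_ref = 2 * (y == 0).sum() - ctrl_alt
    return np.log((case_alt + 0.5) * (ctrl_ref + 0.5)
                  / ((case_ref + 0.5) * (ctrl_alt + 0.5)))


def polygenic_risk_score(G, weights):
    """PRS_i = sum_j weights_j * G[i, j]."""
    return G @ weights


def auc(scores, y):
    """Mann-Whitney AUC; ties count one half."""
    s1, s0 = scores[y == 1], scores[y == 0]
    greater = (s1[:, None] > s0[None, :]).mean()
    ties = (s1[:, None] == s0[None, :]).mean()
    return float(greater + 0.5 * ties)


def cross_validated_auc(G, y, k=5, seed=0):
    """Estimate weights on k-1 folds, score the held-out fold, and average the k AUCs."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    aucs = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        w = allelic_log_or(G[train], y[train])  # weights never see the held-out fold
        aucs.append(auc(polygenic_risk_score(G[test_idx], w), y[test_idx]))
    return float(np.mean(aucs))


# Simulated data: 50 SNPs, of which only the first 10 carry small effects.
rng = np.random.default_rng(7)
n, m = 1000, 50
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
true_beta = np.zeros(m)
true_beta[:10] = 0.3
logit = -0.5 + (G - 0.6) @ true_beta
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

cv_auc = cross_validated_auc(G, y)
print(f"cross-validated AUC: {cv_auc:.3f}")
```

The key design choice is that the weights are re-estimated inside every training fold. If they were estimated once on all individuals and then used to score those same individuals, the AUC would be optimistically biased.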
