Data
Updated information regarding what data is used can be found in the changelog page.
Events for a given individual and a given phenocode are merged if they are 30 days apart or less. For example, if an individual has K11_APPENDACUT events on the following dates: 2000-01-01, 2000-01-20, 2000-02-10, 2000-02-28, then each event is within 30 days of the previous one, so all of them are merged into a single event dated 2000-01-01.
This is done as an attempt to remove events that are follow-ups rather than initial diagnoses.
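The merging rule can be sketched in Python as follows. This is a minimal illustration, not the pipeline's actual code: the function name `merge_events` is hypothetical, and comparing each event with the one immediately before it is an assumption inferred from the example above, where the last event is 58 days after the first and is still merged.

```python
from datetime import date, timedelta

# Events at most this far from the previous event are treated as follow-ups.
MERGE_WINDOW = timedelta(days=30)


def merge_events(dates, window=MERGE_WINDOW):
    """Collapse runs of events whose consecutive gaps are <= window.

    Returns the first date of each run, in chronological order.
    Assumption: the gap is measured against the previous event,
    not against the first event of the run.
    """
    merged = []
    previous = None
    for current in sorted(dates):
        if previous is None or current - previous > window:
            merged.append(current)  # start of a new run
        previous = current
    return merged


events = [date(2000, 1, 1), date(2000, 1, 20), date(2000, 2, 10), date(2000, 2, 28)]
print(merge_events(events))  # [datetime.date(2000, 1, 1)]
```

With this reading, two events exactly 30 days apart are merged, while two events 31 days apart remain separate.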
Number of individuals having at least one event for a given phenocode, divided by the total number of individuals in the FinnGen study. No adjustment is done to account for the difference between the age distribution of the FinnGen cohort and the one of the Finnish population.
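As a small illustration of this ratio, the sketch below counts individuals with at least one event for a phenocode and divides by the cohort size. The function name `proportion_with_event`, the input layout (a mapping from individual to a set of phenocodes), and the toy data are all hypothetical.

```python
def proportion_with_event(phenocodes_by_individual, phenocode, n_total):
    """Share of the cohort with at least one event for `phenocode`.

    No age adjustment is applied, matching the description above.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    n_cases = sum(
        1 for codes in phenocodes_by_individual.values() if phenocode in codes
    )
    return n_cases / n_total


# Toy cohort of four individuals (made-up identifiers and endpoints).
cohort = {
    "id1": {"K11_APPENDACUT"},
    "id2": set(),
    "id3": {"K11_APPENDACUT", "OTHER_ENDPOINT"},
    "id4": {"OTHER_ENDPOINT"},
}
print(proportion_with_event(cohort, "K11_APPENDACUT", n_total=len(cohort)))  # 0.5
```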
The implementation of the mortality statistics makes use of:
The model used is: y ~ prior endpoint + birth year + sex
If the endpoint is sex-specific, then the sex covariate is removed from the model.
Lagged hazard ratios are computed by considering only up to 1, 5, and 15 years of exposed time.
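One possible reading of "only up to N years of exposed time" is that each individual's exposed follow-up is capped at N years, and an outcome occurring after the cap is treated as censored. The sketch below illustrates that reading only; the function name `truncate_exposed_time` and the tuple layout are hypothetical, and the actual pipeline may define the lag differently.

```python
LAGS_YEARS = (1, 5, 15)


def truncate_exposed_time(exposed_years, had_outcome, lag_years):
    """Cap exposed follow-up at `lag_years`.

    Assumption: an outcome that happens after the cap is not counted,
    so the individual is censored at the cap.
    """
    if exposed_years > lag_years:
        return float(lag_years), False
    return float(exposed_years), had_outcome


# An individual with the outcome after 7.5 years of exposed time.
for lag in LAGS_YEARS:
    print(lag, truncate_exposed_time(7.5, True, lag))
```

Under this reading, the individual above contributes an outcome only to the 15-year analysis and is censored in the 1-year and 5-year analyses.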
The regressions are done using the lifelines library.
The absolute risk represents the probability of dying. It is defined as AR = 1 - survival_probability. The survival probability is derived from the fitted Cox model with the following parameters:
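Whatever the parameter values, the standard Cox proportional hazards relation behind this definition can be written as below. The symbols are not taken from this page and are introduced here only for illustration: S_0(t) is the baseline survival function, \beta the vector of fitted coefficients, and x the covariate values at which the risk is evaluated.

```latex
\mathrm{AR}(t \mid x) = 1 - S(t \mid x),
\qquad
S(t \mid x) = S_0(t)^{\exp(\beta^\top x)}
```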
Most of the study follows the NB-COMO study.
The model used is: y ~ prior + birth_year + sex
If the endpoint is sex-specific, then the sex covariate is removed from the model.
Lagged hazard ratios are computed by considering only up to 1, 5, and 15 years of exposed time.
The regressions are done using the lifelines library.
Due to the sensitive nature of the data, the age when entering and leaving the study has an accuracy of 1 year.
The drug score is computed in a 2-step process:
The resulting probability value is the drug score. The higher the drug score, the more likely the drug is to be taken after the given endpoint.
The source code is available on GitHub for both the data processing pipeline and the website.