Analysis of data from 500,000 individuals in UK Biobank demonstrates an inherited component to ME/CFS

Guest blog by Professor Chris Ponting and colleagues

UK Biobank – a national biobank different from the ME/CFS biobank – has data from around 500,000 individuals, including both healthy people and those with one or more of the many different diseases in the UK population. About 2,000 people in the sample reported that they had been given a diagnosis of CFS.

Analysis of data from this biobank indicates an inherited biological component for ME/CFS. The results show only one statistically significant change in a particular section of DNA and even this is problematic. This analysis indicates that a much bigger study, with many more ME/CFS cases, will be needed to indicate which genes and biological pathways are altered in people with ME/CFS.


Chris Ponting

Myalgic encephalomyelitis (ME, also described as chronic fatigue syndrome, CFS) is a devastating long-term condition affecting 250,000 UK individuals. People with ME experience severe, disabling fatigue associated with post-exertional malaise. A few make good progress and may recover, while most others remain ill for years and may never recover. There is no known cause, or effective treatment for most. Consequently, it is vital to try new approaches to understand the reasons for the development of the condition.

This blog sets out what we can glean from the release, last summer, of data from about 500,000 individuals who make up the UK Biobank. (This biobank is not to be confused with the UK ME/CFS Biobank, UKMEB.) The data were acquired from individuals between 40 and 69 years of age in 2006-2010 who live across the UK. These people provided samples (e.g. blood, urine and saliva) and answered questionnaires. In addition, for some of these people their electronic health record data are being linked in. Importantly for this blog, the DNA variation (‘genotype’) of all the volunteer participants has been determined.

Genetic variation can provide insights into the causes of disease when these have a heritable component (i.e. are inherited down through the generations). DNA sequence is not altered by disease (except in cancer) and so variants can reveal the causes, rather than consequences, of disease.


Here we draw heavily from an analysis of the UK Biobank data by Oriol Canela-Xandri, Konrad Rawlik and Albert Tenesa, which is described in a preprint available from bioRxiv. (The authors have kindly made their results available to others in this way before the findings have been peer reviewed.)

From this (specifically, Supplemental Table 1) we see that data were analysed from 1,829 people among the UK Biobank cohort who self-reported as having been diagnosed with ME/CFS. The table also provides five pieces of information:

(1) The prevalence of ME/CFS among UK Biobank individuals was 0.448%. In other words, if a randomly chosen person in the UK knows about 200 people, there is roughly an even chance that they know someone with ME/CFS.

(2) There is a reasonably strong female bias: the prevalence rates are female = 0.611%; male = 0.255%; so there are 2.4-fold more females than males with ME/CFS in the UK Biobank cohort.

(3) Extrapolating these numbers to the UK as a whole, here are the full population prevalence predictions (using 2016 estimates for UK census populations).

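| Nation     | Female  | Male   | Total   |
|------------|---------|--------|---------|
| England    | 171,630 | 69,339 | 240,969 |
| Scotland   | 16,784  | 6,781  | 23,565  |
| Wales      | 9,668   | 3,906  | 13,574  |
| N Ireland  | 5,783   | 2,336  | 8,119   |
| UK (total) | 203,865 | 82,362 | 286,227 |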

There is one caveat that should be mentioned with respect to these numbers. This is that the 500,000 people assessed in the Biobank, despite being recruited for assessment at 22 centres in Scotland, Wales and England, are not fully representative of the general population. There appears to be a “healthy volunteer” selection bias which would imply that the prevalence estimates are lower-bound values. Furthermore, if ME/CFS prevalence is different in other groups then this is not accounted for in the numbers above.
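The arithmetic behind points (1) to (3) can be checked in a few lines of Python. This is only a minimal sketch: it assumes that a person's acquaintances are independent draws from a population with the UK Biobank prevalence (a simplification, since illness clusters in families and social circles), and the function names are illustrative rather than taken from the preprint.

```python
import math

PREVALENCE = 0.00448  # 0.448%, self-reported ME/CFS in UK Biobank (from the post)

def chance_of_knowing_someone(n_acquaintances, prevalence=PREVALENCE):
    """Probability that at least one of n independent acquaintances has ME/CFS."""
    return 1.0 - (1.0 - prevalence) ** n_acquaintances

def acquaintances_for_even_chance(prevalence=PREVALENCE):
    """Smallest number of acquaintances for which that probability reaches 50%."""
    return math.ceil(math.log(0.5) / math.log(1.0 - prevalence))

# Predicted cases per nation as (female, male), copied from the table above.
PREDICTED = {
    "England": (171_630, 69_339),
    "Scotland": (16_784, 6_781),
    "Wales": (9_668, 3_906),
    "N Ireland": (5_783, 2_336),
}

uk_female = sum(female for female, _ in PREDICTED.values())
uk_male = sum(male for _, male in PREDICTED.values())

print(f"Chance with 200 acquaintances: {chance_of_knowing_someone(200):.0%}")
print(f"Acquaintances needed for an even chance: {acquaintances_for_even_chance()}")
print(f"UK total predicted cases: {uk_female + uk_male:,}")
print(f"Female-to-male prevalence ratio: {0.611 / 0.255:.1f}")
```

Under the independence assumption the even-chance point comes out a little below 200 acquaintances, so "about 200" is a comfortable round figure, and the national rows add up to the UK totals given in the table.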

(4) ME/CFS has a biological component because the heritability of ME/CFS is not zero. Canela-Xandri et al. estimate that the genetic heritability (liability scale) is 0.080. This is slightly lower than the median heritability of heritable binary traits (0.11; see Figure 1). So, among all such traits measured, ME/CFS falls in the lower half for heritability, but its heritability is not zero. Note that this doesn’t rule out non-heritable biological causes.
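For readers wondering what "liability scale" means in practice: a yes/no trait is often modelled as an underlying continuous liability, with people above a threshold counted as cases. The sketch below uses the classic textbook relation between heritability on the observed (0/1) scale and on the liability scale, assuming the sample prevalence equals the population prevalence. It is an illustration only, and not necessarily the procedure Canela-Xandri et al. used; the function names are hypothetical.

```python
from statistics import NormalDist

def liability_to_observed(h2_liability, prevalence):
    """Convert liability-scale heritability to the observed (0/1) scale.

    Uses h2_obs = h2_liab * z**2 / (K * (1 - K)), where K is the prevalence
    and z is the standard normal density at the liability threshold.
    Assumes the sample prevalence equals the population prevalence.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)  # liability cut-off for being a case
    z = std_normal.pdf(threshold)                     # density at that cut-off
    return h2_liability * z ** 2 / (prevalence * (1.0 - prevalence))

def observed_to_liability(h2_observed, prevalence):
    """Inverse of liability_to_observed, under the same assumption."""
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    z = std_normal.pdf(threshold)
    return h2_observed * prevalence * (1.0 - prevalence) / z ** 2

# Values from the post: liability-scale heritability 0.080, prevalence 0.448%.
h2_observed = liability_to_observed(0.080, 0.00448)
print(f"Implied observed-scale heritability: {h2_observed:.4f}")
```

For a rare condition the observed-scale value is much smaller than the liability-scale value, which is one reason heritabilities of binary traits are usually compared on the liability scale.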

(5) The analysis identifies one, and only one, DNA position whose genetic variation associates, in part, with ME/CFS susceptibility. (The plot below is called a Manhattan plot and any point above the dashed line is predicted to be a significant “hit”. Each dot represents a position (X-axis) along a chromosome – shown alternately in red and blue – and its position on the Y-axis indicates the statistical significance of the association: the higher the dot, the stronger the evidence.)

[Figure: Manhattan plot of the UK Biobank GWAS for ME/CFS. Statistical significance for the association between each DNA position and ME/CFS across 22 chromosomes. The arrow highlights the one “significant hit”.]
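The height of a dot in a Manhattan plot is conventionally the negative base-10 logarithm of its p-value. The sketch below applies that to the p-value quoted in this post for the single hit (2.5×10⁻¹²). The threshold of 5×10⁻⁸ is the commonly used genome-wide significance cut-off and is an assumption here: the post does not state where the preprint's dashed line is drawn.

```python
import math

# Conventional genome-wide significance cut-off. This is an assumption:
# the post does not say which threshold the dashed line represents.
GENOME_WIDE_THRESHOLD = 5e-8

def manhattan_height(p_value):
    """Y-axis value of a variant in a Manhattan plot: -log10(p)."""
    return -math.log10(p_value)

def is_genome_wide_significant(p_value, threshold=GENOME_WIDE_THRESHOLD):
    """True if the p-value is below the chosen threshold."""
    return p_value < threshold

hit_p_value = 2.5e-12  # p-value reported in the post for rs150954845

print(f"Height of the hit: {manhattan_height(hit_p_value):.2f}")
print(f"Height of the threshold line: {manhattan_height(GENOME_WIDE_THRESHOLD):.2f}")
print(f"Above the line: {is_genome_wide_significant(hit_p_value)}")
```

Because the scale is logarithmic, each extra unit of height corresponds to a tenfold smaller p-value.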

This proposed “significant hit” is on chromosome 10 (position 74828696; rs150954845). The calculated p-value is 2.5×10⁻¹². This DNA change (A-to-T) is predicted to alter a protein called P4HA1, changing an aspartic acid (“D”; GAT) for a valine (“V”; GTT) at its 124th amino acid position. P4HA1 is prolyl 4-hydroxylase subunit alpha 1: in other words, one part of prolyl 4-hydroxylase, a key enzyme in collagen synthesis. We know what this molecule looks like and where the aspartic acid (D124) occurs within it (below; courtesy of Luis Sanchez-Pulido).
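The link between the single-letter DNA change and the amino acid change can be illustrated with a tiny lookup. This is a sketch that contains only the two codons mentioned above (both assignments are from the standard genetic code); it ignores strand orientation and everything else about the gene, and the helper name is hypothetical.

```python
# Only the two codons discussed in the post, from the standard genetic code.
CODON_TABLE = {
    "GAT": "Asp (D)",  # aspartic acid
    "GTT": "Val (V)",  # valine
}

def substitute_base(codon, index, new_base):
    """Return the codon with the base at a zero-based index replaced."""
    return codon[:index] + new_base + codon[index + 1:]

reference_codon = "GAT"
# The A-to-T change falls on the middle letter of the codon.
variant_codon = substitute_base(reference_codon, 1, "T")

print(f"{reference_codon} -> {CODON_TABLE[reference_codon]}")
print(f"{variant_codon} -> {CODON_TABLE[variant_codon]}")
```

A change to the middle letter of a codon almost always alters the encoded amino acid, which is why this variant is predicted to change the protein rather than being silent.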


We can even see at a resolution of 10⁻¹⁰ of a metre what effect such a change would have on the protein (below; courtesy of Luis Sanchez-Pulido).



So, should we believe that this amino acid change alters someone’s risk of developing ME? For five reasons we need to be cautious:

(a) ME is a complex condition, likely to be caused by many DNA changes each of small effect acting together with the environment, so the fact that only one association was found indicates that the study is under-powered. This means that it doesn’t have the number of patients sufficient to provide the statistical power needed to detect the major DNA changes associated with the illness: more individuals means greater statistical power.

(b) Second, this part of the protein is not conserved across evolution. There is even a nematode worm known that has a valine at exactly the position (124) that would be predicted to alter risk for ME in humans. This isn’t conclusive, but an amino acid change at a position that is shared across different species would have given us greater confidence in the prediction.

(c) Third, very few people have this amino acid change. Only 0.01% of the population have this alteration, and at such low levels it is difficult to calculate levels of significance accurately, particularly when the number of people self-reporting with ME (here, n=1,829) is so much lower than the size of the entire cohort (500,000).

(d) Fourth, this association was not reported to be significant in a separate study.

(e) The study relies on self-report of having received a diagnosis of chronic fatigue syndrome, so these cases have not been diagnosed by researchers as meeting any particular definition of ME/CFS.


  1. If the UK Biobank prevalence of ME/CFS is repeated across different populations, then 34 million people worldwide will have this disorder, with 2.4-fold more women than men affected.
  2. ME/CFS has a biological component, as shown by its non-zero heritability in UK Biobank.
  3. To obtain robust indications of which genes and which biological pathways are altered in which cells or tissues in people living with ME/CFS, a much larger study is required. A GWAS with ten or twenty thousand cases is likely to be necessary. Results will then need to be replicated in a separate cohort.
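The scale of these numbers can be reproduced with simple arithmetic. In the sketch below the world population of 7.6 billion is an assumed round figure (the post does not say which value was used), and the second calculation simply asks how large a general-population cohort with the UK Biobank prevalence would have to be to contain the suggested number of cases.

```python
PREVALENCE = 0.00448        # 0.448%, from UK Biobank (from the post)
WORLD_POPULATION = 7.6e9    # assumed round figure, not stated in the post

worldwide_cases = PREVALENCE * WORLD_POPULATION

def cohort_size_for_cases(target_cases, prevalence=PREVALENCE):
    """Size of a general-population cohort expected to contain target_cases."""
    return target_cases / prevalence

print(f"Predicted worldwide cases: {worldwide_cases / 1e6:.0f} million")
for target in (10_000, 20_000):
    size = cohort_size_for_cases(target)
    print(f"{target:,} cases would need a cohort of about {size / 1e6:.1f} million people")
```

A general-population cohort would need to be several times larger than UK Biobank to reach that many cases, which suggests that recruiting people with ME/CFS directly is the more practical route to a study of this size.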

Chris Ponting, Luis Sanchez-Pulido, Katie Nicoll-Baines, Thibaud Boutin and Shona Kerr.

With thanks to Cathie Sudlow, Veronique Vitart, Oriol Canela-Xandri and Albert Tenesa for helpful comments.

MRC Human Genetics Unit at the MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road South, Edinburgh, EH4 2XU, UK

7 thoughts on “Analysis of data from 500,000 individuals in UK Biobank demonstrates an inherited component to ME/CFS”

  1. Susan

    The sequence analysis from the UK bio bank was the protein coding region only, is this correct? Since ME is acquired and is not evident until, in most cases, adulthood, is it surprising that the results were not robust?


    1. These kinds of studies aren’t necessary to pick up diseases that are simply caused by a malfunctioning gene/protein. Rather, this kind of work aims to pick up risk factors: genes that play a role if other factors are in play, such as a particular infection. It doesn’t mean that every patient would have that gene (in fact most wouldn’t); it would be a clue as to what might be going wrong. But it’s only quite a small study and a much bigger study would be needed to find more genes that can play a role.


  2. Many thanks for this blog, Simon and Chris. Very interesting and informative.

    Some questions:

    1) I’ve just had a quick look at the UK Biobank website but I couldn’t see how patients were recruited. As many severe ME patients are bedridden without access to medical services, is it likely that people with severe ME are underrepresented in the biobank samples?

    2) I don’t understand the second of five reasons to be cautious (b). It’s not conserved across species but there is a nematode with it. Can someone try to explain this to me?

    3) Is the estimate that a GWAS with 10-20 thousand cases would be necessary to obtain robust indications based on the assumption that ME/CFS is a single disease? If ME/CFS included x number of different diseases, would it be necessary to include 10-20x cases in order to obtain robust indications?


  3. rs150954845 is a specific variant. Has anyone looked at this data grouping by gene rather than each separate variant? Are there genes which pop out as having statistically more protein impacting variants (even if not exactly the same variant)?


  4. deboruth

    I wish someone could arrange a review of the various ME/“cfs” biobanks scattered about the known world. There are serious questions to be asked about criteria used for inclusion. I fear that none are going to comprise proper ICC ME. Serious dilution threatens from the various definitions of “cfs,” keeping in mind that the CDC-propelled Fukuda can have more than 160 iterations, as Lenny Jason has calculated. This is because one chooses four symptoms from among a list of eight. Also, keep in mind that in both Holmes and Fukuda the original empirical observations about patients were edited and censored by Dr. Stephen Straus from NIAID. Equally, one has to ask whether alleged cases were defined by Oxford, which requires solely six months of “fatigue” of no particular description and therefore will yield nearly 50% depressed persons without (other) biological cause. (The cohorts in PACE reportedly included 47% depressed persons.)


