Analysis of data from 500,000 individuals in UK Biobank demonstrates an inherited component to ME/CFS

Guest blog by Professor Chris Ponting and colleagues.

UK Biobank – a national biobank different from the ME/CFS biobank – has data from around 500,000 individuals, including both healthy people and those with one or more of the many different diseases in the UK population. About 2,000 people in the sample reported that they had been given a diagnosis of CFS.

Analysis of data from this biobank indicates an inherited biological component for ME/CFS. The results show only one statistically significant change in a particular section of DNA and even this is problematic. This analysis indicates that a much bigger study, with many more ME/CFS cases, will be needed to indicate which genes and biological pathways are altered in people with ME/CFS.


Chris Ponting

Myalgic encephalomyelitis (ME, also described as chronic fatigue syndrome, CFS) is a devastating long-term condition affecting 250,000 UK individuals. People with ME experience severe, disabling fatigue associated with post-exertional malaise. A few make good progress and may recover, while most others remain ill for years and may never recover. There is no known cause, or effective treatment for most. Consequently, it is vital to try new approaches to understand the reasons for the development of the condition.

This blog sets out what we can glean from the release, last summer, of data from about 500,000 individuals who make up the UK Biobank. (This biobank is not to be confused with the UK ME/CFS Biobank, UKMEB.) The data were acquired from individuals between 40 and 69 years of age in 2006-2010 who live across the UK. These people provided samples (e.g. blood, urine and saliva) and answered questionnaires. In addition, for some of these people their electronic health record data are being linked in. Importantly for this blog, the DNA variation (‘genotype’) of all the volunteer participants has been determined.

Genetic variation can provide insights into the causes of disease when these have a heritable component (i.e. are inherited down through the generations). DNA sequence is not altered by disease (except in cancer) and so variants can reveal the causes, rather than consequences, of disease.


Here we draw heavily from an analysis of the UK Biobank data by Oriol Canela-Xandri, Konrad Rawlik and Albert Tenesa, which is described in a preprint available from bioRxiv. (The authors have kindly made their results available to others in this way before the findings have been peer reviewed.)

From this (specifically, Supplemental Table 1) we see that data were analysed from 1,829 people among the UK Biobank cohort who self-reported as having been diagnosed with ME/CFS. The table also provides five pieces of information:

(1) The prevalence of ME/CFS among UK Biobank individuals was 0.448%. In other words, a randomly chosen person in the UK who knows about 200 people has roughly an even chance of knowing someone with ME/CFS.

(2) There is a reasonably strong female bias: the prevalence rates are female = 0.611%; male = 0.255%; so there are 2.4-fold more females than males with ME/CFS in the UK Biobank cohort.

(3) Extrapolating these numbers to the UK as a whole, here are the full population prevalence predictions (using 2016 estimates for UK census populations).

| | Female | Male | Total |
|---|---|---|---|
| England | 171,630 | 69,339 | 240,969 |
| Scotland | 16,784 | 6,781 | 23,565 |
| Wales | 9,668 | 3,906 | 13,574 |
| N Ireland | 5,783 | 2,336 | 8,119 |
| UK (total) | 203,865 | 82,362 | 286,227 |

There is one caveat that should be mentioned with respect to these numbers. This is that the 500,000 people assessed in the Biobank, despite being recruited for assessment at 22 centres in Scotland, Wales and England, are not fully representative of the general population. There appears to be a “healthy volunteer” selection bias which would imply that the prevalence estimates are lower-bound values. Furthermore, if ME/CFS prevalence is different in other groups then this is not accounted for in the numbers above.
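The arithmetic behind points (1) to (3) can be checked with a short sketch. It uses only the figures quoted above; the function names are ours, and treating a person's acquaintances as independent random draws from the population is a simplifying assumption. Under that assumption the even-chance point comes out somewhat below 200 acquaintances (roughly 150 to 160), which is consistent with "about 200" as a round number.

```python
import math

# Figures quoted in the text (UK Biobank, self-reported ME/CFS).
PREVALENCE = 0.00448          # 0.448% overall
FEMALE_PREVALENCE = 0.00611   # 0.611%
MALE_PREVALENCE = 0.00255     # 0.255%

# Predicted cases from the table: (female, male, total).
PREDICTED_CASES = {
    "England": (171_630, 69_339, 240_969),
    "Scotland": (16_784, 6_781, 23_565),
    "Wales": (9_668, 3_906, 13_574),
    "N Ireland": (5_783, 2_336, 8_119),
}
UK_TOTAL = (203_865, 82_362, 286_227)


def chance_of_knowing_a_case(n_acquaintances, prevalence=PREVALENCE):
    """Probability that at least one of n acquaintances has ME/CFS.

    Assumes (simplistically) that acquaintances are independent random draws.
    """
    return 1.0 - (1.0 - prevalence) ** n_acquaintances


def acquaintances_for_even_chance(prevalence=PREVALENCE):
    """Smallest number of acquaintances giving at least a 50% chance."""
    return math.ceil(math.log(0.5) / math.log(1.0 - prevalence))


def female_to_male_ratio():
    """Ratio of the female to the male prevalence."""
    return FEMALE_PREVALENCE / MALE_PREVALENCE


def table_is_consistent():
    """Check that each row and each column of the table adds up."""
    rows_ok = all(f + m == t for f, m, t in PREDICTED_CASES.values())
    column_sums = tuple(
        sum(row[i] for row in PREDICTED_CASES.values()) for i in range(3)
    )
    return rows_ok and column_sums == UK_TOTAL


if __name__ == "__main__":
    print("Acquaintances for an even chance:", acquaintances_for_even_chance())
    print("Chance with 200 acquaintances:", round(chance_of_knowing_a_case(200), 2))
    print("Female-to-male ratio:", round(female_to_male_ratio(), 1))
    print("Table adds up:", table_is_consistent())
```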

(4) ME/CFS has a biological component because the heritability of ME/CFS is not zero. Canela-Xandri et al. estimate that the genetic heritability (liability scale) is 0.080. This is slightly lower than the median heritability of heritable binary traits (0.11; see Figure 1). So among all such traits measured, ME/CFS falls in the lower half for heritability, but its heritability is not zero. Note that this doesn’t rule out non-heritable biological causes.

(5) The analysis identifies one, and only one, DNA position whose genetic variation associates, in part, with ME/CFS susceptibility. (The plot below is called a Manhattan plot, and any point above the dashed line is predicted to be a significant “hit”. Each dot represents a position along a chromosome (X axis), with chromosomes shown alternately in red and blue, and its position on the Y axis indicates the statistical significance of the association: the higher the dot, the stronger the evidence.)

[Figure: Manhattan plot of the UK Biobank GWAS for ME/CFS]
Statistical significance for the association between each DNA position and ME/CFS across 22 chromosomes. The arrow highlights the one “significant hit”.
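For readers unfamiliar with Manhattan plots, the height of each dot is conventionally the negative base-10 logarithm of its p-value, so smaller p-values sit higher. The sketch below illustrates this for the p-value reported for the chromosome 10 hit (2.5 × 10⁻¹²). The threshold of 5 × 10⁻⁸ is the conventional genome-wide significance level and is an assumption here; the dashed line in the preprint's plot may be drawn at a different value. The function names are ours.

```python
import math

# Conventional genome-wide significance threshold. This is an assumption:
# the dashed line in the preprint's plot may sit at a different value.
GENOME_WIDE_THRESHOLD = 5e-8


def manhattan_height(p_value):
    """Height of a dot on a Manhattan plot: -log10 of the p-value."""
    return -math.log10(p_value)


def is_genome_wide_significant(p_value, threshold=GENOME_WIDE_THRESHOLD):
    """True if the p-value is below the chosen threshold."""
    return p_value < threshold


if __name__ == "__main__":
    hit_p_value = 2.5e-12  # p-value reported for rs150954845
    print("Height of the hit:", round(manhattan_height(hit_p_value), 1))
    print("Height of the threshold:", round(manhattan_height(GENOME_WIDE_THRESHOLD), 1))
    print("Above the threshold:", is_genome_wide_significant(hit_p_value))
```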

This proposed “significant hit” is on chromosome 10 (position 74828696; rs150954845). The calculated p-value is 2.5 × 10⁻¹². This DNA change (A-to-T) is predicted to alter a protein called P4HA1, changing an aspartic acid (“D”; GAT) for a valine (“V”; GTT) at its 124th amino acid position. P4HA1 is prolyl 4-hydroxylase subunit alpha 1: in other words, one part of prolyl 4-hydroxylase, a key enzyme in collagen synthesis. We know what this molecule looks like and where the aspartic acid (D124) occurs within it (below; courtesy of Luis Sanchez-Pulido).
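The link between the single-letter DNA change and the amino acid change can be shown with a toy sketch. It contains only the two codons mentioned above (a full genetic code table has 64 entries), and the "D124V" shorthand it prints is the conventional way of writing such a substitution rather than notation taken from the preprint. The function names are ours.

```python
# Only the two codons mentioned in the text; a full table has 64 entries.
CODON_TO_AMINO_ACID = {"GAT": "D", "GTT": "V"}
AMINO_ACID_NAMES = {"D": "aspartic acid", "V": "valine"}


def mutate_codon(codon, index, new_base):
    """Return the codon with the base at `index` (0-based) replaced."""
    return codon[:index] + new_base + codon[index + 1:]


def protein_change(ref_codon, alt_codon, residue_number):
    """Describe an amino acid substitution in shorthand, e.g. D124V."""
    ref = CODON_TO_AMINO_ACID[ref_codon]
    alt = CODON_TO_AMINO_ACID[alt_codon]
    return f"{ref}{residue_number}{alt}"


if __name__ == "__main__":
    reference = "GAT"
    # The A-to-T change falls on the middle base of the codon.
    variant = mutate_codon(reference, 1, "T")
    print(reference, "->", variant)
    print(protein_change(reference, variant, 124))
    print(
        AMINO_ACID_NAMES[CODON_TO_AMINO_ACID[reference]],
        "->",
        AMINO_ACID_NAMES[CODON_TO_AMINO_ACID[variant]],
    )
```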


We can even see at a resolution of 10⁻¹⁰ of a metre what effect such a change would have on the protein (below; courtesy of Luis Sanchez-Pulido).


So, should we believe that this amino acid change alters someone’s risk of developing ME? For five reasons we need to be cautious:

(a) ME is a complex condition, likely to be caused by many DNA changes each of small effect acting together with the environment, so the fact that only one association was found indicates that the study is under-powered. This means that it doesn’t have the number of patients sufficient to provide the statistical power needed to detect the major DNA changes associated with the illness: more individuals means greater statistical power.

(b) Second, this part of the protein is not conserved across evolution. There is even a nematode worm known that has a valine at exactly the position (124) that would be predicted to alter risk for ME in humans. This isn’t conclusive, but an amino acid change at a position that is shared across different species would have given us greater confidence in the prediction.

(c) Third, very few people have this amino acid change. Only 0.01% of the population have this alteration, and at such low levels it is difficult to calculate levels of significance accurately, particularly when the number of people self-reporting with ME (here, n=1,829) is so much lower than the size of the entire cohort (500,000).

(d) Fourth, this association was not reported to be significant in a separate study.

(e) The study relies on self-report of having received a diagnosis of chronic fatigue syndrome, so these cases have not been diagnosed by researchers as meeting any particular definition of ME/CFS.
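To give a feel for caution (c), the rough sketch below asks how many carriers of the variant we would expect among the 1,829 self-reported cases if carrying it had nothing to do with ME/CFS. It assumes that the 0.01% figure is the fraction of people who carry the change (rather than an allele frequency) and that carriers are spread randomly through the cohort. Both are simplifying assumptions, and the function names are ours.

```python
# Figures quoted in the text.
CARRIER_FREQUENCY = 0.0001   # 0.01%, read here as the fraction of people who are carriers
N_CASES = 1_829              # people self-reporting an ME/CFS diagnosis
COHORT_SIZE = 500_000        # approximate size of UK Biobank


def expected_carriers(n_people, carrier_frequency=CARRIER_FREQUENCY):
    """Expected number of carriers in a group of n_people."""
    return n_people * carrier_frequency


def chance_of_at_least_one_carrier(n_people, carrier_frequency=CARRIER_FREQUENCY):
    """Chance that a group of n_people contains one or more carriers,
    assuming carriers are spread randomly (no link with ME/CFS)."""
    return 1.0 - (1.0 - carrier_frequency) ** n_people


if __name__ == "__main__":
    print("Expected carriers in the whole cohort:", round(expected_carriers(COHORT_SIZE)))
    print("Expected carriers among cases:", round(expected_carriers(N_CASES), 2))
    print(
        "Chance of at least one carrier among cases:",
        round(chance_of_at_least_one_carrier(N_CASES), 2),
    )
```

The expected number of carriers among the cases is well below one, so any association at this position rests on very few individuals, which is why the significance level is hard to calculate reliably.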


  1. If the UK Biobank prevalence of ME/CFS is repeated across different populations, then about 34 million people worldwide will have this disorder, with 2.4-fold more women than men.
  2. ME/CFS has a biological component, as shown by its non-zero heritability in UK Biobank.
  3. To obtain robust indications of which genes and which biological pathways are altered, and in which cells or tissues, in people living with ME/CFS, a much larger study is required. A GWAS with ten or twenty thousand cases is likely to be necessary. Results will then need to be replicated in a separate cohort.
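Why more cases help can be illustrated with a rough power calculation. This is a minimal sketch using a normal approximation to a simple comparison of allele frequencies between cases and controls; it is not the method used in the preprint. The control allele frequency (0.30), the odds ratio (1.10) and the significance level (5 × 10⁻⁸) are hypothetical values chosen purely for illustration, and the function names are ours.

```python
from math import sqrt
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()


def case_allele_frequency(control_frequency, odds_ratio):
    """Allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_frequency / (1.0 - control_frequency)
    return odds / (1.0 + odds)


def approx_power(n_cases, n_controls, control_frequency, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided test comparing allele frequencies.

    Normal approximation; each person contributes two alleles.
    """
    p0 = control_frequency
    p1 = case_allele_frequency(p0, odds_ratio)
    standard_error = sqrt(
        p1 * (1.0 - p1) / (2 * n_cases) + p0 * (1.0 - p0) / (2 * n_controls)
    )
    z_critical = _STANDARD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_effect = abs(p1 - p0) / standard_error
    return (
        _STANDARD_NORMAL.cdf(z_effect - z_critical)
        + _STANDARD_NORMAL.cdf(-z_effect - z_critical)
    )


if __name__ == "__main__":
    cohort = 500_000
    # Hypothetical common variant: 30% allele frequency, odds ratio 1.10.
    for n_cases in (1_829, 10_000, 20_000):
        power = approx_power(n_cases, cohort - n_cases, 0.30, 1.10)
        print(f"{n_cases:>6} cases: approximate power {power:.3f}")
```

Under these illustrative assumptions, power is very low with 1,829 cases and rises sharply as the number of cases moves into the tens of thousands.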

Chris Ponting, Luis Sanchez-Pulido, Katie Nicoll-Baines, Thibaud Boutin and Shona Kerr.

With thanks to Cathie Sudlow, Veronique Vitart, Oriol Canela-Xandri and Albert Tenesa for helpful comments.

MRC Human Genetics Unit at the MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road South, Edinburgh, EH4 2XU, UK

12 thoughts on “Analysis of data from 500,000 individuals in UK Biobank demonstrates an inherited component to ME/CFS”

  1. The sequence analysis from the UK bio bank was the protein coding region only, is this correct? Since ME is acquired and is not evident until, in most cases, adulthood, is it surprising that the results were not robust?

    1. These kinds of studies aren’t necessary to pick up diseases that are simply caused by a malfunctioning gene/protein. Rather, this kind of work aims to pick up risk factors, genes that play a role if other factors are in play, such as a particular infection. It doesn’t mean that every patient would have that gene (in fact most wouldn’t), it would be a clue as to what might be going wrong. But it’s only quite a small study and a much bigger study would be needed to find more genes that can play a role.

  2. Many thanks for this blog, Simon and Chris. Very interesting and informative.

    Some questions:

    1) I’ve just had a quick look at the UK Biobank website but I couldn’t see how patients were recruited. As many severe ME patients are bedridden without access to medical services, is it likely that people with severe ME are underrepresented in the biobank samples?

    2) I don’t understand the second of five reasons to be cautious (b). It’s not conserved across species but there is a nematode with it. Can someone try to explain this to me?

    3) Is the estimate that a GWAS with 10-20 thousand cases would be necessary to obtain robust indications based on the assumption that ME/CFS is a single disease? If ME/CFS included x number of different diseases, would it be necessary to include 10-20x cases in order to obtain robust indications?

  3. This nematode… Is it Caenorhabditis elegans? Or any other that goes into dauer phase under metabolic stress?

  4. rs150954845 is a specific variant. Has anyone looked at this data grouping by gene rather than each separate variant? Are there genes which pop out as having statistically more protein impacting variants (even if not exactly the same variant)?

  5. I wish someone could arrange a review of the various ME/”cfs” biobanks scattered about the known world. There are serious questions to be asked about criteria used for inclusion. I fear that none are going to comprise proper ICC ME. Serious dilution threatens from the various definitions of “cfs,” keeping in mind that the CDC-propelled Fukuda can have more than 160 iterations, as Lenny Jason has calculated. This is because one chooses four symptoms from among a list of eight. Also, keep in mind that in both Holmes and Fukuda the original empirical observations about patients were edited and censored by Dr. Stephen Straus from NIAID. Equally, one has to ask whether alleged cases were defined by Oxford, which requires solely six months of “fatigue” of no particular description and therefore will yield nearly 50% depressed persons without (other) biological cause. (The cohorts in PACE reportedly included 47% depressed persons.)

  6. Very interesting research! Also I noted that the substitution is of a strongly hydrophilic amino acid for a hydrophobic one.

    Interestingly, in the most common Sickle Cell anemia it is a substitution to a Valine leading to hemoglobin clumping and to the “sickle cells” that is the root cause.

    The one thing is that it should be confirmed using a strict definition of ME such as the International Consensus or, at least, the Canadian, as the prevalence of .44% is substantially higher than the prevalence found using the Canadian Definition.

    Of course that is expected with self-report but having it confirmed with a strong definition would be a huge step towards validating what could be an important biological clue.

    One thing people might want to know is collagen is the scaffolding that holds together our internal organs and blood vessels among other things. Usually you hear about it on beauty shows for its role in gluing down the skin. If we are producing less collagen or have some defective type, perhaps eliciting an immune reaction, it would make sense that would cause a multisystem disease.

    Rank speculation, but it could be a severe infection, or inflammatory process generally, induces autoimmunity against collagen or protein manufacturing it, leading to reduced types of some collagens or impaired manufacture of some types, leading directly or indirectly to this disease.


  7. Hi Simon, thank you for this blog post!

    I am looking at the .xlsx file containing the supplementary tables. I have found the prevalence of ME/CFS in Table S1 and I guess that the association between rs150954845 and ME/CFS comes from Table S6.

    ME/CFS is defined by phenotype “selfReported_n_1545” (from Table S1), so this variant is present in only one ME/CFS patient out of 1829 (line 7991 of Table S6) and in one patient with suppurative and unspecified otitis media (phenotype “clinical_c_H66”, line 72572 of Table S6). Is it correct?

    1. Reply from Chris Ponting:

      Paulo’s points are correct. The issue with respect to the single case individual is exactly why we said that caution is necessary (in particular (c) of the five cautions/caveats). So, yes, even if this were to be a causal variant, then it would be for this one case individual only.

Comments are closed.
