Retracted: Microbiome and Cancer

“Microbiome analyses of blood and tissues suggest cancer diagnostic approach” was published in Nature in March 2020 by first authors Gregory D. Poore and Evguenia Kopylova, under Principal Investigator Rob Knight of UCSD’s Bioengineering Department.

In this paper, Poore et al. set out to determine whether a “cancer microbiome” signature exists and whether it can be used to distinguish cancer from healthy samples, distinguish between cancer types, and predict cancer stage. To do this, they used publicly available data from The Cancer Genome Atlas (TCGA), an NIH project that catalogs the genetic changes associated with specific cancers. The TCGA contains whole-genome and whole-transcriptome sequences from 33 different cancer types, encompassing more than 10,000 patients, and includes samples of various types (tissue vs. whole blood). The authors filtered the TCGA data to retain only non-human reads, mapped those reads to microbial genomes, and built a genus-level taxonomic profile for each sample. They then normalized these profiles and fed them into a machine learning (ML) model, which they reported could identify “microbial communities” that are “unique to each cancer type” using tissue samples as well as blood plasma samples. They also compared the TCGA cancer samples to healthy control plasma samples obtained through the UCSD medical center, and reported that their ML model was “strong for discriminating (i) one cancer type versus all others (n = 32 types of cancer) and (ii) tumour versus normal (n = 15 types of cancer)”.
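The profiling and labeling steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `HUMAN` label, the function names, and the example reads are all hypothetical, and each read is assumed to have already been assigned a genus by some classifier.

```python
from collections import Counter

# Hypothetical label that an upstream classifier gives to human reads.
HUMAN = "Homo"


def genus_profile(read_labels):
    """Relative abundance per genus for one sample, ignoring human reads."""
    counts = Counter(label for label in read_labels if label != HUMAN)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {genus: n / total for genus, n in counts.items()}


def one_vs_rest_labels(cancer_types, target):
    """Binary labels for a 'one cancer type versus all others' task."""
    return [1 if t == target else 0 for t in cancer_types]


# Illustrative reads for a single sample (not real data).
sample_reads = ["Homo", "Homo", "Fusobacterium", "Bacteroides",
                "Fusobacterium", "Homo"]
profile = genus_profile(sample_reads)

# Illustrative cancer-type codes for four samples.
labels = one_vs_rest_labels(["COAD", "BRCA", "COAD", "LUAD"], "COAD")
```

Note that the profile is only as good as the upstream read labels: any human read that reaches `genus_profile` under a microbial label is counted as microbial.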

After the paper’s publication, it was strongly criticized by Abraham Gihawi and Steven Salzberg for the following principal reasons: 

1. The decontamination techniques mentioned in the paper were not applied to the critical ML models of one cancer vs. all others and tumor vs. normal. When decontamination was applied, the model showed a statistically significant improvement in prediction in only 5 of the 33 cancer types. While this is still an important finding, it was not adequately represented in the original paper.

2. Human reads were not properly filtered out and were instead counted as microbial because of incorrect data preprocessing. This led to a gross overestimation of the total number of microbial reads in the samples. In their reanalysis of the data, Gihawi and Salzberg found that 98% of the reads that Poore et al. classified as microbial were, in fact, human sequences.

3. Improper normalization of the read counts led to the artificial inflation of taxonomic profiles and the creation of an artificial signature for each cancer that was not present in the raw data, thereby confounding model accuracy. 

4. The top features used by the ML model to make predictions were not relevant to human disease: they were microbes found in the deep sea or in plants that have never been identified in humans.

5. Lastly, Gihawi and Salzberg point out that the paper by Poore et al. led to the publication of a dozen other papers that used the same flawed dataset and reported other associations between various cancers and microbes that “are likely to be invalid”.
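To make the third criticism more concrete, the toy example below shows one way a transformation can turn identical raw zeros into values that differ by group. It is a sketch under assumptions and does not reproduce the normalization used in the paper or the specific mechanism identified in the reanalysis: it applies a log counts-per-million transform with small offsets to a genus whose raw count is zero in every sample, and the group names and library sizes are hypothetical.

```python
import math


def log_cpm(count, library_size):
    """log2 counts-per-million with 0.5 and 1.0 offsets to avoid log(0)."""
    return math.log2((count + 0.5) / (library_size + 1.0) * 1e6)


# Hypothetical samples: (group, library size, raw count for one genus).
# The genus is absent (count 0) everywhere; only library size differs.
samples = [
    ("type_A", 1_000, 0),
    ("type_A", 1_200, 0),
    ("type_B", 50_000, 0),
    ("type_B", 60_000, 0),
]

transformed = {}
for group, library_size, count in samples:
    transformed.setdefault(group, []).append(log_cpm(count, library_size))

# True if a single threshold separates the groups after the transform.
separable = min(transformed["type_A"]) > max(transformed["type_B"])
```

In this toy setup the raw data carry no information about the genus, yet the transformed values separate the two groups perfectly, because library size happens to track group membership. A classifier trained on such values could appear accurate without any real microbial signal.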

These criticisms ultimately led to Nature retracting the paper in June of 2024, citing “concerns about the robustness of specific microbial signatures reported as associated with cancer”. 
