Deleted coronavirus genome sequences trigger scientific intrigue
A medical worker collects a throat swab from a teacher, Wuhan

SARS-CoV-2 testing in Wuhan, China, where the first cases of COVID-19 were reported. Credit: Zhao Jun/VCG/Getty

Efforts to study the early stages of the coronavirus pandemic have received help from a surprising source. A biologist in the United States has ‘excavated’ partial SARS-CoV-2 genome sequences from the pandemic’s beginnings in its likely epicentre of Wuhan, China, that were deposited in, but later removed from, a US government database.

The partial genome sequences address an evolutionary conundrum about the early genetic diversity of the coronavirus SARS-CoV-2, although scientists emphasize that they do not shed light on its origins. Nor is it fully clear why researchers at Wuhan University asked for the sequences to be removed from the Sequence Read Archive (SRA), a repository for raw sequencing data maintained by the National Center for Biotechnology Information (NCBI), part of the US National Institutes of Health (NIH).

“These sequences are informative, they’re not transformative,” says Jesse Bloom, a viral evolutionary geneticist at the Fred Hutchinson Cancer Research Center in Seattle, Washington, who describes in a 22 June preprint how he recovered the sequences¹.

Bloom discovered the sequences after searching for genomic data from the pandemic’s early stages. A research paper from May 2020 contained a table of publicly available sequence data, which included entries Bloom had not come across². The sequences were associated with a paper that applied a technology known as nanopore sequencing to detect SARS-CoV-2 genetic material in samples from people. That study was published in the journal Small in June 2020³, having been posted on bioRxiv in March of that year⁴.

When Bloom looked for the sequences in the SRA using the details listed in the May 2020 paper, the database returned no entries. The SRA keeps sequences in cloud storage maintained by Google, and Bloom wondered whether he could find archived versions of the sequences on cloud servers. This approach worked, and Bloom was able to recover data from 50 samples, 13 of which contained enough raw data to generate partial genome sequences.

Evolutionary mystery

The sequences help to solve an evolutionary mystery about the early stages of the pandemic, says Bloom. The earliest viral sequences from Wuhan are from individuals linked to the city’s Huanan Seafood Market in December 2019, which was initially thought to be where the coronavirus first jumped from animals to people. But the seafood market sequences are more distantly related to SARS-CoV-2’s closest relatives in bats — the most likely ultimate origin of the virus — than are later sequences, including one collected in the United States.

That was surprising, says Bloom, because you would expect that viruses from the early stages of Wuhan’s epidemic would be most closely related to the bat-infecting relatives of SARS-CoV-2. The recovered sequences, which were probably collected in January and February 2020, show this to be the case — they are more closely related to the bat viruses than are the sequences from people linked to the seafood market.

This adds to a growing body of evidence, including reports of probable cases dating back to November 2019, that the first human cases of COVID-19 were not associated with the Huanan Seafood Market, say Bloom and other scientists.

“To me it seemed like Wuhan market was one of the first super-spreading events,” says Sudhir Kumar, an evolutionary geneticist at Temple University in Philadelphia, Pennsylvania. The sequences that Bloom unearthed, he adds, suggest that SARS-CoV-2 developed extensive diversity in the early stages of the pandemic in China — including in Wuhan.

Stephen Goldstein, a virologist at the University of Utah in Salt Lake City, points out that the sequences Bloom recovered were not hidden: they are described in detail, with enough sequence information to know their evolutionary relationship to other early SARS-CoV-2 sequences, in the Small paper. “I don't think this preprint tells us a whole lot new, but it does bring to the forefront sequence data that has been publicly available, though under the radar,” Goldstein says.

Bloom says that although the sequences were published, their removal from the SRA meant that few scientists knew about them. A report commissioned by the World Health Organization on the pandemic’s origins did not include the sequences in an evolutionary analysis of early SARS-CoV-2 data. “Nobody noticed they existed,” Bloom says.

The corresponding authors of the Small paper did not respond to questions from Nature’s news team about why they asked for the sequences to be removed from the SRA, which happened before the paper was published. In a statement, the NIH said it removed the data at the request of the researchers, who said they planned to submit them to another database.

Bloom, who co-authored a letter calling for a renewed investigation into the origins of the pandemic, including the possibility that the virus escaped or leaked from a lab⁵, says his study sheds no light on the origins of the pandemic, nor on why the sequences were removed. But he hopes his efforts will encourage researchers to “think outside the box” and look to other sources, such as archival data, to glean more information from the early days of the pandemic. “There are probably more data out there,” he says.
