Pathoscope: Species identification and strain attribution with unassembled sequencing data

Owen E. Francis, Matthew Bendall, Solaiappan Manimaran, Changjin Hong, Nathan L. Clement, Eduardo Castro-Nallar, Quinn Snell, G. Bruce Schaalje, Mark J. Clement, Keith A. Crandall, W. Evan Johnson

Research output: Article

63 Citations (Scopus)

Abstract

Emerging next-generation sequencing technologies have revolutionized the collection of genomic data for applications in bioforensics, biosurveillance, and for use in clinical settings. However, to make the most of these new data, new methodology needs to be developed that can accommodate large volumes of genetic data in a computationally efficient manner. We present a statistical framework to analyze raw next-generation sequence reads from purified or mixed environmental or targeted infected tissue samples for rapid species identification and strain attribution against a robust database of known biological agents. Our method, Pathoscope, capitalizes on a Bayesian statistical framework that accommodates information on sequence quality and mapping quality and provides posterior probabilities of matches to a known database of target genomes. Importantly, our approach also incorporates the possibility that multiple species can be present in the sample and considers cases when the sample species/strain is not in the reference database. Furthermore, our approach can accurately discriminate between very closely related strains of the same species with very little coverage of the genome and without the need for multiple alignment steps, extensive homology searches, or genome assembly, which are time-consuming and labor-intensive steps. We demonstrate the utility of our approach on genomic data from purified and in silico "environmental" samples from known bacterial agents impacting human health for accuracy assessment and comparison with other approaches.

Original language: English
Pages (from-to): 1721-1729
Number of pages: 9
Journal: Genome Research
Volume: 23
Issue number: 10
DOI: https://doi.org/10.1101/gr.150151.112
Status: Published - Oct 2013

Fingerprint

Genome
Databases
Biosurveillance
Biological Factors
Computer Simulation
Technology
Health

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

Cite this

Francis, O. E., Bendall, M., Manimaran, S., Hong, C., Clement, N. L., Castro-Nallar, E., ... Johnson, W. E. (2013). Pathoscope: Species identification and strain attribution with unassembled sequencing data. Genome Research, 23(10), 1721-1729. https://doi.org/10.1101/gr.150151.112
Francis, Owen E. ; Bendall, Matthew ; Manimaran, Solaiappan ; Hong, Changjin ; Clement, Nathan L. ; Castro-Nallar, Eduardo ; Snell, Quinn ; Schaalje, G. Bruce ; Clement, Mark J. ; Crandall, Keith A. ; Johnson, W. Evan. / Pathoscope : Species identification and strain attribution with unassembled sequencing data. In: Genome Research. 2013 ; Vol. 23, No. 10. pp. 1721-1729.
@article{e9e5b1a13f2140b78abae9c9cc583a02,
title = "Pathoscope: Species identification and strain attribution with unassembled sequencing data",
author = "Francis, {Owen E.} and Matthew Bendall and Solaiappan Manimaran and Changjin Hong and Clement, {Nathan L.} and Eduardo Castro-Nallar and Quinn Snell and Schaalje, {G. Bruce} and Clement, {Mark J.} and Crandall, {Keith A.} and Johnson, {W. Evan}",
year = "2013",
month = "10",
doi = "10.1101/gr.150151.112",
language = "English",
volume = "23",
pages = "1721--1729",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "10",

}

Francis, OE, Bendall, M, Manimaran, S, Hong, C, Clement, NL, Castro-Nallar, E, Snell, Q, Schaalje, GB, Clement, MJ, Crandall, KA & Johnson, WE 2013, 'Pathoscope: Species identification and strain attribution with unassembled sequencing data', Genome Research, vol. 23, no. 10, pp. 1721-1729. https://doi.org/10.1101/gr.150151.112

Pathoscope : Species identification and strain attribution with unassembled sequencing data. / Francis, Owen E.; Bendall, Matthew; Manimaran, Solaiappan; Hong, Changjin; Clement, Nathan L.; Castro-Nallar, Eduardo; Snell, Quinn; Schaalje, G. Bruce; Clement, Mark J.; Crandall, Keith A.; Johnson, W. Evan.

In: Genome Research, Vol. 23, No. 10, 10.2013, p. 1721-1729.

Research output: Article

TY - JOUR

T1 - Pathoscope

T2 - Species identification and strain attribution with unassembled sequencing data

AU - Francis, Owen E.

AU - Bendall, Matthew

AU - Manimaran, Solaiappan

AU - Hong, Changjin

AU - Clement, Nathan L.

AU - Castro-Nallar, Eduardo

AU - Snell, Quinn

AU - Schaalje, G. Bruce

AU - Clement, Mark J.

AU - Crandall, Keith A.

AU - Johnson, W. Evan

PY - 2013/10

Y1 - 2013/10

UR - http://www.scopus.com/inward/record.url?scp=84885070139&partnerID=8YFLogxK

U2 - 10.1101/gr.150151.112

DO - 10.1101/gr.150151.112

M3 - Article

C2 - 23843222

AN - SCOPUS:84885070139

VL - 23

SP - 1721

EP - 1729

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 10

ER -