Improved ontology for eukaryotic single-exon coding sequences in biological databases

Roddy Jorquera, Carolina González, Philip Clausen, Bent Petersen, David S. Holmes

Research output: Article

Abstract

Efficient extraction of knowledge from biological data requires the development of structured vocabularies to unambiguously define biological terms. This paper proposes descriptions and definitions to disambiguate the term 'single-exon gene'. Eukaryotic Single-Exon Genes (SEGs) have been defined as genes that do not have introns in their protein-coding sequences. They have been studied not only to determine their origin and evolution but also because their expression has been linked to several types of human cancer and neurological/developmental disorders, and because many exhibit tissue-specific transcription. Unfortunately, the term 'SEGs' is rife with ambiguity, leading to biological misinterpretations. In the classic definition, no distinction is made between SEGs that harbor introns in their untranslated regions (UTRs) and those that do not. This distinction is important because the presence of introns in UTRs affects transcriptional regulation and post-transcriptional processing of the mRNA. In addition, recent whole-transcriptome shotgun sequencing has led to the discovery of many examples of single-exon mRNAs that arise from alternative splicing of multi-exon genes; these single-exon isoforms are being confused with SEGs despite their clearly different origin. The expansion of RNA-seq datasets makes it imperative to distinguish the different SEG types before annotation errors become indelibly propagated in biological databases. This paper develops a structured vocabulary for their disambiguation, allowing a major reassessment of their evolutionary trajectories, regulation, RNA processing and transport, and providing an opportunity to improve the detection of gene associations with disorders, including cancers and neurological and developmental diseases.
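The three cases the abstract separates (SEGs without any intron, SEGs with introns only in their UTRs, and single-exon isoforms of multi-exon genes) can be illustrated with a minimal Python sketch. It is not the paper's vocabulary or method: the label strings, the `Transcript` class, and the rule that a gene counts as multi-exon when any coding transcript has an intron inside its CDS span are all assumptions made for illustration. Coordinates are assumed to be half-open genomic intervals.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates (assumption)


@dataclass
class Transcript:
    """Hypothetical transcript model: exon intervals plus the CDS span."""
    exons: List[Interval]
    cds: Optional[Interval]  # span from CDS start to CDS end; None if non-coding


def introns(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons, sorted by position."""
    ordered = sorted(exons)
    return [
        (left_end, right_start)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
        if right_start > left_end
    ]


def classify_transcript(t: Transcript) -> str:
    """Label one transcript by where its introns fall (labels are illustrative)."""
    if t.cds is None:
        return "non_coding"
    cds_start, cds_end = t.cds
    gaps = introns(t.exons)
    # An intron interrupts the coding sequence if it overlaps the CDS span.
    if any(start < cds_end and end > cds_start for start, end in gaps):
        return "multi_exon_cds"
    if gaps:
        return "single_exon_cds_utr_intron"
    return "single_exon_cds_no_intron"


def classify_gene(transcripts: List[Transcript]) -> str:
    """Label a gene from all of its coding transcripts (labels are illustrative)."""
    labels = {classify_transcript(t) for t in transcripts} - {"non_coding"}
    if not labels:
        return "no_coding_transcript"
    if "multi_exon_cds" in labels:
        # A single-exon CDS here is an isoform of a multi-exon gene, not a SEG.
        if labels - {"multi_exon_cds"}:
            return "multi_exon_gene_with_single_exon_isoform"
        return "multi_exon_gene"
    if "single_exon_cds_utr_intron" in labels:
        return "seg_with_utr_intron"
    return "seg_without_intron"


if __name__ == "__main__":
    intronless = Transcript(exons=[(100, 500)], cds=(150, 450))
    utr_intron = Transcript(exons=[(100, 200), (300, 800)], cds=(350, 700))
    spliced_cds = Transcript(exons=[(100, 400), (500, 800)], cds=(150, 700))
    print(classify_gene([intronless]))
    print(classify_gene([utr_intron]))
    print(classify_gene([spliced_cds, intronless]))
```

Classifying at the gene level rather than the transcript level is the design point: a transcript with a single-exon CDS is only treated as evidence of a SEG when no other coding transcript of the same gene has an intron in its coding sequence.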

Original language: English
Journal: Database
Volume: 2018
Issue number: 2018
DOI: https://doi.org/10.1093/database/bay089
Status: Published - 1 Jan 2018

Fingerprint

Exons
Ontology
Genes
Databases
Introns
Untranslated Regions
Vocabulary
RNA
RNA Transport
Molecular Sequence Annotation
Messenger RNA
Neoplasms
Alternative Splicing
Transcription

ASJC Scopus subject areas

  • Information Systems
  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)

Cite this

Jorquera, R., González, C., Clausen, P., Petersen, B., & Holmes, D. S. (2018). Improved ontology for eukaryotic single-exon coding sequences in biological databases. Database, 2018(2018). https://doi.org/10.1093/database/bay089
@article{a10a017f93864534ae290bb534b1dd8a,
title = "Improved ontology for eukaryotic single-exon coding sequences in biological databases",
author = "Roddy Jorquera and Carolina Gonz{\'a}lez and Philip Clausen and Bent Petersen and Holmes, {David S.}",
year = "2018",
month = "1",
day = "1",
doi = "10.1093/database/bay089",
language = "English",
volume = "2018",
journal = "Database : the journal of biological databases and curation",
issn = "1758-0463",
publisher = "Oxford University Press",
number = "2018",

}

TY - JOUR
T1 - Improved ontology for eukaryotic single-exon coding sequences in biological databases
AU - Jorquera, Roddy
AU - González, Carolina
AU - Clausen, Philip
AU - Petersen, Bent
AU - Holmes, David S.
PY - 2018/1/1
Y1 - 2018/1/1
UR - http://www.scopus.com/inward/record.url?scp=85057214810&partnerID=8YFLogxK
U2 - 10.1093/database/bay089
DO - 10.1093/database/bay089
M3 - Article
C2 - 30239665
AN - SCOPUS:85057214810
VL - 2018
JO - Database : the journal of biological databases and curation
JF - Database : the journal of biological databases and curation
SN - 1758-0463
IS - 2018
ER -