Large-scale, multi-genome analysis of alternate open reading frames in bacteria and archaea

Felipe Veloso, Gonzalo Riadi, Daniela Aliaga, Ryan Lieph, David S. Holmes

Research output: Review article

14 Citations (Scopus)

Abstract

Analysis of over 300,000 annotated genes in 105 bacterial and archaeal genomes reveals an unexpectedly high frequency of large (>300 nucleotides) alternate open reading frames (ORFs). Especially notable is the very high frequency of alternate ORFs in frames +3 and -1 (where the annotated gene is defined as frame +1). The occurrence of alternate ORFs is correlated with genomic G+C content and is strongly influenced by synonymous codon usage bias. The frequency of alternate ORFs in frame -1 is also influenced by the occurrence of codons encoding leucine and serine in frame +1. Although some alternate ORFs have been shown to encode proteins, many others are probably not expressed because they lack appropriate signals for transcription and translation. These latter can be mis-annotated by automatic gene-finding programs, leading to errors in public databases. Especially prone to mis-annotation is frame -1, because it exhibits a potential codon usage and theoretical capacity to encode proteins with an amino acid composition most similar to real genes. Some alternate ORFs are conserved across bacterial or archaeal species, and can give rise to mis-annotated "conserved hypothetical" genes, while others are unique to a genome and are misidentified as "hypothetical orphan" genes, contributing significantly to the orphan gene paradox.
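The frame terminology in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It rests on two assumptions that the abstract does not state: frame -1 is taken to be the opposite-strand frame whose codons align with those of frame +1, and an "open" stretch is taken to be any run of codons without a stop codon (no start codon is required). All function names are hypothetical. Under the standard genetic code, the reverse complements of the three stop codons are TTA and CTA (leucine) and TCA (serine), which is consistent with the abstract's remark that leucine and serine codons in frame +1 influence frame -1.

```python
# Illustrative sketch only; the minus-strand frame numbering and the
# stop-to-stop definition of an open stretch are assumptions.
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def frame_codons(gene: str, frame: int) -> list[str]:
    """Split `gene` into complete codons as read in the given frame.

    Frame +1 is the annotated reading frame; +2 and +3 are shifted by one
    and two nucleotides. Negative frames read the reverse complement, and
    frame -1 is assumed to be codon-aligned with frame +1.
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +1, +2, +3, -1, -2, -3")
    strand = gene if frame > 0 else reverse_complement(gene)
    offset = abs(frame) - 1
    return [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]


def longest_open_stretch(gene: str, frame: int) -> int:
    """Length in nucleotides of the longest stop-free run of codons."""
    best = run = 0
    for codon in frame_codons(gene, frame):
        run = 0 if codon in STOPS else run + 3
        best = max(best, run)
    return best


# Frame +1 codons that read as stop codons on the opposite strand.
stop_partners = {stop: reverse_complement(stop) for stop in sorted(STOPS)}
print(stop_partners)  # {'TAA': 'TTA', 'TAG': 'CTA', 'TGA': 'TCA'}

# Toy gene (hypothetical): ATG TTA GCC TCA TAA in frame +1.
toy_gene = "ATGTTAGCCTCATAA"
print(longest_open_stretch(toy_gene, 1))   # 12
print(longest_open_stretch(toy_gene, -1))  # 3
```

In the toy gene, the leucine codon TTA and the serine codon TCA in frame +1 each place a stop codon in frame -1, so the open stretch there is short. Applying the abstract's size criterion would amount to checking whether `longest_open_stretch(gene, frame) > 300` for a frame other than +1.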

Original language: English
Pages (from-to): 91-105
Number of pages: 15
Journal: OMICS A Journal of Integrative Biology
Volume: 9
Issue number: 1
DOI: 10.1089/omi.2005.9.91
Status: Published - 2005

Fingerprint

Archaea
Open Reading Frames
Bacteria
Genes
Genome
Codon
Archaeal Genome
Bacterial Genomes
Base Composition
Leucine
Serine
Transcription
Proteins
Nucleotides
Databases
Amino Acids

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Biochemistry
  • Molecular Biology
  • Molecular Medicine

Cite this

Veloso, Felipe; Riadi, Gonzalo; Aliaga, Daniela; Lieph, Ryan; Holmes, David S. / Large-scale, multi-genome analysis of alternate open reading frames in bacteria and archaea. In: OMICS A Journal of Integrative Biology. 2005; Vol. 9, No. 1. pp. 91-105.
@article{8c2312a89a154b019916830a81a81725,
title = "Large-scale, multi-genome analysis of alternate open reading frames in bacteria and archaea",
abstract = "Analysis of over 300,000 annotated genes in 105 bacterial and archaeal genomes reveals an unexpectedly high frequency of large (>300 nucleotides) alternate open reading frames (ORFs). Especially notable is the very high frequency of alternate ORFs in frames +3 and -1 (where the annotated gene is defined as frame + 1). The occurrence of alternate ORFs is correlated with genomic G+C content and is strongly influenced by synonymous codon usage bias. The frequency of alternate ORFs in frame -1 is also influenced by the occurrence of codons encoding leucine and serine in frame +1. Although some alternate ORFs have been shown to encode proteins, many others are probably not expressed because they lack appropriate signals for transcription and translation. These latter can be mis-annotated by automatic gene finding programs leading to errors in public databases. Especially prone to mis-annotation is frame -1, because it exhibits a potential codon usage and theoretical capacity to encode proteins with an amino acid composition most similar to real genes. Some alternate ORFs are conserved across bacterial or archaeal species, and can give rise to mis-annotated {"}conserved hypothetical{"} genes, while others are unique to a genome and are misidentified as {"}hypothetical orphan{"} genes, contributing significantly to the orphan gene paradox.",
author = "Felipe Veloso and Gonzalo Riadi and Daniela Aliaga and Ryan Lieph and Holmes, {David S.}",
year = "2005",
doi = "10.1089/omi.2005.9.91",
language = "English",
volume = "9",
pages = "91--105",
journal = "OMICS A Journal of Integrative Biology",
issn = "1536-2310",
publisher = "Mary Ann Liebert Inc.",
number = "1",

}

Large-scale, multi-genome analysis of alternate open reading frames in bacteria and archaea. / Veloso, Felipe; Riadi, Gonzalo; Aliaga, Daniela; Lieph, Ryan; Holmes, David S.

In: OMICS A Journal of Integrative Biology, Vol. 9, No. 1, 2005, p. 91-105.


TY - JOUR

T1 - Large-scale, multi-genome analysis of alternate open reading frames in bacteria and archaea

AU - Veloso, Felipe

AU - Riadi, Gonzalo

AU - Aliaga, Daniela

AU - Lieph, Ryan

AU - Holmes, David S.

PY - 2005

Y1 - 2005

AB - Analysis of over 300,000 annotated genes in 105 bacterial and archaeal genomes reveals an unexpectedly high frequency of large (>300 nucleotides) alternate open reading frames (ORFs). Especially notable is the very high frequency of alternate ORFs in frames +3 and -1 (where the annotated gene is defined as frame + 1). The occurrence of alternate ORFs is correlated with genomic G+C content and is strongly influenced by synonymous codon usage bias. The frequency of alternate ORFs in frame -1 is also influenced by the occurrence of codons encoding leucine and serine in frame +1. Although some alternate ORFs have been shown to encode proteins, many others are probably not expressed because they lack appropriate signals for transcription and translation. These latter can be mis-annotated by automatic gene finding programs leading to errors in public databases. Especially prone to mis-annotation is frame -1, because it exhibits a potential codon usage and theoretical capacity to encode proteins with an amino acid composition most similar to real genes. Some alternate ORFs are conserved across bacterial or archaeal species, and can give rise to mis-annotated "conserved hypothetical" genes, while others are unique to a genome and are misidentified as "hypothetical orphan" genes, contributing significantly to the orphan gene paradox.

UR - http://www.scopus.com/inward/record.url?scp=17444368305&partnerID=8YFLogxK

U2 - 10.1089/omi.2005.9.91

DO - 10.1089/omi.2005.9.91

M3 - Review article

C2 - 15805780

AN - SCOPUS:17444368305

VL - 9

SP - 91

EP - 105

JO - OMICS A Journal of Integrative Biology

JF - OMICS A Journal of Integrative Biology

SN - 1536-2310

IS - 1

ER -