"Endless diversity" in bacterial genomes?
It's always nice when there's a groundbreaking article in the literature and the subject just happens to be your baby. My current research focuses on Streptococcus agalactiae (group B streptococcus, GBS), a bacterium that is the leading cause of neonatal meningitis in the United States. It is also a leading cause of invasive infection in the elderly, and can cause sepsis and toxic shock-like syndrome in healthy adults. No vaccine is currently available.
But what's garnered attention recently isn't a clinical presentation or a new case report of GBS disease; it's the bacterium's DNA. Specifically, it's the whole-genome sequences of 8 different GBS strains, and the authors' conclusion about bacterial genetic diversity: that it may be "endless."
Many species of bacteria are DNA scavengers, acquiring new pieces of DNA via horizontal gene transfer. Some take up DNA easily; these are said to be "naturally transformable." For others, it's more difficult for new DNA to get in. In many cases, bacteriophages bring new genes along when they infect a bacterium. However it's introduced, this sharing of genes among bacteria is extremely common. Anyone familiar with antibiotic resistance knows that one way resistance spreads among organisms is via horizontal transfer of resistance genes, often on small pieces of DNA called plasmids.
All this transfer can sometimes muck up studies of bacterial phylogeny (the evolutionary history and relatedness of bacterial species). Carl Zimmer wrote an excellent piece on this over the summer on The Loom, so I'll refer you there for further discussion of that topic. But the work Zimmer references, and indeed much of the work in this area, is done by comparing DNA sequences between species rather than within a single species. The authors of a recent PNAS paper took the latter approach with GBS.
In their paper, the authors sequenced 6 new GBS isolates and compared them with 2 isolates that had been sequenced previously. The 8 isolates shared a "core genome" of around 1800 genes. The cool part of their research, though, came when they looked at the genes unique to each isolate. The core genes plus all of the unique genes in a species together make up what has been termed the "pan-genome." With 8 isolates sequenced, the authors tried to estimate the size of this pan-genome, to get an idea of how many isolates would have to be sequenced to capture the full genetic diversity of GBS. Using their model, they concluded that every new GBS genome sequenced would add an average of 33 new strain-specific genes to the pan-genome (see figure below). Pretty fantastic and intimidating* at the same time.
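For the biology nerds who like to see the bookkeeping, here is a minimal Python sketch of the core/pan-genome idea. It assumes each isolate has already been boiled down to a set of gene-family identifiers; the isolate names and gene labels are hypothetical toy data, not real GBS genes. The core genome is the intersection of those sets, the pan-genome is their union, and adding genomes one at a time shows how many new genes each contributes. The authors fitted a model to their data; this sketch does not reproduce it. The last function just hard-codes a constant number of new genes per genome, to show what such a rate implies: a pan-genome that keeps growing rather than levelling off.

```python
from functools import reduce

# Each isolate is represented as a set of gene-family identifiers.
Isolates = dict[str, set[str]]

def core_genome(isolates: Isolates) -> set[str]:
    """Genes shared by every isolate (intersection of all gene sets)."""
    return set(reduce(set.intersection, isolates.values()))

def pan_genome(isolates: Isolates) -> set[str]:
    """Genes found in at least one isolate (union of all gene sets)."""
    return set(reduce(set.union, isolates.values()))

def new_genes_per_genome(isolates: Isolates) -> list[int]:
    """How many previously unseen genes each isolate adds, in insertion order."""
    seen: set[str] = set()
    counts = []
    for genes in isolates.values():
        counts.append(len(genes - seen))
        seen |= genes
    return counts

def extrapolate_pan_size(current_size: int, extra_genomes: int, new_per_genome: int) -> int:
    """Pan-genome size if each additional genome adds a fixed number of new genes (illustrative only)."""
    return current_size + extra_genomes * new_per_genome

# Hypothetical toy data: four isolates, gene families labelled by letter.
toy = {
    "isolate_1": {"a", "b", "c", "x"},
    "isolate_2": {"a", "b", "c", "y"},
    "isolate_3": {"a", "b", "c", "x", "z"},
    "isolate_4": {"a", "b", "c", "w"},
}

print(sorted(core_genome(toy)))   # ['a', 'b', 'c']
print(sorted(pan_genome(toy)))    # ['a', 'b', 'c', 'w', 'x', 'y', 'z']
print(new_genes_per_genome(toy))  # [4, 1, 1, 1]
print(extrapolate_pan_size(len(pan_genome(toy)), 10, 1))  # 17
```

In the toy data every genome after the first still adds one new gene; swap in a rate of 33 and the same arithmetic shows why a handful of extra genomes would not be expected to close the gap.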
This has wide implications for both microbiology and evolutionary biology. First, it again highlights the debate over "what, exactly, is a species?" Is sharing a certain percentage of DNA enough? Where is the cut-off? The authors give the example of Bacillus anthracis (anthrax), which seems to have a very limited pan-genome. Is it a separate species, or merely a clonal complex within the species Bacillus cereus? This is only one part of an ongoing debate that often seems to raise more questions than answers.
Second, is an "infinite diversity" of genes even available? One (valid) criticism of this paper is that it is premature to extrapolate from only 8 isolates, and the authors acknowledge this. Still, the result isn't really so surprising. Although bacteria were recognized as disease-causing agents well over 100 years ago, we know amazingly little about them. The Institute for Genomic Research (TIGR) has made some huge leaps forward: sequencing DNA from over 1800 predicted species and finding 1.2 million new genes in the Sargasso Sea, and playing a role in investigating the diversity of the human gut microflora. That gut study found that of ~400 "phylotypes" (basically, phylogenetic lineages), fully 80% came from species that hadn't even been cultured yet. A similar situation exists with our oral microflora (which I've blogged about previously). With this much unexamined bacterial diversity just within our own bodies, who knows how much is out there in the environment?
Finally, beyond all of the "wow, that's neat" biology-nerd stuff, this research has real implications for more practical areas such as medicine. A vaccine made from a component of the core genome, for instance, should provide broader protection than one made from an element of the variable portion of the pan-genome. The same logic applies to drug targets.
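The core-versus-variable argument can be put in numbers with another minimal Python sketch, again on hypothetical toy data: a candidate target's "coverage" is simply the fraction of sequenced isolates whose gene set contains it, so a core gene scores 1.0 by definition. The candidate names are made up, and gene presence is only a rough proxy for a good target, since it says nothing about whether the gene is expressed or immunogenic.

```python
# Hypothetical toy data: gene-family sets for four isolates.
toy = {
    "isolate_1": {"a", "b", "c", "x"},
    "isolate_2": {"a", "b", "c", "y"},
    "isolate_3": {"a", "b", "c", "x", "z"},
    "isolate_4": {"a", "b", "c", "w"},
}

def coverage(gene: str, isolates: dict[str, set[str]]) -> float:
    """Fraction of isolates whose gene set contains `gene`."""
    carriers = sum(1 for genes in isolates.values() if gene in genes)
    return carriers / len(isolates)

# "a" is a core gene; "x" and "w" belong to the variable portion.
for candidate in ["a", "x", "w"]:
    print(candidate, coverage(candidate, toy))  # a 1.0, x 0.5, w 0.25
```

Note that a score of 1.0 only means "present in every isolate sequenced so far"; with an open pan-genome, the next isolate could still lack a gene that looked universal.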
I expect to see many similar papers in the future, as microbial genome sequencing expands from a single isolate per species to multiple isolates. 'Tis a great time to be a biologist, indeed.
*Why intimidating? As I mentioned, I study GBS, specifically from an epidemiological perspective: what makes one strain nastier than another? There are already many parameters to take into account: differences in the host, differences in the microbial ecology, and, my main focus, differences in the strain. Now they tell me each strain may have around 33 unique genes? Sheesh!
[Edited to add: see more discussion over at The Panda's Thumb.]