BioCyc and Pathway Tools Blog: A New Curated BioCyc Database for Clostridium difficile

Peptoclostridium (Clostridium) difficile (commonly nicknamed “Cdiff”) is a spore-forming bacterium that causes serious healthcare-associated infections. In the United States alone, it is estimated that Cdiff infections were responsible for more than 29,000 deaths in 2011¹. Antibiotic resistance and recurrent infections are common problems in treating Cdiff infections.

The BioCyc collection currently contains twelve Clostridium/Peptoclostridium difficile databases, all of which can be accessed from a new home page, http://cdifficile.biocyc.org/. For a pilot project to update the genome annotation and add literature-based curation, we chose the database for Peptoclostridium difficile 630, a strain commonly used in the laboratory.

The genome of P. difficile 630 was first sequenced in 2006 and re-annotated in 2011. Genome sequences of several additional strains have become available since then. Using the Pathway Tools software, we first imported an updated genome annotation from RefSeq into the existing “tier 3” P. difficile 630 database. We then added further annotation updates that had been captured in the MicroScope database at the Pasteur Institute.

Following the annotation update, we reviewed the set of metabolic pathways that PathoLogic had automatically predicted when the database was built. Because it is generally easier to delete a pathway that should not be present in a database than to find and import, or newly create, a pathway that should be present, PathoLogic is designed to over-predict pathways. Consequently, much of the review work consisted of deleting metabolic pathways that are not in fact present in P. difficile 630. We welcome and appreciate advice from experts in the field on further improving the set of metabolic pathways in the database.

The annotation updates described above increased the number of genes labeled with an “Evidence 1a/1b” note, i.e., “Function experimentally demonstrated in the studied strain/species”, from 45 to 136. These genes provided an attractive target for initial literature curation, so we reviewed the publications cited in the genome annotations and supplemented them with our own literature searches in PubMed. As a result, experimental evidence codes were added to 67 proteins, and a total of 80 GO terms were added to 41 proteins, in the version of the database currently available in the BioCyc collection. We also summarized the experimental literature for proteins with known function and added all relevant literature references found in PubMed.

Much of the review and curation work was completed in time for the BioCyc release of March 20, 2015. The initial review of the entire genome is now complete, and further literature review is ongoing. An updated version of the database will be included in the next BioCyc release.

Exciting Developments in Cdiff Research

A high-throughput mutagenesis screen was recently used to identify the essential genes of Cdiff and the genes involved in sporulation². Because the work was done with strain R20291, its results could not be incorporated directly into the database for strain 630. We have, however, generated a BioCyc SmartTable that lists all genes that were unambiguously identified as essential:
http://cdifficile.biocyc.org/group?id=biocyc13-1553-3652814396

After the current version of BioCyc was released, a publication describing the resequencing of the P. difficile 630 genome appeared³. This new version of the genome has not yet been incorporated into BioCyc.

What’s in a Name?

Clostridium difficile has recently been reclassified and renamed Peptoclostridium difficile⁴. It will not be easy, and may prove impossible, to change the community’s habitual use of the long-established name. So why was this necessary? After all, “Clostridium difficile” is already a mouthful, as evidenced by the invention of the “Cdiff” moniker, and “Peptoclostridium difficile” clearly does not make the name any easier to pronounce.

The classification of an organism into a particular genus or species should imply a certain degree of relatedness, in both evolutionary and phenotypic terms. It turns out that the organisms that have historically carried the genus name Clostridium are very diverse. Phylogenetic trees based on 16S rRNA and conserved proteins (e.g., ribosomal proteins) have enabled a more accurate classification within this group. To better represent this diversity, and to distinguish the “true” genus Clostridium from more distantly related organisms within the family Clostridiaceae, certain groups of organisms received new genus names. The new “Pepto” prefix derives from the Greek for “digestion” or “able to digest”. Maybe everyone can get used to “Pdiff”?

References:

1 Lessa et al. (2015), N Engl J Med 372:825-34, PMID 25714160

http://www.ncbi.nlm.nih.gov/pubmed/25714160

2 Dembek et al. (2015), mBio 6(2):e02383-14, PMID 25714712

http://mbio.asm.org/content/6/2/e02383-14

3 Riedel et al. (2015), Genome Announc 3(2):e00276-15, PMID 25858846

http://genomea.asm.org/content/3/2/e00276-15.long

4 Yutin and Galperin (2013), Environ Microbiol 15(10):2631-41, PMID 23834245

http://onlinelibrary.wiley.com/doi/10.1111/1462-2920.12173/abstract