Monday, November 16, 2015

Everything you always wanted to know about the Enzyme Commission Part II


In this blog we will discuss a few more aspects of the Enzyme Commission and its classification work that were not covered in the previous blog.

Scope of Enzyme Classification
The classification system used by the EC aims to cover enzymes that fall under one of the following six broad categories:

Class 1: Oxidoreductases
Class 2: Transferases
Class 3: Hydrolases
Class 4: Lyases
Class 5: Isomerases
Class 6: Ligases

As you can see, transporters are not covered by the EC list unless they also catalyze an additional reaction that falls under one of these categories (e.g. the phosphoenolpyruvate-dependent phosphotransferase transporters known as PTS). While peptidases fit under class 3, the Enzyme Commission has limited the classification of peptidases in recent years due to the difficulty in drafting reactions that accurately describe the peptidase specificity.


Principles of Classification
Each top class contains several subclasses. For example, Class 4 contains the subclasses 4.1 carbon-carbon lyases, 4.2 carbon-oxygen lyases, 4.3 carbon-nitrogen lyases, etc. The subclasses, in turn, contain sub-subclasses, e.g. 4.1.1, carboxy-lyases. The sub-subclass in which an enzyme resides defines the first three fields in the enzyme’s EC number. The fourth and last field is simply a serial number within that sub-subclass.
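
To make the four-field structure concrete, here is a minimal Python sketch that splits an official EC number into its fields. The names ECNumber, parse_ec_number and sub_subclass_prefix are hypothetical helpers invented for this illustration; they are not part of ExplorEnz or any EC tool. The sketch assumes every field is a plain integer, so it deliberately rejects temporary internal numbers.

```python
from typing import NamedTuple

# The six top-level classes listed in this post.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


class ECNumber(NamedTuple):
    """Hypothetical container for the four fields of an EC number."""
    ec_class: int      # first field, e.g. 4 (lyases)
    subclass: int      # second field, e.g. the 1 in 4.1
    sub_subclass: int  # third field, e.g. the final 1 in 4.1.1
    serial: int        # fourth field: serial number within the sub-subclass


def parse_ec_number(text: str) -> ECNumber:
    """Split an official four-field EC number such as 'EC 4.1.1.1'."""
    cleaned = text.strip()
    if cleaned.upper().startswith("EC"):
        cleaned = cleaned[2:].strip()
    fields = cleaned.split(".")
    # Assumption: official numbers use only ASCII digits in every field.
    if len(fields) != 4 or not all(f.isascii() and f.isdigit() for f in fields):
        raise ValueError(f"not a complete four-field EC number: {text!r}")
    number = ECNumber(*(int(f) for f in fields))
    if number.ec_class not in EC_CLASSES:
        raise ValueError(f"unknown top-level class in {text!r}")
    return number


def sub_subclass_prefix(number: ECNumber) -> str:
    """Return the first three fields, which identify the sub-subclass."""
    return f"{number.ec_class}.{number.subclass}.{number.sub_subclass}"
```

For example, parse_ec_number("EC 1.6.99.1") yields class 1 (oxidoreductases), subclass 6, sub-subclass 99 and serial number 1.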

The subclasses and sub-subclasses sometimes contain the numbers 98 and 99. In general, when both of those numbers exist under the same parent class, 98 is reserved for well-characterized enzymes that do not fit the other subclasses, while 99 indicates some uncertainty about the enzyme (for example, when the identity of an electron acceptor is not known).

The principles of classification are too complex to describe here. They are described in detail at http://enzyme-database.org/rules.php.

Most of the enzymes fit well in one of the existing sub-subclasses. However, some enzymes catalyze complex reactions that do not fit any particular class. In other cases an enzyme might fit in more than one class. In these cases the commission members need to discuss the issue and decide, and occasionally a new sub-subclass is defined.


What Is The Process of Classifying An Enzyme?
Members of the Enzyme Commission create new entries using an online system called DraftEnz, which was developed by A. McDonald. The members define the exact sub-subclass to which the enzyme belongs, and at this point the entry receives a temporary internal serial number (e.g. 3.1.3.d). The new entry is reviewed by the other members of the commission, who may suggest modifications to any part of the entry. When a member is satisfied with the entry, he or she may vote for it, and when an entry has received at least two non-author votes, it is ready to move to the next stage, which is internal review.

When a sufficient number of new entries have received the necessary votes, a batch of new entries is moved to internal review, at which time they can be viewed at a dedicated web page, and receive their final serial numbers. All the members of the commission are requested to review them. The internal review process ensures that all members get to review all entries, and problems that were not caught earlier are likely to be spotted.

After one month at internal review, the entries are moved to public review. At this stage the entries are visible to the public at the ExplorEnz website by clicking on the tab “New/Amended Enzymes”. The entries are kept at this stage for another month to allow sufficient time for the community to provide feedback. Once the entries clear this stage, they are moved to ExplorEnz and become official.
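
The stages described above can be summarized as a small state machine. The Python sketch below is purely illustrative: the names Stage, DraftEntry and advance are hypothetical and do not reflect how DraftEnz is actually implemented, and the one-month waiting periods and the batching of entries appear only as comments.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    DRAFT = "draft"
    INTERNAL_REVIEW = "internal review"   # lasts about one month
    PUBLIC_REVIEW = "public review"       # lasts about another month
    OFFICIAL = "official"


# From the post: at least two non-author votes are needed to leave the draft stage.
REQUIRED_NON_AUTHOR_VOTES = 2


@dataclass
class DraftEntry:
    author: str
    temp_id: str                          # temporary number, e.g. "3.1.3.d"
    votes: set = field(default_factory=set)
    stage: Stage = Stage.DRAFT

    def non_author_votes(self) -> int:
        """Count votes cast by members other than the entry's author."""
        return len(self.votes - {self.author})


def advance(entry: DraftEntry) -> Stage:
    """Move an entry to its next stage if the stated conditions allow it."""
    if entry.stage is Stage.DRAFT:
        # In practice entries move in batches; this sketch handles one entry.
        if entry.non_author_votes() >= REQUIRED_NON_AUTHOR_VOTES:
            entry.stage = Stage.INTERNAL_REVIEW
    elif entry.stage is Stage.INTERNAL_REVIEW:
        entry.stage = Stage.PUBLIC_REVIEW
    elif entry.stage is Stage.PUBLIC_REVIEW:
        entry.stage = Stage.OFFICIAL
    return entry.stage
```

An entry with only one non-author vote stays in the draft stage; once a second member votes, three calls to advance take it through internal review and public review to official status.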


Some Statistics
In addition to creating new entries, the commission often revises older entries to reflect information that became available after the entries were created. Existing entries can be revised, deleted, and sometimes transferred to a different EC number. Entries are transferred if new information shows that the reaction catalyzed by the enzyme is different from what was previously thought, requiring classification under a different sub-subclass, or if new information shows that the enzyme is identical to an enzyme that is classified under a different EC number.

Currently there are 5638 entries in the EC list of enzymes. This number does not include 664 entries that have been transferred and 303 entries that have been deleted.

Since 2010 the commission has created or modified 2221 entries. This is an impressive number for a small group of volunteers, but it is probably a drop in the bucket considering the vast number of well-characterized enzymes that have not been classified yet.


What You Can Do to Help
If you would like to help, it is straightforward to create a new EC entry! You do not even have to suggest the sub-subclass (although you can if you would like). Take a look at a few of the EC entries to get familiar with the format. Then, go to http://enzyme-database.org/forms.php and fill out the form for a new submission. Just make sure you read the information at the beginning of the form, which explains what the requirements are.

Wednesday, November 4, 2015

Everything you always wanted to know about the Enzyme Commission


If you have used BioCyc, you probably noticed that many reactions have EC numbers printed next to them. EC numbers are everywhere – in the primary literature, in annotated genomes, in databases, in online encyclopedias. Where do they come from and what exactly do they mean?

A Bit of History
In the early days the naming of enzymes was not systematic. As a result, many different enzymes were given the same name and, on the other hand, several different names were assigned to the same enzyme. Many of the names were not particularly helpful; for example, the enzyme now known as EC 1.6.99.1, NADPH dehydrogenase, was originally named “old yellow enzyme”.
To sort out the mess, Dixon and Webb introduced a classification system in their 1958 book “Enzymes”, which was based on the reaction catalyzed by the enzyme. Although it was rather limited, it provided the foundation for the current classification system. At about the same time, the International Union of Biochemistry decided to form an official international commission on enzymes to develop a better classification and naming system. The first full report of the commission was published in 1965, using a six-category system that is still used today. Although this is not the place to describe classification principles, in general each enzyme receives a unique four-component identification number that not only identifies it, but also provides insight into the enzymatic activity of the enzyme. Each EC entry provides additional information such as lists of names and synonyms, references, and often commentary. Full details about the principles of the classification system can be found at http://enzyme-database.org/rules.php and https://en.wikipedia.org/wiki/Enzyme_Commission_number.

The Present
Fast forward 50 years, and the Enzyme Commission (EC) is still going strong. The importance and usefulness of the EC numbers have only increased with time. With the explosion in sequencing volume, having an accurate genome annotation has become critical, and EC numbers provide a well-defined, unambiguous method for annotation of enzyme function. Software packages such as Pathway Tools make the most out of this information, assigning the appropriate reactions to the annotated genes based on their EC numbers when building metabolic networks for newly sequenced genomes. The content of the enzyme list, which used to be published in books, is made available through two online databases that are updated several times a year. A searchable MySQL version of the database, including downloadable data in multiple formats, is available at the ExplorEnz database at http://enzyme-database.org. More than 5600 enzymes are currently classified, and hundreds are added each year.

Who is the EC?
The Enzyme Commission is now part of the IUPAC-IUBMB Joint Commission on Biochemical Nomenclature (JCBN). It consists of a small number of experts who volunteer their time to the project. Active members (listed alphabetically) include K. Axelsen (Switzerland), R. Cammack (UK), R. Caspi (USA), M. Kotera (Japan), A. McDonald (Ireland), G.P. Moss (UK), D. Schomburg (Germany), I. Schomburg (Germany), and K.F. Tipton (Ireland). The commission members use an online curation system that was developed by A. McDonald, called ExplorEnz. Members of the commission continue to classify new enzymes, modify existing entries as new information becomes available, and extend or modify the classification rules to accommodate new challenges.
If you would like to request a new EC entry for an enzyme that hasn’t been classified yet, or submit an error or update report about an existing entry, submission forms are available at http://enzyme-database.org/forms.php. Since MetaCyc curator R. Caspi is a member of the EC, you are also welcome to send EC-related questions or comments to biocyc-support@AI.SRI.COM.

Additional Information
  1.  Dixon, M. and Webb, E.C. (1958), Enzymes. Longmans Green, London, pp. 183–227.
  2.  Tipton, K. and Boyce, S. (2000) History of the enzyme nomenclature system. Bioinformatics, 16, 34-40.
  3.  McDonald, A.G., Boyce, S. and Tipton, K.F. (2009) ExplorEnz: the primary source of the IUBMB enzyme list. Nucleic Acids Res, 37, D593-597.
  4.  McDonald, A.G. and Tipton, K.F. (2014) Fifty-five years of enzyme classification: advances and difficulties. Febs J, 281, 583-592.

Friday, April 24, 2015

A New Curated BioCyc Database for Clostridium difficile


Peptoclostridium (Clostridium) difficile (commonly nicknamed “Cdiff”) is a spore-forming bacterium that causes serious healthcare-associated infections. In the United States alone, it is estimated that Cdiff infections were responsible for more than 29,000 deaths in 2011 [1]. Antibiotic resistance and recurrent infections are common problems in treating Cdiff infections.

The BioCyc collection currently contains twelve Clostridium/Peptoclostridium difficile databases; all of them can be easily accessed from a new home page, http://cdifficile.biocyc.org/. We chose the database for a strain commonly used in the laboratory, Peptoclostridium difficile 630, for a pilot project to update the genome annotation and to add literature curation.

Wednesday, April 15, 2015

Querying Databases by Organism Properties

The latest release (version 19.0) of BioCyc includes PGDBs for 5500 different organisms, and we expect that number to grow with every future release. With such numbers, unless you already have a specific species and strain in mind, it becomes impractical to browse through the complete list of organisms. We already allow users of the BioCyc website to select organisms specifically by name or taxonomic class. We describe here extensions to that selection process that enable users to search for organisms based on a larger set of properties of the organism, such as when and where the sample was collected and what kind of environment it lives in.

Friday, March 6, 2015

Procedure for Creating Metabolic Models from Sequenced Genomes




In the past, construction of quantitative metabolic flux models has been an extremely time-consuming process, requiring 12-18 months to create a bacterial model.  One of our main goals in designing the MetaFlux module for creating metabolic models within Pathway Tools has been to speed up this process by automating as many of its steps as possible, and by providing software power tools for debugging metabolic models (a viewpoint that was put forward by our colleague Jeremy Zucker).  We can now create metabolic models using MetaFlux in approximately 1 month.
This blog surveys our recommended procedure for creating metabolic models from sequenced genomes using Pathway Tools.

Thursday, February 26, 2015

Metabolic Modeling to Predict Organism Phenotypes


Here we explore one of the major applications of steady-state metabolic modeling: the prediction of organism growth rates under varying perturbations.  The two most common perturbations studied with metabolic models are variations in the nutrients available to the organism (e.g., changes in carbon source, nitrogen source, and oxygen availability), and the presence of gene knockouts.  These two perturbations can be combined since the effects of gene knockouts can be modeled under different nutrient mixes. 

Friday, January 30, 2015

Metabolic Modeling for Validation of Genome Annotations



A major advance in bioinformatics in the last decade is the rapidity with which we can now create quantitative metabolic models from sequenced genomes.  In this and future blog posts we will examine several applications of metabolic modeling.  This post introduces metabolic modeling, considers its use for validation of genome annotations, and proposes that construction of metabolic models can form a routine part of the genome annotation process.

Thursday, January 22, 2015

Searching for Metabolic Routes in Pathway Tools

The Metabolic Route Search Problem

Consider the problem of performing an in-depth exploration of the metabolic network of an organism that you study, to compare alternative paths within that network whereby the organism can transform a starting metabolite into an ending metabolite.  What are the lengths and properties of these alternative pathways? 

Consider now a broader problem, namely the metabolic-engineering problem of finding the most efficient modification to the biochemical network of an organism to allow the organism to synthesize a new metabolite from a feedstock compound.  One aspect of "most efficient" is minimizing the number of reactions added from an external database of known reactions.

RouteSearch [1] is a Pathway Tools component that solves both of the preceding problems by computing optimal metabolic routes, that is, an optimal series of biochemical reactions that connects start and goal compounds, given various cost parameters to control the optimality of the routes found.  RouteSearch can display several of the best routes it finds using an interactive graphical web page.  When RouteSearch is used for metabolic engineering, it uses the MetaCyc database as its external reaction database.

In computing optimality, RouteSearch takes into account the conservation of nonhydrogen atoms from the start compound to the goal compound. Perhaps surprisingly, it is possible to devise reaction paths that conserve no atoms from start to goal compound!  The more atoms that are conserved, the more efficient the transformation from start to goal.  To compute the number of conserved atoms, RouteSearch uses precomputed atom mappings of reactions that are available in MetaCyc [2]. An atom mapping of a reaction gives a one-to-one correspondence of each nonhydrogen atom from reactants to products.
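
To give a feel for cost-based route finding, here is a toy Python sketch. It is not RouteSearch's actual algorithm: it runs a plain Dijkstra shortest-path search over single-substrate, single-product reactions and ignores atom conservation entirely. The Reaction type, the find_route function and the native_cost and external_cost parameters are all hypothetical names chosen for this illustration.

```python
import heapq
from typing import NamedTuple


class Reaction(NamedTuple):
    name: str
    substrate: str
    product: str
    native: bool  # True if the reaction already exists in the organism


def find_route(reactions, start, goal, native_cost=1.0, external_cost=5.0):
    """Return (total_cost, [reaction names]) for the cheapest route, or None."""
    outgoing = {}
    for rxn in reactions:
        outgoing.setdefault(rxn.substrate, []).append(rxn)

    # Heap items: (cost so far, unique tie-breaker, compound, route taken).
    queue = [(0.0, 0, start, [])]
    settled = set()
    counter = 1
    while queue:
        cost, _, compound, route = heapq.heappop(queue)
        if compound == goal:
            return cost, route
        if compound in settled:
            continue
        settled.add(compound)
        for rxn in outgoing.get(compound, []):
            if rxn.product in settled:
                continue
            # Reactions borrowed from an external database cost more.
            step = native_cost if rxn.native else external_cost
            heapq.heappush(
                queue, (cost + step, counter, rxn.product, route + [rxn.name])
            )
            counter += 1
    return None


# Toy network with made-up compounds A-D and reaction names.
toy = [
    Reaction("r1", "A", "B", True),
    Reaction("r2", "B", "C", True),
    Reaction("r3", "C", "D", True),
    Reaction("x1", "A", "D", False),  # shortcut available only externally
]
```

With the default costs, find_route(toy, "A", "D") prefers the three native reactions over the single external one; lowering external_cost to 2.0 makes the external shortcut the cheaper route.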

RouteSearch is available only in Web mode in Pathway Tools (since version 17.0, March 2013). It is also available at BioCyc.org but without the possibility to add reactions from MetaCyc (that mode is available only for locally installed versions of Pathway Tools). More details on how to use MetaCyc with RouteSearch are given in the following section.