BioCyc and Pathway Tools Blog

Thursday, March 10, 2016

Introducing Pathway Collages...

Figure 1

Pathway Tools has long been recognized for the quality of our automatically generated individual metabolic pathway diagrams, which are intuitive to biologists, can be shown at varying levels of detail, and can be customized in various ways, including with the overlay of omics data. When a more global view is called for, our cellular overview diagram depicts the entire metabolic network for an organism, with capabilities for selective highlighting and overlay of omics data. However, to understand some biochemical situations, viewing a single pathway is insufficient, whereas viewing the entire metabolic network results in information overload. Pathway Collages, new in Pathway Tools version 19.5, are an attempt to bridge this gap, allowing users to create high-quality, customized, user-manipulable diagrams containing collections of user-specified pathways.

Pathway Collages can be explored and edited via the Pathway Collage Viewer web browser application. This application, implemented using the Cytoscape.js open-source JavaScript graph visualization library, supports panning, zooming, and all the editing and customization operations described in this post and the documentation embedded within the Pathway Collage Viewer itself. Feel free to experiment yourself with the example pathway collage online at http://biocyc.org/cytoscape-js/ovsubset.html?graph=example1&showHelp=T, or create your own following the instructions below.

Figure 2

Three example Pathway Collage figures are illustrated here. Figure 1 depicts a Pathway Collage consisting of four E. coli pathways overlaid with gene expression data. This diagram has already been manually adjusted by repositioning the pathways relative to each other and tweaking node font sizes and shapes. Metabolites that are shared between pathways are indicated by drawing connecting lines between them.

Figure 2 shows a collage consisting of two E. coli pathways overlaid with predicted reaction flux data. In this diagram, rather than drawing connecting lines, compounds that are shared between the two pathways are merged, showing glycolysis flowing seamlessly into fermentation.
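The two ways of treating shared metabolites (connecting lines in Figure 1, merged nodes in Figure 2) both start from the same question: which compounds occur in more than one pathway? The short Python sketch below illustrates that step only. The pathway contents are abbreviated, illustrative data, and the function name `shared_metabolites` is hypothetical; it is not part of Pathway Tools or the Pathway Collage Viewer.

```python
from itertools import combinations

# Illustrative, heavily abbreviated pathway contents (an assumption made for
# this example, not data exported from BioCyc).
pathways = {
    "glycolysis": {"glucose", "glucose 6-phosphate", "phosphoenolpyruvate", "pyruvate"},
    "mixed acid fermentation": {"phosphoenolpyruvate", "pyruvate", "acetyl-CoA", "ethanol"},
    "example pathway X": {"citrate", "isocitrate"},
}


def shared_metabolites(pathways):
    """Return {(pathway_a, pathway_b): set of metabolites found in both}."""
    shared = {}
    for a, b in combinations(sorted(pathways), 2):
        common = pathways[a] & pathways[b]
        if common:  # only report pairs that actually share something
            shared[(a, b)] = common
    return shared


links = shared_metabolites(pathways)
for (a, b), common in links.items():
    print(f"{a} <-> {b}: {sorted(common)}")
```

Each reported pair corresponds to a place where a collage could either draw a connecting line or merge the two nodes into one.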

Figure 3

Figure 3 depicts a collage containing a larger number of pathways at a lower zoom level, so metabolite, enzyme and gene names are automatically suppressed (the font size of the pathway labels has been increased so those labels remain visible). In addition to manually repositioning pathways, merging some common nodes, and changing the default colors, some metabolites of interest have been highlighted in purple.

Now that you've seen what you can do with a Pathway Collage, how can you create one for yourself? Pathway Collages can be created from either the BioCyc website (or other Pathway Tools-based website) or from desktop Pathway Tools. There are five basic steps.

1. Specify the set of pathways to be included. The simplest and most reliable way to specify a set of pathways is to generate a SmartTable containing the desired pathways, and then export the SmartTable to a Pathway Collage. This works for both the desktop and web versions of Pathway Tools, and lets you keep your list of pathways around in case you ever want to edit it or regenerate your collage. There are other ways to specify a set of pathways, such as by interactively clicking on them in the cellular overview diagram (desktop only), from an omics dataset (web only), or by creating a seed collage from a single pathway and then interactively adding more (web only). We may add additional options to specify pathways in the future. Consult the documentation for more details.
2. Export to Pathway Collage Viewer. Pathway Tools will compute automatic layouts of the individual pathways within the collage, then position those diagrams next to one another horizontally, and send that initial layout of the collage to the Pathway Collage Viewer application in your web browser.
3. Interactively refine and customize the collage. This can involve repositioning items, showing connections, adding, deleting or merging elements, editing labels, highlighting elements of interest, and/or customizing node and edge styles. By default, only the metabolites along the main backbone of a pathway are included in the diagrams, but side metabolites can be added interactively. Additional pathways involving a metabolite of interest can also be added interactively.
4. Import omics data to be visualized on the collage (optional). Omics data can be added either before or after the collage is generated. The collage can display omics data associated with genes, metabolites, or reactions. When multi-timepoint gene expression data is displayed, the display of enzyme names is suppressed.
5. Save or export the collage. At any time, a pathway collage can be saved as a JSON-format graph file on your computer; that file can later be loaded back into the collage viewer. Not all browsers support this operation, so we recommend using Chrome or Firefox. A pathway collage can also be exported to a PNG-format image file for use in presentations or publications. The image is generated with a resolution comparable to that of the display at the time the image is created (up to some maximum), so the highest-quality images are obtained if the collage is displayed at a high zoom level when exporting.
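Because the saved collage is a JSON graph file, it can in principle be inspected outside the viewer. The sketch below is a guess at how that might look, not a description of the actual file format: it assumes a Cytoscape.js-style layout with an `elements` object holding `nodes` and `edges` lists, each entry carrying a `data` object. The field names (`elements`, `data`, `id`, `label`, `source`, `target`) and the sample content are assumptions; check a real saved file before relying on them.

```python
import json

# ASSUMPTION: a Cytoscape.js-style layout. The real Pathway Collage file may
# use different or additional fields; this sample is invented for illustration.
SAMPLE = """
{
  "elements": {
    "nodes": [
      {"data": {"id": "n1", "label": "pyruvate"}},
      {"data": {"id": "n2", "label": "acetyl-CoA"}}
    ],
    "edges": [
      {"data": {"id": "e1", "source": "n1", "target": "n2"}}
    ]
  }
}
"""


def summarize_collage(text):
    """Count nodes and edges and collect node labels from a JSON graph string."""
    graph = json.loads(text)
    elements = graph.get("elements", {})
    nodes = elements.get("nodes", [])
    edges = elements.get("edges", [])
    # Fall back to the node id when a node has no label.
    labels = [n["data"].get("label", n["data"]["id"]) for n in nodes]
    return {"node_count": len(nodes), "edge_count": len(edges), "labels": labels}


summary = summarize_collage(SAMPLE)
print(summary)
```

To try this on a real file, read its contents with `open(path).read()` and pass the text to `summarize_collage`, adjusting the field names to whatever the file actually contains.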

For more information on Pathway Collages, see the Pathway Tools Website User Guide or the help documentation within the Pathway Collage Viewer itself.

Monday, November 16, 2015

Everything you always wanted to know about the Enzyme Commission Part II

In this blog we will discuss a few more aspects of the Enzyme Commission and its classification work that were not covered in the previous blog.

Scope of Enzyme Classification

The classification system used by the EC aims to cover enzymes that fall under one of the following six broad categories:

Class 1: Oxidoreductases

Class 2: Transferases

Class 3: Hydrolases

Class 4: Lyases

Class 5: Isomerases

Class 6: Ligases

As you can see, transporters are not covered by the EC list unless they also catalyze an additional reaction that falls under one of these categories (e.g. the phosphoenolpyruvate-dependent phosphotransferase transporters known as PTS). While peptidases fit under class 3, the Enzyme Commission has limited the classification of peptidases in recent years due to the difficulty in drafting reactions that accurately describe the peptidase specificity.

Principles of Classification

Each top class contains several subclasses. For example, Class 4 contains the subclasses 4.1 carbon-carbon lyases, 4.2 carbon-oxygen lyases, 4.3 carbon-nitrogen lyases, etc. The subclasses, in turn, contain sub-subclasses, e.g. 4.1.1, carboxy-lyases. The sub-subclass in which an enzyme resides defines the first three fields in the enzyme’s EC number. The fourth and last field is simply a serial number within that sub-subclass.
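The four-field structure described above is easy to take apart programmatically. The following Python sketch splits an EC number into class, subclass, sub-subclass, and serial number, using the six class names listed earlier in this post. The names `ECNumber`, `parse_ec`, and `describe` are hypothetical helpers written for this example, and the serial field is kept as text on the assumption that it is only an identifier.

```python
from typing import NamedTuple

# The six top-level classes listed in this post.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


class ECNumber(NamedTuple):
    enzyme_class: int
    subclass: int
    sub_subclass: int
    serial: str  # kept as text: it is just a serial identifier


def parse_ec(text):
    """Parse strings such as 'EC 4.1.1.1' or '4.1.1.1' into an ECNumber."""
    s = text.strip()
    if s.upper().startswith("EC"):
        s = s[2:].strip()
    parts = s.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {text!r}")
    enzyme_class, subclass, sub_subclass = (int(p) for p in parts[:3])
    if enzyme_class not in EC_CLASSES:
        raise ValueError(f"unknown top-level class in {text!r}")
    return ECNumber(enzyme_class, subclass, sub_subclass, parts[3])


def describe(ec):
    """Render the class name, the sub-subclass prefix, and the serial number."""
    prefix = f"{ec.enzyme_class}.{ec.subclass}.{ec.sub_subclass}"
    return f"{EC_CLASSES[ec.enzyme_class]}, sub-subclass {prefix}, serial {ec.serial}"


print(describe(parse_ec("EC 4.1.1.1")))
```

The first three fields carry the classification, so two enzymes whose EC numbers share those fields belong to the same sub-subclass.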

The subclasses and sub-subclasses sometimes contain the numbers 98 and 99. In general, when both of those numbers exist under the same parent class, 98 is reserved for well-characterized enzymes that do not fit the other subclasses, while 99 indicates some uncertainty about the enzyme (for example, when the identity of an electron acceptor is not known).

The principles of classification are too complex to describe here. They are described in detail at http://enzyme-database.org/rules.php.

Most of the enzymes fit well in one of the existing sub-subclasses. However, some enzymes catalyze complex reactions that do not fit any particular class. In other cases an enzyme might fit in more than one class. In these cases the commission members need to discuss the issue and decide, and occasionally a new sub-subclass is defined.

What Is The Process of Classifying An Enzyme?

Members of the Enzyme Commission create new entries using an online system called DraftEnz, which was developed by A. McDonald. The members define the exact sub-subclass to which the enzyme belongs, and at this point the entry receives a temporary internal serial number (e.g. 3.1.3.d). The new entry is reviewed by the other members of the commission, who may suggest modifications to any part of the entry. When a member is satisfied with the entry, he or she may vote for it, and when an entry has received at least two non-author votes, it is ready to move to the next stage, which is internal review.

When a sufficient number of new entries have received the necessary votes, a batch of new entries is moved to internal review, at which time they can be viewed at a dedicated web page, and receive their final serial numbers. All the members of the commission are requested to review them. The internal review process ensures that all members get to review all entries, and problems that were not caught earlier are likely to be spotted.

After one month at internal review, the entries are moved to public review. At this stage the entries are visible to the public at the ExplorEnz website by clicking on the tab “New/Amended Enzymes”. The entries are kept at this stage for another month to allow sufficient time for the community to provide feedback. Once the entries clear this stage, they are moved to ExplorEnz and become official.

Some Statistics

In addition to creating new entries, the commission often revises older entries to reflect newer information that has been generated after the entries were created. Existing entries can be revised, deleted, and sometimes transferred to a different EC number. Entries are transferred if new information shows that the reaction catalyzed by the enzyme is different than what was previously thought, requiring the classification under a different sub-subclass, or if new information shows that the enzyme is identical to an enzyme that is classified under a different EC number.

Currently there are 5638 entries in the EC list of enzymes. This number does not include 664 entries that have been transferred and 303 entries that have been deleted.

Since 2010 the commission has created or modified 2221 entries. This is an impressive number for a small group of volunteers, but it is probably a drop in the bucket considering the vast number of well-characterized enzymes that have not been classified yet.

What You Can Do to Help

If you would like to help, it is straightforward to create a new EC entry! You do not even have to suggest the sub-subclass (although you can if you would like). Take a look at a few of the EC entries to get familiar with the format. Then, go to http://enzyme-database.org/forms.php and fill out the form for a new submission. Just make sure you read the information at the beginning of the form, which explains what the requirements are.

Wednesday, November 4, 2015

Everything you always wanted to know about the Enzyme Commission

If you have used BioCyc, you probably noticed that many reactions have EC numbers printed next to them. EC numbers are everywhere – in the primary literature, in annotated genomes, in databases, in online encyclopedias. Where do they come from and what exactly do they mean?

A Bit of History

In the early days the naming of enzymes was not systematic. As a result, many different enzymes were given the same name and, on the other hand, several different names were assigned to the same enzyme. Many of the names were not particularly helpful; for example, the enzyme now known as EC 1.6.99.1, NADPH dehydrogenase, was originally named “old yellow enzyme”.

To sort out the mess, Dixon and Webb introduced a classification system in their 1958 book “Enzymes”, which was based on the reaction catalyzed by the enzyme. Although it was rather limited, it provided the foundation for the current classification system. At about the same time, the International Union of Biochemistry decided to form an official international commission on enzymes to develop a better classification and naming system. The first full report of the commission was published in 1965, using a six-category system that is still used today. Although this is not the place to describe classification principles, in general each enzyme receives a unique four-component identification number that not only identifies it, but also provides insight into the enzymatic activity of the enzyme. Each EC entry provides additional information such as lists of names and synonyms, references, and often commentary. Full details about the principles of the classification system can be found at http://enzyme-database.org/rules.php and https://en.wikipedia.org/wiki/Enzyme_Commission_number.

The Present

Fast forward 50 years, and the Enzyme Commission (EC) is still going strong. The importance and usefulness of the EC numbers has only increased with time. With the explosion in sequencing volume, having an accurate genome annotation has become critical, and EC numbers provide a well-defined, unambiguous method for annotation of enzyme function. Software packages such as Pathway Tools make the most out of this information, assigning the appropriate reactions to the annotated genes based on their EC numbers when building metabolic networks for newly-sequenced genomes. The content of the enzyme list, which used to be published in books, is made available through two online databases that are updated several times a year. A searchable MySQL version of the database, including downloadable data in multiple formats, is available at the ExplorEnz database at http://enzyme-database.org. More than 5600 enzymes are currently classified, and hundreds are added each year.

Who is the EC?

The Enzyme Commission is now part of the IUPAC-IUBMB Joint Commission on Biochemical Nomenclature (JCBN). It consists of a small number of experts who volunteer their time to the project. Active members (listed alphabetically) include K. Axelsen (Switzerland), R. Cammack (UK), R. Caspi (USA), M. Kotera (Japan), A. McDonald (Ireland), G.P. Moss (UK), D. Schomburg (Germany), I. Schomburg (Germany), and K.F. Tipton (Ireland). The commission members are using an online curation system that was developed by A. McDonald, called ExplorEnz. Members of the committee continue to classify new enzymes, modify existing entries as new information becomes available, and extend or modify the classification rules to accommodate new challenges.

If you would like to request a new EC entry for an enzyme that hasn’t been classified yet, or submit an error or update report about an existing entry, submission forms are available at http://enzyme-database.org/forms.php. Since MetaCyc curator R. Caspi is a member of the EC, you are also welcome to send EC-related questions or comments to biocyc-support@AI.SRI.COM.

Additional Information

Dixon, M. and Webb, E.C. (1958) Enzymes. Longmans Green, London, pp. 183–227.
Tipton, K. and Boyce, S. (2000) History of the enzyme nomenclature system. Bioinformatics, 16, 34–40.
McDonald, A.G., Boyce, S. and Tipton, K.F. (2009) ExplorEnz: the primary source of the IUBMB enzyme list. Nucleic Acids Res, 37, D593–597.
McDonald, A.G. and Tipton, K.F. (2014) Fifty-five years of enzyme classification: advances and difficulties. FEBS J, 281, 583–592.

Friday, April 24, 2015

A New Curated BioCyc Database for Clostridium difficile

Peptoclostridium (Clostridium) difficile (commonly nicknamed “Cdiff”) is a spore-forming bacterium that causes serious healthcare-associated infections. In the United States alone, it is estimated that Cdiff infections were responsible for more than 29,000 deaths in 2011¹. Antibiotic resistance and recurrent infections are common problems in treating Cdiff infections.

The BioCyc collection currently contains twelve Clostridium/Peptoclostridium difficile databases; all of them can be easily accessed from a new home page, http://cdifficile.biocyc.org/. We chose the database for a strain commonly used in the laboratory, Peptoclostridium difficile 630, for a pilot project to update the genome annotation and to add literature curation.

Querying Databases by Organism Properties

The latest release (version 19.0) of BioCyc includes PGDBs for 5500 different organisms, and we expect that number to grow with every future release. With such numbers, unless you already have a specific species and strain in mind, it becomes impractical to browse through the complete list of organisms. We already allow users of the BioCyc website to select organisms specifically by name or taxonomic class. We describe here extensions to that selection process that enable users to search for organisms based on a larger set of properties of the organism, such as when and where the sample was collected and what kind of environment it lives in.

Procedure for Creating Metabolic Models from Sequenced Genomes

In the past, construction of quantitative metabolic flux models has been an extremely time-consuming process, requiring 12-18 months to create a bacterial model. One of our main goals in designing the MetaFlux module for creating metabolic models within Pathway Tools has been to speed up this process by automating as many of its steps as possible, and by providing software power tools for debugging metabolic models (a viewpoint that was put forward by our colleague Jeremy Zucker). We can now create metabolic models using MetaFlux in approximately 1 month.

This blog surveys our recommended procedure for creating metabolic models from sequenced genomes using Pathway Tools.

Metabolic Modeling to Predict Organism Phenotypes

Here we explore one of the major applications of steady-state metabolic modeling: the prediction of organism growth rates under varying perturbations. The two most common perturbations studied with metabolic models are variations in the nutrients available to the organism (e.g., changes in carbon source, nitrogen source, and oxygen availability), and the presence of gene knockouts. These two perturbations can be combined since the effects of gene knockouts can be modeled under different nutrient mixes.
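For readers who have not seen it written out, steady-state growth prediction is commonly posed as a linear program (flux balance analysis). The formulation below is the standard textbook form, given here as background rather than as a description of how MetaFlux sets up its problem. The symbols are notation introduced for this sketch: $S$ is the stoichiometric matrix (rows are metabolites, columns are reactions), $v$ is the vector of reaction fluxes, $c$ selects the biomass (growth) reaction, and $l_j$, $u_j$ are lower and upper bounds on flux $v_j$.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& l_j \le v_j \le u_j \quad \text{for every reaction } j .
\end{aligned}
```

In this picture, both perturbations are changes to the bounds. A different nutrient mix changes $l_j$ and $u_j$ for the uptake reactions, and a gene knockout sets $l_j = u_j = 0$ for every reaction that can no longer be catalyzed. Combining the two simply means applying both sets of bound changes before solving.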

Metabolic Modeling for Validation of Genome Annotations

A major advance in bioinformatics in the last decade is the rapidity with which we can now create quantitative metabolic models from sequenced genomes. In this and future blog posts we will examine several applications of metabolic modeling. This post introduces metabolic modeling, considers its use for validation of genome annotations, and proposes that construction of metabolic models can form a routine part of the genome annotation process.

Searching for Metabolic Routes in Pathway Tools

The Metabolic Route Search Problem

Consider the problem of performing an in-depth exploration of the metabolic network of an organism that you study, to compare alternative paths within that network whereby the organism can transform a starting metabolite into an ending metabolite. What are the lengths and properties of these alternative pathways?

Consider now a broader problem, namely the metabolic-engineering problem of finding the most efficient modification to the biochemical network of an organism to allow the organism to synthesize a new metabolite from a feedstock compound. One aspect of "most efficient" is minimizing the number of reactions added from an external database of known reactions.

RouteSearch [1] is a Pathway Tools component that solves both of the preceding problems by computing optimal metabolic routes, that is, an optimal series of biochemical reactions that connects start and goal compounds, given various cost parameters to control the optimality of the routes found. RouteSearch can display several of the best routes it finds using an interactive graphical web page. When RouteSearch is used for metabolic engineering, it uses the MetaCyc database as its external reaction database.

In computing optimality, RouteSearch takes into account the conservation of nonhydrogen atoms from the start compound to the goal compound. Perhaps surprisingly, it is possible to devise reaction paths that conserve no atoms from start to goal compound! The more atoms that are conserved, the more efficient the transformation from start to goal. To compute the number of conserved atoms, RouteSearch uses precomputed atom mappings of reactions that are available in MetaCyc [2]. An atom mapping of a reaction gives a one-to-one correspondence of each nonhydrogen atom from reactants to products.
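To make the idea of cost-controlled route finding concrete, here is a deliberately simplified Python sketch. It is not the RouteSearch algorithm: it runs a plain Dijkstra shortest-path search over a toy compound graph and charges each step a per-reaction cost, a penalty for atoms lost in that step, and an extra cost for reactions taken from an external database. RouteSearch itself tracks atom conservation across the whole route using atom mappings, which this sketch does not attempt. All compound names, cost parameter names, and numbers are hypothetical.

```python
import heapq

# Hypothetical toy network: (substrate, product, atoms_lost, is_foreign).
# "is_foreign" marks a reaction that would have to be added from an
# external reaction database.
REACTIONS = [
    ("A", "B", 0, False),
    ("B", "D", 0, False),
    ("A", "C", 2, False),
    ("C", "D", 0, False),
    ("A", "D", 0, True),
]


def edge_cost(atoms_lost, is_foreign, reaction_cost=1.0,
              atom_loss_cost=1.0, foreign_cost=5.0):
    """Cost of one reaction step under the illustrative cost parameters."""
    cost = reaction_cost + atom_loss_cost * atoms_lost
    if is_foreign:
        cost += foreign_cost
    return cost


def best_route(start, goal, reactions, **costs):
    """Return (total_cost, [compounds]) for the cheapest route, or None."""
    graph = {}
    for sub, prod, lost, foreign in reactions:
        graph.setdefault(sub, []).append((prod, edge_cost(lost, foreign, **costs)))
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, compound, path = heapq.heappop(queue)
        if compound == goal:
            return cost, path
        if compound in settled:
            continue
        settled.add(compound)
        for nxt, step in graph.get(compound, []):
            if nxt not in settled:
                heapq.heappush(queue, (cost + step, nxt, path + [nxt]))
    return None


native = best_route("A", "D", REACTIONS)
engineered = best_route("A", "D", REACTIONS, foreign_cost=0.5)
print(native)
print(engineered)
```

With the default costs the search prefers the two-step native route through B; lowering the penalty for foreign reactions makes the single added reaction cheaper. This mirrors how cost parameters steer which routes are considered optimal.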

RouteSearch is available only in Web mode in Pathway Tools (since version 17.0, March 2013). It is also available at BioCyc.org but without the possibility to add reactions from MetaCyc (that mode is available only for locally installed versions of Pathway Tools). More details on how to use MetaCyc with RouteSearch are given in the following section.

Propagating Updates from MetaCyc -- Nearly Effortless Improvements to your PGDB!

New versions of Pathway Tools are released every six months, with the current version being 18.5. Included in each new release is an updated version of the MetaCyc database. The curators at SRI are constantly working to improve MetaCyc, both adding new information, and fixing errors in the existing data. Here are some of the kinds of changes you can expect to see with each new release:

New compounds, reactions, enzymes and pathways
Addition of compound structures to existing compounds that previously lacked them
Fixes to errors and other improvements to existing compound structures
Updates to reaction equations to fix errors and so that they balance and are correctly protonated
Updates to pathways to fix errors and incorporate new knowledge
Updates and addition of EC Numbers, names, literature citations, links to other databases, and textual summaries.

Any new PGDB you create will have its reactions, pathways and metabolites imported from the most recent version of MetaCyc and therefore benefit from all the latest changes. But what about your old PGDBs? If they were created with an older version of MetaCyc, then more and more of their information will become outdated over time. When you open an old PGDB in a new version of Pathway Tools, you will be asked to upgrade it. Upgrading applies schema changes that are necessary in order for the PGDB to be able to operate with the new software, but it will not incorporate the MetaCyc data updates. To incorporate MetaCyc updates into your PGDB, you must invoke the Tools → Propagate MetaCyc Data Updates command for each of your PGDBs.

Why should I use this tool?

Aside from the obvious benefits of fixing errors in your PGDB, there are several reasons why it is a good idea to propagate MetaCyc updates with every new release.

It will allow more of your reactions to balance, improving atom mapping and Route Search computations.
It will greatly improve the process and results of building metabolic models via flux-balance analysis -- unbalanced reactions are automatically excluded from metabolic models by MetaFlux.
It will facilitate apples-to-apples comparisons with other PGDBs built with (or updated to) the latest version of MetaCyc.
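Since balanced reactions matter for both Route Search and metabolic models, it may help to see what an element-balance check involves. The Python sketch below counts atoms on each side of a reaction from simple chemical formulas. It is an illustration only, not the check Pathway Tools performs: it handles plain formulas such as `C6H12O6` (no parentheses, no charges), so it says nothing about protonation or charge balance. The function names are hypothetical.

```python
import re
from collections import Counter


def atom_counts(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no parentheses or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_totals(side):
    """Sum atom counts over one side of a reaction: [(coefficient, formula), ...]."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in atom_counts(formula).items():
            total[element] += coefficient * n
    return total


def is_balanced(reactants, products):
    """True when every element occurs equally often on both sides."""
    return side_totals(reactants) == side_totals(products)


# Complete oxidation of glucose: C6H12O6 + 6 O2 -> 6 CO2 + 6 H2O
balanced = is_balanced([(1, "C6H12O6"), (6, "O2")], [(6, "CO2"), (6, "H2O")])
# The same reaction with the coefficients left out does not balance.
unbalanced = is_balanced([(1, "C6H12O6"), (1, "O2")], [(1, "CO2"), (1, "H2O")])
print(balanced, unbalanced)
```

A reaction that fails a check of this kind is the sort of entry that would be excluded from a flux-balance model until its equation is corrected.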

BioCyc and Pathway Tools Blog

Thursday, March 10, 2016

Introducing Pathway Collages...

Monday, November 16, 2015

Everything you always wanted to know about the Enzyme Commission Part II

Wednesday, November 4, 2015

Everything you always wanted to know about the Enzyme Commission

Friday, April 24, 2015

A New Curated BioCyc Database for Clostridium difficile

Wednesday, April 15, 2015

Querying Databases by Organism Properties

Friday, March 6, 2015

Procedure for Creating Metabolic Models from Sequenced Genomes

Thursday, February 26, 2015

Metabolic Modeling to Predict Organism Phenotypes

Friday, January 30, 2015

Metabolic Modeling for Validation of Genome Annotations

Thursday, January 22, 2015

Searching for Metabolic Routes in Pathway Tools

The Metabolic Route Search Problem

Friday, December 5, 2014

Propagating Updates from MetaCyc -- Nearly Effortless Improvements to your PGDB!

Why should I use this tool?
