Friday, December 5, 2014

Propagating Updates from MetaCyc -- Nearly Effortless Improvements to your PGDB!

New versions of Pathway Tools are released every six months, with the current version being 18.5. Included in each new release is an updated version of the MetaCyc database. The curators at SRI are constantly working to improve MetaCyc, both by adding new information and by fixing errors in the existing data. Here are some of the kinds of changes you can expect to see with each new release:
  • New compounds, reactions, enzymes and pathways
  • Addition of compound structures to existing compounds that previously lacked them
  • Fixes to errors and other improvements to existing compound structures
  • Updates to reaction equations to fix errors and to ensure that they balance and are correctly protonated
  • Updates to pathways to fix errors and incorporate new knowledge
  • Updates to, and additions of, EC numbers, names, literature citations, links to other databases, and textual summaries
Any new PGDB you create will have its reactions, pathways and metabolites imported from the most recent version of MetaCyc, and will therefore benefit from all the latest changes. But what about your old PGDBs? If they were created with an older version of MetaCyc, more and more of their information will become outdated over time. When you open an old PGDB in a new version of Pathway Tools, you will be asked to upgrade it. Upgrading applies the schema changes that the PGDB needs in order to work with the new software, but it does not incorporate the MetaCyc data updates. To incorporate MetaCyc updates, you must invoke the Tools → Propagate MetaCyc Data Updates command for each of your PGDBs.

Why should I use this tool?

Aside from the obvious benefits of fixing errors in your PGDB, there are several reasons why it is a good idea to propagate MetaCyc updates with every new release.
  • It will allow more of your reactions to balance, improving atom mapping and Route Search computations.
  • It will greatly improve the process and results of building metabolic models via flux-balance analysis -- unbalanced reactions are automatically excluded from metabolic models by MetaFlux.
  • It will facilitate apples-to-apples comparisons with other PGDBs built with (or updated to) the latest version of MetaCyc.
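To make the point about balance concrete, here is a minimal Python sketch of an elemental balance check. It is not how Pathway Tools or MetaFlux performs the check; the data layout (lists of coefficient/composition pairs) and the function names `element_totals` and `is_balanced` are assumptions made purely for illustration. The example shows how a single missing proton, the kind of error a MetaCyc update might fix, leaves a reaction unbalanced.

```python
def element_totals(side):
    """Sum atom counts over one side of a reaction.

    `side` is a list of (coefficient, composition) pairs, where
    `composition` maps an element symbol to its atom count.
    This layout is hypothetical, chosen for the example only.
    """
    totals = {}
    for coefficient, composition in side:
        for element, count in composition.items():
            totals[element] = totals.get(element, 0) + coefficient * count
    return totals


def is_balanced(left, right):
    """A reaction balances when both sides contain the same atoms."""
    return element_totals(left) == element_totals(right)


# CO2 + H2O -> HCO3- + H+
co2 = {"C": 1, "O": 2}
water = {"H": 2, "O": 1}
bicarbonate = {"H": 1, "C": 1, "O": 3}
proton = {"H": 1}

left = [(1, co2), (1, water)]
right_with_proton = [(1, bicarbonate), (1, proton)]
right_missing_proton = [(1, bicarbonate)]

print(is_balanced(left, right_with_proton))
print(is_balanced(left, right_missing_proton))
```

A real check would also need to compare net charge on both sides; that is omitted here to keep the sketch short.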



What does this tool do?

Every compound, reaction and pathway in your PGDB is compared with the corresponding object in MetaCyc. Where the two differ, the differences are classified according to type. The user is presented with a dialog that itemizes the differences by type and offers the opportunity to examine them and propagate the MetaCyc data either individually or en masse. If an object has no counterpart in MetaCyc, the tool searches for a suitable MetaCyc object to merge it with; if one is found, that option is also presented to the user.
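The compare-and-classify step can be pictured with a small Python sketch. This is only an illustration of the idea, not the tool's actual implementation: the dictionary layout, the slot names (`structure`, `equation`, `names`), the object IDs and the function `classify_differences` are all hypothetical.

```python
def classify_differences(pgdb, metacyc, slots=("structure", "equation", "names")):
    """Group object IDs by the slot in which the PGDB and MetaCyc disagree.

    Objects with no counterpart in MetaCyc are collected separately,
    since they need different handling (for example, a merge candidate).
    """
    report = {slot: [] for slot in slots}
    report["no-counterpart"] = []
    for object_id, local in pgdb.items():
        reference = metacyc.get(object_id)
        if reference is None:
            report["no-counterpart"].append(object_id)
            continue
        for slot in slots:
            if local.get(slot) != reference.get(slot):
                report[slot].append(object_id)
    return report


# Hypothetical data: two objects shared with MetaCyc, one local-only object.
pgdb = {
    "CPD-A": {"structure": "C6H12O5", "names": ["compound A"]},
    "RXN-B": {"equation": "A -> B", "names": ["reaction B"]},
    "CPD-LOCAL": {"structure": "C2H6O", "names": ["curator-added compound"]},
}
metacyc = {
    "CPD-A": {"structure": "C6H12O6", "names": ["compound A"]},
    "RXN-B": {"equation": "A -> B", "names": ["reaction B", "B synthase reaction"]},
}

report = classify_differences(pgdb, metacyc)
for category, object_ids in report.items():
    print(category, object_ids)
```

Grouping by category rather than by object is what makes it possible to approve a whole class of changes at once while reviewing another class item by item.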


A dialog summarizing the differences between a PGDB and MetaCyc.  In general, clicking on a Propagate All button applies all the MetaCyc updates for that category.  Clicking on a Select for Update button opens a detail dialog, like the one shown below.

A detail dialog for compound structures, allowing the user to select a subset to be updated.

Why does this tool not run automatically when I upgrade my PGDB?

There are two possible reasons why an object might be different between your PGDB and MetaCyc: either the object has been updated in MetaCyc, or it has been updated by you in your PGDB. There are two possible reasons why an object might exist in your PGDB but not in MetaCyc: either it has been created by you, or it has been deleted (or merged with another object) in MetaCyc. The software cannot distinguish between these cases, and we do not want to risk overwriting any manual curation you might have done. Thus, even when using this tool, changes are not propagated automatically -- the user must approve them (although it is very easy to approve large classes of changes with one click). Because of the manual oversight required, we cannot include this process in the automated upgrade operation.
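The ambiguity can be shown with a short Python sketch. It assumes, for illustration only, that the comparison sees nothing but the current value in the PGDB and the current value in MetaCyc; the function `observed_state` and the formula strings are hypothetical.

```python
def observed_state(original, pgdb_edit=None, metacyc_edit=None):
    """Return the (PGDB value, MetaCyc value) pair visible at comparison time.

    Both databases start from `original`; either side may later change it.
    """
    pgdb_value = pgdb_edit if pgdb_edit is not None else original
    metacyc_value = metacyc_edit if metacyc_edit is not None else original
    return pgdb_value, metacyc_value


# History A: MetaCyc corrected a formula after the PGDB was built.
history_a = observed_state(original="C6H12O5", metacyc_edit="C6H12O6")

# History B: a curator deliberately changed the formula in the PGDB.
history_b = observed_state(original="C6H12O6", pgdb_edit="C6H12O5")

# Both histories leave exactly the same pair of values behind.
print(history_a == history_b)
```

In history A, propagating the MetaCyc value is the right call; in history B, it would discard the curator's work. Since the two are indistinguishable from the values alone, the decision is left to the user.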

Will this tool bring in new compounds, reactions and pathways from MetaCyc?

No new pathways will be imported into your PGDB from MetaCyc. If you would like that to happen, invoke the Pathologic Refine → Rescore Pathways command instead (or, preferably, as well). The only new compounds and reactions that will be imported into your PGDB by this tool will be those belonging to updated pathways and reactions.

How can I get more information?

Operation of this tool is described in more detail in the Pathway Tools User Guide, Section 7.10 Updating a PGDB to Incorporate Updates from MetaCyc.
