A continuing research project is dedicated to developing mathematical and software infrastructure in support of post-genomics research in systems biology. One near-term objective of the project is to contribute to a deeper understanding of the organizational principles of biological networks. A distinguishing theme of this project is a focus on scalable methods of robustness analysis and on theoretically sound methods of using experimental data to validate (or invalidate) models; this theme stands in contrast to the previously prevalent practice of relying purely on simulation.

A central goal of modeling and simulation in systems biology is to connect molecular mechanisms to network functions and, in turn, to questions of biomedical relevance. Unfortunately, many of the most critical questions involve events that are extremely rare at the level of individual cells in an organism, yet are catastrophic to the organism as a whole. Consequently, simulation methods that may be adequate for studying generic or typical behavior are inadequate for exploring worst-case scenarios, which are computationally intractable using conventional methods. To overcome this limitation, the present project is extending best-practice software tools and algorithms for robustness analysis, which have become standards in engineering, to models of biological relevance, which are typically nonlinear, hybrid, uncertain, and stochastic. This effort includes integration of formal inference methods from previously fragmented theories in computer science with those of control and dynamical systems.

The theoretical framework being developed in this project represents an unprecedented opportunity to create a system for analysis and validation (or invalidation) of models and for iterative experimentation on models that may be of a large-scale, stochastic, nonlinear, nonequilibrium, and mixed continuous and discrete nature with multiple time and spatial scales. The remarkable quality of the theoretical framework is that it can be used to prove conjectures regarding such complex, difficult-to-analyze models. Examples of conjectures that can be proven are (1) a given model cannot fit the experimental data, no matter what parameters are used and (2) a given model is robust, no matter how its parameters are varied. Heretofore, there has been no way of proving such conjectures except in cases of much simpler models. The combination of the capability of proving such conjectures and sophisticated robustness analysis methods is what is needed to make it possible to analyze realistic biological models and relate them to experimental data to help answer the question, “What is the next experiment that would best differentiate among the current alternative hypotheses?”
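To make conjecture (1) concrete, the following is a minimal toy sketch, not the project's algorithms. It assumes a hypothetical one-parameter model y = k·x with k ≥ 0 and a hypothetical measurement tolerance, and it shows how the data can rule out every parameter value at once rather than by sampling parameters in simulation. The function name `feasible_k` and the data values are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]


def feasible_k(data: List[Tuple[float, float]], tol: float) -> Optional[Interval]:
    """Return the interval of k >= 0 for which y = k*x matches every
    (x, y) data point to within tol, or None if no such k exists.

    Toy illustration only: assumes every x is strictly positive.
    """
    lo, hi = 0.0, float("inf")
    for x, y in data:
        if x <= 0:
            raise ValueError("this sketch assumes x > 0")
        # |k*x - y| <= tol with x > 0  <=>  (y - tol)/x <= k <= (y + tol)/x
        lo = max(lo, (y - tol) / x)
        hi = min(hi, (y + tol) / x)
    # An empty intersection is a proof that no parameter value fits the data.
    return (lo, hi) if lo <= hi else None


# Hypothetical measurements that a proportional model cannot explain.
inconsistent = [(1.0, 2.0), (2.0, 1.0)]
# Hypothetical measurements that are compatible with k close to 2.
consistent = [(1.0, 2.0), (2.0, 4.0)]

invalidated = feasible_k(inconsistent, tol=0.1) is None
surviving = feasible_k(consistent, tol=0.1)
```

For the first data set, each point confines k to an interval, and the two intervals do not overlap, so `feasible_k` returns `None`: the model is invalidated for all parameter values. Realistic biological models are nonlinear and high-dimensional, so this simple interval intersection does not carry over; it only illustrates what proving such a conjecture means.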

An especially notable recent product of this continuing development effort is the Systems Biology Markup Language (SBML), a machine-readable language that provides a format, based on Extensible Markup Language (XML), for representing models in such a way that they can be executed within, and exchanged among, different software systems. By using SBML to define their input and output formats, different software tools can all operate on an identical representation of a model, removing opportunities for errors in translation and ensuring a common starting point for analyses and simulations.

SBML can encode models representing biochemical entities (species) linked by reactions to form networks. An important principle is that a model is decomposed into explicitly labeled constituent elements, the set of which resembles a verbose rendition of chemical-reaction equations. The representation deliberately does not cast the model directly into a set of differential equations or other specific interpretation of the model. This explicit, modeling framework-agnostic decomposition makes it easier for a software tool to interpret the model and translate the SBML form into whatever internal form the tool uses. The formalisms in SBML enable modeling of a wide range of biological phenomena, including (and not limited to) metabolism, cell signaling, and gene regulation. SBML affords significant flexibility and power by making it possible to define arbitrary formulae for rates of change of variables and to express other constraints mathematically.
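The decomposition described above can be sketched with a small hand-written fragment. The example below assumes SBML Level 2 Version 4 element names and namespace, uses a hypothetical model (`toy`, species `S` and `P`, reaction `conversion`), and reads it with Python's standard XML parser rather than libSBML. The fragment has not been validated against the SBML schema, and it omits kinetic laws for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for SBML Level 2 Version 4.
SBML_NS = "http://www.sbml.org/sbml/level2/version4"

# Hypothetical model: one reaction converting species S into species P.
TOY_MODEL = """
<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="S" compartment="cell" initialConcentration="1"/>
      <species id="P" compartment="cell" initialConcentration="0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="conversion" reversible="false">
        <listOfReactants><speciesReference species="S"/></listOfReactants>
        <listOfProducts><speciesReference species="P"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""


def summarize(sbml_text):
    """Return (species ids, {reaction id: (reactant ids, product ids)})."""
    ns = {"s": SBML_NS}
    root = ET.fromstring(sbml_text.strip())
    species = [sp.get("id") for sp in root.findall(".//s:species", ns)]
    reactions = {}
    for rx in root.findall(".//s:reaction", ns):
        reactants = [ref.get("species") for ref in
                     rx.findall("s:listOfReactants/s:speciesReference", ns)]
        products = [ref.get("species") for ref in
                    rx.findall("s:listOfProducts/s:speciesReference", ns)]
        reactions[rx.get("id")] = (reactants, products)
    return species, reactions


species_ids, reaction_map = summarize(TOY_MODEL)
```

Nothing in the fragment says whether the reaction is to be simulated as a differential equation or as a stochastic process; a tool reading it is free to build whichever internal form it uses from the labeled species and reactions.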

The software infrastructure of SBML includes libSBML, an embeddable software library that provides an application programming interface (API) for working with SBML in the C, C++, Java, Perl, MATLAB, Lisp, and Python programming languages. Programmers can embed libSBML in their application programs, saving themselves the work of implementing their own parsing, manipulation, and validation software. libSBML is open-source software written in C and C++ and is highly portable. It is currently supported on the Linux, Solaris, Mac OS X, and Microsoft Windows operating systems.

This work was done by John C. Doyle and Michael Hucka of the California Institute of Technology for the Air Force Research Laboratory.

This Brief includes a Technical Support Package (TSP). "Enlightened Multiscale Simulation of Biochemical Networks" (reference AFRL-0058) is currently available for download from the TSP library.