To make it easy to build datasets, the ObjTables software can generate a template Excel workbook for a schema, complete with a table of contents, skeletons for the tables and columns, inline help, dropdown menus, and Excel validation.
ObjTables enables users to leverage Excel as a graphical interface for viewing and editing complex datasets. ObjTables Excel datasets have the following features:
The ObjTables software leverages Git to make it easy to build datasets iteratively, version datasets, and track their provenance, including when each revision was made, who made it, and why it was made.
To make it easy to build schemas iteratively, the ObjTables software can version schemas and migrate datasets between schema versions, handling changes such as added, removed, and renamed tables and columns.
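The idea behind migrating a dataset between schema versions can be sketched in plain Python. This is only an illustration of the concept, not the ObjTables API: the `migrate` function, its parameters, and the dictionary-based dataset layout are all hypothetical names chosen for this example.

```python
def migrate(dataset, table_renames=None, column_renames=None,
            column_removals=None, column_defaults=None):
    """Return a restructured copy of `dataset` for a newer schema version.

    `dataset` maps table names to lists of row dicts. All rules are keyed
    by the OLD table name. This layout is an assumption of the sketch.
    """
    table_renames = table_renames or {}
    column_renames = column_renames or {}
    column_removals = column_removals or {}
    column_defaults = column_defaults or {}

    migrated = {}
    for old_table, rows in dataset.items():
        new_table = table_renames.get(old_table, old_table)
        renames = column_renames.get(old_table, {})
        removals = column_removals.get(old_table, set())
        defaults = column_defaults.get(old_table, {})

        new_rows = []
        for row in rows:
            # Drop removed columns and rename the remaining ones.
            new_row = {renames.get(col, col): val
                       for col, val in row.items() if col not in removals}
            # Fill in columns that are new in this schema version.
            for col, default in defaults.items():
                new_row.setdefault(col, default)
            new_rows.append(new_row)
        migrated[new_table] = new_rows
    return migrated


old = {"Gene": [{"Id": "g1", "Symbol": "dnaA", "Notes": "draft"}]}
new = migrate(
    old,
    table_renames={"Gene": "Genes"},
    column_renames={"Gene": {"Symbol": "Name"}},
    column_removals={"Gene": {"Notes"}},
    column_defaults={"Gene": {"Organism": None}},
)
print(new)
```

Because the function builds new rows rather than editing them in place, the original dataset stays available for comparison against the migrated copy.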
ObjTables makes it easy to validate and debug datasets at multiple levels:
To help users build large datasets, the ObjTables software can merge datasets by identifying common objects, joining them, and concatenating their relationships to other objects. To help users break down datasets into smaller, more manageable pieces, the ObjTables software can split datasets by cutting relationships and identifying all of the resulting connected subsets of the dataset.
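As a rough illustration of merging and splitting, the sketch below treats a dataset as a graph in which each object identifier maps to the identifiers of its related objects. The `merge` and `split` functions and this graph layout are assumptions made for the example and are not taken from the ObjTables package.

```python
def merge(a, b):
    """Join objects that share an identifier and concatenate their relationships."""
    merged = {obj_id: set(related) for obj_id, related in a.items()}
    for obj_id, related in b.items():
        merged.setdefault(obj_id, set()).update(related)
    return merged


def split(dataset, cut=()):
    """Cut the given relationships, then return the connected subsets of objects."""
    cut = {frozenset(pair) for pair in cut}

    # Build an undirected adjacency map, skipping the relationships to cut.
    neighbors = {obj_id: set() for obj_id in dataset}
    for obj_id, related in dataset.items():
        for other in related:
            neighbors.setdefault(other, set())
            if frozenset((obj_id, other)) not in cut:
                neighbors[obj_id].add(other)
                neighbors[other].add(obj_id)

    # Depth-first search to collect each connected subset.
    seen, parts = set(), []
    for start in neighbors:
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:
            node = stack.pop()
            if node in part:
                continue
            part.add(node)
            stack.extend(neighbors[node] - part)
        seen |= part
        parts.append(part)
    return parts


a = {"gene1": {"protein1"}, "protein1": set()}
b = {"protein1": {"complex1"}, "gene2": {"protein2"}}
merged = merge(a, b)
parts = split(merged, cut=[("protein1", "complex1")])
print(sorted(sorted(p) for p in parts))
```

Here `protein1` appears in both inputs, so the merged dataset carries its relationships from both; cutting the `protein1`/`complex1` relationship leaves `complex1` as its own subset.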
To help users compare and review changes to datasets, the ObjTables software can determine if datasets are semantically equal and identify their differences.
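One way to think about semantic comparison is that two datasets are equal when they contain the same objects with the same values, regardless of the order of rows. The sketch below illustrates this under stated assumptions: each row has an `Id` column that acts as its key, a missing column is treated the same as `None`, and the function names are hypothetical rather than part of ObjTables.

```python
def index(dataset, key="Id"):
    """Index every row by (table name, key value) so row order is irrelevant."""
    return {(table, row[key]): row
            for table, rows in dataset.items() for row in rows}


def difference(a, b, key="Id"):
    """Report objects removed from, added to, and changed between `a` and `b`."""
    ia, ib = index(a, key), index(b, key)
    changed = {}
    for k in ia.keys() & ib.keys():
        # List the columns whose values differ (missing counts as None).
        cols = sorted(c for c in ia[k].keys() | ib[k].keys()
                      if ia[k].get(c) != ib[k].get(c))
        if cols:
            changed[k] = cols
    return {
        "removed": sorted(ia.keys() - ib.keys()),
        "added": sorted(ib.keys() - ia.keys()),
        "changed": changed,
    }


def is_equal(a, b, key="Id"):
    """Two datasets are semantically equal if they have no differences."""
    d = difference(a, b, key)
    return not (d["removed"] or d["added"] or d["changed"])


v1 = {"Gene": [{"Id": "g1", "Name": "dnaA"}, {"Id": "g2", "Name": "dnaN"}]}
v2 = {"Gene": [{"Id": "g2", "Name": "dnaN"}, {"Id": "g1", "Name": "dnaA"}]}
v3 = {"Gene": [{"Id": "g1", "Name": "DnaA"}, {"Id": "g3", "Name": "recF"}]}

print(is_equal(v1, v2))  # same rows in a different order
print(difference(v1, v3))
```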
The ObjTables Python package makes it easy to find objects in datasets and use Python to conduct complex analyses of datasets such as numerical simulations.
To make it easy to create files suitable for supplementary materials of journal articles, ObjTables can pretty print datasets with tables of contents, formatted table titles and column headings, and inline help.
To help users understand schemas, ObjTables can generate UML diagrams of schemas.
Many fields aim to understand how behaviors emerge from complex networks. This often requires integrating diverse data about different parts of the network. For example, systems biology aims to understand how cellular behavior emerges from genotype, often using genomic, biochemical, and other data. Excel is a popular tool for merging data because it's flexible and easy to use. However, Excel supports only a few data types and has limited support for multi-dimensional data. In addition, Excel workbooks are difficult to debug and analyze.
By combining Excel with schemas, ObjTables makes it easy to build, validate, and analyze complex datasets: (a) users can use Excel to assemble diverse data into tables, (b) users can quickly define schemas for their data, and (c) users can use these schemas to validate their data and parse their data into data structures suitable for further analysis in languages such as Python. For example, we have used ObjTables to build integrated datasets of the biochemistry of Mycoplasma pneumoniae and H1 human embryonic stem cells.
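To make the role of a schema concrete, the following sketch validates tabular rows against a small schema and reports where each problem occurs. It is a simplified stand-in written for this example: the `Column` class, the `SCHEMA` dictionary, and the `validate` function are hypothetical and do not reflect how ObjTables defines or checks schemas.

```python
from dataclasses import dataclass


@dataclass
class Column:
    dtype: type            # expected Python type of the cell value
    required: bool = True  # whether an empty cell is an error


# Hypothetical schema: one table with three columns.
SCHEMA = {
    "Metabolite": {
        "Id": Column(str),
        "Charge": Column(int),
        "Formula": Column(str, required=False),
    }
}


def validate(dataset, schema):
    """Return a list of human-readable errors, empty if the dataset is valid."""
    errors = []
    for table, rows in dataset.items():
        columns = schema.get(table)
        if columns is None:
            errors.append(f"{table}: table is not defined in the schema")
            continue
        for i, row in enumerate(rows, start=1):
            for name in sorted(row.keys() - columns.keys()):
                errors.append(f"{table} row {i}, {name}: column is not defined in the schema")
            for name, column in columns.items():
                value = row.get(name)
                if value is None:
                    if column.required:
                        errors.append(f"{table} row {i}, {name}: value is required")
                elif not isinstance(value, column.dtype):
                    errors.append(
                        f"{table} row {i}, {name}: expected "
                        f"{column.dtype.__name__}, got {type(value).__name__}"
                    )
    return errors


data = {
    "Metabolite": [
        {"Id": "atp", "Charge": -4, "Formula": "C10H12N5O13P3"},
        {"Id": "h2o", "Charge": "0"},  # charge was entered as text
    ]
}
errors = validate(data, SCHEMA)
for error in errors:
    print(error)
```

Reporting the table, row, and column for each error mirrors the kind of feedback that makes a spreadsheet-based dataset practical to debug.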
ObjTables also makes it easy to build datasets iteratively over time by helping users version their data with Git and migrate their data as they revise their schemas.
New areas of science often require new types of data and new kinds of models. In turn, this often requires new formats to capture these data and models and new software for working with these formats, including new tools for parsing and validating data and models described in these formats. Creating these formats is often an obstacle for new domains that have limited resources. Evolving these formats as new approaches emerge is also challenging because this often requires updating the software tools for the format and converting old files to the revised format.
ObjTables addresses this issue by making it easy to define schemas for domain-specific data and providing software tools for parsing, manipulating, and validating data encoded in these schemas. For example, we have used ObjTables to create WC-KB, a format for the experimental omics, biochemical, and physiological data needed to model cellular biochemistry. We have also used ObjTables to create WC-Lang, a format for whole-cell models of all of the biochemical activity in a cell. Creating these formats required minimal code.
Although supplementary materials often contain valuable data, they are underutilized because they are often provided in custom formats that are difficult to understand, parse, and re-use.
ObjTables addresses this issue by enabling authors to publish materials in a tabular format which can easily be read by humans and computers: (a) ObjTables enables authors to pretty print their data with tables of contents and inline help, (b) ObjTables enables authors to provide schemas for parsing their data, and (c) ObjTables enables readers to use these schemas to parse and analyze published data with minimal effort. Together, this makes it easier for authors to publish supplementary materials that are easy for others to re-use for additional studies.
Researchers often need to send their collaborators new datasets and models that cannot be described in any existing format. This often requires collaborators to write custom code to parse these custom datasets and models. The substantial effort needed to write this code is a frequent barrier to collaboration.
ObjTables makes it easier to share re-usable data and models with collaborators by (a) enabling researchers to rigorously describe the structure of their data or model with a schema, (b) enabling researchers to capture metadata about their data or model, (c) providing researchers with software tools for validating their data, and (d) enabling collaborators to use these schemas to quickly parse data from their colleagues.
We welcome contributions to ObjTables! To contribute, please submit a GitHub pull request or contact us by email.