Overview

ObjTables is a toolkit for using schemas to model collections of tables that represent complex datasets, combining the ease of use of Excel with the rigor and power of schemas.

Benefits

  • Use collections of tables (e.g., an Excel workbook) to represent complex data consisting of multiple related objects of multiple types (e.g., rows of worksheets), each with multiple attributes (e.g., columns).
  • Use complex data types (e.g., numbers, strings, numerical arrays, symbolic mathematical expressions, chemical structures, biological sequences, etc.) within tables.
  • Use Excel as a graphical interface for viewing and editing complex datasets.
  • Use embedded tables and grammars to encode relational information into columns and groups of columns of tables.
  • Define clear schemas for tabular datasets.
  • Use schemas to rigorously validate tabular datasets.
  • Use schemas to parse tabular datasets into data structures for further analysis in languages such as Python.
  • Compare, merge, split, revision, and migrate tabular datasets.

Components of the ObjTables toolkit

  • Format for schemas for tabular datasets.
  • Numerous data types.
  • Format for tabular datasets.
  • Software tools for working with tabular datasets.
  • Python package for additional flexibility and more complex analyses.

Use cases

  • Integrating heterogeneous data.
  • Collaboratively and iteratively building datasets and models.
  • Defining formats for new types of data and models.
  • Publishing re-usable supplementary materials to journal articles.
  • Sharing re-usable data with colleagues.

Features

Create Excel templates for building complex datasets

To make it easy to build datasets, the ObjTables software can generate template Excel workbooks for schemas with a table of contents, skeletons for the tables and columns, inline help, dropdown menus, and Excel validation.

Use Excel as a GUI for viewing and editing complex datasets

ObjTables enables users to leverage Excel as a graphical interface for viewing and editing complex datasets. ObjTables Excel datasets have the following features:

  • Table of contents: Datasets can include a worksheet that describes the data represented by each worksheet and provides hyperlinks to each worksheet.
  • Formatted class titles: Each worksheet includes a title bar that describes the data captured by the worksheet. The title bars are formatted, frozen, and protected from editing.
  • Formatted attribute headings: Each worksheet includes headings for each column and group of columns. The headings are formatted, auto-filtered, frozen, and protected from editing.
  • Inline help for attributes: ObjTables uses Excel comments to embed help information about each attribute into its heading.
  • Select menus for enumerations and relationships: ObjTables provides dropdown menus for each attribute that represents an enumeration, a one-to-one relationship, or a many-to-one relationship.
  • Instant validation: ObjTables uses Excel to validate several basic properties of attributes. Because Excel's validation capabilities are limited, the ObjTables software provides more extensive validation.
  • Hidden extra rows and columns: To help users focus on their data, ObjTables hides all empty rows and columns.
  • Protection from unintentional editing: To help users avoid mistakes, ObjTables protects worksheets.

Iteratively build and revision complex datasets

The ObjTables software leverages Git to make it easy to build datasets iteratively, revision datasets, and track their provenance, including when each revision was made, who made it, and why it was made.

Iteratively build schemas and migrate complex datasets

To make it easy to build schemas iteratively, the ObjTables software can revision schemas, as well as migrate datasets between different versions of schemas (e.g., adding, removing, and renaming tables and columns).
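
As a rough illustration of what migrating a dataset between schema versions involves, the sketch below applies column renames, removals, and additions to a table stored as a list of dictionaries. It uses plain Python rather than the ObjTables migration tools; `migrate_rows`, its parameters, and the example column names are hypothetical and chosen only for this example.

```python
def migrate_rows(rows, renamed=None, removed=(), added=None):
    """Return copies of `rows` with columns renamed, removed, and added.

    Hypothetical helper for illustration; not part of the ObjTables API.
    """
    renamed = renamed or {}
    added = added or {}
    migrated = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            if column in removed:
                continue  # column was dropped in the new schema
            new_row[renamed.get(column, column)] = value
        for column, default in added.items():
            new_row.setdefault(column, default)  # new column gets a default value
        migrated.append(new_row)
    return migrated


old_rows = [{"Id": "gene_1", "Symbol": "dnaA", "Notes": "draft"}]
new_rows = migrate_rows(
    old_rows,
    renamed={"Symbol": "Name"},
    removed={"Notes"},
    added={"Organism": None},
)
print(new_rows)
```

The helper returns new rows instead of editing the originals, so the dataset for the old schema version remains available for comparison.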

Rigorously validate and quickly debug complex datasets

ObjTables makes it easy to validate and debug datasets at multiple levels:

  • Attribute validation: Validations of individual attributes can be defined declaratively. More complex validations can be implemented using the Python package.
  • Instance validation: Users can use the Python package to implement custom instance-level validation by customizing the `obj_tables.Model.validate` method of each class.
  • Class-level validation: Most attributes can be constrained to have unique values across all instances. Python modules that implement schemas can also capture tuples of attributes that must be unique across all instances of a class. See the Python documentation for more information.
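
The three levels above can be sketched in plain Python. This is a conceptual illustration only and does not use the `obj_tables` package; the function names, the `id`/`min`/`max` columns, and the error messages are assumptions made for this example.

```python
import re


def validate_attribute(value, pattern):
    """Attribute-level check: the value must fully match a regular expression."""
    return re.fullmatch(pattern, value) is not None


def validate_instance(row):
    """Instance-level check: the attributes of one row must be mutually consistent."""
    errors = []
    if row["min"] > row["max"]:
        errors.append(f"{row['id']}: min must not exceed max")
    return errors


def validate_unique(rows, attributes):
    """Class-level check: the tuple of `attributes` must be unique across all rows."""
    seen = set()
    errors = []
    for row in rows:
        key = tuple(row[name] for name in attributes)
        if key in seen:
            errors.append(f"duplicate value {key} for attributes {attributes}")
        seen.add(key)
    return errors


rows = [
    {"id": "param_1", "min": 0.0, "max": 1.0},
    {"id": "param_2", "min": 5.0, "max": 2.0},  # violates the instance-level check
    {"id": "param_1", "min": 0.0, "max": 3.0},  # violates the uniqueness check
]

errors = []
for row in rows:
    if not validate_attribute(row["id"], r"[a-z]+_[0-9]+"):
        errors.append(f"{row['id']}: invalid id")
    errors.extend(validate_instance(row))
errors.extend(validate_unique(rows, ("id",)))

for error in errors:
    print(error)
```

Collecting every error before reporting, rather than stopping at the first one, lets a user fix a whole table in one pass.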

Merge and split datasets

To help users build large datasets, the ObjTables software can merge datasets by identifying common objects, joining them, and concatenating their relationships to other objects. To help users break down datasets into smaller, more manageable pieces, the ObjTables software can split datasets by cutting relationships and identifying all of the resulting connected subsets of the dataset.
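
A minimal sketch of both operations, assuming a dataset is reduced to a dictionary that maps each object's key to the set of keys it is related to. This is not the ObjTables implementation; `merge`, `split`, and the example keys are hypothetical, and `split` assumes that any relationships to be cut have already been removed.

```python
def merge(dataset_a, dataset_b):
    """Join objects that share a key and combine their relationships."""
    merged = {}
    for dataset in (dataset_a, dataset_b):
        for obj_id, related in dataset.items():
            merged.setdefault(obj_id, set()).update(related)
    return merged


def split(dataset):
    """Return the connected subsets of a dataset, treating relationships as undirected."""
    # Build an undirected adjacency map that also covers objects
    # that only appear as the target of a relationship.
    neighbors = {obj_id: set() for obj_id in dataset}
    for obj_id, related in dataset.items():
        for other in related:
            neighbors[obj_id].add(other)
            neighbors.setdefault(other, set()).add(obj_id)

    # Depth-first search from each object that has not been visited yet.
    components = []
    unvisited = set(neighbors)
    while unvisited:
        stack = [unvisited.pop()]
        component = set(stack)
        while stack:
            for other in neighbors[stack.pop()]:
                if other not in component:
                    component.add(other)
                    stack.append(other)
        unvisited -= component
        components.append(component)
    return components


a = {"rxn_1": {"met_A", "met_B"}, "rxn_2": {"met_C"}}
b = {"rxn_1": {"met_D"}}

merged = merge(a, b)   # rxn_1 is common to both datasets and is joined
parts = split(merged)  # rxn_1 and rxn_2 share no objects, so two subsets result
print(merged)
print(parts)
```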

Compare/difference datasets

To help users compare and review changes to datasets, the ObjTables software can determine if datasets are semantically equal and identify their differences.
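
One way to picture semantic comparison is to match rows by their key instead of by their position, so that reordering rows does not count as a change. The sketch below does this for two tables stored as lists of dictionaries; it is an illustration under that assumption, not the comparison algorithm used by ObjTables, and `difference` and the column names are hypothetical.

```python
def difference(rows_a, rows_b, key="Id"):
    """Describe how two tables differ, ignoring the order of their rows."""
    a = {row[key]: row for row in rows_a}
    b = {row[key]: row for row in rows_b}
    diffs = []
    for obj_id in sorted(a.keys() | b.keys()):
        if obj_id not in b:
            diffs.append(f"{obj_id}: only in first dataset")
        elif obj_id not in a:
            diffs.append(f"{obj_id}: only in second dataset")
        else:
            for column in sorted(a[obj_id].keys() | b[obj_id].keys()):
                value_a = a[obj_id].get(column)
                value_b = b[obj_id].get(column)
                if value_a != value_b:
                    diffs.append(f"{obj_id}.{column}: {value_a!r} != {value_b!r}")
    return diffs


first = [{"Id": "a", "Mass": 1.0}, {"Id": "b", "Mass": 2.0}]
second = [{"Id": "b", "Mass": 2.5}, {"Id": "a", "Mass": 1.0}]

diffs = difference(first, second)
print(diffs)
```

Two datasets are treated as semantically equal when the returned list is empty.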

Query and analyze complex datasets

The ObjTables Python package makes it easy to find objects in datasets and use Python to conduct complex analyses of datasets such as numerical simulations.

Pretty print datasets for publication

To make it easy to create files suitable for supplementary materials of journal articles, ObjTables can pretty print datasets with tables of contents, formatted table titles and column headings, and inline help.

Visualize schemas

To help users understand schemas, ObjTables can generate UML diagrams of schemas.

Components of the ObjTables toolkit

Tabular format for schemas for datasets

ObjTables schemas capture the format of each table, including the name and data type of each column, which cells represent relationships among the entries in the tables, and constraints on the value of each cell. ObjTables supports three modes of encoding relationships into cells in tables.

  • Columns for relationships among objects represented by entries in tables: Relationships from one (primary) object to other (related) objects can be captured by (a) incorporating a column that represents a unique key for each related object into the table that represents the related objects and (b) encoding the keys for the related objects as a comma-separated list into a column in the table that represents the primary objects.
  • Embedded tables for *-to-one relationships: To help users encode complex datasets into a minimal number of tables, ObjTables can also encode instances of related classes into groups of columns. ObjTables uses merged headings to distinguish these columns.
  • Embedded grammars for relationships: To help users encode complex datasets into a minimal number of tables, grammars can be used to encode instances of related classes into a single column. These grammars can be defined declaratively in EBNF format using Lark.
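
The first mode, a column of comma-separated keys, can be sketched as follows. The example resolves each key in a primary table against the unique key column of a related table. It is a plain-Python illustration rather than the ObjTables parser; `link_related` and the `Id`, `Children`, and `Name` columns are hypothetical.

```python
def link_related(primary_rows, related_rows, column="Children", key="Id"):
    """Replace each comma-separated list of keys with the rows it refers to."""
    by_key = {row[key]: row for row in related_rows}
    linked = []
    for primary in primary_rows:
        cell = primary.get(column) or ""
        ids = [part.strip() for part in cell.split(",") if part.strip()]
        missing = [obj_id for obj_id in ids if obj_id not in by_key]
        if missing:
            # A key with no matching related object is a validation error.
            raise ValueError(f"{primary[key]}: unknown keys {missing}")
        linked.append({**primary, column: [by_key[obj_id] for obj_id in ids]})
    return linked


children = [{"Id": "c1", "Name": "Jamie"}, {"Id": "c2", "Name": "Jimie"}]
parents = [{"Id": "p1", "Children": "c1, c2"}, {"Id": "p2", "Children": ""}]

linked = link_related(parents, children)
print([child["Name"] for child in linked[0]["Children"]])
```

Requiring every key to match a related object is what makes the unique key column in the related table necessary.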

Numerous data types

ObjTables provides numerous data types, including for mathematics, science, chemoinformatics, and genomics.

Tabular format for datasets

The format includes syntax for declaring which cells represent each table, instance, and attribute; declaring which entries represent metadata such as the date that a table was updated; and declaring which entries represent comments.

Software tools for working with tabular datasets

ObjTables includes a web application, a REST API, a command-line program, and a Python package for working with datasets. These tools can be used to pretty print, validate, compare, revision, and migrate datasets.

Python package for additional flexibility

For more flexibility, the Python package can be used to incorporate custom data types, define custom validation, and query and analyze datasets.

Software tools

Web app

A web app is available at objtables.org/app.

REST API

A REST API is available at objtables.org/api.

Command-line program

A command-line program is available from PyPI.

Python package

A Python package is available from PyPI.

The Python package provides more flexibility than the web app, REST API, and CLI for custom data types and custom validation. The Python package is also best-suited for analyzing datasets.

Docker image

A Dockerfile for building a Docker image is available from GitHub.

Source code

The source code is available from GitHub.

Use cases

ObjTables was designed for use cases where users need to quickly view, edit, validate, analyze, and share complex data.

Building, validating and analyzing complex datasets and models

Many fields aim to understand how behaviors emerge from complex networks. This often requires integrating diverse data about different parts of the network. For example, systems biology aims to understand how cellular behavior emerges from genotype, often using genomics, biochemical, and other data. Excel is a popular tool for merging data because it's flexible and easy to use. However, Excel only supports a few data types, and Excel has limited support for multi-dimensional data. In addition, it is difficult to debug and analyze Excel workbooks.

By combining Excel with schemas, ObjTables makes it easy to build, validate, and analyze complex datasets: (a) users can use Excel to assemble diverse data into tables, (b) users can quickly define schemas for their data, and (c) users can use these schemas to validate their data and parse their data into data structures suitable for further analysis in languages such as Python. For example, we have used ObjTables to build integrated datasets of the biochemistry of Mycoplasma pneumoniae and H1 human embryonic stem cells.

ObjTables also makes it easy to build datasets iteratively over time by helping users revision data with Git and migrate their data as they revise their schemas.

Defining formats for new types of data and models

New areas of science often require new types of data and new kinds of models. In turn, this often requires new formats to capture these data and models and new software for working with these formats, including new tools for parsing and validating data and models described in these formats. Creating these formats is often an obstacle for new domains that have limited resources. Evolving these formats as new approaches emerge is also challenging because this often requires updating the software tools for the format and converting old files to the revised format.

ObjTables addresses this issue by making it easy to define schemas for domain-specific data and providing software tools for parsing, manipulating, and validating data encoded in these schemas. For example, we have used ObjTables to create WC-KB, a format for the experimental omics, biochemical, and physiological data needed to model cellular biochemistry. We have also used ObjTables to create WC-Lang, a format for whole-cell models of all of the biochemical activity in a cell. Creating these formats required minimal code.

Publishing re-usable supplementary materials

Although supplementary materials often contain valuable data, supplementary materials are underutilized because they are often provided in custom formats that are difficult to understand, parse, and re-use.

ObjTables addresses this issue by enabling authors to publish materials in a tabular format which can easily be read by humans and computers: (a) ObjTables enables authors to pretty print their data with tables of contents and inline help, (b) ObjTables enables authors to provide schemas for parsing their data, and (c) ObjTables enables readers to use these schemas to parse and analyze published data with minimal effort. Together, this makes it easier for authors to publish supplementary materials that are easy for others to re-use for additional studies.

Sharing re-usable data and models

Researchers often need to send their collaborators new datasets and models that cannot be described in any existing format. This often requires collaborators to write custom code to parse these custom datasets and models. The substantial effort needed to write this code is a frequent barrier to collaboration.

ObjTables makes it easier to share re-usable data and models with collaborators by (a) enabling researchers to rigorously describe the structure of their data or model with a schema, (b) enabling researchers to capture metadata about their data or model, (c) providing researchers software tools for validating their data, and (d) enabling collaborators to use these schemas to parse data from their colleagues quickly.

Examples, tutorials, documentation, and help

Examples

The documentation contains several example schemas and datasets.

CLI and Python package installation

Installation instructions for the command-line program and Python package are available at docs.karrlab.org. A Dockerfile for building an Ubuntu Linux image with ObjTables is available from the ObjTables Git repository.

Tutorials for the Python package

A Jupyter notebook with interactive tutorials is available at sandbox.karrlab.org.

Docs for the schema and dataset formats

Documentation for the formats for schemas and the formats for datasets is available at objtables.org/docs.

Docs for the REST API

Documentation for the REST API is available at objtables.org/api.

Docs for the command-line program

Documentation for the command-line program is available inline by running `obj-tables --help`.

Docs for the Python package

An introduction to the Python package is available at objtables.org/docs. Detailed documentation is available at docs.karrlab.org.

Further help

Please contact the Karr Lab with any questions.

Contributing to ObjTables

We welcome contributions to ObjTables! To contribute, please submit a GitHub pull request or contact us by email.

About ObjTables

License

ObjTables is released under the MIT license.

Citing ObjTables

Coming soon!

Questions/comments

Please contact the Karr Lab with any questions or comments.

Team

ObjTables was developed by the Karr Lab at the Icahn School of Medicine at Mount Sinai in New York, US and the Applied Mathematics and Computer Science, from Genomes to the Environment research unit at the Institut National de la Recherche Agronomique in Jouy-en-Josas, FR.

  • Jonathan Karr
  • Arthur Goldberg
  • Wolfram Liebermeister
  • John Sekar
  • Bilal Shaikh

Funding

ObjTables was supported by a National Institutes of Health P41 award, a National Institutes of Health MIRA R35 award, and a National Science Foundation INSPIRE award.