Components¶

The data captured from different sources in the Tetra Data Platform share many commonalities. For example, many data sources contain metadata about the users involved in producing the data, or systems metadata like the name of the vendor who provides that system. These commonalities lead to reusable patterns across IDSs, which are modeled as components.

Components are parts of IDSs which can be reused across multiple IDSs. Typically, a component is a schema for an object, including the properties, data types, descriptions and other schema metadata for a self-contained concept like users.

With components, an IDS can be created as a combination of components along with any system-specific fields which aren’t captured by a component. This simplifies IDS design through reuse. It also simplifies downstream data access, because any integration which accesses data from a component can be applied to multiple IDSs.

Components correspond with Recommended Labels and Key Context Terms¶

Many Tetra Data components represent the same concepts as Tetra’s Recommended Labels and Key Context Terms, which are available in the TetraConnect Hub. For example, when initially onboarding instrument data into the Tetra Data Lake, the sample_id label may be used to add context to the data. When this data is harmonized as Tetra Data, its IDS may contain the corresponding Sample component’s ID field to capture the same concept.

Terminology¶

Common component¶

Components which are shared across IDSs for standardization of consumption and analysis across different, heterogeneous datasets (e.g. for utilization measurements, improved search retrieval on platform, data integrity for times and values, etc.)

Examples are included on this page.

Domain-specific component¶

Components which are used for standardization across different data sets within a specific grouping or analysis (e.g. within chromatography analysis or methods, plate reader results, etc.)

These domain-specific components are available in the package ts-ids-components. For example: chromatography methods and peaks, and plate reader components.

Schema-specific field¶

Data elements specific to a data source (instrument or software-specific parameters, etc.)

These fields are defined for each IDS separately and are independent of the components available in ts-ids-core and ts-ids-components. For example: Waters Empower contains a concept eCord which is specific to Empower.

Using common components programmatically¶

Components can be imported from ts_ids_core, ts_ids_components, or any other package containing IDS components (for example, a custom components package created by a Self-Service Pipelines user of the Tetra Data Platform).

For example, to use the ts_ids_core.schema.User component class, import it and include it in an IDS element:

from typing import List

from ts_ids_core.base.ids_element import IdsElement
from ts_ids_core.schema import User

class MyIDS(IdsElement):
    users: List[User]

The sections below document the common components included with ts-ids-core. For domain-specific components, see the documentation for ts-ids-components.

Note

The following sections describe Tetra Data conventions for using components. Following these conventions helps keep IDSs consistent with one another. However, with the exception of DataCube, these conventions are not enforced when inheriting from IdsSchema, and it is up to the user’s discretion whether to follow them. The DataCube field must be defined as stated below, because this is a special case in which the platform requires a specific definition.

DataCube¶

datacubes is a top-level field which must be defined as an array of the DataCube component, or a class which inherits from that component and extends it or modifies the dimensionality and number of measures. For example: datacubes: List[DataCube].

Note that there are platform requirements which apply to datacubes which are met by using this component. When extending the datacubes schema, keep these in mind:

datacubes[*].dimensions[*].scale type can only be number or ["number", "null"]
Dimensions relate to measures in an “outside-in” relationship; that is, accessing data from measures follows datacubes[*].measures[*].value[dim_0_idx][dim_1_idx]...[dim_n_idx]
You must specify minItems and maxItems for measures and dimensions, like the example below, to describe the dimensionality of your datacubes. The number of items must be fixed: minItems and maxItems must have the same value, otherwise Athena won’t work. This is a known limitation.
All datacubes should have the same structure, namely the same number of measures and dimensions. For example, two datacubes in the same IDS cannot be a 2-dimensional cube and a 3-dimensional cube.
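
The requirements above can be illustrated with a minimal sketch in plain Python that does not depend on ts-ids-core. The helper names `fixed_length_array` and `have_same_structure` are hypothetical, and the schema fragment is ordinary JSON Schema written by hand, not output generated by the platform.

```python
from typing import Any, Dict, List


def fixed_length_array(item_schema: Dict[str, Any], length: int) -> Dict[str, Any]:
    """Build a JSON Schema array whose minItems and maxItems are equal."""
    return {
        "type": "array",
        "items": item_schema,
        "minItems": length,
        "maxItems": length,
    }


def have_same_structure(datacubes: List[Dict[str, Any]]) -> bool:
    """Return True if all datacubes share the same number of measures and dimensions."""
    shapes = {(len(cube["measures"]), len(cube["dimensions"])) for cube in datacubes}
    return len(shapes) <= 1


# Hypothetical schema fragments for a 2-dimensional datacube with one measure.
dimensions_schema = fixed_length_array({"type": "object"}, 2)
measures_schema = fixed_length_array({"type": "object"}, 1)

# Placeholder instances: only the counts matter for the structure check.
cube_2d = {"measures": [{}], "dimensions": [{}, {}]}
cube_3d = {"measures": [{}], "dimensions": [{}, {}, {}]}

print(have_same_structure([cube_2d, cube_2d]))
print(have_same_structure([cube_2d, cube_3d]))
```

The second call reports a mismatch because a 2-dimensional cube and a 3-dimensional cube are mixed in one list.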

Example datacubes:

"datacubes": [{
  "name": "3D chromatogram",
  "description": "More information about the data cube. (Optional)",
  "id": "optional identifier",
  "measures": [{
    "name": "intensity",
    "unit": "ArbitraryUnit",
    "value": [
      [111, 112, 113, 114, 115],
      [221, 222, 223, 224, 225],
      [331, 332, 333, 334, 335]
    ]
  }],
  "dimensions": [{
    "name": "wavelength",
    "unit": "Nanometer",
    "scale": [180, 190, 200]
  }, {
      "name": "time",
      "unit": "MinuteTime",
      "scale": [1, 2, 3, 4, 5]
  }]
}]
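
As a rough illustration of the “outside-in” relationship, the sketch below looks up a measure value in the example datacube by walking the dimensions in order. The function name `lookup` is hypothetical, and exact matching on scale values is a simplification that assumes the requested coordinates appear verbatim in each scale.

```python
from typing import Any, Dict, List

# The example datacube from above, reduced to the fields used here.
datacube: Dict[str, Any] = {
    "name": "3D chromatogram",
    "measures": [{
        "name": "intensity",
        "unit": "ArbitraryUnit",
        "value": [
            [111, 112, 113, 114, 115],
            [221, 222, 223, 224, 225],
            [331, 332, 333, 334, 335],
        ],
    }],
    "dimensions": [
        {"name": "wavelength", "unit": "Nanometer", "scale": [180, 190, 200]},
        {"name": "time", "unit": "MinuteTime", "scale": [1, 2, 3, 4, 5]},
    ],
}


def lookup(cube: Dict[str, Any], measure_name: str, coordinates: List[float]) -> Any:
    """Return the measure value at one coordinate per dimension, in dimension order."""
    measure = next(m for m in cube["measures"] if m["name"] == measure_name)
    value = measure["value"]
    # The first dimension indexes the outermost list, the last the innermost.
    for dimension, coordinate in zip(cube["dimensions"], coordinates):
        value = value[dimension["scale"].index(coordinate)]
    return value


# Intensity at wavelength 190 nm and time 4 min.
intensity = lookup(datacube, "intensity", [190, 4])
print(intensity)
```

Here wavelength 190 selects the second row of the measure, and time 4 selects the fourth entry within that row.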

You can check the public-facing document for more details on datacubes:

How IDS JSONs will be indexed into SQL tables: https://developers.tetrascience.com/docs/sql-tables
datacubes overview: https://developers.tetrascience.com/docs/understanding-data-cubes

Advanced Material¶

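The data cube structure is designed to maximize storage efficiency. It will be converted to a CSV/Parquet table with one row per measure value. For the example datacube above, the table looks like the following:

wavelength, time, intensity
180, 1, 111
180, 2, 112
180, 3, 113
180, 4, 114
180, 5, 115
190, 1, 221
190, 2, 222
190, 3, 223
190, 4, 224
190, 5, 225
200, 1, 331
200, 2, 332
200, 3, 333
200, 4, 334
200, 5, 335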

The conversion to CSV/Parquet makes the file larger. Here is the comparison for a 2-dimensional data cube of size N * M, for example, intensity vs. wavelength and time:

In the IDS JSON, the size is proportional to O(N * M) + O(M) + O(N) ~ O(N * M) when N and M are large.

In the CSV/Parquet structure, the size is proportional to O(3 * N * M).

Thus, expect roughly a 3 times increase in file size before compression. However, Parquet/CSV has higher potential to compress.
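
The size comparison can be checked with a small sketch that flattens a datacube into rows and counts the stored values in each layout. This is an illustration in plain Python, not the platform's conversion code; the function name `flatten` and the compact `cube` literal are assumptions made for the example.

```python
from itertools import product
from typing import Any, Dict, List, Tuple

cube: Dict[str, Any] = {
    "measures": [{
        "name": "intensity",
        "value": [
            [111, 112, 113, 114, 115],
            [221, 222, 223, 224, 225],
            [331, 332, 333, 334, 335],
        ],
    }],
    "dimensions": [
        {"name": "wavelength", "scale": [180, 190, 200]},
        {"name": "time", "scale": [1, 2, 3, 4, 5]},
    ],
}


def flatten(datacube: Dict[str, Any], measure_name: str) -> List[Tuple[Any, ...]]:
    """Return one (dim_0, ..., dim_n, measure) row per measure value."""
    measure = next(m for m in datacube["measures"] if m["name"] == measure_name)
    scales = [d["scale"] for d in datacube["dimensions"]]
    rows = []
    for indices in product(*(range(len(scale)) for scale in scales)):
        value = measure["value"]
        for index in indices:  # outside-in indexing
            value = value[index]
        coordinates = tuple(scale[i] for scale, i in zip(scales, indices))
        rows.append(coordinates + (value,))
    return rows


rows = flatten(cube, "intensity")

# Values stored in the nested layout: N * M measure values plus N + M scale values.
nested_count = len(rows) + sum(len(d["scale"]) for d in cube["dimensions"])
# Values stored in the flat layout: 3 columns for each of the N * M rows.
table_count = len(rows) * len(rows[0])
print(nested_count, table_count)
```

For this small 3 * 5 cube the flat table stores 45 values against 23 in the nested layout. The ratio approaches 3 only as N and M grow, because the N + M scale values become negligible next to N * M.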

DataCubeMetadata¶

DataCubeMetadata is an alternative component for storing DataCube data and metadata. Dimension and measure metadata is stored in the IDS instance, and a file_id field references the file ID of a file in the Tetra Data Lake which stores the dimension and measure values, such as a Parquet file.

By storing DataCube metadata and data in separate files in the data lake, the IDS JSON has a smaller file size, making it easier to work with for large data sets.

To define an IDS which uses the DataCubeMetadata model, create a top level IDS field datacube_metadata using this component, for example:

from typing import ClassVar, List, Literal

from ts_ids_core.annotations import Required
from ts_ids_core.base.ids_element import SchemaExtraMetadataType
from ts_ids_core.base.ids_field import IdsField
from ts_ids_core.schema import DataCubeMetadata, IdsSchema

class DemoIdsSchema(IdsSchema):
    schema_extra_metadata: ClassVar[SchemaExtraMetadataType] = {
        "$id": "https://ids.tetrascience.com/common/example-demo/v1.0.0/schema.json",
        "$schema": "http://json-schema.org/draft-07/schema#",
    }

    ids_namespace: Required[Literal["common"]] = IdsField(
        default="common", alias="@idsNamespace"
    )
    ids_type: Required[Literal["example-demo"]] = IdsField(
        default="example-demo", alias="@idsType"
    )
    ids_version: Required[Literal["v1.0.0"]] = IdsField(
        default="v1.0.0", alias="@idsVersion"
    )

    datacube_metadata: List[DataCubeMetadata]

Consider using Parquet as the file format to store datacube data. Parquet is a column-oriented data file format designed for efficient data storage and retrieval. The Parquet format is compatible with a range of software and programming languages, which makes it easy to integrate with.

Populating `DataCubeMetadata` in a task script¶

Create a Parquet file.
Write the Parquet file to the data lake using context.write_file, which returns a file pointer.
Create a DataCubeMetadata instance, using the Parquet file pointer’s fileId for the file_id field.
Assign the DataCubeMetadata instance to the datacube_metadata field in the top level of the IDS instance.
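
The steps above can be sketched as follows. This is only an outline under stated assumptions: `StandInContext` is a hypothetical stub rather than the real task-script context, the `write_file` arguments shown (`content`, `file_name`, `file_category`) and the `"PROCESSED"` category are assumptions, and the Parquet content is replaced by placeholder bytes so that the sketch runs without a Parquet library. Plain dictionaries stand in for the ts-ids-core classes.

```python
import uuid
from typing import Any, Dict


class StandInContext:
    """Hypothetical stand-in for the task-script context; not the real API."""

    def write_file(self, content: bytes, file_name: str, file_category: str) -> Dict[str, Any]:
        # The real method writes to the data lake. This stub only returns a
        # pointer-like dict carrying a generated fileId.
        return {"fileId": str(uuid.uuid4())}


def build_datacube_metadata(context: Any, parquet_bytes: bytes) -> Dict[str, Any]:
    """Write the Parquet content, then reference it from the datacube metadata."""
    # Step 2: write the file and keep the returned file pointer.
    pointer = context.write_file(parquet_bytes, "datacube_0.parquet", "PROCESSED")
    # Step 3: use the pointer's fileId for the file_id field.
    return {
        "index": 0,
        "name": "Example Datacube",
        "measures": [{"name": "intensity", "unit": "ArbitraryUnit"}],
        "dimensions": [
            {"name": "wavelength", "unit": "Nanometer"},
            {"name": "time", "unit": "SecondTime"},
        ],
        "file_id": pointer["fileId"],
    }


# Step 1 would normally produce real Parquet bytes; a placeholder is used here.
parquet_bytes = b"placeholder for Parquet content"

# Step 4: assign the metadata to the top-level datacube_metadata field.
ids_instance = {
    "datacube_metadata": [build_datacube_metadata(StandInContext(), parquet_bytes)]
}
```

In a real task script, the dictionaries would be replaced by the DataCubeMetadata, DimensionMetadata, and MeasureMetadata classes.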

The following example shows a DataCubeMetadata instance being populated in the DemoIdsSchema class above.

from ts_ids_core.schema import DataCubeMetadata, DimensionMetadata, MeasureMetadata

instance = DemoIdsSchema(
    datacube_metadata=[
        DataCubeMetadata(
            index=0,
            name="Example Datacube",
            dimensions=[
                DimensionMetadata(
                    name="wavelength",
                    unit="Nanometer",
                ),
                DimensionMetadata(
                    name="time",
                    unit="SecondTime",
                ),
            ],
            measures=[
                MeasureMetadata(
                    name="intensity",
                    unit="ArbitraryUnit",
                ),
            ],
            file_id="f0cd3e85-0e20-42ec-9a87-9774422c51c5",
        )
    ]
)

The above example data as JSON

{
  "@idsType": "example-demo",
  "@idsVersion": "v1.0.0",
  "@idsNamespace": "common",
  "datacube_metadata": [
    {
      "index": 0,
      "name": "Example Datacube",
      "measures": [
        {
          "name": "intensity",
          "unit": "ArbitraryUnit"
        }
      ],
      "dimensions": [
        {
          "name": "wavelength",
          "unit": "Nanometer"
        },
        {
          "name": "time",
          "unit": "SecondTime"
        }
      ],
      "file_id": "f0cd3e85-0e20-42ec-9a87-9774422c51c5"
    }
  ]
}

Accessing DataCube data stored in Parquet files¶

The Parquet file can be downloaded directly from the Tetra Data Lake and used with software or languages which have support for reading Parquet files.

To download Parquet files, the file_id field can be used with the retrieve file API endpoint.

Systems¶

systems is a top-level field which must be defined as an array of the System component, or a class which inherits from that component and extends it. For example: systems: List[System].

The JSON Schema for this component:

{
  "additionalProperties": false,
  "description": "Metadata regarding the equipment, software, and firmware used in a run of an\ninstrument or experiment.",
  "properties": {
    "vendor": {
      "description": "The instrument vendor or manufacturer, like 'PerkinElmer' or 'Agilent'.",
      "type": [
        "string",
        "null"
      ]
    },
    "model": {
      "description": "A specific model instrument type from a vendor.",
      "type": [
        "string",
        "null"
      ]
    },
    "type": {
      "description": "Indicates the type of instrument that's generating data.",
      "type": [
        "string",
        "null"
      ]
    }
  },
  "required": [
    "vendor",
    "model",
    "type"
  ],
  "type": "object"
}
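
In the schema above, every field is required but may be null, so a key must be present even when its value is unknown. The sketch below checks a single systems entry against those rules using only the standard library. It is an illustrative hand-written check with a hypothetical function name, `system_errors`, not the validation the platform performs.

```python
from typing import Any, Dict, List

# Field names taken from the System schema above.
SYSTEM_FIELDS = ("vendor", "model", "type")


def system_errors(system: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one systems entry."""
    errors = []
    for field in SYSTEM_FIELDS:
        if field not in system:
            # "required": the key must be present, even if its value is null.
            errors.append(f"missing required field: {field}")
        elif system[field] is not None and not isinstance(system[field], str):
            errors.append(f"{field} must be a string or null")
    for field in system:
        if field not in SYSTEM_FIELDS:
            # "additionalProperties": false rejects unknown keys.
            errors.append(f"unexpected field: {field}")
    return errors


valid = {"vendor": "Agilent", "model": None, "type": None}
incomplete = {"vendor": "Agilent"}

print(system_errors(valid))
print(system_errors(incomplete))
```

The first entry passes because null is an allowed value; the second fails because model and type are absent rather than null.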

The following common components, Firmware and Software, are commonly used to extend the ts_ids_core.schema.system.System component.

Additionally, it is common to include a field named components to describe the hardware components which are part of the system. Component schemas differ from instrument to instrument, so this field does not have a corresponding common component.

Firmware¶

firmware is a top-level field which must be defined as an array of the Firmware component, or a class which inherits from that component and extends it.

The JSON Schema for this component:

{
  "additionalProperties": false,
  "description": "System firmware metadata.",
  "properties": {
    "name": {
      "description": "Firmware name.",
      "type": [
        "string",
        "null"
      ]
    },
    "version": {
      "description": "Firmware version.",
      "type": [
        "string",
        "null"
      ]
    }
  },
  "required": [
    "name",
    "version"
  ],
  "type": "object"
}

Software¶

software is a top-level field which must be defined as an array of the Software component, or a class which inherits from that component and extends it.

The JSON Schema for this component:

{
  "additionalProperties": false,
  "description": "Software application that most recently handled the data (file) or the application\nthe data (file) is intended for. For example, applications can include Electronic\nLab Notebooks (ELN), Instrument Control Software (ICS), Chromatography Data Systems\n(CDS), or instrument-specific analysis software.",
  "properties": {
    "name": {
      "description": "Software name.",
      "type": [
        "string",
        "null"
      ]
    },
    "version": {
      "description": "Software version.",
      "type": [
        "string",
        "null"
      ]
    }
  },
  "required": [
    "name",
    "version"
  ],
  "type": "object"
}

Users¶

users is a top-level field which must be included in every Tetra Data IDS. This field must be defined as an array of the User component, or a class which inherits from that component and extends it. For example: users: List[User].

The JSON Schema for this component:

{
  "additionalProperties": false,
  "description": "Metadata of the user executing a run.",
  "properties": {
    "id": {
      "description": "Unique identifier assigned to a user.",
      "type": [
        "string",
        "null"
      ]
    },
    "name": {
      "description": "User name.",
      "type": [
        "string",
        "null"
      ]
    },
    "type": {
      "description": "User type like 'admin', 'manager', 'power user', 'standard user'. This information is usually from the instrument software",
      "type": [
        "string",
        "null"
      ]
    }
  },
  "type": "object"
}

Projects¶

projects is a top-level field which must be defined as an array of the ProjectAttributes component, or a class which inherits from that component and extends it. For example: projects: List[ProjectAttributes].

The JSON Schema for this component:

{
  "additionalProperties": false,
  "description": "A set of fields which uniquely identify and describe a particular initiative and methodologies used to produce the data in a given IDS. These attributes are commonly found in ELN and LIMS applications and allow users to organize data to associate related datasets.",
  "properties": {
    "project": {
      "$ref": "#/definitions/Project",
      "description": "Project metadata."
    },
    "experiment": {
      "$ref": "#/definitions/Experiment",
      "description": "Experiment metadata."
    },
    "assay": {
      "$ref": "#/definitions/Assay",
      "description": "Assay metadata."
    }
  },
  "type": "object",
  "definitions": {
    "Assay": {
      "additionalProperties": false,
      "description": "An Assay is an analytical measurement procedure that produces a detectable signal, allowing a process to be qualified and quantified.",
      "properties": {
        "id": {
          "description": "Unique identifier assigned to an assay.",
          "type": [
            "string",
            "null"
          ]
        },
        "name": {
          "description": "A human-readable name given to the assay.",
          "type": [
            "string",
            "null"
          ]
        },
        "description": {
          "description": "A human-readable description given to the assay",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "type": "object"
    },
    "Experiment": {
      "additionalProperties": false,
      "description": "An Experiment is a scientific procedure to investigate a specific hypothesis or a research question. The primary and derived scientific data is used to test the hypothesis, or to provide insight into a particular process. An Experimental entry typically contains additional context, such as purpose, materials, method, and conclusions.",
      "properties": {
        "id": {
          "description": "Unique identifier assigned to a specific experiment conducted within a project. Most often generated within an electronic laboratory notebook (ELN).",
          "type": [
            "string",
            "null"
          ]
        },
        "name": {
          "description": "A human-readable name given to the experiment.",
          "type": [
            "string",
            "null"
          ]
        },
        "description": {
          "description": "A human-readable description given to the experiment.",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "type": "object"
    },
    "Project": {
      "additionalProperties": false,
      "description": "A Project is a scientific or business program or initiative. A Project ID can be used to associate with the entire set of primary and derived scientific data from every experiment performed to advance a particular initiative, such as the development of an assay or a drug product.",
      "properties": {
        "id": {
          "description": "Unique identifier assigned to a project.",
          "type": [
            "string",
            "null"
          ]
        },
        "name": {
          "description": "A human-readable name given to the project.",
          "type": [
            "string",
            "null"
          ]
        },
        "description": {
          "description": "A human-readable description given to the project.",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "type": "object"
    }
  }
}

Runs¶

runs is a top-level field which must be defined as an array of the Run component, or a class which inherits from that component and extends it. For example: runs: List[Run].

The JSON Schema for this component:

{
  "additionalProperties": false,
  "description": "A Run refers to a discrete period of time in which a performed process generates one or more data points for either a single or several related samples or generates a physical product. A Run typically refers to a particular execution of an instrument.",
  "properties": {
    "id": {
      "description": "Unique identifier assigned to a specific run (execution) of an experiment.",
      "type": [
        "string",
        "null"
      ]
    },
    "name": {
      "description": "Name assigned to a specific run (execution) of an experiment.",
      "type": [
        "string",
        "null"
      ]
    },
    "logs": {
      "description": "Log messages recorded during a specific run (execution) of an experiment.",
      "items": {
        "type": "string"
      },
      "type": "array"
    }
  },
  "type": "object"
}

Samples¶

samples is a top-level field which must be included in every Tetra Data IDS. This field must be defined as an array of the Sample component. For example: samples: List[Sample].

The JSON Schema for this component:

{
  "additionalProperties": false,
  "description": "A Sample is a discrete entity being observed in an experiment. For example, Samples may be characterized for product quality and stability, or be measured for research purposes.",
  "properties": {
    "id": {
      "description": "Unique identifier assigned to a sample.",
      "type": [
        "string",
        "null"
      ]
    },
    "name": {
      "description": "Sample name.",
      "type": [
        "string",
        "null"
      ]
    },
    "barcode": {
      "description": "Barcode assigned to a sample.",
      "type": [
        "string",
        "null"
      ]
    },
    "batch": {
      "$ref": "#/definitions/Batch"
    },
    "set": {
      "$ref": "#/definitions/Set",
      "description": "Sample set."
    },
    "location": {
      "$ref": "#/definitions/Location",
      "description": "Sample location information."
    },
    "compound": {
      "$ref": "#/definitions/Compound",
      "description": "Sample compound information."
    },
    "properties": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/Property"
      },
      "description": "Sample properties."
    },
    "labels": {
      "description": "Sample labels.",
      "items": {
        "$ref": "#/definitions/Label"
      },
      "type": "array"
    }
  },
  "type": "object",
  "definitions": {
    "Batch": {
      "additionalProperties": false,
      "description": "A Batch is the result of a single manufacturing run for a drug product that is made as specified groups or amounts,  within a specific time frame from the same raw materials that is intended to have uniform character and quality, within specified limits.",
      "properties": {
        "id": {
          "description": "Unique identifier assigned to a batch.",
          "type": [
            "string",
            "null"
          ]
        },
        "name": {
          "description": "Batch name",
          "type": [
            "string",
            "null"
          ]
        },
        "barcode": {
          "description": "Barcode assigned to a batch",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "type": "object"
    },
    "Compound": {
      "additionalProperties": false,
      "description": "A Compound is a specific chemical or biochemical structure or substance that is being investigated. A Compound may be any drug substance, drug product intermediate, or drug product across small molecules, and cell and gene therapy (CGT).",
      "properties": {
        "id": {
          "description": "Unique identifier assigned to a compound.",
          "type": [
            "string",
            "null"
          ]
        },
        "name": {
          "description": "Compound name.",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "type": "object"
    },
    "Holder": {
      "additionalProperties": false,
      "description": "A sample container such as a microplate or a vial.",
      "properties": {
        "name": {
          "description": "Holder name.",
          "type": [
            "string",
            "null"
          ]
        },
        "type": {
          "description": "Holder type.",
          "type": [
            "string",
            "null"
          ]
        },
        "barcode": {
          "description": "Barcode assigned to a holder.",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "type": "object"
    },
    "Label": {
      "additionalProperties": false,
      "description": "A Label associated with a sample, along with metadata about the label including\nthe source of the label and times associated with the label such as when it was\ncreated or looked up.",
      "properties": {
        "source": {
          "$ref": "#/definitions/Source",
          "description": "Sample label data source information."
        },
        "name": {
          "description": "Sample label name.",
          "type": "string"
        },
        "value": {
          "description": "Sample label value.",
          "type": "string"
        },
        "time": {
          "$ref": "#/definitions/SampleTime",
          "description": "Time associated with the sample label."
        }
      },
      "required": [
        "source",
        "name",
        "value",
        "time"
      ],
      "type": "object"
    },
    "Location": {
      "additionalProperties": false,
      "description": "The Location of the sample within the holder, such as the location of a well in a microplate.",
      "properties": {
        "position": {
          "description": "Raw position string.",
          "type": [
            "string",
            "null"
          ]
        },
        "row": {
          "description": "Row index of sample location in a plate or holder.",
          "type": [
            "number",
            "null"
          ]
        },
        "column": {
          "description": "Column index of sample location in a plate or holder.",
          "type": [
            "number",
            "null"
          ]
        },
        "index": {
          "description": "Index of sample location flattened to a single dimension.",
          "type": [
            "number",
            "null"
          ]
        },
        "holder": {
          "$ref": "#/definitions/Holder",
          "description": "Sample holder information"
        }
      },
      "type": "object"
    },
    "Property": {
      "additionalProperties": false,
      "description": "A property has a name and a value of any type, with metadata about the\nproperty including the source of the property and times associated with it\nsuch as when the property was created or looked up.",
      "properties": {
        "source": {
          "$ref": "#/definitions/Source",
          "description": "Sample property data source information."
        },
        "name": {
          "description": "Sample Property name.",
          "type": "string"
        },
        "value": {
          "description": "The original string value of the property.",
          "type": "string"
        },
        "value_data_type": {
          "$ref": "#/definitions/ValueDataType",
          "description": "This is the type of the original value."
        },
        "string_value": {
          "description": "If string_value has a value, then numerical_value, numerical_value_unit, and boolean_value all have to be null.",
          "type": [
            "string",
            "null"
          ]
        },
        "numerical_value": {
          "description": "If numerical_value has a value, then string_value and boolean_value both have to be null.",
          "type": [
            "number",
            "null"
          ]
        },
        "numerical_value_unit": {
          "description": "Unit for the numerical value.",
          "type": [
            "string",
            "null"
          ]
        },
        "boolean_value": {
          "description": "If boolean_value has a value, then numerical_value, numerical_value_unit, and string_value all have to be null.",
          "type": [
            "boolean",
            "null"
          ]
        },
        "time": {
          "$ref": "#/definitions/SampleTime",
          "description": "Time associated with the sample property."
        }
      },
      "required": [
        "source",
        "name",
        "value",
        "value_data_type",
        "string_value",
        "numerical_value",
        "numerical_value_unit",
        "boolean_value",
        "time"
      ],
      "type": "object"
    },
    "RawSampleTime": {
      "additionalProperties": false,
      "description": "The base model for time associated with a specific sample.",
      "properties": {
        "start": {
          "description": "Process/experiment/task start time.",
          "type": [
            "string",
            "null"
          ]
        },
        "created": {
          "description": "Data created time.",
          "type": [
            "string",
            "null"
          ]
        },
        "stop": {
          "description": "Process/experiment/task stop/finish time.",
          "type": [
            "string",
            "null"
          ]
        },
        "duration": {
          "description": "Process/experiment/task duration.",
          "type": [
            "string",
            "null"
          ]
        },
        "last_updated": {
          "description": "Data last updated time of a file/method.",
          "type": [
            "string",
            "null"
          ]
        },
        "acquired": {
          "description": "Data acquired/exported/captured time.",
          "type": [
            "string",
            "null"
          ]
        },
        "modified": {
          "description": "Data last modified/edited time.",
          "type": [
            "string",
            "null"
          ]
        },
        "lookup": {
          "description": "Raw sample data lookup time.",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "required": [
        "lookup"
      ],
      "type": "object"
    },
    "SampleTime": {
      "additionalProperties": false,
      "description": "A model for experiment sample datetime values converted to a standard ISO format\nand their respective raw datetime values in the primary data.",
      "properties": {
        "start": {
          "description": "Process/experiment/task start time.",
          "type": [
            "string",
            "null"
          ]
        },
        "created": {
          "description": "Data created time.",
          "type": [
            "string",
            "null"
          ]
        },
        "stop": {
          "description": "Process/experiment/task stop/finish time.",
          "type": [
            "string",
            "null"
          ]
        },
        "duration": {
          "description": "Process/experiment/task duration.",
          "type": [
            "string",
            "null"
          ]
        },
        "last_updated": {
          "description": "Data last updated time of a file/method.",
          "type": [
            "string",
            "null"
          ]
        },
        "acquired": {
          "description": "Data acquired/exported/captured time.",
          "type": [
            "string",
            "null"
          ]
        },
        "modified": {
          "description": "Data last modified/edited time.",
          "type": [
            "string",
            "null"
          ]
        },
        "lookup": {
          "description": "Raw sample data lookup time.",
          "type": [
            "string",
            "null"
          ]
        },
        "raw": {
          "$ref": "#/definitions/RawSampleTime",
          "description": "Raw sample time values from primary data."
        }
      },
      "required": [
        "lookup"
      ],
      "type": "object"
    },
    "Set": {
      "additionalProperties": false,
      "description": "A group of Samples.",
      "properties": {
        "id": {
          "description": "Unique identifier assigned to a set.",
          "type": [
            "string",
            "null"
          ]
        },
        "name": {
          "description": "Set name.",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "type": "object"
    },
    "Source": {
      "additionalProperties": false,
      "description": "The Source of information, such as a data file or a sample database.",
      "properties": {
        "name": {
          "description": "Source name.",
          "type": [
            "string",
            "null"
          ]
        },
        "type": {
          "description": "Source type.",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "required": [
        "name",
        "type"
      ],
      "type": "object"
    },
    "ValueDataType": {
      "description": "Allowed data type values.",
      "enum": [
        "string",
        "number",
        "boolean"
      ],
      "type": "string"
    }
  }
}

The following Parameter component is used within the Samples definition under samples[*].properties[*].

Parameter¶

For user-defined parameters or dynamically typed primary data, like scouting_variables in AKTA, custom_fields in Empower, or sample properties, use the Parameter component. This component models a value in the primary data whose data type is not consistent but is always one of "string", "number", or "boolean".

The JSON Schema for this component is:

"<your field name>": {
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "key": {
        "type": "string",
        "description": "this is the property name"
      },
      "value": {
        "type": "string",
        "description": "the original string value of the parameter from the raw file"
      },
      "value_data_type": {
        "type": "string",
        "enum": ["string", "number", "boolean"],
        "description": "this is the true type of the original value"
      },
      "string_value": {
        "type": ["string", "null"],
        "description": "if string_value has a value, then numerical_value, numerical_value_unit and boolean_value have to be null"
      },
      "numerical_value": {
        "type": ["number", "null"],
        "description": "if numerical_value has a value, then string_value and boolean_value have to be null"
      },
      "numerical_value_unit": {
        "type": ["string", "null"]
      },
      "boolean_value": {
        "type": ["boolean", "null"],
        "description": "if boolean_value has a value, then numerical_value, numerical_value_unit and string_value have to be null"
      }
    }
  }
}

Example:

Consider the raw value "128 mg/ml". Using the Parameter component would capture the data as follows:

"<location of parameter>": [
  {
    "key": "concentration",
    "value": "128 mg/ml",
    "value_data_type": "number",
    "string_value": null,
    "numerical_value": 128,
    "numerical_value_unit": "MilligramPerMilliliter",
    "boolean_value": null
  }
]
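
The mapping from a raw string to a Parameter item can be sketched in Python. This is a minimal illustration, not a Tetra library function: `to_parameter`, the `UNIT_NAMES` lookup, and the parsing heuristic are all assumptions, and only the "mg/ml" unit entry comes from the example above.

```python
import re
from typing import Any, Dict

# Hypothetical lookup from raw unit text to a unit name. Only the "mg/ml"
# entry is taken from the example above; extend it for your own data.
UNIT_NAMES = {"mg/ml": "MilligramPerMilliliter"}

# A number, optionally followed by a unit that starts with a letter or "%".
_NUMBER_WITH_UNIT = re.compile(
    r"^\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*([A-Za-z%].*?)?\s*$"
)


def to_parameter(key: str, raw: str) -> Dict[str, Any]:
    """Build one Parameter item from a raw string value (naive heuristic)."""
    item: Dict[str, Any] = {
        "key": key,
        "value": raw,  # always keep the original string
        "value_data_type": "string",
        "string_value": None,
        "numerical_value": None,
        "numerical_value_unit": None,
        "boolean_value": None,
    }
    text = raw.strip().lower()
    match = _NUMBER_WITH_UNIT.match(raw)
    if text in ("true", "false"):
        item["value_data_type"] = "boolean"
        item["boolean_value"] = text == "true"
    elif match:
        unit = match.group(2)
        item["value_data_type"] = "number"
        item["numerical_value"] = float(match.group(1))
        item["numerical_value_unit"] = UNIT_NAMES.get(unit, unit) if unit else None
    else:
        item["string_value"] = raw
    return item
```

Only one of string_value, numerical_value, or boolean_value is populated per item, which matches the constraints stated in the schema descriptions. A real parser would likely need stricter rules for deciding when a string such as "3rd floor" should stay a string.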

Time¶

time fields use the non-extendable Time component. This non-top-level field can be used to capture common datetime values in both ISO 8601 compliant and raw formats found in primary data.

The JSON Schema for this component is:

{
  "additionalProperties": false,
  "description": "A model for datetime values converted to a standard ISO format and their\nrespective raw datetime values in the primary data.",
  "properties": {
    "start": {
      "description": "Process/experiment/task start time.",
      "type": [
        "string",
        "null"
      ]
    },
    "created": {
      "description": "Data created time.",
      "type": [
        "string",
        "null"
      ]
    },
    "stop": {
      "description": "Process/experiment/task stop/finish time.",
      "type": [
        "string",
        "null"
      ]
    },
    "duration": {
      "description": "Process/experiment/task duration.",
      "type": [
        "string",
        "null"
      ]
    },
    "last_updated": {
      "description": "Data last updated time of a file/method.",
      "type": [
        "string",
        "null"
      ]
    },
    "acquired": {
      "description": "Data acquired/exported/captured time.",
      "type": [
        "string",
        "null"
      ]
    },
    "modified": {
      "description": "Data last modified/edited time.",
      "type": [
        "string",
        "null"
      ]
    },
    "lookup": {
      "description": "Data lookup time.",
      "type": [
        "string",
        "null"
      ]
    },
    "raw": {
      "$ref": "#/definitions/RawTime",
      "description": "Raw time values from primary data."
    }
  },
  "type": "object",
  "definitions": {
    "RawTime": {
      "additionalProperties": false,
      "description": "The base model for capturing common time fields found in primary data.",
      "properties": {
        "start": {
          "description": "Process/experiment/task start time.",
          "type": [
            "string",
            "null"
          ]
        },
        "created": {
          "description": "Data created time.",
          "type": [
            "string",
            "null"
          ]
        },
        "stop": {
          "description": "Process/experiment/task stop/finish time.",
          "type": [
            "string",
            "null"
          ]
        },
        "duration": {
          "description": "Process/experiment/task duration.",
          "type": [
            "string",
            "null"
          ]
        },
        "last_updated": {
          "description": "Data last updated time of a file/method.",
          "type": [
            "string",
            "null"
          ]
        },
        "acquired": {
          "description": "Data acquired/exported/captured time.",
          "type": [
            "string",
            "null"
          ]
        },
        "modified": {
          "description": "Data last modified/edited time.",
          "type": [
            "string",
            "null"
          ]
        },
        "lookup": {
          "description": "Data lookup time.",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "type": "object"
    }
  }
}
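
As an illustration of how the converted and raw values sit side by side, the sketch below fills a Time object from raw timestamp strings. It is a hedged example: `build_time` is a hypothetical helper, the raw format string is supplied by the caller, and treating the raw timestamps as UTC is an assumption that will not hold for every instrument. The duration field is copied to raw only, because the schema does not state a target format for it.

```python
from datetime import datetime, timezone
from typing import Dict

TIME_FIELDS = ("start", "created", "stop", "duration", "last_updated",
               "acquired", "modified", "lookup")


def build_time(raw_values: Dict[str, str], raw_format: str) -> dict:
    """Return a Time object with ISO 8601 values plus the untouched raw values."""
    unknown = set(raw_values) - set(TIME_FIELDS)
    if unknown:
        # The schema sets additionalProperties to false.
        raise ValueError(f"Not a Time field: {sorted(unknown)}")
    time = {field: None for field in TIME_FIELDS}
    time["raw"] = {field: raw_values.get(field) for field in TIME_FIELDS}
    for field, raw in raw_values.items():
        if field == "duration":
            continue  # kept in raw only; no conversion is attempted here
        parsed = datetime.strptime(raw, raw_format)
        # Assumption: raw timestamps are in UTC.
        time[field] = parsed.replace(tzinfo=timezone.utc).isoformat()
    return time
```

For example, `build_time({"start": "01/05/2023 14:30"}, "%m/%d/%Y %H:%M")` stores the ISO 8601 string under start and the original string under raw.start, with every other field set to null.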

SampleTime¶

Similar to the above Time component, SampleTime is used in the time definition for Samples. It is equivalent to the Time component, except that the lookup fields (both lookup and raw.lookup) are required. This non-top-level definition of time is intended for use in the above Samples component.

The JSON Schema for this component is:

{
  "additionalProperties": false,
  "description": "A model for experiment sample datetime values converted to a standard ISO format\nand their respective raw datetime values in the primary data.",
  "properties": {
    "start": {
      "description": "Process/experiment/task start time.",
      "type": [
        "string",
        "null"
      ]
    },
    "created": {
      "description": "Data created time.",
      "type": [
        "string",
        "null"
      ]
    },
    "stop": {
      "description": "Process/experiment/task stop/finish time.",
      "type": [
        "string",
        "null"
      ]
    },
    "duration": {
      "description": "Process/experiment/task duration.",
      "type": [
        "string",
        "null"
      ]
    },
    "last_updated": {
      "description": "Data last updated time of a file/method.",
      "type": [
        "string",
        "null"
      ]
    },
    "acquired": {
      "description": "Data acquired/exported/captured time.",
      "type": [
        "string",
        "null"
      ]
    },
    "modified": {
      "description": "Data last modified/edited time.",
      "type": [
        "string",
        "null"
      ]
    },
    "lookup": {
      "description": "Raw sample data lookup time.",
      "type": [
        "string",
        "null"
      ]
    },
    "raw": {
      "$ref": "#/definitions/RawSampleTime",
      "description": "Raw sample time values from primary data."
    }
  },
  "required": [
    "lookup"
  ],
  "type": "object",
  "definitions": {
    "RawSampleTime": {
      "additionalProperties": false,
      "description": "The base model for time associated with a specific sample.",
      "properties": {
        "start": {
          "description": "Process/experiment/task start time.",
          "type": [
            "string",
            "null"
          ]
        },
        "created": {
          "description": "Data created time.",
          "type": [
            "string",
            "null"
          ]
        },
        "stop": {
          "description": "Process/experiment/task stop/finish time.",
          "type": [
            "string",
            "null"
          ]
        },
        "duration": {
          "description": "Process/experiment/task duration.",
          "type": [
            "string",
            "null"
          ]
        },
        "last_updated": {
          "description": "Data last updated time of a file/method.",
          "type": [
            "string",
            "null"
          ]
        },
        "acquired": {
          "description": "Data acquired/exported/captured time.",
          "type": [
            "string",
            "null"
          ]
        },
        "modified": {
          "description": "Data last modified/edited time.",
          "type": [
            "string",
            "null"
          ]
        },
        "lookup": {
          "description": "Raw sample data lookup time.",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "required": [
        "lookup"
      ],
      "type": "object"
    }
  }
}
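
Because lookup is listed under "required" while its type still allows null, the key must be present in a SampleTime object even when no lookup time is known. The following sketch checks for that. `missing_lookup_paths` is a hypothetical helper written for this illustration and is not a substitute for validating against the full schema.

```python
from typing import List


def missing_lookup_paths(sample_time: dict) -> List[str]:
    """List the places where the required "lookup" key is absent."""
    missing = []
    if "lookup" not in sample_time:
        missing.append("lookup")
    raw = sample_time.get("raw")
    # "raw" itself is optional, but when present it must contain "lookup".
    if isinstance(raw, dict) and "lookup" not in raw:
        missing.append("raw.lookup")
    return missing
```

An object such as `{"lookup": None, "raw": {"lookup": None}}` passes this check, while `{"start": "...", "raw": {}}` is reported as missing both keys.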

FilePointer¶

Some RAW files contain large data sets that cannot or should not be stored in an IDS instance, such as associated image files or large proprietary files. Instead, these files can be referenced within the IDS file through a pointer. A pointer always contains the following fields and information:

{
  "fileId": "<the uuid for each file in ts data lake, uuid>",
  "fileKey": "<s3 file key>",
  "version": "<s3 file version number>",
  "bucket": "<the s3 data lake bucket>",
  "type": "s3file"
}

Usually you won’t manually define the pointer. A file pointer can be returned after successful API calls like context.write_file.

The structure of this information is critical for two reasons:

1. Interacting with the platform within task-scripts and protocols:
   - For protocol artifacts, a file pointer is returned when calling workflow.getContext('inputFile') in script.js. This pointer can then be passed into the task-script for file reading.
   - It can be passed to context.read_file for reading from AWS S3. This can be helpful for reading raw files in task-scripts.
   - A file pointer is returned when a successful call is made by context.write_file or context.add_attributes. This return value can be helpful in a task-script when writing an IDS to the platform and tagging that IDS with metadata.
2. These keys are used to identify files that are pointed to when converting IDS to ADF.
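
Since the pointer structure matters, a task script may want to confirm that a pointer has the expected shape before storing it. The sketch below assumes the pointer is available as a plain Python dict. `check_file_pointer` is a hypothetical helper for illustration, not part of any Tetra SDK.

```python
POINTER_KEYS = ("fileId", "fileKey", "version", "bucket", "type")


def check_file_pointer(pointer: dict) -> dict:
    """Return the pointer unchanged if it has all five string fields."""
    missing = [key for key in POINTER_KEYS if key not in pointer]
    if missing:
        raise ValueError(f"File pointer is missing keys: {missing}")
    non_strings = [key for key in POINTER_KEYS
                   if not isinstance(pointer[key], str)]
    if non_strings:
        raise ValueError(f"File pointer values must be strings: {non_strings}")
    return pointer
```

Raising early keeps a malformed pointer from being written into an IDS, where it would be harder to trace.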

RelatedFile¶

related_files is a top-level field which must be defined as an array of the RelatedFile component. It cannot be extended. For example: related_files: List[RelatedFile]

The RelatedFile component uses the above FilePointer component along with file metadata. Usually these related files, such as images, graphs, and Parquet files, are parsed from the same raw file and are too big or otherwise unsuitable to be stored in an IDS instance.

The JSON Schema for this component is:

{
  "additionalProperties": false,
  "description": "A reference to a file related to this IDS stored on the Tetra Data Platform.",
  "properties": {
    "name": {
      "description": "File name.",
      "type": [
        "string",
        "null"
      ]
    },
    "path": {
      "description": "File path.",
      "type": [
        "string",
        "null"
      ]
    },
    "size": {
      "$ref": "#/definitions/ValueUnit",
      "description": "File size."
    },
    "checksum": {
      "$ref": "#/definitions/Checksum",
      "description": "File checksum."
    },
    "pointer": {
      "$ref": "#/definitions/Pointer",
      "description": "File pointer to location on TDP."
    }
  },
  "required": [
    "pointer"
  ],
  "type": "object",
  "definitions": {
    "Checksum": {
      "additionalProperties": false,
      "description": "Checksum value and algorithm associated with a file.",
      "properties": {
        "value": {
          "description": "Checksum string value.",
          "type": "string"
        },
        "algorithm": {
          "description": "Checksum algorithm, e.g. 'md5', 'sha256'.",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "required": [
        "value",
        "algorithm"
      ],
      "type": "object"
    },
    "Pointer": {
      "additionalProperties": false,
      "description": "A pointer stores the location metadata of the file on TDP.",
      "properties": {
        "fileKey": {
          "description": "AWS S3 file key.",
          "type": "string"
        },
        "version": {
          "description": "AWS S3 file version number.",
          "type": "string"
        },
        "bucket": {
          "description": "AWS S3 bucket.",
          "type": "string"
        },
        "type": {
          "description": "Type of the file, e.g. 's3file', 'parquet'.",
          "type": "string"
        },
        "fileId": {
          "description": "File ID (UUID) in TDP.",
          "type": "string"
        }
      },
      "required": [
        "fileKey",
        "version",
        "bucket",
        "type",
        "fileId"
      ],
      "type": "object"
    },
    "ValueUnit": {
      "additionalProperties": false,
      "description": "A quantity, represented by a value with a unit.",
      "properties": {
        "value": {
          "description": "A numerical value.",
          "type": [
            "number",
            "null"
          ]
        },
        "unit": {
          "description": "Unit for the numerical value.",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "required": [
        "value",
        "unit"
      ],
      "type": "object"
    }
  }
}
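
A related_files entry can be assembled from a file pointer and the file content roughly as follows. This is a sketch under assumptions: `build_related_file` is a hypothetical helper, the "Byte" unit name and the choice of SHA-256 are illustrative, and the pointer is assumed to have been obtained elsewhere (for example from a successful write call).

```python
import hashlib
from typing import Optional


def build_related_file(pointer: dict,
                       name: Optional[str] = None,
                       path: Optional[str] = None,
                       content: Optional[bytes] = None) -> dict:
    """Build one RelatedFile item; only "pointer" is required by the schema."""
    related = {"name": name, "path": path, "pointer": pointer}
    if content is not None:
        # size and checksum are not nullable, so they are omitted when unknown.
        related["size"] = {"value": len(content), "unit": "Byte"}
        related["checksum"] = {
            "value": hashlib.sha256(content).hexdigest(),
            "algorithm": "sha256",
        }
    return related
```

The result can be appended to the top-level related_files array of the IDS instance.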

Modifier¶

Sometimes values come with a modifier such as “less than”. For example, an array of values from an instrument could contain ["<1.0E-12", "6.026e-07"]. This data can be represented using the Modifier component, where the value and its optional modifier are stored in separate fields.

Example:

[
  {
    "value": 1.0e-12,
    "modifier": "<"
  },
  {
    "value": 6.026e-7,
    "modifier": null
  }
]

And the schema is:

{
  "additionalProperties": false,
  "description": "A model to capture the numeric value and prefix (modifier) for a prefixed numeric string (e.g. '>1.0').",
  "properties": {
    "value": {
      "description": "Modifier value.",
      "type": "number"
    },
    "modifier": {
      "$ref": "#/definitions/ModifierType",
      "description": "Modifier type."
    }
  },
  "type": "object",
  "definitions": {
    "ModifierType": {
      "description": "An enumeration of observed modifiers in the primary data.",
      "enum": [
        "<",
        ">",
        "<=",
        ">=",
        null
      ],
      "type": [
        "string",
        "null"
      ]
    }
  }
}
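
Splitting a prefixed numeric string into the two Modifier fields can be sketched with a regular expression. `parse_modified_value` is a hypothetical helper for illustration. It accepts only the four modifiers listed in the ModifierType enumeration and relies on Python's float() to parse the numeric part.

```python
import re

# Optional modifier ("<=" and ">=" are tried before "<" and ">"), then the number.
_PREFIXED = re.compile(r"^\s*(<=|>=|<|>)?\s*(\S+)\s*$")


def parse_modified_value(raw: str) -> dict:
    """Split a string such as "<1.0E-12" into value and modifier."""
    match = _PREFIXED.match(raw)
    if match is None:
        raise ValueError(f"Cannot parse value: {raw!r}")
    # float() raises ValueError if the remainder is not numeric.
    return {"value": float(match.group(2)), "modifier": match.group(1)}
```

Applied to each element of ["<1.0E-12", "6.026e-07"], this yields the two objects shown in the example above, with modifier set to null (None in Python) when no prefix is present.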
