Tutorial

This guide shows the steps of creating a simple Intermediate Data Schema (IDS) using ts-ids-core.

For a shorter example to get up and running, see Quickstart. This page introduces the main features of working with ts-ids-core, with links to other pages which go into more detail for specific features like Defining a programmatic IDS and Working with programmatic IDS instances.

We recommend working through the background reading below to get a good sense of the functionality underlying ts-ids-core.

Background Reading

IDS classes are derived from ts_ids_core.base.ids_element.IdsElement, which is itself derived from pydantic.BaseModel. As such, each programmatic IDS class is essentially a data class, and users should have a good grasp of pydantic before defining an IDS programmatically. Fortunately, the pydantic documentation is excellent, if extensive. We advise reading the following sections of it, or at least their examples, before creating a programmatic IDS. Beyond this background reading, we encourage users to explore pydantic's wider feature set to make full use of their IDS models. ts-ids-core does not restrict pydantic's functionality; rather, it enforces using pydantic in a way that keeps the resulting schemas compatible with the TetraScience platform.

Pydantic introduction

Models

Note

The sections listed below can be skimmed; they are the ones most relevant to writing an IDS.

Fields

Types

Validators

Integrations

Instrument data

For this tutorial, we'll create an IDS for sensor readings from a hypothetical measurement device, which exports data in the following CSV format. We will store this data in a file called tutorial_example_data.csv.

name,voltage/V
meas01,0.10
meas02,0.23

The code in the sections below can be copied and run in a Python environment where ts-ids-core is installed.
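
To follow along, the example file can be created from Python. The sketch below uses only the standard library: it writes the two sample rows to tutorial_example_data.csv and reads them back to confirm the column names. It assumes the file belongs in the current working directory; adjust the path to wherever your script expects it.

```python
import csv
from pathlib import Path

# Sample rows copied from the CSV shown above.
EXAMPLE_CSV = "name,voltage/V\nmeas01,0.10\nmeas02,0.23\n"

# Assumption: the file lives in the current working directory.
data_path = Path("tutorial_example_data.csv")
data_path.write_text(EXAMPLE_CSV)

# Read the file back; every value is still a string at this stage.
with open(data_path, newline="") as f:
    rows = list(csv.DictReader(f))

print(rows[0])
```

Note that the voltage column is keyed by its full header, voltage/V, and that the trailing zero of 0.10 survives because nothing has been converted to a number yet.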

Defining an IDS element

IDSs are made by combining IDS elements, which each represent an object in the schema. To define an IDS element, create a class which inherits from IdsElement.

Each field needs a name and a type annotation, and additional metadata like description can optionally be added using the IdsField function.

from ts_ids_core.base.ids_element import IdsElement
from ts_ids_core.base.ids_field import IdsField

class SensorResult(IdsElement):
    name: str
    voltage: float = IdsField(description="Example description")

This defines a basic object which can store a single name and voltage value from the example data.

For more information about defining fields, such as creating fields which can be null, required fields, or fields with a constant value, see Defining a programmatic IDS.

Using components

Components are IDS elements which are intended to be reused across multiple IDSs because they define a common group of metadata or data fields.

In the class above, voltage has the type float. But the voltage column of the CSV carries more information that we would like to capture: the header gives a measurement unit (V), and the raw string can hold detail that is lost on conversion to a float, such as the trailing zero in 0.10.

We could create a new IdsElement class to store these additional fields, but in this case there is a common component we can use: the voltage field can be changed to a RawValueUnit. This component contains the fields raw_value, value and unit. The raw_value field stores the original voltage string from the CSV, the value field stores the parsed numeric voltage as a float, and the unit field stores the unit extracted from the column header.

from ts_ids_core.base.ids_element import IdsElement
from ts_ids_core.schema.value_unit import RawValueUnit

class SensorResult(IdsElement):
    name: str
    voltage: RawValueUnit

Using components in this way leads to concise IDS definitions where we can reuse common schema designs instead of having to recreate similar schemas from scratch. This has the additional benefit that downstream applications may also reuse logic for handling IDS data containing these components.

For more common components which can be used in any schema, see the Components section.

For domain-specific components which apply to domains like Chromatography or Plate Readers, see ts-ids-components.

Defining a complete IDS

Now we can put the SensorResult element into a complete IDS.

To create a complete IDS, create a class which inherits from IdsSchema. In this class, we need to define some top-level metadata which is used by the Tetra Data Platform and other applications to identify this IDS: schema_extra_metadata, ids_namespace, ids_type and ids_version.

Then we can add a field containing a list of SensorResult objects (one for each row of the CSV data), and the IDS definition is complete:

from typing import ClassVar, List, Literal

from ts_ids_core.annotations import Required
from ts_ids_core.base.ids_element import IdsElement, SchemaExtraMetadataType
from ts_ids_core.schema import IdsField, IdsSchema, RawValueUnit

class SensorResult(IdsElement):
    """A result of a sensor measurement."""

    name: str
    voltage: RawValueUnit

class DemoIDS(IdsSchema):
    """A demonstration schema."""

    schema_extra_metadata: ClassVar[SchemaExtraMetadataType] = {
        "$id": "https://ids.tetrascience.com/private-demo/demo-sensor/v1.0.0/schema.json",
        "$schema": "http://json-schema.org/draft-07/schema#",
    }

    ids_namespace: Required[Literal["private-demo"]] = IdsField(
        default="private-demo", alias="@idsNamespace"
    )
    ids_type: Required[Literal["demo-sensor"]] = IdsField(
        default="demo-sensor", alias="@idsType"
    )
    ids_version: Required[Literal["v1.0.0"]] = IdsField(
        default="v1.0.0", alias="@idsVersion"
    )

    results: List[SensorResult]
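
In the example above, the "$id" URL repeats the namespace, type and version that are also set as field defaults. One way to keep these values in sync is to build the URL from the other three. The helper below is hypothetical (it is not part of ts-ids-core), and the URL pattern is inferred from this single example rather than taken from a specification.

```python
from typing import Dict

# Values used by the tutorial's DemoIDS.
IDS_NAMESPACE = "private-demo"
IDS_TYPE = "demo-sensor"
IDS_VERSION = "v1.0.0"


def build_schema_extra_metadata(
    namespace: str, ids_type: str, version: str
) -> Dict[str, str]:
    """Hypothetical helper: derive "$id" from namespace, type and version.

    Assumption: the URL layout matches the one used in this tutorial.
    """
    return {
        "$id": (
            "https://ids.tetrascience.com/"
            f"{namespace}/{ids_type}/{version}/schema.json"
        ),
        "$schema": "http://json-schema.org/draft-07/schema#",
    }


metadata = build_schema_extra_metadata(IDS_NAMESPACE, IDS_TYPE, IDS_VERSION)
print(metadata["$id"])
```

The resulting dict has the same content as the schema_extra_metadata literal in DemoIDS, so the two approaches are interchangeable for this example.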

Exporting JSON Schema

To export the IDS as the JSON Schema used by the Tetra Data Platform, you can use the IdsElement method model_json_schema. For example:

import json
from pathlib import Path
from typing import Any, Dict

model_schema: Dict[str, Any] = DemoIDS.model_json_schema()

json_schema = json.dumps(model_schema, indent=2)

output_path = Path(__file__).parent.joinpath("schema.json")
output_path.write_text(json_schema)

This creates a file called schema.json containing the full JSON Schema, shown below.

{
  "$id": "https://ids.tetrascience.com/private-demo/demo-sensor/v1.0.0/schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "additionalProperties": false,
  "description": "A demonstration schema.",
  "properties": {
    "@idsType": {
      "const": "demo-sensor",
      "type": "string"
    },
    "@idsVersion": {
      "const": "v1.0.0",
      "type": "string"
    },
    "@idsNamespace": {
      "const": "private-demo",
      "type": "string"
    },
    "results": {
      "items": {
        "$ref": "#/definitions/SensorResult"
      },
      "type": "array"
    }
  },
  "required": [
    "@idsType",
    "@idsVersion",
    "@idsNamespace"
  ],
  "type": "object",
  "definitions": {
    "RawValueUnit": {
      "additionalProperties": false,
      "description": "A value with a unit, including the raw representation of the value from the primary data.",
      "properties": {
        "value": {
          "description": "A numerical value.",
          "type": [
            "number",
            "null"
          ]
        },
        "unit": {
          "description": "Unit for the numerical value.",
          "type": [
            "string",
            "null"
          ]
        },
        "raw_value": {
          "description": "The raw, untransformed value from the primary data.",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "required": [
        "value",
        "unit",
        "raw_value"
      ],
      "type": "object"
    },
    "SensorResult": {
      "additionalProperties": false,
      "description": "A result of a sensor measurement.",
      "properties": {
        "name": {
          "type": "string"
        },
        "voltage": {
          "$ref": "#/definitions/RawValueUnit"
        }
      },
      "type": "object"
    }
  }
}
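
The exported schema keeps reusable objects under "definitions" and points to them with "$ref". As a rough illustration of how to inspect the output, the standard-library sketch below follows local "$ref" pointers by hand. It uses a trimmed copy of the schema above rather than reading schema.json, and resolve_local_ref is a hypothetical helper, not a function from ts-ids-core or any JSON Schema library.

```python
from typing import Any, Dict

# Trimmed copy of the exported schema: only the parts needed to follow "$ref".
schema: Dict[str, Any] = {
    "properties": {
        "results": {
            "items": {"$ref": "#/definitions/SensorResult"},
            "type": "array",
        }
    },
    "definitions": {
        "RawValueUnit": {
            "required": ["value", "unit", "raw_value"],
            "type": "object",
        },
        "SensorResult": {
            "properties": {
                "name": {"type": "string"},
                "voltage": {"$ref": "#/definitions/RawValueUnit"},
            },
            "type": "object",
        },
    },
}


def resolve_local_ref(root: Dict[str, Any], ref: str) -> Dict[str, Any]:
    """Follow a local reference such as '#/definitions/SensorResult'.

    Simplification: JSON Pointer escapes ('~0', '~1') are not handled.
    """
    if not ref.startswith("#/"):
        raise ValueError(f"Only local references are supported: {ref}")
    node: Any = root
    for key in ref[2:].split("/"):
        node = node[key]
    return node


# results -> SensorResult -> RawValueUnit
item_schema = resolve_local_ref(
    schema, schema["properties"]["results"]["items"]["$ref"]
)
voltage_schema = resolve_local_ref(
    schema, item_schema["properties"]["voltage"]["$ref"]
)
print(voltage_schema["required"])
```

Walking the references this way shows that each item of results is a SensorResult, whose voltage is in turn a RawValueUnit with three required fields.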

JSON Schema can also be exported with the export-schema script. For more information, see How to export schema.json from a programmatic IDS.

Creating an IDS instance

We can now write some code which reads the primary data (the CSV file) and transforms it into an instance of DemoIDS. This is a small example of the kind of code which runs in task scripts on the Tetra Data Platform to transform raw data to IDS data.

import csv
from pathlib import Path

project_root = Path(__file__).parent

def create_instance(input_file: Path) -> DemoIDS:
    """Transform a CSV containing name and voltage columns into an IDS instance."""
    results = []

    with open(input_file, "r", newline="") as f:
        records = csv.DictReader(f)

        # Create a `SensorResult` from each row of the csv
        for record in records:
            results.append(
                SensorResult(
                    name=record["name"],
                    voltage=RawValueUnit(
                        # With real data, `float` would fail for non-numeric values
                        value=float(record["voltage/V"]),
                        raw_value=record["voltage/V"],
                        unit="Volt",
                    ),
                )
            )
    # Put all the results into the `DemoIDS`
    return DemoIDS(results=results)

# Create an IDS instance from example data stored in a csv file
instance = create_instance(project_root / "tutorial_example_data.csv")

# We can now access values which have been defined in the instance
assert instance.results[0].voltage.value == 0.1

# We can see how this data looks as a dict by calling `model_dump`
assert instance.model_dump() == {
    "@idsNamespace": "private-demo",
    "@idsType": "demo-sensor",
    "@idsVersion": "v1.0.0",
    "results": [
        {
            "name": "meas01",
            "voltage": {"value": 0.1, "raw_value": "0.10", "unit": "Volt"},
        },
        {
            "name": "meas02",
            "voltage": {"value": 0.23, "raw_value": "0.23", "unit": "Volt"},
        },
    ],
}
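
As the comment in create_instance notes, float raises an error on non-numeric text. Since the exported schema allows value to be null, one possible approach is to store None when parsing fails while still keeping the original text in raw_value. The helper below is a hypothetical sketch using only the standard library; whether null is the right representation for unparseable readings depends on your data.

```python
from typing import Optional


def parse_float_or_none(raw: str) -> Optional[float]:
    """Return the numeric value of `raw`, or None if it is not a number.

    Caveat: `float` also accepts strings such as "nan" and "inf"; add an
    explicit check if those should be treated as missing values too.
    """
    try:
        return float(raw.strip())
    except ValueError:
        return None


# Hypothetical inputs: two valid readings and two unparseable ones.
for raw in ["0.10", " 0.23 ", "N/A", ""]:
    print(repr(raw), "->", parse_float_or_none(raw))
```

Inside create_instance, this would replace the call float(record["voltage/V"]), leaving raw_value=record["voltage/V"] unchanged so the original text is never lost.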

For more information see Working with programmatic IDS instances.