Welcome to ts-ids-core’s documentation

Intermediate Data Schemas (IDSs) are schemas used to create queryable, harmonized Tetra Data, which can be stored and accessed via the Tetra Data Platform (TDP). Protocols published to TDP process RAW (primary) data produced by a source system, typically an instrument or software, so that it conforms to the structure of a predefined IDS.

To give users the flexibility to harmonize data of their choice, TetraScience provides a way to publish your own protocols, task scripts, and IDSs using self-service pipelines (SSP). To create an end-to-end pipeline that parses and transforms RAW data and then makes the data accessible using TDP functionality, Athena, and Elasticsearch, you must create protocol, task script, and IDS artifacts. An IDS artifact must include an IDS definition. To create an IDS, you must identify the structure of the RAW source system data and create a TDP-compliant JSON schema that models the raw data in a format more easily consumed by downstream applications. Done manually, creating an IDS can be time-consuming and error-prone, and it risks incompatibility with TDP.

ts-ids-core provides a programmatic way of defining IDSs. An IDS defined using ts-ids-core can be exported to IDS JSON (JSON Schema draft-07) and is thus compatible with the TDP.

Predefined classes called “common components” are defined within ts-ids-core and can be imported to help you build reusable IDS definitions. More about components can be found here. For domain-specific components, see ts-ids-components.

Version

v2.1.0

Install

Note

ts-ids-core and ts-ids-components are packages hosted privately by TetraScience. Installing packages from more than one source, where one of them is an untrusted source like PyPI, carries the risk of a security vulnerability called Dependency Confusion. Before you install ts-ids-core, please review the Installation security section to protect yourself and your organization from Dependency Confusion attacks.

Once your package resolution solution is in place and you have steps in place to prevent Dependency Confusion, install ts-ids-core like any other package. For example, using a package manager like poetry:

poetry add ts-ids-core

Quickstart

To define your own programmatic IDS, inherit from one of the top-level IDS classes in the ts_ids_core.schema.ids_schema module:

IdsSchema - A top-level class that contains the required metadata fields for each IDS (@idsNamespace, @idsType, @idsVersion, $id, and $schema) and provides the minimum validation required to make an IDS compliant with the TetraScience platform
TetraDataSchema - A top-level class that inherits from IdsSchema, providing all the functionality listed above and also enforcing internal TetraScience modeling conventions

In addition to defining IDS metadata fields, in the example below we add a field named “samples” that uses the predefined component, Sample.

from typing import ClassVar, List, Literal

from ts_ids_core.annotations import Required
from ts_ids_core.base.ids_element import SchemaExtraMetadataType
from ts_ids_core.base.ids_field import IdsField
from ts_ids_core.schema import IdsSchema, Sample

class DemoIdsSchema(IdsSchema):
    #: The type hint `SchemaExtraMetadataType` is required.
    schema_extra_metadata: ClassVar[SchemaExtraMetadataType] = {
        "$id": "https://ids.tetrascience.com/my_namespace/demo_ids/v1.0.0/schema.json",
        "$schema": "http://json-schema.org/draft-07/schema#",
    }

    ids_namespace: Required[Literal["my_namespace"]] = IdsField(
        default="my_namespace", alias="@idsNamespace"
    )
    ids_type: Required[Literal["my_unique_ids_name"]] = IdsField(
        default="my_unique_ids_name", alias="@idsType"
    )
    ids_version: Required[Literal["v1.0.0"]] = IdsField(
        default="v1.0.0", alias="@idsVersion"
    )

    samples: List[Sample]

In addition to Sample, other standard IDS components such as System, User and DataCube can be found in ts_ids_core.schema.

That’s it! You just defined an IDS class. To export the IDS to JSON Schema used by the TetraScience platform, you can use the IdsElement method model_json_schema. For example:

import json
from typing import Any, Dict

model_schema: Dict[str, Any] = DemoIdsSchema.model_json_schema()

json_schema = json.dumps(model_schema, indent=2)

print(json_schema)

Output:

{
  "$id": "https://ids.tetrascience.com/my_namespace/demo_ids/v1.0.0/schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "additionalProperties": false,
  "properties": {
    "@idsType": {
      "const": "my_unique_ids_name",
      "type": "string"
    },
    "@idsVersion": {
      "const": "v1.0.0",
      "type": "string"
    },
    "@idsNamespace": {
      "const": "my_namespace",
      "type": "string"
    },
    "samples": {
      "items": {
        "$ref": "#/definitions/Sample"
      },
      "type": "array"
    }
  },
  "required": [
    "@idsType",
    "@idsVersion",
    "@idsNamespace"
  ],
  "type": "object",
  "definitions": {
    "Batch": {
      "additionalProperties": false,
      "description": "A Batch is the result of a single manufacturing run for a drug product that is made as specified groups or amounts,  within a specific time frame from the same raw materials that is intended to have uniform character and quality, within specified limits.",
      "properties": {
        "id": {
          "description": "Unique identifier assigned to a batch.",
          "type": [
            "string",
            "null"
          ]
        },
        "name": {
          "description": "Batch name",
          "type": [
            "string",
            "null"
          ]
        },
        "barcode": {
          "description": "Barcode assigned to a batch",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "type": "object"
    },
    "Compound": {
      "additionalProperties": false,
      "description": "A Compound is a specific chemical or biochemical structure or substance that is being investigated. A Compound may be any drug substance, drug product intermediate, or drug product across small molecules, and cell and gene therapy (CGT).",
      "properties": {
        "id": {
          "description": "Unique identifier assigned to a compound.",
          "type": [
            "string",
            "null"
          ]
        },
        "name": {
          "description": "Compound name.",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "type": "object"
    },
    "Holder": {
      "additionalProperties": false,
      "description": "A sample container such as a microplate or a vial.",
      "properties": {
        "name": {
          "description": "Holder name.",
          "type": [
            "string",
            "null"
          ]
        },
        "type": {
          "description": "Holder type.",
          "type": [
            "string",
            "null"
          ]
        },
        "barcode": {
          "description": "Barcode assigned to a holder.",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "type": "object"
    },
    "Label": {
      "additionalProperties": false,
      "description": "A Label associated with a sample, along with metadata about the label including\nthe source of the label and times associated with the label such as when it was\ncreated or looked up.",
      "properties": {
        "source": {
          "$ref": "#/definitions/Source",
          "description": "Sample label data source information."
        },
        "name": {
          "description": "Sample label name.",
          "type": "string"
        },
        "value": {
          "description": "Sample label value.",
          "type": "string"
        },
        "time": {
          "$ref": "#/definitions/SampleTime",
          "description": "Time associated with the sample label."
        }
      },
      "required": [
        "source",
        "name",
        "value",
        "time"
      ],
      "type": "object"
    },
    "Location": {
      "additionalProperties": false,
      "description": "The Location of the sample within the holder, such as the location of a well in a microplate.",
      "properties": {
        "position": {
          "description": "Raw position string.",
          "type": [
            "string",
            "null"
          ]
        },
        "row": {
          "description": "Row index of sample location in a plate or holder.",
          "type": [
            "number",
            "null"
          ]
        },
        "column": {
          "description": "Column index of sample location in a plate or holder.",
          "type": [
            "number",
            "null"
          ]
        },
        "index": {
          "description": "Index of sample location flattened to a single dimension.",
          "type": [
            "number",
            "null"
          ]
        },
        "holder": {
          "$ref": "#/definitions/Holder",
          "description": "Sample holder information"
        }
      },
      "type": "object"
    },
    "Property": {
      "additionalProperties": false,
      "description": "A property has a name and a value of any type, with metadata about the\nproperty including the source of the property and times associated with it\nsuch as when the property was created or looked up.",
      "properties": {
        "source": {
          "$ref": "#/definitions/Source",
          "description": "Sample property data source information."
        },
        "name": {
          "description": "Sample Property name.",
          "type": "string"
        },
        "value": {
          "description": "The original string value of the property.",
          "type": "string"
        },
        "value_data_type": {
          "$ref": "#/definitions/ValueDataType",
          "description": "This is the type of the original value."
        },
        "string_value": {
          "description": "If string_value has a value, then numerical_value, numerical_value_unit, and boolean_value all have to be null.",
          "type": [
            "string",
            "null"
          ]
        },
        "numerical_value": {
          "description": "If numerical_value has a value, then string_value and boolean_value both have to be null.",
          "type": [
            "number",
            "null"
          ]
        },
        "numerical_value_unit": {
          "description": "Unit for the numerical value.",
          "type": [
            "string",
            "null"
          ]
        },
        "boolean_value": {
          "description": "If boolean_value has a value, then numerical_value, numerical_value_unit, and string_value all have to be null.",
          "type": [
            "boolean",
            "null"
          ]
        },
        "time": {
          "$ref": "#/definitions/SampleTime",
          "description": "Time associated with the sample property."
        }
      },
      "required": [
        "source",
        "name",
        "value",
        "value_data_type",
        "string_value",
        "numerical_value",
        "numerical_value_unit",
        "boolean_value",
        "time"
      ],
      "type": "object"
    },
    "RawSampleTime": {
      "additionalProperties": false,
      "description": "The base model for time associated with a specific sample.",
      "properties": {
        "start": {
          "description": "Process/experiment/task start time.",
          "type": [
            "string",
            "null"
          ]
        },
        "created": {
          "description": "Data created time.",
          "type": [
            "string",
            "null"
          ]
        },
        "stop": {
          "description": "Process/experiment/task stop/finish time.",
          "type": [
            "string",
            "null"
          ]
        },
        "duration": {
          "description": "Process/experiment/task duration.",
          "type": [
            "string",
            "null"
          ]
        },
        "last_updated": {
          "description": "Data last updated time of a file/method.",
          "type": [
            "string",
            "null"
          ]
        },
        "acquired": {
          "description": "Data acquired/exported/captured time.",
          "type": [
            "string",
            "null"
          ]
        },
        "modified": {
          "description": "Data last modified/edited time.",
          "type": [
            "string",
            "null"
          ]
        },
        "lookup": {
          "description": "Raw sample data lookup time.",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "required": [
        "lookup"
      ],
      "type": "object"
    },
    "Sample": {
      "additionalProperties": false,
      "description": "A Sample is a discrete entity being observed in an experiment. For example, Samples may be characterized for product quality and stability, or be measured for research purposes.",
      "properties": {
        "id": {
          "description": "Unique identifier assigned to a sample.",
          "type": [
            "string",
            "null"
          ]
        },
        "name": {
          "description": "Sample name.",
          "type": [
            "string",
            "null"
          ]
        },
        "barcode": {
          "description": "Barcode assigned to a sample.",
          "type": [
            "string",
            "null"
          ]
        },
        "batch": {
          "$ref": "#/definitions/Batch"
        },
        "set": {
          "$ref": "#/definitions/Set",
          "description": "Sample set."
        },
        "location": {
          "$ref": "#/definitions/Location",
          "description": "Sample location information."
        },
        "compound": {
          "$ref": "#/definitions/Compound",
          "description": "Sample compound information."
        },
        "properties": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/Property"
          },
          "description": "Sample properties."
        },
        "labels": {
          "description": "Sample labels.",
          "items": {
            "$ref": "#/definitions/Label"
          },
          "type": "array"
        }
      },
      "type": "object"
    },
    "SampleTime": {
      "additionalProperties": false,
      "description": "A model for experiment sample datetime values converted to a standard ISO format\nand their respective raw datetime values in the primary data.",
      "properties": {
        "start": {
          "description": "Process/experiment/task start time.",
          "type": [
            "string",
            "null"
          ]
        },
        "created": {
          "description": "Data created time.",
          "type": [
            "string",
            "null"
          ]
        },
        "stop": {
          "description": "Process/experiment/task stop/finish time.",
          "type": [
            "string",
            "null"
          ]
        },
        "duration": {
          "description": "Process/experiment/task duration.",
          "type": [
            "string",
            "null"
          ]
        },
        "last_updated": {
          "description": "Data last updated time of a file/method.",
          "type": [
            "string",
            "null"
          ]
        },
        "acquired": {
          "description": "Data acquired/exported/captured time.",
          "type": [
            "string",
            "null"
          ]
        },
        "modified": {
          "description": "Data last modified/edited time.",
          "type": [
            "string",
            "null"
          ]
        },
        "lookup": {
          "description": "Raw sample data lookup time.",
          "type": [
            "string",
            "null"
          ]
        },
        "raw": {
          "$ref": "#/definitions/RawSampleTime",
          "description": "Raw sample time values from primary data."
        }
      },
      "required": [
        "lookup"
      ],
      "type": "object"
    },
    "Set": {
      "additionalProperties": false,
      "description": "A group of Samples.",
      "properties": {
        "id": {
          "description": "Unique identifier assigned to a set.",
          "type": [
            "string",
            "null"
          ]
        },
        "name": {
          "description": "Set name.",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "type": "object"
    },
    "Source": {
      "additionalProperties": false,
      "description": "The Source of information, such as a data file or a sample database.",
      "properties": {
        "name": {
          "description": "Source name.",
          "type": [
            "string",
            "null"
          ]
        },
        "type": {
          "description": "Source type.",
          "type": [
            "string",
            "null"
          ]
        }
      },
      "required": [
        "name",
        "type"
      ],
      "type": "object"
    },
    "ValueDataType": {
      "description": "Allowed data type values.",
      "enum": [
        "string",
        "number",
        "boolean"
      ],
      "type": "string"
    }
  }
}

JSON Schema may be similarly exported via the export-schema script. For further info, see How to export schema.json from a programmatic IDS.
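
One quick sanity check on an exported schema is that every `$ref` points at an entry under `definitions`. The sketch below uses only the standard library. The helper names `iter_refs` and `find_unresolved_refs` and the trimmed-down `schema` dictionary are illustrative assumptions, not part of ts-ids-core. In practice you would pass the dictionary returned by `model_json_schema()` instead.

```python
from typing import Any, Dict, Iterator, List


def iter_refs(node: Any) -> Iterator[str]:
    """Yield every "$ref" string found anywhere inside a JSON-like structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def find_unresolved_refs(schema: Dict[str, Any]) -> List[str]:
    """Return refs that do not resolve to an entry under "definitions"."""
    prefix = "#/definitions/"
    definitions = schema.get("definitions", {})
    return sorted(
        {
            ref
            for ref in iter_refs(schema)
            if not ref.startswith(prefix) or ref[len(prefix):] not in definitions
        }
    )


# Trimmed-down stand-in for the exported schema shown above (illustrative only).
schema = {
    "properties": {
        "samples": {"items": {"$ref": "#/definitions/Sample"}, "type": "array"}
    },
    "definitions": {
        "Sample": {
            "properties": {"batch": {"$ref": "#/definitions/Batch"}},
            "type": "object",
        },
        "Batch": {"type": "object"},
    },
}

unresolved = find_unresolved_refs(schema)
print(unresolved)
```

An empty list means every reference resolves locally. Any entry in the list names a reference that would need a matching definition before the schema is used.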

Installation security

TetraScience uses JFrog Artifactory to host internal packages and also to resolve PyPI packages. We expose a JFrog repository, ts-pypi-external, for customers to install TetraScience packages that are not published to public package indexes such as PyPI. ts-pypi-external contains the packages that we share with customers, which are added to the repository on a case-by-case basis. Both ts-ids-core and ts-ids-components are available in ts-pypi-external. Customers are given access by receiving a repository URL, username, and authentication token generated by TetraScience. Please contact your customer success representative to request the information and credentials if you have not already received them.

Before you start installing packages from ts-pypi-external, it’s important that you understand Dependency Confusion, which is described in PEP 708 and in this helpful article. When you install from both a private and a public package repository, any time a package with the same name appears in both, an installer like pip may install an untrusted package from the public repository instead of the intended trusted package from the private repository. This problem needs to be considered when integrating with ts-pypi-external.
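
The failure mode can be illustrated with a deliberately simplified model of an installer that merges candidates from every configured index and picks the highest version. This is not pip's actual resolver. The index contents, the attacker's version number, and the `pick_candidate` helper are hypothetical and exist only to show why a same-named public package can win.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]
# Hypothetical index contents: package name -> versions available there.
Index = Dict[str, List[Version]]

private_index: Index = {"ts-ids-core": [(2, 1, 0)]}
# An attacker publishes a same-named package with a higher version number.
public_index: Index = {"ts-ids-core": [(99, 0, 0)]}


def pick_candidate(name: str, indexes: Dict[str, Index]) -> Tuple[str, Version]:
    """Merge candidates from all indexes and return (index_name, version) of the highest."""
    candidates = [
        (version, index_name)
        for index_name, index in indexes.items()
        for version in index.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"no index provides {name!r}")
    version, index_name = max(candidates)
    return index_name, version


# Both indexes configured: the untrusted public package wins on version.
confused = pick_candidate(
    "ts-ids-core", {"private": private_index, "public": public_index}
)

# Only the private index configured: the trusted package is the only candidate.
safe = pick_candidate("ts-ids-core", {"private": private_index})

print(confused, safe)
```

In this toy model the only reliable defense is to make sure the trusted index is the sole place a given package name can be resolved from, which is the approach the rest of this section describes.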

PEP 708 recommends creating a single package repository that mirrors packages from all desired sources and then having users install packages only from that single repository:

For private repositories that host private projects, it is recommended that you mirror the public
projects that your users depend on into your own repository, taking care not to let a public project
merge with a private project, and tell your users to use the --index-url option to use only your repository.

Package managers provide ways to configure credentials and specify package sources in their configuration files. The details of which sources and credentials need to be configured will depend on your package management solution.

The goal of integrating ts-pypi-external into your packaging workflow should be to allow TetraScience-hosted packages to be downloaded only from ts-pypi-external, or from your private package index that mirrors ts-pypi-external, and from no other source.

Some examples of package manager source configurations follow:

Poetry

You can set a source as the priority source: poetry first tries to resolve a package from that source and then falls back to PyPI. If you have a private package index, you can set it as the source and, within your private index settings, block any Tetra-hosted packages from being downloaded from PyPI. If you do not have a private package index, you can restrict a package to a specific source as documented here.

Pipenv

Similar to poetry, pipenv allows you to specify package indexes. As mentioned above, if you have a private package index, you can set it as the source and, within your private index settings, block any Tetra-hosted packages from being downloaded from PyPI. If you do not have a private package index, you can restrict a package to a specific source using the index specifier shown in the page linked above.

Pip

Unfortunately, pip does not provide package restriction configurations similar to those of pipenv and poetry. You can download from multiple sources using both the --index-url and --extra-index-url options, but you cannot restrict a specific dependency to a given source as you can with source and index in poetry and pipenv, respectively. If you do not have a private package index, using pip is not recommended because you cannot completely remove the risk of Dependency Confusion. If you do have a private package index, we recommend configuring the package index settings to never mirror ts-ids-core or ts-ids-components from PyPI and then installing packages only from that private index.
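
If you do have a private index that mirrors everything you need, a pip invocation restricted to that index might look like the sketch below. The URL is a placeholder and the `build_pip_command` helper is hypothetical; `--index-url` is the pip option named in the PEP 708 excerpt above. The command is only constructed here, not executed.

```python
import sys
from typing import List

# Placeholder: replace with the URL of your own private index.
PRIVATE_INDEX_URL = "https://pypi.example.com/simple"


def build_pip_command(packages: List[str], index_url: str) -> List[str]:
    """Build a pip install command that resolves packages from one index only."""
    # Deliberately no --extra-index-url: a second index would reintroduce
    # the Dependency Confusion risk described above.
    return [sys.executable, "-m", "pip", "install", "--index-url", index_url, *packages]


command = build_pip_command(["ts-ids-core"], PRIVATE_INDEX_URL)
print(" ".join(command))
# To run it: subprocess.run(command, check=True)
```

Note that pip can also pick up additional indexes from its configuration files or environment variables, so it is worth checking those for stray extra-index entries as well.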

License

License information can be found here: License.