Open Morphology Standard (OMS) v1.0.0

Scope

OMS v1.0.0 standardizes per-plate packaging for cell morphology imaging (e.g., Cell Painting). It defines a Minimal Viable Set (MVS) of required metadata to train a useful virtual cell, and a broader set of Nice-To-Have attributes that enhance cross-lab generalization, evaluation, and generative control.

1. Folder Layout (per plate)

plate_<ID>/
  manifest.jsonl                 # GENERATED on upload (canonical file list + hashes; see §8)
  plate_metadata.json            # REQUIRED (see §4.1 MVS)
  wells.csv                      # REQUIRED (see §4.2 MVS)
  sites.csv                      # REQUIRED (see §4.3 MVS)
  raw/                           # REQUIRED, raw images (see §2)
    well_A01/site_1/channel_DNA.tif
    ...
  qc_metrics.csv                 # GENERATED on upload (read-only)
  qc_summary.json                # GENERATED on upload (read-only)
  LICENSE_CC-BY-4.0.txt         # GENERATED on upload (informational copy)
  LICENSE_ODC-BY-1.0.txt        # GENERATED on upload (informational copy)

Images must live under raw/; each image path must follow the convention well_<well_id>/site_<n>/channel_<NAME>.<ext>, for example raw/well_A01/site_1/channel_DNA.tif.
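As an illustration of the path convention, the sketch below parses a raw image path into its parts. The helper name parse_raw_path and the exact pattern are assumptions made for this example, not part of OMS; in particular, the extension is left unconstrained and the well ID is only loosely matched here (it is validated against the plate-format regex elsewhere).

```python
import re
from typing import Optional

# Hypothetical pattern mirroring raw/well_<well_id>/site_<n>/channel_<NAME>.<ext>.
# Assumption: site numbers start at 1 and channel names are alphanumeric.
RAW_PATH_RE = re.compile(
    r"raw/well_(?P<well_id>[A-Z]{1,2}\d{2})"
    r"/site_(?P<site>[1-9]\d*)"
    r"/channel_(?P<channel>[A-Za-z0-9]+)\.(?P<ext>[A-Za-z0-9.]+)"
)


def parse_raw_path(path: str) -> Optional[dict]:
    """Return the path components, or None if the path breaks the convention."""
    match = RAW_PATH_RE.fullmatch(path)
    if match is None:
        return None
    parts = match.groupdict()
    parts["site"] = int(parts["site"])
    return parts
```

A path such as raw/A01/1/DNA.tif returns None, which a validator could report as a layout error.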

2. Image Data

Plain TIFF sidecar: if required facts are absent from headers, provide image_metadata.csv keyed by file path with: pixel_size_um,image_width_px,image_height_px,bit_depth,z_planes,z_step_um,channel_name.

3. Controls & Replicates

MVS does not require specific counts of controls or replicates to accept a dataset. However, their presence improves modeling and evaluation. See capability flags in §6.

4. Metadata

Each subsection lists the MVS (required) fields first, followed by Nice-To-Have fields. Missing optional fields remain unknown; the platform will not impute values.

4.1 Plate-level (plate_metadata.json)

MVS (Required)

Nice-To-Have

4.2 Well-level (wells.csv)

MVS (Required Columns)

Conditional Requirements

Nice-To-Have Columns

When label_kind = perturbation: perturbation_name, dose_value, dose_unit, time_after_treatment_h, replicate_group_id, vehicle, comments

When label_kind = perturbation AND perturbation_type = compound: vendor, catalog_no, lot_no, smiles, inchikey

When label_kind = perturbation AND perturbation_type = crispr: target_gene_symbol, target_gene_id, sgRNA_sequence, genome_build, target_locus, pam

When label_kind = control: replicate_group_id, vehicle, comments

Constraints

well_id must match the regex for the plate's format: 96-well ^[A-H](0[1-9]|1[0-2])$ · 384-well ^[A-P](0[1-9]|1[0-9]|2[0-4])$ · 1536-well ^([A-Z]|A[A-F])(0[1-9]|[1-3][0-9]|4[0-8])$ (32 rows, A through AF, by 48 columns).
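A minimal sketch of how a validator might apply these patterns, covering only the 96- and 384-well formats. The names WELL_ID_PATTERNS and is_valid_well_id are hypothetical; the regexes are copied from the constraint above.

```python
import re

# Patterns for the 96- and 384-well formats, keyed by plate_format.
WELL_ID_PATTERNS = {
    96: re.compile(r"^[A-H](0[1-9]|1[0-2])$"),
    384: re.compile(r"^[A-P](0[1-9]|1[0-9]|2[0-4])$"),
}


def is_valid_well_id(well_id: str, plate_format: int) -> bool:
    """Check a well_id against the regex for the given plate format."""
    pattern = WELL_ID_PATTERNS.get(plate_format)
    if pattern is None:
        raise ValueError(f"no pattern registered for plate_format={plate_format}")
    # fullmatch also rejects a trailing newline, which "$" alone would tolerate.
    return pattern.fullmatch(well_id) is not None
```

Note that column numbers are zero-padded, so A1 is rejected while A01 is accepted.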

4.3 Site-level (sites.csv)

MVS (Required Columns)

Nice-To-Have Columns

Uniqueness & Coverage

Machine schema for sites.csv (Table Schema JSON for validators)

{
  "fields": [
    {"name":"site_id","type":"integer","constraints":{"minimum":1}},
    {"name":"well_id","type":"string"},
    {"name":"channel_name","type":"string","constraints":{"enum":["DNA","ER","Mito","Actin","RNA","Golgi"]}},
    {"name":"z_index","type":"integer","constraints":{"minimum":0}},
    {"name":"file_path","type":"string"},
    {"name":"exposure_ms","type":"number"},
    {"name":"binning","type":"integer","constraints":{"enum":[1,2,4]}},
    {"name":"stage_x_um","type":"number"},
    {"name":"stage_y_um","type":"number"}
  ],
  "primaryKey":["well_id","site_id","channel_name","z_index"]
}
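The primaryKey in the schema above implies that no two rows of sites.csv may share the same (well_id, site_id, channel_name, z_index). A rough sketch of that uniqueness check using only the standard library follows; the function name duplicate_keys and the sample rows are illustrative assumptions, and the sketch assumes all four key columns are present in the file.

```python
import csv
import io
from collections import Counter

PRIMARY_KEY = ("well_id", "site_id", "channel_name", "z_index")


def duplicate_keys(csv_text: str) -> list:
    """Return the primary-key tuples that occur more than once."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(tuple(row[col] for col in PRIMARY_KEY) for row in reader)
    return sorted(key for key, n in counts.items() if n > 1)


# Illustrative data: the third row repeats the key of the first.
SAMPLE = """site_id,well_id,channel_name,z_index,file_path
1,A01,DNA,0,raw/well_A01/site_1/channel_DNA.tif
1,A01,ER,0,raw/well_A01/site_1/channel_ER.tif
1,A01,DNA,0,raw/well_A01/site_1/channel_DNA.tif
"""
```

Values are compared as the raw CSV strings, so a validator that first casts site_id and z_index to integers would also catch duplicates such as "1" and "01".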

Machine schema for wells.csv rows (JSON Schema for row validation; see §4.2)

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "wells.csv row (MVS)",
  "type": "object",
  "additionalProperties": true,
  "properties": {
    "well_id": {"type":"string"},
    "label_kind": {"type":"string", "enum":["control","perturbation"]},
    "control_type": {"type":"string", "enum":["negative","positive"]},
    "perturbation_type": {"type":"string", "enum":["compound","crispr","orf","sirna","vehicle","other"]},
    "perturbation_id": {"type":"string"}
  },
  "required": ["well_id","label_kind"],
  "allOf": [
    {"if": {"properties": {"label_kind": {"const":"control"}}},
     "then": {"required": ["control_type"]}},
    {"if": {"properties": {"label_kind": {"const":"perturbation"}}},
     "then": {"required": ["perturbation_type","perturbation_id"]}}
  ]
}
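The conditional requirements in this schema can be restated in a few lines of plain Python, which may help when reading the if/then clauses. This is a sketch, not a substitute for a JSON Schema validator: the function name missing_fields is hypothetical, and treating an empty CSV cell as missing is an assumption (JSON Schema's "required" only checks that the key is present).

```python
def missing_fields(row: dict) -> list:
    """List the required wells.csv fields that are absent or empty in a row."""
    required = ["well_id", "label_kind"]
    kind = row.get("label_kind")
    if kind == "control":
        # Controls must state whether they are negative or positive.
        required.append("control_type")
    elif kind == "perturbation":
        # Perturbations must be typed and identified.
        required += ["perturbation_type", "perturbation_id"]
    return [name for name in required if row.get(name) in (None, "")]
```

For example, a perturbation row that names its perturbation_type but omits perturbation_id yields ["perturbation_id"].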

5. JSON Schemas (MVS)

Plate-level MVS schema (v1.0.0): only MVS fields are required. Nice-To-Have fields may be present and will be validated if provided.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "OMS v1.0.0 plate_metadata.json (MVS)",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "schema_version": { "type": "string", "const": "1.0.0" },
    "plate_id": { "type": "string" },
    "cell_line": { "type": "string" },
    "plate_format": { "type": "integer", "enum": [96,384,1536] },
    "sites_per_well": { "type": "integer", "minimum": 1 },
    "image_format": { "type": "string", "enum": ["OME-TIFF","OME-ZARR","TIFF"] },
    "channels_present": {
      "type": "array",
      "items": { "type": "string", "enum": ["DNA","ER","Mito","Actin","RNA","Golgi"] },
      "minItems": 1
    },
    "pixel_size_um": { "type": "number" },

    "channel_order": { "type": "array", "items": { "type": "string", "enum": ["DNA","ER","Mito","Actin","RNA","Golgi"] } },
    "channel_metadata": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "properties": {
          "name": { "type": "string", "enum": ["DNA","ER","Mito","Actin","RNA","Golgi"] },
          "ex_nm": { "type": "integer" },
          "em_nm": { "type": "integer" },
          "bit_depth": { "type": "integer", "enum": [8,12,16,32] }
        },
        "required": ["name","ex_nm","em_nm","bit_depth"]
      }
    },
    "z_planes": { "type": "integer", "minimum": 1 },
    "z_step_um": { "type": "number" },
    "objective_magnification": { "type": "number" },
    "objective_na": { "type": "number" },
    "image_width_px": { "type": "integer" },
    "image_height_px": { "type": "integer" },
    "microscope_make": { "type": "string" },
    "microscope_model": { "type": "string" },
    "camera_model": { "type": "string" },
    "exposure_policy": { "type": "string", "enum": ["fixed","auto"] },
    "fixative": { "type": "string", "enum": ["PFA","methanol","other"] },
    "experiment_datetime": { "type": "string", "format": "date-time" },
    "notes": { "type": "string" }
  },
  "required": [
    "schema_version","plate_id","cell_line",
    "image_format","plate_format","sites_per_well",
    "channels_present","pixel_size_um"
  ]
}
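To make the required set concrete, here is a sketch of a minimal plate_metadata.json document together with a partial check of the MVS rules. All values in EXAMPLE_PLATE (plate ID, cell line, pixel size, and so on) are illustrative placeholders, and check_plate_mvs is a hypothetical helper that covers only a few of the schema's constraints.

```python
REQUIRED = (
    "schema_version", "plate_id", "cell_line", "image_format",
    "plate_format", "sites_per_well", "channels_present", "pixel_size_um",
)
CHANNELS = {"DNA", "ER", "Mito", "Actin", "RNA", "Golgi"}

# Illustrative values only; substitute the facts of your own plate.
EXAMPLE_PLATE = {
    "schema_version": "1.0.0",
    "plate_id": "plate_0001",
    "cell_line": "U2OS",
    "image_format": "TIFF",
    "plate_format": 384,
    "sites_per_well": 9,
    "channels_present": ["DNA", "ER", "Mito", "Actin", "RNA"],
    "pixel_size_um": 0.65,
}


def check_plate_mvs(doc: dict) -> list:
    """Return a list of problems; an empty list means these checks passed."""
    problems = [f"missing: {key}" for key in REQUIRED if key not in doc]
    if "plate_format" in doc and doc["plate_format"] not in (96, 384, 1536):
        problems.append("plate_format must be 96, 384, or 1536")
    unknown = [c for c in doc.get("channels_present", []) if c not in CHANNELS]
    if unknown:
        problems.append(f"unknown channels: {unknown}")
    return problems
```

Because the schema sets additionalProperties to false, a full validator would also reject keys that the schema does not list; this sketch does not check that.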

6. QC & Capability Flags (Platform-Computed)

QC artifacts are computed on upload and are read-only. In addition to metrics and a summary, the platform surfaces capability flags that summarize what the dataset can support.

Capability Flags

Booleans and enums, derived strictly from provided data (no imputation):

7. Validation Rules

Acceptance (MVS)

Rejection conditions

Note: The platform does not impute or assign defaults. Unknown remains unknown; capability flags reflect only what is present.

8. Manifest & Dataset Root

Auto-generated on upload: the platform builds a canonical manifest.jsonl covering every file and computes a Merkle dataset_root from it. These artifacts make the package tamper-evident: any change to a file or to the manifest produces a different root.

{"path":"raw/well_A01/site_1/channel_DNA.tif","size":4213340,"sha256":"9b2f...","mime":"image/tiff","role":"raw","uri":"s3://bucket/key","versionId":"<optional>"}

Allowed roles: role ∈ {raw, qc}.

Canonicalization: UTF-8 (no BOM), one JSON object per line; keys ordered as shown; lines sorted by path; newline \n. Leaf hash: H(0x00 || line); node hash: H(0x01 || left || right), where H is SHA-256. An unpaired leaf is promoted to the next level unchanged rather than duplicated.
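The canonicalization and hashing rules can be sketched as follows. Several details are assumptions of this example rather than statements of the standard: the hashed line excludes the trailing newline, JSON is serialized with compact separators and without ASCII escaping, promotion of an unpaired node applies at every level of the tree, and an empty manifest is treated as an error. The names KEY_ORDER, canonical_line, and merkle_root are hypothetical.

```python
import hashlib
import json

# Key order taken from the example manifest line; absent keys are skipped.
KEY_ORDER = ("path", "size", "sha256", "mime", "role", "uri", "versionId")


def canonical_line(entry: dict) -> bytes:
    """Serialize one manifest entry with keys in the documented order."""
    ordered = {key: entry[key] for key in KEY_ORDER if key in entry}
    text = json.dumps(ordered, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def merkle_root(lines: list) -> str:
    """Compute the hex root over canonical lines already sorted by path."""
    if not lines:
        raise ValueError("manifest has no entries")
    level = [hashlib.sha256(b"\x00" + line).digest() for line in lines]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            pair = b"\x01" + level[i] + level[i + 1]
            nxt.append(hashlib.sha256(pair).digest())
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # unpaired node is promoted, not duplicated
        level = nxt
    return level[0].hex()
```

The distinct 0x00 and 0x01 prefixes keep leaf hashes and interior-node hashes in separate domains, so a leaf can never be passed off as an interior node.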

Verification on S3: With dataset_root from attestation, anyone can fetch the manifest, recompute the root, and verify each file’s sha256 after download.
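The per-file half of this verification could look like the sketch below, which hashes a downloaded file in chunks and compares the digest with the sha256 recorded in its manifest line. The function names are hypothetical, and the sketch takes any binary stream so that it does not assume a particular storage client.

```python
import hashlib
import io
import json


def sha256_stream(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_entry(manifest_line: str, stream) -> bool:
    """Compare a file's content hash with the sha256 in its manifest line."""
    entry = json.loads(manifest_line)
    return sha256_stream(stream) == entry["sha256"]
```

In practice the stream would come from open(local_path, "rb") after download; io.BytesIO works equally well for in-memory data.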

9. Licensing

All files are licensed under CC-BY-4.0 and ODC-BY-1.0. Auto-generated copies are included in the package.

Downloads

Wells (CSV)

Sites (CSV)

Plate metadata

Image sidecar (plain TIFF only)

Demo plates

Replace placeholder images with your actual OME‑TIFF/OME‑Zarr/TIFF files following the defined structure. manifest.jsonl, QC files, and attestation are generated by the platform on upload.