Open Morphology Standard (OMS) v1.0.0

Scope

OMS v1.0.0 standardizes per-plate packaging for cell morphology imaging (e.g., Cell Painting). It defines a Minimal Viable Set (MVS) of required metadata to train a useful virtual cell, and a broader set of Nice-To-Have attributes that enhance cross-lab generalization, evaluation, and generative control.

1. Folder Layout (per plate)

plate_<ID>/
  manifest.jsonl                 # GENERATED on upload (canonical file list + hashes; see §8)
  plate_metadata.json            # REQUIRED (see §4.1 MVS)
  wells.csv                      # REQUIRED (see §4.2 MVS)
  sites.csv                      # REQUIRED (see §4.3 MVS)
  raw/                           # REQUIRED, raw images (see §2)
    well_A01/site_1/channel_DNA.tif
    ...
  qc_metrics.csv                 # GENERATED on upload (read-only)
  qc_summary.json                # GENERATED on upload (read-only)
  LICENSE_CC-BY-4.0.txt         # GENERATED on upload (informational copy)
  LICENSE_ODC-BY-1.0.txt        # GENERATED on upload (informational copy)

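Images must live under raw/. Paths must follow the convention raw/well_<A01>/site_<n>/channel_<NAME>.tif, where well_<A01> and site_<n> are directories and channel_<NAME> is the image file (see §2 for the NGFF layout).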
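As a rough illustration of the path convention, the sketch below splits a TIFF path into well, site, and channel. The helper name parse_raw_path and the regular expression are illustrative and not part of OMS; the sketch assumes the TIFF layout, a .tif or .tiff extension, and the controlled channel names from §2.

```python
import re

# Hypothetical helper, not defined by OMS.
# Matches raw/well_<WELL>/site_<n>/channel_<NAME>.tif (or .tiff).
CHANNELS = ("DNA", "ER", "Mito", "Actin", "RNA", "Golgi")
RAW_PATH_RE = re.compile(
    r"^raw/well_(?P<well>[A-Z]{1,2}\d{2})/site_(?P<site>[1-9]\d*)/"
    r"channel_(?P<channel>" + "|".join(CHANNELS) + r")\.tiff?$"
)

def parse_raw_path(path):
    """Return (well_id, site_id, channel_name), or None if the path does not match."""
    m = RAW_PATH_RE.match(path)
    if m is None:
        return None
    return m.group("well"), int(m.group("site")), m.group("channel")
```

A path with an unknown channel name or a site number below 1 yields None, which a caller can treat as a layout error.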

2. Image Data

Formats: OME-TIFF or OME-Zarr (NGFF) preferred; plain TIFF allowed if required facts are provided via sidecar.
Structure: raw/well_<A01>/site_<n>/channel_<NAME>.tif (or NGFF layout under raw/).
Channel Names (controlled): DNA, ER, Mito, Actin, RNA, Golgi.
Scale: pixel_size_um MUST be known (from headers or sidecar). No assumptions.
Mixing formats: Not allowed within a plate.

Plain TIFF sidecar: if required facts are absent from headers, provide image_metadata.csv keyed by file path with: pixel_size_um,image_width_px,image_height_px,bit_depth,z_planes,z_step_um,channel_name.
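The sidecar could be loaded along the following lines. This is a minimal sketch: the key column name file_path is an assumption (the text above only says the file is keyed by file path), the function name load_sidecar is hypothetical, and values are kept as strings rather than converted.

```python
import csv
import io

# Columns listed in the sidecar description above.
SIDECAR_FIELDS = [
    "pixel_size_um", "image_width_px", "image_height_px",
    "bit_depth", "z_planes", "z_step_um", "channel_name",
]

def load_sidecar(text, key_column="file_path"):
    """Parse image_metadata.csv text into {file path: row dict}.

    key_column is an assumed column name, not one fixed by the spec.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in [key_column] + SIDECAR_FIELDS if c not in header]
    if missing:
        raise ValueError(f"sidecar is missing columns: {missing}")
    return {row[key_column]: row for row in reader}
```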

3. Controls & Replicates

MVS does not require specific counts of controls or replicates to accept a dataset. However, their presence improves modeling and evaluation. See capability flags in §6.

Recommended (Nice-To-Have): ≥4 negative control wells, ≥2 positive control wells, and ≥3 replicates per condition.
Declaration: Use wells.csv to label control wells and replicate groupings when available.

4. Metadata

Each subsection lists the MVS (required) fields first, followed by Nice-To-Have fields. Missing optional fields remain unknown; the platform will not impute values.

4.1 Plate-level (`plate_metadata.json`)

MVS (Required)

schema_version (string; const 1.0.0)
plate_id (string, globally unique)
cell_line (string; free text allowed)
plate_format (enum: 96 | 384 | 1536)
sites_per_well (int ≥ 1)
image_format (enum: OME-TIFF | OME-ZARR | TIFF)
channels_present (array of allowed names)
pixel_size_um (number; may be read from headers or sidecar)

Nice-To-Have

channel_order (array of allowed names)
channel_metadata (per-channel {name, ex_nm, em_nm, bit_depth})
z_planes (int), z_step_um (number)
objective_magnification, objective_na
image_width_px, image_height_px
microscope_make, microscope_model, camera_model
exposure_policy (enum: fixed | auto)
fixative (enum: PFA | methanol | other)
experiment_datetime (ISO 8601)
notes (short string)

4.2 Well-level (`wells.csv`)

MVS (Required Columns)

well_id
label_kind ∈ {control, perturbation}

Conditional Requirements

If label_kind = control: require control_type ∈ {negative, positive}.
If label_kind = perturbation: require perturbation_type ∈ {compound, crispr, orf, sirna, vehicle, other} and perturbation_id.
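The conditional requirements above can be checked row by row. The following is a sketch only: check_well_row is a hypothetical name, rows are assumed to be dicts as produced by csv.DictReader, and an empty cell is treated as a missing value (an assumption, since the text does not say how empty cells are handled).

```python
CONTROL_TYPES = {"negative", "positive"}
PERTURBATION_TYPES = {"compound", "crispr", "orf", "sirna", "vehicle", "other"}

def check_well_row(row):
    """Return a list of problems for one wells.csv row (dict of column -> string)."""
    problems = []
    kind = row.get("label_kind", "")
    if kind == "control":
        if row.get("control_type", "") not in CONTROL_TYPES:
            problems.append("control_type must be negative or positive")
    elif kind == "perturbation":
        if row.get("perturbation_type", "") not in PERTURBATION_TYPES:
            problems.append("perturbation_type missing or not in the allowed set")
        # Assumption: an empty string counts as a missing perturbation_id.
        if not row.get("perturbation_id", ""):
            problems.append("perturbation_id is required")
    else:
        problems.append("label_kind must be control or perturbation")
    return problems
```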

Nice-To-Have Columns

When label_kind = perturbation: perturbation_name, dose_value, dose_unit, time_after_treatment_h, replicate_group_id, vehicle, comments

When label_kind = perturbation AND perturbation_type = compound: vendor, catalog_no, lot_no, smiles, inchikey

When label_kind = perturbation AND perturbation_type = crispr: target_gene_symbol, target_gene_id, sgRNA_sequence, genome_build, target_locus, pam

When label_kind = control: replicate_group_id, vehicle, comments

Constraints

well_id must match the regex for the plate format: 96-well ^[A-H](0[1-9]|1[0-2])$ · 384-well ^[A-P](0[1-9]|1[0-9]|2[0-4])$ · 1536-well ^([A-Z]|A[A-F])(0[1-9]|[1-3][0-9]|4[0-8])$ (32 rows, A–Z then AA–AF, by 48 columns).
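A validator might apply the well_id constraint as sketched below, shown for the 96- and 384-well patterns; a 1536-well pattern can be added to the dictionary in the same way. The names WELL_ID_PATTERNS and invalid_well_ids are illustrative.

```python
import re

# Patterns copied from the constraint above (96- and 384-well formats only).
WELL_ID_PATTERNS = {
    96: re.compile(r"^[A-H](0[1-9]|1[0-2])$"),
    384: re.compile(r"^[A-P](0[1-9]|1[0-9]|2[0-4])$"),
}

def invalid_well_ids(well_ids, plate_format):
    """Return the well_ids that do not match the pattern for plate_format."""
    pattern = WELL_ID_PATTERNS[plate_format]
    return [w for w in well_ids if not pattern.fullmatch(w)]
```

An empty result means every well_id is syntactically valid for the declared plate_format.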

4.3 Site-level (`sites.csv`)

MVS (Required Columns)

site_id (integer ≥1)
well_id (string)
channel_name (enum: DNA, ER, Mito, Actin, RNA, Golgi)
z_index (integer, 0-based; for single-plane data set to 0)
file_path (string; relative under raw/; TIFF/OME-TIFF: endswith .tif/.tiff and match pattern raw/well_<WELL>/site_<site_id>/channel_<channel_name>.tif; OME-Zarr: path contains .zarr and points to the correct NGFF group)

Nice-To-Have Columns

exposure_ms
binning
stage_x_um
stage_y_um

Uniqueness & Coverage

Primary key: (well_id, site_id, channel_name, z_index) must be unique.
Coverage: For every well_id in wells.csv and every site_id ∈ [1..sites_per_well] and each channel in channels_present, there MUST be at least one row. If plate_metadata.z_planes is provided, rows must cover all z_index ∈ [0..z_planes-1] with no gaps; if omitted, uniqueness of z_index per (well,site,channel) is enforced but completeness is not.
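The uniqueness and coverage rules can be sketched as a single pass over sites.csv followed by a loop over the expected keys. This is an illustration under stated assumptions: check_sites is a hypothetical name, and rows are assumed to be dicts whose site_id and z_index have already been parsed to integers.

```python
from collections import defaultdict

def check_sites(rows, well_ids, sites_per_well, channels_present, z_planes=None):
    """Return a list of (kind, key) problems: duplicate, missing, or z_gap."""
    z_seen = defaultdict(set)   # (well, site, channel) -> set of z_index values
    errors = []
    for r in rows:
        key = (r["well_id"], r["site_id"], r["channel_name"])
        if r["z_index"] in z_seen[key]:
            errors.append(("duplicate", key + (r["z_index"],)))
        z_seen[key].add(r["z_index"])
    # Expected z planes are only known when z_planes is declared.
    want = set(range(z_planes)) if z_planes is not None else None
    for well in well_ids:
        for site in range(1, sites_per_well + 1):
            for channel in channels_present:
                key = (well, site, channel)
                have = z_seen.get(key, set())
                if not have:
                    errors.append(("missing", key))
                elif want is not None and want - have:
                    errors.append(("z_gap", key))
    return errors
```

When z_planes is omitted, only duplicates and entirely missing (well, site, channel) combinations are reported, matching the rule that completeness along z is not enforced in that case.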

Machine schema for sites.csv (Table Schema JSON for validators)

{
  "fields": [
    {"name":"site_id","type":"integer","constraints":{"minimum":1}},
    {"name":"well_id","type":"string"},
    {"name":"channel_name","type":"string","constraints":{"enum":["DNA","ER","Mito","Actin","RNA","Golgi"]}},
    {"name":"z_index","type":"integer","constraints":{"minimum":0}},
    {"name":"file_path","type":"string"},
    {"name":"exposure_ms","type":"number"},
    {"name":"binning","type":"integer","constraints":{"enum":[1,2,4]}},
    {"name":"stage_x_um","type":"number"},
    {"name":"stage_y_um","type":"number"}
  ],
  "primaryKey":["well_id","site_id","channel_name","z_index"]
}

Machine schema for wells.csv (JSON Schema for row validation; see §4.2)

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "wells.csv row (MVS)",
  "type": "object",
  "additionalProperties": true,
  "properties": {
    "well_id": {"type":"string"},
    "label_kind": {"type":"string", "enum":["control","perturbation"]},
    "control_type": {"type":"string", "enum":["negative","positive"]},
    "perturbation_type": {"type":"string", "enum":["compound","crispr","orf","sirna","vehicle","other"]},
    "perturbation_id": {"type":"string"}
  },
  "required": ["well_id","label_kind"],
  "allOf": [
    {"if": {"properties": {"label_kind": {"const":"control"}}, "required": ["label_kind"]},
     "then": {"required": ["control_type"]}},
    {"if": {"properties": {"label_kind": {"const":"perturbation"}}, "required": ["label_kind"]},
     "then": {"required": ["perturbation_type","perturbation_id"]}}
  ]
}

5. JSON Schemas (MVS)

Plate-level MVS schema (v1.0.0) — only MVS fields are required. Nice-To-Have fields may be present and will be validated if provided.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "OMS v1.0.0 plate_metadata.json (MVS)",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "schema_version": { "type": "string", "const": "1.0.0" },
    "plate_id": { "type": "string" },
    "cell_line": { "type": "string" },
    "plate_format": { "type": "integer", "enum": [96,384,1536] },
    "sites_per_well": { "type": "integer", "minimum": 1 },
    "image_format": { "type": "string", "enum": ["OME-TIFF","OME-ZARR","TIFF"] },
    "channels_present": {
      "type": "array",
      "items": { "type": "string", "enum": ["DNA","ER","Mito","Actin","RNA","Golgi"] },
      "minItems": 1
    },
    "pixel_size_um": { "type": "number" },

    "channel_order": { "type": "array", "items": { "type": "string", "enum": ["DNA","ER","Mito","Actin","RNA","Golgi"] } },
    "channel_metadata": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "properties": {
          "name": { "type": "string", "enum": ["DNA","ER","Mito","Actin","RNA","Golgi"] },
          "ex_nm": { "type": "integer" },
          "em_nm": { "type": "integer" },
          "bit_depth": { "type": "integer", "enum": [8,12,16,32] }
        },
        "required": ["name","ex_nm","em_nm","bit_depth"]
      }
    },
    "z_planes": { "type": "integer", "minimum": 1 },
    "z_step_um": { "type": "number" },
    "objective_magnification": { "type": "number" },
    "objective_na": { "type": "number" },
    "image_width_px": { "type": "integer" },
    "image_height_px": { "type": "integer" },
    "microscope_make": { "type": "string" },
    "microscope_model": { "type": "string" },
    "camera_model": { "type": "string" },
    "exposure_policy": { "type": "string", "enum": ["fixed","auto"] },
    "fixative": { "type": "string", "enum": ["PFA","methanol","other"] },
    "experiment_datetime": { "type": "string", "format": "date-time" },
    "notes": { "type": "string" }
  },
  "required": [
    "schema_version","plate_id","cell_line",
    "image_format","plate_format","sites_per_well",
    "channels_present","pixel_size_um"
  ]
}
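For orientation, a plate_metadata.json that carries only the MVS fields might be produced as follows. All values are made up for illustration (they do not describe a real plate), and missing_required is a hypothetical helper that mirrors only the "required" list of the schema, not its type or enum checks.

```python
import json

# Mirrors the "required" list of the plate-level schema above.
REQUIRED = [
    "schema_version", "plate_id", "cell_line", "image_format",
    "plate_format", "sites_per_well", "channels_present", "pixel_size_um",
]

# Illustrative values only; none of these come from a real dataset.
example = {
    "schema_version": "1.0.0",
    "plate_id": "plate_demo_0001",
    "cell_line": "U2OS",
    "plate_format": 384,
    "sites_per_well": 9,
    "image_format": "OME-TIFF",
    "channels_present": ["DNA", "ER", "Mito", "Actin", "RNA"],
    "pixel_size_um": 0.65,
}

def missing_required(doc):
    """Return the MVS keys that are absent from doc."""
    return [k for k in REQUIRED if k not in doc]

# Text that could be written to plate_metadata.json.
plate_metadata_text = json.dumps(example, indent=2)
```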

6. QC & Capability Flags (Platform-Computed)

QC artifacts are computed on upload and are read-only. In addition to metrics and a summary, the platform surfaces capability flags that summarize what the dataset can support.

qc_metrics.csv: per-well (and optional per-site) metrics with method provenance.
qc_summary.json: thresholds, pass/fail counts, rollups, and capability flags.

Capability Flags

Booleans and enums, derived strictly from provided data (no imputation):

has_negative_controls, has_positive_controls
has_replicates (≥2 wells per condition)
has_dose (dose columns present for ≥1 condition)
has_timecourse (time columns present)
has_zstack (any z_index > 0 or z_planes > 1)
has_channel_metadata (spectral/bit depth provided)
has_instrument_meta (any of: objective, NA, make/model, exposure policy)
format (enum: OME-TIFF | OME-ZARR | TIFF)
channels (list of present channels)
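The platform's actual derivation is not specified here, but several of the flags above could plausibly be computed as in this sketch. The function name capability_flags is hypothetical; has_dose and has_timecourse are assumed to mean that at least one row has a non-empty dose_value or time_after_treatment_h. has_replicates is left out because the text does not define how wells are grouped into a condition.

```python
def capability_flags(wells, z_indices, z_planes=None):
    """wells: list of wells.csv row dicts; z_indices: ints from sites.csv."""
    def any_value(column):
        # Assumption: an empty or absent cell means "not provided".
        return any(bool(row.get(column)) for row in wells)

    def has_control(control_type):
        return any(
            row.get("label_kind") == "control"
            and row.get("control_type") == control_type
            for row in wells
        )

    return {
        "has_negative_controls": has_control("negative"),
        "has_positive_controls": has_control("positive"),
        "has_dose": any_value("dose_value"),
        "has_timecourse": any_value("time_after_treatment_h"),
        "has_zstack": any(z > 0 for z in z_indices)
        or (z_planes is not None and z_planes > 1),
    }
```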

7. Validation Rules

Acceptance (MVS)

All MVS fields present and valid (see §4).
All required files exist; sites.csv covers every (well_id, site_id, channel) expected by wells.csv, sites_per_well, and channels_present. If z_planes is provided, coverage must include all z_index ∈ [0..z_planes-1]; otherwise there must be at least one z_index per (well_id, site_id, channel), with unique z_index values.
pixel_size_um is known (from headers or sidecar).
No mixing of image_format within a plate.
Channel names are from the controlled vocabulary.
For OME‑Zarr, minimal NGFF integrity: .zattrs has multiscales; each level has .zarray.

Rejection conditions

Missing MVS fields, or invalid well_id pattern for the plate format.
Any file_path in sites.csv that does not exist, is not listed in manifest.jsonl, or has a checksum mismatch.
channels_present declares a channel that never appears in sites.csv.
Duplicate rows for the same (well_id, site_id, channel_name, z_index).

Note: The platform does not impute or assign defaults. Unknown remains unknown; capability flags reflect only what is present.

8. Manifest & Dataset Root

Auto-generated on upload: the platform generates a canonical manifest.jsonl over every file and computes a Merkle dataset_root from it. These artifacts make the package tamper-evident: any change to a file or to the manifest changes the root.

{"path":"raw/well_A01/site_1/channel_DNA.tif","size":4213340,"sha256":"9b2f...","mime":"image/tiff","role":"raw","uri":"s3://bucket/key","versionId":"<optional>"}

Allowed roles: role ∈ {raw, qc}.

Canonicalization: UTF-8 (no BOM), one JSON object per line; keys ordered as shown; lines sorted by path; newline \n. Leaf hash: H(0x00 || line); node hash: H(0x01 || left || right); H = SHA-256. An odd node at the end of a level is promoted unchanged to the next level (no duplication).
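The root computation described above can be sketched in a few lines. Two points are assumptions rather than statements of the spec: each line is hashed as its UTF-8 bytes without the trailing newline, and an empty manifest is rejected because the text does not define a root for it. The function name merkle_root is illustrative.

```python
import hashlib

def merkle_root(lines):
    """lines: canonical manifest lines (str), already sorted by path.

    Returns the hex-encoded root. Assumes the newline is not part of the
    hashed line; the spec text does not settle this.
    """
    if not lines:
        raise ValueError("empty manifest: root is not defined by the text above")
    # Leaf hash: SHA-256(0x00 || line)
    level = [hashlib.sha256(b"\x00" + ln.encode("utf-8")).digest() for ln in lines]
    while len(level) > 1:
        nxt = []
        # Node hash: SHA-256(0x01 || left || right) over consecutive pairs.
        for i in range(0, len(level) - 1, 2):
            nxt.append(hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest())
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd node promoted unchanged, no duplication
        level = nxt
    return level[0].hex()
```

The distinct 0x00 and 0x01 prefixes keep leaf hashes and interior node hashes in separate domains, so a leaf cannot be passed off as an interior node.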

Verification on S3: With dataset_root from attestation, anyone can fetch the manifest, recompute the root, and verify each file’s sha256 after download.
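The per-file part of this verification could look like the sketch below, which checks one downloaded file against its manifest line. It covers only the size and sha256 comparison, not recomputing the root; the function names sha256_file and verify_entry are hypothetical.

```python
import hashlib
import json
import os

def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_entry(manifest_line, local_path):
    """True if the local file matches the size and sha256 in one manifest line."""
    entry = json.loads(manifest_line)
    return (
        os.path.getsize(local_path) == entry["size"]
        and sha256_file(local_path) == entry["sha256"]
    )
```

Reading in chunks keeps memory use flat for large image files.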

9. Licensing

All files are licensed under CC-BY-4.0 and ODC-BY-1.0. Auto-generated copies are included in the package.

Downloads

Demo plates

Minimal demo plate (OME‑TIFF): demo_plate_ometiff.zip
Minimal demo plate (OME‑Zarr): demo_plate_omezarr.zip

Replace placeholder images with your actual OME‑TIFF/OME‑Zarr/TIFF files following the defined structure. manifest.jsonl, QC files, and attestation are generated by the platform on upload.
