OMS v1.0.0 standardizes per-plate packaging for cell morphology imaging (e.g., Cell Painting). It defines a Minimal Viable Set (MVS) of required metadata to train a useful virtual cell, and a broader set of Nice-To-Have attributes that enhance cross-lab generalization, evaluation, and generative control.
plate_<ID>/
  manifest.jsonl           # GENERATED on upload (canonical file list + hashes; see §6)
  plate_metadata.json      # REQUIRED (see §3.1 MVS)
  wells.csv                # REQUIRED (see §3.2 MVS)
  sites.csv                # REQUIRED (see §3.3 MVS)
  raw/                     # REQUIRED, raw images (see §1)
    well_A01/site_1/channel_DNA.tif
    ...
  qc_metrics.csv           # GENERATED on upload (read-only)
  qc_summary.json          # GENERATED on upload (read-only)
  LICENSE_CC-BY-4.0.txt    # GENERATED on upload (informational copy)
  LICENSE_ODC-BY-1.0.txt   # GENERATED on upload (informational copy)
Images must live under raw/; directory and file names must follow the convention well_<A01>/site_<n>/channel_<NAME>.
File path pattern: raw/well_<A01>/site_<n>/channel_<NAME>.tif (or an NGFF layout under raw/).
pixel_size_um MUST be known (from headers or sidecar). No assumptions.
Plain TIFF sidecar: if required facts are absent from headers, provide
image_metadata.csv keyed by file path with:
pixel_size_um,image_width_px,image_height_px,bit_depth,z_planes,z_step_um,channel_name.
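The sidecar lookup can be sketched as follows. This is a minimal illustration, not the platform's reader: the text only says the file is "keyed by file path", so the key column name file_path is an assumption, and the sample values (0.65, 2160, 16) are made up for the example.

```python
import csv
import io

# Columns listed in the text. The key column name "file_path" is an assumption.
SIDECAR_COLUMNS = [
    "pixel_size_um", "image_width_px", "image_height_px",
    "bit_depth", "z_planes", "z_step_um", "channel_name",
]

def load_sidecar(text, key_column="file_path"):
    """Parse image_metadata.csv text into {file path: row dict}."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in [key_column] + SIDECAR_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"sidecar is missing columns: {missing}")
    return {row[key_column]: row for row in reader}

def pixel_size_um(sidecar, path):
    """Return the pixel size for a file, or raise; no default is ever assumed."""
    value = (sidecar.get(path, {}).get("pixel_size_um") or "").strip()
    if not value:
        raise LookupError(f"pixel_size_um unknown for {path}")
    return float(value)

# Illustrative sidecar content (values are invented for the example).
example = (
    "file_path,pixel_size_um,image_width_px,image_height_px,"
    "bit_depth,z_planes,z_step_um,channel_name\n"
    "raw/well_A01/site_1/channel_DNA.tif,0.65,2160,2160,16,1,,DNA\n"
)
sidecar = load_sidecar(example)
print(pixel_size_um(sidecar, "raw/well_A01/site_1/channel_DNA.tif"))
```

Raising on a missing value, rather than returning a fallback, mirrors the "No assumptions" rule above.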
MVS does not require specific counts of controls or replicates to accept a dataset. However, their presence improves modeling and evaluation. See capability flags in §5.
Use wells.csv to label control wells and replicate groupings when available.
Each subsection lists the MVS (required) fields first, followed by Nice-To-Have fields. Missing optional fields remain unknown; the platform will not impute values.
plate_metadata.json (§3.1)
MVS (required):
- schema_version (string; const 1.0.0)
- plate_id (string, globally unique)
- cell_line (string; free text allowed)
- plate_format (enum: 96 | 384 | 1536)
- sites_per_well (int ≥ 1)
- image_format (enum: OME-TIFF | OME-ZARR | TIFF)
- channels_present (array of allowed names)
- pixel_size_um (number; may be read from headers or sidecar)
Nice-To-Have:
- channel_order (array of allowed names)
- channel_metadata (per-channel {name, ex_nm, em_nm, bit_depth})
- z_planes (int), z_step_um (number)
- objective_magnification, objective_na
- image_width_px, image_height_px
- microscope_make, microscope_model, camera_model
- exposure_policy (enum: fixed | auto)
- fixative (enum: PFA | methanol | other)
- experiment_datetime (ISO 8601)
- notes (short string)
wells.csv (§3.2)
MVS (required):
- well_id
- label_kind ∈ {control, perturbation}
- If label_kind = control: require control_type ∈ {negative, positive}.
- If label_kind = perturbation: require perturbation_type ∈ {compound, crispr, orf, sirna, vehicle, other} and perturbation_id.
Nice-To-Have:
When label_kind = perturbation:
perturbation_name, dose_value, dose_unit, time_after_treatment_h, replicate_group_id, vehicle, comments
When label_kind = perturbation AND perturbation_type = compound:
vendor,catalog_no,lot_no,smiles,inchikey
When label_kind = perturbation AND perturbation_type = crispr:
target_gene_symbol,target_gene_id,sgRNA_sequence,genome_build,target_locus,pam
When label_kind = control: replicate_group_id, vehicle, comments
well_id must match the regex for the plate format:
96-well: ^[A-H](0[1-9]|1[0-2])$
384-well: ^[A-P](0[1-9]|1[0-9]|2[0-4])$
1536-well: ^(A[A-F]|[A-Z])(0[1-9]|[1-3][0-9]|4[0-8])$ (rows A to AF, columns 01 to 48)
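As an illustration of the well_id check, the sketch below applies the 96-well and 384-well patterns from the text. The function name is_valid_well_id is hypothetical, and the 1536-well pattern is left out to keep the example short.

```python
import re

# Patterns copied from the text for the 96- and 384-well formats.
WELL_ID_PATTERNS = {
    96: re.compile(r"^[A-H](0[1-9]|1[0-2])$"),
    384: re.compile(r"^[A-P](0[1-9]|1[0-9]|2[0-4])$"),
}

def is_valid_well_id(well_id, plate_format):
    """Return True if well_id is well-formed for the given plate_format."""
    pattern = WELL_ID_PATTERNS.get(plate_format)
    if pattern is None:
        raise ValueError(f"no pattern registered for plate_format {plate_format}")
    # fullmatch rejects a trailing newline, which "$" alone would tolerate.
    return pattern.fullmatch(well_id) is not None

for candidate in ["A01", "H12", "H13", "I01", "a01"]:
    print(candidate, is_valid_well_id(candidate, 96), is_valid_well_id(candidate, 384))
```

Note that column numbers are zero-padded to two digits, so "A1" is rejected for both formats.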
sites.csv (§3.3)
MVS (required):
- site_id (integer ≥ 1)
- well_id (string)
- channel_name (enum: DNA, ER, Mito, Actin, RNA, Golgi)
- z_index (integer, 0-based; for single-plane data set to 0)
- file_path (string; relative under raw/; TIFF/OME-TIFF: ends with .tif/.tiff and matches the pattern raw/well_<WELL>/site_<site_id>/channel_<channel_name>.tif; OME-Zarr: path contains .zarr and points to the correct NGFF group)
Nice-To-Have:
- exposure_ms
- binning
- stage_x_um
- stage_y_um
Constraints:
- (well_id, site_id, channel_name, z_index) must be unique.
- For every well_id in wells.csv, every site_id ∈ [1..sites_per_well], and each channel in channels_present, there MUST be at least one row.
- If plate_metadata.z_planes is provided, rows must cover all z_index ∈ [0..z_planes-1] with no gaps; if omitted, uniqueness of z_index per (well, site, channel) is enforced but completeness is not.
{
"fields": [
{"name":"site_id","type":"integer","constraints":{"minimum":1}},
{"name":"well_id","type":"string"},
{"name":"channel_name","type":"string","constraints":{"enum":["DNA","ER","Mito","Actin","RNA","Golgi"]}},
{"name":"z_index","type":"integer","constraints":{"minimum":0}},
{"name":"file_path","type":"string"},
{"name":"exposure_ms","type":"number"},
{"name":"binning","type":"integer","constraints":{"enum":[1,2,4]}},
{"name":"stage_x_um","type":"number"},
{"name":"stage_y_um","type":"number"}
],
"primaryKey":["well_id","site_id","channel_name","z_index"]
}
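The uniqueness and coverage rules for sites.csv can be sketched as a single pass over the rows. This is an illustrative check under stated assumptions, not the platform's validator: the function name check_sites and the message wording are hypothetical, and rows are assumed to be dicts whose site_id and z_index are already parsed as integers.

```python
from collections import defaultdict

def check_sites(rows, well_ids, sites_per_well, channels_present, z_planes=None):
    """Return a list of problems found in sites.csv rows."""
    problems = []
    z_seen = defaultdict(set)  # (well_id, site_id, channel_name) -> z_index values
    for row in rows:
        key = (row["well_id"], row["site_id"], row["channel_name"])
        if row["z_index"] in z_seen[key]:
            problems.append(f"duplicate key {key + (row['z_index'],)}")
        z_seen[key].add(row["z_index"])

    # Completeness over z is only enforced when z_planes is provided.
    expected_z = set(range(z_planes)) if z_planes is not None else None
    for well_id in well_ids:
        for site_id in range(1, sites_per_well + 1):
            for channel in channels_present:
                key = (well_id, site_id, channel)
                found = z_seen.get(key, set())
                if not found:
                    problems.append(f"no row for {key}")
                elif expected_z is not None and not expected_z <= found:
                    problems.append(f"{key} missing z_index {sorted(expected_z - found)}")
    return problems

rows = [
    {"well_id": "A01", "site_id": 1, "channel_name": "DNA", "z_index": 0},
    {"well_id": "A01", "site_id": 1, "channel_name": "ER", "z_index": 0},
]
print(check_sites(rows, ["A01"], 1, ["DNA", "ER"]))              # no problems
print(check_sites(rows, ["A01"], 1, ["DNA", "ER"], z_planes=2))  # z_index 1 missing
```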
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "wells.csv row (MVS)",
"type": "object",
"additionalProperties": true,
"properties": {
"well_id": {"type":"string"},
"label_kind": {"type":"string", "enum":["control","perturbation"]},
"control_type": {"type":"string", "enum":["negative","positive"]},
"perturbation_type": {"type":"string", "enum":["compound","crispr","orf","sirna","vehicle","other"]},
"perturbation_id": {"type":"string"}
},
"required": ["well_id","label_kind"],
"allOf": [
{"if": {"properties": {"label_kind": {"const":"control"}}},
"then": {"required": ["control_type"]}},
{"if": {"properties": {"label_kind": {"const":"perturbation"}}},
"then": {"required": ["perturbation_type","perturbation_id"]}}
]
}
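For readers working without a JSON Schema validator, the conditional rules in the schema above can be mirrored in plain Python. This is a sketch: the function name wells_row_errors is hypothetical, and treating an empty CSV cell as a missing field is an assumption, since the schema itself speaks only of required properties.

```python
CONTROL_TYPES = {"negative", "positive"}
PERTURBATION_TYPES = {"compound", "crispr", "orf", "sirna", "vehicle", "other"}

def wells_row_errors(row):
    """Apply the MVS rules to one wells.csv row (a dict of strings)."""
    def value(name):
        # Assumption: an empty or whitespace-only cell counts as missing.
        return (row.get(name) or "").strip()

    errors = []
    if not value("well_id"):
        errors.append("well_id is required")
    kind = value("label_kind")
    if kind == "control":
        if value("control_type") not in CONTROL_TYPES:
            errors.append("control rows require control_type in {negative, positive}")
    elif kind == "perturbation":
        if value("perturbation_type") not in PERTURBATION_TYPES:
            errors.append("perturbation rows require a valid perturbation_type")
        if not value("perturbation_id"):
            errors.append("perturbation rows require perturbation_id")
    else:
        errors.append("label_kind must be control or perturbation")
    return errors

print(wells_row_errors({"well_id": "A01", "label_kind": "control", "control_type": "negative"}))
print(wells_row_errors({"well_id": "B02", "label_kind": "perturbation", "perturbation_type": "compound"}))
```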
Plate-level MVS schema (v1.0.0) — only MVS fields are required. Nice-To-Have fields may be present and will be validated if provided.
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "OMS v1.0.0 plate_metadata.json (MVS)",
"type": "object",
"additionalProperties": false,
"properties": {
"schema_version": { "type": "string", "const": "1.0.0" },
"plate_id": { "type": "string" },
"cell_line": { "type": "string" },
"plate_format": { "type": "integer", "enum": [96,384,1536] },
"sites_per_well": { "type": "integer", "minimum": 1 },
"image_format": { "type": "string", "enum": ["OME-TIFF","OME-ZARR","TIFF"] },
"channels_present": {
"type": "array",
"items": { "type": "string", "enum": ["DNA","ER","Mito","Actin","RNA","Golgi"] },
"minItems": 1
},
"pixel_size_um": { "type": "number" },
"channel_order": { "type": "array", "items": { "type": "string", "enum": ["DNA","ER","Mito","Actin","RNA","Golgi"] } },
"channel_metadata": {
"type": "array",
"items": {
"type": "object",
"additionalProperties": false,
"properties": {
"name": { "type": "string", "enum": ["DNA","ER","Mito","Actin","RNA","Golgi"] },
"ex_nm": { "type": "integer" },
"em_nm": { "type": "integer" },
"bit_depth": { "type": "integer", "enum": [8,12,16,32] }
},
"required": ["name","ex_nm","em_nm","bit_depth"]
}
},
"z_planes": { "type": "integer", "minimum": 1 },
"z_step_um": { "type": "number" },
"objective_magnification": { "type": "number" },
"objective_na": { "type": "number" },
"image_width_px": { "type": "integer" },
"image_height_px": { "type": "integer" },
"microscope_make": { "type": "string" },
"microscope_model": { "type": "string" },
"camera_model": { "type": "string" },
"exposure_policy": { "type": "string", "enum": ["fixed","auto"] },
"fixative": { "type": "string", "enum": ["PFA","methanol","other"] },
"experiment_datetime": { "type": "string", "format": "date-time" },
"notes": { "type": "string" }
},
"required": [
"schema_version","plate_id","cell_line",
"image_format","plate_format","sites_per_well",
"channels_present","pixel_size_um"
]
}
QC artifacts are computed on upload and are read-only. In addition to metrics and a summary, the platform surfaces capability flags that summarize what the dataset can support.
qc_metrics.csv: per-well (and optional per-site) metrics with method provenance.
qc_summary.json: thresholds, pass/fail counts, rollups, and capability flags.
Capability flags are Booleans and enums, derived strictly from provided data (no imputation):
- has_negative_controls, has_positive_controls
- has_replicates (≥2 wells per condition)
- has_dose (dose columns present for ≥1 condition)
- has_timecourse (time columns present)
- has_zstack (any z_index > 0 or z_planes > 1)
- has_channel_metadata (spectral/bit depth provided)
- has_instrument_meta (any of: objective, NA, make/model, exposure policy)
- format (enum: OME-TIFF | OME-ZARR | TIFF)
- channels (list of present channels)
Validation requires that:
- sites.csv covers every (well_id, site_id, channel) expected by wells.csv, sites_per_well, and channels_present; if z_planes is provided, coverage must include all z_index ∈ [0..z_planes-1], otherwise at least one z_index per (well_id, site_id, channel), with unique z_index values.
- pixel_size_um is known (from headers or sidecar).
- A single image_format is used within a plate.
- For OME-Zarr, .zattrs has multiscales and each level has .zarray.
- well_id matches the pattern for the plate format.
Validation fails when:
- A file_path in sites.csv does not exist, is not listed in manifest.jsonl, or has a checksum mismatch.
- channels_present declares a channel that never appears in sites.csv.
- A (well_id, site_id, channel_name, z_index) key is duplicated.
Note: The platform does not impute or assign defaults. Unknown remains unknown; capability flags reflect only what is present.
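To make the "derived strictly from provided data" rule concrete, here is a sketch of how a few capability flags could be computed from parsed wells.csv and sites.csv rows. The function name capability_flags is hypothetical, and reading "dose columns present" as "dose_value and dose_unit both filled in for at least one row" is an assumption. has_replicates is omitted because this section does not define how wells are grouped into conditions.

```python
def capability_flags(wells_rows, sites_rows, z_planes=None):
    """Derive a subset of capability flags; absent data yields False, never a default."""
    def filled(row, name):
        return bool((row.get(name) or "").strip())

    control_types = {
        r.get("control_type") for r in wells_rows if r.get("label_kind") == "control"
    }
    return {
        "has_negative_controls": "negative" in control_types,
        "has_positive_controls": "positive" in control_types,
        # Assumption: a dose counts only when both value and unit are given.
        "has_dose": any(filled(r, "dose_value") and filled(r, "dose_unit") for r in wells_rows),
        "has_timecourse": any(filled(r, "time_after_treatment_h") for r in wells_rows),
        "has_zstack": any(int(r["z_index"]) > 0 for r in sites_rows) or (z_planes or 0) > 1,
    }

wells_rows = [
    {"well_id": "A01", "label_kind": "control", "control_type": "negative"},
    {"well_id": "A02", "label_kind": "perturbation", "perturbation_type": "compound",
     "perturbation_id": "cmpd-1", "dose_value": "10", "dose_unit": "uM"},
]
sites_rows = [{"well_id": "A01", "site_id": "1", "channel_name": "DNA", "z_index": "0"}]
print(capability_flags(wells_rows, sites_rows))
```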
Auto-generated on upload: the platform generates a canonical manifest.jsonl over
every file and computes a Merkle dataset_root from it. These artifacts make the
package tamper-evident: any change to a listed file or to the manifest changes dataset_root.
{"path":"raw/well_A01/site_1/channel_DNA.tif","size":4213340,"sha256":"9b2f...","mime":"image/tiff","role":"raw","uri":"s3://bucket/key","versionId":"<optional>"}
Allowed roles: role ∈ {raw, qc}.
Canonicalization: UTF-8 (no BOM), one JSON object per line; keys ordered as shown; lines
sorted by path; newline \n. Leaf hash: H(0x00 || line); node hash:
H(0x01 || left || right); hash = SHA-256. Odd leaf promotion (no duplication).
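The canonicalization and hashing rules above can be sketched as follows. Several details are assumptions because the text does not pin them down: the hashed leaf is taken to be the UTF-8 line without its trailing newline, JSON is serialized with compact separators (as in the example line), an absent optional key is simply omitted, and lines are sorted by the UTF-8 bytes of path. The function names and the sample entries are hypothetical.

```python
import hashlib
import json

# Key order as shown in the example manifest line.
KEY_ORDER = ["path", "size", "sha256", "mime", "role", "uri", "versionId"]

def canonical_line(entry):
    """Serialize one manifest entry with keys in the documented order."""
    ordered = {k: entry[k] for k in KEY_ORDER if k in entry}
    return json.dumps(ordered, separators=(",", ":"), ensure_ascii=False)

def merkle_root(lines):
    """Merkle root over canonical lines: SHA-256, 0x00 leaf and 0x01 node prefixes."""
    if not lines:
        raise ValueError("the root of an empty manifest is not defined here")
    level = [hashlib.sha256(b"\x00" + line.encode("utf-8")).digest() for line in lines]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level) - 1, 2):
            parents.append(hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest())
        if len(level) % 2 == 1:
            parents.append(level[-1])  # odd node is promoted unchanged, not duplicated
        level = parents
    return level[0].hex()

def dataset_root(entries):
    """Sort entries by path, canonicalize each, and compute the root."""
    ordered = sorted(entries, key=lambda e: e["path"].encode("utf-8"))
    return merkle_root([canonical_line(e) for e in ordered])

# Hypothetical entries; the sha256 values are placeholders, not real digests.
entries = [
    {"path": "raw/well_A01/site_1/channel_ER.tif", "size": 2, "sha256": "bb", "mime": "image/tiff", "role": "raw"},
    {"path": "raw/well_A01/site_1/channel_DNA.tif", "size": 1, "sha256": "aa", "mime": "image/tiff", "role": "raw"},
    {"path": "qc_summary.json", "size": 3, "sha256": "cc", "mime": "application/json", "role": "qc"},
]
print(dataset_root(entries))
```

Promoting the odd node, rather than pairing it with a copy of itself, means a manifest with a repeated final line cannot produce the same root as one without it.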
Verification on S3: With dataset_root from attestation, anyone can fetch the
manifest, recompute the root, and verify each file’s sha256 after download.
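The per-file half of that verification can be sketched as a streamed comparison against the manifest entry. The function name verify_stream is hypothetical, and the example uses in-memory bytes in place of a real download.

```python
import hashlib
import io

def verify_stream(entry, stream, chunk_size=1 << 20):
    """Check downloaded bytes against a manifest entry's size and sha256."""
    digest = hashlib.sha256()
    size = 0
    # Read in chunks so large images are never held in memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
        size += len(chunk)
    return size == entry["size"] and digest.hexdigest() == entry["sha256"]

payload = b"example image bytes"
entry = {
    "path": "raw/well_A01/site_1/channel_DNA.tif",
    "size": len(payload),
    "sha256": hashlib.sha256(payload).hexdigest(),
}
print(verify_stream(entry, io.BytesIO(payload)))         # True
print(verify_stream(entry, io.BytesIO(payload + b"x")))  # False
```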
Replace placeholder images with your actual OME‑TIFF/OME‑Zarr/TIFF
files following the defined structure. manifest.jsonl, QC files, and attestation are generated by
the platform on upload.