OMS v1.0.0 standardizes per-plate packaging for cell morphology imaging (e.g., Cell Painting). It defines a Minimal Viable Set (MVS) of required metadata to train a useful virtual cell, and a broader set of Nice-To-Have attributes that enhance cross-lab generalization, evaluation, and generative control.
plate_<ID>/
  manifest.jsonl            # GENERATED on upload (canonical file list + hashes; see §6)
  plate_metadata.json       # REQUIRED (see §3.1 MVS)
  wells.csv                 # REQUIRED (see §3.2 MVS)
  sites.csv                 # REQUIRED (see §3.3 MVS)
  raw/                      # REQUIRED, raw images (see §1)
    well_A01/site_1/channel_DNA.tif
    ...
  qc_metrics.csv            # GENERATED on upload (read-only)
  qc_summary.json           # GENERATED on upload (read-only)
  LICENSE_CC-BY-4.0.txt     # GENERATED on upload (informational copy)
  LICENSE_ODC-BY-1.0.txt    # GENERATED on upload (informational copy)
- Images must live under raw/; directory names must follow the convention well_<A01>/site_<n>/channel_<NAME>.
- File layout: raw/well_<A01>/site_<n>/channel_<NAME>.tif (or NGFF layout under raw/).
- pixel_size_um MUST be known (from headers or sidecar). No assumptions.
- Plain TIFF sidecar: if required facts are absent from headers, provide image_metadata.csv keyed by file path with: pixel_size_um, image_width_px, image_height_px, bit_depth, z_planes, z_step_um, channel_name.
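The path convention above can be checked mechanically. The following is a minimal sketch, not the platform's validator: the function name `parse_raw_path`, the `RawImagePath` type, and the exact regular expression are illustrative assumptions. In particular, it assumes well tokens are one or two uppercase letters followed by two digits, that channel names are alphanumeric, and that both .tif and .tiff extensions are acceptable.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern for plain TIFF / OME-TIFF files; NGFF layouts are not covered here.
RAW_TIFF_PATTERN = re.compile(
    r"^raw/well_(?P<well>[A-Z]{1,2}\d{2})"
    r"/site_(?P<site>[1-9]\d*)"
    r"/channel_(?P<channel>[A-Za-z0-9]+)\.tiff?$"
)


class RawImagePath(NamedTuple):
    """Components extracted from a raw image path (hypothetical helper type)."""
    well_id: str
    site_id: int
    channel_name: str


def parse_raw_path(path: str) -> Optional[RawImagePath]:
    """Return the parsed components, or None if the path breaks the convention."""
    match = RAW_TIFF_PATTERN.match(path)
    if match is None:
        return None
    return RawImagePath(match["well"], int(match["site"]), match["channel"])
```

Returning None rather than raising lets a caller collect every non-conforming path in one pass before reporting.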
MVS does not require specific counts of controls or replicates to accept a dataset. However, their presence improves modeling and evaluation. See capability flags in §5.
Use wells.csv to label control wells and replicate groupings when available.

Each subsection lists the MVS (required) fields first, followed by Nice-To-Have fields. Missing optional fields remain unknown; the platform will not impute values.
plate_metadata.json

MVS (required):
- schema_version (string; const 1.0.0)
- plate_id (string, globally unique)
- cell_line (string; free text allowed)
- plate_format (enum: 96 | 384 | 1536)
- sites_per_well (int ≥ 1)
- image_format (enum: OME-TIFF | OME-ZARR | TIFF)
- channels_present (array of allowed names)
- pixel_size_um (number; may be read from headers or sidecar)

Nice-To-Have:
- channel_order (array of allowed names)
- channel_metadata (per-channel {name, ex_nm, em_nm, bit_depth})
- z_planes (int), z_step_um (number)
- objective_magnification, objective_na
- image_width_px, image_height_px
- microscope_make, microscope_model, camera_model
- exposure_policy (enum: fixed | auto)
- fixative (enum: PFA | methanol | other)
- experiment_datetime (ISO 8601)
- notes (short string)

wells.csv
MVS (required):
- well_id
- label_kind ∈ {control, perturbation}
- label_kind = control: require control_type ∈ {negative, positive}.
- label_kind = perturbation: require perturbation_type ∈ {compound, crispr, orf, sirna, vehicle, other} and perturbation_id.

Nice-To-Have:
- When label_kind = perturbation: perturbation_name, dose_value, dose_unit, time_after_treatment_h, replicate_group_id, vehicle, comments
- When label_kind = perturbation AND perturbation_type = compound: vendor, catalog_no, lot_no, smiles, inchikey
- When label_kind = perturbation AND perturbation_type = crispr: target_gene_symbol, target_gene_id, sgRNA_sequence, genome_build, target_locus, pam
- When label_kind = control: replicate_group_id, vehicle, comments

well_id must match the plate regex:
- 96-well: ^[A-H](0[1-9]|1[0-2])$
- 384-well: ^[A-P](0[1-9]|1[0-9]|2[0-4])$
- 1536-well: ^[A-Z]{2}(0[1-9]|[1-5][0-9]|6[0-4])$
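As an illustration of the well_id rule, the three regular expressions can be placed in a lookup keyed by plate_format. This is a sketch under the assumption that plate_format arrives as an integer; the names `WELL_ID_PATTERNS` and `is_valid_well_id` are hypothetical and not part of the specification.

```python
import re

# Patterns copied verbatim from the specification text, keyed by plate_format.
WELL_ID_PATTERNS = {
    96: re.compile(r"^[A-H](0[1-9]|1[0-2])$"),
    384: re.compile(r"^[A-P](0[1-9]|1[0-9]|2[0-4])$"),
    1536: re.compile(r"^[A-Z]{2}(0[1-9]|[1-5][0-9]|6[0-4])$"),
}


def is_valid_well_id(well_id: str, plate_format: int) -> bool:
    """Check a well_id against the pattern for the given plate format."""
    pattern = WELL_ID_PATTERNS.get(plate_format)
    if pattern is None:
        raise ValueError(f"unsupported plate_format: {plate_format!r}")
    # fullmatch also rejects a trailing newline, which a bare `$` would tolerate.
    return pattern.fullmatch(well_id) is not None
```

For example, `is_valid_well_id("H12", 96)` is accepted while `is_valid_well_id("H13", 96)` is rejected, because column 13 does not exist on a 96-well plate.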
sites.csv

MVS (required):
- site_id (integer ≥ 1)
- well_id (string)
- channel_name (enum: DNA, ER, Mito, Actin, RNA, Golgi)
- z_index (integer, 0-based; for single-plane data set to 0)
- file_path (string; relative under raw/; TIFF/OME-TIFF: ends with .tif/.tiff and matches the pattern raw/well_<WELL>/site_<site_id>/channel_<channel_name>.tif; OME-Zarr: path contains .zarr and points to the correct NGFF group)

Nice-To-Have:
- exposure_ms
- binning
- stage_x_um
- stage_y_um

(well_id, site_id, channel_name, z_index) must be unique.

For every well_id in wells.csv, every site_id ∈ [1..sites_per_well], and each channel in channels_present, there MUST be at least one row. If plate_metadata.z_planes is provided, rows must cover all z_index ∈ [0..z_planes-1] with no gaps; if omitted, uniqueness of z_index per (well, site, channel) is enforced but completeness is not.

{
  "fields": [
    {"name":"site_id","type":"integer","constraints":{"minimum":1}},
    {"name":"well_id","type":"string"},
    {"name":"channel_name","type":"string","constraints":{"enum":["DNA","ER","Mito","Actin","RNA","Golgi"]}},
    {"name":"z_index","type":"integer","constraints":{"minimum":0}},
    {"name":"file_path","type":"string"},
    {"name":"exposure_ms","type":"number"},
    {"name":"binning","type":"integer","constraints":{"enum":[1,2,4]}},
    {"name":"stage_x_um","type":"number"},
    {"name":"stage_y_um","type":"number"}
  ],
  "primaryKey":["well_id","site_id","channel_name","z_index"]
}
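The uniqueness and coverage rules for sites.csv could be checked roughly as follows. This is a sketch, not the platform's implementation: the function name `check_sites`, its parameters, and the error message wording are assumptions, and rows are assumed to be dictionaries such as those produced by `csv.DictReader`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple


def check_sites(
    rows: Iterable[Dict[str, str]],
    well_ids: Iterable[str],
    sites_per_well: int,
    channels_present: Iterable[str],
    z_planes: Optional[int] = None,
) -> List[str]:
    """Return human-readable errors for duplicate keys and missing coverage."""
    errors: List[str] = []
    seen: Set[Tuple[str, int, str, int]] = set()
    z_by_group: Dict[Tuple[str, int, str], Set[int]] = defaultdict(set)

    # Primary key check: (well_id, site_id, channel_name, z_index) must be unique.
    for row in rows:
        key = (row["well_id"], int(row["site_id"]), row["channel_name"], int(row["z_index"]))
        if key in seen:
            errors.append(f"duplicate key: {key}")
        seen.add(key)
        z_by_group[key[:3]].add(key[3])

    # Coverage check: every expected (well, site, channel) needs at least one row,
    # and a full z range only when z_planes is declared.
    channels = list(channels_present)
    for well in well_ids:
        for site in range(1, sites_per_well + 1):
            for channel in channels:
                group = (well, site, channel)
                found = z_by_group.get(group, set())
                if not found:
                    errors.append(f"missing rows for {group}")
                elif z_planes is not None:
                    missing = sorted(set(range(z_planes)) - found)
                    if missing:
                        errors.append(f"{group} missing z_index {missing}")
    return errors
```

Collecting errors in a list instead of raising on the first problem makes it easier to report every gap in a plate at once.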
{ "$schema": "http://json-schema.org/draft-07/schema#", "title": "wells.csv row (MVS)", "type": "object", "additionalProperties": true, "properties": { "well_id": {"type":"string"}, "label_kind": {"type":"string", "enum":["control","perturbation"]}, "control_type": {"type":"string", "enum":["negative","positive"]}, "perturbation_type": {"type":"string", "enum":["compound","crispr","orf","sirna","vehicle","other"]}, "perturbation_id": {"type":"string"} }, "required": ["well_id","label_kind"], "allOf": [ {"if": {"properties": {"label_kind": {"const":"control"}}}, "then": {"required": ["control_type"]}}, {"if": {"properties": {"label_kind": {"const":"perturbation"}}}, "then": {"required": ["perturbation_type","perturbation_id"]}} ] }
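The conditional requirements in the wells.csv row schema can also be approximated without a JSON Schema library. The sketch below is a plain-Python approximation rather than a faithful schema evaluator: the function name `validate_well_row` and the error messages are hypothetical, and it assumes an empty CSV cell counts as a missing value.

```python
from typing import Dict, List

CONTROL_TYPES = {"negative", "positive"}
PERTURBATION_TYPES = {"compound", "crispr", "orf", "sirna", "vehicle", "other"}


def _value(row: Dict[str, str], name: str) -> str:
    """Return the stripped cell value, treating absent or None cells as empty."""
    return str(row.get(name) or "").strip()


def validate_well_row(row: Dict[str, str]) -> List[str]:
    """Apply the MVS rules for one wells.csv row and return any errors."""
    errors: List[str] = []
    if not _value(row, "well_id"):
        errors.append("well_id is required")

    kind = _value(row, "label_kind")
    if kind == "control":
        # Controls must declare whether they are negative or positive.
        if _value(row, "control_type") not in CONTROL_TYPES:
            errors.append("control_type must be negative or positive")
    elif kind == "perturbation":
        # Perturbations need both a type and an identifier.
        if _value(row, "perturbation_type") not in PERTURBATION_TYPES:
            errors.append("perturbation_type is missing or not an allowed value")
        if not _value(row, "perturbation_id"):
            errors.append("perturbation_id is required")
    else:
        errors.append("label_kind must be control or perturbation")
    return errors
```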
Plate-level MVS schema (v1.0.0) — only MVS fields are required. Nice-To-Have fields may be present and will be validated if provided.
{ "$schema": "http://json-schema.org/draft-07/schema#", "title": "OMS v1.0.0 plate_metadata.json (MVS)", "type": "object", "additionalProperties": false, "properties": { "schema_version": { "type": "string", "const": "1.0.0" }, "plate_id": { "type": "string" }, "cell_line": { "type": "string" }, "plate_format": { "type": "integer", "enum": [96,384,1536] }, "sites_per_well": { "type": "integer", "minimum": 1 }, "image_format": { "type": "string", "enum": ["OME-TIFF","OME-ZARR","TIFF"] }, "channels_present": { "type": "array", "items": { "type": "string", "enum": ["DNA","ER","Mito","Actin","RNA","Golgi"] }, "minItems": 1 }, "pixel_size_um": { "type": "number" }, "channel_order": { "type": "array", "items": { "type": "string", "enum": ["DNA","ER","Mito","Actin","RNA","Golgi"] } }, "channel_metadata": { "type": "array", "items": { "type": "object", "additionalProperties": false, "properties": { "name": { "type": "string", "enum": ["DNA","ER","Mito","Actin","RNA","Golgi"] }, "ex_nm": { "type": "integer" }, "em_nm": { "type": "integer" }, "bit_depth": { "type": "integer", "enum": [8,12,16,32] } }, "required": ["name","ex_nm","em_nm","bit_depth"] } }, "z_planes": { "type": "integer", "minimum": 1 }, "z_step_um": { "type": "number" }, "objective_magnification": { "type": "number" }, "objective_na": { "type": "number" }, "image_width_px": { "type": "integer" }, "image_height_px": { "type": "integer" }, "microscope_make": { "type": "string" }, "microscope_model": { "type": "string" }, "camera_model": { "type": "string" }, "exposure_policy": { "type": "string", "enum": ["fixed","auto"] }, "fixative": { "type": "string", "enum": ["PFA","methanol","other"] }, "experiment_datetime": { "type": "string", "format": "date-time" }, "notes": { "type": "string" } }, "required": [ "schema_version","plate_id","cell_line", "image_format","plate_format","sites_per_well", "channels_present","pixel_size_um" ] }
QC artifacts are computed on upload and are read-only. In addition to metrics and a summary, the platform surfaces capability flags that summarize what the dataset can support.
- qc_metrics.csv: per-well (and optional per-site) metrics with method provenance.
- qc_summary.json: thresholds, pass/fail counts, rollups, and capability flags.

Capability flags are booleans and enums, derived strictly from provided data (no imputation):
- has_negative_controls, has_positive_controls
- has_replicates (≥2 wells per condition)
- has_dose (dose columns present for ≥1 condition)
- has_timecourse (time columns present)
- has_zstack (any z_index > 0 or z_planes > 1)
- has_channel_metadata (spectral/bit depth provided)
- has_instrument_meta (any of: objective, NA, make/model, exposure policy)
- format (enum: OME-TIFF | OME-ZARR | TIFF)
- channels (list of present channels)

Validation passes only if:
- sites.csv covers every (well_id, site_id, channel) expected by wells.csv, sites_per_well, and channels_present; if z_planes is provided, coverage must include all z_index ∈ [0..z_planes-1], otherwise at least one z_index per (well_id, site_id, channel) with unique z_index values.
- pixel_size_um is known (from headers or sidecar).
- A single image_format is used within a plate.
- For OME-Zarr, .zattrs has multiscales; each level has .zarray.

Validation fails if:
- A well_id does not match the well_id pattern for the plate format.
- A file_path in sites.csv does not exist or is not listed in manifest.jsonl, or its checksum does not match.
- channels_present declares a channel that never appears in sites.csv.
- A key (well_id, site_id, channel_name, z_index) is duplicated.
Note: The platform does not impute or assign defaults. Unknown remains unknown; capability flags reflect only what is present.
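To make "flags reflect only what is present" concrete, here is a sketch that derives a few of the capability flags from already-parsed rows. It is an illustration under stated assumptions, not the platform's logic: the function name `capability_flags` is hypothetical, rows are assumed to be dictionaries, an empty cell is treated as absent, and has_replicates is left out because the grouping that defines a "condition" is not spelled out here.

```python
from typing import Any, Dict, Iterable


def _filled(row: Dict[str, Any], name: str) -> bool:
    """True when the named cell holds a non-empty value."""
    return bool(str(row.get(name) or "").strip())


def capability_flags(
    wells: Iterable[Dict[str, Any]],
    sites: Iterable[Dict[str, Any]],
    plate_metadata: Dict[str, Any],
) -> Dict[str, bool]:
    """Derive a subset of capability flags strictly from the data provided."""
    wells = list(wells)
    control_types = {
        w.get("control_type") for w in wells if w.get("label_kind") == "control"
    }
    z_planes = plate_metadata.get("z_planes")
    has_z_rows = any(int(s["z_index"]) > 0 for s in sites)
    return {
        "has_negative_controls": "negative" in control_types,
        "has_positive_controls": "positive" in control_types,
        # Dose needs both a value and a unit on at least one row.
        "has_dose": any(_filled(w, "dose_value") and _filled(w, "dose_unit") for w in wells),
        "has_timecourse": any(_filled(w, "time_after_treatment_h") for w in wells),
        # Unknown z_planes stays unknown; it never counts as evidence of a stack.
        "has_zstack": has_z_rows or (z_planes is not None and z_planes > 1),
    }
```

Nothing in this sketch fills in defaults: a missing column or missing z_planes simply leaves the corresponding flag false.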
Auto-generated on upload: the platform generates a canonical manifest.jsonl over every file and computes a Merkle dataset_root from it. These artifacts make the package tamper-evident: any change to a file or to the manifest changes the recomputed root.
{"path":"raw/well_A01/site_1/channel_DNA.tif","size":4213340,"sha256":"9b2f...","mime":"image/tiff","role":"raw","uri":"s3://bucket/key","versionId":"<optional>"}
Allowed roles: role ∈ {raw, qc}.
Canonicalization: UTF-8 (no BOM), one JSON object per line; keys ordered as shown; lines sorted by path; newline \n. Leaf hash: H(0x00 || line); node hash: H(0x01 || left || right); hash = SHA-256. Odd leaf promotion (no duplication).
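The hashing rules above can be sketched as a small root computation. Several details are assumptions rather than statements of the specification: the helper names (`canonical_line`, `merkle_root`, `dataset_root`) are hypothetical, JSON is serialized with compact separators to mirror the example line, the leaf input is taken to be the line without its trailing newline, and "odd leaf promotion" is read as carrying an unpaired node unchanged to the next level.

```python
import hashlib
import json
from typing import Dict, Iterable, List

# Key order taken from the example manifest line.
KEY_ORDER = ("path", "size", "sha256", "mime", "role", "uri", "versionId")


def canonical_line(entry: Dict[str, object]) -> bytes:
    """Serialize one manifest entry with keys in the documented order."""
    ordered = {key: entry[key] for key in KEY_ORDER if key in entry}
    text = json.dumps(ordered, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")  # assumption: trailing newline excluded from the leaf


def merkle_root(lines: List[bytes]) -> bytes:
    """Compute the root with 0x00 leaf and 0x01 node prefixes over SHA-256."""
    if not lines:
        raise ValueError("manifest has no lines")
    level = [hashlib.sha256(b"\x00" + line).digest() for line in lines]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            pair = b"\x01" + level[i] + level[i + 1]
            next_level.append(hashlib.sha256(pair).digest())
        if len(level) % 2 == 1:
            # Odd node is promoted as-is instead of being duplicated.
            next_level.append(level[-1])
        level = next_level
    return level[0]


def dataset_root(entries: Iterable[Dict[str, object]]) -> str:
    """Sort entries by path, canonicalize them, and return the hex root."""
    ordered = sorted(entries, key=lambda entry: entry["path"])
    return merkle_root([canonical_line(entry) for entry in ordered]).hex()
```

The distinct leaf and node prefixes keep a leaf hash from being confused with an interior node hash, which is why the two cases are hashed differently.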
Verification on S3: With dataset_root from attestation, anyone can fetch the manifest, recompute the root, and verify each file’s sha256 after download.
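The per-file half of that verification might look like the sketch below, which checks a downloaded file against its manifest entry. The function names `sha256_of_file` and `verify_entry` and the `local_root` parameter are illustrative assumptions; fetching the objects from S3 is out of scope here.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union


def sha256_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_entry(entry: Dict[str, object], local_root: Union[str, Path]) -> bool:
    """Compare size and sha256 of the local copy with the manifest entry."""
    local = Path(local_root) / str(entry["path"])
    if not local.is_file():
        return False
    if local.stat().st_size != entry["size"]:
        return False  # cheap check before hashing
    return sha256_of_file(local) == entry["sha256"]
```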
Replace placeholder images with your actual OME-TIFF/OME-Zarr/TIFF files following the defined structure. manifest.jsonl, QC files, and attestation are generated by the platform on upload.