We standardize cell data, publish it openly, and tie model usage to on‑chain rewards for the datasets that trained it. Ship plates, earn attribution, and help a biology foundation model learn faster.
Think “flight simulator” for cells: explore interventions safely before committing bench time.
A strict‑where‑it‑matters schema so morphology + perturbation data "just ingests". The unit of exchange is the plate (optionally bundled with per‑cell features). It captures provenance, perturbations, acquisition, processing trace, QC, and lineage, with controlled vocabularies for channel roles & dose units.
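To make "strict‑where‑it‑matters" concrete, here is a minimal sketch of what a plate manifest and its vocabulary checks might look like. All field names, vocabulary entries, and the `validate` helper are illustrative assumptions, not the actual OMS schema.

```python
# Illustrative plate manifest (field names are hypothetical, not real OMS).
plate_manifest = {
    "plate_id": "PLATE-0001",
    "provenance": {"lab": "example-lab", "instrument": "widefield-01"},
    "perturbations": [
        {"well": "A01", "compound": "DMSO", "dose": 0.1, "dose_unit": "uM"},
    ],
    "channels": [
        {"name": "ch1", "role": "nucleus"},  # roles come from a controlled vocabulary
        {"name": "ch2", "role": "actin"},
    ],
    "qc": {"focus_score": 0.92, "flags": []},
}

# Example controlled vocabularies (contents are illustrative).
ALLOWED_DOSE_UNITS = {"uM", "nM", "mg_per_ml"}
ALLOWED_CHANNEL_ROLES = {"nucleus", "actin", "mito", "er", "brightfield"}

def validate(manifest: dict) -> list[str]:
    """Strict where it matters: only the controlled-vocabulary
    fields are hard-checked; everything else passes through."""
    errors = []
    for p in manifest["perturbations"]:
        if p["dose_unit"] not in ALLOWED_DOSE_UNITS:
            errors.append(f"bad dose unit: {p['dose_unit']}")
    for ch in manifest["channels"]:
        if ch["role"] not in ALLOWED_CHANNEL_ROLES:
            errors.append(f"bad channel role: {ch['role']}")
    return errors
```

A conforming manifest validates with no errors; a typo in a dose unit or channel role is caught at ingest rather than at training time.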
Public index of OMS bundles pinned to decentralized storage. Each dataset has a verifiable on‑chain attestation of content hashes, license, and beneficiary, enabling transparent lineage & attribution.
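The attestation idea above can be sketched in a few lines: content‑address the bundle, then record the hash alongside license and beneficiary. The hash choice (SHA‑256) and the record shape are assumptions for illustration, not the actual on‑chain format.

```python
import hashlib

def content_hash(payload: bytes) -> str:
    """Content-address a bundle by its SHA-256 digest
    (hash function choice is illustrative)."""
    return hashlib.sha256(payload).hexdigest()

def attestation(bundle_bytes: bytes, license_id: str, beneficiary: str) -> dict:
    """Shape of what an on-chain attestation might record (hypothetical)."""
    return {
        "content_hash": content_hash(bundle_bytes),
        "license": license_id,
        "beneficiary": beneficiary,
    }

record = attestation(b"...oms bundle bytes...", "CC-BY-4.0", "0xBENEFICIARY")
```

Anyone holding the bundle bytes can recompute the hash and check it against the attestation, which is what makes lineage tamper‑evident.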
Read / validate / export OMS, auto‑normalize channel semantics, compute hashes & QC, generate manifests, and interact with the registry + training pipeline. Works as a CLI or library to layer onto existing LIMS.
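"Auto‑normalize channel semantics" means mapping the free‑text channel names found in lab metadata onto the controlled role vocabulary. A toy version of that mapping, with an alias table that is purely illustrative:

```python
# Hypothetical alias table; real ingestion would carry a much larger,
# curated mapping from stain/fluorophore names to canonical roles.
CHANNEL_ALIASES = {
    "dapi": "nucleus",
    "hoechst": "nucleus",
    "phalloidin": "actin",
    "mitotracker": "mito",
}

def normalize_channel(raw_name: str) -> str:
    """Map a free-text channel name to a canonical role,
    falling back to 'unknown' for unrecognized stains."""
    key = raw_name.strip().lower()
    return CHANNEL_ALIASES.get(key, "unknown")
```

This is the kind of normalization that lets two labs' plates, one labeled "DAPI" and one labeled "Hoechst", land in the same `nucleus` channel role.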
A continuously trained, multimodal foundation model of cellular state. Encoders fuse imaging + omics + perturbation context into a shared latent space; a dynamics module simulates dose‑time response; decoders reconstruct modalities; uncertainty heads flag out‑of‑distribution inputs.
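The encoder → dynamics → uncertainty data flow can be illustrated with toy stand‑ins. Nothing below is a real model; the functions only show how fused latents, dose‑time conditioning, and an OOD score fit together.

```python
import math

def encode(imaging: list[float], omics: list[float],
           perturbation: list[float]) -> list[float]:
    """Toy fusion 'encoder': concatenates modality summaries into one
    latent vector. Real encoders are learned networks."""
    return imaging + omics + perturbation

def simulate(latent: list[float], dose: float, time_h: float) -> list[float]:
    """Toy dynamics module: scales the state by a saturating
    dose-time factor (functional form is purely illustrative)."""
    factor = (dose / (1.0 + dose)) * (1.0 - math.exp(-time_h / 24.0))
    return [x * (1.0 + factor) for x in latent]

def ood_score(latent: list[float], train_mean: float) -> float:
    """Toy uncertainty head: distance of the latent mean from a
    training-set statistic; high values flag out-of-distribution inputs."""
    m = sum(latent) / len(latent)
    return abs(m - train_mean)
```

At dose zero the toy dynamics leave the state unchanged, mirroring the expectation that an unperturbed cell state should be a fixed point of the simulator.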
Use a hosted endpoint (SLAs, compliance) or run open weights locally. Every served model version publishes a manifest of training datasets + proportional contribution weights for transparent reward calculation.
Biology is multimodal, perturbation‑centric, temporal, sparse, and metadata‑sensitive. Generic text/vision models lack the explicit dose/time conditioning, causal awareness, acquisition context, and calibrated uncertainty required for experimental decision‑making.
Rewards are simple: only datasets included in a served model version accrue proportional payouts. Attribution is grounded in content hashes; each checkpoint stores a vector of dataset contribution weights (e.g. gradient or usage share) that parameterizes the distribution.
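The proportional payout rule reduces to a one‑liner: revenue is split by each dataset's share of the checkpoint's contribution‑weight vector. The function and numbers below are illustrative.

```python
def payouts(revenue: float, weights: dict[str, float]) -> dict[str, float]:
    """Split model revenue in proportion to the dataset contribution
    weights stored in the served checkpoint's manifest."""
    total = sum(weights.values())
    return {ds: revenue * w / total for ds, w in weights.items()}

# Example: three datasets with contribution weights from a manifest.
shares = payouts(1000.0, {"ds_a": 0.5, "ds_b": 0.3, "ds_c": 0.2})
```

Because the weights are published per model version, anyone can recompute the split and audit a payout against the manifest.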
┌───────────┐               ┌──────────────┐
│   Data    │ ───────────▶  │   Registry   │
└───────────┘               └──────────────┘
      ▲                            │
      │                            │
      │                            ▼
┌───────────┐               ┌──────────────┐
│  Revenue  │ ◀───────────  │ Virtual Cell │
└───────────┘               └──────────────┘
What is a Virtual Cell?
A biology‑native foundation model: it integrates modalities into a calibrated cell‑state representation, predicts perturbation dose‑time responses, attributes pathways, and quantifies uncertainty.
Why standardization first?
Without shared provenance / perturbation / channel semantics, models overfit lab artifacts and cannot generalize. OMS makes morphology + context interoperable.
Why crypto?
To make attribution & payouts automatic, tamper‑evident, and portable. Hashes anchor dataset identity; smart contracts distribute usage‑based rewards. Crucially, on‑chain incentives create a positive flywheel: datasets that materially contribute to served models earn transparent, proportional rewards and reputation signals, which encourages higher‑quality submissions, richer metadata, and sustained participation that improves the model over time.
What earns?
Datasets whose features materially contribute to a served model (inclusion & weighted usage), not mere uploads. Transparent manifests show proportions.
How do you handle quality?
QC metrics & standardized flags are stored with each bundle, allowing weighting and exclusion. Low‑quality regions receive lower contribution weight or are excluded entirely.
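One simple way to turn stored QC metrics into contribution weights is a floor‑then‑scale rule: below a QC threshold a region is excluded, above it the weight scales with the score. The threshold and scaling are illustrative, not the actual policy.

```python
def effective_weight(base_weight: float, qc_score: float,
                     qc_floor: float = 0.5) -> float:
    """Down-weight or exclude low-quality data (policy is illustrative):
    below the floor a region contributes nothing; above it, the
    contribution weight scales with the QC score."""
    if qc_score < qc_floor:
        return 0.0
    return base_weight * qc_score
```

Under this rule a pristine plate keeps most of its weight, a marginal one is discounted, and a failing one earns nothing, which aligns payouts with data quality.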
How are datasets licensed?
All datasets contributed to the registry are licensed for reuse with attribution: we default to CC‑BY for individual dataset content and ODC‑BY for dataset/collection database rights. Licensing metadata is stored with each manifest so reuse terms and attribution obligations are transparent.
What about Privacy / PII?
Contributors must remove or redact personally identifiable information before submission. Manifests support redaction flags and feature‑only bundles (no raw images). We provide guidance and tooling to help anonymize data; sensitive fields can be omitted from public manifests or held under restricted access where required.