How CellDAO Works: A Walkthrough of the Stack

CellDAO is designed to standardize, align, and continuously learn from high-value cellular datasets. Below is an end-to-end view of how raw experimental output becomes part of an evolving Virtual Cell.


1. Data Generation & Local Packaging

Labs perform experiments (imaging, perturbation screens, single-cell omics). Alongside raw files, they export a preliminary metadata bundle: plate maps, channel descriptions, perturbation annotations, acquisition logs.


2. OMS Manifest Construction

A local tooling CLI ingests the preliminary bundle and produces an OMS manifest.

Validation runs, flagging missing or ambiguous fields for user correction.
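
As a minimal sketch of this step, assuming a manifest is a flat dictionary: the field names in REQUIRED_FIELDS and the helpers build_manifest and validate_manifest are hypothetical and only mirror the bundle contents listed in step 1. The actual OMS schema and CLI are not specified here.

```python
# Hypothetical required fields; the real OMS schema is not specified here.
REQUIRED_FIELDS = ("plate_map", "channels", "perturbations", "acquisition_log")


def build_manifest(bundle: dict) -> dict:
    """Copy the known fields from the preliminary bundle into a manifest dict."""
    return {name: bundle.get(name) for name in REQUIRED_FIELDS}


def validate_manifest(manifest: dict) -> list:
    """Return human-readable issues for missing or ambiguous fields."""
    issues = []
    for name in REQUIRED_FIELDS:
        if not manifest.get(name):
            issues.append(f"missing: {name}")
    # Illustrative ambiguity check: a channel entry without a description.
    for i, channel in enumerate(manifest.get("channels") or []):
        if not channel.get("description"):
            issues.append(f"ambiguous: channels[{i}] has no description")
    return issues


bundle = {
    "plate_map": {"A1": "DMSO"},
    "channels": [
        {"name": "DAPI", "description": "nuclei"},
        {"name": "GFP"},
    ],
    "perturbations": ["DMSO"],
}
issues = validate_manifest(build_manifest(bundle))
for issue in issues:
    print(issue)
```

Returning a list of issues rather than raising on the first problem lets the user correct every flagged field in one pass.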


3. Feature Extraction & Processing Trace

Containerized pipelines perform segmentation and feature extraction (e.g., cell morphology vectors). Each step appends an entry to the processing trace.

Outputs: per-cell or per-well feature matrices, QC flags, lineage links back to raw assets.
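
One way to picture the processing trace is as an append-only list with one record per pipeline step. The sketch below is an assumption, not the actual trace format: the record fields, the image tags, and the toy segment and extract_features functions are all hypothetical stand-ins.

```python
import hashlib
import json


def digest(obj) -> str:
    """SHA-256 of a JSON-serializable object, with sorted keys for stability."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_step(trace: list, name: str, image: str, params: dict, func, data):
    """Run one pipeline step and append a record of it to the trace."""
    result = func(data, **params)
    trace.append({
        "step": name,
        "image": image,  # hypothetical container image tag
        "params": params,
        "input_sha256": digest(data),
        "output_sha256": digest(result),
    })
    return result


# Toy stand-ins for segmentation and feature extraction.
def segment(pixels, threshold):
    return [p for p in pixels if p >= threshold]


def extract_features(cells, scale):
    return [round(c * scale, 3) for c in cells]


trace = []
cells = run_step(trace, "segment", "example/segment:0.1",
                 {"threshold": 5}, segment, [1, 7, 9, 3])
features = run_step(trace, "features", "example/features:0.1",
                    {"scale": 0.5}, extract_features, cells)
```

Because each record stores both an input and an output hash, the output hash of one step matches the input hash of the next, which is what gives the lineage links back toward the raw assets.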


4. Submission & Integrity Anchoring

The dataset (processed features mandatory, raw assets optional) and its manifest are prepared for submission:

  1. Content hashes are computed (raw subset, features, manifest).
  2. An integrity anchor (a hash over the set of content hashes) is optionally registered on-chain as a contribution record.
  3. The dataset is registered in a discovery index with search facets.
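
The hashing steps can be sketched as follows. This assumes SHA-256 and a simple "name:hash" line encoding for the hash set; both are illustrative choices, not the documented CellDAO format. On-chain registration is out of scope for the sketch.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def integrity_anchor(content_hashes: dict) -> str:
    """Hash the set of content hashes.

    Entries are sorted by name so the anchor does not depend on the
    order in which the individual hashes were computed.
    """
    lines = [f"{name}:{content_hashes[name]}" for name in sorted(content_hashes)]
    return sha256_hex("\n".join(lines).encode("utf-8"))


# Placeholder byte strings stand in for the real files.
hashes = {
    "raw_subset": sha256_hex(b"raw bytes"),
    "features": sha256_hex(b"feature matrix bytes"),
    "manifest": sha256_hex(b"manifest bytes"),
}
anchor = integrity_anchor(hashes)
print(anchor)
```

Anchoring a single hash of hashes keeps any on-chain record small while still committing to every component: changing the raw subset, the features, or the manifest changes the anchor.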

5. Ingestion & Standardization Layer

On the network side:


6. Virtual Cell Training Pipeline

A scheduled (or continuous) training job:

Model checkpoints record dataset contribution proportions (for attribution or incentives).
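
How contribution proportions are computed is not stated. A minimal sketch, assuming the proportion is simply each dataset's share of the training samples seen so far (the dataset IDs and the checkpoint layout are hypothetical):

```python
from collections import Counter


def contribution_proportions(sample_sources: list) -> dict:
    """Share of training samples drawn from each dataset ID."""
    counts = Counter(sample_sources)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {dataset: n / total for dataset, n in counts.items()}


# Each entry is the (hypothetical) dataset ID of one training sample.
sources = ["ds_a", "ds_a", "ds_b", "ds_a"]
checkpoint = {
    "step": 1000,
    "contributions": contribution_proportions(sources),
}
print(checkpoint["contributions"])
```

Sample share is only one possible attribution rule; a scheme that weights datasets by their measured effect on model quality would store its result in the same place.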


7. Inference & Simulation Services

APIs expose:


8. Active Learning Loop

The system ranks candidate experiments by expected information gain.

Suggested experiments feed back to labs, closing the loop.
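
The ranking criterion is not specified further. One common way to estimate expected information gain is the mutual information between an experiment's outcome and the model, approximated with an ensemble: the entropy of the averaged prediction minus the average entropy of the individual predictions. The sketch below assumes binary outcomes and a small ensemble; the candidate names and probabilities are made up for illustration.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli outcome with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_information_gain(member_probs: list) -> float:
    """Entropy of the mean prediction minus the mean member entropy (bits)."""
    n = len(member_probs)
    mean_p = sum(member_probs) / n
    mean_entropy = sum(binary_entropy(p) for p in member_probs) / n
    return binary_entropy(mean_p) - mean_entropy


def rank_experiments(candidates: dict) -> list:
    """Candidate names sorted from most to least informative."""
    return sorted(
        candidates,
        key=lambda name: expected_information_gain(candidates[name]),
        reverse=True,
    )


# Each list holds one predicted probability per ensemble member.
candidates = {
    "knockout_A": [0.5, 0.5, 0.5],     # members agree, outcome is noisy
    "knockout_B": [0.1, 0.9, 0.5],     # members disagree strongly
    "knockout_C": [0.95, 0.9, 0.97],   # members agree, outcome near certain
}
ranking = rank_experiments(candidates)
print(ranking)
```

Under this criterion an experiment scores highly when ensemble members disagree, not merely when the outcome is uncertain: knockout_A has a maximally uncertain outcome but scores zero, because every member already predicts the same probability.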


9. Incentive & Governance Layer (Optional / Progressive)

If a DAO / token layer is enabled:


10. Observability & Audit

Dashboards track:


Summary Flow Diagram (Conceptual)

Raw Data -> Local OMS Packaging -> Feature Extraction + Trace -> Submission & Integrity Anchor -> Ingestion Standardization -> Virtual Cell Training -> Inference & Simulation -> Active Learning Suggestions -> (Optional) Incentive Distribution & Governance


Conclusion

CellDAO operationalizes a virtuous cycle: standardized data fuels better models; better models propose informative experiments; new data refines the system. The stack is intentionally modular so labs can adopt components incrementally while moving toward a fully integrated Virtual Cell ecosystem.