VeriBiota Profile Catalog v0.1 – Spec¶
This catalog defines what VeriBiota profiles judge across the stack. Profiles are grouped by implementation tier.
| Profile | Status | Domain | Consumers |
|---|---|---|---|
| global_affine_v1 | Implemented (proof-backed anchors) | alignment | OGN, Helix |
| edit_script_v1 | Implemented (proof-backed anchors) | edit | Helix, OGN |
| edit_script_normal_form_v1 | Implemented (proof-backed anchors) | edit | Helix |
| prime_edit_plan_v1 | Implemented (contract-checked; anchors reserved) | CRISPR/Prime | Helix |
| pair_hmm_bridge_v1 | Implemented (contract-checked; anchors reserved) | HMM/variant | OGN |
| read_set_conservation_v1 | Schema/manifest only (no check route yet) |
pipeline | OGN |
| vcf_normalization_v1 | Tier 1 (contract-checked; anchors reserved) | variant | OGN, external |
| offtarget_score_sanity_v1 | Schema/manifest only (no check route yet) |
CRISPR | Helix |
| snapshot_signature_v1 | Emitted provenance record (proof-backed anchor) | provenance | Helix, VeriBiota |
Proof status note: “proof-backed anchors” means the referenced theorem IDs are non-placeholder Lean theorems in Biosim/VeriBiota/Theorems.lean. “anchors reserved” means the profile is implemented and tested, but one or more referenced theorem IDs are still placeholders today.
Conventions¶
- Profile name:
snake_case_vN(e.g.,global_affine_v1). - JSON schema:
schemas/<domain>/<profile>.schema.json. - Golden fixtures:
Tests/profiles/<profile>/*.json. - Manifest:
profiles/manifest.jsonmaps profile name → schema hash, theorem IDs. - Verdict JSON shape:
{
"profile": "<name>",
"profile_version": "X.Y.Z",
"status": "passed | failed",
"theorems": ["VB_..."],
"instance": { "...summary..." },
"engine": {
"veribiota_version": "...",
"lean_version": "...",
"build_id": "..."
}
}
- Exit codes:
0profile holds (passed);2profile checked but failed;1malformed input or internal error.
Tier 0 – Implemented Profiles¶
1. global_affine_v1 (Status: Implemented, contract locked)¶
- Purpose: Guarantee that an implementation’s global affine alignment result matches the formal DP specification and trace semantics.
- Intended consumers: OGN core aligner (CPU, GPU); any external tool claiming “VeriBiota-verified global alignment”.
- Inputs (conceptual):
seqA,seqBstrings;scoring{match,mismatch,gap_open,gap_extend};witnesswithscore(int) andtracelist of steps{ op: "M" | "I" | "D" }. - Verified properties:
- Trace validity: path from (0,0) to (|A|,|B|); no out-of-bounds or illegal transitions.
- Score consistency:
traceScore(seqA, seqB, scoring, trace) = witness.score. - DP optimality/equivalence: DP recurrence satisfied;
dpScore(seqA, seqB, scoring) = specGlobalAffine(seqA, seqB, scoring) = witness.score. - Instance summary: verdict includes
seqA_length,seqB_length,trace_length. - Theorem anchors:
VB_ALIGN_CORE_001(DP equals spec score);VB_ALIGN_CORE_002(trace from DP achievesdpScore).
2. edit_script_v1 (Status: Implemented, contract locked)¶
- Purpose: Guarantee that a claimed edit script coherently transforms
seqAintoseqB. - Intended consumers: Helix edit planner; any tool generating explicit edit scripts (CRISPR, Prime, patching).
- Inputs (conceptual):
seqA,seqB;editslist of{ type: "ins" | "del" | "sub", pos: int, len: int (for del/sub), payload: str (for ins/sub) }. - Verified properties:
- Well-defined application: edits apply sequentially without index under/overflow; positions and lengths respect current string state.
- Final result equality: applying edits left-to-right yields exactly
seqB. - Basic coherence: no degenerate no-op edits (unless later normalized); no overlapping ambiguous edits.
- Instance summary:
seqA_length,seqB_length,edit_count, andapplied_okboolean. - Theorem anchors:
VB_EDIT_001– edit script application is total and well-defined.
3. edit_script_normal_form_v1 (Status: Implemented, proof-anchored)¶
- Purpose: Provide a canonical normalized form so different tools agree on “the same edit”.
- Intended consumers: Helix planner; any diff/patch layer in OGN; future audit/tracking.
- Inputs:
seqA,seqB;editsarbitrary script; optionallynormalized_editsclaimed normal form. - Verified properties:
- Normalization correctness:
applyEdits(seqA, edits) = applyEdits(seqA, normalized_edits) = seqB. - Normal form invariants: no adjacent merges possible; minimal representation under chosen ordering (e.g., leftmost, shortest); no redundant
subthat is a match. - Idempotence: re-normalizing
normalized_editsyields the same script. - Theorem anchors:
VB_EDIT_001– base script correctness;VB_EDIT_002– normalization preserves semantics and is idempotent.
Implemented profiles with reserved theorem anchors¶
4. prime_edit_plan_v1 (Status: Implemented, anchors reserved)¶
- Purpose: Verify that a Prime Editing plan (guide, RT template, nicking strategy) corresponds to a desired net edit script.
- Intended consumers: Helix prime editor module (simulated); future recommendations, constraints, AI models.
- Inputs (conceptual):
ref_seq: reference context window.target_seq: intended edited sequence (local window).plan: pegRNA fields (spacer, PAM, RT template, PBS), optional nicking guide, cut positions,net_edit_script(reuse edit script schema) fromref_seqtotarget_seq.- Verified properties:
- Plan-to-edit linkage:
net_edit_scriptencodes exactly the delta betweenref_seqandtarget_seq; sites of change align with RT template design. - Internal consistency: PBS and RT lengths within allowed bounds; PAM positions consistent with spacer location.
- Relationship to
edit_script_v1: callsedit_script_v1on (ref_seq,target_seq,net_edit_script). - Theorem anchors:
VB_PE_001– reserved anchor for prime plan net-edit linkage;VB_EDIT_001reused for base script correctness.
5. pair_hmm_bridge_v1 (Status: Implemented, anchors reserved)¶
- Purpose: Tie DP-based alignment scoring to Pair-HMM likelihood scoring under a specified parameter mapping (small instances).
- Intended consumers: OGN variant calling stack; any HMM-based likelihood computation referencing DP.
- Inputs:
seqA,seqB;dp_scoringmatch/mismatch/gap penalties;hmm_paramstransition and emission probabilities (log space);dp_witnessDP score (optional trace);hmm_witnesslog-likelihood or Viterbi score. - Verified properties:
- Parameter mapping correctness: DP gap open/extend and mismatch penalties correspond to log transition/emission probabilities via the mapping.
- Score equivalence within tolerance: for finite sequences,
dp_scoreequals (or within rounding epsilon of)hmm_scoreunder the mapping. - Structural equivalence (optional): Viterbi path encoded by HMM corresponds to a valid DP alignment path.
- Theorem anchors:
VB_HMM_001– reserved anchor for mapping lemma (DP parameters ↔ HMM parameters);VB_HMM_002– reserved anchor for small-instance equivalence theorem.
6. read_set_conservation_v1 (Status: Schema/manifest only, planned)¶
- Purpose: Verify that a pipeline stage conserves the multiset of reads and sample labels modulo defined operations (e.g., marking duplicates, sorting).
- Intended consumers: OGN end-to-end pipeline; any ETL/transform stage in the stack.
- Inputs:
input_summary(read count, sample IDs, checksums per-sample/lane, optional k-mer sketch);output_summarysame structure;allowed_opsflags (e.g., sort, tag, mark-dup). - Verified properties:
- Count and identity preservation: for non-dropping stages, total counts match; sample IDs preserved without cross-sample mixing.
- Checksum or sketch consistency: aggregate hashes or sketches consistent with allowed operations; if dropping low-quality reads, the allowed pattern is codified.
- Theorem anchors (planned):
VB_PIPE_001– multiset preservation under pure reordering;VB_PIPE_002– constrained drop rules.
7. vcf_normalization_v1 (Status: Tier 1 – contract-checked, anchors reserved)¶
- Purpose: Guarantee that VCF entries are normalized (left-aligned, minimal representation) without changing variant meaning.
- Inputs: hashes/IDs of pre- and post-normalization VCFs; optional reference FASTA hash; per-variant records describing original vs normalized locus/ref/alt and operations applied.
- Verified properties:
- Variant preservation: canonicalized original equals canonicalized normalized (no gain/loss of variants).
- Left-alignment/minimality: normalized record matches canonical form (no shared prefix/suffix remains).
- Ref/alt consistency: ref/alt remain valid after trimming and stay aligned to the reference context.
- Idempotence: re-normalizing an already normalized variant yields the same variant.
- Theorem anchors (Tier 1, reserved today):
VB_VCF_001– normalization preserves semantics;VB_VCF_002– normalization is idempotent/unique in window.
Tier 2 – Future Profiles (Defined for alignment now)¶
8. offtarget_score_sanity_v1¶
- Purpose: Sanity-check off-target scoring fields (CRISPR/Prime design) for monotonicity and boundedness.
- Inputs:
siteslist with mismatch counts, distances to genomic landmarks, reported scores/probabilities. - Verified properties:
- Monotonicity: more mismatches must not yield higher risk scores than strictly fewer mismatches in the same context.
- Bounded probabilities: probabilities sum to at most 1 across mutually exclusive outcomes.
- Basic consistency: no NaNs, infinities, or out-of-range values.
- Theorem anchors (planned):
VB_OFF_001– monotonicity lemma over discrete mismatch counts.
9. snapshot_signature_v1 (Provenance binding)¶
- Purpose: Emit a provenance record binding a verification run to its input hash, schema hash/id, theorem IDs, and build metadata.
- Important: despite the name,
snapshot_signature_v1is not a cryptographic signature. Cryptographic signing is handled separately (Ed25519/JWS on checks/certificates). - Inputs:
snapshot_profile,snapshot_hash,schema_hash,schema_id,theorem_ids, build/version/timestamp, plus optionalinstance_summaryand provenance metadata. - Verified properties:
- Binding: the emitted document’s
schema_hash,schema_id, andtheorem_idsmirror the manifest entry used for the run. - Hash recording: the emitted
snapshot_hashrecords the tool-computedsha256:<hex>digest over the exact input bytes supplied for verification. - Theorem anchors:
VB_SIG_001– snapshot signature binds hash + manifest metadata (implemented).
How to Use This Spec¶
- Treat this file as the jurisdiction for what VeriBiota is allowed to verify.
- Wire Tier 0 profiles wherever needed; implement Tier 1 next; keep Tier 2 visible to guide design choices.
- Manifest (
profiles/manifest.json), schemas (schemas/...), and golden fixtures (Tests/profiles/...) should stay in sync with these definitions.