VCF Normalization Profile (vcf_normalization_v1)
vcf_normalization_v1 is a Tier 1 semantic profile that verifies VCF normalization steps: variants must remain semantically identical while being left-aligned and minimally represented.
What it checks
- Variant preservation: the canonicalized original equals the canonicalized normalized form (no variant is gained or lost).
- Left-alignment/minimality: no redundant shared prefix/suffix remains; locus updates reflect any prefix trimming.
- Ref/alt consistency: ref/alt remain valid after trimming.
- Idempotence: re-normalizing an already normalized variant leaves it unchanged (the normalized form is a fixed point).
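The trimming and idempotence checks above can be sketched in Python as follows. This is a minimal illustration, not the profile's actual implementation: `trim_variant` is a hypothetical helper, it assumes 1-based positions, and it allows an empty allele after trimming (as in the example input on this page). It also omits reference-based left-alignment through repeats, which would need the FASTA sequence.

```python
from typing import List, Tuple


def trim_variant(pos: int, ref: str, alts: List[str]) -> Tuple[int, str, List[str]]:
    """Hypothetical helper: trim bases shared by ref and every alt.

    The shared suffix is trimmed first, then the shared prefix, so the
    result sits at the leftmost position reachable by trimming alone.
    """
    alleles = [ref] + list(alts)
    # Right trim: drop the last base while all alleles are non-empty and agree on it.
    while all(alleles) and len({a[-1] for a in alleles}) == 1:
        alleles = [a[:-1] for a in alleles]
    # Left trim: drop the first base and advance pos by one for each base removed.
    while all(alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles[0], alleles[1:]


# Minimality: the example variant AC -> A at pos 1 trims to C -> "" at pos 2.
original = trim_variant(1, "AC", ["A"])
print(original)  # (2, 'C', [''])

# Idempotence: trimming an already trimmed variant changes nothing.
assert trim_variant(*original) == original
```

Under the same assumptions, variant preservation could be approximated by comparing `trim_variant` applied to the original entry with `trim_variant` applied to the normalized entry.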
Schema shape (summary)
schemas/variant/vcf_normalization_v1.schema.json:
- input_vcf_hash, normalized_vcf_hash: content hashes (e.g., sha256:<hex>).
- reference_fasta_hash (optional).
- variants[] entries:
  - locus: chrom, pos
  - original: ref, alt[]
  - normalized: pos, ref, alt[]
  - operations: optional list like left_trim, right_trim, left_align, split, merge, pad_left, pad_right.
- summary: optional counts (trimmed/split/invalid).
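As a rough sketch of how a document with this shape might be assembled, the Python below computes `sha256:<hex>` content hashes and fills in the top-level fields. The helper names `content_hash` and `make_document` are hypothetical, and hashing the raw file bytes is an assumption; the schema file remains the authority on what is hashed and which fields are required.

```python
import hashlib
import json
from typing import Any, Dict, List


def content_hash(data: bytes) -> str:
    """Return a 'sha256:<hex>' string for the given bytes (assumed hash input)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def make_document(
    input_vcf: bytes, normalized_vcf: bytes, variants: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the top-level fields listed in the summary."""
    return {
        "input_vcf_hash": content_hash(input_vcf),
        "normalized_vcf_hash": content_hash(normalized_vcf),
        "variants": variants,
    }


# One variant entry with the fields named in the schema summary.
entry = {
    "locus": {"chrom": "chr1", "pos": 1},
    "original": {"ref": "AC", "alt": ["A"]},
    "normalized": {"pos": 2, "ref": "C", "alt": [""]},
    "operations": ["left_trim"],
}

# Placeholder byte strings stand in for real VCF file contents.
doc = make_document(b"placeholder input VCF", b"placeholder normalized VCF", [entry])
print(json.dumps(doc, indent=2))
```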
CLI usage
veribiota check vcf vcf_normalization_v1 path/to/input.json --snapshot-out sig.json --compact
Output: JSON verdict with variant_count, normalized_ok_count, all_variants_normalized, hashes, theorem IDs (VB_VCF_001, VB_VCF_002), and engine metadata.
Snapshot: --snapshot-out writes a snapshot_signature_v1 document binding the input hash, schema hash, theorem IDs, and build metadata to the result.
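Once the verdict JSON has been parsed, a downstream script could sanity-check it along these lines. This is only a sketch: `verdict_is_consistent` is a hypothetical helper, and it assumes that `all_variants_normalized` is true exactly when `normalized_ok_count` equals `variant_count`, which the field names suggest but this page does not state. The sample values are illustrative, not real tool output.

```python
from typing import Any, Dict


def verdict_is_consistent(verdict: Dict[str, Any]) -> bool:
    """Hypothetical check that the counts agree with the overall flag."""
    total = verdict["variant_count"]
    ok = verdict["normalized_ok_count"]
    # Assumed meaning: the flag is true only when every variant passed.
    return 0 <= ok <= total and verdict["all_variants_normalized"] == (ok == total)


# Illustrative verdict fragment using the field names listed above.
sample = {
    "variant_count": 1,
    "normalized_ok_count": 1,
    "all_variants_normalized": True,
}
print(verdict_is_consistent(sample))  # True
```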
Example input
{
  "input_vcf_hash": "sha256:aaaaaaaa...aaaaaaaa",
  "normalized_vcf_hash": "sha256:bbbbbbbb...bbbbbbbb",
  "variants": [
    {
      "locus": { "chrom": "chr1", "pos": 1 },
      "original": { "ref": "AC", "alt": ["A"] },
      "normalized": { "pos": 2, "ref": "C", "alt": [""] },
      "operations": ["left_trim"]
    }
  ]
}
Run:
veribiota check vcf vcf_normalization_v1 examples/vcf_norm.json --snapshot-out sig.json
- Verdict: all_variants_normalized: true, status: "passed".
- Signature: sig.json in snapshot_signature_v1 format for provenance.