Tutorial 4: LLM Pipeline

Learn how to clean, repair, and validate LLM-generated STL output.

What you’ll learn:

  • Clean raw LLM output with clean()
  • Repair structural issues with repair()
  • Run the full 3-stage pipeline with validate_llm_output()
  • Generate LLM prompt templates with prompt_template()
  • Inspect repair actions

Prerequisites: Tutorial 3: Schema Validation


The Problem

LLMs often produce malformed STL. Common issues include:

  • Using => or --> instead of ->
  • Omitting :: prefix on modifiers
  • Missing brackets around anchor names
  • Unquoted string values
  • Confidence values out of range (e.g., 1.5)
  • Spelling errors in modifier keys (e.g., confience)
  • Wrapping output in code fences or adding prose

The LLM pipeline fixes these automatically through three stages: clean → repair → parse.

Step 1: The Full Pipeline

validate_llm_output() runs all three stages in one call:

from stl_parser import validate_llm_output

raw = "Einstein => Relativity mod(confience=1.5)"

result = validate_llm_output(raw)

print(f"Valid: {result.is_valid}")
print(f"Statements: {len(result.statements)}")
print(f"Repairs: {len(result.repairs)}")
print(f"Cleaned text: {result.cleaned_text}")

for r in result.repairs:
    print(f"  [{r.type}] {r.description}")

The result is an LLMValidationResult with:

  • is_valid — whether the repaired text parsed successfully
  • statements — parsed Statement objects
  • repairs — list of RepairAction objects describing what was fixed
  • cleaned_text — the text after cleaning and repair
  • original_text — the original input

Step 2: Stage 1 — Clean

The clean() function handles text-level normalization:

from stl_parser.llm import clean

raw = """
Here's the STL output:

```stl
[Einstein] => [Relativity] ::mod(
  rule="empirical",
  confidence=0.98
)

This shows the relationship between Einstein and his theory. """

cleaned_text, repairs = clean(raw) print(cleaned_text)

[Einstein] -> [Relativity] ::mod(rule=“empirical”, confidence=0.98)

for r in repairs: print(f” {r.description}”)


Clean handles:
- Extracting STL from code fences
- Stripping prose lines
- Normalizing arrow variants (`=>`, `-->`, `→`) to `->`
- Merging multi-line statements
- Removing markdown escapes

## Step 3: Stage 2 — Repair

The `repair()` function fixes structural issues:

```python
from stl_parser.llm import repair

text = "[A] -> [B] mod(confidence=1.5, confience=0.8)"

repaired_text, repairs = repair(text)
print(repaired_text)

for r in repairs:
    print(f"  [{r.type}] {r.original} -> {r.repaired}")
    print(f"    {r.description}")

Repair handles:

  • Adding missing :: prefix before mod(
  • Adding missing brackets around bare anchor names
  • Clamping out-of-range values (e.g., confidence=1.5confidence=1.0)
  • Fixing common typos in modifier keys
  • Quoting unquoted string values

Step 4: Inspect Repair Actions

Each RepairAction provides detailed information:

from stl_parser import validate_llm_output

result = validate_llm_output("A => B mod(confience=1.5)")

for r in result.repairs:
    print(f"Type:        {r.type}")
    print(f"Line:        {r.line}")
    print(f"Original:    {r.original}")
    print(f"Repaired:    {r.repaired}")
    print(f"Description: {r.description}")
    print()

Step 5: Pipeline with Schema Validation

Add a schema to validate the repaired output against domain constraints:

from stl_parser import validate_llm_output, load_schema

schema = load_schema("docs/schemas/causal.stl.schema")

result = validate_llm_output(
    "[Rain] -> [Flooding] mod(rule=causal, confidence=0.85, strength=0.8)",
    schema=schema
)

print(f"Valid: {result.is_valid}")
print(f"Schema result: {result.schema_result}")

Step 6: Generate LLM Prompt Templates

Use prompt_template() to generate instruction prompts for LLMs:

from stl_parser import prompt_template

# Basic prompt
prompt = prompt_template()
print(prompt)
# Outputs STL syntax instructions suitable as a system message

With a schema, the prompt includes domain-specific constraints:

from stl_parser import prompt_template, load_schema

schema = load_schema("docs/schemas/medical.stl.schema")
prompt = prompt_template(schema=schema)
print(prompt)
# Includes: required fields, anchor patterns, value ranges

Step 7: CLI Usage

# Clean a file with LLM output
stl clean llm_output.txt

# Show repair actions
stl clean llm_output.txt --show-repairs

# Clean and validate against schema
stl clean llm_output.txt --schema docs/schemas/causal.stl.schema

# Save cleaned output
stl clean llm_output.txt --output cleaned.stl

Complete Example

from stl_parser import validate_llm_output, load_schema

# Simulated messy LLM output
raw_llm_output = """
Based on medical knowledge:

```stl
Aspirin => PainRelief mod(
  rule=causal,
  confience=0.88,
  strength=0.80,
  source=doi:10.1234/pharma
)
Smoking --> LungCancer ::mod(
  rule="causal",
  confidence=1.2,
  strength=0.85
)

These are well-established medical relationships. """

Run full pipeline

result = validate_llm_output(raw_llm_output)

print(f”Valid: {result.is_valid}”) print(f”Statements parsed: {len(result.statements)}”) print(f”Repairs applied: {len(result.repairs)}”) print()

Show what was fixed

for r in result.repairs: print(f” [{r.type}] {r.description}”)

print()

Show parsed statements

for stmt in result.statements: print(f” {stmt}”)


---

**Next:** [Tutorial 5: Querying](05-querying.md)