AI-Powered Developer Tools

Ship reliable AI.
Catch bugs faster.

Two precision tools for teams building with LLMs. Detect hallucinations before your users do. Locate bugs before they ship.

Explore Products · Get Early Access
0.729
AUROC on HaluEval 10K
<1s
Per-document detection
0 GPU
Required for inference
7/7
Financial demo accuracy
Products

Two tools. One mission.
Trustworthy AI.

Purpose-built for teams that need to verify LLM outputs and catch bugs at the model level.

HalluGuard

Hallucination detection
that actually works.

Document-grounding approach that catches fabricated claims, altered numbers, and invented facts — without GPU overhead.

Claim-level analysis

Splits summaries into individual claims and verifies each against source documents.

Multi-granularity matching

Keywords, n-grams, numbers, entities, and continuous substrings — all cross-referenced.

Zero GPU required

Pure text matching. Runs on any CPU in under 1 second per document.

Number tampering detection

Catches altered figures like 308.76 → 425.50 via numerical match scoring.

halluguard_demo.py
from halluguard import detect

result = detect(
    source="Revenue: ¥174.1B (+15.71%)",
    summary="Revenue reached ¥215.6B"
)

# result.risk = 0.65 (RED)
# result.flags = ["num_mismatch: 174.1→215.6"]
# result.claim_scores = [0.14]
Risk Level: RED (0.65)
Detection: Number tampering detected
Latency: 0.3 ms

Validated at scale.

Tested on HaluEval summarization benchmark with consistent results across sample sizes.

| Samples | AUROC |
| --- | --- |
| 200 | 0.695 |
| 1,000 | 0.714 |
| 5,000 | 0.714 |
| 10,000 | 0.729 |

Stable performance from 200 to 10,000 samples — no overfitting detected.
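AUROC measures how reliably the detector ranks a hallucinated example above a faithful one. A minimal pairwise implementation looks like the sketch below; the risk scores in the example are made up for illustration.

```python
def auroc(scores_pos, scores_neg):
    """Probability that a random positive (hallucinated) example
    scores higher than a random negative (faithful) one; ties count half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Toy example with invented risk scores:
hallucinated = [0.65, 0.65, 0.32]
faithful = [0.29, 0.14, 0.31]
print(auroc(hallucinated, faithful))  # 1.0: perfect separation here
```

An AUROC of 0.5 means the scores carry no ranking signal; 1.0 means every hallucinated example outranks every faithful one.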

Live Demo

Real-world financial document analysis.

End-to-end test with a Chinese public company annual report (585 characters).

| Query | Model Response | HalluGuard | Correct? |
| --- | --- | --- | --- |
| Revenue & growth rate? | ¥174.1B, +15.71% | GREEN 0.29 | ✓ |
| Production volume? | 56,221 tons, 46.06% | GREEN 0.14 | ✓ |
| R&D ratio? Dividend? | 1.08%, ¥308.76 | YELLOW 0.32 | ✓ |
| Overseas revenue? (not in doc) | "Not mentioned" | GREEN 0.33 | ✓ |
| Employee count? (not in doc) | "Cannot determine" | GREEN 0.31 | ✓ |
| ⚠️ Fabricated | ¥215.6B + overseas ¥38B | RED 0.65 | ✓ |
| ⚠️ Fabricated | ¥425.50 dividend + ¥1B buyback | RED 0.65 | ✓ |
BugProbe

Find bugs by watching
how models think.

WLBS behavior graph analysis traces model internals to pinpoint where code goes wrong — at the activation level.

Behavior graph scanning

Maps model activation patterns to identify anomalous computation paths.

Curvature-based detection

Tracks gradient curvature through layers to spot where the model "hesitates".

Backpropagation analysis

Reverse-traces error signals to locate the exact layer and attention head responsible.

Web dashboard

Visual interface for exploring behavior graphs and drilling into specific bug patterns.
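Curvature-based detection can be illustrated with a toy second-difference pass over per-layer activation norms: a sharp spike marks a layer where the computation bends abruptly. Everything below is a hypothetical sketch, including `layer_curvature` and the synthetic norms; it is not the WLBS implementation.

```python
def layer_curvature(norms):
    """Discrete second derivative of per-layer activation norms.

    A crude stand-in for curvature tracking: a large value suggests a
    layer where the model's computation changes direction sharply.
    """
    return [abs(norms[i] - 2 * norms[i + 1] + norms[i + 2])
            for i in range(len(norms) - 2)]

# Synthetic activation norms for a 16-layer model, spiking at layer 14.
norms = [8.0] * 16
norms[13] = 40.0                      # layer 14, 1-indexed
curv = layer_curvature(norms)
suspect = curv.index(max(curv)) + 2   # +2: curv[i] is centred on layer i+2
print(suspect)                        # 14
```

A real scan works on activation tensors rather than scalar norms, and resolves the anomaly down to individual attention heads, but the spike-finding intuition is the same.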

bugprobe_scan.py
from bugprobe import scan

report = scan(
    model="qwen3-8b",
    code="def sort(arr): ...",
    test_input=[[3, 1, 2]]
)

# report.bugs = [
#   Bug(layer=14, head=7,
#       type="off-by-one",
#       confidence=0.89)
# ]
Bugs Found: 1 issue
Location: Layer 14, Head 7
Confidence: 89%

Ready to build
trustworthy AI?

Get early access to HalluGuard and BugProbe. We're onboarding teams now.