
This example uses the built-in pelion.judgment module. It runs against real LLM APIs and consumes real budget on those APIs.
A copy-pasteable Python example that produces a real verdict from the built-in frontier-model council. It takes under a minute end to end once the API keys are set.

Prerequisites

Python 3.11 or later. API keys for at least two of the three supported providers (Anthropic, OpenAI, Google).
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export GOOGLE_API_KEY=...
You don’t need all three. The default min_responses is 2, so two providers are sufficient. Skipping a provider just means it won’t participate in the council.
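A quick way to confirm you have enough keys before running the example is to check the environment yourself. This is a standalone sketch, not part of the Pelion API; `PROVIDER_KEYS` and `configured_providers` are names invented here for illustration.

```python
import os

# Keys the council can use; min_responses defaults to 2, so at least
# two of these must be set before running the example.
PROVIDER_KEYS = {
    "Anthropic": "ANTHROPIC_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "Google": "GOOGLE_API_KEY",
}


def configured_providers(env=None):
    """Return the names of providers whose API key is present in env."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]


# A shell with only two keys exported still satisfies the council.
print(configured_providers({"ANTHROPIC_API_KEY": "sk-ant-...", "OPENAI_API_KEY": "sk-..."}))
```

In your own shell, call `configured_providers()` with no argument to check the real environment.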

Install

pip install 'pelion[frontier]'
The frontier extra pulls in the Anthropic, OpenAI, and Google SDKs. The core Pelion package is lightweight.

Run this

import asyncio
from pelion.schemas import Question, EvidencePolicy
from pelion.judgment import FrontierModelClient
from pelion.judgment.providers import (
    AnthropicProvider,
    OpenAIProvider,
    GeminiProvider,
)


async def main():
    client = FrontierModelClient(
        providers=[
            AnthropicProvider(),
            OpenAIProvider(),
            GeminiProvider(),
        ],
        min_responses=2,
        per_provider_timeout_s=60.0,
    )

    question = Question(
        version="0.1",
        question_id="0x" + "a" * 64,
        requester="0x" + "b" * 40,
        text="Did humans first land on the Moon in 1969?",
        resolution_criteria="YES iff Apollo 11 landed in 1969.",
        resolution_time=1_577_836_800,
        expiration_time=1_577_923_200,
        evidence_policy=EvidencePolicy(
            allowed_source_categories=["general_knowledge"],
            allowed_domains=[],
            max_age_hours=100_000,
            min_source_count=1,
        ),
        subnet_routing=[0],
        reward="0",
        bond_amount="0",
        submitted_at=1_577_750_400,
    )

    verdict = await client.judge(question)

    print(f"Outcome: {verdict.outcome_label}")
    print(f"Confidence: {verdict.confidence} basis points")
    print(f"Consensus rationale: {verdict.reasoning.consensus_rationale}")
    print(f"Providers that responded: {len(verdict.reasoning.miner_provenance)}")


asyncio.run(main())

Expected output

Something like this (text varies by model):
Outcome: YES
Confidence: 9950 basis points
Consensus rationale: 3/3 providers voted YES.
Providers that responded: 3
Each provider’s individual reasoning is stored in verdict.reasoning.miner_provenance if you want to inspect the per-provider outputs.
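One common thing to do with the per-provider outputs is tally the vote split. The snippet below uses a stand-in list in place of `verdict.reasoning.miner_provenance`; the entry fields shown here (`provider`, `outcome`) are illustrative assumptions, not the library's actual schema.

```python
from collections import Counter

# Illustrative stand-in for verdict.reasoning.miner_provenance.
# Field names are assumptions for the sketch, not Pelion's schema.
provenance = [
    {"provider": "anthropic", "outcome": "YES"},
    {"provider": "openai", "outcome": "YES"},
    {"provider": "google", "outcome": "NO"},
]

# Tally how the council split across outcomes.
tally = Counter(entry["outcome"] for entry in provenance)
for outcome, count in tally.most_common():
    print(f"{outcome}: {count}/{len(provenance)} providers")
```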

Try a harder question

The moon-landing question is easy. Every frontier model knows the answer from training. To see the council handle disagreement, try a genuinely ambiguous question.
question = Question(
    version="0.1",
    question_id="0x" + "c" * 64,
    requester="0x" + "b" * 40,
    text="Is the chicken or the egg evolutionarily first?",
    resolution_criteria="YES iff the chicken predates the egg in evolutionary history.",
    resolution_time=1_577_836_800,
    expiration_time=1_577_923_200,
    evidence_policy=EvidencePolicy(
        allowed_source_categories=["general_knowledge"],
        allowed_domains=[],
        max_age_hours=100_000,
        min_source_count=1,
    ),
    subnet_routing=[0],
    reward="0",
    bond_amount="0",
    submitted_at=1_577_750_400,
)
This question has a defensible biological answer but the criteria are ambiguous. The council may split or return UNRESOLVABLE. Either outcome is legitimate. The verdict’s per-provider reasoning shows how each model handled the ambiguity.
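Since a split council can return UNRESOLVABLE, calling code should branch on the outcome rather than assume YES/NO. A minimal sketch, using the outcome labels and basis-point confidence shown in this page's output; `summarize_verdict` is a hypothetical helper, not part of Pelion:

```python
def summarize_verdict(outcome_label: str, confidence_bp: int) -> str:
    """Render a one-line summary. Confidence is in basis points (10000 = 100%)."""
    if outcome_label == "UNRESOLVABLE":
        # The council could not converge; treat this as a legitimate outcome.
        return "Council could not resolve the question."
    return f"Council says {outcome_label} at {confidence_bp / 100:.1f}% confidence."


print(summarize_verdict("YES", 9950))        # easy question
print(summarize_verdict("UNRESOLVABLE", 0))  # ambiguous question
```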

Cost and latency

Per call, you’re paying for one API request to each enabled provider. At current frontier model pricing, this is roughly a few cents per query total. Latency is dominated by the slowest provider, typically 5 to 30 seconds end to end. The per_provider_timeout_s setting caps individual provider latency. If a provider is slower than the timeout, it is dropped and the others continue. min_responses=2 means the verdict still resolves as long as at least two providers respond in time.
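The drop-slow-providers behavior described above can be sketched with plain asyncio. This is an illustration of the pattern, not Pelion's internals; `call_provider` and `race_providers` are invented names, and the delays stand in for real provider latency.

```python
import asyncio


async def call_provider(name: str, delay_s: float) -> str:
    # Stand-in for one provider API call; delay_s models its latency.
    await asyncio.sleep(delay_s)
    return name


async def race_providers(calls, timeout_s: float, min_responses: int):
    # Each call gets an independent timeout, mirroring per_provider_timeout_s:
    # a provider slower than the timeout is dropped, the others continue.
    async def with_timeout(coro):
        try:
            return await asyncio.wait_for(coro, timeout_s)
        except asyncio.TimeoutError:
            return None

    results = await asyncio.gather(*(with_timeout(c) for c in calls))
    responded = [r for r in results if r is not None]
    if len(responded) < min_responses:
        raise RuntimeError(f"only {len(responded)} providers responded in time")
    return responded


async def demo():
    calls = [
        call_provider("anthropic", 0.01),
        call_provider("openai", 0.02),
        call_provider("google", 5.0),  # slower than the timeout: dropped
    ]
    responded = await race_providers(calls, timeout_s=0.1, min_responses=2)
    print(responded)


asyncio.run(demo())
```

Note that total latency here is min(slowest provider, timeout): the 5-second call is cancelled at the 0.1-second mark rather than holding up the verdict.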

What this demonstrates

The code path exercised in this example is the same code path that runs inside a production Pelion miner on Bittensor. The miner wraps FrontierModelClient behind an Axon, but the judgment logic is identical. That means the accuracy you observe running this example is a lower bound on the accuracy of the Pelion subnet. The real subnet adds retrieval, validator scoring, and multi-miner aggregation on top of this base. See Repository and modules for the broader picture of how this fits together.