
Token-Budget Assembly

assemble_for_model() produces a prompt that is guaranteed to fit within a model's context window. It counts tokens accurately with tiktoken, orders sections by priority, and trims only what doesn't fit — no guesswork, no silent overflow.

Why It Matters

Standard assemble() returns the full prompt without checking whether it fits. For large knowledge documents, long examples, or complex constraints, the result can exceed the model's context window, causing silent truncation or a cryptic token-limit error at the API level.

assemble_for_model() solves this by:

  1. Counting every section's tokens accurately (per-model with tiktoken)
  2. Including sections in priority order: role → directive → rules → constraints → knowledge → examples
  3. Trimming the lowest-priority content to fit when the budget is tight
  4. Always including the core role and directive
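The steps above can be sketched as a greedy, priority-ordered assembly. This is a minimal illustration, not the library's actual implementation: `estimate_tokens` here uses the rough 4-characters-per-token heuristic rather than tiktoken, and the first two sections stand in for the mandatory role and directive.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (stand-in for tiktoken).
    return max(1, len(text) // 4)

def assemble_with_budget(sections: list[tuple[str, str]], budget: int) -> str:
    """Greedily include (name, text) sections, highest priority first.

    The first two sections (role, directive) are mandatory; the rest are
    included only while they still fit in the budget.
    """
    mandatory, optional = sections[:2], sections[2:]
    used = sum(estimate_tokens(text) for _, text in mandatory)
    if used > budget:
        raise ValueError(
            f"role + directive alone need ~{used} tokens; budget is {budget}"
        )
    parts = [text for _, text in mandatory]
    for _, text in optional:
        cost = estimate_tokens(text)
        if used + cost <= budget:  # skip (trim) anything that doesn't fit
            parts.append(text)
            used += cost
    return "\n\n".join(parts)
```

Lower-priority sections are simply skipped once the budget runs out, which mirrors the "trim what doesn't fit" behavior described above.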

Basic Usage

from mycontext import Context, Guidance, Directive, Constraints

ctx = Context(
    guidance=Guidance(
        role="Senior security engineer",
        goal="Find all exploitable vulnerabilities",
        rules=["Flag every OWASP Top 10 risk", "Suggest concrete fixes"],
    ),
    directive=Directive("Audit this authentication middleware."),
    knowledge="[OWASP Top 10 2023 full text — 12,000 tokens]",
    constraints=Constraints(must_include=["severity rating", "code fix"]),
)

# Fit within gpt-4o-mini's default window
prompt = ctx.assemble_for_model(model="gpt-4o-mini")
print(f"Prompt: {len(prompt)} chars")

# Hard cap at a custom budget (e.g., leave room for response tokens)
prompt = ctx.assemble_for_model(model="gpt-4o", max_tokens=3000)

Section Priority Order

When the budget is tight, sections are included in this order, and the last ones are trimmed first:

| Priority | Section | Always included? |
|----------|---------|------------------|
| 1 | Role (guidance.role) | Yes |
| 2 | Directive (task) | Yes |
| 3 | Goal | When space allows |
| 4 | Rules | When space allows |
| 5 | Constraints | When space allows |
| 6 | Knowledge | Trimmed first when tight |
| 7 | Examples | Trimmed when tight |
| 8 | Style / expertise | Trimmed last |

The role and directive are always included — they define what the LLM is and what it needs to do. If even these exceed the budget, a ValueError is raised.

Model-Specific Budgets

Pass model to use the model's known context window:

# GPT-4o-mini — 128k tokens
prompt = ctx.assemble_for_model(model="gpt-4o-mini")

# GPT-4o — 128k tokens
prompt = ctx.assemble_for_model(model="gpt-4o")

# Claude 3.5 Sonnet — 200k tokens
prompt = ctx.assemble_for_model(model="claude-3-5-sonnet-20241022")

# Custom budget — useful inside agentic loops where you reserve space for history
prompt = ctx.assemble_for_model(model="gpt-4o", max_tokens=4000)

When tiktoken doesn't know the model, it falls back to cl100k_base encoding (GPT-4 standard).
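The model-to-encoding lookup with its cl100k_base fallback can be sketched as follows. The mapping below is a small illustrative subset for the sketch (the real mapping lives inside tiktoken's `encoding_for_model`), so treat the specific entries as assumptions:

```python
# Illustrative subset of model -> tiktoken encoding names; the authoritative
# mapping is maintained inside tiktoken itself.
KNOWN_ENCODINGS = {
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "gpt-4o": "o200k_base",
    "gpt-4o-mini": "o200k_base",
}

def encoding_for(model: str) -> str:
    # Unknown models fall back to cl100k_base, the GPT-4-era default.
    return KNOWN_ENCODINGS.get(model, "cl100k_base")
```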

Token Counting Utilities

The token counting functions are available standalone:

from mycontext.utils.tokens import count_tokens, fits_in_window, token_budget_remaining

# Count tokens for a string + model
n = count_tokens("Hello, world!", model="gpt-4o-mini") # → 4

# Check if a prompt fits
ok = fits_in_window("Very long text...", model="gpt-4o-mini") # → True/False

# How many tokens remain after a string
remaining = token_budget_remaining("System prompt text", model="gpt-4o-mini")
# → 127983 (128000 - 17)

# Estimate cost
from mycontext.utils.tokens import estimate_cost_usd
cost = estimate_cost_usd(input_tokens=1000, output_tokens=500, model="gpt-4o-mini")
# → 0.000225 (USD)
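Under the hood, a cost estimate like this is per-million-token arithmetic: input tokens times the input price plus output tokens times the output price. A sketch with a hypothetical price table (the prices below are placeholders, not the library's actual figures — providers change pricing, so check the current price sheet):

```python
# Hypothetical (input, output) USD prices per 1M tokens — placeholders only.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
}

def estimate_cost(input_tokens: int, output_tokens: int, model: str) -> float:
    price_in, price_out = PRICES[model]
    return (input_tokens / 1_000_000) * price_in + (output_tokens / 1_000_000) * price_out
```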

With Long Knowledge Documents

assemble_for_model() is especially useful when you inject retrieved documents into knowledge:

from mycontext import Context, Guidance, Directive

# Large retrieved context — 8,000 tokens
retrieved_docs = load_documents(query)

ctx = Context(
    guidance=Guidance(role="Research analyst"),
    directive=Directive("Summarize the key findings from the attached documents."),
    knowledge=retrieved_docs,
)

# Trims knowledge as needed to fit the 4k budget
prompt = ctx.assemble_for_model(model="gpt-4o-mini", max_tokens=4000)
result = ctx.execute(provider="openai", model="gpt-4o-mini")

Combining with Async Execution

async def analyze_with_budget(docs: str, question: str) -> str:
    ctx = Context(
        guidance=Guidance(role="Expert analyst"),
        directive=Directive(question),
        knowledge=docs,
    )
    # Build budget-aware prompt first
    prompt = ctx.assemble_for_model(model="gpt-4o-mini", max_tokens=8000)

    # Execute with the trimmed context
    result = await ctx.aexecute(provider="openai", model="gpt-4o-mini")
    return result.response

Installation

assemble_for_model() requires tiktoken for accurate token counting:

pip install tiktoken
# or
pip install "mycontext-ai[tokens]"

Without tiktoken, the method falls back to a character-based estimate (approximately 4 chars per token). The fallback is safe but less precise.
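That two-level fallback (unknown model → cl100k_base, missing tiktoken → character estimate) can be sketched as a standalone counter. This is an illustration of the behavior described above, not the library's actual code:

```python
def count_tokens_with_fallback(text: str, model: str = "gpt-4o-mini") -> int:
    """Count tokens with tiktoken when available, else estimate by characters."""
    try:
        import tiktoken
        try:
            enc = tiktoken.encoding_for_model(model)
        except KeyError:
            # Unknown model: fall back to the cl100k_base encoding.
            enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except ImportError:
        # No tiktoken: ~4 characters per token. Safe but imprecise.
        return max(1, len(text) // 4)
```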

Reference

| Signature | Description |
|-----------|-------------|
| ctx.assemble_for_model(model, max_tokens?) | Build a token-budget-aware prompt |
| count_tokens(text, model) | Count tokens for a string |
| fits_in_window(text, model) | Check if text fits |
| token_budget_remaining(text, model) | Remaining tokens after text |
| estimate_cost_usd(input_tokens, output_tokens, model) | Estimated USD cost |
