Advancements in Constrained Decoding with ChopChop
Published: 2026-02-03
Tags: paper, research, llm, ai, structured-output
Abstract
Overview and discussion of the new paper ChopChip: A Programmable Framework for Semantically Constraining the Output of Language Models.
§ Introduction
Recently I read ChopChop: A
Programmable Framework for Semantically Constraining the Output of
Language ModelsACM Digital Library
The ACM
Digital Library is a comprehensive database of full-text articles and
bibliographic records covering the fields of computing and information
technology, published by the Association for Computing Machinery and its
affiliated organizations.ChopChop: A Programmable Framework for
Semantically Constraining the Output of Language Models from
POPL 2026. I found the paper to be a relatively interesting forwarding
of the current ideas in structured output via JSON schema to more
semantic properties such as typing. Here's the abstract:
Language models (LMs) can generate code but cannot guarantee its correctness—often producing outputs that violate type safety, program invariants, or other semantic properties. Constrained decoding offers a solution by restricting generation to only produce programs that satisfy user-defined properties. However, existing methods are either limited to syntactic constraints or rely on brittle, ad hoc encodings of semantic properties over token sequences rather than program structure.
We present ChopChop, the first programmable framework for constraining the output of LMs with respect to semantic properties. ChopChop introduces a principled way to construct constrained decoders based on analyzing the space of programs a prefix represents. It formulates this analysis as a realizability problem which is solved via coinduction, connecting token-level generation with structural reasoning over programs. We demonstrate ChopChop's generality by using it to enforce (1) equivalence to a reference program and (2) type safety. Across a range of models and tasks, ChopChop improves success rates while maintaining practical decoding latency.
The main interesting aspects of this work are: - ChopChop is programmable - ChopChop generalizably connects token-level constraints to semantic constraints
The .grammar
https://docs.dottxt.ai/
.txt
documentation
.grammar project by
.txt
https://dottxt.ai
.txt is an
AI engineering company focused on structured outputs for LLMs.
.txt also
seems related to this, but it is proprietary.
§ Constrained Decoding
At a high level, generative LLMs work like this:
input text
--( encode )-->
input embedding
--( extrapolate )-->
output embedding
--( decode ) -->
output textThis works surprisingly well for getting sensible natural language responses to natural language queries. But, for many applications, a structured format for the output is required. For example, the output might be used as the input to an automated process which requires input of a very specific format.
Of course, structure comes in many different flavors and levels of
complexity. One fairly simple framework for constrained generation been
widely adopted: structured
output
https://ai.google.dev/
Google
AI API documentation
structured output via
JSON schemashttps://json-schema.org
JSON
Schema enables the confident and reliable use of the JSON data
format.JSON schemas. This
framework allows the decoding of an LLM to be constrained to fit a
particular JSON Schema. For example, to represent a record for a person
that has name, age, and email fields we can use a JSON schema something
like this:
{
"type": "object",
"properties": {
"name": { "type": "string" },
"age": { "type": "number" },
"email": { "type": "string" }
},
"required": ["name", "age"]
}Here, the "name" and "email" fields must be
strings, the "age" field must be a number, and only
"name" and "age" are required fields. As per
the name, JSON schemas are schemas only for JSON values. JSON schemas
are schemas for standard JSON values, but the schemas themselves have
several common specification (but Open API 3
https://spec.openapis.org
This
site contains the authoritative HTML renderings of the OpenAPI
Initiative’s specifications and extension registries.
Open API 3 is probably the
most common). There are some more advanced features such as oneOfhttps://json-schema.org
JSON
Schema enables the confident and reliable use of the JSON data
format.oneOf. One
cool thing about JSON schemas is there is a JSON schema that is the
schema for all JSON schemas.
Essentially, JSON schemas are a format for specify a grammar over
JSON-shaped data. This grammar gives rise to a parsing algorithm for
checking if a string is in the grammar. There are standard techniques for designing such parsing
algorithms to efficiently work left-to-righthttps://hackernoon.com/
HackerNoon
is a community-driven online publishing platform that focuses on
technology, software development, blockchain, and artificial
intelligence.standard techniques for designing
such parsing algorithms to efficiently work left-to-right, so
that a string can be considered as a prefix to the entire value, and can
be queried for "is this string a prefix of a valid value?"
Now, to connect this all to text generation with LLMs. Once we have this prefix-checking algorithm, we can use it to filter which tokens that an LLM can decode to. Instead of just choosing from among the most highly weighted tokens as the token to generate next in the output stream, only tokens that Maintain the output stream as a prefix of a valid output are allowed, and we choose from among the highest weight of those tokens. This guarantees that the output stream will be valid according to the schema that determined the prefix checker.
You can also do some more fancy stuff here, like if none of the possible tokens have a very high weight, then we can try backtracking to generate a different trajectory that we might have more luck with.
This prefix checking based on first-order properties like JSON schema validation is great for many applications. I've experimented with a few myself:
- Mtgen
https://mtgen.fly.dev
Mtgen is a platform for generating and sharing AI generated custom Magic: The Gathering cards.Mtgen is a custom Magic the Gathering card generator, which uses structured output to structure each card's details for the sake of controlled image generation and structured querying.
- Visual Novel Engine
GitHub
GitHub is a web-based platform that provides hosting for software development and version control using Git, allowing developers to store, manage, and collaborate on code projects.Visual Novel Engine is a platform for generating and playing interactive visual novels. it generates and extracts a structured representation of the story and the assets involved using structured output.
- pinion-cyoa
GitHub
GitHub is a web-based platform that provides hosting for software development and version control using Git, allowing developers to store, manage, and collaborate on code projects.pinion-cyoa: Pinion CYOA is a platform for generating and playing text-based choose-your-own-adventure-games. It uses structured output to maintain a structured representation of the game state and what options the player is presented with.
- OpenAGI Branching
Agent
GitHub
GitHub is a web-based platform that provides hosting for software development and version control using Git, allowing developers to store, manage, and collaborate on code projects.OpenAGI Branching Agent is a project I built during Computer Use: Hack Night at Github
https://hackersquad.io/
A platform for hackathon organization.Computer Use: Hack Night at Github that allows the agent to navigate a graph the corresponds to a flowchart of tasks. It decides how to navigate using structured output.
- small-language-games
GitHub
GitHub is a web-based platform that provides hosting for software development and version control using Git, allowing developers to store, manage, and collaborate on code projects.small-language-games is a collection of LLM-based experiments with a focus on structured output. For example, it includes an LLM-powered text adventure game
GitHub
GitHub is a web-based platform that provides hosting for software development and version control using Git, allowing developers to store, manage, and collaborate on code projects.LLM-powered text adventure game with structured world state, an LLM-powered UI generator
GitHub
GitHub is a web-based platform that provides hosting for software development and version control using Git, allowing developers to store, manage, and collaborate on code projects.LLM-powered UI generator using an AST-like representation of the UI, and a generative filesystem simulator
GitHub
GitHub is a web-based platform that provides hosting for software development and version control using Git, allowing developers to store, manage, and collaborate on code projects.generative filesystem simulator where the filesystem has a structured representation that the user and the LLM both interact with via structured actions.
- FPV Drone Hackathon
Project
GitHub
GitHub is a web-based platform that provides hosting for software development and version control using Git, allowing developers to store, manage, and collaborate on code projects.FPV Drone Hackathon Project is a project I submitted to the the AI+ Expo Hackathon
https://expo.scsp.ai/hackathon/
Placeholder description for https://expo.scsp.ai/hackathon/
AI+ Expo
Hackathon, in which my team (I and a friend) won 3rd
place. - llm-sim
GitHub
GitHub is a web-based platform that provides hosting for software development and version control using Git, allowing developers to store, manage, and collaborate on code projects.llm-sim: LLM-sim is a simple state-action framework for using an LLM to interpret vague user instructions and structured actions, using structured output.
- latent-state-machine
GitHub
GitHub is a web-based platform that provides hosting for software development and version control using Git, allowing developers to store, manage, and collaborate on code projects.latent-state-machine: Latent State Machine is a demonstration of encoding the an agent's state as a state machine, where the agent navigates the state machine using structured output formatted to the transitions.
However, clearly there is a lot more to be done here because JSON schemas are extremely limited compared to the types of constraints that we would like to have on LLM outputs.
§ Beyond Token-level Analysis
ChopChopACM Digital Library
The ACM
Digital Library is a comprehensive database of full-text articles and
bibliographic records covering the fields of computing and information
technology, published by the Association for Computing Machinery and its
affiliated organizations.ChopChop demonstrates a way to
extend the same approach to prefix checking to more semantic properties
desired of the LLM output.
The extra piece of constrained decoding that ChopChop brings is semantic pruning. All together, ChopChop has a prefix checker that calculates the space of valid ASTs that the output stream is a prefix of, and prunes that space by the semantic pruning predicate. A token is only valid if, upon being appended to the output stream, the space of semantically valid completions is inhabited.
This framework is programmable because you can provide your own grammar and semantic pruning predicate. The framework connects token-level constraints to AST-level constraints by prefix checking for valid ASTs, and then semantically considering those ASTs.
What's so interesting about this approach is that it is a feasible way to enforce any semantic predicate over the output, and ChopChop's implementation shows this can be done relatively efficiently even for complex constraints.
ChopChop demonstrates two semantic pruning examples: - equivalent to a reference program modulo term rewriting - well-typed
§ Implementation
ChopChop uses co-induction to encode the realizability and pruning problem for a potentially infinite set of solutions. So instead of representing the set of possible ASTs as a literal set (or list, or array), it's represented as a coinductive space. For example, for the grammar
data Expr
= Lit String
-- ^ A numeric literal
| Sum Expr Expr
-- ^ A numeric sum operationthe coinductive space of ASTs is encoded as
data ExprSpace
= Empty
-- ^ An empty space
| Union [ExprSpace]
-- ^ A union of several spaces
| Lit Regex
| Sum ExprSpace ExprSpaceThis corresponds to a version space where Union is the
union node and Sum is the join node. This encoding is able
to encode an infinite space of expressions with a finite representation.
It's like an extension of regular expressions to tree-structured data
(in particular, ASTs defined by context free grammars).
For example, the space of all integer literal expressions is
represented by integers:
integers =
let integer = Lit "[0-9]+" in
Union [integer, Sum integers integer]What's very special about these spaces is that we can write
predicates over their finite representations in order to constrain the
encoded infinite space. For example, odds is a predicate
for only the odd-valued expressions of a given space.
odds Expr = Expr
odds (Union xs) = Union (map odds xs)
odds (Lit re) = Lit (re `intersect` "[0-9]*[13579]")
odds (Sum l r) = Sum (odds l) (odds r)Here, odds integers is the space of all odd integers,
finitely represented. However, if we eagerly evaluate
odds integers, Then the evaluation will proceed infinitely
because there is a cycle in the definition of integers. So,
ChopChop uses a specialized solver that tracks previously expanded terms
and creates an explicit cycle when a subterm is revisited, which escapes
the infinite evaluation. There is a theorem that spaces of this form
that can be predicated with finite checking using this memoization
specialization.
There are more details in the
ChopChop paper
https://arxiv.org/abs/1706.03762
Placeholder
description for https://arxiv.org/abs/1706.03762
the ChopChop paper about the
specifics of the design and implementation of the full ChopChop
system.
§ Results
The ChopChop team found that the semantically constrained generation
to be equivalent (up to term rewriting) to a reference program
dramatically improved the success rate of generating behaviorally
correct programs across a few benchmarks with some open source models
(DeepSeek-Coder-6.7b, CodeLlama-7B, CodeLlama-13B) in comparison to
unconstrained generation or even grammar constrained generation. See
Table 1 in the ChopChop
paper
https://arxiv.org/abs/1706.03762
Placeholder
description for https://arxiv.org/abs/1706.03762
the ChopChop paper for the
full results.
With these improvements, there was an overhead on how many tokens need to be churned through to get successful results. For equivalence constrained generation, the vast majority of the time, less than 20 tokens were tried per successful token. For well-typedness constrained generation, the vast majority of the time, less than 60 tokens were tried per successful token. Each try was a query to the realizability solver to decide whether the token was a valid next token or not.
§ Conclusions
What other kinds of predicates can be encoded as semantic pruners in
ChopChop? The ChopChop
paper
https://arxiv.org/abs/1706.03762
Placeholder
description for https://arxiv.org/abs/1706.03762
The ChopChop paper mentions in
the Related Work section that abstract interpretation where there are
finitely many abstract predicates can be encoded as abstract finite tree
automata (AFTA)
https://arxiv.org/abs/1706.03762
Placeholder
description for https://arxiv.org/abs/1706.03762
abstract interpretation where there are
finitely many abstract predicates can be encoded as abstract finite tree
automata (AFTA), which can be converted to a finitely
represented coinductive predicates for use as pruners for ChopChop. And
it turns out this can be also done for infinite abstract domain via
grammar-flow analysisACM Digital Library
The ACM
Digital Library is a comprehensive database of full-text articles and
bibliographic records covering the fields of computing and information
technology, published by the Association for Computing Machinery and its
affiliated organizations.grammar-flow analysis. I'm
looking forward to seeing that feature work done!
So if we can encode abstract interpretation in this way, then we can essentially encode any interesting semantic property about the generated output as we like. This is so much more powerful than the initial JSON schema-based generation, because in particular now we can encode properties of the output where one part of the output relates to another part of the output in a dynamic way. For example, if we request the LLM to generate a list of tags and then assign those tags to some items, we can require the assigned tags to actually be from the list of tags that it originally generated. The only way to accomplish this currently is to split the generation into two parts, and dynamically use the result from the first generation to construct the schema for the second generation.
For code generation, the ability to restrict output code to only well-typed programs is an interesting prospect. Of course, this is non-trivial work for supporting each new type system. But the ability to implement this for any new type system means that you can give an LLM some immediate new support for a new or otherwise low resource programming language. This is a promising development towards the perspective that LLM-assisted code generation will facilitate the development of new programming languages rather than consolidate to only a few well-known programming languages that have the most training data available.
§ References
ChopChop: A Programmable Framework for Semantically Constraining the Output of Language Models
.grammar
.txt
structured
outputJSON schemas
Open API 3oneOfstandard techniques for designing such parsing algorithms to efficiently work left-to-right
Mtgen
Visual Novel Engine
pinion-cyoa
OpenAGI Branching Agent
Computer Use: Hack Night at Github
small-language-games
LLM-powered text adventure game
LLM-powered UI generator
generative filesystem simulator
FPV Drone Hackathon Project
AI+ Expo Hackathonllm-sim
latent-state-machine
ChopChop
the ChopChop
paper
the ChopChop
paper
The ChopChop
paper
abstract interpretation
where there are finitely many abstract predicates can be encoded as
abstract finite tree automata (AFTA)grammar-flow analysis
§ Signature
The following code block is the Ed25519 signature of this post's
markdown
content encoded in base 64, using my secret key and
public key.
ec007c82a2cb4127a9ea6ff9e4ff7bdb5851d9593d4108cf1a8f7921981a8c4fc239dd289b2f412a399687ac152f4e75903ac9c829c29f29e0196e41c604d901See Signature for more
information.