Advancements in Constrained Decoding with ChopChop


Published: 2026-02-03

Tags: paper, research, llm, ai, structured-output

Abstract

Overview and discussion of the new paper ChopChop: A Programmable Framework for Semantically Constraining the Output of Language Models.

§ Introduction

Recently I read ChopChop: A Programmable Framework for Semantically Constraining the Output of Language Models from POPL 2026. I found the paper to be an interesting extension of the current ideas in structured output via JSON schemas to more semantic properties such as typing. Here's the abstract:

Language models (LMs) can generate code but cannot guarantee its correctness—often producing outputs that violate type safety, program invariants, or other semantic properties. Constrained decoding offers a solution by restricting generation to only produce programs that satisfy user-defined properties. However, existing methods are either limited to syntactic constraints or rely on brittle, ad hoc encodings of semantic properties over token sequences rather than program structure.

We present ChopChop, the first programmable framework for constraining the output of LMs with respect to semantic properties. ChopChop introduces a principled way to construct constrained decoders based on analyzing the space of programs a prefix represents. It formulates this analysis as a realizability problem which is solved via coinduction, connecting token-level generation with structural reasoning over programs. We demonstrate ChopChop's generality by using it to enforce (1) equivalence to a reference program and (2) type safety. Across a range of models and tasks, ChopChop improves success rates while maintaining practical decoding latency.

The main interesting aspects of this work are:

- ChopChop is programmable
- ChopChop generalizably connects token-level constraints to semantic constraints

The .grammar project by .txt (an AI engineering company focused on structured outputs for LLMs) also seems related to this, but it is proprietary.

§ Constrained Decoding

At a high level, generative LLMs work like this:

  input text
    --( encode )-->
  input embedding
    --( extrapolate )-->
  output embedding
    --( decode )-->
  output text

This works surprisingly well for getting sensible natural language responses to natural language queries. But, for many applications, a structured format for the output is required. For example, the output might be used as the input to an automated process which requires input of a very specific format.

Of course, structure comes in many different flavors and levels of complexity. One fairly simple framework for constrained generation has been widely adopted: structured output via JSON schemas. This framework allows the decoding of an LLM to be constrained to fit a particular JSON schema. For example, to represent a record for a person that has name, age, and email fields, we can use a JSON schema something like this:

{
    "type": "object",
    "properties": {
        "name": { "type": "string" },
        "age": { "type": "number" },
        "email": { "type": "string" }
    },
    "required": ["name", "age"]
}

Here, the "name" and "email" fields must be strings, the "age" field must be a number, and only "name" and "age" are required fields. As per the name, JSON schemas are schemas only for JSON values, but the schemas themselves have several common specifications (OpenAPI 3 is probably the most common). There are also some more advanced features such as oneOf. One cool thing about JSON schemas is that there is a JSON schema that is the schema for all JSON schemas.
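For instance, this value validates against the schema above, since both required fields are present with the right types and the optional "email" field may be omitted; a value missing "name", or giving "age" a string, would be rejected:

```json
{
    "name": "Ada Lovelace",
    "age": 36
}
```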

Essentially, JSON schemas are a format for specifying a grammar over JSON-shaped data. This grammar gives rise to a parsing algorithm for checking if a string is in the grammar. There are standard techniques for designing such parsing algorithms to work efficiently left-to-right, so that a string can be considered as a prefix of an entire value, and can be queried for "is this string a prefix of a valid value?"

Now, to connect this all to text generation with LLMs: once we have this prefix-checking algorithm, we can use it to filter the tokens that an LLM can decode to. Instead of just choosing from among the most highly weighted tokens as the next token in the output stream, only tokens that maintain the output stream as a prefix of a valid output are allowed, and we choose from among the highest-weighted of those tokens. This guarantees that the output stream will be valid according to the schema that determined the prefix checker.
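As a concrete sketch of this filtering step (my own illustration, not ChopChop's API), assuming we already have a prefix checker:

```haskell
-- A minimal sketch of one constrained decoding step. Given the model's
-- scored candidate tokens and a prefix checker, keep only the tokens
-- that leave the output stream a valid prefix, then take the
-- highest-weighted survivor.
import Data.List (maximumBy)
import Data.Ord (comparing)

type Token = String

constrainedStep
  :: (String -> Bool)   -- prefix checker: is this a valid prefix?
  -> String             -- output stream generated so far
  -> [(Token, Double)]  -- candidate tokens with model weights
  -> Maybe Token        -- Nothing means no valid continuation (backtrack)
constrainedStep validPrefix stream candidates =
  case [ tw | tw@(t, _) <- candidates, validPrefix (stream ++ t) ] of
    []   -> Nothing
    kept -> Just (fst (maximumBy (comparing snd) kept))
```

In a real decoder the weights would come from the model's logits, and the prefix checker would be incremental rather than re-scanning the whole stream on every candidate.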

You can also do fancier things here: for example, if none of the valid tokens has a very high weight, we can backtrack and try to generate a different trajectory that we might have more luck with.

This prefix checking based on first-order properties like JSON schema validation is great for many applications, and I've experimented with a few myself.

However, clearly there is a lot more to be done here because JSON schemas are extremely limited compared to the types of constraints that we would like to have on LLM outputs.

§ Beyond Token-level Analysis

ChopChop demonstrates a way to extend the same approach to prefix checking to more semantic properties desired of the LLM output.

The extra piece of constrained decoding that ChopChop brings is semantic pruning. Altogether, ChopChop has a prefix checker that calculates the space of valid ASTs that the output stream is a prefix of, and prunes that space with the semantic pruning predicate. A token is only valid if, upon being appended to the output stream, the space of semantically valid completions is inhabited.

This framework is programmable because you can provide your own grammar and semantic pruning predicate. The framework connects token-level constraints to AST-level constraints by prefix checking for valid ASTs, and then semantically considering those ASTs.

What's so interesting about this approach is that it is a feasible way to enforce any semantic predicate over the output, and ChopChop's implementation shows this can be done relatively efficiently even for complex constraints.

ChopChop demonstrates two semantic pruning examples:

- equivalent to a reference program modulo term rewriting
- well-typed

§ Implementation

ChopChop uses coinduction to encode the realizability and pruning problem for a potentially infinite set of solutions. So instead of representing the set of possible ASTs as a literal set (or list, or array), it's represented as a coinductive space. For example, for the grammar

data Expr
    = Lit String
    -- ^ A numeric literal
    | Sum Expr Expr
    -- ^ A numeric sum operation

the coinductive space of ASTs is encoded as

data ExprSpace
    = Empty
    -- ^ An empty space
    | Union [ExprSpace]
    -- ^ A union of several spaces
    | Lit Regex
    -- ^ A space of literals whose strings match a regex
    | Sum ExprSpace ExprSpace
    -- ^ A space of sums over two subspaces

This corresponds to a version space where Union is the union node and Sum is the join node. This encoding is able to encode an infinite space of expressions with a finite representation. It's like an extension of regular expressions to tree-structured data (in particular, ASTs defined by context free grammars).

For example, the space of all expressions built from integer literals and sums of them is represented by integers:

integers = 
    let integer = Lit "[0-9]+" in
    Union [integer, Sum integers integer]

What's very special about these spaces is that we can write predicates over their finite representations in order to constrain the encoded infinite space. For example, odds is a predicate for only the odd-valued expressions of a given space.

odds Empty      = Empty
odds (Union xs) = Union (map odds xs)
odds (Lit re)   = Lit (re `intersect` "[0-9]*[13579]")
-- a sum is odd exactly when one operand is odd and the other is even;
-- evens is the analogous predicate selecting even-valued expressions
odds (Sum l r)  = Union [Sum (odds l) (evens r), Sum (evens l) (odds r)]

Here, odds integers is the space of all odd integers, finitely represented. However, if we eagerly evaluate odds integers, then the evaluation will proceed infinitely because there is a cycle in the definition of integers. So, ChopChop uses a specialized solver that tracks previously expanded terms and creates an explicit cycle when a subterm is revisited, which escapes the infinite evaluation. There is a theorem that spaces of this form can be checked against such predicates in finite time using this memoization technique.
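To make the cycle handling concrete, here is a minimal sketch (my own, not the paper's implementation) of an inhabitation check over a variant of ExprSpace where cycles are explicit named back-references, assuming literal spaces are non-empty; a reference revisited on the current path is treated as contributing nothing, which is what escapes the infinite evaluation:

```haskell
-- A sketch of checking whether a cyclic space is inhabited. Cycles are
-- represented with an explicit Ref constructor naming a binding in an
-- environment, rather than with Haskell's implicit knot-tying.
import qualified Data.Map as Map
import qualified Data.Set as Set

data Space
  = Empty                -- the empty space
  | Union [Space]        -- a union of several spaces
  | Lit String           -- literals matching a regex (assumed non-empty here)
  | Sum Space Space      -- sums drawn from two subspaces
  | Ref String           -- a named back-reference, making cycles explicit

-- A space is inhabited if some finite expression lives in it. When we
-- revisit a reference already on the current path, we treat it as
-- uninhabited: a bare cycle produces no finite expression.
inhabited :: Map.Map String Space -> Space -> Bool
inhabited env = go Set.empty
  where
    go _    Empty      = False
    go seen (Union xs) = any (go seen) xs
    go _    (Lit _)    = True
    go seen (Sum l r)  = go seen l && go seen r
    go seen (Ref n)
      | n `Set.member` seen = False
      | otherwise           = maybe False (go (Set.insert n seen))
                                          (Map.lookup n env)
```

With the integers space encoded as a binding ("integers", Union [Lit "[0-9]+", Sum (Ref "integers") (Lit "[0-9]+")]), the check terminates and reports it inhabited via the literal branch, while a purely cyclic binding like ("loop", Ref "loop") comes out uninhabited.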

There are more details in the ChopChop paper about the specifics of the design and implementation of the full ChopChop system.

§ Results

The ChopChop team found that semantically constraining generation to be equivalent (up to term rewriting) to a reference program dramatically improved the success rate of generating behaviorally correct programs across a few benchmarks with some open-source models (DeepSeek-Coder-6.7b, CodeLlama-7B, CodeLlama-13B), in comparison to unconstrained generation or even grammar-constrained generation. See Table 1 in the ChopChop paper for the full results.

These improvements came with an overhead in how many tokens need to be churned through to get successful results. For equivalence-constrained generation, the vast majority of the time, fewer than 20 tokens were tried per successful token; for well-typedness-constrained generation, fewer than 60. Each try was a query to the realizability solver to decide whether the token was a valid next token or not.

§ Conclusions

What other kinds of predicates can be encoded as semantic pruners in ChopChop? The ChopChop paper mentions in the Related Work section that abstract interpretations with finitely many abstract predicates can be encoded as abstract finite tree automata (AFTA), which can be converted to finitely represented coinductive predicates for use as pruners in ChopChop. And it turns out this can also be done for infinite abstract domains via grammar-flow analysis. I'm looking forward to seeing that future work done!

So if we can encode abstract interpretation in this way, then we can essentially encode any interesting semantic property about the generated output as we like. This is so much more powerful than the initial JSON schema-based generation, because in particular now we can encode properties of the output where one part of the output relates to another part of the output in a dynamic way. For example, if we request the LLM to generate a list of tags and then assign those tags to some items, we can require the assigned tags to actually be from the list of tags that it originally generated. The only way to accomplish this currently is to split the generation into two parts, and dynamically use the result from the first generation to construct the schema for the second generation.
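To illustrate the current two-phase workaround: if the first generation produced the (hypothetical) tags "bug", "feature", and "docs", the second generation's schema would be constructed dynamically by splicing those tags in as an enum, something like:

```json
{
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "item": { "type": "string" },
            "tag": { "type": "string", "enum": ["bug", "feature", "docs"] }
        },
        "required": ["item", "tag"]
    }
}
```

A semantic pruner could instead enforce the tag-consistency property within a single generation.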

For code generation, the ability to restrict output code to only well-typed programs is an interesting prospect. Of course, this is non-trivial work for supporting each new type system. But the ability to implement this for any new type system means that you can give an LLM some immediate new support for a new or otherwise low resource programming language. This is a promising development towards the perspective that LLM-assisted code generation will facilitate the development of new programming languages rather than consolidate to only a few well-known programming languages that have the most training data available.

§ References

§ Signature

The following code block is the Ed25519 signature of this post's markdown content encoded in base 64, using my secret key and public key.

ec007c82a2cb4127a9ea6ff9e4ff7bdb5851d9593d4108cf1a8f7921981a8c4fc239dd289b2f412a399687ac152f4e75903ac9c829c29f29e0196e41c604d901

See Signature for more information.