halo2

Usage

This repository contains the halo2_proofs and halo2_gadgets crates, which should be used directly.

Minimum Supported Rust Version

Requires Rust 1.60 or higher.

Minimum supported Rust version can be changed in the future, but it will be done with a minor version bump.

Controlling parallelism

halo2 currently uses rayon for parallel computation. The RAYON_NUM_THREADS environment variable can be used to set the number of threads.

You can disable rayon by disabling the "multicore" feature. Warning! Halo2 will lose access to parallelism if you disable the "multicore" feature. This will significantly degrade performance.

License

Licensed under either of

Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Concepts

First we'll describe the concepts behind zero-knowledge proof systems; the arithmetization (kind of circuit description) used by Halo 2; and the abstractions we use to build circuit implementations.

Proof systems

The aim of any proof system is to be able to prove interesting mathematical or cryptographic statements.

Typically, in a given protocol we will want to prove families of statements that differ in their public inputs. The prover will also need to show that they know some private inputs that make the statement hold.

To do this we write down a relation, $R$ , that specifies which combinations of public and private inputs are valid.

The terminology above is intended to be aligned with the ZKProof Community Reference.

To be precise, we should distinguish between the relation $R$ , and its implementation to be used in a proof system. We call the latter a circuit.

The language that we use to express circuits for a particular proof system is called an arithmetization. Usually, an arithmetization will define circuits in terms of polynomial constraints on variables over a field.

The process of expressing a particular relation as a circuit is also sometimes called "arithmetization", but we'll avoid that usage.

To create a proof of a statement, the prover will need to know the private inputs, and also intermediate values, called advice values, that are used by the circuit.

We assume that we can compute advice values efficiently from the private and public inputs. The particular advice values will depend on how we write the circuit, not only on the high-level statement.

The private inputs and advice values are collectively called a witness.

Some authors use "witness" as just a synonym for private inputs. But in our usage, a witness includes advice, i.e. it includes all values that the prover supplies to the circuit.

For example, suppose that we want to prove knowledge of a preimage $x$ of a hash function $H$ for a digest $y$ :

The private input would be the preimage $x$ .
The public input would be the digest $y$ .
The relation would be ${(x, y) : H (x) = y}$ .
For a particular public input $Y$ , the statement would be: ${(x) : H (x) = Y}$ .
The advice would be all of the intermediate values in the circuit implementing the hash function. The witness would be $x$ and the advice.

A Non-interactive Argument allows a prover to create a proof for a given statement and witness. The proof is data that can be used to convince a verifier that there exists a witness for which the statement holds. The security property that such proofs cannot falsely convince a verifier is called soundness.

A Non-interactive Argument of Knowledge (NARK) further convinces the verifier that the prover knew a witness for which the statement holds. This security property is called knowledge soundness, and it implies soundness.

In practice knowledge soundness is more useful for cryptographic protocols than soundness: if we are interested in whether Alice holds a secret key in some protocol, say, we need Alice to prove that she knows the key, not just that it exists.

Knowledge soundness is formalized by saying that an extractor, which can observe precisely how the proof is generated, must be able to compute the witness.

This property is subtle given that proofs can be malleable. That is, depending on the proof system it may be possible to take an existing proof (or set of proofs) and, without knowing the witness(es), modify it/them to produce a distinct proof of the same or a related statement. Higher-level protocols that use malleable proof systems need to take this into account.

Even without malleability, proofs can also potentially be replayed. For instance, we would not want Alice in our example to be able to present a proof generated by someone else, and have that be taken as a demonstration that she knew the key.

If a proof yields no information about the witness (other than that a witness exists and was known to the prover), then we say that the proof system is zero knowledge.

If a proof system produces short proofs —i.e. of length polylogarithmic in the circuit size— then we say that it is succinct. A succinct NARK is called a SNARK (Succinct Non-Interactive Argument of Knowledge).

By this definition, a SNARK need not have verification time polylogarithmic in the circuit size. Some papers use the term efficient to describe a SNARK with that property, but we'll avoid that term since it's ambiguous for SNARKs that support amortized or recursive verification, which we'll get to later.

A zk-SNARK is a zero-knowledge SNARK.

PLONKish Arithmetization

The arithmetization used by Halo 2 comes from PLONK, or more precisely its extension UltraPLONK that supports custom gates and lookup arguments. We'll call it PLONKish.

PLONKish circuits are defined in terms of a rectangular matrix of values. We refer to rows, columns, and cells of this matrix with the conventional meanings.

A PLONKish circuit depends on a configuration:

A finite field $F$ , where cell values (for a given statement and witness) will be elements of $F$ .
The number of columns in the matrix, and a specification of each column as being fixed, advice, or instance. Fixed columns are fixed by the circuit; advice columns correspond to witness values; and instance columns are normally used for public inputs (technically, they can be used for any elements shared between the prover and verifier).
A subset of the columns that can participate in equality constraints.
A maximum constraint degree.
A sequence of polynomial constraints. These are multivariate polynomials over $F$ that must evaluate to zero for each row. The variables in a polynomial constraint may refer to a cell in a given column of the current row, or a given column of another row relative to this one (with wrap-around, i.e. taken modulo $n$ ). The maximum degree of each polynomial is given by the maximum constraint degree.
A sequence of lookup arguments defined over tuples of input expressions (which are multivariate polynomials as above) and table columns.

A PLONKish circuit also defines:

The number of rows $n$ in the matrix. $n$ must correspond to the size of a multiplicative subgroup of $F^{\times}$ ; typically a power of two.
A sequence of equality constraints, which specify that two given cells must have equal values.
The values of the fixed columns at each row.

From a circuit description we can generate a proving key and a verification key, which are needed for the operations of proving and verification for that circuit.

Note that we specify the ordering of columns, polynomial constraints, lookup arguments, and equality constraints, even though these do not affect the meaning of the circuit. This makes it easier to define the generation of proving and verification keys as a deterministic process.

Typically, a configuration will define polynomial constraints that are switched off and on by selectors defined in fixed columns. For example, a constraint $q_{i} \cdot p (...) = 0$ can be switched off for a particular row $i$ by setting $q_{i} = 0$ . In this case we sometimes refer to a set of constraints controlled by a set of selector columns that are designed to be used together, as a gate. Typically there will be a standard gate that supports generic operations like field multiplication and division, and possibly also custom gates that support more specialized operations.

Chips

The previous section gives a fairly low-level description of a circuit. When implementing circuits we will typically use a higher-level API which aims for the desirable characteristics of auditability, efficiency, modularity, and expressiveness.

Some of the terminology and concepts used in this API are taken from an analogy with integrated circuit design and layout. As for integrated circuits, the above desirable characteristics are easier to obtain by composing chips that provide efficient pre-built implementations of particular functionality.

For example, we might have chips that implement particular cryptographic primitives such as a hash function or cipher, or algorithms like scalar multiplication or pairings.

In PLONKish circuits, it is possible to build up arbitrary logic just from standard gates that do field multiplication and addition. However, very significant efficiency gains can be obtained by using custom gates.

Using our API, we define chips that "know" how to use particular sets of custom gates. This creates an abstraction layer that isolates the implementation of a high-level circuit from the complexity of using custom gates directly.

Even if we sometimes need to "wear two hats", by implementing both a high-level circuit and the chips that it uses, the intention is that this separation will result in code that is easier to understand, audit, and maintain/reuse. This is partly because some potential implementation errors are ruled out by construction.

Gates in PLONKish circuits refer to cells by relative references, i.e. to the cell in a given column, and the row at a given offset relative to the one in which the gate's selector is set. We call this an offset reference when the offset is nonzero (i.e. offset references are a subset of relative references).

Relative references contrast with absolute references used in equality constraints, which can point to any cell.

The motivation for offset references is to reduce the number of columns needed in the configuration, which reduces proof size. If we did not have offset references then we would need a column to hold each value referred to by a custom gate, and we would need to use equality constraints to copy values from other cells of the circuit into that column. With offset references, we not only need fewer columns; we also do not need equality constraints to be supported for all of those columns, which improves efficiency.

In R1CS (another arithmetization which may be more familiar to some readers, but don't worry if it isn't), a circuit consists of a "sea of gates" with no semantically significant ordering. Because of offset references, the order of rows in a PLONKish circuit, on the other hand, is significant. We're going to make some simplifying assumptions and define some abstractions to tame the resulting complexity: the aim will be that, at the gadget level where we do most of our circuit construction, we will not have to deal with relative references or with gate layout explicitly.

We will partition a circuit into regions, where each region contains a disjoint subset of cells, and relative references only ever point within a region. Part of the responsibility of a chip implementation is to ensure that gates that make offset references are laid out in the correct positions in a region.

Given the set of regions and their shapes, we will use a separate floor planner to decide where (i.e. at what starting row) each region is placed. There is a default floor planner that implements a very general algorithm, but you can write your own floor planner if you need to.

Floor planning will in general leave gaps in the matrix, because the gates in a given row did not use all available columns. These are filled in —as far as possible— by gates that do not require offset references, which allows them to be placed on any row.

Chips can also define lookup tables. If more than one table is defined for the same lookup argument, we can use a tag column to specify which table is used on each row. It is also possible to perform a lookup in the union of several tables (limited by the polynomial degree bound).

Composing chips

In order to combine functionality from several chips, we compose them in a tree. The top-level chip defines a set of fixed, advice, and instance columns, and then specifies how they should be distributed between lower-level chips.

In the simplest case, each lower-level chips will use columns disjoint from the other chips. However, it is allowed to share a column between chips. It is important to optimize the number of advice columns in particular, because that affects proof size.

The result (possibly after optimization) is a PLONKish configuration. Our circuit implementation will be parameterized on a chip, and can use any features of the supported lower-level chips via the top-level chip.

Our hope is that less expert users will normally be able to find an existing chip that supports the operations they need, or only have to make minor modifications to an existing chip. Expert users will have full control to do the kind of circuit optimizations that ECC is famous for 🙂.

Gadgets

When implementing a circuit, we could use the features of the chips we've selected directly. Typically, though, we will use them via gadgets. This indirection is useful because, for reasons of efficiency and limitations imposed by PLONKish circuits, the chip interfaces will often be dependent on low-level implementation details. The gadget interface can provide a more convenient and stable API that abstracts away from extraneous detail.

For example, consider a hash function such as SHA-256. The interface of a chip supporting SHA-256 might be dependent on internals of the hash function design such as the separation between message schedule and compression function. The corresponding gadget interface can provide a more convenient and familiar update/finalize API, and can also handle parts of the hash function that do not need chip support, such as padding. This is similar to how accelerated instructions for cryptographic primitives on CPUs are typically accessed via software libraries, rather than directly.

Gadgets can also provide modular and reusable abstractions for circuit programming at a higher level, similar to their use in libraries such as libsnark and bellman. As well as abstracting functions, they can also abstract types, such as elliptic curve points or integers of specific sizes.

User Documentation

You're probably here because you want to write circuits? Excellent!

This section will guide you through the process of creating circuits with halo2.

Developer tools

The halo2 crate includes several utilities to help you design and implement your circuits.

Mock prover

halo2_proofs::dev::MockProver is a tool for debugging circuits, as well as cheaply verifying their correctness in unit tests. The private and public inputs to the circuit are constructed as would normally be done to create a proof, but MockProver::run instead creates an object that will test every constraint in the circuit directly. It returns granular error messages that indicate which specific constraint (if any) is not satisfied.

Circuit visualizations

The dev-graph feature flag exposes several helper methods for creating graphical representations of circuits.

On Debian systems, you will need the following additional packages:

sudo apt install cmake libexpat1-dev libfreetype6-dev libcairo2-dev

Circuit layout

halo2_proofs::dev::CircuitLayout renders the circuit layout as a grid:

fn main() {
    // Prepare the circuit you want to render.
    // You don't need to include any witness variables.
    let a = Fp::random(OsRng);
    let instance = Fp::ONE + Fp::ONE;
    let lookup_table = vec![instance, a, a, Fp::zero()];
    let circuit: MyCircuit<Fp> = MyCircuit {
        a: Value::unknown(),
        lookup_table,
    };

    // Create the area you want to draw on.
    // Use SVGBackend if you want to render to .svg instead.
    use plotters::prelude::*;
    let root = BitMapBackend::new("layout.png", (1024, 768)).into_drawing_area();
    root.fill(&WHITE).unwrap();
    let root = root
        .titled("Example Circuit Layout", ("sans-serif", 60))
        .unwrap();

    halo2_proofs::dev::CircuitLayout::default()
        // You can optionally render only a section of the circuit.
        .view_width(0..2)
        .view_height(0..16)
        // You can hide labels, which can be useful with smaller areas.
        .show_labels(false)
        // Render the circuit onto your area!
        // The first argument is the size parameter for the circuit.
        .render(5, &circuit, &root)
        .unwrap();
}

Columns are laid out from left to right as instance, advice and fixed. The order of columns is otherwise without meaning.
- Instance columns have a white background.
- Advice columns have a red background.
- Fixed columns have a blue background.
Regions are shown as labelled green boxes (overlaying the background colour). A region may appear as multiple boxes if some of its columns happen to not be adjacent.
Cells that have been assigned to by the circuit will be shaded in grey. If any cells are assigned to more than once (which is usually a mistake), they will be shaded darker than the surrounding cells.

Circuit structure

halo2_proofs::dev::circuit_dot_graph builds a DOT graph string representing the given circuit, which can then be rendered with a variety of layout programs. The graph is built from calls to Layouter::namespace both within the circuit, and inside the gadgets and chips that it uses.

fn main() {
    // Prepare the circuit you want to render.
    // You don't need to include any witness variables.
    let a = Fp::rand();
    let instance = Fp::one() + Fp::one();
    let lookup_table = vec![instance, a, a, Fp::zero()];
    let circuit: MyCircuit<Fp> = MyCircuit {
        a: None,
        lookup_table,
    };

    // Generate the DOT graph string.
    let dot_string = halo2_proofs::dev::circuit_dot_graph(&circuit);

    // Now you can either handle it in Rust, or just
    // print it out to use with command-line tools.
    print!("{}", dot_string);
}

Cost estimator

The cost-model binary takes high-level parameters for a circuit design, and estimates the verification cost, as well as resulting proof size.

Usage: cargo run --example cost-model -- [OPTIONS] k

Positional arguments:
  k                       2^K bound on the number of rows.

Optional arguments:
  -h, --help              Print this message.
  -a, --advice R[,R..]    An advice column with the given rotations. May be repeated.
  -i, --instance R[,R..]  An instance column with the given rotations. May be repeated.
  -f, --fixed R[,R..]     A fixed column with the given rotations. May be repeated.
  -g, --gate-degree D     Maximum degree of the custom gates.
  -l, --lookup N,I,T      A lookup over N columns with max input degree I and max table degree T. May be repeated.
  -p, --permutation N     A permutation over N columns. May be repeated.

For example, to estimate the cost of a circuit with three advice columns and one fixed column (with various rotations), and a maximum gate degree of 4:

> cargo run --example cost-model -- -a 0,1 -a 0 -a-0,-1,1 -f 0 -g 4 11
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running `target/debug/examples/cost-model -a 0,1 -a 0 -a 0,-1,1 -f 0 -g 4 11`
Circuit {
    k: 11,
    max_deg: 4,
    advice_columns: 3,
    lookups: 0,
    permutations: [],
    column_queries: 7,
    point_sets: 3,
    estimator: Estimator,
}
Proof size: 1440 bytes
Verification: at least 81.689ms

A simple example

Let's start with a simple circuit, to introduce you to the common APIs and how they are used. The circuit will take a public input $c$ , and will prove knowledge of two private inputs $a$ and $b$ such that

$a^{2} \cdot b^{2} = c .$

Define instructions

Firstly, we need to define the instructions that our circuit will rely on. Instructions are the boundary between high-level gadgets and the low-level circuit operations. Instructions may be as coarse or as granular as desired, but in practice you want to strike a balance between an instruction being large enough to effectively optimize its implementation, and small enough that it is meaningfully reusable.

For our circuit, we will use three instructions:

Load a private number into the circuit.
Multiply two numbers.
Expose a number as a public input to the circuit.

We also need a type for a variable representing a number. Instruction interfaces provide associated types for their inputs and outputs, to allow the implementations to represent these in a way that makes the most sense for their optimization goals.

trait NumericInstructions<F: Field>: Chip<F> {
    /// Variable representing a number.
    type Num;

    /// Loads a number into the circuit as a private input.
    fn load_private(&self, layouter: impl Layouter<F>, a: Value<F>) -> Result<Self::Num, Error>;

    /// Loads a number into the circuit as a fixed constant.
    fn load_constant(&self, layouter: impl Layouter<F>, constant: F) -> Result<Self::Num, Error>;

    /// Returns `c = a * b`.
    fn mul(
        &self,
        layouter: impl Layouter<F>,
        a: Self::Num,
        b: Self::Num,
    ) -> Result<Self::Num, Error>;

    /// Exposes a number as a public input to the circuit.
    fn expose_public(
        &self,
        layouter: impl Layouter<F>,
        num: Self::Num,
        row: usize,
    ) -> Result<(), Error>;
}

Define a chip implementation

For our circuit, we will build a chip that provides the above numeric instructions for a finite field.

/// The chip that will implement our instructions! Chips store their own
/// config, as well as type markers if necessary.
struct FieldChip<F: Field> {
    config: FieldConfig,
    _marker: PhantomData<F>,
}

Every chip needs to implement the Chip trait. This defines the properties of the chip that a Layouter may rely on when synthesizing a circuit, as well as enabling any initial state that the chip requires to be loaded into the circuit.

impl<F: Field> Chip<F> for FieldChip<F> {
    type Config = FieldConfig;
    type Loaded = ();

    fn config(&self) -> &Self::Config {
        &self.config
    }

    fn loaded(&self) -> &Self::Loaded {
        &()
    }
}

Configure the chip

The chip needs to be configured with the columns, permutations, and gates that will be required to implement all of the desired instructions.

/// Chip state is stored in a config struct. This is generated by the chip
/// during configuration, and then stored inside the chip.
#[derive(Clone, Debug)]
struct FieldConfig {
    /// For this chip, we will use two advice columns to implement our instructions.
    /// These are also the columns through which we communicate with other parts of
    /// the circuit.
    advice: [Column<Advice>; 2],

    /// This is the public input (instance) column.
    instance: Column<Instance>,

    // We need a selector to enable the multiplication gate, so that we aren't placing
    // any constraints on cells where `NumericInstructions::mul` is not being used.
    // This is important when building larger circuits, where columns are used by
    // multiple sets of instructions.
    s_mul: Selector,
}

impl<F: Field> FieldChip<F> {
    fn construct(config: <Self as Chip<F>>::Config) -> Self {
        Self {
            config,
            _marker: PhantomData,
        }
    }

    fn configure(
        meta: &mut ConstraintSystem<F>,
        advice: [Column<Advice>; 2],
        instance: Column<Instance>,
        constant: Column<Fixed>,
    ) -> <Self as Chip<F>>::Config {
        meta.enable_equality(instance);
        meta.enable_constant(constant);
        for column in &advice {
            meta.enable_equality(*column);
        }
        let s_mul = meta.selector();

        // Define our multiplication gate!
        meta.create_gate("mul", |meta| {
            // To implement multiplication, we need three advice cells and a selector
            // cell. We arrange them like so:
            //
            // | a0  | a1  | s_mul |
            // |-----|-----|-------|
            // | lhs | rhs | s_mul |
            // | out |     |       |
            //
            // Gates may refer to any relative offsets we want, but each distinct
            // offset adds a cost to the proof. The most common offsets are 0 (the
            // current row), 1 (the next row), and -1 (the previous row), for which
            // `Rotation` has specific constructors.
            let lhs = meta.query_advice(advice[0], Rotation::cur());
            let rhs = meta.query_advice(advice[1], Rotation::cur());
            let out = meta.query_advice(advice[0], Rotation::next());
            let s_mul = meta.query_selector(s_mul);

            // Finally, we return the polynomial expressions that constrain this gate.
            // For our multiplication gate, we only need a single polynomial constraint.
            //
            // The polynomial expressions returned from `create_gate` will be
            // constrained by the proving system to equal zero. Our expression
            // has the following properties:
            // - When s_mul = 0, any value is allowed in lhs, rhs, and out.
            // - When s_mul != 0, this constrains lhs * rhs = out.
            vec![s_mul * (lhs * rhs - out)]
        });

        FieldConfig {
            advice,
            instance,
            s_mul,
        }
    }
}

Implement chip traits

/// A variable representing a number.
#[derive(Clone)]
struct Number<F: Field>(AssignedCell<F, F>);

impl<F: Field> NumericInstructions<F> for FieldChip<F> {
    type Num = Number<F>;

    fn load_private(
        &self,
        mut layouter: impl Layouter<F>,
        value: Value<F>,
    ) -> Result<Self::Num, Error> {
        let config = self.config();

        layouter.assign_region(
            || "load private",
            |mut region| {
                region
                    .assign_advice(|| "private input", config.advice[0], 0, || value)
                    .map(Number)
            },
        )
    }

    fn load_constant(
        &self,
        mut layouter: impl Layouter<F>,
        constant: F,
    ) -> Result<Self::Num, Error> {
        let config = self.config();

        layouter.assign_region(
            || "load constant",
            |mut region| {
                region
                    .assign_advice_from_constant(|| "constant value", config.advice[0], 0, constant)
                    .map(Number)
            },
        )
    }

    fn mul(
        &self,
        mut layouter: impl Layouter<F>,
        a: Self::Num,
        b: Self::Num,
    ) -> Result<Self::Num, Error> {
        let config = self.config();

        layouter.assign_region(
            || "mul",
            |mut region: Region<'_, F>| {
                // We only want to use a single multiplication gate in this region,
                // so we enable it at region offset 0; this means it will constrain
                // cells at offsets 0 and 1.
                config.s_mul.enable(&mut region, 0)?;

                // The inputs we've been given could be located anywhere in the circuit,
                // but we can only rely on relative offsets inside this region. So we
                // assign new cells inside the region and constrain them to have the
                // same values as the inputs.
                a.0.copy_advice(|| "lhs", &mut region, config.advice[0], 0)?;
                b.0.copy_advice(|| "rhs", &mut region, config.advice[1], 0)?;

                // Now we can assign the multiplication result, which is to be assigned
                // into the output position.
                let value = a.0.value().copied() * b.0.value();

                // Finally, we do the assignment to the output, returning a
                // variable to be used in another part of the circuit.
                region
                    .assign_advice(|| "lhs * rhs", config.advice[0], 1, || value)
                    .map(Number)
            },
        )
    }

    fn expose_public(
        &self,
        mut layouter: impl Layouter<F>,
        num: Self::Num,
        row: usize,
    ) -> Result<(), Error> {
        let config = self.config();

        layouter.constrain_instance(num.0.cell(), config.instance, row)
    }
}

Build the circuit

Now that we have the instructions we need, and a chip that implements them, we can finally build our circuit!

/// The full circuit implementation.
///
/// In this struct we store the private input variables. We use `Option<F>` because
/// they won't have any value during key generation. During proving, if any of these
/// were `None` we would get an error.
#[derive(Default)]
struct MyCircuit<F: Field> {
    constant: F,
    a: Value<F>,
    b: Value<F>,
}

impl<F: Field> Circuit<F> for MyCircuit<F> {
    // Since we are using a single chip for everything, we can just reuse its config.
    type Config = FieldConfig;
    type FloorPlanner = SimpleFloorPlanner;

    fn without_witnesses(&self) -> Self {
        Self::default()
    }

    fn configure(meta: &mut ConstraintSystem<F>) -> Self::Config {
        // We create the two advice columns that FieldChip uses for I/O.
        let advice = [meta.advice_column(), meta.advice_column()];

        // We also need an instance column to store public inputs.
        let instance = meta.instance_column();

        // Create a fixed column to load constants.
        let constant = meta.fixed_column();

        FieldChip::configure(meta, advice, instance, constant)
    }

    fn synthesize(
        &self,
        config: Self::Config,
        mut layouter: impl Layouter<F>,
    ) -> Result<(), Error> {
        let field_chip = FieldChip::<F>::construct(config);

        // Load our private values into the circuit.
        let a = field_chip.load_private(layouter.namespace(|| "load a"), self.a)?;
        let b = field_chip.load_private(layouter.namespace(|| "load b"), self.b)?;

        // Load the constant factor into the circuit.
        let constant =
            field_chip.load_constant(layouter.namespace(|| "load constant"), self.constant)?;

        // We only have access to plain multiplication.
        // We could implement our circuit as:
        //     asq  = a*a
        //     bsq  = b*b
        //     absq = asq*bsq
        //     c    = constant*asq*bsq
        //
        // but it's more efficient to implement it as:
        //     ab   = a*b
        //     absq = ab^2
        //     c    = constant*absq
        let ab = field_chip.mul(layouter.namespace(|| "a * b"), a, b)?;
        let absq = field_chip.mul(layouter.namespace(|| "ab * ab"), ab.clone(), ab)?;
        let c = field_chip.mul(layouter.namespace(|| "constant * absq"), constant, absq)?;

        // Expose the result as a public input to the circuit.
        field_chip.expose_public(layouter.namespace(|| "expose c"), c, 0)
    }
}

Testing the circuit

halo2_proofs::dev::MockProver can be used to test that the circuit is working correctly. The private and public inputs to the circuit are constructed as we will do to create a proof, but by passing them to MockProver::run we get an object that can test every constraint in the circuit, and tell us exactly what is failing (if anything).

    // The number of rows in our circuit cannot exceed 2^k. Since our example
    // circuit is very small, we can pick a very small value here.
    let k = 4;

    // Prepare the private and public inputs to the circuit!
    let constant = Fp::from(7);
    let a = Fp::from(2);
    let b = Fp::from(3);
    let c = constant * a.square() * b.square();

    // Instantiate the circuit with the private inputs.
    let circuit = MyCircuit {
        constant,
        a: Value::known(a),
        b: Value::known(b),
    };

    // Arrange the public input. We expose the multiplication result in row 0
    // of the instance column, so we position it there in our public inputs.
    let mut public_inputs = vec![c];

    // Given the correct public input, our circuit will verify.
    let prover = MockProver::run(k, &circuit, vec![public_inputs.clone()]).unwrap();
    assert_eq!(prover.verify(), Ok(()));

    // If we try some other public input, the proof will fail!
    public_inputs[0] += Fp::one();
    let prover = MockProver::run(k, &circuit, vec![public_inputs]).unwrap();
    assert!(prover.verify().is_err());

Full example

You can find the source code for this example here.

Lookup tables

In normal programs, you can trade memory for CPU to improve performance, by pre-computing and storing lookup tables for some part of the computation. We can do the same thing in halo2 circuits!

A lookup table can be thought of as enforcing a relation between variables, where the relation is expressed as a table. Assuming we have only one lookup argument in our constraint system, the total size of tables is constrained by the size of the circuit: each table entry costs one row, and it also costs one row to do each lookup.

TODO

Gadgets

Tips and tricks

This section contains various ideas and snippets that you might find useful while writing halo2 circuits.

Small range constraints

A common constraint used in R1CS circuits is the boolean constraint: $b * (1 - b) = 0$ . This constraint can only be satisfied by $b = 0$ or $b = 1$ .

In halo2 circuits, you can similarly constrain a cell to have one of a small set of values. For example, to constrain $a$ to the range $[0..5]$ , you would create a gate of the form:

$a \cdot (1 - a) \cdot (2 - a) \cdot (3 - a) \cdot (4 - a) = 0$

while to constrain $c$ to be either 7 or 13, you would use:

$(7 - c) \cdot (13 - c) = 0$

The underlying principle here is that we create a polynomial constraint with roots at each value in the set of possible values we want to allow. In R1CS circuits, the maximum supported polynomial degree is 2 (due to all constraints being of the form $a * b = c$ ). In halo2 circuits, you can use arbitrary-degree polynomials - with the proviso that higher-degree constraints are more expensive to use.

Note that the roots don't have to be constants; for example $(a - x) \cdot (a - y) \cdot (a - z) = 0$ will constrain $a$ to be equal to one of ${x, y, z}$ where the latter can be arbitrary polynomials, as long as the whole expression stays within the maximum degree bound.

Small set interpolation

We can use Lagrange interpolation to create a polynomial constraint that maps $f (X) = Y$ for small sets of $X \in {x_{i}}, Y \in {y_{i}}$ .

For instance, say we want to map a 2-bit value to a "spread" version interleaved with zeros. We first precompute the evaluations at each point:

$00 \to 0000 01 \to 0001 10 \to 0100 11 \to 0101 ⟹ ⟹ ⟹ ⟹ 0 \to 0 1 \to 1 2 \to 4 3 \to 5$

Then, we construct the Lagrange basis polynomial for each point using the identity: $l_{j} (X) = 0 \leq m < k, m \neq = j \prod \frac{x - x _{m}}{x _{j} - x _{m}},$ where $k$ is the number of data points. ( $k = 4$ in our example above.)

Recall that the Lagrange basis polynomial $l_{j} (X)$ evaluates to $1$ at $X = x_{j}$ and $0$ at all other $x_{i}, j \neq = i .$

Continuing our example, we get four Lagrange basis polynomials:

$l_{0} (X) l_{1} (X) l_{2} (X) l_{3} (X) = = = = \frac{( X - 3 ) ( X - 2 ) ( X - 1 )}{( - 3 ) ( - 2 ) ( - 1 )} \frac{( X - 3 ) ( X - 2 ) ( X )}{( - 2 ) ( - 1 ) ( 1 )} \frac{( X - 3 ) ( X - 1 ) ( X )}{( - 1 ) ( 1 ) ( 2 )} \frac{( X - 2 ) ( X - 1 ) ( X )}{( 1 ) ( 2 ) ( 3 )}$

Our polynomial constraint is then

$⟹ f (0) \cdot l_{0} (X) 0 \cdot l_{0} (X) + + f (1) \cdot l_{1} (X) 1 \cdot l_{1} (X) + + f (2) \cdot l_{2} (X) 4 \cdot l_{2} (X) + + f (3) \cdot l_{3} (X) 5 \cdot l_{3} (X) - - f (X) f (X) = = 0 0.$

Using halo2 in WASM

Since halo2 is written in Rust, you can compile it to WebAssembly (wasm), which will allow you to use the prover and verifier for your circuits in browser applications. This tutorial takes you through all you need to know to compile your circuits to wasm.

Throughout this tutorial, we will follow the repository for Zordle for reference, one of the first known webapps based on Halo 2 circuits. Zordle is ZK Wordle, where the circuit takes as advice values the player's input words and the player's share grid (the grey, yellow and green squares) and verifies that they match correctly. Therefore, the proof verifies that the player knows a "preimage" to the output share sheet, which can then be verified using just the ZK proof.

Circuit code setup

The first step is to create functions in Rust that will interface with the browser application. In the case of a prover, this will typically input some version of the advice and instance data, use it to generate a complete witness, and then output a proof. In the case of a verifier, this will typically input a proof and some version of the instance, and then output a boolean indicating whether the proof verified correctly or not.

In the case of Zordle, this code is contained in wasm.rs, and consists of two primary functions:

Prover

#[wasm_bindgen]
pub async fn prove_play(final_word: String, words_js: JsValue, params_ser: JsValue) -> JsValue {
  // Steps:
  // - Deserialise function parameters
  // - Generate the instance and advice columns using the words
  // - Instantiate the circuit and generate the witness
  // - Generate the proving key from the params
  // - Create a proof
}

While the specific inputs and their serialisations will depend on your circuit and webapp set up, it's useful to note the format in the specific case of Zordle since your use case will likely be similar:

This function takes as input the final_word that the user aimed for, and the words they attempted to use (in the form of words_js). It also takes as input the parameters for the circuit, which are serialized in params_ser. We will expand on this in the Params section below.

Note that the function parameters are passed in wasm_bindgen-compatible formats: String and JsValue. The JsValue type is a type from the Serde library. You can find much more details about this type and how to use it in the documentation here.

The output is a Vec<u8> converted to a JSValue using Serde. This is later passed in as input to the the verifier function.

Verifier

#[wasm_bindgen]
pub fn verify_play(final_word: String, proof_js: JsValue, diffs_u64_js: JsValue, params_ser: JsValue) -> bool {
  // Steps:
  // - Deserialise function parameters
  // - Generate the instance columns using the diffs representation of the columns
  // - Generate the verifying key using the params
  // - Verify the proof
}

Similar to the prover, we take in input and output a boolean true/false indicating the correctness of the proof. The diffs_u64_js object is a 2D JS array consisting of values for each cell that indicate the color: grey, yellow or green. These are used to assemble the instance columns for the circuit.

Params

Additionally, both the prover and verifier functions input params_ser, a serialised form of the public parameters of the polynomial commitment scheme. These are passed in as input (instead of being regenerated in prove/verify functions) as a performance optimisation since these are constant based only on the circuit's value of K. We can store these separately on a static web server and pass them in as input to the WASM. To generate the binary serialised form of these (separately outside the WASM functions), you can run something like:

fn write_params(K: u32) {
    let mut params_file = File::create("params.bin").unwrap();
    let params: Params<EqAffine> = Params::new(K);
    params.write(&mut params_file).unwrap();
}

Later, we can read the params.bin file from the web-server in Javascript in a byte-serialised format as a Uint8Array and pass it to the WASM as params_ser, which can be deserialised in Rust using the js_sys library.

Ideally, in future, instead of serialising the parameters we would be able to serialise and work directly with the proving key and the verifying key of the circuit, but that is currently not supported by the library, and tracked as issue #449 and #443.

Rust and WASM environment setup

Typically, Rust code is compiled to WASM using the wasm-pack tool and is as simple as changing some build commands. In the case of halo2 prover/verifier functions however, we need to make some additional changes to the build process. In particular, there are two main changes:

Parallelism: halo2 uses the rayon library for parallelism, which is not directly supported by WASM. However, the Chrome team has an adapter to enable rayon-like parallelism using Web Workers in browser: wasm-bindgen-rayon. We'll use this to enable parallelism in our WASM prover/verifier.
WASM max memory: The default memory limit for WASM with wasm-bindgen is set to 2GB, which is not enough to run the halo2 prover for large circuits (with K > 10 or so). We need to increase this limit to the maximum allowed by WASM (4GB!) to support larger circuits (up to K = 15 or so).

Firstly, add all the dependencies particular to your WASM interfacing functions to your Cargo.toml file. You can restrict the dependencies to the WASM compilation by using the WASM target feature flag. In the case of Zordle, this looks like:

[target.'cfg(target_family = "wasm")'.dependencies]
getrandom = { version = "0.2", features = ["js"]}
wasm-bindgen = { version = "0.2.81", features = ["serde-serialize"]}
console_error_panic_hook = "0.1.7"
rayon = "1.5"
wasm-bindgen-rayon = { version = "1.0"}
web-sys = { version = "0.3", features = ["Request", "Window", "Response"] }
wasm-bindgen-futures = "0.4"
js-sys = "0.3"

Next, let's integrate wasm-bindgen-rayon into our code. The README for the library has a great overview of how to do so. In particular, note the changes to the Rust compilation pipeline. You need to switch to a nightly version of Rust and enable support for WASM atomics. Additionally, remember to export the init_thread_pool in Rust code.

Next, we will bump up the default 2GB max memory limit for wasm-pack. To do so, add "-C", "link-arg=--max-memory=4294967296" Rust flag to the wasm target in the .cargo/config file. With the setup for wasm-bindgen-rayon and the memory bump, the .cargo/config file should now look like:

[target.wasm32-unknown-unknown]
rustflags = ["-C", "target-feature=+atomics,+bulk-memory,+mutable-globals", "-C", "link-arg=--max-memory=4294967296"]
...

Shoutout to @mattgibb who documented this esoteric change for increasing maximum memory in a random GitHub issue here.¹

Off-topic but it was quite surprising for me to learn that WASM has a hard maximum limitation of 4GB memory. This is because WASM currently has a 32-bit architecture, which was quite surprising to me for such a new, forward-facing assembly language. There are, however, some open proposals to move WASM to a larger address space.

Now that we have the Rust set up, you should be able to build a WASM package simply using wasm-pack build --target web --out-dir pkg and use the output WASM package in your webapp.

Webapp setup

Zordle ships with a minimal React test client as an example (that simply adds WASM support to the default create-react-app template). You can find the code for the test client here. I would recommend forking the test client for your own application and working from there.

The test client includes a clean WebWorker that interfaces with the Rust WASM package. Putting the interface in a WebWorker prevents blocking the main thread of the browser and allows for a clean interface from React/application logic. Checkout halo-worker.ts for the WebWorker code and see how you can interface with the web worker from React in App.tsx.

If you've done everything right so far, you should now be able to generate proofs and verify them in browser! In the case of Zordle, proof generation for a circuit with K = 14 takes about a minute or so on my laptop. During proof generation, if you pop open the Chrome/Firefox task manager, you should additionally see something like this:

Example halo2 proof generation in-browser

Zordle and its test-client set the parallelism to the number of cores available on the machine by default. If you would like to reduce this, you can do so by changing the argument to initThreadPool.

If you'd prefer to use your own Worker/React setup, the code to fetch and serialise parameters, proofs and other instance and advice values may still be useful to look at!

Safari

Note that wasm-bindgen-rayon library is not supported by Safari because it spawns Web Workers from inside another Web Worker. According to the relevant Webkit issue, support for this feature had made it into Safari Technology Preview by November 2022, and indeed the Release Notes for Safari Technology Preview Release 155 claim support, so it is worth checking whether this has made it into Safari if that is important to you.

Debugging

Often, you'll run into issues with your Rust code and see that the WASM execution errors with Uncaught (in promise) RuntimeError: unreachable, a wholly unhelpful error for debugging. This is because the code is compiled in release mode which strips out error messages as a performance optimisation. To debug, you can build the WASM package in debug mode using the flag --dev with wasm-pack build. This will build in debug mode, slowing down execution significantly but allowing you to see any runtime error messages in the browser console. Additionally, you can install the console_error_panic_hook crate (as is done by Zordle) to also get helpful debug messages for runtime panics.

Credits

This guide was written by Nalin. Thanks additionally to Uma and Blaine for significant work on figuring out these steps. Feel free to reach out to me if you have trouble with any of these steps.

Developer Documentation

You want to contribute to the Halo 2 crates? Awesome!

This section covers information about our development processes and review standards, and useful tips for maintaining and extending the codebase.

Feature development

Sometimes feature development can require iterating on a design over time. It can be useful to start using features in downstream crates early on to gain experience with the APIs and functionality, that can feed back into the feature's design prior to it being stabilised. To enable this, we follow a three-stage nightly -> beta -> stable development pattern inspired by (but not identical to) the Rust compiler.

Feature flags

Each unstabilised feature has a default-off feature flag that enables it, of the form unstable-*. The stable API of the crates must not be affected when the feature flag is disabled, except for specific complex features that will be considered on a case-by-case basis.

Two meta-flags are provided to enable all features at a particular stabilisation level:

beta enables all features at the "beta" stage (and implicitly all features at the "stable" stage).
nightly enables all features at the "beta" and "nightly" stages (and implicitly all features at the "stable" stage), i.e. all features are enabled.
When neither flag is enabled (and no feature-specific flags are enabled), then in effect only features at the "stable" stage are enabled.

Feature workflow

If the maintainers have rough consensus that an experimental feature is generally desired, its initial implementation can be merged into the codebase optimistically behind a feature-specific feature flag with a lower standard of review. The feature's flag is added to the nightly feature flag set.
- The feature will become usable by downstream published crates in the next general release of the halo2 crates.
- Subsequent development and refinement of the feature can be performed in-situ via additional PRs, along with additional review.
- If the feature ends up having bad interactions with other features (in particular, already-stabilised features), then it can be removed later without affecting the stable or beta APIs.
Once the feature has had sufficient review, and is at the point where a halo2 user considers it production-ready (and is willing or planning to deploy it to production), the feature's feature flag is moved to the beta feature flag set.
Once the feature has had review equivalent to the stable review policy, and there is rough consensus that the feature is useful to the wider halo2 userbase, the feature's feature flag is removed and the feature becomes part of the main maintained codebase.

For more complex features, the above workflow might be augmented with beta and nightly branches; this will be figured out once a feature requiring this is proposed as a candidate for inclusion.

In-progress features

Feature flag	Stage	Notes
`unstable-sha256-gadget`	`nightly`	The SHA-256 gadget and chip.

Design

Note on Language

We use slightly different language than others to describe PLONK concepts. Here's the overview:

We like to think of PLONK-like arguments as tables, where each column corresponds to a "wire". We refer to entries in this table as "cells".
We like to call "selector polynomials" and so on "fixed columns" instead. We then refer specifically to a "selector constraint" when a cell in a fixed column is being used to control whether a particular constraint is enabled in that row.
We call the other polynomials "advice columns" usually, when they're populated by the prover.
We use the term "rule" to refer to a "gate" like $A (X) \cdot q_{A} (X) + B (X) \cdot q_{B} (X) + A (X) \cdot B (X) \cdot q_{M} (X) + C (X) \cdot q_{C} (X) = 0.$
- TODO: Check how consistent we are with this, and update the code and docs to match.

Proving system

The Halo 2 proving system can be broken down into five stages:

Commit to polynomials encoding the main components of the circuit:
- Cell assignments.
- Permuted values and products for each lookup argument.
- Equality constraint permutations.
Construct the vanishing argument to constrain all circuit relations to zero:
- Standard and custom gates.
- Lookup argument rules.
- Equality constraint permutation rules.
Evaluate the above polynomials at all necessary points:
- All relative rotations used by custom gates across all columns.
- Vanishing argument pieces.
Construct the multipoint opening argument to check that all evaluations are consistent with their respective commitments.
Run the inner product argument to create a polynomial commitment opening proof for the multipoint opening argument polynomial.

These stages are presented in turn across this section of the book.

Example

To aid our explanations, we will at times refer to the following example constraint system:

Four advice columns $a, b, c, d$ .
One fixed column $f$ .
Three custom gates:
- $a \cdot b \cdot c_{- 1} - d = 0$
- $f_{- 1} \cdot c = 0$
- $f \cdot d \cdot a = 0$

tl;dr

The table below provides a (probably too) succinct description of the Halo 2 protocol. This description will likely be replaced by the Halo 2 paper and security proof, but for now serves as a summary of the following sub-sections.

Prover		Verifier
	$\leftarrow$	$t (X) = (X^{n} - 1)$
	$\leftarrow$	$F = [F_{0}, F_{1}, \dots, F_{m - 1}]$
$A = [A_{0}, A_{1}, \dots, A_{m - 1}]$	$\to$
	$\leftarrow$	$θ$
$L = [(A_{0}^{'}, S_{0}^{'}), \dots, (A_{m - 1}^{'}, S_{m - 1}^{'})]$	$\to$
	$\leftarrow$	$β, γ$
$Z_{P} = [Z_{P, 0}, Z_{P, 1}, \dots]$	$\to$
$Z_{L} = [Z_{L, 0}, Z_{L, 1}, \dots]$	$\to$
	$\leftarrow$	$y$
$h (X) = \frac{gate _{0} ( X ) + \dots + y ^{i} \cdot gate _{i} ( X )}{t ( X )}$
$h (X) = h_{0} (X) + \dots + X^{n (d - 1)} h_{d - 1} (X)$
$H = [H_{0}, H_{1}, \dots, H_{d - 1}]$	$\to$
	$\leftarrow$	$x$
$e v a l s = [A_{0} (x), \dots, H_{d - 1} (x)]$	$\to$
		Checks $h (x)$
	$\leftarrow$	$x_{1}, x_{2}$
Constructs $h^{'} (X)$ multipoint opening poly
$U = Commit (h^{'} (X))$	$\to$
	$\leftarrow$	$x_{3}$
$q_{evals} = [Q_{0} (x_{3}), Q_{1} (x_{3}), \dots]$	$\to$
$u_{eval} = U (x_{3})$	$\to$
	$\leftarrow$	$x_{4}$

Then the prover and verifier:

Construct $finalPoly (X)$ as a linear combination of $Q$ and $U$ using powers of $x_{4}$ ;
Construct $finalPolyEval$ as the equivalent linear combination of $q_{evals}$ and $u_{eval}$ ; and
Perform $InnerProduct (finalPoly (X), x_{3}, finalPolyEval) .$

TODO: Write up protocol components that provide zero-knowledge.

Lookup argument

Halo 2 uses the following lookup technique, which allows for lookups in arbitrary sets, and is arguably simpler than Plookup.

Note on Language

In addition to the general notes on language:

We call the $Z (X)$ polynomial (the grand product argument polynomial for the permutation argument) the "permutation product" column.

Technique Description

For ease of explanation, we'll first describe a simplified version of the argument that ignores zero knowledge.

We express lookups in terms of a "subset argument" over a table with $2^{k}$ rows (numbered from 0), and columns $A$ and $S .$

The goal of the subset argument is to enforce that every cell in $A$ is equal to some cell in $S .$ This means that more than one cell in $A$ can be equal to the same cell in $S,$ and some cells in $S$ don't need to be equal to any of the cells in $A .$

$S$ might be fixed, but it doesn't need to be. That is, we can support looking up values in either fixed or variable tables (where the latter includes advice columns).
$A$ and $S$ can contain duplicates. If the sets represented by $A$ and/or $S$ are not naturally of size $2^{k},$ we extend $S$ with duplicates and $A$ with dummy values known to be in $S .$
- Alternatively we could add a "lookup selector" that controls which elements of the $A$ column participate in lookups. This would modify the occurrence of $A (X)$ in the permutation rule below to replace $A$ with, say, $S_{0}$ if a lookup is not selected.

Let $ℓ_{i}$ be the Lagrange basis polynomial that evaluates to $1$ at row $i,$ and $0$ otherwise.

We start by allowing the prover to supply permutation columns of $A$ and $S .$ Let's call these $A^{'}$ and $S^{'},$ respectively. We can enforce that they are permutations using a permutation argument with product column $Z$ with the rules:

$Z (ω X) \cdot (A^{'} (X) + β) \cdot (S^{'} (X) + γ) - Z (X) \cdot (A (X) + β) \cdot (S (X) + γ) = 0$ $ℓ_{0} (X) \cdot (1 - Z (X)) = 0$

i.e. provided that division by zero does not occur, we have for all $i \in [0, 2^{k})$ :

$Z_{i + 1} = Z_{i} \cdot \frac{( A _{i} + β ) \cdot ( S _{i} + γ )}{( A _{i}^{'} + β ) \cdot ( S _{i}^{'} + γ )}$ $Z_{2^{k}} = Z_{0} = 1.$

This is a version of the permutation argument which allows $A^{'}$ and $S^{'}$ to be permutations of $A$ and $S,$ respectively, but doesn't specify the exact permutations. $β$ and $γ$ are separate challenges so that we can combine these two permutation arguments into one without worrying that they might interfere with each other.

The goal of these permutations is to allow $A^{'}$ and $S^{'}$ to be arranged by the prover in a particular way:

All the cells of column $A^{'}$ are arranged so that like-valued cells are vertically adjacent to each other. This could be done by some kind of sorting algorithm, but all that matters is that like-valued cells are on consecutive rows in column $A^{'},$ and that $A^{'}$ is a permutation of $A .$
The first row in a sequence of like values in $A^{'}$ is the row that has the corresponding value in $S^{'} .$ Apart from this constraint, $S^{'}$ is any arbitrary permutation of $S .$

Now, we'll enforce that either $A_{i}^{'} = S_{i}^{'}$ or that $A_{i}^{'} = A_{i - 1}^{'},$ using the rule

$(A^{'} (X) - S^{'} (X)) \cdot (A^{'} (X) - A^{'} (ω^{- 1} X)) = 0$

In addition, we enforce $A_{0}^{'} = S_{0}^{'}$ using the rule

$ℓ_{0} (X) \cdot (A^{'} (X) - S^{'} (X)) = 0$

(The $A^{'} (X) - A^{'} (ω^{- 1} X)$ term of the first rule here has no effect at row $0,$ even though $ω^{- 1} X$ "wraps", because of the second rule.)

Together these constraints effectively force every element in $A^{'}$ (and thus $A$ ) to equal at least one element in $S^{'}$ (and thus $S$ ). Proof: by induction on prefixes of the rows.

Zero-knowledge adjustment

In order to achieve zero knowledge for the PLONK-based proof system, we will need the last $t$ rows of each column to be filled with random values. This requires an adjustment to the lookup argument, because these random values would not satisfy the constraints described above.

We limit the number of usable rows to $u = 2^{k} - t - 1.$ We add two selectors:

$q_{blind}$ is set to $1$ on the last $t$ rows, and $0$ elsewhere;
$q_{last}$ is set to $1$ only on row $u,$ and $0$ elsewhere (i.e. it is set on the row in between the usable rows and the blinding rows).

We enable the constraints from above only for the usable rows:

$(1 - (q_{last} (X) + q_{blind} (X))) \cdot (Z (ω X) \cdot (A^{'} (X) + β) \cdot (S^{'} (X) + γ) - Z (X) \cdot (A (X) + β) \cdot (S (X) + γ)) = 0$ $(1 - (q_{last} (X) + q_{blind} (X))) \cdot (A^{'} (X) - S^{'} (X)) \cdot (A^{'} (X) - A^{'} (ω^{- 1} X)) = 0$

The rules that are enabled on row $0$ remain the same:

$ℓ_{0} (X) \cdot (A^{'} (X) - S^{'} (X)) = 0$ $ℓ_{0} (X) \cdot (1 - Z (X)) = 0$

Since we can no longer rely on the wraparound to ensure that the product $Z$ becomes $1$ again at $ω^{2^{k}},$ we would instead need to constrain $Z (ω^{u})$ to $1.$ However, there is a potential difficulty: if any of the values $A_{i} + β$ or $S_{i} + γ$ are zero for $i \in [0, u),$ then it might not be possible to satisfy the permutation argument. This occurs with negligible probability over choices of $β$ and $γ,$ but is an obstacle to achieving perfect zero knowledge (because an adversary can rule out witnesses that would cause this situation), as well as perfect completeness.

To ensure both perfect completeness and perfect zero knowledge, we allow $Z (ω^{u})$ to be either zero or one:

$q_{last} (X) \cdot (Z (X)^{2} - Z (X)) = 0$

Now if $A_{i} + β$ or $S_{i} + γ$ are zero for some $i,$ we can set $Z_{j} = 0$ for $i < j \leq u,$ satisfying the constraint system.

Note that the challenges $β$ and $γ$ are chosen after committing to $A$ and $S$ (and to $A^{'}$ and $S^{'}$ ), so the prover cannot force the case where some $A_{i} + β$ or $S_{i} + γ$ is zero to occur. Since this case occurs with negligible probability, soundness is not affected.

Cost

There is the original column $A$ and the fixed column $S .$
There is a permutation product column $Z .$
There are the two permutations $A^{'}$ and $S^{'} .$
The gates are all of low degree.

Generalizations

Halo 2's lookup argument implementation generalizes the above technique in the following ways:

$A$ and $S$ can be extended to multiple columns, combined using a random challenge. $A^{'}$ and $S^{'}$ stay as single columns.
- The commitments to the columns of $S$ can be precomputed, then combined cheaply once the challenge is known by taking advantage of the homomorphic property of Pedersen commitments.
- The columns of $A$ can be given as arbitrary polynomial expressions using relative references. These will be substituted into the product column constraint, subject to the maximum degree bound. This potentially saves one or more advice columns.
Then, a lookup argument for an arbitrary-width relation can be implemented in terms of a subset argument, i.e. to constrain $R (x, y, ...)$ in each row, consider $R$ as a set of tuples $S$ (using the method of the previous point), and check that $(x, y, ...) \in R .$
- In the case where $R$ represents a function, this implicitly also checks that the inputs are in the domain. This is typically what we want, and often saves an additional range check.
We can support multiple tables in the same circuit, by combining them into a single table that includes a tag column to identify the original table.
- The tag column could be merged with the "lookup selector" mentioned earlier, if this were implemented.

These generalizations are similar to those in sections 4 and 5 of the Plookup paper. That is, the differences from Plookup are in the subset argument. This argument can then be used in all the same ways; for instance, the optimized range check technique in section 5 of the Plookup paper can also be used with this subset argument.

Permutation argument

Given that gates in halo2 circuits operate "locally" (on cells in the current row or defined relative rows), it is common to need to copy a value from some arbitrary cell into the current row for use in a gate. This is performed with an equality constraint, which enforces that the source and destination cells contain the same value.

We implement these equality constraints by constructing a permutation that represents the constraints, and then using a permutation argument within the proof to enforce them.

Notation

A permutation is a one-to-one and onto mapping of a set onto itself. A permutation can be factored uniquely into a composition of cycles (up to ordering of cycles, and rotation of each cycle).

We sometimes use cycle notation to write permutations. Let $(a b c)$ denote a cycle where $a$ maps to $b,$ $b$ maps to $c,$ and $c$ maps to $a$ (with the obvious generalization to arbitrary-sized cycles). Writing two or more cycles next to each other denotes a composition of the corresponding permutations. For example, $(a b) (c d)$ denotes the permutation that maps $a$ to $b,$ $b$ to $a,$ $c$ to $d,$ and $d$ to $c .$

Constructing the permutation

Goal

We want to construct a permutation in which each subset of variables that are in a equality-constraint set form a cycle. For example, suppose that we have a circuit that defines the following equality constraints:

$a \equiv b$
$a \equiv c$
$d \equiv e$

From this we have the equality-constraint sets ${a, b, c}$ and ${d, e} .$ We want to construct the permutation:

$(a b c) (d e)$

which defines the mapping of $[a, b, c, d, e]$ to $[b, c, a, e, d] .$

Algorithm

We need to keep track of the set of cycles, which is a set of disjoint sets. Efficient data structures for this problem are known; for the sake of simplicity we choose one that is not asymptotically optimal but is easy to implement.

We represent the current state as:

an array $mapping$ for the permutation itself;
an auxiliary array $aux$ that keeps track of a distinguished element of each cycle;
another array $sizes$ that keeps track of the size of each cycle.

We have the invariant that for each element $x$ in a given cycle $C,$ $aux (x)$ points to the same element $c \in C .$ This allows us to quickly decide whether two given elements $x$ and $y$ are in the same cycle, by checking whether $aux (x) = aux (y) .$ Also, $sizes (aux (x))$ gives the size of the cycle containing $x .$ (This is guaranteed only for $sizes (aux (x)),$ not for $sizes (x) .$ )

The algorithm starts with a representation of the identity permutation: for all $x,$ we set $mapping (x) = x,$ $aux (x) = x,$ and $sizes (x) = 1.$

To add an equality constraint $left \equiv right$ :

Check whether $left$ and $right$ are already in the same cycle, i.e. whether $aux (left) = aux (right) .$ If so, there is nothing to do.
Otherwise, $left$ and $right$ belong to different cycles. Make $left$ the larger cycle and $right$ the smaller one, by swapping them iff $sizes (aux (left)) < sizes (aux (right)) .$
Set $sizes (aux (left)) := sizes (aux (left)) + sizes (aux (right)) .$
Following the mapping around the right (smaller) cycle, for each element $x$ set $aux (x) := aux (left) .$
Splice the smaller cycle into the larger one by swapping $mapping (left)$ with $mapping (right) .$

For example, given two disjoint cycles $(A B C D)$ and $(E F G H)$ :

A +---> B
^       +
|       |
+       v
D <---+ C       E +---> F
                ^       +
                |       |
                +       v
                H <---+ G

After adding constraint $B \equiv E$ the above algorithm produces the cycle:

A +---> B +-------------+
^                       |
|                       |
+                       v
D <---+ C <---+ E       F
                ^       +
                |       |
                +       v
                H <---+ G

Broken alternatives

If we did not check whether $left$ and $right$ were already in the same cycle, then we could end up undoing an equality constraint. For example, if we have the following constraints:

$a \equiv b$
$b \equiv c$
$c \equiv d$
$b \equiv d$

and we tried to implement adding an equality constraint just using step 5 of the above algorithm, then we would end up constructing the cycle $(a b) (c d),$ rather than the correct $(a b c d) .$

Argument specification

We need to check a permutation of cells in $m$ columns, represented in Lagrange basis by polynomials $v_{0}, \dots, v_{m - 1} .$

We will label each cell in those $m$ columns with a unique element of $F^{\times} .$

Suppose that we have a permutation on these labels, $σ (column : i, row : j) = (column : i^{'}, row : j^{'}) .$ in which the cycles correspond to equality-constraint sets.

If we consider the set of pairs ${(label, value)}$ , then the values within each cycle are equal if and only if permuting the label in each pair by $σ$ yields the same set:

Since the labels are distinct, set equality is the same as multiset equality, which we can check using a product argument.

Let $ω$ be a $2^{k}$ root of unity and let $δ$ be a $T$ root of unity, where $T \cdot 2^{S} + 1 = p$ with $T$ odd and $k \leq S .$ We will use $δ^{i} \cdot ω^{j} \in F^{\times}$ as the label for the cell in the $j$ th row of the $i$ th column of the permutation argument.

We represent $σ$ by a vector of $m$ polynomials $s_{i} (X)$ such that $s_{i} (ω^{j}) = δ^{i^{'}} \cdot ω^{j^{'}} .$

Notice that the identity permutation can be represented by the vector of $m$ polynomials $ID_{i} (ω^{j})$ such that $ID_{i} (ω^{j}) = δ^{i} \cdot ω^{j} .$

We will use a challenge $β$ to compress each $(label, value)$ pair to $value + β \cdot label .$ Just as in the product argument we used for lookups, we also use a challenge $γ$ to randomize each term of the product.

Now given our permutation represented by $s_{0}, \dots, s_{m - 1}$ over columns represented by $v_{0}, \dots, v_{m - 1},$ we want to ensure that: $i = 0 \prod m - 1 j = 0 \prod n - 1 (\frac{v _{i} ( ω ^{j} ) + β \cdot δ ^{i} \cdot ω ^{j} + γ}{v _{i} ( ω ^{j} ) + β \cdot s _{i} ( ω ^{j} ) + γ}) = 1$

Here $v_{i} (ω^{j}) + β \cdot δ^{i} \cdot ω^{j}$ represents the unpermuted $(label, v a l u e)$ pair, and $v_{i} (ω^{j}) + β \cdot s_{i} (ω^{j})$ represents the permuted $(σ (label), v a l u e)$ pair.

Let $Z_{P}$ be such that $Z_{P} (ω^{0}) = Z_{P} (ω^{n}) = 1$ and for $0 \leq j < n$ : $Z_{P} (ω^{j + 1}) = h = 0 \prod j i = 0 \prod m - 1 \frac{v _{i} ( ω ^{h} ) + β \cdot δ ^{i} \cdot ω ^{h} + γ}{v _{i} ( ω ^{h} ) + β \cdot s _{i} ( ω ^{h} ) + γ} = Z_{P} (ω^{j}) i = 0 \prod m - 1 \frac{v _{i} ( ω ^{j} ) + β \cdot δ ^{i} \cdot ω ^{j} + γ}{v _{i} ( ω ^{j} ) + β \cdot s _{i} ( ω ^{j} ) + γ}$

Then it is sufficient to enforce the rules: $Z_{P} (ω X) \cdot i = 0 \prod m - 1 (v_{i} (X) + β \cdot s_{i} (X) + γ) - Z_{P} (X) \cdot i = 0 \prod m - 1 (v_{i} (X) + β \cdot δ^{i} \cdot X + γ) = 0 ℓ_{0} \cdot (1 - Z_{P} (X)) = 0$

This assumes that the number of columns $m$ is such that the polynomial in the first rule above fits within the degree bound of the PLONK configuration. We will see below how to handle a larger number of columns.

The optimization used to obtain the simple representation of the identity permutation was suggested by Vitalik Buterin for PLONK, and is described at the end of section 8 of the PLONK paper. Note that the $δ^{i}$ are all distinct quadratic non-residues, provided that the number of columns that are enabled for equality is no more than $T$ , which always holds in practice for the curves used in Halo 2.

Zero-knowledge adjustment

Similarly to the lookup argument, we need an adjustment to the above argument to account for the last $t$ rows of each column being filled with random values.

We limit the number of usable rows to $u = 2^{k} - t - 1.$ We add two selectors, defined in the same way as for the lookup argument:

$q_{blind}$ is set to $1$ on the last $t$ rows, and $0$ elsewhere;
$q_{last}$ is set to $1$ only on row $u,$ and $0$ elsewhere (i.e. it is set on the row in between the usable rows and the blinding rows).

We enable the product rule from above only for the usable rows:

$(1 - (q_{last} (X) + q_{blind} (X))) \cdot$ $(Z_{P} (ω X) \cdot i = 0 \prod m - 1 (v_{i} (X) + β \cdot s_{i} (X) + γ) - Z_{P} (X) \cdot i = 0 \prod m - 1 (v_{i} (X) + β \cdot δ^{i} \cdot X + γ)) = 0$

The rule that is enabled on row $0$ remains the same:

$ℓ_{0} (X) \cdot (1 - Z_{P} (X)) = 0$

Since we can no longer rely on the wraparound to ensure that each product $Z_{P}$ becomes $1$ again at $ω^{2^{k}},$ we would instead need to constrain $Z (ω^{u}) = 1.$ This raises the same problem that was described for the lookup argument. So we allow $Z (ω^{u})$ to be either zero or one:

$q_{last} (X) \cdot (Z_{P} (X)^{2} - Z_{P} (X)) = 0$

which gives perfect completeness and zero knowledge.

Spanning a large number of columns

The halo2 implementation does not in practice limit the number of columns for which equality constraints can be enabled. Therefore, it must solve the problem that the above approach might yield a product rule with a polynomial that exceeds the PLONK configuration's degree bound. The degree bound could be raised, but this would be inefficient if no other rules require a larger degree.

Instead, we split the product across $b$ sets of $m$ columns, using product columns $Z_{P, 0}, \dots Z_{P, b - 1},$ and we use another rule to copy the product from the end of one column set to the beginning of the next.

That is, for $0 \leq a < b$ we have:

$(1 - (q_{last} (X) + q_{blind} (X))) \cdot$ $(Z_{P, a} (ω X) \cdot i = am \prod (a + 1) m - 1 (v_{i} (X) + β \cdot s_{i} (X) + γ) - Z_{P} (X) \cdot i = am \prod (a + 1) m - 1 (v_{i} (X) + β \cdot δ^{i} \cdot X + γ))$ $= 0$

For simplicity this is written assuming that the number of columns enabled for equality constraints is a multiple of $m$ ; if not then the products for the last column set will have fewer than $m$ terms.

For the first column set we have:

$ℓ_{0} \cdot (1 - Z_{P, 0} (X)) = 0$

For each subsequent column set, $0 < a < b,$ we use the following rule to copy $Z_{P, a - 1} (ω^{u})$ to the start of the next column set, $Z_{P, a} (ω^{0})$ :

$ℓ_{0} \cdot (Z_{P, a} (X) - Z_{P, a - 1} (ω^{u} X)) = 0$

For the last column set, we allow $Z_{P, b - 1} (ω^{u})$ to be either zero or one:

$q_{last} (X) \cdot (Z_{P, b - 1} (X)^{2} - Z_{P, b - 1} (X)) = 0$

which gives perfect completeness and zero knowledge as before.

Circuit commitments

Committing to the circuit assignments

At the start of proof creation, the prover has a table of cell assignments that it claims satisfy the constraint system. The table has $n = 2^{k}$ rows, and is broken into advice, instance, and fixed columns. We define $F_{i, j}$ as the assignment in the $j$ th row of the $i$ th fixed column. Without loss of generality, we'll similarly define $A_{i, j}$ to represent the advice and instance assignments.

We separate fixed columns here because they are provided by the verifier, whereas the advice and instance columns are provided by the prover. In practice, the commitments to instance and fixed columns are computed by both the prover and verifier, and only the advice commitments are stored in the proof.

To commit to these assignments, we construct Lagrange polynomials of degree $n - 1$ for each column, over an evaluation domain of size $n$ (where $ω$ is the $n$ th primitive root of unity):

$a_{i} (X)$ interpolates such that $a_{i} (ω^{j}) = A_{i, j}$ .
$f_{i} (X)$ interpolates such that $f_{i} (ω^{j}) = F_{i, j}$ .

We then create a blinding commitment to the polynomial for each column:

$A = [Commit (a_{0} (X)), \dots, Commit (a_{i} (X))]$ $F = [Commit (f_{0} (X)), \dots, Commit (f_{i} (X))]$

$F$ is constructed as part of key generation, using a blinding factor of $1$ . $A$ is constructed by the prover and sent to the verifier.

Committing to the lookup permutations

The verifier starts by sampling $θ$ , which is used to keep individual columns within lookups independent. Then, the prover commits to the permutations for each lookup as follows:

Given a lookup with input column polynomials $[A_{0} (X), \dots, A_{m - 1} (X)]$ and table column polynomials $[S_{0} (X), \dots, S_{m - 1} (X)]$ , the prover constructs two compressed polynomials

$A_{compressed} (X) = θ^{m - 1} A_{0} (X) + θ^{m - 2} A_{1} (X) + \dots + θ A_{m - 2} (X) + A_{m - 1} (X)$ $S_{compressed} (X) = θ^{m - 1} S_{0} (X) + θ^{m - 2} S_{1} (X) + \dots + θ S_{m - 2} (X) + S_{m - 1} (X)$
The prover then permutes $A_{compressed} (X)$ and $S_{compressed} (X)$ according to the rules of the lookup argument, obtaining $A^{'} (X)$ and $S^{'} (X)$ .

The prover creates blinding commitments for all of the lookups

$L = [(Commit (A^{'} (X))), Commit (S^{'} (X))), \dots]$

and sends them to the verifier.

After the verifier receives $A$ , $F$ , and $L$ , it samples challenges $β$ and $γ$ that will be used in the permutation argument and the remainder of the lookup argument below. (These challenges can be reused because the arguments are independent.)

Committing to the equality constraint permutation

Let $c$ be the number of columns that are enabled for equality constraints.

Let $m$ be the maximum number of columns that can be accommodated by a column set without exceeding the PLONK configuration's maximum constraint degree.

Let $u$ be the number of “usable” rows as defined in the Permutation argument section.

Let $b = ceiling (c / m) .$

The prover constructs a vector $P$ of length $b u$ such that for each column set $0 \leq a < b$ and each row $0 \leq j < u,$

$P_{a u + j} = i = am \prod m i n (c, (a + 1) m) - 1 \frac{v _{i} ( ω ^{j} ) + β \cdot δ ^{i} \cdot ω ^{j} + γ}{v _{i} ( ω ^{j} ) + β \cdot s _{i} ( ω ^{j} ) + γ} .$

The prover then computes a running product of $P$ , starting at $1$ , and a vector of polynomials $Z_{P, 0.. b - 1}$ that each have a Lagrange basis representation corresponding to a $u$ -sized slice of this running product, as described in the Permutation argument section.

The prover creates blinding commitments to each $Z_{P, a}$ polynomial:

$Z_{P} = [Commit (Z_{P, 0} (X)), \dots, Commit (Z_{P, b - 1} (X))]$

and sends them to the verifier.

Committing to the lookup permutation product columns

In addition to committing to the individual permuted lookups, for each lookup, the prover needs to commit to the permutation product column:

The prover constructs a vector $P$ :

$P_{j} = \frac{( A _{compressed} ( ω ^{j} ) + β ) ( S _{compressed} ( ω ^{j} ) + γ )}{( A ^{'} ( ω ^{j} ) + β ) ( S ^{'} ( ω ^{j} ) + γ )}$

The prover constructs a polynomial $Z_{L}$ which has a Lagrange basis representation corresponding to a running product of $P$ , starting at $Z_{L} (1) = 1$ .

$β$ and $γ$ are used to combine the permutation arguments for $A^{'} (X)$ and $S^{'} (X)$ while keeping them independent. The important thing here is that the verifier samples $β$ and $γ$ after the prover has created $A$ , $F$ , and $L$ (and thus committed to all the cell values used in lookup columns, as well as $A^{'} (X)$ and $S^{'} (X)$ for each lookup).

As before, the prover creates blinding commitments to each $Z_{L}$ polynomial:

$Z_{L} = [Commit (Z_{L} (X)), \dots]$

and sends them to the verifier.

Vanishing argument

Having committed to the circuit assignments, the prover now needs to demonstrate that the various circuit relations are satisfied:

The custom gates, represented by polynomials $gate_{i} (X)$ .
The rules of the lookup arguments.
The rules of the equality constraint permutations.

Each of these relations is represented as a polynomial of degree $d$ (the maximum degree of any of the relations) with respect to the circuit columns. Given that the degree of the assignment polynomials for each column is $n - 1$ , the relation polynomials have degree $d (n - 1)$ with respect to $X$ .

In our example, these would be the gate polynomials, of degree $3 n - 3$ :

$gate_{0} (X) = a_{0} (X) \cdot a_{1} (X) \cdot a_{2} (X ω^{- 1}) - a_{3} (X)$

$gate_{1} (X) = f_{0} (X ω^{- 1}) \cdot a_{2} (X)$

$gate_{2} (X) = f_{0} (X) \cdot a_{3} (X) \cdot a_{0} (X)$

A relation is satisfied if its polynomial is equal to zero. One way to demonstrate this is to divide each polynomial relation by the vanishing polynomial $t (X) = (X^{n} - 1)$ , which is the lowest-degree polynomial that has roots at every $ω^{i}$ . If relation's polynomial is perfectly divisible by $t (X)$ , it is equal to zero over the domain (as desired).

This simple construction would require a polynomial commitment per relation. Instead, we commit to all of the circuit relations simultaneously: the verifier samples $y$ , and then the prover constructs the quotient polynomial

$h (X) = \frac{gate _{0} ( X ) + y \cdot gate _{1} ( X ) + \dots + y ^{i} \cdot gate _{i} ( X ) + \dots}{t ( X )},$

where the numerator is a random (the prover commits to the cell assignments before the verifier samples $y$ ) linear combination of the circuit relations.

If the numerator polynomial (in formal indeterminate $X$ ) is perfectly divisible by $t (X)$ , then with high probability all relations are satisfied.
Conversely, if at least one relation is not satisfied, then with high probability $h (x) \cdot t (x)$ will not equal the evaluation of the numerator at $x$ . In this case, the numerator polynomial would not be perfectly divisible by $t (X)$ .

Committing to $h (X)$

$h (X)$ has degree $d (n - 1) - n$ (because the divisor $t (X)$ has degree $n$ ). However, the polynomial commitment scheme we use for Halo 2 only supports committing to polynomials of degree $n - 1$ (which is the maximum degree that the rest of the protocol needs to commit to). Instead of increasing the cost of the polynomial commitment scheme, the prover split $h (X)$ into pieces of degree $n - 1$

$h_{0} (X) + X^{n} h_{1} (X) + \dots + X^{n (d - 1)} h_{d - 1} (X),$

and produces blinding commitments to each piece

$H = [Commit (h_{0} (X)), Commit (h_{1} (X)), \dots, Commit (h_{d - 1} (X))] .$

Evaluating the polynomials

At this point, we have committed to all properties of the circuit. The verifier now wants to see if the prover committed to the correct $h (X)$ polynomial. The verifier samples $x$ , and the prover produces the purported evaluations of the various polynomials at $x$ , for all the relative offsets used in the circuit, as well as $h (X)$ .

In our example, this would be:

$a_{0} (x)$

$a_{1} (x)$

$a_{2} (x)$ , $a_{2} (x ω^{- 1})$

$a_{3} (x)$

$f_{0} (x)$ , $f_{0} (x ω^{- 1})$

$h_{0} (x)$ , ..., $h_{d - 1} (x)$

The verifier checks that these evaluations satisfy the form of $h (X)$ :

$\frac{gate _{0} ( x ) + \dots + y ^{i} \cdot gate _{i} ( x ) + \dots}{t ( x )} = h_{0} (x) + \dots + x^{n (d - 1)} h_{d - 1} (x)$

Now content that the evaluations collectively satisfy the gate constraints, the verifier needs to check that the evaluations themselves are consistent with the original circuit commitments, as well as $H$ . To implement this efficiently, we use a multipoint opening argument.

Multipoint opening argument

Consider the commitments $A, B, C, D$ to polynomials $a (X), b (X), c (X), d (X)$ . Let's say that $a$ and $b$ were queried at the point $x$ , while $c$ and $d$ were queried at both points $x$ and $ω x$ . (Here, $ω$ is the primitive root of unity in the multiplicative subgroup over which we constructed the polynomials).

To open these commitments, we could create a polynomial $Q$ for each point that we queried at (corresponding to each relative rotation used in the circuit). But this would not be efficient in the circuit; for example, $c (X)$ would appear in multiple polynomials.

Instead, we can group the commitments by the sets of points at which they were queried: ${x} A B {x, ω x} C D$

For each of these groups, we combine them into a polynomial set, and create a single $Q$ for that set, which we open at each rotation.

Optimization steps

The multipoint opening optimization takes as input:

A random $x$ sampled by the verifier, at which we evaluate $a (X), b (X), c (X), d (X)$ .
Evaluations of each polynomial at each point of interest, provided by the prover: $a (x), b (x), c (x), d (x), c (ω x), d (ω x)$

These are the outputs of the vanishing argument.

The multipoint opening optimization proceeds as such:

Sample random $x_{1}$ , to keep $a, b, c, d$ linearly independent.
Accumulate polynomials and their corresponding evaluations according to the point set at which they were queried: q_polys: $q_{1} (X) q_{2} (X) = = a (X) c (X) + + x_{1} b (X) x_{1} d (X)$ q_eval_sets:
```
        [
            [a(x) + x_1 b(x)],
            [
                c(x) + x_1 d(x),
                c(\omega x) + x_1 d(\omega x)
            ]
        ]
```
NB: q_eval_sets is a vector of sets of evaluations, where the outer vector corresponds to the point sets, which in this example are ${x}$ and ${x, ω x}$ , and the inner vector corresponds to the points in each set.
Interpolate each set of values in q_eval_sets: r_polys: $r_{1} (X) s . t . r_{2} (X) s . t . r_{1} (x) r_{2} (x) r_{2} (ω x) = = = a (x) + x_{1} b (x) c (x) + x_{1} d (x) c (ω x) + x_{1} d (ω x)$
Construct f_polys which check the correctness of q_polys: f_polys $f_{1} (X) f_{2} (X) = = \frac{q _{1} ( X ) - r _{1} ( X )}{X - x} \frac{q _{2} ( X ) - r _{2} ( X )}{( X - x ) ( X - ω x )}$

If $q_{1} (x) = r_{1} (x)$ , then $f_{1} (X)$ should be a polynomial. If $q_{2} (x) = r_{2} (x)$ and $q_{2} (ω x) = r_{2} (ω x)$ then $f_{2} (X)$ should be a polynomial.
Sample random $x_{2}$ to keep the f_polys linearly independent.
Construct $f (X) = f_{1} (X) + x_{2} f_{2} (X)$ .
Sample random $x_{3}$ , at which we evaluate $f (X)$ : $f (x_{3}) = = f_{1} (x_{3}) \frac{q _{1} ( x _{3} ) - r _{1} ( x _{3} )}{x _{3} - x} + + x_{2} f_{2} (x_{3}) x_{2} \frac{q _{2} ( x _{3} ) - r _{2} ( x _{3} )}{( x _{3} - x ) ( x _{3} - ω x )}$
Sample random $x_{4}$ to keep $f (X)$ and q_polys linearly independent.
Construct final_poly, $f ina l_p o l y (X) = f (X) + x_{4} q_{1} (X) + x_{4}^{2} q_{2} (X),$ which is the polynomial we commit to in the inner product argument.

Inner product argument

Halo 2 uses a polynomial commitment scheme for which we can create polynomial commitment opening proofs, based around the Inner Product Argument.

TODO: Explain Halo 2's variant of the IPA.

It is very similar to $PC_{DL} . Open$ from Appendix A.2 of BCMS20. See this comparison for details.

Comparison to other work

BCMS20 Appendix A.2

Appendix A.2 of BCMS20 describes a polynomial commitment scheme that is similar to the one described in BGH19 (BCMS20 being a generalization of the original Halo paper). Halo 2 builds on both of these works, and thus itself uses a polynomial commitment scheme that is very similar to the one in BCMS20.

The following table provides a mapping between the variable names in BCMS20, and the equivalent objects in Halo 2 (which builds on the nomenclature from the Halo paper):

BCMS20	Halo 2
$S$	$H$
$H$	$U$
$C$	`msm` or $P$
$α$	$ι$
$ξ_{0}$	$z$
$ξ_{i}$	`challenge_i`
$H^{'}$	$[z] U$
$\overset{p}{ˉ}$	`s_poly`
$\overset{ω}{ˉ}$	`s_poly_blind`
$\overset{ˉ}{C}$	`s_poly_commitment`
$h (X)$	$g (X)$
$U$	$G$
$ω^{'}$	`blind` / $ξ$
$c$	$a$
$c$	$a = a_{0}$
$v^{'}$	$ab$

Halo 2's polynomial commitment scheme differs from Appendix A.2 of BCMS20 in two ways:

Step 8 of the $Open$ algorithm computes a "non-hiding" commitment $C^{'}$ prior to the inner product argument, which opens to the same value as $C$ but is a commitment to a randomly-drawn polynomial. The remainder of the protocol involves no blinding. By contrast, in Halo 2 we blind every single commitment that we make (even for instance and fixed polynomials, though using a blinding factor of 1 for the fixed polynomials); this makes the protocol simpler to reason about. As a consequence of this, the verifier needs to handle the cumulative blinding factor at the end of the protocol, and so there is no need to derive an equivalent to $C^{'}$ at the start of the protocol.
- $C^{'}$ is also an input to the random oracle for $ξ_{0}$ ; in Halo 2 we utilize a transcript that has already committed to the equivalent components of $C^{'}$ prior to sampling $z$ .
The $PC_{DL} . SuccinctCheck$ subroutine (Figure 2 of BCMS20) computes the initial group element $C_{0}$ by adding $[v] H^{'} = [v ξ_{0}] H$ , which requires two scalar multiplications. Instead, we subtract $[v] G_{0}$ from the original commitment $P$ , so that we're effectively opening the polynomial at the point to the value zero. The computation $[v] G_{0}$ is more efficient in the context of recursion because $G_{0}$ is a fixed base (so we can use lookup tables).

Protocol Description

Preliminaries

We take $λ$ as our security parameter, and unless explicitly noted all algorithms and adversaries are probabilistic (interactive) Turing machines that run in polynomial time in this security parameter. We use $negl (λ)$ to denote a function that is negligible in $λ$ .

Cryptographic Groups

We let $G$ denote a cyclic group of prime order $p$ . The identity of a group is written as $O$ . We refer to the scalars of elements in $G$ as elements in a scalar field $F$ of size $p$ . Group elements are written in capital letters while scalars are written in lowercase or Greek letters. Vectors of scalars or group elements are written in boldface, i.e. $a \in F^{n}$ and $G \in G^{n}$ . Group operations are written additively and the multiplication of a group element $G$ by a scalar $a$ is written $[a] G$ .

We will often use the notation $⟨ a, b ⟩$ to describe the inner product of two like-length vectors of scalars $a, b \in F^{n}$ . We also use this notation to represent the linear combination of group elements such as $⟨ a, G ⟩$ with $a \in F^{n}, G \in G^{n}$ , computed in practice by a multiscalar multiplication.

We use $0^{n}$ to describe a vector of length $n$ that contains only zeroes in $F$ .

Discrete Log Relation Problem. The advantage metric $Adv_{G, n}^{dl-rel} (A, λ) = Pr [G_{G, n}^{dl-rel} (A, λ)]$ is defined with respect the following game. $\underline{Game G_{G, n}^{dl-rel} (A, λ) :} G \leftarrow G_{λ}^{n} a \leftarrow A (G) Return (⟨ a, G ⟩ = O \land a \neq = 0^{n})$

Given an $n$ -length vector $G \in G^{n}$ of group elements, the discrete log relation problem asks for $g \in F^{n}$ such that $g \neq = 0^{n}$ and yet $⟨ g, G ⟩ = O$ , which we refer to as a non-trivial discrete log relation. The hardness of this problem is tightly implied by the hardness of the discrete log problem in the group as shown in Lemma 3 of [JT20]. Formally, we use the game $G_{G, n}^{dl-rel}$ defined above to capture this problem.

Interactive Proofs

Interactive proofs are a triple of algorithms $IP = (Setup, P, V)$ . The algorithm $Setup (1^{λ})$ produces as its output some public parameters commonly referred to by $pp$ . The prover $P$ and verifier $V$ are interactive machines (with access to $pp$ ) and we denote by $⟨ P (x), V (y)⟩$ an algorithm that executes a two-party protocol between them on inputs $x, y$ . The output of this protocol, a transcript of their interaction, contains all of the messages sent between $P$ and $V$ . At the end of the protocol, the verifier outputs a decision bit.

Zero knowledge Arguments of Knowledge

Proofs of knowledge are interactive proofs where the prover aims to convince the verifier that they know a witness $w$ such that $(x, w) \in R$ for a statement $x$ and polynomial-time decidable relation $R$ . We will work with arguments of knowledge which assume computationally-bounded provers.

We will analyze arguments of knowledge through the lens of four security notions.

Completeness: If the prover possesses a valid witness, can they always convince the verifier? It is useful to understand this property as it can have implications for the other security notions.
Soundness: Can a cheating prover falsely convince the verifier of the correctness of a statement that is not actually correct? We refer to the probability that a cheating prover can falsely convince the verifier as the soundness error.
Knowledge soundness: When the verifier is convinced the statement is correct, does the prover actually possess ("know") a valid witness? We refer to the probability that a cheating prover falsely convinces the verifier of this knowledge as the knowledge error.
Zero knowledge: Does the verifier learn anything besides that which can be inferred from the correctness of the statement and the prover's knowledge of a valid witness?

First, we will visit the simple definition of completeness.

Perfect Completeness. An interactive argument $(Setup, P, V)$ has perfect completeness if for all polynomial-time decidable relations $R$ and for all non-uniform polynomial-time adversaries $A$ $P r [(x, w) \in / R \lor ⟨ P (pp, x, w), V (pp, x)⟩ accepts pp \leftarrow Setup (1^{λ}) (x, w) \leftarrow A (pp)] = 1$

Soundness

Complicating our analysis is that although our protocol is described as an interactive argument, it is realized in practice as a non-interactive argument through the use of the Fiat-Shamir transformation.

Public coin. We say that an interactive argument is public coin when all of the messages sent by the verifier are each sampled with fresh randomness.

Fiat-Shamir transformation. In this transformation an interactive, public coin argument can be made non-interactive in the random oracle model by replacing the verifier algorithm with a cryptographically strong hash function that produces sufficiently random looking output.

This transformation means that in the concrete protocol a cheating prover can easily "rewind" the verifier by forking the transcript and sending new messages to the verifier. Studying the concrete security of our construction after applying this transformation is important. Fortunately, we are able to follow a framework of analysis by Ghoshal and Tessaro ([GT20]) that has been applied to constructions similar to ours.

We will study our protocol through the notion of state-restoration soundness. In this model the (cheating) prover is allowed to rewind the verifier to any previous state it was in. The prover wins if they are able to produce an accepting transcript.

State-Restoration Soundness. Let $IP$ be an interactive argument with $r >= r (λ)$ verifier challenges and let the $i$ th challenge be sampled from $Ch_{i}$ . The advantage metric $Adv_{IP}^{SRS} (P, λ) = Pr [SRS_{P}^{IP} (λ)]$ of a state restoration prover $P$ is defined with respect to the following game. $\underline{Game SRS_{IP}^{P} (λ) :} win \leftarrow false; tr \leftarrow ϵ pp \leftarrow IP . Setup (1^{λ}) (x, st_{P}) \leftarrow P_{λ} (pp) Run P_{λ}^{O_{SRS}} (st_{P}) Return win \underline{Oracle O_{SRS} (τ = (a_{1}, c_{1}, ..., a_{i - 1}, c_{i - 1}), a_{i}) :} If τ \in tr then If i \leq r then c_{i} \leftarrow Ch_{i}; tr \leftarrow tr ∣∣ (τ, a_{i}, c_{i}); Return c_{i} Else if i = r + 1 then d \leftarrow IP . V (pp, x, (τ, a_{i})); tr \leftarrow (τ, a_{i}) If d = 1 then win \leftarrow true Return d Return ⊥$

As shown in [GT20] (Theorem 1) state restoration soundness is tightly related to soundness after applying the Fiat-Shamir transformation.

Knowledge Soundness

We will show that our protocol satisfies a strengthened notion of knowledge soundness known as witness extended emulation. Informally, this notion states that for any successful prover algorithm there exists an efficient emulator that can extract a witness from it by rewinding it and supplying it with fresh randomness.

However, we must slightly adjust our definition of witness extended emulation to account for the fact that our provers are state restoration provers and can rewind the verifier. Further, to avoid the need for rewinding the state restoration prover during witness extraction we study our protocol in the algebraic group model.

Algebraic Group Model (AGM). An adversary $P_{alg}$ is said to be algebraic if whenever it outputs a group element $X$ it also outputs a representation $x \in F^{n}$ such that $⟨ x, G ⟩ = X$ where $G \in G^{n}$ is the vector of group elements that $P_{alg}$ has seen so far. Notationally, we write ${X}$ to describe a group element $X$ enhanced with this representation. We also write ${X}_{i}^{G}$ to identify the component of the representation of $X$ that corresponds with $G_{i}$ . In other words, $X = i = 0 \sum n - 1 [{X}_{i}^{G}] G_{i}$

The algebraic group model allows us to perform so-called "online" extraction for some protocols: the extractor can obtain the witness from the representations themselves for a single (accepting) transcript.

State Restoration Witness Extended Emulation Let $IP$ be an interactive argument for relation $R$ with $r = r (λ)$ challenges. We define for all non-uniform algebraic provers $P_{alg}$ , extractors $E$ , and computationally unbounded distinguishers $D$ the advantage metric $Adv_{IP, R}^{sr-wee} (P_{alg}, D, E, λ) = Pr [WEE-real_{IP, R}^{P, D} (λ)] - Pr [WEE-ideal_{IP, R}^{E, P, D} (λ)]$ is defined with the respect to the following games. $\underline{Game WEE-real_{IP, R}^{P_{alg}, D} (λ) :} tr \leftarrow ϵ pp \leftarrow IP . Setup (1^{λ}) (x, st_{P}) \leftarrow P_{alg} (pp) Run P_{alg}^{O_{real}} (st_{P}) b \leftarrow D (tr) Return b = 1 \underline{Game WEE-ideal_{IP, R}^{E, P_{alg}, D} (λ) :} tr \leftarrow ϵ pp \leftarrow IP . Setup (1^{λ}) (x, st_{P}) \leftarrow P_{alg} (pp) st_{E} \leftarrow (1^{λ}, pp, x) Run P_{alg}^{O_{ideal}} (st_{P}) w \leftarrow E (st_{E}, ⊥) b \leftarrow D (tr) Return (b = 1) \land (Acc (tr) ⟹ (x, w) \in R) \underline{Oracle O_{real} (τ = (a_{1}, c_{1}, ..., a_{i - 1}, c_{i - 1}), a_{i}) :} If τ \in tr then If i \leq r then c_{i} \leftarrow Ch_{i}; tr \leftarrow tr ∣∣ (τ, a_{i}, c_{i}); Return c_{i} Else if i = r + 1 then d \leftarrow IP . V (pp, x, (τ, a_{i})); tr \leftarrow (τ, a_{i}) If d = 1 then win \leftarrow true Return d Return ⊥ \underline{Oracle O_{ideal} (τ, a) :} If τ \in tr then (r, st_{E}) \leftarrow E (st_{E}, [(τ, a)]) tr \leftarrow tr ∣∣ (τ, a, r) Return r Return ⊥$

Zero Knowledge

We say that an argument of knowledge is zero knowledge if the verifier also does not learn anything from their interaction besides that which can be learned from the existence of a valid $w$ . More formally,

Perfect Special Honest-Verifier Zero Knowledge. A public coin interactive argument $(Setup, P, V)$ has perfect special honest-verifier zero knowledge (PSHVZK) if for all polynomial-time decidable relations $R$ and for all $(x, w) \in R$ and for all non-uniform polynomial-time adversaries $A_{1}, A_{2}$ there exists a probabilistic polynomial-time simulator $S$ such that $= P r A_{1} (σ, x, tr) = 1 pp \leftarrow Setup (1^{λ}); (x, w, ρ) \leftarrow A_{2} (pp); t r \leftarrow ⟨ P (pp, x, w), V (pp, x, ρ)⟩ P r A_{1} (σ, x, tr) = 1 pp \leftarrow Setup (1^{λ}); (x, w, ρ) \leftarrow A_{2} (pp); t r \leftarrow S (pp, x, ρ)$ where $ρ$ is the internal randomness of the verifier.

In this (common) definition of zero-knowledge the verifier is expected to act "honestly" and send challenges that correspond only with their internal randomness; they cannot adaptively respond to the prover based on the prover's messages. We use a strengthened form of this definition that forces the simulator to output a transcript with the same (adversarially provided) challenges that the verifier algorithm sends to the prover.

Protocol

Let $ω \in F$ be a $n = 2^{k}$ primitive root of unity forming the domain $D = (ω^{0}, ω^{1}, ..., ω^{n - 1})$ with $t (X) = X^{n} - 1$ the vanishing polynomial over this domain. Let $n_{g}, n_{a}, n_{e}$ be positive integers with $n_{a}, n_{e} < n$ and $n_{g} \geq 4$ . We present an interactive argument $Halo = (Setup, P, V)$ for the relation $R = ⎩ ⎨ ⎧ ((g (X, C_{0}, ..., C_{n_{a} - 1}, a_{0} (X), ..., a_{n_{a} - 1} (X, C_{0}, ..., C_{n_{a} - 1}, a_{0} (X), ..., a_{n_{a} - 2} (X)))); (a_{0} (X), a_{1} (X, C_{0}, a_{0} (X)), ..., a_{n_{a} - 1} (X, C_{0}, ..., C_{n_{a} - 1}, a_{0} (X), ..., a_{n_{a} - 2} (X)))) : g (ω^{i}, \dots) = 0 \forall i \in [0, 2^{k}) ⎭ ⎬ ⎫$ where $a_{0}, a_{1}, ..., a_{n_{a} - 1}$ are (multivariate) polynomials with degree $n - 1$ in $X$ and $g$ has degree $n_{g} (n - 1)$ at most in any indeterminates $X, C_{0}, C_{1}, ...$ .

$Setup (λ)$ returns $pp = (G, F, G \in G^{n}, U, W \in G)$ .

For all $i \in [0, n_{a})$ :

Let $p_{i}$ be the exhaustive set of integers $j$ (modulo $n$ ) such that $a_{i} (ω^{j} X, \dots)$ appears as a term in $g (X, \dots)$ .
Let $q$ be a list of distinct sets of integers containing $p_{i}$ and the set $q_{0} = {0}$ .
Let $σ (i) = q_{j}$ when $q_{j} = p_{i}$ .

Let $n_{q} \leq n_{a}$ denote the size of $q$ , and let $n_{e}$ denote the size of every $p_{i}$ without loss of generality.

In the following protocol, we take it for granted that each polynomial $a_{i} (X, \dots)$ is defined such that $n_{e} + 1$ blinding factors are freshly sampled by the prover and are each present as an evaluation of $a_{i} (X, \dots)$ over the domain $D$ . In all of the following, the verifier's challenges cannot be zero or an element in $D$ , and some additional limitations are placed on specific challenges as well.

$P$ and $V$ proceed in the following $n_{a}$ rounds of interaction, where in round $j$ (starting at $0$ )

$P$ sets $a_{j}^{'} (X) = a_{j} (X, c_{0}, c_{1}, ..., c_{j - 1}, a_{0} (X, \dots), ..., a_{j - 1} (X, \dots, c_{j - 1}))$
$P$ sends a hiding commitment $A_{j} = ⟨ a^{'}, G ⟩ + [\cdot] W$ where $a^{'}$ are the coefficients of the univariate polynomial $a_{j}^{'} (X)$ and $\cdot$ is some random, independently sampled blinding factor elided for exposition. (This elision notation is used throughout this protocol description to simplify exposition.)
$V$ responds with a challenge $c_{j}$ .

$P$ sets $g^{'} (X) = g (X, c_{0}, c_{1}, ..., c_{n_{a} - 1}, \dots)$ .
$P$ sends a commitment $R = ⟨ r, G ⟩ + [\cdot] W$ where $r \in F^{n}$ are the coefficients of a randomly sampled univariate polynomial $r (X)$ of degree $n - 1$ .
$P$ computes univariate polynomial $h (X) = \frac{g ^{'} ( X )}{t ( X )}$ of degree $n_{g} (n - 1) - n$ .
$P$ computes at most $n - 1$ degree polynomials $h_{0} (X), h_{1} (X), ..., h_{n_{g} - 2} (X)$ such that $h (X) = i = 0 \sum n_{g} - 2 X^{ni} h_{i} (X)$ .
$P$ sends commitments $H_{i} = ⟨ h_{i}, G ⟩ + [\cdot] W$ for all $i$ where $h_{i}$ denotes the vector of coefficients for $h_{i} (X)$ .
$V$ responds with challenge $x$ and computes $H^{'} = i = 0 \sum n_{g} - 2 [x^{ni}] H_{i}$ .
$P$ sets $h^{'} (X) = i = 0 \sum n_{g} - 2 x^{ni} h_{i} (X)$ .
$P$ sends $r = r (x)$ and for all $i \in [0, n_{a})$ sends $a_{i}$ such that $(a_{i})_{j} = a_{i}^{'} (ω^{(p_{i})_{j}} x)$ for all $j \in [0, n_{e} - 1]$ .
For all $i \in [0, n_{a})$ $P$ and $V$ set $s_{i} (X)$ to be the lowest degree univariate polynomial defined such that $s_{i} (ω^{(p_{i})_{j}} x) = (a_{i})_{j}$ for all $j \in [0, n_{e} - 1)$ .
$V$ responds with challenges $x_{1}, x_{2}$ and initializes $Q_{0}, Q_{1}, ..., Q_{n_{q} - 1} = O$ .

Starting at $i = 0$ and ending at $n_{a} - 1$ $V$ sets $Q_{σ (i)} := [x_{1}] Q_{σ (i)} + A_{i}$ .
$V$ finally sets $Q_{0} := [x_{1}^{2}] Q_{0} + [x_{1}] H^{'} + R$ .

$P$ initializes $q_{0} (X), q_{1} (X), ..., q_{n_{q} - 1} (X) = 0$ .

Starting at $i = 0$ and ending at $n_{a} - 1$ $P$ sets $q_{σ (i)} := x_{1} q_{σ (i)} + a^{'} (X)$ .
$P$ finally sets $q_{0} (X) := x_{1}^{2} q_{0} (X) + x_{1} h^{'} (X) + r (X)$ .

$P$ and $V$ initialize $r_{0} (X), r_{1} (X), ..., r_{n_{q} - 1} (X) = 0$ .

Starting at $i = 0$ and ending at $n_{a} - 1$ $P$ and $V$ set $r_{σ (i)} (X) := x_{1} r_{σ (i)} (X) + s_{i} (X)$ .
Finally $P$ and $V$ set $r_{0} := x_{1}^{2} r_{0} + x_{1} h + r$ and where $h$ is computed by $V$ as $\frac{g ^{'} ( x )}{t ( x )}$ using the values $r, a$ provided by $P$ .

$P$ sends $Q^{'} = ⟨ q^{'}, G ⟩ + [\cdot] W$ where $q^{'}$ defines the coefficients of the polynomial $q^{'} (X) = i = 0 \sum n_{q} - 1 x_{2}^{i} \frac{q _{i} ( X ) - r _{i} ( X )}{j = 0 \prod n _{e} - 1 ( X - ω ^{(q_{i})_{j}} x )}$
$V$ responds with challenge $x_{3}$ .
$P$ sends $u \in F^{n_{q}}$ such that $u_{i} = q_{i} (x_{3})$ for all $i \in [0, n_{q})$ .
$V$ responds with challenge $x_{4}$ .
$V$ sets $P = Q^{'} + x_{4} i = 0 \sum n_{q} - 1 [x_{4}^{i}] Q_{i}$ and $v =$ $i = 0 \sum n_{q} - 1 x_{2}^{i} \frac{u _{i} - r _{i} ( x _{3} )}{j = 0 \prod n _{e} - 1 ( x _{3} - ω ^{(q_{i})_{j}} x )} + x_{4} i = 0 \sum n_{q} - 1 x_{4} u_{i}$
$P$ sets $p (X) = q^{'} (X) + [x_{4}] i = 0 \sum n_{q} - 1 x_{4}^{i} q_{i} (X)$ .
$P$ samples a random polynomial $s (X)$ of degree $n - 1$ with a root at $x_{3}$ and sends a commitment $S = ⟨ s, G ⟩ + [\cdot] W$ where $s$ defines the coefficients of $s (X)$ .
$V$ responds with challenges $ξ, z$ .
$V$ sets $P^{'} = P - [v] G_{0} + [ξ] S$ .
$P$ sets $p^{'} (X) = p (X) - p (x_{3}) + ξ s (X)$ (where $p (x_{3})$ should correspond with the verifier's computed value $v$ ).
Initialize $p^{'}$ as the coefficients of $p^{'} (X)$ and $G^{'} = G$ and $b = (x_{3}^{0}, x_{3}^{1}, ..., x_{3}^{n - 1})$ . $P$ and $V$ will interact in the following $k$ rounds, where in the $j$ th round starting in round $j = 0$ and ending in round $j = k - 1$ :

$P$ sends $L_{j} = ⟨ p^{'}_{hi}, G^{'}_{lo} ⟩ + [z ⟨ p^{'}_{hi}, b_{lo} ⟩] U + [\cdot] W$ and $R_{j} = ⟨ p^{'}_{lo}, G^{'}_{hi} ⟩ + [z ⟨ p^{'}_{lo}, b_{hi} ⟩] U + [\cdot] W$ .
$V$ responds with challenge $u_{j}$ chosen such that $1 + u_{k - 1 - j} x_{3}^{2^{j}}$ is nonzero.
$P$ and $V$ set $G^{'} := G^{'}_{lo} + u_{j} G^{'}_{hi}$ and $b := b_{lo} + u_{j} b_{hi}$ .
$P$ sets $p^{'} := p^{'}_{lo} + u_{j}^{- 1} p^{'}_{hi}$ .

$P$ sends $c = p^{'}_{0}$ and synthetic blinding factor $f$ computed from the elided blinding factors.
$V$ accepts only if $\sum_{j = 0}^{k - 1} [u_{j}^{- 1}] L_{j} + P^{'} + \sum_{j = 0}^{k - 1} [u_{j}] R_{j} = [c] G^{'}_{0} + [c b_{0} z] U + [f] W$ .

Zero-knowledge and Completeness

We claim that this protocol is perfectly complete. This can be verified by inspection of the protocol; given a valid witness $a_{i} (X, \dots) \forall i$ the prover succeeds in convincing the verifier with probability $1$ .

We claim that this protocol is perfect special honest-verifier zero knowledge. We do this by showing that a simulator $S$ exists which can produce an accepting transcript that is equally distributed with a valid prover's interaction with a verifier with the same public coins. The simulator will act as an honest prover would, with the following exceptions:

In step $1$ of the protocol $S$ chooses random degree $n - 1$ polynomials (in $X$ ) $a_{i} (X, \dots) \forall i$ .
In step $5$ of the protocol $S$ chooses a random $n - 1$ degree polynomials $h_{0} (X), h_{1} (X), ..., h_{n_{g} - 2} (X)$ .
In step $14$ of the protocol $S$ chooses a random $n - 1$ degree polynomial $q^{'} (X)$ .
In step $20$ of the protocol $S$ uses its foreknowledge of the verifier's choice of $ξ$ to produce a degree $n - 1$ polynomial $s (X)$ conditioned only such that $p (X) - v + ξ s (X)$ has a root at $x_{3}$ .

First, let us consider why this simulator always succeeds in producing an accepting transcript. $S$ lacks a valid witness and simply commits to random polynomials whenever knowledge of a valid witness would be required by the honest prover. The verifier places no conditions on the scalar values in the transcript. $S$ must only guarantee that the check in step $26$ of the protocol succeeds. It does so by using its knowledge of the challenge $ξ$ to produce a polynomial which interferes with $p^{'} (X)$ to ensure it has a root at $x_{3}$ . The transcript will thus always be accepting due to perfect completeness.

In order to see why $S$ produces transcripts distributed identically to the honest prover, we will look at each piece of the transcript and compare the distributions. First, note that $S$ (just as the honest prover) uses a freshly random blinding factor for every group element in the transcript, and so we need only consider the scalars in the transcript. $S$ acts just as the prover does except in the mentioned cases so we will analyze each case:

$S$ and an honest prover reveal $n_{e}$ openings of each polynomial $a_{i} (X, \dots)$ , and at most one additional opening of each $a_{i} (X, \dots)$ in step $16$ . However, the honest prover blinds their polynomials $a_{i} (X, \dots)$ (in $X$ ) with $n_{e} + 1$ random evaluations over the domain $D$ . Thus, the openings of $a_{i} (X, \dots)$ at the challenge $x$ (which is prohibited from being $0$ or in the domain $D$ by the protocol) are distributed identically between $S$ and an honest prover.
Neither $S$ nor the honest prover reveal $h (x)$ as it is computed by the verifier. However, the honest prover may reveal $h^{'} (x_{3})$ --- which has a non-trivial relationship with $h (X)$ --- were it not for the fact that the honest prover also commits to a random degree $n - 1$ polynomial $r (X)$ in step $3$ , producing a commitment $R$ and ensuring that in step $12$ when the prover sets $q_{0} (X) := x_{1}^{2} q_{0} (X) + x_{1} h^{'} (X) + r (X)$ the distribution of $q_{0} (x_{3})$ is uniformly random. Thus, $h^{'} (x_{3})$ is never revealed by the honest prover nor by $S$ .
The expected value of $q^{'} (x_{3})$ is computed by the verifier (in step $18$ ) and so the simulator's actual choice of $q^{'} (X)$ is irrelevant.
$p (X) - v + ξ s (X)$ is conditioned on having a root at $x_{3}$ , but otherwise no conditions are placed on $s (X)$ and so the distribution of the degree $n - 1$ polynomial $p (X) - v + ξ s (X)$ is uniformly random whether or not $s (X)$ has a root at $x_{3}$ . Thus, the distribution of $c$ produced in step $25$ is identical between $S$ and an honest prover. The synthetic blinding factor $f$ also revealed in step $25$ is a trivial function of the prover's other blinding factors and so is distributed identically between $S$ and an honest prover.

Notes:

In an earlier version of our protocol, the prover would open each individual commitment $H_{0}, H_{1}, ...$ at $x$ as part of the multipoint opening argument, and the verifier would confirm that a linear combination of these openings (with powers of $x^{n}$ ) agreed to the expected value of $h (x)$ . This was done because it's more efficient in recursive proofs. However, it was unclear to us what the expected distribution of the openings of these commitments $H_{0}, H_{1}, ...$ was and so proving that the argument was zero-knowledge is difficult. Instead, we changed the argument so that the verifier computes a linear combination of the commitments and that linear combination is opened at $x$ . This avoided leaking $h_{i} (x)$ .
As mentioned, in step $3$ the prover commits to a random polynomial as a way of ensuring that $h^{'} (x_{3})$ is not revealed in the multiopen argument. This is done because it's unclear what the distribution of $h^{'} (x_{3})$ would be.
Technically it's also possible for us to prove zero-knowledge with a simulator that uses its foreknowledge of the challenge $x$ to commit to an $h (X)$ which agrees at $x$ to the value it will be expected to. This would obviate the need for the random polynomial $s (X)$ in the protocol. This may make the analysis of zero-knowledge for the remainder of the protocol a little bit tricky though, so we didn't go this route.
Group element blinding factors are technically not necessary after step $23$ in which the polynomial is completely randomized. However, it's simpler in practice for us to ensure that every group element in the protocol is randomly blinded to make edge cases involving the point at infinity harder.
It is crucial that the verifier cannot challenge the prover to open polynomials at points in $D$ as otherwise the transcript of an honest prover will be forced to contain what could be portions of the prover's witness. We therefore restrict the space of challenges to include all elements of the field except $D$ and, for simplicity, we also prohibit the challenge of $0$ .

Witness-extended Emulation

Let $Halo = Halo [G]$ be the interactive argument described above for relation $R$ and some group $G$ with scalar field $F$ . We can always construct an extractor $E$ such that for any non-uniform algebraic prover $P_{alg}$ making at most $q$ queries to its oracle, there exists a non-uniform adversary $H$ with the property that for any computationally unbounded distinguisher $D$

$Adv_{Halo, R}^{sr-wee} (P_{alg}, D, E, λ) \leq q ϵ + Adv_{G, n + 2}^{dl-rel} (H, λ)$

where $ϵ \leq \frac{n _{g} \cdot ( n - 1 )}{∣ Ch ∣}$ .

Proof. We will prove this by invoking Theorem 1 of [GT20]. First, we note that the challenge space for all rounds is the same, i.e. $\forall i Ch = Ch_{i}$ . Theorem 1 requires us to define:

$BadCh (tr^{'}) \in Ch$ for all partial transcripts $tr^{'} = (pp, x, [a_{0}], c_{0}, \dots, [a_{i}])$ such that $∣ BadCh (tr^{'}) ∣/∣ Ch ∣ \leq ϵ$ .
an extractor function $e$ that takes as input an accepting extended transcript $tr$ and either returns a valid witness or fails.
a function $p_{fail} (Halo, P_{alg}, e, R)$ returning a probability.

We say that an accepting extended transcript $tr$ contains "bad challenges" if and only if there exists a partial extended transcript $tr^{'}$ , a challenge $c_{i} \in BadCh (tr^{'})$ , and some sequence of prover messages and challenges $([a_{i + 1}], c_{i + 1}, \dots [a_{j}])$ such that $tr = tr^{'} ∣∣ (c_{i}, [a_{i + 1}], c_{i + 1}, \dots [a_{j}])$ .

Theorem 1 requires that $e$ , when given an accepting extended transcript $tr$ that does not contain "bad challenges", returns a valid witness for that transcript except with probability bounded above by $p_{fail} (Halo, P_{alg}, e, R)$ .

Our strategy is as follows: we will define $e$ , establish an upper bound on $p_{fail}$ with respect to an adversary $H$ that plays the $dl-rel_{G, n + 2}$ game, substitute these into Theorem 1, and then walk through the protocol to determine the upper bound of the size of $BadCh (tr^{'})$ . The adversary $H$ plays the $dl-rel_{G, n + 2}$ game as follows: given the inputs $U, W \in G, G \in G^{n}$ , the adversary $H$ simulates the game $sr-wee_{Halo, R}$ to $P_{alg}$ using the inputs from the $dl-rel_{G, n + 2}$ game as public parameters. If $P_{alg}$ manages to produce an accepting extended transcript $tr$ , $H$ invokes a function $h$ on $tr$ and returns its output. We shall define $h$ in such a way that for an accepting extended transcript $tr$ that does not contain "bad challenges", $e (tr)$ always returns a valid witness whenever $h (tr)$ does not return a non-trivial discrete log relation. This means that the probability $p_{fail} (Halo, P_{alg}, e, R)$ is no greater than $Adv_{G, n + 2}^{dl-rel} (H, λ)$ , establishing our claim.

Helpful substitutions

We will perform some substitutions to aid in exposition. First, let us define the polynomial

$κ (X) = j = 0 \prod k - 1 (1 + u_{k - 1 - j} X^{2^{j}})$

so that we can write $b_{0} = κ (x_{3})$ . The coefficient vector $s$ of $κ (X)$ is defined such that

$s_{i} = j = 0 \prod k - 1 u_{k - 1 - j}^{f (i, j)}$

where $f (i, j)$ returns $1$ when the $j$ th bit of $i$ is set, and $0$ otherwise. We can also write $G^{'}_{0} = ⟨ s, G ⟩$ .

Description of function $h$

Recall that an accepting transcript $tr$ is such that

$i = 0 \sum k - 1 [u_{j}^{- 1}] {L_{j}} + {P^{'}} + i = 0 \sum k - 1 [u_{j}] {R_{j}} = [c] G^{'}_{0} + [cz b_{0}] U + [f] W$

By inspection of the representations of group elements with respect to $G, U, W$ (recall that $P_{alg}$ is algebraic and so $H$ has them), we obtain the $n$ equalities

$i = 0 \sum k - 1 u_{j}^{- 1} {L_{j}}_{i}^{G} + {P^{'}}_{i}^{G} + i = 0 \sum k - 1 u_{j} {R_{j}}_{i}^{G} = c s_{i} \forall i \in [0, n)$

and the equalities

$i = 0 \sum k - 1 u_{j}^{- 1} {L_{j}}^{U} + {P^{'}}^{U} + i = 0 \sum k - 1 u_{j} {R_{j}}^{U} = cz κ (x_{3})$

$i = 0 \sum k - 1 u_{j}^{- 1} {L_{j}}^{W} + {P^{'}}^{W} + i = 0 \sum k - 1 u_{j} {R_{j}}^{W} = f$

We define the linear-time function $h$ that returns the representation of

$i = 0 \sum n - 1 + + [\sum_{i = 0}^{k - 1} u_{j}^{- 1} {L_{j}}_{i}^{G} + {P^{'}}_{i}^{G} + \sum_{i = 0}^{k - 1} u_{j} {R_{j}}_{i}^{G} - c s_{i}] [\sum_{i = 0}^{k - 1} u_{j}^{- 1} {L_{j}}^{U} + {P^{'}}^{U} + \sum_{i = 0}^{k - 1} u_{j} {R_{j}}^{U} - cz κ (x_{3})] [\sum_{i = 0}^{k - 1} u_{j}^{- 1} {L_{j}}^{W} + {P^{'}}^{W} + \sum_{i = 0}^{k - 1} u_{j} {R_{j}}^{W} - f] G_{i} U W$

which is always a discrete log relation. If any of the equalities above are not satisfied, then this discrete log relation is non-trivial. This is the function invoked by $H$ .

The extractor function $e$

The extractor function $e$ simply returns $a_{i} (X)$ from the representation ${A_{i}}$ for $i \in [0, n_{a})$ . Due to the restrictions we will place on the space of bad challenges in each round, we are guaranteed to obtain polynomials such that $g (X, C_{0}, C_{1}, \dots, a_{0} (X), a_{1} (X), \dots)$ vanishes over $D$ whenever the discrete log relation returned by the adversary's function $h$ is trivial. This trivially gives us that the extractor function $e$ succeeds with probability bounded above by $p_{fail}$ as required.

Defining $BadCh (tr^{'})$

Recall from before that the following $n$ equalities hold:

$i = 0 \sum k - 1 u_{j}^{- 1} {L_{j}}_{i}^{G} + {P^{'}}_{i}^{G} + i = 0 \sum k - 1 u_{j} {R_{j}}_{i}^{G} = c s_{i} \forall i \in [0, n)$

as well as the equality

$i = 0 \sum k - 1 u_{j}^{- 1} {L_{j}}^{U} + {P^{'}}^{U} + i = 0 \sum k - 1 u_{j} {R_{j}}^{U} = cz κ (x_{3})$

For convenience let us introduce the following notation

$M_{i}^{G} (m) M_{U} (m) = \sum_{i = 0}^{m - 1} u_{j}^{- 1} {L_{j}}_{i}^{G} + {P^{'}}_{i}^{G} + \sum_{i = 0}^{m - 1} u_{j} {R_{j}}_{i}^{G} = \sum_{i = 0}^{m - 1} u_{j}^{- 1} {L_{j}}^{U} + {P^{'}}^{U} + \sum_{i = 0}^{m - 1} u_{j} {R_{j}}^{U}$

so that we can rewrite the above (after expanding for $κ (x_{3})$ ) as

$M_{i}^{G} (k) = c s_{i} \forall i \in [0, n)$

$M_{U} (k) = cz j = 0 \prod k - 1 (1 + u_{k - 1 - j} x_{3}^{2^{j}})$

We can combine these equations by multiplying both sides of each instance of the first equation by $s_{i}^{- 1}$ (because $s_{i}$ is never zero) and substituting for $c$ in the second equation, yielding the following $n$ equalities:

$M_{U} (k) = M_{i}^{G} (k) \cdot s_{i}^{- 1} z j = 0 \prod k - 1 (1 + u_{k - 1 - j} x_{3}^{2^{j}}) \forall i \in [0, n)$

Lemma 1. If $M_{U} (k) = M_{i}^{G} (k) \cdot s_{i}^{- 1} z \prod_{j = 0}^{k - 1} (1 + u_{k - 1 - j} x_{3}^{2^{j}}) \forall i \in [0, n)$ then it follows that ${P^{'}}^{U} = z i = 0 \sum 2^{k} - 1 x_{3}^{i} {P^{'}}_{i}^{G}$ for all transcripts that do not contain bad challenges.

Proof. It will be useful to introduce yet another abstraction defined starting with $Z_{k} (m, i) = M_{i}^{G} (m)$ and then recursively defined for all integers $r$ such that $0 < r \leq k$ $Z_{k - r} (m, i) = Z_{k - r + 1} (m, i) + x_{3}^{2^{k - r}} Z_{k - r + 1} (m, i + 2^{k - r})$ This allows us to rewrite our above equalities as $M_{U} (k) = Z_{k} (k, i) \cdot s_{i}^{- 1} z j = 0 \prod k - 1 (1 + u_{k - 1 - j} x_{3}^{2^{j}}) \forall i \in [0, n)$

We will now show that for all integers $r$ such that $0 < r \leq k$ that whenever the following holds for $r$ $M_{U} (r) = Z_{r} (r, i) \cdot s_{i}^{- 1} z j = 0 \prod r - 1 (1 + u_{k - 1 - j} x_{3}^{2^{j}}) \forall i \in [0, 2^{r})$ that the same also holds for $M_{U} (r - 1) = Z_{r - 1} (r - 1, i) \cdot s_{i}^{- 1} z j = 0 \prod r - 2 (1 + u_{k - 2 - j} x_{3}^{2^{j}}) \forall i \in [0, 2^{r - 1})$

For all integers $r$ such that $0 < r \leq k$ we have that $s_{i + 2^{r - 1}} = u_{r - 1} s_{i} \forall i \in [0, 2^{r - 1})$ by the definition of $s$ . This gives us $s_{i + 2^{r - 1}}^{- 1} = s_{i}^{- 1} u_{r - 1}^{- 1} \forall i \in [0, 2^{r - 1})$ as no value in $s$ nor any challenge $u_{r}$ are zeroes. We can use this to relate one half of the equalities with the other half as so: $M_{U} (r) = Z_{r} (r, i) \cdot s_{i}^{- 1} z \prod_{j = 0}^{r - 1} (1 + u_{k - 1 - j} x_{3}^{2^{j}}) = Z_{r} (r, i + 2^{r - 1}) \cdot s_{i}^{- 1} u_{r - 1}^{- 1} z \prod_{j = 0}^{r - 1} (1 + u_{k - 1 - j} x_{3}^{2^{j}}) \forall i \in [0, 2^{r - 1})$

Notice that $Z_{r} (r, i)$ can be rewritten as $u_{r - 1}^{- 1} {L_{r - 1}}_{i}^{G} + Z_{r} (r - 1, i) + u_{r - 1} {R_{r - 1}}_{i}^{G}$ for all $i \in [0, 2^{r})$ . Thus we can rewrite the above as

$M_{U} (r) = (u_{r - 1}^{- 1} {L_{r - 1}}_{i}^{G} + Z_{r} (r - 1, i) + u_{r - 1} {R_{r - 1}}_{i}^{G}) \cdot s_{i}^{- 1} z \prod_{j = 0}^{r - 1} (1 + u_{k - 1 - j} x_{3}^{2^{j}}) = (u_{r - 1}^{- 1} {L_{r - 1}}_{i + 2^{r - 1}}^{G} + Z_{r} (r - 1, i + 2^{r - 1}) + u_{r - 1} {R_{r - 1}}_{i + 2^{r - 1}}^{G}) \cdot s_{i}^{- 1} u_{r - 1}^{- 1} z \prod_{j = 0}^{r - 1} (1 + u_{k - 1 - j} x_{3}^{2^{j}}) \forall i \in [0, 2^{r - 1})$

Now let us rewrite these equalities substituting $u_{r - 1}$ with formal indeterminate $X$ .

$X^{- 1} {L_{r - 1}}^{U} + M_{U} (r - 1) + X {R_{r - 1}}^{U} = (X^{- 1} {L_{r - 1}}_{i}^{G} + Z_{r} (r - 1, i) + X {R_{r - 1}}_{i}^{G}) \cdot s_{i}^{- 1} z \prod_{j = 0}^{r - 2} (1 + u_{k - 1 - j} x_{3}^{2^{j}}) (1 + x_{3}^{2^{r - 1}} X) = (X^{- 1} {L_{r - 1}}_{i + 2^{r - 1}}^{G} + Z_{r} (r - 1, i + 2^{r - 1}) + X {R_{r - 1}}_{i + 2^{r - 1}}^{G}) \cdot s_{i}^{- 1} z \prod_{j = 0}^{r - 2} (1 + u_{k - 1 - j} x_{3}^{2^{j}}) (X^{- 1} + x_{3}^{2^{r - 1}}) \forall i \in [0, 2^{r - 1})$

Now let us rescale everything by $X^{2}$ to remove negative exponents.

$X {L_{r - 1}}^{U} + X^{2} M_{U} (r - 1) + X^{3} {R_{r - 1}}^{U} = (X^{- 1} {L_{r - 1}}_{i}^{G} + Z_{r} (r - 1, i) + X {R_{r - 1}}_{i}^{G}) \cdot s_{i}^{- 1} z \prod_{j = 0}^{r - 2} (1 + u_{k - 1 - j} x_{3}^{2^{j}}) (X^{2} + x_{3}^{2^{r - 1}} X^{3}) = (X^{- 1} {L_{r - 1}}_{i + 2^{r - 1}}^{G} + Z_{r} (r - 1, i + 2^{r - 1}) + X {R_{r - 1}}_{i + 2^{r - 1}}^{G}) \cdot s_{i}^{- 1} z \prod_{j = 0}^{r - 2} (1 + u_{k - 1 - j} x_{3}^{2^{j}}) (X + x_{3}^{2^{r - 1}} X^{2}) \forall i \in [0, 2^{r - 1})$

This gives us $2^{r - 1}$ triples of maximal degree- $4$ polynomials in $X$ that agree at $u_{r - 1}$ despite having coefficients determined prior to the choice of $u_{r - 1}$ . The probability that two of these polynomials would agree at $u_{r - 1}$ and yet be distinct would be $\frac{4}{∣ Ch ∣}$ by the Schwartz-Zippel lemma and so by the union bound the probability that the three of these polynomials agree and yet any of them is distinct from another is $\frac{8}{∣ Ch ∣}$ . By the union bound again the probability that any of the $2^{r - 1}$ triples have multiple distinct polynomials is $\frac{2 ^{r - 1} \cdot 8}{∣ Ch ∣}$ . By restricting the challenge space for $u_{r - 1}$ accordingly we obtain $∣ BadCh (tr^{'} ∣_{u_{r}}) ∣/∣ Ch ∣ \leq \frac{2 ^{r - 1} \cdot 8}{∣ Ch ∣}$ for integers $0 < r \leq k$ and thus $∣ BadCh (tr^{'} ∣_{u_{k}}) ∣/∣ Ch ∣ \leq \frac{4 n}{∣ Ch ∣} \leq ϵ$ .

We can now conclude an equality of polynomials, and thus of coefficients. Consider the coefficients of the constant terms first, which gives us the $2^{r - 1}$ equalities $0 = 0 = s_{i}^{- 1} z (j = 0 \prod r - 2 (1 + u_{k - 1 - j} x_{3}^{2^{j}})) \cdot {L_{r - 1}}_{i + 2^{r - 1}}^{G} \forall i \in [0, 2^{r - 1})$

No value of $s$ is zero, $z$ is never chosen to be $0$ and each $u_{j}$ is chosen so that $1 + u_{k - 1 - j} x_{3}^{2^{j}}$ is nonzero, so we can then conclude $0 = {L_{r - 1}}_{i + 2^{r - 1}}^{G} \forall i \in [0, 2^{r - 1})$

An identical process can be followed with respect to the coefficients of the $X^{4}$ term in the equalities to establish $0 = {R_{r - 1}}_{i}^{G} \forall i \in [0, 2^{r - 1})$ contingent on $x_{3}$ being nonzero, which it always is. Substituting these in our equalities yields us something simpler

$X {L_{r - 1}}^{U} + X^{2} M_{U} (r - 1) + X^{3} {R_{r - 1}}^{U} = (X^{- 1} {L_{r - 1}}_{i}^{G} + Z_{r} (r - 1, i)) \cdot s_{i}^{- 1} z \prod_{j = 0}^{r - 2} (1 + u_{k - 1 - j} x_{3}^{2^{j}}) (X^{2} + x_{3}^{2^{r - 1}} X^{3}) = (Z_{r} (r - 1, i + 2^{r - 1}) + X {R_{r - 1}}_{i + 2^{r - 1}}^{G}) \cdot s_{i}^{- 1} z \prod_{j = 0}^{r - 2} (1 + u_{k - 1 - j} x_{3}^{2^{j}}) (X + x_{3}^{2^{r - 1}} X^{2}) \forall i \in [0, 2^{r - 1})$

Now we will consider the coefficients in $X$ , which yield the equalities

${L_{r - 1}}^{U} = s_{i}^{- 1} z \prod_{j = 0}^{r - 2} (1 + u_{k - 1 - j} x_{3}^{2^{j}}) \cdot {L_{r - 1}}_{i}^{G} = s_{i}^{- 1} z \prod_{j = 0}^{r - 2} (1 + u_{k - 1 - j} x_{3}^{2^{j}}) \cdot Z_{r} (r - 1, i + 2^{r - 1}) \forall i \in [0, 2^{r - 1})$

which for similar reasoning as before yields the equalities ${L_{r - 1}}_{i}^{G} = Z_{r} (r - 1, i + 2^{r - 1}) \forall i \in [0, 2^{r - 1})$

Finally we will consider the coefficients in $X^{2}$ which yield the equalities

$M_{U} (r - 1) = s_{i}^{- 1} z \prod_{j = 0}^{r - 2} (1 + u_{k - 1 - j} x_{3}^{2^{j}}) \cdot (Z_{r} (r - 1, i) + {L_{r - 1}}_{i}^{G} x_{3}^{2^{r - 1}}) \forall i \in [0, 2^{r - 1})$

which by substitution gives us $\forall i \in [0, 2^{r - 1})$ $M_{U} (r - 1) = s_{i}^{- 1} z j = 0 \prod r - 2 (1 + u_{k - 1 - j} x_{3}^{2^{j}}) \cdot (Z_{r} (r - 1, i) + Z_{r} (r - 1, i + 2^{r - 1}) x_{3}^{2^{r - 1}})$

Notice that by the definition of $Z_{r - 1} (m, i)$ we can rewrite this as

$M_{U} (r - 1) = Z_{r - 1} (r - 1, i) \cdot s_{i}^{- 1} z j = 0 \prod r - 2 (1 + u_{k - 1 - j} x_{3}^{2^{j}}) \forall i \in [0, 2^{r - 1})$

which is precisely in the form we set out to demonstrate.

We now proceed by induction from the case $r = k$ (which we know holds) to reach $r = 0$ , which gives us $M_{U} (0) = Z_{0} (0, 0) \cdot s_{0}^{- 1} z$

and because $M_{U} (0) = {P^{'}}^{U}$ and $Z_{0} (0, 0) = \sum_{i = 0}^{2^{k} - 1} x_{3}^{i} {P^{'}}_{i}^{G}$ , we obtain ${P^{'}}^{U} = z \sum_{i = 0}^{2^{k} - 1} x_{3}^{i} {P^{'}}_{i}^{G}$ , which completes the proof.

Having established that ${P^{'}}^{U} = z \sum_{i = 0}^{2^{k} - 1} x_{3}^{i} {P^{'}}_{i}^{G}$ , and given that $x_{3}$ and ${P^{'}}_{i}^{G}$ are fixed in advance of the choice of $z$ , we have that at most one value of $z \in Ch$ (which is nonzero) exists such that ${P^{'}}^{U} = z \sum_{i = 0}^{2^{k} - 1} x_{3}^{i} {P^{'}}_{i}^{G}$ and yet ${P^{'}}^{U} \neq = 0$ . By restricting $∣ BadCh (tr^{'} ∣_{z}) ∣/∣ Ch ∣ \leq \frac{1}{∣ Ch ∣} \leq ϵ$ accordingly we obtain ${P^{'}}^{U} = 0$ and therefore that the polynomial defined by ${P^{'}}^{G}$ has a root at $x_{3}$ .

By construction $P^{'} = P - [v] G_{0} + [ξ] S$ , giving us that the polynomial defined by ${P + [ξ] S}^{G}$ evaluates to $v$ at the point $x_{3}$ . We have that $v, P, S$ are fixed prior to the choice of $ξ$ , and so either the polynomial defined by ${S}^{G}$ has a root at $x_{3}$ (which implies the polynomial defined by ${P}^{G}$ evaluates to $v$ at the point $x_{3}$ ) or else $ξ$ is the single solution in $Ch$ for which ${P + [ξ] S}^{G}$ evaluates to $v$ at the point $x_{3}$ while ${P}^{G}$ itself does not. We avoid the latter case by restricting $∣ BadCh (tr^{'} ∣_{ξ}) ∣/∣ Ch ∣ \leq \frac{1}{∣ Ch ∣} \leq ϵ$ accordingly and can thus conclude that the polynomial defined by ${P}^{G}$ evaluates to $v$ at $x_{3}$ .

The remaining work deals strictly with the representations of group elements sent previously by the prover and their relationship with $P$ as well as the challenges chosen in each round of the protocol. We will simplify things first by using $p (X)$ to represent the polynomial defined by ${P}^{G}$ , as it is the case that this $p (X)$ corresponds exactly with the like-named polynomial in the protocol itself. We will make similar substitutions for the other group elements (and their corresponding polynomials) to aid in exposition, as the remainder of this proof is mainly tedious application of the Schwartz-Zippel lemma to upper bound the bad challenge space size for each of the remaining challenges in the protocol.

Recall that $P = Q^{'} + x_{4} i = 0 \sum n_{q} - 1 [x_{4}^{i}] Q_{i}$ , and so by substitution we have $p (X) = q^{'} (X) + x_{4} i = 0 \sum n_{q} - 1 x_{4}^{i} q_{i} (X)$ . Recall also that

$v = i = 0 \sum n_{q} - 1 x_{2}^{i} \frac{u _{i} - r _{i} ( x _{3} )}{j = 0 \prod n _{e} - 1 ( x _{3} - ω ^{(q_{i})_{j}} x )} + x_{4} i = 0 \sum n_{q} - 1 x_{4} u_{i}$

We have already established that $p (x_{3}) = v$ . Notice that the coefficients in the above expressions for $v$ and $P$ are fixed prior to the choice of $x_{4} \in Ch$ . By the Schwartz-Zippel lemma we have that only at most $n_{q} + 1$ possible choices of $x_{4}$ exist such that these expressions are satisfied and yet $q_{i} (x_{3}) \neq = u_{i}$ for any $i$ or

$q^{'} (x_{3}) \neq = i = 0 \sum n_{q} - 1 x_{2}^{i} \frac{u _{i} - r _{i} ( x _{3} )}{j = 0 \prod n _{e} - 1 ( x _{3} - ω ^{(q_{i})_{j}} x )}$

By restricting $∣ BadCh (tr^{'} ∣_{x_{4}}) ∣/∣ Ch ∣ \leq \frac{n _{q} + 1}{∣ Ch ∣} \leq ϵ$ we can conclude that all of the aforementioned inequalities are untrue. Now we can substitute $u_{i}$ with $q_{i} (x_{3})$ for all $i$ to obtain

$q^{'} (x_{3}) = i = 0 \sum n_{q} - 1 x_{2}^{i} \frac{q _{i} ( x _{3} ) - r _{i} ( x _{3} )}{j = 0 \prod n _{e} - 1 ( x _{3} - ω ^{(q_{i})_{j}} x )}$

Suppose that $q^{'} (X)$ (which is the polynomial defined by ${Q^{'}}^{G}$ , and is of degree at most $n - 1$ ) does not take the form

$i = 0 \sum n_{q} - 1 x_{2}^{i} \frac{q _{i} ( X ) - r _{i} ( X )}{j = 0 \prod n _{e} - 1 ( X - ω ^{(q_{i})_{j}} x )}$

and yet $q^{'} (X)$ agrees with this expression at $x_{3}$ as we've established above. By the Schwartz-Zippel lemma this can only happen for at most $n - 1$ choices of $x_{3} \in Ch$ and so by restricting $∣ BadCh (tr^{'} ∣_{x_{3}}) ∣/∣ Ch ∣ \leq \frac{n - 1}{∣ Ch ∣} \leq ϵ$ we obtain that

$q^{'} (X) = i = 0 \sum n_{q} - 1 x_{2}^{i} \frac{q _{i} ( X ) - r _{i} ( X )}{j = 0 \prod n _{e} - 1 ( X - ω ^{(q_{i})_{j}} x )}$

Next we will extract the coefficients of this polynomial in $x_{2}$ (which are themselves polynomials in formal indeterminate $X$ ) by again applying the Schwartz-Zippel lemma with respect to $x_{2}$ ; again, this leads to the restriction $∣ BadCh (tr^{'} ∣_{x_{2}}) ∣/∣ Ch ∣ \leq \frac{n _{q}}{∣ Ch ∣} \leq ϵ$ and we obtain the following polynomials of degree at most $n - 1$ for all $i \in [0, n_{q} - 1)$

$\frac{q _{i} ( X ) - r _{i} ( X )}{j = 0 \prod n _{e} - 1 ( X - ω ^{(q_{i})_{j}} x )}$

Having established that these are each non-rational polynomials of degree at most $n - 1$ we can then say (by the factor theorem) that for each $i \in [0, n_{q} - 1]$ and $j \in [0, n_{e} - 1]$ we have that $q_{i} (X) - r_{i} (X)$ has a root at $ω^{(q_{i})_{j}} x$ . Note that we can interpret each $q_{i} (X)$ as the restriction of a bivariate polynomial at the point $x_{1}$ whose degree with respect to $x_{1}$ is at most $n_{a} + 1$ and whose coefficients consist of various polynomials $a_{i}^{'} (X)$ (from the representation ${A_{i}^{'}}^{G}$ ) as well as $h^{'} (X)$ (from the representation ${H_{i}^{'}}^{G}$ ) and $r (X)$ (from the representation ${R}^{G}$ ). By similarly applying the Schwartz-Zippel lemma and restricting the challenge space with $∣ BadCh (tr^{'} ∣_{x_{1}}) ∣/∣ Ch ∣ \leq \frac{n _{a} + 1}{∣ Ch ∣} \leq ϵ$ we obtain (by construction of each $q_{i}^{'} (X)$ and $r_{i} (X)$ in steps 12 and 13 of the protocol) that the prover's claimed value of $r$ in step 9 is equal to $r (x)$ ; that the value $h$ computed by the verifier in step 13 is equal to $h^{'} (x)$ ; and that for all $i \in [0, n_{q} - 1]$ the prover's claimed values $(a_{i})_{j} = a_{i}^{'} (ω^{(p_{i})_{j}} x)$ for all $j \in [0, n_{e} - 1]$ .

By construction of $h^{'} (X)$ (from the representation ${H^{'}}^{G}$ ) in step 7 we know that $h^{'} (x) = h (x)$ where by $h (X)$ we refer to the polynomial of degree at most $(n_{g} - 1) \cdot (n - 1)$ whose coefficients correspond to the concatenated representations of each ${H_{i}}^{G}$ . As before, suppose that $h (X)$ does not take the form $g^{'} (X) / t (X)$ . Then because $h (X)$ is determined prior to the choice of $x$ then by the Schwartz-Zippel lemma we know that it would only agree with $g^{'} (X) / t (X)$ at $(n_{g} - 1) \cdot (n - 1)$ points at most if the polynomials were not equal. By restricting again $∣ BadCh (tr^{'} ∣_{x}) ∣/∣ Ch ∣ \leq \frac{( n _{g} - 1 ) \cdot ( n - 1 )}{∣ Ch ∣} \leq ϵ$ we obtain $h (X) = g^{'} (X) / t (X)$ and because $h (X)$ is a non-rational polynomial by the factor theorem we obtain that $g^{'} (X)$ vanishes over the domain $D$ .

We now have that $g^{'} (X)$ vanishes over $D$ but wish to show that $g (X, C_{0}, C_{1}, \dots)$ vanishes over $D$ at all points to complete the proof. This just involves a sequence of applying the same technique to each of the challenges; since the polynomial $g (\dots)$ has degree at most $n_{g} \cdot (n - 1)$ in any indeterminate by definition, and because each polynomial $a_{i} (X, C_{0}, C_{1}, ..., C_{i - 1}, \dots)$ is determined prior to the choice of concrete challenge $c_{i}$ by similarly bounding $∣ BadCh (tr^{'} ∣_{c_{i}}) ∣/∣ Ch ∣ \leq \frac{n _{g} \cdot ( n - 1 )}{∣ Ch ∣} \leq ϵ$ we ensure that $g (X, C_{0}, C_{1}, \dots)$ vanishes over $D$ , completing the proof.

Implementation

Halo 2 proofs

Proofs as opaque byte streams

In proving system implementations like bellman, there is a concrete Proof struct that encapsulates the proof data, is returned by a prover, and can be passed to a verifier.

halo2 does not contain any proof-like structures, for several reasons:

The Proof structures would contain vectors of (vectors of) curve points and scalars. This complicates serialization/deserialization of proofs because the lengths of these vectors depend on the configuration of the circuit. However, we didn't want to encode the lengths of vectors inside of proofs, because at runtime the circuit is fixed, and thus so are the proof sizes.
It's easy to accidentally put stuff into a Proof structure that isn't also placed in the transcript, which is a hazard when developing and implementing a proving system.
We needed to be able to create multiple PLONK proofs at the same time; these proofs share many different substructures when they are for the same circuit.

Instead, halo2 treats proof objects as opaque byte streams. Creation and consumption of these byte streams happens via the transcript:

The TranscriptWrite trait represents something that we can write proof components to (at proving time).
The TranscriptRead trait represents something that we can read proof components from (at verifying time).

Crucially, implementations of TranscriptWrite are responsible for simultaneously writing to some std::io::Write buffer at the same time that they hash things into the transcript, and similarly for TranscriptRead/std::io::Read.

As a bonus, treating proofs as opaque byte streams ensures that verification accounts for the cost of deserialization, which isn't negligible due to point compression.

Proof encoding

A Halo 2 proof, constructed over a curve $E (F_{p})$ , is encoded as a stream of:

Points $P \in E (F_{p})$ (for commitments to polynomials), and
Scalars $s \in F_{q}$ (for evaluations of polynomials, and blinding values).

For the Pallas and Vesta curves, both points and scalars have 32-byte encodings, meaning that proofs are always a multiple of 32 bytes.

The halo2 crate supports proving multiple instances of a circuit simultaneously, in order to share common proof components and protocol logic.

In the encoding description below, we will use the following circuit-specific constants:

$k$ - the size parameter of the circuit (which has $2^{k}$ rows).
$A$ - the number of advice columns.
$F$ - the number of fixed columns.
$I$ - the number of instance columns.
$L$ - the number of lookup arguments.
$P$ - the number of permutation arguments.
$Col_{P}$ - the number of columns involved in permutation argument $P$ .
$D$ - the maximum degree for the quotient polynomial.
$Q_{A}$ - the number of advice column queries.
$Q_{F}$ - the number of fixed column queries.
$Q_{I}$ - the number of instance column queries.
$M$ - the number of instances of the circuit that are being proven simultaneously.

As the proof encoding directly follows the transcript, we can break the encoding into sections matching the Halo 2 protocol:

PLONK commitments:
- $A$ points (repeated $M$ times).
- $2 L$ points (repeated $M$ times).
- $P$ points (repeated $M$ times).
- $L$ points (repeated $M$ times).
Vanishing argument:
- $D - 1$ points.
- $Q_{I}$ scalars (repeated $M$ times).
- $Q_{A}$ scalars (repeated $M$ times).
- $Q_{F}$ scalars.
- $D - 1$ scalars.
PLONK evaluations:
- $(2 + Col_{P}) \times P$ scalars (repeated $M$ times).
- $5 L$ scalars (repeated $M$ times).
Multiopening argument:
- 1 point.
- 1 scalar per set of points in the multiopening argument.
Polynomial commitment scheme:
- $1 + 2 k$ points.
- $2$ scalars.

Fields

The Pasta curves that we use in halo2 are designed to be highly 2-adic, meaning that a large $2^{S}$ multiplicative subgroup exists in each field. That is, we can write $p - 1 \equiv 2^{S} \cdot T$ with $T$ odd. For both Pallas and Vesta, $S = 32$ ; this helps to simplify the field implementations.

Sarkar square-root algorithm (table-based variant)

We use a technique from Sarkar2020 to compute square roots in halo2. The intuition behind the algorithm is that we can split the task into computing square roots in each multiplicative subgroup.

Suppose we want to find the square root of $u$ modulo one of the Pasta primes $p$ , where $u$ is a non-zero square in $Z_{p}^{\times}$ . We define a $2^{S}$ root of unity $g = z^{T}$ where $z$ is a non-square in $Z_{p}^{\times}$ , and precompute the following tables:

$g t ab = g^{0} (g^{2^{8}})^{0} (g^{2^{16}})^{0} (g^{2^{24}})^{0} g^{1} (g^{2^{8}})^{1} (g^{2^{16}})^{1} (g^{2^{24}})^{1} ... ... ... ... g^{2^{8} - 1} (g^{2^{8}})^{2^{8} - 1} (g^{2^{16}})^{2^{8} - 1} (g^{2^{24}})^{2^{8} - 1}$

$in v t ab = [(g^{- 2^{24}})^{0} (g^{- 2^{24}})^{1} ... (g^{- 2^{24}})^{2^{8} - 1}]$

Let $v = u^{(T - 1) /2}$ . We can then define $x = uv \cdot v = u^{T}$ as an element of the $2^{S}$ multiplicative subgroup.

Let $x_{3} = x, x_{2} = x_{3}^{2^{8}}, x_{1} = x_{2}^{2^{8}}, x_{0} = x_{1}^{2^{8}} .$

i = 0, 1

Using $in v t ab$ , we lookup $t_{0}$ such that $x_{0} = (g^{- 2^{24}})^{t_{0}} ⟹ x_{0} \cdot g^{t_{0} \cdot 2^{24}} = 1.$

Define $α_{1} = x_{1} \cdot (g^{2^{16}})^{t_{0}} .$

i = 2

Lookup $t_{1}$ s.t. $α_{1} = (g^{- 2^{24}})^{t_{1}} ⟹ x_{1} \cdot (g^{2^{16}})^{t_{0}} = (g^{- 2^{24}})^{t_{1}} ⟹ x_{1} \cdot g^{(t_{0} + 2^{8} \cdot t_{1}) \cdot 2^{16}} = 1.$

Define $α_{2} = x_{2} \cdot (g^{2^{8}})^{t_{0} + 2^{8} \cdot t_{1}} .$

i = 3

Lookup $t_{2}$ s.t.

$α_{2} = (g^{- 2^{24}})^{t_{2}} ⟹ x_{2} \cdot (g^{2^{8}})^{t_{0} + 2^{8} \cdot t_{1}} = (g^{- 2^{24}})^{t_{2}} ⟹ x_{2} \cdot g^{(t_{0} + 2^{8} \cdot t_{1} + 2^{16} \cdot t_{2}) \cdot 2^{8}} = 1.$

Define $α_{3} = x_{3} \cdot g^{t_{0} + 2^{8} \cdot t_{1} + 2^{16} \cdot t_{2}} .$

Final result

Lookup $t_{3}$ such that

$α_{3} = (g^{- 2^{24}})^{t_{3}} ⟹ x_{3} \cdot g^{t_{0} + 2^{8} \cdot t_{1} + 2^{16} \cdot t_{2}} = (g^{- 2^{24}})^{t_{3}} ⟹ x_{3} \cdot g^{t_{0} + 2^{8} \cdot t_{1} + 2^{16} \cdot t_{2} + 2^{24} \cdot t_{3}} = 1.$

Let $t = t_{0} + 2^{8} \cdot t_{1} + 2^{16} \cdot t_{2} + 2^{24} \cdot t_{3}$ .

We can now write $x_{3} \cdot g^{t} = 1 ⟹ ⟹ ⟹ ⟹ x_{3} u v^{2} uv uv \cdot g^{t /2} = = = = g^{- t} g^{- t} v^{- 1} \cdot g^{- t} v^{- 1} \cdot g^{- t /2} .$

Squaring the RHS, we observe that $(v^{- 1} g^{- t /2})^{2} = v^{- 2} g^{- t} = u .$ Therefore, the square root of $u$ is $uv \cdot g^{t /2}$ ; the first part we computed earlier, and the second part can be computed with three multiplications using lookups in $g t ab$ .

Selector combining

Heavy use of custom gates can lead to a circuit defining many binary selectors, which would increase proof size and verification time.

This section describes an optimization, applied automatically by halo2, that combines binary selector columns into fewer fixed columns.

The basic idea is that if we have $ℓ$ binary selectors labelled $1, \dots, ℓ$ that are enabled on disjoint sets of rows, then under some additional conditions we can combine them into a single fixed column, say $q$ , such that: $q = {k, 0, if the selector labelled k is 1 if all these selectors are 0.$

However, the devil is in the detail.

The halo2 API allows defining some selectors to be "simple selectors", subject to the following condition:

Every polynomial constraint involving a simple selector $s$ must be of the form $s \cdot t = 0,$ where $t$ is a polynomial involving no simple selectors.

Suppose that $s$ has label $k$ in some set of $ℓ$ simple selectors that are combined into $q$ as above. Then this condition ensures that replacing $s$ by $q \cdot \prod_{1 \leq h \leq ℓ, h \neq = k} (h - q)$ will not change the meaning of any constraints.

It would be possible to relax this condition by ensuring that every use of a binary selector is substituted by a precise interpolation of its value from the corresponding combined selector. However,

the restriction simplifies the implementation, developer tooling, and human understanding and debugging of the resulting constraint system;

the scope to apply the optimization is not impeded very much by this restriction for typical circuits.

Note that replacing $s$ by $q \cdot \prod_{1 \leq h \leq ℓ, h \neq = k} (h - q)$ will increase the degree of constraints selected by $s$ by $ℓ - 1$ , and so we must choose the selectors that are combined in such a way that the maximum degree bound is not exceeded.

Identifying selectors that can be combined

We need a partition of the overall set of selectors $s_{0}, \dots, s_{m - 1}$ into subsets (called "combinations"), such that no two selectors in a combination are enabled on the same row.

Labels must be unique within a combination, but they are not unique across combinations. Do not confuse a selector's index with its label.

Suppose that we are given $max_degree$ , the degree bound of the circuit.

We use the following algorithm:

Leave nonsimple selectors unoptimized, i.e. map each of them to a separate fixed column.
Check (or ensure by construction) that all polynomial constraints involving each simple selector $s_{i}$ are of the form $s_{i} \cdot t_{i, j} = 0$ where $t_{i, j}$ do not involve any simple selectors. For each $i$ , record the maximum degree of any $t_{i, j}$ as $d_{i}^{max}$ .
Compute a binary "exclusion matrix" $X$ such that $X_{j, i}$ is $1$ whenever $i \neq = j$ and $s_{i}$ and $s_{j}$ are enabled on the same row; and $0$ otherwise.

Since $X$ is symmetric and is zero on the diagonal, we can represent it by either its upper or lower triangular entries. The rest of the algorithm is guaranteed only to access only the entries $X_{j, i}$ where $j > i$ .
Initialize a boolean array $added_{0.. k - 1}$ to all $false$ .

$added_{i}$ will record whether $s_{i}$ has been included in any combination.
Iterate over the $s_{i}$ that have not yet been added to any combination:
- a. Add $s_{i}$ to a fresh combination $c$ , and set $added_{i} = true$ .
- b. Let mut $d := d_{i}^{max} - 1$ .
  
  $d$ is used to keep track of the largest degree, excluding the selector expression, of any gate involved in the combination $c$ so far.
- c. Iterate over all the selectors $s_{j}$ for $j > i$ that can potentially join $c$ , i.e. for which $added_{j}$ is false:
  - i. (Optimization) If $d + len (c) = max_degree$ , break to the outer loop, since no more selectors can be added to $c$ .
  - ii. Let $d^{new} = max (d, d_{j}^{max} - 1)$ .
  - iii. If $X_{j, i^{'}}$ is $true$ for any $i^{'}$ in $c$ , or if $d^{new} + (len (c) + 1) > max_degree$ , break to the outer loop.
    
    $d^{new} + (len (c) + 1)$ is the maximum degree, including the selector expression, of any constraint that would result from adding $s_{j}$ to the combination $c$ .
  - iv. Set $d := d^{new}$ .
  - v. Add $s_{j}$ to $c$ and set $added_{j} := true$ .
- d. Allocate a fixed column $q_{c}$ , initialized to all-zeroes.
- e. For each selector $s^{'} \in c$ :
  - i. Label $s^{'}$ with a distinct index $k$ where $1 \leq k \leq len (c)$ .
  - ii. Record that $s^{'}$ should be substituted with $q_{c} \cdot \prod_{1 \leq h \leq len (c), h \neq = k} (h - q_{c})$ in all gate constraints.
  - iii. For each row $r$ such that $s^{'}$ is enabled at $r$ , assign the value $k$ to $q_{c}$ at row $r$ .

The above algorithm is implemented in halo2_proofs/src/plonk/circuit/compress_selectors.rs. This is used by the compress_selectors function of halo2_proofs/src/plonk/circuit.rs which does the actual substitutions.

Writing circuits to take best advantage of selector combining

For this optimization it is beneficial for a circuit to use simple selectors as far as possible, rather than fixed columns. It is usually not beneficial to do manual combining of selectors, because the resulting fixed columns cannot take part in the automatic combining. That means that to get comparable results you would need to do a global optimization manually, which would interfere with writing composable gadgets.

Whether two selectors are enabled on the same row (and so are inhibited from being combined) depends on how regions are laid out by the floor planner. The currently implemented floor planners do not attempt to take this into account. We suggest not worrying about it too much — the gains that can be obtained by cajoling a floor planner to shuffle around regions in order to improve combining are likely to be relatively small.

Gadgets

In this section we document the gadgets and chip designs provided in the halo2_gadgets crate.

Neither these gadgets, nor their implementations, have been reviewed, and they should not be used in production.

Elliptic Curves

`EccChip`

halo2_gadgets provides a chip that implements EccInstructions using 10 advice columns. The chip is currently restricted to the Pallas curve, but will be extended to support the Vesta curve in the near future.

Chip assumptions

A non-exhaustive list of assumptions made by EccChip:

$0$ is not an $x$ -coordinate of a valid point on the curve.
- Holds for Pallas because $5$ is not square in $F_{q}$ .
$0$ is not a $y$ -coordinate of a valid point on the curve.
- Holds for Pallas because $- 5$ is not a cube in $F_{q}$ .

Layout

The following table shows how columns are used by the gates for various chip sub-areas:

$W$ - witnessing points.
$A I$ - incomplete point addition.
$A C$ - complete point addition.
$MF$ - Fixed-base scalar multiplication.
$M V I$ - variable-base scalar multiplication, incomplete rounds.
$M V C$ - variable-base scalar multiplication, complete rounds.
$M V O$ - variable-base scalar multiplication, overflow check.

$Sub-area W A I A C MF M V I M V C a_{0} x x_{p} x_{p} x_{p} x_{p} x_{p} a_{1} y y_{p} y_{p} y_{p} y_{p} y_{p} a_{2} x_{q} x_{r} x_{q} x_{r} x_{q} x_{r} λ_{2}^{l o} x_{q} x_{r} a_{3} y_{q} y_{r} y_{q} y_{r} y_{q} y_{r} x_{A}^{hi} y_{q} y_{r} a_{4} λ window λ_{1}^{hi} λ a_{5} α u λ_{2}^{hi} α a_{6} β z^{l o} β a_{7} γ x_{A}^{l o} γ a_{8} δ λ_{1}^{l o} δ a_{9} z^{hi} z^{co m pl e t e}$

Witnessing points

We represent elliptic curve points in the circuit in their affine representation $(x, y)$ . The identity is represented as the pseudo-coordinate $(0, 0)$ , which we assume is not a valid point on the curve.

Non-identity points

To constrain a coordinate pair $(x, y)$ as representing a valid point on the curve, we directly check the curve equation. For Pallas and Vesta, this is:

$y^{2} = x^{3} + 5$

$Degree 4 Constraint q_{point}^{non-id} \cdot (y^{2} - x^{3} - 5) = 0$

Points including the identity

To allow $(x, y)$ to represent either a valid point on the curve, or the pseudo-coordinate $(0, 0)$ , we define a separate gate that enforces the curve equation check unless both $x$ and $y$ are zero.

$Degree 55 Constraint (q_{point} \cdot x) \cdot (y^{2} - x^{3} - 5) = 0 (q_{point} \cdot y) \cdot (y^{2} - x^{3} - 5) = 0$

We will use formulae for curve arithmetic using affine coordinates on short Weierstrass curves, derived from section 4.1 of Hüseyin Hışıl's thesis.

Incomplete addition

Inputs: $P = (x_{p}, y_{p}), Q = (x_{q}, y_{q})$
Output: $R = P ⸭ Q = (x_{r}, y_{r})$

The formulae from Hışıl's thesis are:

$x_{3} = (\frac{y _{1} - y _{2}}{x _{1} - x _{2}})^{2} - x_{1} - x_{2}$
$y_{3} = \frac{y _{1} - y _{2}}{x _{1} - x _{2}} \cdot (x_{1} - x_{3}) - y_{1} .$

Rename $(x_{1}, y_{1})$ to $(x_{q}, y_{q})$ , $(x_{2}, y_{2})$ to $(x_{p}, y_{p})$ , and $(x_{3}, y_{3})$ to $(x_{r}, y_{r})$ , giving

$x_{r} = (\frac{y _{q} - y _{p}}{x _{q} - x _{p}})^{2} - x_{q} - x_{p}$
$y_{r} = \frac{y _{q} - y _{p}}{x _{q} - x _{p}} \cdot (x_{q} - x_{r}) - y_{q}$

which is equivalent to

$x_{r} + x_{q} + x_{p} = (\frac{y _{p} - y _{q}}{x _{p} - x _{q}})^{2}$
$y_{r} + y_{q} = \frac{y _{p} - y _{q}}{x _{p} - x _{q}} \cdot (x_{q} - x_{r}) .$

Assuming $x_{p} \neq = x_{q}$ , we have

$and ⟺ ⟺ ⟺ ⟺ x_{r} + x_{q} + x_{p} (x_{r} + x_{q} + x_{p}) \cdot (x_{p} - x_{q})^{2} (x_{r} + x_{q} + x_{p}) \cdot (x_{p} - x_{q})^{2} - (y_{p} - y_{q})^{2} y_{r} + y_{q} (y_{r} + y_{q}) \cdot (x_{p} - x_{q}) (y_{r} + y_{q}) \cdot (x_{p} - x_{q}) - (y_{p} - y_{q}) \cdot (x_{q} - x_{r}) = = = = = = (\frac{y _{p} - y _{q}}{x _{p} - x _{q}})^{2} (y_{p} - y_{q})^{2} 0 \frac{y _{p} - y _{q}}{x _{p} - x _{q}} \cdot (x_{q} - x_{r}) (y_{p} - y_{q}) \cdot (x_{q} - x_{r}) 0.$

So we get the constraints:

$(x_{r} + x_{q} + x_{p}) \cdot (x_{p} - x_{q})^{2} - (y_{p} - y_{q})^{2} = 0$
- Note that this constraint is unsatisfiable for $P ⸭ (- P)$ (when $P \neq = O$ ), and so cannot be used with arbitrary inputs.
$(y_{r} + y_{q}) \cdot (x_{p} - x_{q}) - (y_{p} - y_{q}) \cdot (x_{q} - x_{r}) = 0.$

Constraints

$Degree 43 Constraint q_{add-incomplete} \cdot ((x_{r} + x_{q} + x_{p}) \cdot (x_{p} - x_{q})^{2} - (y_{p} - y_{q})^{2}) = 0 q_{add-incomplete} \cdot ((y_{r} + y_{q}) \cdot (x_{p} - x_{q}) - (y_{p} - y_{q}) \cdot (x_{q} - x_{r})) = 0$

Complete addition

$O O (x_{p}, y_{p}) (x, y) (x, y) (x_{p}, y_{p}) + + + + + + O (x_{q}, y_{q}) O (x, y) (x, - y) (x_{q}, y_{q}) = O = (x_{q}, y_{q}) = (x_{p}, y_{p}) = [2] (x, y) = O = (x_{p}, y_{p}) ⸭ (x_{q}, y_{q}), if x_{p} \neq = x_{q} .$

Suppose that we represent $O$ as $(0, 0)$ . ( $0$ is not an $x$ -coordinate of a valid point because we would need $y^{2} = x^{3} + 5$ , and $5$ is not square in $F_{q}$ . Also $0$ is not a $y$ -coordinate of a valid point because $- 5$ is not a cube in $F_{q}$ .)

$P + Q (x_{p}, y_{p}) + (x_{q}, y_{q}) λ x_{r} y_{r} = R = (x_{r}, y_{r}) = \frac{y _{q} - y _{p}}{x _{q} - x _{p}} = λ^{2} - x_{p} - x_{q} = λ (x_{p} - x_{r}) - y_{p}$

For the doubling case, Hışıl's thesis tells us that $λ$ has to instead be computed as $\frac{3 x ^{2}}{2 y}$ .

Define $inv0 (x) = {0, 1/ x, if x = 0 otherwise.$

Witness $α, β, γ, δ, λ$ where:

$α = β = γ = δ = λ = inv0 (x_{q} - x_{p}) inv0 (x_{p}) inv0 (x_{q}) {inv0 (y_{q} + y_{p}), 0, if x_{q} = x_{p} otherwise ⎩ ⎨ ⎧ \frac{y _{q} - y _{p}}{x _{q} - x _{p}}, \frac{3 x _{p} ^{2}}{2 y _{p}} 0, if x_{q} \neq = x_{p} if x_{q} = x_{p} \land y_{p} \neq = 0 otherwise.$

Constraints

$Degree 456666444444 Constraint q_{add} \cdot (x_{q} - x_{p}) \cdot ((x_{q} - x_{p}) \cdot λ - (y_{q} - y_{p})) q_{add} \cdot (1 - (x_{q} - x_{p}) \cdot α) \cdot (2 y_{p} \cdot λ - 3 x_{p}^{2}) q_{add} \cdot x_{p} \cdot x_{q} \cdot (x_{q} - x_{p}) \cdot (λ^{2} - x_{p} - x_{q} - x_{r}) q_{add} \cdot x_{p} \cdot x_{q} \cdot (x_{q} - x_{p}) \cdot (λ \cdot (x_{p} - x_{r}) - y_{p} - y_{r}) q_{add} \cdot x_{p} \cdot x_{q} \cdot (y_{q} + y_{p}) \cdot (λ^{2} - x_{p} - x_{q} - x_{r}) q_{add} \cdot x_{p} \cdot x_{q} \cdot (y_{q} + y_{p}) \cdot (λ \cdot (x_{p} - x_{r}) - y_{p} - y_{r}) q_{add} \cdot (1 - x_{p} \cdot β) \cdot (x_{r} - x_{q}) q_{add} \cdot (1 - x_{p} \cdot β) \cdot (y_{r} - y_{q}) q_{add} \cdot (1 - x_{q} \cdot γ) \cdot (x_{r} - x_{p}) q_{add} \cdot (1 - x_{q} \cdot γ) \cdot (y_{r} - y_{p}) q_{add} \cdot (1 - (x_{q} - x_{p}) \cdot α - (y_{q} + y_{p}) \cdot δ) \cdot x_{r} q_{add} \cdot (1 - (x_{q} - x_{p}) \cdot α - (y_{q} + y_{p}) \cdot δ) \cdot y_{r} = = = = = = = = = = = = 000000000000 Meaning x_{q} \neq = x_{p} ⟹ λ = \frac{y _{q} - y _{p}}{x _{q} - x _{p}} {x_{q} = x_{p} \land y_{p} \neq = 0 ⟹ λ = \frac{3 x _{p} ^{2}}{2 y _{p}} x_{q} = x_{p} \land y_{p} = 0 ⟹ x_{p} = 0 x_{p} \neq = 0 \land x_{q} \neq = 0 \land x_{q} \neq = x_{p} ⟹ x_{r} = λ^{2} - x_{p} - x_{q} x_{p} \neq = 0 \land x_{q} \neq = 0 \land x_{q} \neq = x_{p} ⟹ y_{r} = λ \cdot (x_{p} - x_{r}) - y_{p} x_{p} \neq = 0 \land x_{q} \neq = 0 \land y_{q} \neq = - y_{p} ⟹ x_{r} = λ^{2} - x_{p} - x_{q} x_{p} \neq = 0 \land x_{q} \neq = 0 \land y_{q} \neq = - y_{p} ⟹ y_{r} = λ \cdot (x_{p} - x_{r}) - y_{p} x_{p} = 0 ⟹ x_{r} = x_{q} x_{p} = 0 ⟹ y_{r} = y_{q} x_{q} = 0 ⟹ x_{r} = x_{p} x_{q} = 0 ⟹ y_{r} = y_{p} x_{q} = x_{p} \land y_{q} = - y_{p} ⟹ x_{r} = 0 x_{q} = x_{p} \land y_{q} = - y_{p} ⟹ y_{r} = 0$

Max degree: 6

Analysis of constraints

Propositions:

$(1) (2) (3) (4) (5) (6) x_{q} \neq = x_{p} ⟹ λ = (y_{q} - y_{p}) / (x_{q} - x_{p}) (x_{q} = x_{p}) \land y_{p} \neq = 0 ⟹ λ = 3 x_{p}^{2} /2 y_{p} (x_{p} \neq = 0) \land (x_{q} \neq = 0) \land ((x_{q} \neq = x_{p}) \lor (y_{q} \neq = - y_{p})) ⟹ (x_{r} = λ^{2} - x_{p} - x_{q}) \land (y_{r} = λ \cdot (x_{p} - x_{r}) - y_{p}) x_{p} = 0 ⟹ (x_{r}, y_{r}) = (x_{q}, y_{q}) x_{q} = 0 ⟹ (x_{r}, y_{r}) = (x_{p}, y_{p}) x_{q} = x_{p} \land y_{q} = - y_{p} ⟹ (x_{r}, y_{r}) = (0, 0)$

Cases:

$(x_{p}, y_{p}) + (x_{q}, y_{q}) = (x_{r}, y_{r})$

Note that we rely on the fact that $0$ is not a valid $x$ -coordinate or $y$ -coordinate of a point on the Pallas curve other than $O$ .

$(0, 0) + (0, 0)$
- Completeness:
  
  $(1) (2) (3) (4) (5) (6) holds because x_{q} = x_{p} holds because y_{p} = 0 holds because x_{p} = 0 holds because (x_{r}, y_{r}) = (x_{q}, y_{q}) = (0, 0) holds because (x_{r}, y_{r}) = (x_{p}, y_{p}) = (0, 0) holds because (x_{r}, y_{r}) = (0, 0) .$
- Soundness: $(x_{r}, y_{r}) = (0, 0)$ is the only solution to $(6) .$
$(x, y) + (0, 0)$ for $(x, y) \neq = (0, 0)$
- Completeness:
  
  $(1) (2) (3) (4) (5) (6) holds because x_{q} \neq = x_{p}, therefore λ = (y_{q} - y_{p}) / (x_{q} - x_{p}) is a solution holds because x_{q} \neq = x_{p}, therefore α = (x_{q} - x_{p})^{- 1} is a solution holds because x_{q} = 0 holds because x_{p} \neq = 0, therefore β = x_{p}^{- 1} is a solution holds because (x_{r}, y_{r}) = (x_{p}, y_{p}) holds because x_{q} \neq = x_{p}, therefore α = (x_{q} - x_{p})^{- 1} and δ = 0 is a solution.$
- Soundness: $(x_{r}, y_{r}) = (x_{p}, y_{p})$ is the only solution to $(5) .$
$(0, 0) + (x, y)$ for $(x, y) \neq = (0, 0)$
- Completeness:
  
  $(1) (2) (3) (4) (5) (6) holds because x_{q} \neq = x_{p}, therefore λ = (y_{q} - y_{p}) / (x_{q} - x_{p}) is a solution holds because x_{q} \neq = x_{p}, therefore α = (x_{q} - x_{p})^{- 1} is a solution holds because x_{p} = 0 holds because x_{p} = 0 only when (x_{r}, y_{r}) = (x_{q}, y_{q}) holds because x_{q} \neq = 0, therefore γ = x_{q}^{- 1} is a solution holds because x_{q} \neq = x_{p}, therefore α = (x_{q} - x_{p})^{- 1} and δ = 0 is a solution.$
- Soundness: $(x_{r}, y_{r}) = (x_{q}, y_{q})$ is the only solution to $(4) .$
$(x, y) + (x, y)$ for $(x, y) \neq = (0, 0)$
- Completeness:
  
  $(1) (2) (3) (4) (5) (6) holds because x_{q} = x_{p} holds because x_{q} = x_{p} \land y_{p} \neq = 0, therefore λ = 3 x_{p}^{2} /2 y_{p} is a solution holds because x_{r} = λ^{2} - x_{p} - x_{q} \land y_{r} = λ \cdot (x_{p} - x_{r}) - y_{p} in this case holds because x_{p} \neq = 0, therefore β = x_{p}^{- 1} is a solution holds because x_{p} \neq = 0, therefore γ = x_{q}^{- 1} is a solution holds because x_{q} = x_{p} and y_{q} \neq = - y_{p}, therefore α = 0 and δ = (y_{q} + y_{p})^{- 1} is a solution.$
- Soundness: $λ$ is computed correctly, and $(x_{r}, y_{r}) = (λ^{2} - x_{p} - x_{q}, λ \cdot (x_{p} - x_{r}) - y_{p})$ is the only solution.
$(x, y) + (x, - y)$ for $(x, y) \neq = (0, 0)$
- Completeness:
  
  $(1) (2) (3) (4) (5) (6) holds because x_{q} = x_{p} holds because x_{q} = x_{p} \land y_{p} \neq = 0, therefore λ = 3 x_{p}^{2} /2 y_{p} is a solution (although λ is not used in this case) holds because x_{q} = x_{p} and y_{q} = - y_{p} holds because x_{p} \neq = 0, therefore β = x_{p}^{- 1} is a solution holds because x_{q} \neq = 0, therefore γ = x_{q}^{- 1} is a solution holds because (x_{r}, y_{r}) = (0, 0)$
- Soundness: $(x_{r}, y_{r}) = (0, 0)$ is the only solution to $(6) .$
$(x_{p}, y_{p}) + (x_{q}, y_{q})$ for $(x_{p}, y_{p}) \neq = (0, 0)$ and $(x_{q}, y_{q}) \neq = (0, 0)$ and $x_{p} \neq = x_{q}$
- Completeness:
  
  $(1) (2) (3) (4) (5) (6) holds because x_{q} \neq = x_{p}, therefore λ = (y_{q} - y_{p}) / (x_{q} - x_{p}) is a solution holds because x_{q} \neq = x_{p}, therefore α = (x_{q} - x_{p})^{- 1} is a solution holds because x_{r} = λ^{2} - x_{p} - x_{q} \land y_{r} = λ \cdot (x_{p} - x_{r}) - y_{p} in this case holds because x_{p} \neq = 0, therefore β = x_{p}^{- 1} is a solution holds because x_{q} \neq = 0, therefore γ = x_{q}^{- 1} is a solution holds because x_{q} \neq = x_{p}, therefore α = (x_{q} - x_{p})^{- 1} and δ = 0 is a solution.$
- Soundness: $λ$ is computed correctly, and $(x_{r}, y_{r}) = (λ^{2} - x_{p} - x_{q}, λ \cdot (x_{p} - x_{r}) - y_{p})$ is the only solution.

Fixed-base scalar multiplication

There are $6$ fixed bases in the Orchard protocol:

$K^{Orchard}$ , used in deriving the nullifier;
$G^{Orchard}$ , used in spend authorization;
$R$ base for $NoteCommit^{Orchard}$ ;
$V$ and $R$ bases for $ValueCommit^{Orchard}$ ; and
$R$ base for $Commit^{ivk}$ .

Decompose scalar

We support fixed-base scalar multiplication with three types of scalars:

Full-width scalar

A $255$ -bit scalar from $F_{q}$ . We decompose a full-width scalar $α$ into $85$ $3$ -bit windows:

$α = k_{0} + k_{1} \cdot (2^{3})^{1} + \dots + k_{84} \cdot (2^{3})^{84}, k_{i} \in [0.. 2^{3}) .$

The scalar multiplication will be computed correctly for $k_{0..84}$ representing any integer in the range $[0, 2^{255})$ - that is, the scalar is allowed to be non-canonical.

We range-constrain each $3$ -bit word of the scalar decomposition using a polynomial range-check constraint: $Degree 9 Constraint q_{mul_fixed_full} \cdot range_check (word, 2^{3}) = 0$ where $range_check (word, range) = word \cdot (1 - word) \dots (range - 1 - word) .$

Base field element

We support using a base field element as the scalar in fixed-base multiplication. This occurs, for example, in the scalar multiplication for the nullifier computation of the Action circuit $DeriveNullifie r_{nk} = Extract_{P} ([(PR F_{nk}^{nfOrchard} (ρ) + ψ) mod q_{P}] K^{Orchard} + cm)$ : here, the scalar $[(PR F_{nk}^{nfOrchard} (ρ) + ψ) mod q_{P}]$ is the result of a base field addition.

Decompose the base field element $α$ into three-bit windows, and range-constrain each window, using the short range decomposition gadget in strict mode, with $W = 85, K = 3.$

If $k_{0..84}$ is witnessed directly then no issue of canonicity arises. However, because the scalar is given as a base field element here, care must be taken to ensure a canonical representation, since $2^{255} > p$ . That is, we must check that $0 \leq α < p,$ where $p$ the is Pallas base field modulus $p = 2^{254} + t_{p} = 2^{254} + 45560315531419706090280762371685220353.$ Note that $t_{p} < 2^{130} .$

To do this, we decompose $α$ into three pieces: $α = α_{0} (252 bits) ∣∣ α_{1} (2 bits) ∣∣ α_{2} (1 bit) .$

We check the correctness of this decomposition by: $Degree 532 Constraint q_{canon-base-field} \cdot range_check (α_{1}, 2^{2}) = 0 q_{canon-base-field} \cdot bool_check (α_{2}) = 0 q_{canon-base-field} \cdot (z_{84} - (α_{1} + α_{2} \cdot 2^{2})) = 0$ If the MSB $α_{2} = 0$ is not set, then $α < 2^{254} < p .$ However, in the case where $α_{2} = 1$ , we must check:

$α_{2} = 1 ⟹ α_{1} = 0;$
$α_{2} = 1 ⟹ α_{0} < t_{p}$ :
- $α_{2} = 1 ⟹ 0 \leq α_{0} < 2^{130}$ ,
- $α_{2} = 1 ⟹ 0 \leq α_{0} + 2^{130} - t_{p} < 2^{130}$

To check that $0 \leq α_{0} < 2^{130},$ we make use of the three-bit running sum decomposition:

Firstly, we constrain $α_{0}$ to be a $132$ -bit value by enforcing its high $120$ bits to be all-zero. We can get $alpha_0_hi_120$ from the decomposition: $z_{44} ⟹ alpha_0_hi_120 = k_{44} + 2^{3} k_{45} + \dots + 2^{3 \cdot (84 - 44)} k_{84} = z_{44} - 2^{3 \cdot (84 - 44)} k_{84} = z_{44} - 2^{3 \cdot (40)} z_{84} .$
Then, we constrain bits $130.. = 131$ of $α_{0}$ to be zeroes; in other words, we constrain the three-bit word $k_{43} = α [129.. = 131] = α_{0} [129.. = 131] \in {0, 1} .$ We make use of the running sum decomposition to obtain $k_{43} = z_{43} - z_{44} \cdot 2^{3} .$

Define $α_{0}^{'} = α_{0} + 2^{130} - t_{p}$ . To check that $0 \leq α_{0}^{'} < 2^{130},$ we use 13 ten-bit lookups, where we constrain the $z_{13}$ running sum output of the lookup to be $0$ if $α_{2} = 1.$ $Degree 23343 Constraint q_{canon-base-field} \cdot (α_{0}^{'} - (α_{0} + 2^{130} - t_{P})) = 0 q_{canon-base-field} \cdot α_{2} \cdot α_{1} = 0 q_{canon-base-field} \cdot α_{2} \cdot alpha_0_hi_120 = 0 q_{canon-base-field} \cdot α_{2} \cdot bool_check (k_{43}) = 0 q_{canon-base-field} \cdot α_{2} \cdot z_{13} (lookup (α_{0}^{'}, 13)) = 0 Comment α_{2} = 1 ⟹ α_{1} = 0 Constrain α_{0} to be a 132-bit value Constrain α_{0} [130.. = 131] to 0 α_{2} = 1 ⟹ 0 \leq α_{0}^{'} < 2^{130}$

Short signed scalar

A short signed scalar is witnessed as a magnitude $m$ and sign $s$ such that $s \in {- 1, 1} m \in [0, 2^{64}) v^{old} - v^{new} = s \cdot m .$

This is used for $ValueCommi t^{Orchard}$ . We want to compute $ValueCommi t_{rcv}^{Orchard} (v^{old} - v^{new}) = [v^{old} - v^{new}] V + [rcv] R$ , where $- (2^{64} - 1) \leq v^{old} - v^{new} \leq 2^{64} - 1$

$v^{old}$ and $v^{new}$ are each already constrained to $64$ bits (by their use as inputs to $NoteCommi t^{Orchard}$ ).

Decompose the magnitude $m$ into three-bit windows, and range-constrain each window, using the short range decomposition gadget in strict mode, with $W = 22, K = 3.$

We have two additional constraints: $Degree 33 Constraint q_{mul_fixed_short} \cdot bool_check (k_{21}) = 0 q_{mul_fixed_short} \cdot (s^{2} - 1) = 0 Comment The last window must be a single bit. The sign must be 1 or - 1.$ where $bool_check (x) = x \cdot (1 - x)$ .

Load fixed base

Then, we precompute multiples of the fixed base $B$ for each window. This takes the form of a window table: $M [0.. W) [0..8)$ such that:

for the first (W-1) rows $M [0.. (W - 1)) [0..8)$ : $M [w] [k] = [(k + 2) \cdot (2^{3})^{w}] B$
in the last row $M [W - 1] [0..8)$ : $M [w] [k] = [k \cdot (2^{3})^{w} - j = 0 \sum 83 2^{3 j + 1}] B$

The additional $(k + 2)$ term lets us avoid adding the point at infinity in the case $k = 0$ . We offset these accumulated terms by subtracting them in the final window, i.e. we subtract $j = 0 \sum W - 2 2^{3 j + 1}$ .

Note: Although an offset of $(k + 1)$ would naively suffice, it introduces an edge case when $k_{0} = 7, k_{1} = 0$ . In this case, the window table entries evaluate to the same point:

$M [0] [k_{0}] = [(7 + 1) * (2^{3})^{0}] B = [8] B,$

$M [1] [k_{1}] = [(0 + 1) * (2^{3})^{1}] B = [8] B .$

In fixed-base scalar multiplication, we sum the multiples of $B$ at each window (except the last) using incomplete addition. Since the point doubling case is not handled by incomplete addition, we avoid it by using an offset of $(k + 2) .$

For each window of fixed-base multiples $M [w] = (M [w] [0], \dots, M [w] [7]), w \in [0.. (W - 1))$ :

Define a Lagrange interpolation polynomial $L_{x} (k)$ that maps $k \in [0..8)$ to the $x$ -coordinate of the multiple $M [w] [k]$ , i.e. $L_{x} (k) = ⎩ ⎨ ⎧ ([(k + 2) \cdot (2^{3})^{w}] B)_{x} ([k \cdot (2^{3})^{w} - j = 0 \sum 83 2^{3 j + 1}] B)_{x} for w \in [0.. (W - 1)); for w = 84; and$
Find a value $z_{w}$ such that $z_{w} + (M [w] [k])_{y}$ is a square $u^{2}$ in the field, but the wrong-sign $y$ -coordinate $z_{w} - (M [w] [k])_{y}$ does not produce a square.

Repeating this for all $W$ windows, we end up with:

an $W \times 8$ table $L_{x}$ storing $8$ coefficients interpolating the $x -$ coordinate for each window. Each $x$ -coordinate interpolation polynomial will be of the form $L_{x} [w] (k) = c_{0} + c_{1} \cdot k + c_{2} \cdot k^{2} + \dots + c_{7} \cdot k^{7},$ where $k \in [0..8), w \in [0..85)$ and $c_{k}$ 's are the coefficients for each power of $k$ ; and
a length- $W$ array $Z$ of $z_{w}$ 's.

We load these precomputed values into fixed columns whenever we do fixed-base scalar multiplication in the circuit.

Fixed-base scalar multiplication

Given a decomposed scalar $α$ and a fixed base $B$ , we compute $[α] B$ as follows:

For each $k_{w}, w \in [0..85), k_{w} \in [0..8)$ in the scalar decomposition, witness the $x$ - and $y$ -coordinates $(x_{w}, y_{w}) = M [w] [k_{w}] .$
Check that $(x_{w}, y_{w})$ is on the curve: $y_{w}^{2} = x_{w}^{3} + b$ .
Witness $u_{w}$ such that $y_{w} + z_{w} = u_{w}^{2}$ .
For all windows but the last, use incomplete addition to sum the $M [w] [k_{w}]$ 's, resulting in $[α - k_{84} \cdot (2^{3})^{84} + j = 0 \sum 83 2^{3 j + 1}] B$ .
For the last window, use complete addition $M [83] [k_{83}] + M [84] [k_{84}]$ and return the final result.

Note: complete addition is required in the final step to correctly map $[0] B$ to a representation of the point at infinity, $(0, 0)$ ; and also to handle a corner case for which the last step is a doubling.

Constraints: $Degree 843 Constraint q_{mul-fixed} \cdot (L_{x} [w] (k_{w}) - x_{w}) = 0 q_{mul-fixed} \cdot (y_{w}^{2} - x_{w}^{3} - b) = 0 q_{mul-fixed} \cdot (u_{w}^{2} - y_{w} - Z [w]) = 0$

where $b = 5$ (from the Pallas curve equation).

Signed short exponent

Recall that the signed short exponent is witnessed as a $64 -$ bit magnitude $m$ , and a sign $s \in 1, - 1 .$ Using the above algorithm, we compute $P = [m] B$ . Then, to get the final result $P^{'},$ we conditionally negate $P$ using $(x, y) \mapsto (x, s \cdot y)$ .

Constraints: $Degree 33 Constraint q_{mul_fixed_short} \cdot (P_{y}^{'} - P_{y}) \cdot (P_{y}^{'} + P_{y}) = 0 q_{mul_fixed_short} \cdot (s \cdot P_{y}^{'} - P_{y}) = 0$

Layout

$x_{P} x_{P, 0} x_{P, 1} x_{P, 2} ⋮ y_{P} y_{P, 0} y_{P, 1} y_{P, 2} ⋮ x_{QR} x_{Q, 1} = x_{P, 0} x_{Q, 2} = x_{R, 1} ⋮ y_{QR} y_{Q, 1} = y_{P, 0} y_{Q, 2} = y_{R, 1} ⋮ u u_{0} u_{1} u_{2} ⋮ window window_{0} window_{1} window_{2} ⋮ L_{0.. = 7} L_{0.. = 7, 0} L_{0.. = 7, 1} L_{0.. = 7, 1} ⋮ fixed_z fixed_z_{0} fixed_z_{1} fixed_z_{2} ⋮$

Note: this doesn't include the last row that uses complete addition. In the implementation this is allocated in a different region.

Variable-base scalar multiplication

In the Orchard circuit we need to check $p k_{d} = [ivk] g_{d}$ where $ivk \in [0, p)$ and the scalar field is $F_{q}$ with $p < q$ .

We have $p = 2^{254} + t_{p}$ and $q = 2^{254} + t_{q}$ , for $t_{p}, t_{q} < 2^{128}$ .

Witness scalar

We're trying to compute $[α] T$ for $α \in [0, q)$ . Set $k = α + t_{q}$ and $n = 254$ . Then we can compute

$[2^{254} + (α + t_{q})] T = [2^{254} + (α + t_{q}) - (2^{254} + t_{q})] T = [α] T$

provided that $α + t_{q} \in [0, 2^{n + 1})$ , i.e. $α < 2^{n + 1} - t_{q}$ which covers the whole range we need because in fact $2^{255} - t_{q} > q$ .

Thus, given a scalar $α$ , we witness the boolean decomposition of $k = α + t_{q} .$ (We use big-endian bit order for convenient input into the variable-base scalar multiplication algorithm.)

$k = k_{254} \cdot 2^{254} + k_{253} \cdot 2^{253} + \dots + k_{0} .$

Variable-base scalar multiplication

We use an optimized double-and-add algorithm, copied from "Faster variable-base scalar multiplication in zk-SNARK circuits" with some variable name changes:

Acc := [2] T
for i from n-1 down to 0 {
    P := k_{i+1} ? T : −T
    Acc := (Acc + P) + Acc
}
return (k_0 = 0) ? (Acc - T) : Acc

It remains to check that the x-coordinates of each pair of points to be added are distinct.

When adding points in a prime-order group, we can rely on Theorem 3 from Appendix C of the Halo paper, which says that if we have two such points with nonzero indices wrt a given odd-prime order base, where the indices taken in the range $- (q - 1) /2.. (q - 1) /2$ are distinct disregarding sign, then they have different x-coordinates. This is helpful, because it is easier to reason about the indices of points occurring in the scalar multiplication algorithm than it is to reason about their x-coordinates directly.

So, the required check is equivalent to saying that the following "indexed version" of the above algorithm never asserts:

acc := 2
for i from n-1 down to 0 {
    p = k_{i+1} ? 1 : −1
    assert acc ≠ ± p
    assert (acc + p) ≠ acc    // X
    acc := (acc + p) + acc
    assert 0 < acc ≤ (q-1)/2
}
if k_0 = 0 {
    assert acc ≠ 1
    acc := acc - 1
}

The maximum value of acc is:

    <--- n 1s --->
  1011111...111111
= 1100000...000000 - 1

= $2^{n + 1} + 2^{n} - 1$

The assertion labelled X obviously cannot fail because $p \neq = 0$ . It is possible to see that acc is monotonically increasing except in the last conditional. It reaches its largest value when $k$ is maximal, i.e. $2^{n + 1} + 2^{n} - 1$ .

So to entirely avoid exceptional cases, we would need $2^{n + 1} + 2^{n} - 1 < (q - 1) /2$ . But we can use $n$ larger by $c$ if the last $c$ iterations use complete addition.

The first $i$ for which the algorithm using only incomplete addition fails is going to be $252$ , since $2^{252 + 1} + 2^{252} - 1 > (q - 1) /2$ . We need $n = 254$ to make the wraparound technique above work.

sage: q = 0x40000000000000000000000000000000224698fc0994a8dd8c46eb2100000001
sage: 2^253 + 2^252 - 1 < (q-1)//2
False
sage: 2^252 + 2^251 - 1 < (q-1)//2
True

So the last three iterations of the loop ( $i = 2..0$ ) need to use complete addition, as does the conditional subtraction at the end. Writing this out using ⸭ for incomplete addition (as we do in the spec), we have:

Acc := [2] T
for i from 253 down to 3 {
    P := k_{i+1} ? T : −T
    Acc := (Acc ⸭ P) ⸭ Acc
}
for i from 2 down to 0 {
    P := k_{i+1} ? T : −T
    Acc := (Acc + P) + Acc  // complete addition
}
return (k_0 = 0) ? (Acc + (-T)) : Acc  // complete addition

Constraint program for optimized double-and-add

Define a running sum $z_{j} = \sum_{i = j}^{n} (k_{i} \cdot 2^{i - j})$ , where $n = 254$ and:

$z_{n + 1} = 0, z_{n} = k_{n}, (most significant bit) z_{0} = k .$

$Initialize A_{254} = [2] T . for i from 254 down to 4 : bool_check (k_{i}) = 0 z_{i} = 2 z_{i + 1} + k_{i} x_{P, i} = x_{T} y_{P, i} = (2 k_{i} - 1) \cdot y_{T} (conditionally negate) λ_{1, i} \cdot (x_{A, i} - x_{P, i}) = y_{A, i} - y_{P, i} x_{R, i} = λ_{1, i}^{2} - x_{A, i} - x_{P, i} (λ_{1, i} + λ_{2, i}) \cdot (x_{A, i} - x_{R, i}) = 2 y_{A, i} λ_{2, i}^{2} = x_{A, i - 1} + x_{R, i} + x_{A, i} λ_{2, i} \cdot (x_{A, i} - x_{A, i - 1}) = y_{A, i} + y_{A, i - 1} .$

The helper $bool_check (x) = x \cdot (1 - x)$ . After substitution of $x_{P, i}, y_{P, i}, x_{R, i}, y_{A, i}$ , and $y_{A, i - 1}$ , this becomes:

$Initialize A_{254} = [2] T . for i from 254 down to 4 : // let k_{i} = z_{i} - 2 z_{i + 1} // let y_{A, i} = \frac{( λ _{1, i} + λ _{2, i} ) \cdot ( x _{A, i} - ( λ _{1, i}^{2} - x _{A, i} - x _{T} ))}{2} bool_check (k_{i}) = 0 λ_{1, i} \cdot (x_{A, i} - x_{T}) = y_{A, i} - (2 k_{i} - 1) \cdot y_{T} λ_{2, i}^{2} = x_{A, i - 1} + λ_{1, i}^{2} - x_{T} {λ_{2, i} \cdot (x_{A, i} - x_{A, i - 1}) = y_{A, i} + y_{A, i - 1}, λ_{2, 4} \cdot (x_{A, 4} - x_{A, 3}) = y_{A, 4} + y_{A, 3}^{witnessed}, if i > 4 if i = 4.$

Here, $y_{A, 3}^{witnessed}$ is assigned to a cell. This is unlike previous $y_{A, i}$ 's, which were implicitly derived from $λ_{1, i}, λ_{2, i}, x_{A, i}, x_{T}$ , but never actually assigned.

The bits $k_{3 \dots 1}$ are used in three further steps, using complete addition:

$for i from 3 down to 1 : // let k_{i} = z_{i} - 2 z_{i + 1} bool_check (k_{i}) = 0 (x_{A, i - 1}, y_{A, i - 1}) = ((x_{A, i}, y_{A, i}) + (x_{T}, y_{T})) + (x_{A, i}, y_{A, i})$

If the least significant bit $k_{0} = 1,$ we set $B = O,$ otherwise we set $B = - T$ . Then we return $A + B$ using complete addition.

Let $B = {(0, 0), (x_{T}, - y_{T}), if k_{0} = 1, otherwise.$

Output $(x_{A, 0}, y_{A, 0}) + B$ .

(Note that $(0, 0)$ represents $O$ .)

Incomplete addition

We need six advice columns to witness $(x_{T}, y_{T}, λ_{1}, λ_{2}, x_{A, i}, z_{i})$ . However, since $(x_{T}, y_{T})$ are the same, we can perform two incomplete additions in a single row, reusing the same $(x_{T}, y_{T})$ . We split the scalar bits used in incomplete addition into $hi$ and $l o$ halves and process them in parallel. This means that we effectively have two for loops:

the first, covering the $hi$ half for $i$ from $254$ down to $130$ , with a special case at $i = 130$ ; and
the second, covering the $l o$ half for the remaining $i$ from $129$ down to $4$ , with a special case at $i = 4$ .

$x_{T} x_{T} x_{T} ⋮ x_{T} x_{T} y_{T} y_{T} y_{T} ⋮ y_{T} y_{T} z^{hi} z_{255} = 0 z_{254} z_{253} ⋮ z_{130} x_{A}^{hi} x_{A, 254} = 2 [T]_{x} x_{A, 253} ⋮ x_{A, 130} x_{A, 129} λ_{1}^{hi} y_{A, 254} = 2 [T]_{y} λ_{1, 254} λ_{1, 253} ⋮ λ_{1, 130} y_{A, 129} λ_{2}^{hi} λ_{2, 254} λ_{2, 253} ⋮ λ_{2, 130} q_{1}^{hi} 100 ⋮ 0 q_{2}^{hi} 011 ⋮ 0 q_{3}^{hi} 000 ⋮ 1 z^{l o} z_{130} z_{129} z_{128} ⋮ z_{5} z_{4} x_{A}^{l o} x_{A, 129} x_{A, 128} ⋮ x_{A, 5} x_{A, 4} x_{A, 3} λ_{1}^{l o} y_{A, 129} λ_{1, 129} λ_{1, 128} ⋮ λ_{1, 5} λ_{1, 4} y_{A, 3} λ_{2}^{l o} λ_{2, 129} λ_{2, 128} ⋮ λ_{2, 5} λ_{2, 4} q_{1}^{l o} 100 ⋮ 00 q_{2}^{l o} 011 ⋮ 10 q_{3}^{l o} 000 ⋮ 01$

For each $hi$ and $l o$ half, we have three sets of gates. Note that $i$ is going from $255.. = 3$ ; $i$ is NOT indexing the rows.

$q_{1} = 1$

This gate is only used on the first row (before the for loop). We check that $λ_{1}, λ_{2}$ are initialized to values consistent with the initial $y_{A} .$ $Degree 4 Constraint q_{1} \cdot (y_{A, n}^{witnessed} - y_{A, n}) = 0$ where $y_{A, n} y_{A, n}^{witnessed} = \frac{( λ _{1, n} + λ _{2, n} ) \cdot ( x _{A, n} - ( λ _{1, n}^{2} - x _{A, n} - x _{T} ))}{2}, is witnessed.$

$q_{2} = 1$

This gate is used on all rows corresponding to the for loop except the last.

$Degree 223433 Constraint q_{2} \cdot (x_{T, c u r} - x_{T, n e x t}) = 0 q_{2} \cdot (y_{T, c u r} - y_{T, n e x t}) = 0 q_{2} \cdot bool_check (k_{i}) = 0, where k_{i} = z_{i} - 2 z_{i + 1} q_{2} \cdot (λ_{1, i} \cdot (x_{A, i} - x_{T, i}) - y_{A, i} + (2 k_{i} - 1) \cdot y_{T, i}) = 0 q_{2} \cdot (λ_{2, i}^{2} - x_{A, i - 1} - x_{R, i} - x_{A, i}) = 0 q_{2} \cdot (λ_{2, i} \cdot (x_{A, i} - x_{A, i - 1}) - y_{A, i} - y_{A, i - 1}) = 0$ where $x_{R, i} y_{A, i} y_{A, i - 1} = λ_{1, i}^{2} - x_{A, i} - x_{T}, = \frac{( λ _{1, i} + λ _{2, i} ) \cdot ( x _{A, i} - ( λ _{1, i}^{2} - x _{A, i} - x _{T} ))}{2}, = \frac{( λ _{1, i - 1} + λ _{2, i - 1} ) \cdot ( x _{A, i - 1} - ( λ _{1, i - 1}^{2} - x _{A, i - 1} - x _{T} ))}{2},$

$q_{3} = 1$

This gate is used on the final iteration of the for loop, handling the special case where we check that the output $y_{A}$ has been witnessed correctly. $Degree 3433 Constraint q_{3} \cdot bool_check (k_{i}) = 0, where k_{i} = z_{i} - 2 z_{i + 1} q_{3} \cdot (λ_{1, i} \cdot (x_{A, i} - x_{T, i}) - y_{A, i} + (2 k_{i} - 1) \cdot y_{T, i}) = 0 q_{3} \cdot (λ_{2, i}^{2} - x_{A, i - 1} - x_{R, i} - x_{A, i}) = 0 q_{3} \cdot (λ_{2, i} \cdot (x_{A, i} - x_{A, i - 1}) - y_{A, i} - y_{A, i - 1}^{witnessed}) = 0$ where $x_{R, i} y_{A, i} y_{A, i - 1}^{witnessed} = λ_{1, i}^{2} - x_{A, i} - x_{T}, = \frac{( λ _{1, i} + λ _{2, i} ) \cdot ( x _{A, i} - ( λ _{1, i}^{2} - x _{A, i} - x _{T} ))}{2}, is witnessed.$

Complete addition

We reuse the complete addition constraints to implement the final $c$ rounds of double-and-add. This requires two rows per round because we need 9 advice columns for each complete addition. In the 10th advice column we stash the other cells that we need to correctly implement the double-and-add:

The base $y$ coordinate, so we can conditionally negate it as input to one of the complete additions.
The running sum, which we constrain over two rows instead of sequentially.

Layout

$a_{0} x_{T} x_{A} a_{1} y_{p} y_{A} a_{2} x_{A} x_{q} x_{r} a_{3} y_{A} y_{q} y_{r} a_{4} λ_{1} λ_{2} a_{5} α_{1} α_{2} a_{6} β_{1} β_{2} a_{7} γ_{1} γ_{2} a_{8} δ_{1} δ_{2} a_{9} z_{i + 1} y_{T} z_{i} q_{mul_decompose_var} 010$

Constraints

In addition to the complete addition constraints, we define the following gate:

$Degree Constraint q_{mul_decompose_var} \cdot bool_check (k_{i}) = 0 q_{mul_decompose_var} \cdot ternary (k_{i}, y_{T} - y_{p}, y_{T} + y_{p}) = 0$ where $k_{i} = z_{i} - 2 z_{i + 1}$ .

LSB

Layout

$a_{0} x_{p} x_{T} a_{1} y_{p} y_{T} a_{2} x_{A} x_{r} a_{3} y_{A} y_{r} a_{4} λ a_{5} α a_{6} β a_{7} γ a_{8} δ a_{9} z_{1} z_{0} q_{mul_lsb} 10$

Constraints

$Degree Constraint q_{mul_lsb} \cdot bool_check (k_{0}) = 0 q_{mul_lsb} \cdot ternary (k_{0}, x_{p}, x_{p} - x_{T}) = 0 q_{mul_lsb} \cdot ternary (k_{0}, y_{p}, y_{p} + y_{T}) = 0$ where $k_{0} = z_{0} - 2 z_{1}$ .

Overflow check

$z_{i}$ cannot overflow for any $i \geq 1$ , because it is a weighted sum of bits only up to $2^{n - 1} = 2^{253}$ , which is smaller than $p$ (and also $q$ ).

However, $z_{0} = α + t_{q}$ can overflow $[0, p)$ .

Note: for full-width scalar mul, it may not be possible to represent $z_{0}$ in the base field (e.g. when the base field is Pasta's $F_{p}$ and $p < q$ ). In that case, we need to special-case the row that would mention $z_{0}$ so that it is correct for whatever representation we use for a full-width scalar. Our representation for $k$ will be the pair $(k_{254}, k^{'} = k - 2^{254} \cdot k_{254})$ . We'll use $k^{'}$ in place of $α + t_{q}$ for $z_{0}$ , constraining $k^{'}$ to 254 bits so that it fits in an $F_{p}$ element. Then we just have to generalize the argument below to work for $k^{'} \in [0, 2 \cdot t_{q})$ (because the maximum value of $α + t_{q}$ is $q - 1 + t_{q} = 2^{254} + t_{q} - 1 + t_{q}$ ).

Since overflow can only occur in the final step that constrains $z_{0} = 2 \cdot z_{1} + k_{0}$ , we have $z_{0} = k (mod p)$ . It is then sufficient to also check that $z_{0} = α + t_{q} (mod p)$ (so that $k = α + t_{q} (mod p)$ ) and that $k \in [t_{q}, p + t_{q})$ . These conditions together imply that $k = α + t_{q}$ as an integer, and so $2^{254} + k = α (mod q)$ as required.

Note: the bits $k_{254..0}$ do not represent a value reduced modulo $q$ , but rather a representation of the unreduced $α + t_{q}$ .

Optimized check for $k \in [t_{q}, p + t_{q})$

Since $t_{p} + t_{q} < 2^{130}$ (also true if $p$ and $q$ are swapped), we have $[t_{q}, p + t_{q}) = [t_{q}, t_{q} + 2^{130}) \cup [2^{130}, 2^{254}) \cup ([2^{254}, 2^{254} + 2^{130}) \cap [p + t_{q} - 2^{130}, p + t_{q})) .$

We may assume that $k = α + t_{q} (mod p)$ .

(This is true for the use of variable-base scalar mul in Orchard, where we know that $α < p$ . If is also true if we swap $p$ and $q$ so that we have $p > q$ . It is not true for a full-width scalar $α \geq p$ when $p < q$ .)

Therefore, $k \in [t_{q}, p + t_{q}) \Leftrightarrow \Leftrightarrow \Leftrightarrow (k \in [t_{q}, t_{q} + 2^{130}) \lor k \in [2^{130}, 2^{254})) \lor (k \in [2^{254}, 2^{254} + 2^{130}) \land k \in [p + t_{q} - 2^{130}, p + t_{q})) (k_{254} = 0 ⟹ (k \in [t_{q}, t_{q} + 2^{130}) \lor k \in [2^{130}, 2^{254}))) \land (k_{254} = 1 ⟹ (k \in [2^{254}, 2^{254} + 2^{130}) \land k \in [p + t_{q} - 2^{130}, p + t_{q}))) (k_{254} = 0 ⟹ (α \in [0, 2^{130}) \lor k \in [2^{130}, 2^{254}))) \land (k_{254} = 1 ⟹ (k \in [2^{254}, 2^{254} + 2^{130}) \land (α + 2^{130}) mod p \in [0, 2^{130}))) Ⓐ$

Given $k \in [2^{254}, 2^{254} + 2^{130})$ , we prove equivalence of $k \in [p + t_{q} - 2^{130}, p + t_{q})$ and $(α + 2^{130}) mod p \in [0, 2^{130})$ as follows:

shift the range by $2^{130} - p - t_{q}$ to give $k + 2^{130} - p - t_{q} \in [0, 2^{130})$ ;

observe that $k + 2^{130} - p - t_{q}$ is guaranteed to be in $[2^{130} - t_{p} - t_{q}, 2^{131} - t_{p} - t_{q})$ and therefore cannot overflow or underflow modulo $p$ ;

using the fact that $k = α + t_{q} (mod p)$ , observe that $(k + 2^{130} - p - t_{q}) mod p = (α + t_{q} + 2^{130} - p - t_{q}) mod p = (α + 2^{130}) mod p$ .

(We can see in a different way that this is correct by observing that it checks whether $α mod p \in [p - 2^{130}, p)$ , so the upper bound is aligned as we would expect.)

Now, we can continue optimizing from $Ⓐ$ :

$k \in [t_{q}, p + t_{q}) \Leftrightarrow \Leftrightarrow (k_{254} = 0 ⟹ (α \in [0, 2^{130}) \lor k \in [2^{130}, 2^{254})) \land (k_{254} = 1 ⟹ (k \in [2^{254}, 2^{254} + 2^{130}) \land (α + 2^{130}) mod p \in [0, 2^{130}))) (k_{254} = 0 ⟹ (α \in [0, 2^{130}) \lor k_{253..130} are not all 0)) \land (k_{254} = 1 ⟹ (k_{253..130} are all 0 \land (α + 2^{130}) mod p \in [0, 2^{130})))$

Constraining $k_{253..130}$ to be all- $0$ or not-all- $0$ can be implemented almost "for free", as follows.

Recall that $z_{i} = \sum_{h = i}^{n} (k_{h} \cdot 2^{h - i})$ , so we have:

$z_{130} z_{130} z_{130} - k_{254} \cdot 2^{124} = = = \sum_{h = 130}^{254} (k_{h} \cdot 2^{h - 130}) k_{254} \cdot 2^{254 - 130} + \sum_{h = 130}^{253} (k_{h} \cdot 2^{h - 130}) \sum_{h = 130}^{253} (k_{h} \cdot 2^{h - 130})$

So $k_{253..130}$ are all $0$ exactly when $z_{130} = k_{254} \cdot 2^{124}$ .

Finally, we can merge the $130$ -bit decompositions for the $k_{254} = 0$ and $k_{254} = 1$ cases by checking that $(α + k_{254} \cdot 2^{130}) mod p \in [0, 2^{130})$ .

Overflow check constraints

Let $s = α + k_{254} \cdot 2^{130}$ . The constraints for the overflow check are:

$z_{0} k_{254} = 1 ⟹ (z_{130} k_{254} = 0 ⟹ (z_{130} = α + t_{q} (mod p) = 2^{124} \land s mod p \in [0, 2^{130})) \neq = 0 \lor s mod p \in [0, 2^{130}))$

Define $inv0 (x) = {0, 1/ x, if x = 0 otherwise.$

Witness $η = inv0 (z_{130})$ , and decompose $s mod p$ as $s_{129..0}$ .

Then the needed gates are:

$Degree 22335 Constraint q_{mul_overflow} \cdot (s - (α + k_{254} \cdot 2^{130})) = 0 q_{mul_overflow} \cdot (z_{0} - α - t_{q}) = 0 q_{mul_overflow} \cdot (k_{254} \cdot (z_{130} - 2^{124})) = 0 q_{mul_overflow} \cdot (k_{254} \cdot (s - i = 0 \sum 129 2^{i} \cdot s_{i}) / 2^{130}) = 0 q_{mul_overflow} \cdot ((1 - k_{254}) \cdot (1 - z_{130} \cdot η) \cdot (s - i = 0 \sum 129 2^{i} \cdot s_{i}) / 2^{130}) = 0$ where $(s - i = 0 \sum 129 2^{i} \cdot s_{i}) / 2^{130}$ can be computed by another running sum. Note that the factor of $1/ 2^{130}$ has no effect on the constraint, since the RHS is zero.

Running sum range check

We make use of a $10$ -bit lookup range check in the circuit to subtract the low $130$ bits of $s$ . The range check subtracts the first $13 \cdot 10$ bits of $s,$ and right-shifts the result to give $(s - i = 0 \sum 129 2^{i} \cdot s_{i}) / 2^{130} .$

Overflow check (general)

Recall that we defined $z_{j} = \sum_{i = j}^{n} (k_{i} \cdot 2^{i - j})$ , where $n = 254$ .

$z_{j}$ cannot overflow for any $j \geq 1$ , because it is a weighted sum of bits only up to and including $2^{n - j}$ . When $n = 254$ and $j = 1$ this sum can be at most $2^{254} - 1$ , which is smaller than $p$ (and also $q$ ).

However, for full-width scalar mul, it may not be possible to represent $z_{0}$ in the base field (e.g. when the base field is Pasta's $F_{p}$ and $p < q$ ). In that case $z_{0} = α + t_{q}$ can overflow $[0, p)$ .

So, we need to special-case the row that would mention $z_{0}$ so that it is correct for whatever representation we use for a full-width scalar.

Our representation for $k$ will be the pair $(k_{254}, k^{'} = k - 2^{254} \cdot k_{254})$ . We'll use $k^{'}$ in place of $α + t_{q}$ for $z_{0}$ , constraining $k^{'}$ to 254 bits so that it fits in an $F_{p}$ element.

Then we just have to generalize the overflow check used for variable-base scalar mul in the Orchard circuit to work for $k^{'} \in [0, 2 \cdot t_{q})$ (because the maximum value of $α + t_{q}$ is $q - 1 + t_{q} = 2^{254} + t_{q} - 1 + t_{q}$ ).

Note: the bits $k_{254..0}$ do not represent a value reduced modulo $q$ , but rather a representation of the unreduced $α + t_{q}$ .

Overflow can only occur in the final step that constrains $z_{0} = 2 \cdot z_{1} + k_{0}$ , and only if $z_{1}$ has the bit with weight $2^{253}$ set (i.e. if $k_{254} = 1$ ). If we instead set $z_{0} = 2 \cdot z_{1} - 2^{254} \cdot k_{254} + k_{0}$ , now $z_{0}$ cannot overflow and should be equal to $k^{'}$ .

It is then sufficient to also check that $z_{0} + 2^{254} \cdot k_{254} = α + t_{q}$ as an integer where $α \in [0, q)$ .

Represent $α$ as $2^{254} \cdot α_{254} + 2^{253} \cdot α_{253} + α^{''}$ where we constrain $α^{''} \in [0, 2^{253})$ and $α_{253}$ and $α_{254}$ to boolean. For this to be a canonical representation we also need $α_{254} = 1 ⟹ (α_{253} = 0 \land α^{''} \in [0, t_{q}))$ .

Let $α^{'} = 2^{253} \cdot α_{253} + α^{''}$ .

If $α_{254} = 1$ :

constrain $k_{254} = 1$ and $z_{0} = α^{'} + t_{q}$ . This cannot overflow because in this case $α^{'} \in [0, t_{q})$ and so $z_{0} \in [0, 2 \cdot t_{q})$ .

If $α_{254} = 0$ :

we should have $k_{254} = 1$ iff $α^{'} \in [2^{254} - t_{q}, 2^{254})$ , i.e. witness $k_{254}$ as boolean and then
- If $k_{254} = 0$ then constrain $α^{'} \neq \in [2^{254} - t_{q}, 2^{254})$ .
  - This can be done by constraining either $α_{253} = 0$ or $α^{''} + t_{q} \in [0, 2^{253})$ . ( $α^{''} + t_{q}$ cannot overflow.)
- If $k_{254} = 1$ then constrain $α^{'} \in [2^{254} - t_{q}, 2^{254})$ .
  - This can be done by constraining $α^{'} - (2^{254} - t_{q}) \in [0, 2^{130})$ and $α^{'} - 2^{254} + 2^{130} \in [0, 2^{130})$ .

Overflow check constraints (general)

Represent $α$ as $2^{254} \cdot α_{254} + 2^{253} \cdot α_{253} + α^{''}$ as above.

The constraints for the overflow check are:

$α_{254} = 1 ⟹ α_{254} = 1 ⟹ α_{254} = 1 ⟹ α_{254} = 1 ⟹ α_{254} = 1 ⟹ α_{254} = 0 \land α_{253} = 1 \land k_{254} = 0 α_{254} = 0 \land k_{254} = 1 α_{254} = 0 \land k_{254} = 1 α^{''} \in [0, 2^{253}) α_{253} \in {0, 1} α_{254} \in {0, 1} k_{254} \in {0, 1} α_{253} = 0 α^{''} \in [0, 2^{130}) ❁ α^{''} + 2^{130} - t_{q} \in [0, 2^{130}) ❁ k_{254} = 1 z_{0} = α^{'} + t_{q} ⟹ α^{''} + t_{q} \in [0, 2^{253}) ⟹ α^{'} - 2^{254} + t_{q} \in [0, 2^{130}) ❁ ⟹ α^{'} - 2^{254} + 2^{130} \in [0, 2^{130}) ❁$

Note that the four 130-bit constraints marked $❁$ are in two pairs that occur in disjoint cases. We can therefore combine them into two 130-bit constraints using a new witness variable $u$ ; the other constraint always being on $u + 2^{130} - t_{q}$ :

$α_{254} = 1 ⟹ α_{254} = 0 \land k_{254} = 1 ⟹ u = α^{''} u = α^{'} - 2^{254} + t_{q} u \in [0, 2^{130}) u + 2^{130} - t_{q} \in [0, 2^{130})$

( $u$ is unconstrained and can be witnessed as $0$ in the case $α_{254} = 0 \land k_{254} = 0$ .)

Cost

25 10-bit and one 3-bit range check, to constrain $α^{''}$ to 253 bits;
25 10-bit and one 3-bit range check, to constrain $α^{''} + t_{q}$ to 253 bits when $α_{254} = 0 \land α_{253} = 1 \land k_{254} = 0$ ;
two times 13 10-bit range checks.

Sinsemilla

Overview

Sinsemilla is a collision-resistant hash function and commitment scheme designed to be efficient in algebraic circuit models that support lookups, such as PLONK or Halo 2.

The security properties of Sinsemilla are similar to Pedersen hashes; it is not designed to be used where a random oracle, PRF, or preimage-resistant hash is required. The only claimed security property of the hash function is collision-resistance for fixed-length inputs.

Sinsemilla is roughly 4 times less efficient than the algebraic hashes Rescue and Poseidon inside a circuit, but around 19 times more efficient than Rescue outside a circuit. Unlike either of these hashes, the collision resistance property of Sinsemilla can be proven based on cryptographic assumptions that have been well-established for at least 20 years. Sinsemilla can also be used as a computationally binding and perfectly hiding commitment scheme.

The general approach is to split the message into $k$ -bit pieces, and for each piece, select from a table of $2^{k}$ bases in our cryptographic group. We combine the selected bases using a double-and-add algorithm. This ends up being provably as secure as a vector Pedersen hash, and makes advantageous use of the lookup facility supported by Halo 2.

Description

This section is an outline of how Sinsemilla works: for the normative specification, refer to §5.4.1.9 Sinsemilla Hash Function in the protocol spec. The incomplete point addition operator, ⸭, that we use below is also defined there.

Let $G$ be a cryptographic group of prime order $q$ . We write $G$ additively, with identity $O$ , and using $[m] P$ for scalar multiplication of $P$ by $m$ .

Let $k \geq 1$ be an integer chosen based on efficiency considerations (the table size will be $2^{k}$ ). Let $n$ be an integer, fixed for each instantiation, such that messages are $kn$ bits, where $2^{n} \leq \frac{q - 1}{2}$ . We use zero-padding to the next multiple of $k$ bits if necessary.

$Setup$ : Choose $Q$ and $P [0.. 2^{k} - 1]$ as $2^{k} + 1$ independent, verifiably random generators of $G$ , using a suitable hash into $G$ , such that none of $Q$ or $P [0.. 2^{k} - 1]$ are $O$ .

In Orchard, we define $Q$ to be dependent on a domain separator $D$ . The protocol specification uses $Q (D)$ in place of $Q$ and $S (m)$ in place of $P [m]$ .

$Hash (M)$ :

Split $M$ into $n$ groups of $k$ bits. Interpret each group as a $k$ -bit little-endian integer $m_{i}$ .
let $Acc_{0} := Q$
for $i$ from $0$ up to $n - 1$ :
- let $Acc_{i + 1} := (Acc_{i} ⸭ P [m_{i + 1}]) ⸭ Acc_{i}$
return $Acc_{n}$

Let $ShortHash (M)$ be the $x$ -coordinate of $Hash (M)$ . (This assumes that $G$ is a prime-order elliptic curve in short Weierstrass form, as is the case for Pallas and Vesta.)

It is slightly more efficient to express a double-and-add $[2] A + R$ as $(A + R) + A$ . We also use incomplete additions: it is shown in the Sinsemilla security argument that in the case where $G$ is a prime-order short Weierstrass elliptic curve, an exceptional case for addition would lead to finding a discrete logarithm, which can be assumed to occur with negligible probability even for adversarial input.

Use as a commitment scheme

Choose another generator $H$ independently of $Q$ and $P [0.. 2^{k} - 1]$ .

The randomness $r$ for a commitment is chosen uniformly on $[0, q)$ .

Let $Commit_{r} (M) = Hash (M) ⸭ [r] H$ .

Let $ShortCommit_{r} (M)$ be the $x -coordinate$ of $Commit_{r} (M)$ . (This again assumes that $G$ is a prime-order elliptic curve in short Weierstrass form.)

Note that unlike a simple Pedersen commitment, this commitment scheme ( $Commit$ or $ShortCommit$ ) is not additively homomorphic.

Efficient implementation

The aim of the design is to optimize the number of bits that can be processed for each step of the algorithm (which requires a doubling and addition in $G$ ) for a given table size. Using a single table of size $2^{k}$ group elements, we can process $k$ bits at a time.

Incomplete addition

In each step of Sinsemilla we want to compute $A_{i + 1} := (A_{i} ⸭ P_{i}) ⸭ A_{i}$ . Let $R_{i} := A_{i} ⸭ P_{i}$ be the intermediate result such that $A_{i + 1} := A_{i} ⸭ R_{i}$ . Recalling the incomplete addition formulae:

$x_{3} y_{3} = (\frac{y _{1} - y _{2}}{x _{1} - x _{2}})^{2} - x_{1} - x_{2} = \frac{y _{1} - y _{2}}{x _{1} - x _{2}} \cdot (x_{1} - x_{3}) - y_{1}$

Let $λ = \frac{y _{1} - y _{2}}{x _{1} - x _{2}}$ . Substituting the coordinates for each of the incomplete additions in turn, and rearranging, we get

$λ_{1, i} x_{R, i} y_{R, i} = \frac{y _{A, i} - y _{P, i}}{x _{A, i} - x _{P, i}} ⟹ y_{A, i} - y_{P, i} = λ_{1, i} \cdot (x_{A, i} - x_{P, i}) ⟹ y_{P, i} = y_{A, i} - λ_{1, i} \cdot (x_{A, i} - x_{P, i}) = λ_{1, i}^{2} - x_{A, i} - x_{P, i} = λ_{1, i} \cdot (x_{A, i} - x_{R, i}) - y_{A, i}$ and $λ_{2, i} x_{A, i + 1} y_{A, i + 1} = \frac{y _{A, i} - y _{R, i}}{x _{A, i} - x _{R, i}} ⟹ y_{A, i} - y_{R, i} = λ_{2, i} \cdot (x_{A, i} - x_{R, i}) ⟹ y_{A, i} - (λ_{1, i} \cdot (x_{A, i} - x_{R, i}) - y_{A, i}) = λ_{2, i} \cdot (x_{A, i} - x_{R, i}) ⟹ 2 \cdot y_{A, i} = (λ_{1, i} + λ_{2, i}) \cdot (x_{A, i} - x_{R, i}) = λ_{2, i}^{2} - x_{A, i} - x_{R, i} = λ_{2, i} \cdot (x_{A, i} - x_{A, i + 1}) - y_{A, i} .$

Constraint program

Let $P = {(j, x_{P [j]}, y_{P [j]}) for j \in {0.. 2^{k} - 1}}$ .

Input: $m_{1.. = n}$ . (The message words are 1-indexed here, as in the protocol spec, but we start the loop from $i = 0$ so that $(x_{A, i}, y_{A, i})$ corresponds to $Acc_{i}$ in the protocol spec.)

Output: $(x_{A, n}, y_{A, n})$ .

$(x_{A, 0}, y_{A, 0}) = Q$
for $i$ from $0$ up to $n - 1$ :
- $y_{P, i} = y_{A, i} - λ_{1, i} \cdot (x_{A, i} - x_{P, i})$
- $x_{R, i} = λ_{1, i}^{2} - x_{A, i} - x_{P, i}$
- $2 \cdot y_{A, i} = (λ_{1, i} + λ_{2, i}) \cdot (x_{A, i} - x_{R, i})$
- $(m_{i + 1}, x_{P, i}, y_{P, i}) \in P$
- $λ_{2, i}^{2} = x_{A, i + 1} + x_{R, i} + x_{A, i}$
- $λ_{2, i} \cdot (x_{A, i} - x_{A, i + 1}) = y_{A, i} + y_{A, i + 1}$

PLONK / Halo 2 constraints

Message decomposition

We have an $n$ -bit message $m = m_{1} + 2^{k} m_{2} + ... + 2^{k \cdot (n - 1)} m_{n}$ . (Note that the message words are 1-indexed as in the protocol spec.)

Initialise the running sum $z_{0} = α$ and define $z_{i + 1} := \frac{z _{i} - m _{i + 1}}{2 ^{K}}$ . We will end up with $z_{n} = 0.$

Rearranging gives us an expression for each word of the original message $m_{i + 1} = z_{i} - 2^{k} \cdot z_{i + 1}$ , which we can look up in the table. We position $z_{i}$ and $z_{i + 1}$ in adjacent rows of the same column, so we can sequentially apply the constraint across the entire message.

In other words, $z_{n - i} = h = 0 \sum i - 1 2^{kh} \cdot m_{h + 1}$ .

For a little-endian decomposition as used here, the running sum is initialized to the scalar and ends at 0. For a big-endian decomposition as used in variable-base scalar multiplication, the running sum would start at 0 and end with recovering the original scalar.

Efficient packing

The running sum only applies to message words within a single field element. That means if $n \geq PrimeField :: NUM_BITS$ then we will need several disjoint running sums. A longer message can be constructed by splitting the message words across several field elements, and then running several instances of the constraints below.

The expression for $m_{i + 1}$ above requires $n + 1$ rows in the $z_{i}$ column, leaving a one-row gap in adjacent columns and making $Acc_{i}$ tricker to accumulate. In order to support chaining multiple field elements without a gap, we use a slightly more complicated expression for $m_{i + 1}$ that includes a selector:

$m_{i + 1} = z_{i} - 2^{k} \cdot q_{r u n, i} \cdot z_{i + 1}$

This effectively forces $z_{n}$ to zero for the last step of each element, which allows the cell that would have been $z_{n}$ to be used to reinitialize the running sum for the next element.

With this sorted out, the incomplete addition accumulator can eliminate $y_{A, i}$ almost entirely, by substituting for $x$ and $λ$ values in the current and next rows. The two exceptions are at the start of Sinsemilla (where we need to constrain the accumulator to have initial value $Q$ ), and the end (where we need to witness $y_{A, n}$ for use outside of Sinsemilla).

Selectors

We need a total of four logical selectors to:

Control the Sinsemilla gate and lookup.
Distinguish between the last message word in a running sum and its earlier words.
Mark the start of Sinsemilla.
Mark the end of Sinsemilla.

We use regular selector columns for the Sinsemilla gate selector $q_{S 1}$ and Sinsemilla start selector $q_{S 4} .$ The other two selectors are synthesized from a single fixed column $q_{S 2}$ as follows:

$q_{S 3} q_{r u n} = q_{S 2} \cdot (q_{S 2} - 1) = q_{S 2} - q_{S 3}$

$q_{S 2} 012 q_{S 3} 002 q_{r u n} 010$

We set $q_{S 2}$ to $1$ on most Sinsemilla rows, and $0$ for the last step of each element, except for the last element where it is set to $2$ . We can then use $q_{S 3}$ to toggle between constraining the substituted $y_{A, i + 1}$ on adjacent rows, and the witnessed $y_{A, n}$ at the end of Sinsemilla:

$λ_{2, i} \cdot (x_{A, i} - x_{A, i + 1}) = y_{A, i} + \frac{2 - q _{S 3}}{2} \cdot y_{A, i + 1} + \frac{q _{S 3}}{2} \cdot y_{A, n}$

Generator lookup table

The Sinsemilla circuit makes use of $2^{10}$ pre-computed random generators. These are loaded into a lookup table: $t ab l e_{i d x} 012 ⋮ 2^{10} - 1 t ab l e_{x} x_{P [0]} x_{P [1]} x_{P [2]} ⋮ x_{P [2^{10} - 1]} t ab l e_{y} y_{P [0]} y_{P [1]} y_{P [2]} ⋮ y_{P [2^{10} - 1]}$

Layout

$Step 012 ⋮ n - 1 0^{'} 1^{'} 2^{'} ⋮ n - 1^{'} n^{'} x_{A} x_{Q} x_{A, 1} x_{A, 2} ⋮ x_{A, n - 1} x_{A, 0}^{'} x_{A, 1}^{'} x_{A, 2}^{'} ⋮ x_{A, n - 1}^{'} x_{A, n}^{'} x_{P} x_{P [m_{1}]} x_{P [m_{2}]} x_{P [m_{3}]} ⋮ x_{P [m_{n}]} x_{P [m_{1}^{'}]} x_{P [m_{2}^{'}]} x_{P [m_{3}^{'}]} ⋮ x_{P [m_{n}^{'}]} bi t s z_{0} z_{1} z_{2} ⋮ z_{n - 1} z_{0}^{'} z_{1}^{'} z_{2}^{'} ⋮ z_{n - 1}^{'} λ_{1} λ_{1, 0} λ_{1, 1} λ_{1, 2} ⋮ λ_{1, n - 1} λ_{1, 0}^{'} λ_{1, 1}^{'} λ_{1, 2}^{'} ⋮ λ_{1, n - 1}^{'} y_{A, n} λ_{2} λ_{2, 0} λ_{2, 1} λ_{2, 2} ⋮ λ_{2, n - 1} λ_{2, 0}^{'} λ_{2, 1}^{'} λ_{2, 2}^{'} ⋮ λ_{2, n - 1}^{'} q_{S 1} 11111111110 q_{S 2} 11110111120 q_{S 4} 10000000000 fixed_y_Q y_{Q} 0000000000$

$x_{Q}$ , $z_{0}$ , $z_{0}^{'}$ , etc. are copied in using equality constraints.

Optimized Sinsemilla gate

$For i \in [0, n), let x_{R, i} Y_{A, i} y_{P, i} m_{i + 1} q_{r u n} q_{S 3} = = = = = = λ_{1, i}^{2} - x_{A, i} - x_{P, i} (λ_{1, i} + λ_{2, i}) \cdot (x_{A, i} - x_{R, i}) Y_{A, i} /2 - λ_{1, i} \cdot (x_{A, i} - x_{P, i}) z_{i} - q_{r u n, i} \cdot z_{i + 1} \cdot 2^{k} q_{S 2} - q_{S 3} q_{S 2} \cdot (q_{S 2} - 1)$

The Halo 2 circuit API can automatically substitute $y_{P, i}$ , $x_{R, i}$ , $Y_{A, i}$ , and $Y_{A, i + 1}$ , so we don't need to do that manually.

$x_{A, 0} = x_{Q}$
$2 \cdot y_{Q} = Y_{A, 0}$
for $i$ from $0$ up to $n - 1$ :
- $(m_{i + 1}, x_{P, i}, y_{P, i}) \in P$
- $λ_{2, i}^{2} = x_{A, i + 1} + x_{R, i} + x_{A, i}$
- $4 \cdot λ_{2, i} \cdot (x_{A, i} - x_{A, i + 1}) = 2 \cdot Y_{A, i} + (2 - q_{S 3}) \cdot Y_{A, i + 1} + 2 q_{S 3} \cdot y_{A, n}$

Note that each term of the last constraint is multiplied by $4$ relative to the constraint program given earlier. This is a small optimization that avoids divisions by $2$ .

By gating the lookup expression on $q_{S 1}$ , we avoid the need to fill in unused cells with dummy values to pass the lookup argument. The optimized lookup value (using a default index of $0$ ) is:

$(q_{S 1} \cdot m_{i + 1}, q_{S 1} \cdot x_{P, i} + (1 - q_{S 1}) \cdot x_{P, 0}, q_{S 1} \cdot y_{P, i} + (1 - q_{S 1}) \cdot y_{P, 0})$

This increases the degree of the lookup argument to $6$ .

$Degree 4635 Constraint q_{S 4} \cdot (2 \cdot y_{Q} - Y_{A, 0}) = 0 q_{S 1, i} \Rightarrow (m_{i + 1}, x_{P, i}, y_{P, i}) \in P q_{S 1, i} \cdot (λ_{2, i}^{2} - (x_{A, i + 1} + x_{R, i} + x_{A, i})) q_{S 1, i} \cdot (4 \cdot λ_{2, i} \cdot (x_{A, i} - x_{A, i + 1}) - (2 \cdot Y_{A, i} + (2 - q_{S 3, i}) \cdot Y_{A, i + 1} + 2 \cdot q_{S 3, i} \cdot y_{A, n})) = 0$

MerkleCRH

Message decomposition

$SinsemillaHash$ is used in the $MerkleCR H^{Orchard}$ hash function. The input to $SinsemillaHash$ is:

$l ⋆ ∣∣ left ⋆ ∣∣ right ⋆,$

where:

$l ⋆ = I2LEBSP_{10} (l) = I2LEBSP_{10} (MerkleDepth^{Orchard} - 1 - layer)$ ,
$left ⋆ = I2LEBSP_{ℓ_{Merkle}^{Orchard}} (left)$ ,
$right ⋆ = I2LEBSP_{ℓ_{Merkle}^{Orchard}} (right)$ ,

with $ℓ_{Merkle}^{Orchard} = 255.$ $left ⋆$ and $right ⋆$ are allowed to be non-canonical $255$ -bit encodings of $left$ and $right$ .

Sinsemilla operates on multiples of 10 bits, so we start by decomposing the message into chunks:

$l ⋆ left ⋆ right ⋆ = a_{0} = a_{1} ∣∣ b_{0} ∣∣ b_{1} = (bits 0.. = 239 of left) ∣∣ (bits 240.. = 249 of left) ∣∣ (bits 250.. = 254 of left) = b_{2} ∣∣ c = (bits 0.. = 4 of right) ∣∣ (bits 5.. = 254 of right)$

Then we recompose the chunks into MessagePieces:

$Length (bits) 25020250 Piece a = a_{0} ∣∣ a_{1} b = b_{0} ∣∣ b_{1} ∣∣ b_{2} c$

Each message piece is constrained by $SinsemillaHash$ to its stated length. Additionally, $left$ and $right$ are witnessed as field elements, so we know that they are canonical. However, we need additional constraints to enforce that the chunks are the correct bit lengths (or else they could overlap in the decompositions and allow the prover to witness an arbitrary $SinsemillaHash$ message).

Some of these constraints can be implemented with reusable circuit gadgets. We define a custom gate controlled by the selector $q_{decompose}$ to hold the remaining constraints.

Bit length constraints

Chunk $c$ is directly constrained by Sinsemilla. We constrain the remaining chunks with the following constraints:

$a_{0}, a_{1}$

$z_{1, a}$ , the index-1 running sum output of $SinsemillaHash (a)$ , is copied into the gate. $z_{1, a}$ has been constrained by $SinsemillaHash$ to be $240$ bits, and is precisely $a_{1}$ . We recover chunk $a_{0}$ using $a, z_{1, a} :$ $z_{1, a} ⟹ a_{0} = \frac{a - a _{0}}{2 ^{10}} = a_{1} = a - z_{1, a} \cdot 2^{10} .$

$b_{0}, b_{1}, b_{2}$

$z_{1, b}$ , the index-1 running sum output of $SinsemillaHash (b)$ , is copied into the gate. $z_{1, b}$ has been constrained by $SinsemillaHash$ to be $10$ bits. We witness the subpieces $b_{1}, b_{2}$ outside this gate, and constrain them each to be $5$ bits. Inside the gate, we check that $b_{1} + 2^{5} \cdot b_{2} = z_{1, b} .$ We also recover the subpiece $b_{0}$ using $(b, z_{1, b})$ : $z_{1, b} ⟹ b_{0} = \frac{b - b _{0.. = 10}}{2 ^{10}} = b - (z_{1, b} \cdot 2^{10}) .$

Constraints

$Degree 2 Constraint short_lookup_range_check (b_{1}, 5) short_lookup_range_check (b_{2}, 5) q_{decompose} \cdot (z_{1, b} - (b_{1} + b_{2} \cdot 2^{5})) = 0$

where $short_lookup_range_check ()$ is a short lookup range check.

Decomposition constraints

We have now derived or witnessed every subpiece, and range-constrained every subpiece:

$a_{0}$ ( $10$ bits), derived as $a_{0} = a - 2^{10} \cdot z_{1, a}$ ;
$a_{1}$ ( $240$ bits), equal to $z_{1, a}$ ;
$b_{0}$ ( $10$ bits), derived as $b_{0} = b - 2^{10} \cdot z_{1, b}$ ;
$b_{1}$ ( $5$ bits) is witnessed and constrained outside the gate;
$b_{2}$ ( $5$ bits) is witnessed and constrained outside the gate;
$c$ ( $250$ bits) is witnessed and constrained outside the gate.
$b_{1} + 2^{5} \cdot b_{2}$ is constrained to equal $z_{1, b}$ .

We can now use them to reconstruct the original field element inputs:

$l left right = a_{0} = a_{1} + 2^{240} \cdot b_{0} + 2^{250} \cdot b_{1} = b_{2} + 2^{5} \cdot c$

$Degree 222 Constraint q_{decompose} \cdot (a_{0} - l) = 0 q_{decompose} \cdot (a_{1} + (b_{0} + b_{1} \cdot 2^{10}) \cdot 2^{240} - left) = 0 q_{decompose} \cdot (b_{2} + c \cdot 2^{5} - right) = 0$

Region layout

$a z_{1, a} b z_{1, b} c b_{1} left b_{2} right l q_{decompose} 10$

Circuit components

The Orchard circuit spans $10$ advice columns while the $Sinsemilla$ chip only uses $5$ advice columns. We distribute the path hashing evenly across two $Sinsemilla$ chips to make better use of the available circuit area. Since the output from the previous layer hash is copied into the next layer hash, we maintain continuity even when moving from one chip to the other.

Decomposition

Given a field element $α$ , these gadgets decompose it into $W$ $K$ -bit windows $α = k_{0} + 2^{K} \cdot k_{1} + 2^{2 K} \cdot k_{2} + \dots + 2^{(W - 1) K} \cdot k_{W - 1}$ where each $k_{i}$ a $K$ -bit value.

This is done using a running sum $z_{i}, i \in [0.. W) .$ We initialize the running sum $z_{0} = α,$ and compute subsequent terms $z_{i + 1} = \frac{z _{i} - k _{i}}{2 ^{K}} .$ This gives us:

$z_{0} z_{1} z_{2} ↓ z_{W} = α = k_{0} + 2^{K} \cdot k_{1} + 2^{2 K} \cdot k_{2} + 2^{3 K} \cdot k_{3} + \dots, = (z_{0} - k_{0}) / 2^{K} = k_{1} + 2^{K} \cdot k_{2} + 2^{2 K} \cdot k_{3} + \dots, = (z_{1} - k_{1}) / 2^{K} = k_{2} + 2^{K} \cdot k_{3} + \dots, ⋮ (in strict mode) = (z_{W - 1} - k_{W - 1}) / 2^{K} = 0 (because z_{W - 1} = k_{W - 1})$

Strict mode

Strict mode constrains the running sum output $z_{W}$ to be zero, thus range-constraining the field element to be within $W \cdot K$ bits.

In strict mode, we are also assured that $z_{W - 1} = k_{W - 1}$ gives us the last window in the decomposition.

Lookup decomposition

This gadget makes use of a $K$ -bit lookup table to decompose a field element $α$ into $K$ -bit words. Each $K$ -bit word $k_{i} = z_{i} - 2^{K} \cdot z_{i + 1}$ is range-constrained by a lookup in the $K$ -bit table.

The region layout for the lookup decomposition uses a single advice column $z$ , and two selectors $q_{l oo k u p}$ and $q_{r u nnin g} .$ $z z_{0} z_{1} ⋮ z_{n - 1} z_{n} q_{lookup} 11 ⋮ 10 q_{running} 11 ⋮ 10$

Short range check

Using two $K$ -bit lookups, we can range-constrain a field element $α$ to be $n$ bits, where $n \leq K .$ To do this:

Constrain $0 \leq α < 2^{K}$ to be within $K$ bits using a $K$ -bit lookup.
Constrain $0 \leq α \cdot 2^{K - n} < 2^{K}$ to be within $K$ bits using a $K$ -bit lookup.

The short variant of the lookup decomposition introduces a $q_{bi t s hi f t}$ selector. The same advice column $z$ has here been renamed to $word$ for clarity: $word α α^{'} 2^{K - n} q_{lookup} 110 q_{running} 000 q_{bitshift} 010$

where $α^{'} = α \cdot 2^{K - n} .$ Note that $2^{K - n}$ is assigned to a fixed column at keygen, and copied in at proving time. This is used in the gate enabled by the $q_{bitshift}$ selector to check that $α$ was shifted correctly: $Degree 2 Constraint q_{bitshift} \cdot ((α \cdot 2^{K - n}) - α^{'})$

Combined lookup expression

Since the lookup decomposition and its short variant both make use of the same lookup table, we combine their lookup input expressions into a single one:

$q_{lookup} \cdot (q_{running} \cdot (z_{i} - 2^{K} \cdot z_{i + 1}) + (1 - q_{running}) \cdot word)$

where $z_{i}$ and $word$ are the same cell (but distinguished here for clarity of usage).

Short range decomposition

For a short range (for instance, $[0, range)$ where $range \leq 8$ ), we can range-constrain each word using a degree- $range$ polynomial constraint instead of a lookup: $range_check (w or d, r an g e) = word \cdot (1 - word) \dots (range - 1 - word) .$

SHA-256

Specification

SHA-256 is specified in NIST FIPS PUB 180-4.

Unlike the specification, we use $⊞$ for addition modulo $2^{32}$ , and $+$ for field addition. $\oplus$ is used for XOR.

Gadget interface

SHA-256 maintains state in eight 32-bit variables. It processes input as 512-bit blocks, but internally splits these blocks into 32-bit chunks. We therefore designed the SHA-256 gadget to consume input in 32-bit chunks.

Chip instructions

The SHA-256 gadget requires a chip with the following instructions:

#![allow(unused)]
fn main() {
extern crate halo2_proofs;
use halo2_proofs::plonk::Error;
use std::fmt;

trait Chip: Sized {}
trait Layouter<C: Chip> {}
const BLOCK_SIZE: usize = 16;
const DIGEST_SIZE: usize = 8;

pub trait Sha256Instructions: Chip {
    /// Variable representing the SHA-256 internal state.
    type State: Clone + fmt::Debug;
    /// Variable representing a 32-bit word of the input block to the SHA-256 compression
    /// function.
    type BlockWord: Copy + fmt::Debug;

    /// Places the SHA-256 IV in the circuit, returning the initial state variable.
    fn initialization_vector(layouter: &mut impl Layouter<Self>) -> Result<Self::State, Error>;

    /// Starting from the given initial state, processes a block of input and returns the
    /// final state.
    fn compress(
        layouter: &mut impl Layouter<Self>,
        initial_state: &Self::State,
        input: [Self::BlockWord; BLOCK_SIZE],
    ) -> Result<Self::State, Error>;

    /// Converts the given state into a message digest.
    fn digest(
        layouter: &mut impl Layouter<Self>,
        state: &Self::State,
    ) -> Result<[Self::BlockWord; DIGEST_SIZE], Error>;
}
}

TODO: Add instruction for computing padding.

This set of instructions was chosen to strike a balance between the reusability of the instructions, and the scope for chips to internally optimise them. In particular, we considered splitting the compression function into its constituent parts (Ch, Maj etc), and providing a compression function gadget that implemented the round logic. However, this would prevent chips from using relative references between the various parts of a compression round. Having an instruction that implements all compression rounds is also similar to the Intel SHA extensions, which provide an instruction that performs multiple compression rounds.

16-bit table chip for SHA-256

This chip implementation is based around a single 16-bit lookup table. It requires a minimum of $2^{16}$ circuit rows, and is therefore suitable for use in larger circuits.

We target a maximum constraint degree of $9$ . That will allow us to handle constraining carries and "small pieces" to a range of up to ${0..7}$ in one row.

Compression round

There are $64$ compression rounds. Each round takes 32-bit values $A, B, C, D, E, F, G, H$ as input, and performs the following operations:

$C h (E, F, G) M aj (A, B, C) Σ_{0} (A) Σ_{1} (E) H^{'} E_{n e w} A_{n e w} = = = = = = = = (E \land F) \oplus (\neg E \land G) (A \land B) \oplus (A \land C) \oplus (B \land C) co u n t (A, B, C) \geq 2 (A ⋙ 2) \oplus (A ⋙ 13) \oplus (A ⋙ 22) (E ⋙ 6) \oplus (E ⋙ 11) \oplus (E ⋙ 25) H + C h (E, F, G) + Σ_{1} (E) + K_{t} + W_{t} re d u c e_{6} (H^{'} + D) re d u c e_{7} (H^{'} + M aj (A, B, C) + Σ_{0} (A))$

where $re d u c e_{i}$ must handle a carry $0 \leq carry < i$ .

The SHA-256 compression function

Define $spread$ as a table mapping a $16$ -bit input to an output interleaved with zero bits. We do not require a separate table for range checks because $spread$ can be used.

Modular addition

To implement addition modulo $2^{32}$ , we note that this is equivalent to adding the operands using field addition, and then masking away all but the lowest 32 bits of the result. For example, if we have two operands $a$ and $b$ :

$a ⊞ b = c,$

we decompose each operand (along with the result) into 16-bit chunks:

$(a_{L} : Z_{2^{16}}, a_{H} : Z_{2^{16}}) ⊞ (b_{L} : Z_{2^{16}}, b_{H} : Z_{2^{16}}) = (c_{L} : Z_{2^{16}}, c_{H} : Z_{2^{16}}),$

and then reformulate the constraint using field addition:

$carry \cdot 2^{32} + c_{H} \cdot 2^{16} + c_{L} = (a_{H} + b_{H}) \cdot 2^{16} + a_{L} + b_{L} .$

More generally, any bit-decomposition of the output can be used, not just a decomposition into 16-bit chunks. Note that this correctly handles the carry from $a_{L} + b_{L}$ .

This constraint requires that each chunk is correctly range-checked (or else an assignment could overflow the field).

The operand and result chunks can be constrained using $spread$ , by looking up each chunk in the "dense" column within a subset of the table. This way we additionally get the "spread" form of the output for free; in particular this is true for the output of the bottom-right $⊞$ which becomes $A_{n e w}$ , and the output of the leftmost $⊞$ which becomes $E_{n e w}$ . We will use this below to optimize $M aj$ and $C h$ .
$carry$ must be constrained to the precise range of allowed carry values for the number of operands. We do this with a small range constraint.

Maj function

$M aj$ can be done in $4$ lookups: $2 spread * 2$ chunks

As mentioned above, after the first round we already have $A$ in spread form $A^{'}$ . Similarly, $B$ and $C$ are equal to the $A$ and $B$ respectively of the previous round, and therefore in the steady state we already have them in spread form $B^{'}$ and $C^{'}$ . In fact we can also assume we have them in spread form in the first round, either from the fixed IV or from the use of $spread$ to reduce the output of the feedforward in the previous block.
Add the spread forms in the field: $M^{'} = A^{'} + B^{'} + C^{'}$ ;
- We can add them as $32$ -bit words or in pieces; it's equivalent
Witness the compressed even bits $M_{i}^{e v e n}$ and the compressed odd bits $M_{i}^{o dd}$ for $i = {0..1}$ ;
Constrain $M^{'} = spread (M_{0}^{e v e n}) + 2 \cdot spread (M_{0}^{o dd}) + 2^{32} \cdot spread (M_{1}^{e v e n}) + 2^{33} \cdot spread (M_{1}^{o dd})$ , where $M_{i}^{o dd}$ is the $M aj$ function output.

Note: by "even" bits we mean the bits of weight an even-power of $2$ , i.e. of weight $2^{0}, 2^{2}, \dots$ . Similarly by "odd" bits we mean the bits of weight an odd-power of $2$ .

Ch function

TODO: can probably be optimized to $4$ or $5$ lookups using an additional table.

$C h$ can be done in $8$ lookups: $4 spread * 2$ chunks

As mentioned above, after the first round we already have $E$ in spread form $E^{'}$ . Similarly, $F$ and $G$ are equal to the $E$ and $F$ respectively of the previous round, and therefore in the steady state we already have them in spread form $F^{'}$ and $G^{'}$ . In fact we can also assume we have them in spread form in the first round, either from the fixed IV or from the use of $spread$ to reduce the output of the feedforward in the previous block.
Calculate $P^{'} = E^{'} + F^{'}$ and $Q^{'} = (e v e n s - E^{'}) + G^{'}$ , where $e v e n s = spread (2^{32} - 1)$ .
- We can add them as $32$ -bit words or in pieces; it's equivalent.
- $e v e n s - E^{'}$ works to compute the spread of $\neg E$ even though negation and $spread$ do not commute in general. It works because each spread bit in $E^{'}$ is subtracted from $1$ , so there are no borrows.
Witness $P_{i}^{e v e n}, P_{i}^{o dd}, Q_{i}^{e v e n}, Q_{i}^{o dd}$ such that $P^{'} = spread (P_{0}^{e v e n}) + 2 \cdot spread (P_{0}^{o dd}) + 2^{32} \cdot spread (P_{1}^{e v e n}) + 2^{33} \cdot spread (P_{1}^{o dd})$ , and similarly for $Q^{'}$ .
${P_{i}^{o dd} + Q_{i}^{o dd}}_{i = 0..1}$ is the $C h$ function output.

Σ_0 function

$Σ_{0} (A)$ can be done in $6$ lookups.

To achieve this we first split $A$ into pieces $(a, b, c, d)$ , of lengths $(2, 11, 9, 10)$ bits respectively counting from the little end. At the same time we obtain the spread forms of these pieces. This can all be done in two PLONK rows, because the $10$ and $11$ -bit pieces can be handled using $spread$ lookups, and the $9$ -bit piece can be split into $3 * 3$ -bit subpieces. The latter and the remaining $2$ -bit piece can be range-checked by polynomial constraints in parallel with the two lookups, two small pieces in each row. The spread forms of these small pieces are found by interpolation.

Note that the splitting into pieces can be combined with the reduction of $A_{n e w}$ , i.e. no extra lookups are needed for the latter. In the last round we reduce $A_{n e w}$ after adding the feedforward (requiring a carry of up to $7$ which is fine).

$(A ⋙ 2) \oplus (A ⋙ 13) \oplus (A ⋙ 22)$ is equivalent to $(A ⋙ 2) \oplus (A ⋙ 13) \oplus (A ⋘ 10)$ :

Then, using $4$ more $spread$ lookups we obtain the result as the even bits of a linear combination of the pieces:

$R^{'} = (a (b (c 4^{30} a 4^{21} b 4^{23} c ∣∣ ∣∣ ∣∣ + + + d a b 4^{20} d 4^{19} a 4^{12} b ∣∣ ∣∣ ∣∣ ⇓ + + + c d a 4^{11} c 4^{9} d 4^{10} a ∣∣ ∣∣ ∣∣ + + + b) c) d) b c d \oplus \oplus + +$

That is, we witness the compressed even bits $R_{i}^{e v e n}$ and the compressed odd bits $R_{i}^{o dd}$ , and constrain $R^{'} = spread (R_{0}^{e v e n}) + 2 \cdot spread (R_{0}^{o dd}) + 2^{32} \cdot spread (R_{1}^{e v e n}) + 2^{33} \cdot spread (R_{1}^{o dd})$ where ${R_{i}^{e v e n}}_{i = 0..1}$ is the $Σ_{0}$ function output.

Σ_1 function

$Σ_{1} (E)$ can be done in $6$ lookups.

To achieve this we first split $E$ into pieces $(a, b, c, d)$ , of lengths $(6, 5, 14, 7)$ bits respectively counting from the little end. At the same time we obtain the spread forms of these pieces. This can all be done in two PLONK rows, because the $7$ and $14$ -bit pieces can be handled using $spread$ lookups, the $5$ -bit piece can be split into $3$ and $2$ -bit subpieces, and the $6$ -bit piece can be split into $2 * 3$ -bit subpieces. The four small pieces can be range-checked by polynomial constraints in parallel with the two lookups, two small pieces in each row. The spread forms of these small pieces are found by interpolation.

Note that the splitting into pieces can be combined with the reduction of $E_{n e w}$ , i.e. no extra lookups are needed for the latter. In the last round we reduce $E_{n e w}$ after adding the feedforward (requiring a carry of up to $6$ which is fine).

$(E ⋙ 6) \oplus (E ⋙ 11) \oplus (E ⋙ 25)$ is equivalent to $(E ⋙ 6) \oplus (E ⋙ 11) \oplus (E ⋘ 7)$ .

Then, using $4$ more $spread$ lookups we obtain the result as the even bits of a linear combination of the pieces, in the same way we did for $Σ_{0}$ :

$R^{'} = (a (b (c 4^{26} a 4^{27} b 4^{18} c ∣∣ ∣∣ ∣∣ + + + d a b 4^{19} d 4^{21} a 4^{13} b ∣∣ ∣∣ ∣∣ ⇓ + + + c d a 4^{5} c 4^{14} d 4^{7} a ∣∣ ∣∣ ∣∣ + + + b) c) d) b c d \oplus \oplus + +$

Block decomposition

For each block $M \in {0, 1}^{512}$ of the padded message, $64$ words of $32$ bits each are constructed as follows:

The first $16$ are obtained by splitting $M$ into $32$ -bit blocks $M = W_{0} ∣∣ W_{1} ∣∣ \dots ∣∣ W_{14} ∣∣ W_{15};$
The remaining $48$ words are constructed using the formula: $W_{i} = σ_{1} (W_{i - 2}) ⊞ W_{i - 7} ⊞ σ_{0} (W_{i - 15}) ⊞ W_{i - 16},$ for $16 \leq i < 64$ .

Note: $0$ -based numbering is used for the $W$ word indices.

$σ_{0} (X) σ_{1} (X) = = (X ⋙ 7) \oplus (X ⋙ 18) \oplus (X ≫ 3) (X ⋙ 17) \oplus (X ⋙ 19) \oplus (X ≫ 10)$

Note: $≫$ is a right-shift, not a rotation.

σ_0 function

$(X ⋙ 7) \oplus (X ⋙ 18) \oplus (X ≫ 3)$ is equivalent to $(X ⋙ 7) \oplus (X ⋘ 14) \oplus (X ≫ 3)$ .

As above but with pieces $(a, b, c, d)$ of lengths $(3, 4, 11, 14)$ counting from the little end. Split $b$ into two $2$ -bit subpieces.

$R^{'} = (0^{[3]} (b (c 4^{28} b 4^{21} c ∣∣ ∣∣ ∣∣ + + d a b 4^{15} d 4^{25} a 4^{17} b ∣∣ ∣∣ ∣∣ ⇓ + + + c d a 4^{4} c 4^{11} d 4^{14} a ∣∣ ∣∣ ∣∣ + + + b) c) d) b c d \oplus \oplus + +$

σ_1 function

$(X ⋙ 17) \oplus (X ⋙ 19) \oplus (X ≫ 10)$ is equivalent to $(X ⋘ 15) \oplus (X ⋘ 13) \oplus (X ≫ 10)$ .

TODO: this diagram doesn't match the expression on the right. This is just for consistency with the other diagrams.

As above but with pieces $(a, b, c, d)$ of lengths $(10, 7, 2, 13)$ counting from the little end. Split $b$ into $(3, 2, 2)$ -bit subpieces.

$R^{'} = (0^{[10]} (b (c 4^{25} b 4^{30} c ∣∣ ∣∣ ∣∣ + + d a b 4^{9} d 4^{15} a 4^{23} b ∣∣ ∣∣ ∣∣ ⇓ + + + c d a 4^{7} c 4^{2} d 4^{13} a ∣∣ ∣∣ ∣∣ + + + b) c) d) b c d \oplus \oplus + +$

Message scheduling

We apply $σ_{0}$ to $W_{1..48}$ , and $σ_{1}$ to $W_{14..61}$ . In order to avoid redundant applications of $spread$ , we can merge the splitting into pieces for $σ_{0}$ and $σ_{1}$ in the case of $W_{14..48}$ . Merging the piece lengths $(3, 4, 11, 14)$ and $(10, 7, 2, 13)$ gives pieces of lengths $(3, 4, 3, 7, 1, 1, 13)$ .

If we can do the merged split in $3$ rows (as opposed to a total of $4$ rows when splitting for $σ_{0}$ and $σ_{1}$ separately), we save $35$ rows.

These might even be doable in $2$ rows; not sure. —Daira

We can merge the reduction mod $2^{32}$ of $W_{16..61}$ into their splitting when they are used to compute subsequent words, similarly to what we did for $A$ and $E$ in the round function.

We will still need to reduce $W_{62..63}$ since they are not split. (Technically we could leave them unreduced since they will be reduced later when they are used to compute $A_{n e w}$ and $E_{n e w}$ -- but that would require handling a carry of up to $10$ rather than $6$ , so it's not worth the complexity.)

The resulting message schedule cost is:

$2$ rows to constrain $W_{0}$ to $32$ bits
- This is technically optional, but let's do it for robustness, since the rest of the input is constrained for free.
$13 * 2$ rows to split $W_{1..13}$ into $(3, 4, 11, 14)$ -bit pieces
$35 * 3$ rows to split $W_{14..48}$ into $(3, 4, 3, 7, 1, 1, 13)$ -bit pieces (merged with a reduction for $W_{16..48}$ )
$13 * 2$ rows to split $W_{49..61}$ into $(10, 7, 2, 13)$ -bit pieces (merged with a reduction)
$4 * 48$ rows to extract the results of $σ_{0}$ for $W_{1..48}$
$4 * 48$ rows to extract the results of $σ_{1}$ for $W_{14..61}$
$2 * 2$ rows to reduce $W_{62..63}$
$= 547$ rows.

Overall cost

For each round:

$8$ rows for $C h$
$4$ rows for $M aj$
$6$ rows for $Σ_{0}$
$6$ rows for $Σ_{1}$
$re d u c e_{6}$ and $re d u c e_{7}$ are always free
$= 24$ per round

This gives $24 * 64 = 1792$ rows for all of "step 3", to which we need to add:

$547$ rows for message scheduling
$2 * 8$ rows for $8$ reductions mod $2^{32}$ in "step 4"

giving a total of $2099$ rows.

Tables

We only require one table $spread$ , with $2^{16}$ rows and $3$ columns. We need a tag column to allow selecting $(7, 10, 11, 13, 14)$ -bit subsets of the table for $Σ_{0..1}$ and $σ_{0..1}$ .

`spread` table

row	tag	table (16b)	spread (32b)
$0$	0	0000000000000000	00000000000000000000000000000000
$1$	0	0000000000000001	00000000000000000000000000000001
$2$	0	0000000000000010	00000000000000000000000000000100
$3$	0	0000000000000011	00000000000000000000000000000101
...	0	...	...
$2^{7} - 1$	0	0000000001111111	00000000000000000001010101010101
$2^{7}$	1	0000000010000000	00000000000000000100000000000000
...	1	...	...
$2^{10} - 1$	1	0000001111111111	00000000000001010101010101010101
...	2	...	...
$2^{11} - 1$	2	0000011111111111	00000000010101010101010101010101
...	3	...	...
$2^{13} - 1$	3	0001111111111111	00000001010101010101010101010101
...	4	...	...
$2^{14} - 1$	4	0011111111111111	00000101010101010101010101010101
...	5	...	...
$2^{16} - 1$	5	1111111111111111	01010101010101010101010101010101

For example, to do an $11$ -bit $spread$ lookup, we polynomial-constrain the tag to be in ${0, 1, 2}$ . For the most common case of a $16$ -bit lookup, we don't need to constrain the tag. Note that we can fill any unused rows beyond $2^{16}$ with a duplicate entry, e.g. all-zeroes.

Gates

Choice gate

Input from previous operations:

$E^{'}, F^{'}, G^{'},$ 64-bit spread forms of 32-bit words $E, F, G$ , assumed to be constrained by previous operations
- in practice, we'll have the spread forms of $E^{'}, F^{'}, G^{'}$ after they've been decomposed into 16-bit subpieces
$e v e n s$ is defined as $spread (2^{32} - 1)$
- $e v e n s_{0} = e v e n s_{1} = spread (2^{16} - 1)$

E ∧ F

s_ch	$a_{0}$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$
0	{0,1,2,3,4,5}	$P_{0}^{e v e n}$	$spread (P_{0}^{e v e n})$	$spread (E^{l o})$	$spread (E^{hi})$
1	{0,1,2,3,4,5}	$P_{0}^{o dd}$	$spread (P_{0}^{o dd})$	$spread (P_{1}^{o dd})$
0	{0,1,2,3,4,5}	$P_{1}^{e v e n}$	$spread (P_{1}^{e v e n})$	$spread (F^{l o})$	$spread (F^{hi})$
0	{0,1,2,3,4,5}	$P_{1}^{o dd}$	$spread (P_{1}^{o dd})$

¬E ∧ G

s_ch_neg	$a_{0}$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$
0	{0,1,2,3,4,5}	$Q_{0}^{e v e n}$	$spread (Q_{0}^{e v e n})$	$spread (E_{n e g}^{l o})$	$spread (E_{n e g}^{hi})$	$spread (E^{l o})$
1	{0,1,2,3,4,5}	$Q_{0}^{o dd}$	$spread (Q_{0}^{o dd})$	$spread (Q_{1}^{o dd})$		$spread (E^{hi})$
0	{0,1,2,3,4,5}	$Q_{1}^{e v e n}$	$spread (Q_{1}^{e v e n})$	$spread (G^{l o})$	$spread (G^{hi})$
0	{0,1,2,3,4,5}	$Q_{1}^{o dd}$	$spread (Q_{1}^{o dd})$

Constraints:

s_ch (choice): $L H S - R H S = 0$
- $L H S = a_{3} ω^{- 1} + a_{3} ω + 2^{32} (a_{4} ω^{- 1} + a_{4} ω)$
- $R H S = a_{2} ω^{- 1} + 2 * a_{2} + 2^{32} (a_{2} ω + 2 * a_{3})$
s_ch_neg (negation): s_ch with an extra negation check
$spread$ lookup on $(a_{0}, a_{1}, a_{2})$
permutation between $(a_{2}, a_{3})$

Output: $C h (E, F, G) = P^{o dd} + Q^{o dd} = (P_{0}^{o dd} + Q_{0}^{o dd}) + 2^{16} (P_{1}^{o dd} + Q_{1}^{o dd})$

Majority gate

Input from previous operations:

$A^{'}, B^{'}, C^{'},$ 64-bit spread forms of 32-bit words $A, B, C$ , assumed to be constrained by previous operations
- in practice, we'll have the spread forms of $A^{'}, B^{'}, C^{'}$ after they've been decomposed into $16$ -bit subpieces

s_maj	$a_{0}$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$
0	{0,1,2,3,4,5}	$M_{0}^{e v e n}$	$spread (M_{0}^{e v e n})$		$spread (A^{l o})$	$spread (A^{hi})$
1	{0,1,2,3,4,5}	$M_{0}^{o dd}$	$spread (M_{0}^{o dd})$	$spread (M_{1}^{o dd})$	$spread (B^{l o})$	$spread (B^{hi})$
0	{0,1,2,3,4,5}	$M_{1}^{e v e n}$	$spread (M_{1}^{e v e n})$		$spread (C^{l o})$	$spread (C^{hi})$
0	{0,1,2,3,4,5}	$M_{1}^{o dd}$	$spread (M_{1}^{o dd})$

Constraints:

s_maj (majority): $L H S - R H S = 0$
- $L H S = spread (M_{0}^{e v e n}) + 2 \cdot spread (M_{0}^{o dd}) + 2^{32} \cdot spread (M_{1}^{e v e n}) + 2^{33} \cdot spread (M_{1}^{o dd})$
- $R H S = A^{'} + B^{'} + C^{'}$
$spread$ lookup on $(a_{0}, a_{1}, a_{2})$
permutation between $(a_{2}, a_{3})$

Output: $M aj (A, B, C) = M^{o dd} = M_{0}^{o dd} + 2^{16} M_{1}^{o dd}$

Σ_0 gate

$A$ is a 32-bit word split into $(2, 11, 9, 10)$ -bit chunks, starting from the little end. We refer to these chunks as $(a (2), b (11), c (9), d (10))$ respectively, and further split $c (9)$ into three 3-bit chunks $c (9)^{l o}, c (9)^{mi d}, c (9)^{hi}$ . We witness the spread versions of the small chunks.

$Σ_{0} (A) = = (A ⋙ 2) \oplus (A ⋙ 13) \oplus (A ⋙ 22) (A ⋙ 2) \oplus (A ⋙ 13) \oplus (A ⋘ 10)$

s_upp_sigma_0	$a_{0}$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$	$a_{6}$
0	{0,1,2,3,4,5}	$R_{0}^{e v e n}$	$spread (R_{0}^{e v e n})$	$c (9)^{l o}$	$spread (c (9)^{l o})$	$c (9)^{mi d}$	$spread (c (9)^{mi d})$
1	{0,1,2,3,4,5}	$R_{0}^{o dd}$	$spread (R_{0}^{o dd})$	$spread (R_{1}^{o dd})$	$spread (d (10))$	$spread (b (11))$	$c (9)$
0	{0,1,2,3,4,5}	$R_{1}^{e v e n}$	$spread (R_{1}^{e v e n})$	$a (2)$	$spread (a (2))$	$c (9)^{hi}$	$spread (c (9)^{hi})$
0	{0,1,2,3,4,5}	$R_{1}^{o dd}$	$spread (R_{1}^{o dd})$

Constraints:

s_upp_sigma_0 ( $Σ_{0}$ constraint): $L H S - R H S + t a g + d eco m p ose = 0$

$t a g d eco m p ose L H S = = = co n s t r ai n_{1} (a_{0} ω^{- 1}) + co n s t r ai n_{2} (a_{0} ω) a (2) + 2^{2} b (11) + 2^{13} c (9)^{l o} + 2^{16} c (9)^{mi d} + 2^{19} c (9)^{hi} + 2^{22} d (10) - A spread (R_{0}^{e v e n}) + 2 \cdot spread (R_{0}^{o dd}) + 2^{32} \cdot spread (R_{1}^{e v e n}) + 2^{33} \cdot spread (R_{1}^{o dd})$ $R H S = 4^{30} spread (a (2)) 4^{21} spread (b (11)) 4^{29} spread (c (9)^{hi}) + + + 4^{20} spread (d (10)) 4^{19} spread (a (2)) 4^{26} spread (c (9)^{mi d}) + + + 4^{17} spread (c (9)^{hi}) 4^{9} spread (d (10)) 4^{23} spread (c (9)^{l o}) + + + 4^{14} spread (c (9)^{mi d}) 4^{6} spread (c (9)^{hi}) 4^{12} spread (b (11)) + + + 4^{11} spread (c (9)^{l o}) 4^{3} spread (c (9)^{mi d}) 4^{10} spread (a (2)) + + + spread (b (11)) spread (c (9)^{l o}) spread (d (10)) + +$

$spread$ lookup on $a_{0}, a_{1}, a_{2}$
2-bit range check and 2-bit spread check on $a (2)$
3-bit range check and 3-bit spread check on $c (9)^{l o}, c (9)^{mi d}, c (9)^{hi}$

(see section Helper gates)

Output: $Σ_{0} (A) = R^{e v e n} = R_{0}^{e v e n} + 2^{16} R_{1}^{e v e n}$

Σ_1 gate

$E$ is a 32-bit word split into $(6, 5, 14, 7)$ -bit chunks, starting from the little end. We refer to these chunks as $(a (6), b (5), c (14), d (7))$ respectively, and further split $a (6)$ into two 3-bit chunks $a (6)^{l o}, a (6)^{hi}$ and $b$ into (2,3)-bit chunks $b (5)^{l o}, b (5)^{hi}$ . We witness the spread versions of the small chunks.

$Σ_{1} (E) = = (E ⋙ 6) \oplus (E ⋙ 11) \oplus (E ⋙ 25) (E ⋙ 6) \oplus (E ⋙ 11) \oplus (E ⋘ 7)$

s_upp_sigma_1	$a_{0}$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$	$a_{6}$	$a_{7}$
0	{0,1,2,3,4,5}	$R_{0}^{e v e n}$	$spread (R_{0}^{e v e n})$	$b (5)^{l o}$	$spread (b (5)^{l o})$	$b (5)^{hi}$	$spread (b (5)^{hi})$	$b (5)$
1	{0,1,2,3,4,5}	$R_{0}^{o dd}$	$spread (R_{0}^{o dd})$	$spread (R_{1}^{o dd})$	$spread (d (7))$	$spread (c (14))$
0	{0,1,2,3,4,5}	$R_{1}^{e v e n}$	$spread (R_{1}^{e v e n})$	$a (6)^{l o}$	$spread (a (6)^{l o})$	$a (6)^{hi}$	$spread (a (6)^{hi})$	$a (6)$
0	{0,1,2,3,4,5}	$R_{1}^{o dd}$	$spread (R_{1}^{o dd})$

Constraints:

s_upp_sigma_1 ( $Σ_{1}$ constraint): $L H S - R H S + t a g + d eco m p ose = 0$

$t a g d eco m p ose L H S = = = a_{0} ω^{- 1} + co n s t r ai n_{4} (a_{0} ω) a (6)^{l o} + 2^{3} a (6)^{hi} + 2^{6} b (5)^{l o} + 2^{8} b (5)^{hi} + 2^{11} c (14) + 2^{25} d (7) - E spread (R_{0}^{e v e n}) + 2 \cdot spread (R_{0}^{o dd}) + 2^{32} \cdot spread (R_{1}^{e v e n}) + 2^{33} \cdot spread (R_{1}^{o dd})$ $R H S = 4^{29} spread (a (6)^{hi}) 4^{29} spread (b (5)^{hi}) 4^{18} spread (c (14)) + + + 4^{26} spread (a (6)^{l o}) 4^{27} spread (b (5)^{l o}) 4^{15} spread (b (5)^{hi}) + + + 4^{19} spread (d (7)) 4^{24} spread (a (6)^{hi}) 4^{13} spread (b (5)^{l o}) + + + 4^{5} spread (c (14)) 4^{21} spread (a (6)^{l o}) 4^{10} spread (a (6)^{hi}) + + + 4^{2} spread (b (5)^{hi}) 4^{14} spread (d (7)) 4^{7} spread (a (6)^{l o}) + + + spread (b (5)^{l o}) spread (c (14)) spread (d (7)) + +$

$spread$ lookup on $a_{0}, a_{1}, a_{2}$
2-bit range check and 2-bit spread check on $b (5)^{l o}$
3-bit range check and 3-bit spread check on $a (6)^{l o}, a (6)^{hi}, b (4)^{hi}$

(see section Helper gates)

Output: $Σ_{1} (E) = R^{e v e n} = R_{0}^{e v e n} + 2^{16} R_{1}^{e v e n}$

σ_0 gate

v1

v1 of the $σ_{0}$ gate takes in a word that's split into $(3, 4, 11, 14)$ -bit chunks (already constrained by message scheduling). We refer to these chunks respectively as $(a (3), b (4), c (11), d (14)) .$ $b (4)$ is further split into two 2-bit chunks $b (4)^{l o}, b (4)^{hi} .$ We witness the spread versions of the small chunks. We already have $spread (c (11))$ and $spread (d (14))$ from the message scheduling.

$(X ⋙ 7) \oplus (X ⋙ 18) \oplus (X ≫ 3)$ is equivalent to $(X ⋙ 7) \oplus (X ⋘ 14) \oplus (X ≫ 3)$ .

s_low_sigma_0	$a_{0}$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$	$a_{6}$
0	{0,1,2,3,4,5}	$R_{0}^{e v e n}$	$spread (R_{0}^{e v e n})$	$b (4)^{l o}$	$spread (b (4)^{l o})$	$b (4)^{hi}$	$spread (b (4)^{hi})$
1	{0,1,2,3,4,5}	$R_{0}^{o dd}$	$spread (R_{0}^{o dd})$	$spread (R_{1}^{o dd})$	$spread (c)$	$spread (d)$	$b (4)$
0	{0,1,2,3,4,5}	$R_{1}^{e v e n}$	$spread (R_{1}^{e v e n})$	$0$	$0$	$a$	$spread (a)$
0	{0,1,2,3,4,5}	$R_{1}^{o dd}$	$spread (R_{1}^{o dd})$

Constraints:

s_low_sigma_0 ( $σ_{0}$ v1 constraint): $L H S - R H S = 0$

$L H S = spread (R_{0}^{e v e n}) + 2 \cdot spread (R_{0}^{o dd}) + 2^{32} \cdot spread (R_{1}^{e v e n}) + 2^{33} \cdot spread (R_{1}^{o dd})$ $R H S = 4^{30} b (4)^{hi} 4^{21} c (11) + + 4^{15} d (14) 4^{28} b (4)^{l o} 4^{19} b (4)^{hi} + + + 4^{4} c (11) 4^{25} a (3) 4^{17} b (4)^{l o} + + + 4^{2} b (4)^{hi} 4^{11} d (14) 4^{14} a (3) + + + b (4)^{l o} c (11) d (14) + +$

check that b was properly split into subsections for 4-bit pieces.
- $W^{b (4) l o} + 2^{2} W^{b (4) hi} - W = 0$
2-bit range check and 2-bit spread check on $b (4)^{l o}, b (4)^{hi}$
3-bit range check and 3-bit spread check on $a (3)$

v2

v2 of the $σ_{0}$ gate takes in a word that's split into $(3, 4, 3, 7, 1, 1, 13)$ -bit chunks (already constrained by message scheduling). We refer to these chunks respectively as $(a (3), b (4), c (3), d (7), e (1), f (1), g (13)) .$ We already have $spread (d (7)), spread (g (13))$ from the message scheduling. The 1-bit $e (1), f (1)$ remain unchanged by the spread operation and can be used directly. We further split $b (4)$ into two 2-bit chunks $b (4)^{l o}, b (4)^{hi} .$ We witness the spread versions of the small chunks.

$(X ⋙ 7) \oplus (X ⋙ 18) \oplus (X ≫ 3)$ is equivalent to $(X ⋙ 7) \oplus (X ⋘ 14) \oplus (X ≫ 3)$ .

s_low_sigma_0_v2	$a_{0}$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$	$a_{6}$	$a_{7}$
0	{0,1,2,3,4,5}	$R_{0}^{e v e n}$	$spread (R_{0}^{e v e n})$	$b (4)^{l o}$	$spread (b (4)^{l o})$	$b (4)^{hi}$	$spread (b (4)^{hi})$
1	{0,1,2,3,4,5}	$R_{0}^{o dd}$	$spread (R_{0}^{o dd})$	$spread (R_{1}^{o dd})$	$spread (d (7))$	$spread (g (13))$	$b (4)$	$e (1)$
0	{0,1,2,3,4,5}	$R_{1}^{e v e n}$	$spread (R_{1}^{e v e n})$	$a (3)$	$spread (a (3))$	$c (3)$	$spread (c (3))$	$f (1)$
0	{0,1,2,3,4,5}	$R_{1}^{o dd}$	$spread (R_{1}^{o dd})$

Constraints:

s_low_sigma_0_v2 ( $σ_{0}$ v2 constraint): $L H S - R H S = 0$

$L H S = spread (R_{0}^{e v e n}) + 2 \cdot spread (R_{0}^{o dd}) + 2^{32} \cdot spread (R_{1}^{e v e n}) + 2^{33} \cdot spread (R_{1}^{o dd})$ $R H S = 4^{30} b (4)^{hi} 4^{31} e (1) + + 4^{16} g (13) 4^{28} b (4)^{l o} 4^{24} d (7) + + + 4^{15} f (1) 4^{25} a (3) 4^{21} c (3) + + + 4^{14} e (1) 4^{12} g (13) 4^{19} b (4)^{hi} + + + 4^{7} d (7) 4^{11} f (1) 4^{17} b (4)^{l o} + + + 4^{4} c (3) 4^{10} e (1) 4^{14} a (3) + + + 4^{2} b (4)^{hi} 4^{3} d (7) 4^{1} g (13) + + + b (4)^{l o} c (3) f (1) + +$

check that b was properly split into subsections for 4-bit pieces.
- $W^{b (4) l o} + 2^{2} W^{b (4) hi} - W = 0$
2-bit range check and 2-bit spread check on $b (4)^{l o}, b (4)^{hi}$
3-bit range check and 3-bit spread check on $a (3), c (3)$

σ_1 gate

v1

v1 of the $σ_{1}$ gate takes in a word that's split into $(10, 7, 2, 13)$ -bit chunks (already constrained by message scheduling). We refer to these chunks respectively as $(a (10), b (7), c (2), d (13)) .$ $b (7)$ is further split into $(2, 2, 3)$ -bit chunks $b (7)^{l o}, b (7)^{mi d}, b (7)^{hi} .$ We witness the spread versions of the small chunks. We already have $spread (a (10))$ and $spread (d (13))$ from the message scheduling.

$(X ⋙ 17) \oplus (X ⋙ 19) \oplus (X ≫ 10)$ is equivalent to $(X ⋘ 15) \oplus (X ⋘ 13) \oplus (X ≫ 10)$ .

s_low_sigma_1	$a_{0}$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$	$a_{6}$
0	{0,1,2,3,4,5}	$R_{0}^{e v e n}$	$spread (R_{0}^{e v e n})$	$b (7)^{l o}$	$spread (b (7)^{l o})$	$b (7)^{mi d}$	$spread (b (7)^{mi d})$
1	{0,1,2,3,4,5}	$R_{0}^{o dd}$	$spread (R_{0}^{o dd})$	$spread (R_{1}^{o dd})$	$spread (a (10))$	$spread (d (13))$	$b (7)$
0	{0,1,2,3,4,5}	$R_{1}^{e v e n}$	$spread (R_{1}^{e v e n})$	$c (2)$	$spread (c (2))$	$b (7)^{hi}$	$spread (b (7)^{hi})$
0	{0,1,2,3,4,5}	$R_{1}^{o dd}$	$spread (R_{1}^{o dd})$

Constraints:

s_low_sigma_1 ( $σ_{1}$ v1 constraint): $L H S - R H S = 0$ $L H S = spread (R_{0}^{e v e n}) + 2 \cdot spread (R_{0}^{o dd}) + 2^{32} \cdot spread (R_{1}^{e v e n}) + 2^{33} \cdot spread (R_{1}^{o dd})$ $R H S = 4^{29} b (7)^{hi} 4^{30} c (2) + + 4^{9} d (13) 4^{27} b (7)^{mi d} 4^{27} b (7)^{hi} + + + 4^{7} c (2) 4^{25} b (7)^{l o} 4^{25} b (7)^{mi d} + + + 4^{4} b (7)^{hi} 4^{15} a (10) 4^{23} b (7)^{l o} + + + 4^{2} b (7)^{mi d} 4^{2} d (13) 4^{13} a (10) + + + b (7)^{l o} c (2) d (13) + +$
check that b was properly split into subsections for 7-bit pieces.
- $W^{b (7) l o} + 2^{2} W^{b (7) mi d} + 2^{4} W^{b (7) hi} - W = 0$
2-bit range check and 2-bit spread check on $b (7)^{l o}, b (7)^{mi d}, c (2)$
3-bit range check and 3-bit spread check on $b (7)^{hi}$

v2

v2 of the $σ_{1}$ gate takes in a word that's split into $(3, 4, 3, 7, 1, 1, 13)$ -bit chunks (already constrained by message scheduling). We refer to these chunks respectively as $(a (3), b (4), c (3), d (7), e (1), f (1), g (13)) .$ We already have $spread (d (7)), spread (g (13))$ from the message scheduling. The 1-bit $e (1), f (1)$ remain unchanged by the spread operation and can be used directly. We further split $b (4)$ into two 2-bit chunks $b (4)^{l o}, b (4)^{hi} .$ We witness the spread versions of the small chunks.

$(X ⋙ 17) \oplus (X ⋙ 19) \oplus (X ≫ 10)$ is equivalent to $(X ⋘ 15) \oplus (X ⋘ 13) \oplus (X ≫ 10)$ .

s_low_sigma_1_v2	$a_{0}$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$	$a_{6}$	$a_{7}$
0	{0,1,2,3,4,5}	$R_{0}^{e v e n}$	$spread (R_{0}^{e v e n})$	$b (4)^{l o}$	$spread (b (4)^{l o})$	$b (4)^{hi}$	$spread (b (4)^{hi})$
1	{0,1,2,3,4,5}	$R_{0}^{o dd}$	$spread (R_{0}^{o dd})$	$spread (R_{1}^{o dd})$	$spread (d (7))$	$spread (g (13))$	$b (4)$	$e (1)$
0	{0,1,2,3,4,5}	$R_{1}^{e v e n}$	$spread (R_{1}^{e v e n})$	$a (3)$	$spread (a (3))$	$c (3)$	$spread (c (3))$	$f (1)$
0	{0,1,2,3,4,5}	$R_{1}^{o dd}$	$spread (R_{1}^{o dd})$

Constraints:

s_low_sigma_1_v2 ( $σ_{1}$ v2 constraint): $L H S - R H S = 0$

$L H S = spread (R_{0}^{e v e n}) + 2 \cdot spread (R_{0}^{o dd}) + 2^{32} \cdot spread (R_{1}^{e v e n}) + 2^{33} \cdot spread (R_{1}^{o dd})$ $R H S = 4^{25} d (7) 4^{31} f (1) + + 4^{22} c (3) 4^{30} e (1) + + 4^{20} b (4)^{hi} 4^{23} d (7) + + 4^{9} g (13) 4^{18} b (4)^{l o} 4^{20} c (3) + + + 4^{8} f (1) 4^{15} a 4^{18} b (4)^{hi} + + + 4^{7} e (1) 4^{2} g (13) 4^{16} b (4)^{l o} + + + d (7) 4^{1} f (1) 4^{13} a + + + e (1) g (13) +$

check that b was properly split into subsections for 4-bit pieces.
- $W^{b (4) l o} + 2^{2} W^{b (4) hi} - W = 0$
2-bit range check and 2-bit spread check on $b (4)^{l o}, b (4)^{hi}$
3-bit range check and 3-bit spread check on $a (3), c (3)$

Helper gates

Small range constraints

Let $co n s t r ai n_{n} (x) = \prod_{i = 0}^{n} (x - i)$ . Constraining this expression to equal zero enforces that $x$ is in $[0.. n] .$

2-bit range check

$(a - 3) (a - 2) (a - 1) (a) = 0$

sr2	$a_{0}$
1	a

2-bit spread

$l_{1} (a) + 4 * l_{2} (a) + 5 * l_{3} (a) - a^{'} = 0$

ss2	$a_{0}$	$a_{1}$
1	a	a'

with interpolation polynomials:

$l_{0} (a) = \frac{( a - 3 ) ( a - 2 ) ( a - 1 )}{( - 3 ) ( - 2 ) ( - 1 )}$ ( $spread (00) = 0000$ )
$l_{1} (a) = \frac{( a - 3 ) ( a - 2 ) ( a )}{( - 2 ) ( - 1 ) ( 1 )}$ ( $spread (01) = 0001$ )
$l_{2} (a) = \frac{( a - 3 ) ( a - 1 ) ( a )}{( - 1 ) ( 1 ) ( 2 )}$ ( $spread (10) = 0100$ )
$l_{3} (a) = \frac{( a - 2 ) ( a - 1 ) ( a )}{( 1 ) ( 2 ) ( 3 )}$ ( $spread (11) = 0101$ )

3-bit range check

$(a - 7) (a - 6) (a - 5) (a - 4) (a - 3) (a - 2) (a - 1) (a) = 0$

sr3	$a_{0}$
1	a

3-bit spread

$l_{1} (a) + 4 * l_{2} (a) + 5 * l_{3} (a) + 16 * l_{4} (a) + 17 * l_{5} (a) + 20 * l_{6} (a) + 21 * l_{7} (a) - a^{'} = 0$

ss3	$a_{0}$	$a_{1}$
1	a	a'

with interpolation polynomials:

$l_{0} (a) = \frac{( a - 7 ) ( a - 6 ) ( a - 5 ) ( a - 4 ) ( a - 3 ) ( a - 2 ) ( a - 1 )}{( - 7 ) ( - 6 ) ( - 5 ) ( - 4 ) ( - 3 ) ( - 2 ) ( - 1 )}$ ( $spread (000) = 000000$ )
$l_{1} (a) = \frac{( a - 7 ) ( a - 6 ) ( a - 5 ) ( a - 4 ) ( a - 3 ) ( a - 2 ) ( a )}{( - 6 ) ( - 5 ) ( - 4 ) ( - 3 ) ( - 2 ) ( - 1 ) ( 1 )}$ ( $spread (001) = 000001$ )
$l_{2} (a) = \frac{( a - 7 ) ( a - 6 ) ( a - 5 ) ( a - 4 ) ( a - 3 ) ( a - 1 ) ( a )}{( - 5 ) ( - 4 ) ( - 3 ) ( - 2 ) ( - 1 ) ( 1 ) ( 2 )}$ ( $spread (010) = 000100$ )
$l_{3} (a) = \frac{( a - 7 ) ( a - 6 ) ( a - 5 ) ( a - 4 ) ( a - 2 ) ( a - 1 ) ( a )}{( - 4 ) ( - 3 ) ( - 2 ) ( - 1 ) ( 1 ) ( 2 ) ( 3 )}$ ( $spread (011) = 000101$ )
$l_{4} (a) = \frac{( a - 7 ) ( a - 6 ) ( a - 5 ) ( a - 3 ) ( a - 2 ) ( a - 1 ) ( a )}{( - 3 ) ( - 2 ) ( - 1 ) ( 1 ) ( 2 ) ( 3 ) ( 4 )}$ ( $spread (100) = 010000$ )
$l_{5} (a) = \frac{( a - 7 ) ( a - 6 ) ( a - 4 ) ( a - 3 ) ( a - 2 ) ( a - 1 ) ( a )}{( - 2 ) ( - 1 ) ( 1 ) ( 2 ) ( 3 ) ( 4 ) ( 5 )}$ ( $spread (101) = 010001$ )
$l_{6} (a) = \frac{( a - 7 ) ( a - 5 ) ( a - 4 ) ( a - 3 ) ( a - 2 ) ( a - 1 ) ( a )}{( - 1 ) ( 1 ) ( 2 ) ( 3 ) ( 4 ) ( 5 ) ( 6 )}$ ( $spread (110) = 010100$ )
$l_{7} (a) = \frac{( a - 6 ) ( a - 5 ) ( a - 4 ) ( a - 3 ) ( a - 2 ) ( a - 1 ) ( a )}{( 1 ) ( 2 ) ( 3 ) ( 4 ) ( 5 ) ( 6 ) ( 7 )}$ ( $spread (111) = 010101$ )

reduce_6 gate

Addition $(mod 2^{32})$ of 6 elements

Input:

$E$
${e_{i}^{l o}, e_{i}^{hi}}_{i = 0}^{5}$
$c a rry$

Check: $E = e_{0} + e_{1} + e_{2} + e_{3} + e_{4} + e_{5} (mod 32)$

Assume inputs are constrained to 16 bits.

Addition gate (sa):
- $a_{0} + a_{1} + a_{2} + a_{3} + a_{4} + a_{5} + a_{6} - a_{7} = 0$
Carry gate (sc):
- $2^{16} a_{6} ω^{- 1} + a_{6} + [(a_{6} - 5) (a_{6} - 4) (a_{6} - 3) (a_{6} - 2) (a_{6} - 1) (a_{6})] = 0$

sa	sc	$a_{0}$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$	$a_{6}$	$a_{7}$
1	0	$e_{0}^{l o}$	$e_{1}^{l o}$	$e_{2}^{l o}$	$e_{3}^{l o}$	$e_{4}^{l o}$	$e_{5}^{l o}$	$- c a rry * 2^{16}$	$E^{l o}$
1	1	$e_{0}^{hi}$	$e_{1}^{hi}$	$e_{2}^{hi}$	$e_{3}^{hi}$	$e_{4}^{hi}$	$e_{5}^{hi}$	$c a rry$	$E^{hi}$

Assume inputs are constrained to 16 bits.

Addition gate (sa):
- $a_{0} ω^{- 1} + a_{1} ω^{- 1} + a_{2} ω^{- 1} + a_{0} + a_{1} + a_{2} + a_{3} ω^{- 1} - a_{3} = 0$
Carry gate (sc):
- $2^{16} a_{3} ω + a_{3} ω^{- 1} = 0$

sa	sc	$a_{0}$	$a_{1}$	$a_{2}$	$a_{3}$
0	0	$e_{0}^{l o}$	$e_{1}^{l o}$	$e_{2}^{l o}$	$- c a rry * 2^{16}$
1	1	$e_{3}^{l o}$	$e_{4}^{l o}$	$e_{5}^{l o}$	$E^{l o}$
0	0	$e_{0}^{hi}$	$e_{1}^{hi}$	$e_{2}^{hi}$	$c a rry$
1	0	$e_{3}^{hi}$	$e_{4}^{hi}$	$e_{5}^{hi}$	$E^{hi}$

reduce_7 gate

Addition $(mod 2^{32})$ of 7 elements

Input:

$E$
${e_{i}^{l o}, e_{i}^{hi}}_{i = 0}^{6}$
$c a rry$

Check: $E = e_{0} + e_{1} + e_{2} + e_{3} + e_{4} + e_{5} + e_{6} (mod 32)$

Assume inputs are constrained to 16 bits.

Addition gate (sa):
- $a_{0} + a_{1} + a_{2} + a_{3} + a_{4} + a_{5} + a_{6} + a_{7} - a_{8} = 0$
Carry gate (sc):
- $2^{16} a_{7} ω^{- 1} + a_{7} + [(a_{7} - 6) (a_{7} - 5) (a_{7} - 4) (a_{7} - 3) (a_{7} - 2) (a_{7} - 1) (a_{7})] = 0$

sa	sc	$a_{0}$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$	$a_{6}$	$a_{7}$	$a_{8}$
1	0	$e_{0}^{l o}$	$e_{1}^{l o}$	$e_{2}^{l o}$	$e_{3}^{l o}$	$e_{4}^{l o}$	$e_{5}^{l o}$	$e_{6}^{l o}$	$- c a rry * 2^{16}$	$E^{l o}$
1	1	$e_{0}^{hi}$	$e_{1}^{hi}$	$e_{2}^{hi}$	$e_{3}^{hi}$	$e_{4}^{hi}$	$e_{5}^{hi}$	$e_{6}^{hi}$	$c a rry$	$E^{hi}$

Message scheduling region

For each block $M \in {0, 1}^{512}$ of the padded message, $64$ words of $32$ bits each are constructed as follows:

the first $16$ are obtained by splitting $M$ into $32$ -bit blocks $M = W_{0} ∣∣ W_{1} ∣∣ \dots ∣∣ W_{14} ∣∣ W_{15};$
the remaining $48$ words are constructed using the formula: $W_{i} = σ_{1} (W_{i - 2}) ⊞ W_{i - 7} ⊞ σ_{0} (W_{i - 15}) ⊞ W_{i - 16},$ for $16 \leq i < 64$ .

sw	sd0	sd1	sd2	sd3	ss0	ss0_v2	ss1	ss1_v2	$a_{0}$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$	$a_{6}$	$a_{7}$	$a_{8}$	$a_{9}$
0	1	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$W_{0}^{l o}$	$spread (W_{0}^{l o})$	$W_{0}^{l o}$	$W_{0}^{hi}$	$W_{0}$	$σ_{0} (W_{1})^{l o}$	$σ_{1} (W_{14})^{l o}$	$W_{9}^{l o}$
1	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$W_{0}^{hi}$	$spread (W_{0}^{hi})$			$W_{16}$	$σ_{0} (W_{1})^{hi}$	$σ_{1} (W_{14})^{hi}$	$W_{9}^{hi}$	$c a rr y_{16}$
0	1	1	0	0	0	0	0	0	{0,1,2,3,4}	$W_{1}^{d (14)}$	$spread (W_{1}^{d (14)})$	$W_{1}^{l o}$	$W_{1}^{hi}$	$W_{1}$	$σ_{0} (W_{2})^{l o}$	$σ_{1} (W_{15})^{l o}$	$W_{10}^{l o}$
1	0	0	0	0	0	0	0	0	{0,1,2}	$W_{1}^{c (11)}$	$spread (W_{1}^{c (11)})$	$W_{1}^{a (3)}$	$W_{1}^{b (4)}$	$W_{17}$	$σ_{0} (W_{2})^{hi}$	$σ_{1} (W_{15})^{hi}$	$W_{10}^{hi}$	$c a rr y_{17}$
0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{0}^{e v e n}$	$spread (R_{0}^{e v e n})$	$W_{1}^{b (4) l o}$	$spread (W_{1}^{b (4) l o})$	$W_{1}^{b (4) hi}$	$spread (W_{1}^{b (4) hi})$
0	0	0	0	0	1	0	0	0	{0,1,2,3,4,5}	$R_{1}^{o dd}$	$spread (R_{0}^{o dd})$	$spread (R_{1}^{o dd})$	$spread (W_{1}^{c (11)})$	$spread (W_{1}^{d (14)})$	$W_{1}^{b (4)}$
0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{0}^{o dd}$	$spread (R_{1}^{e v e n})$	$0$	$0$	$W_{1}^{a (3)}$	$spread (W_{1}^{a (3)})$
0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{1}^{e v e n}$	$spread (R_{1}^{o dd})$	$σ_{0} v 1 R_{0}$	$σ_{0} v 1 R_{1}$	$σ_{0} v 1 R_{0}^{e v e n}$	$σ_{0} v 1 R_{0}^{o dd}$
..	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
0	0	0	0	0	0	0	0	0	{0,1,2,3}	$W_{14}^{g (13)}$	$spread (W_{14}^{g (13)})$	$W_{14}^{a (3)}$	$W_{14}^{c (3)}$
0	1	0	1	0	0	0	0	0	0	$W_{14}^{d (7)}$	$spread (W_{14}^{d (7)})$	$W_{14}^{l o}$	$W_{14}^{hi}$	$W_{14}$	$σ_{0} (W_{15})^{l o}$	$σ_{1} (W_{28})^{l o}$	$W_{23}^{l o}$
1	0	0	0	0	0	0	0	0	0	$W_{14}^{b (4)}$	$spread (W_{14}^{b (4)})$	$W_{14}^{e (1)}$	$W_{14}^{f (1)}$	$W_{30}$	$σ_{0} (W_{15})^{hi}$	$σ_{1} (W_{28})^{hi}$	$W_{23}^{hi}$	$c a rr y_{30}$
0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{0}^{e v e n}$	$spread (R_{0}^{e v e n})$	$W_{14}^{b (4) l o}$	$spread (W_{14}^{b (4) l o})$	$W_{14}^{b (4) hi}$	$spread (W_{14}^{b (4) hi})$
0	0	0	0	0	0	1	0	0	{0,1,2,3,4,5}	$R_{0}^{o dd}$	$spread (R_{0}^{o dd})$	$spread (R_{1}^{o dd})$	$spread (W_{14}^{d (7)})$	$spread (W_{14}^{g (13)})$	$W_{1}^{b (14)}$	$W_{14}^{e (1)}$
0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{1}^{e v e n}$	$spread (R_{1}^{e v e n})$	$W_{14}^{a (3)}$	$spread (W_{14}^{a (3)})$	$W_{14}^{c (3)}$	$spread (W_{14}^{c (3)})$	$W_{14}^{f (1)}$
0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{1}^{o dd}$	$spread (R_{1}^{o dd})$	$σ_{0} v 2 R_{0}$	$σ_{0} v 2 R_{1}$	$σ_{0} v 2 R_{0}^{e v e n}$	$σ_{0} v 2 R_{0}^{o dd}$
0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{0}^{e v e n}$	$spread (R_{0}^{e v e n})$	$W_{14}^{b (4) l o}$	$spread (W_{14}^{b (4) l o})$	$W_{14}^{b (4) hi}$	$spread (W_{14}^{b (4) hi})$
0	0	0	0	0	0	0	0	1	{0,1,2,3,4,5}	$R_{0}^{o dd}$	$spread (R_{0}^{o dd})$	$spread (R_{1}^{o dd})$	$spread (d)$	$spread (g)$		$W_{14}^{e (1)}$
0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{1}^{e v e n}$	$spread (R_{1}^{e v e n})$	$W_{14}^{a (3)}$	$spread (W_{14}^{a (3)})$	$W_{14}^{c (3)}$	$spread (W_{14}^{c (3)})$	$W_{14}^{f (1)}$
0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{1}^{o dd}$	$spread (R_{1}^{o dd})$	$σ_{1} v 2 R_{0}$	$σ_{1} v 2 R_{1}$	$σ_{1} v 2 R_{0}^{e v e n}$	$σ_{1} v 2 R_{0}^{o dd}$
..	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
0	1	0	0	1	0	0	0	0	{0,1,2,3}	$W_{49}^{d (13)}$	$spread (W_{49}^{d (13)})$	$W_{49}^{l o}$	$W_{49}^{hi}$	$W_{49}$
0	0	0	0	0	0	0	0	0	{0,1}	$W_{49}^{a (10)}$	$spread (W_{49}^{a (10)})$	$W_{49}^{c (2)}$	$W_{49}^{b (7)}$
0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{0}^{e v e n}$	$spread (R_{0}^{e v e n})$	$W_{49}^{b (7) l o}$	$spread (W_{49}^{b (7) l o})$	$W_{49}^{b (7) mi d}$	$spread (W_{49}^{b (7) mi d})$
0	0	0	0	0	0	0	0	1	{0,1,2,3,4,5}	$R_{0}^{o dd}$	$spread (R_{0}^{o dd})$	$spread (R_{1}^{o dd})$	$spread (a)$	$spread (d)$	$W_{1}^{b (49)}$
0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{1}^{e v e n}$	$spread (R_{1}^{e v e n})$	$W_{49}^{c (2)}$	$spread (W_{49}^{c (2)})$	$W_{49}^{b (7) hi}$	$spread (W_{49}^{b (7) hi})$
0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{1}^{o dd}$	$spread (R_{1}^{o dd})$	$σ_{1} v 1 R_{0}$	$σ_{1} v 1 R_{1}$	$σ_{1} v 1 R_{0}^{e v e n}$	$σ_{1} v 1 R_{0}^{o dd}$
..	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
0	1	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$W_{62}^{l o}$	$spread (W_{62}^{l o})$	$W_{62}^{l o}$	$W_{62}^{hi}$	$W_{62}$
0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$W_{62}^{hi}$	$spread (W_{62}^{hi})$
0	1	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$W_{63}^{l o}$	$spread (W_{63}^{l o})$	$W_{63}^{l o}$	$W_{63}^{hi}$	$W_{63}$
0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$W_{63}^{hi}$	$spread (W_{63}^{hi})$

Constraints:

sw: construct word using $re d u c e_{4}$
sd0: decomposition gate for $W_{0}, W_{62}, W_{63}$
- $W^{l o} + 2^{16} W^{hi} - W = 0$
sd1: decomposition gate for $W_{1..13}$ (split into $(3, 4, 11, 14)$ -bit pieces)
- $W^{a (3)} + 2^{3} W^{b (4) l o} + 2^{5} W^{b (4) hi} + 2^{7} W^{c (11)} + 2^{18} W^{d (14)} - W = 0$
sd2: decomposition gate for $W_{14..48}$ (split into $(3, 4, 3, 7, 1, 1, 13)$ -bit pieces)
- $W^{a (3)} + 2^{3} W^{b (4) l o} + 2^{5} W^{b (4) hi} + 2^{7} W^{c (11)} + 2^{10} W^{d (14)} + 2^{17} W^{e (1)} + 2^{18} W^{f (1)} + 2^{19} W^{g (13)} - W = 0$
sd3: decomposition gate for $W_{49..61}$ (split into $(10, 7, 2, 13)$ -bit pieces)
- $W^{a (10)} + 2^{10} W^{b (7) l o} + 2^{12} W^{b (7) mi d} + 2^{15} W^{b (7) hi} + 2^{17} W^{c (2)} + 2^{19} W^{d (13)} - W = 0$

Compression region

+----------------------------------------------------------+
|                                                          |
|          decompose E,                                    |
|          Σ_1(E)                                          |
|                                                          |
|                  +---------------------------------------+
|                  |                                       |
|                  |        reduce_5() to get H'           |
|                  |                                       |
+----------------------------------------------------------+
|          decompose F, decompose G                        |
|                                                          |
|                        Ch(E,F,G)                         |
|                                                          |
+----------------------------------------------------------+
|                                                          |
|          decompose A,                                    |
|          Σ_0(A)                                          |
|                                                          |
|                                                          |
|                  +---------------------------------------+
|                  |                                       |
|                  |        reduce_7() to get A_new,       |
|                  |              using H'                 |
|                  |                                       |
+------------------+---------------------------------------+
|          decompose B, decompose C                        |
|                                                          |
|          Maj(A,B,C)                                      |
|                                                          |
|                  +---------------------------------------+
|                  |        reduce_6() to get E_new,       |
|                  |              using H'                 |
+------------------+---------------------------------------+

Initial round:

sd_abcd	sd_efgh	ss0	ss1	s_maj	s_ch_neg	s_ch	s_a_new	s_e_new	s_h_prime	$a_{0}$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$	$a_{6}$	$a_{7}$	$a_{8}$	$a_{9}$
0	1	0	0	0	0	0	0	0	0	{0,1,2}	$F_{0} d (7)$	$spread (E_{0} d (7))$	$E_{0} b (5)^{l o}$	$spread (E_{0} b (5)^{l o})$	$E_{0} b (5)^{hi}$	$spread (E_{0} b (5)^{hi})$	$E_{0}^{l o}$	$spread (E_{0}^{l o})$
0	0	0	0	0	0	0	0	0	0	{0,1}	$E_{0} c (14)$	$spread (E_{0} c (14))$	$E_{0} a (6)^{l o}$	$spread (E_{0} a (6)^{l o})$	$E_{0} a (6)^{hi}$	$spread (E_{0} a (6)^{hi})$	$E_{0}^{hi}$	$spread (E_{0}^{hi})$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{0}^{e v e n}$	$spread (R_{0}^{e v e n})$	$spread (E_{0} b (5)^{l o})$	$spread (E_{0} b (5)^{hi})$
0	0	0	1	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{0}^{o dd}$	$spread (R_{0}^{o dd})$	$spread (R_{1}^{o dd})$	$spread (E_{0} d (7))$	$spread (E_{0} c (14))$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{1}^{e v e n}$	$spread (R_{1}^{e v e n})$	$spread (E_{0} a (6)^{l o})$	$spread (E_{0} a (6)^{hi})$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{1}^{o dd}$	$spread (R_{1}^{o dd})$
0	1	0	0	0	0	0	0	0	0	{0,1,2}	$F_{0} d (7)$	$spread (F_{0} d (7))$	$F_{0} b (5)^{l o}$	$spread (F_{0} b (5)^{l o})$	$F_{0} b (5)^{hi}$	$spread (F_{0} b (5)^{hi})$	$F_{0}^{l o}$	$spread (F_{0}^{l o})$
0	0	0	0	0	0	0	0	0	0	{0,1}	$F_{0} c (14)$	$spread (F_{0} c (14))$	$F_{0} a (6)^{l o}$	$spread (F_{0} a (6)^{l o})$	$F_{0} a (6)^{hi}$	$spread (F_{0} a (6)^{hi})$	$F_{0}^{hi}$	$spread (F_{0}^{hi})$
0	1	0	0	0	0	0	0	0	0	{0,1,2}	$G_{0} d (7)$	$spread (G_{0} d (7))$	$G_{0} b (5)^{l o}$	$spread (G_{0} b (5)^{l o})$	$G_{0} b (5)^{hi}$	$spread (G_{0} b (5)^{hi})$	$G_{0}^{l o}$	$spread (G_{0}^{l o})$
0	0	0	0	0	0	0	0	0	0	{0,1}	$G_{0} c (14)$	$spread (G_{0} c (14))$	$G_{0} a (6)^{l o}$	$spread (G_{0} a (6)^{l o})$	$G_{0} a (6)^{hi}$	$spread (G_{0} a (6)^{hi})$	$G_{0}^{hi}$	$spread (G_{0}^{hi})$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$P_{0}^{e v e n}$	$spread (P_{0}^{e v e n})$	$spread (E^{l o})$	$spread (E^{hi})$	$Q_{0}^{o dd}$	$K_{0}^{l o}$	$H_{0}^{l o}$	$W_{0}^{l o}$
0	0	0	0	0	0	1	0	0	1	{0,1,2,3,4,5}	$P_{0}^{o dd}$	$spread (P_{0}^{o dd})$	$spread (P_{1}^{o dd})$	$Σ_{1} (E_{0})^{l o}$	$Σ_{1} (E_{0})^{hi}$	$K_{0}^{hi}$	$H_{0}^{hi}$	$W_{0}^{hi}$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$P_{1}^{e v e n}$	$spread (P_{1}^{e v e n})$	$spread (F^{l o})$	$spread (F^{hi})$	$Q_{1}^{o dd}$	$P_{1}^{o dd}$	$H p r im e_{0}^{l o}$	$H p r im e_{0}^{hi}$	$H p r im e_{0} c a rry$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$P_{1}^{o dd}$	$spread (P_{1}^{o dd})$					$D_{0}^{l o}$	$E_{1}^{l o}$
0	0	0	0	0	0	0	0	1	0	{0,1,2,3,4,5}	$Q_{0}^{e v e n}$	$spread (Q_{0}^{e v e n})$	$spread (E_{n e g}^{l o})$	$spread (E_{n e g}^{hi})$	$spread (E^{l o})$		$D_{0}^{hi}$	$E_{1}^{hi}$	$E_{1} c a rry$
0	0	0	0	0	1	0	0	0	0	{0,1,2,3,4,5}	$Q_{0}^{o dd}$	$spread (Q_{0}^{o dd})$	$spread (Q_{1}^{o dd})$		$spread (E^{hi})$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$Q_{1}^{e v e n}$	$spread (Q_{1}^{e v e n})$	$spread (G^{l o})$	$spread (G^{hi})$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$Q_{1}^{o dd}$	$spread (Q_{1}^{o dd})$
1	0	0	0	0	0	0	0	0	0	{0,1,2}	$A_{0} b (11)$	$spread (A_{0} b (11))$	$A_{0} c (9)^{l o}$	$spread (A_{0} c (9)^{l o})$	$A_{0} c (9)^{mi d}$	$spread (A_{0} c (9)^{mi d})$	$A_{0}^{l o}$	$spread (A_{0}^{l o})$
0	0	0	0	0	0	0	0	0	0	{0,1}	$A_{0} d (10)$	$spread (A_{0} d (10))$	$A_{0} a (2)$	$spread (A_{0} a (2))$	$A_{0} c (9)^{hi}$	$spread (A_{0} c (9)^{hi})$	$A_{0}^{hi}$	$spread (A_{0}^{hi})$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{0}^{e v e n}$	$spread (R_{0}^{e v e n})$	$spread (c (9)^{l o})$	$spread (c (9)^{mi d})$
0	0	1	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{0}^{o dd}$	$spread (R_{0}^{o dd})$	$spread (R_{1}^{o dd})$	$spread (d (10))$	$spread (b (11))$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{1}^{e v e n}$	$spread (R_{1}^{e v e n})$	$spread (a (2))$	$spread (c (9)^{hi})$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{1}^{o dd}$	$spread (R_{1}^{o dd})$
1	0	0	0	0	0	0	0	0	0	{0,1,2}	$B_{0} b (11)$	$spread (B_{0} b (11))$	$B_{0} c (9)^{l o}$	$spread (B_{0} c (9)^{l o})$	$B_{0} c (9)^{mi d}$	$spread (B_{0} c (9)^{mi d})$	$B_{0}^{l o}$	$spread (B_{0}^{l o})$
0	0	0	0	0	0	0	0	0	0	{0,1}	$B_{0} d (10)$	$spread (B_{0} d (10))$	$B_{0} a (2)$	$spread (B_{0} a (2))$	$B_{0} c (9)^{hi}$	$spread (B_{0} c (9)^{hi})$	$B_{0}^{hi}$	$spread (B_{0}^{hi})$
1	0	0	0	0	0	0	0	0	0	{0,1,2}	$C_{0} b (11)$	$spread (C_{0} b (11))$	$C_{0} c (9)^{l o}$	$spread (C_{0} c (9)^{l o})$	$C_{0} c (9)^{mi d}$	$spread (C_{0} c (9)^{mi d})$	$C_{0}^{l o}$	$spread (C_{0}^{l o})$
0	0	0	0	0	0	0	0	0	0	{0,1}	$C_{0} d (10)$	$spread (C_{0} d (10))$	$C_{0} a (2)$	$spread (C_{0} a (2))$	$C_{0} c (9)^{hi}$	$spread (C_{0} c (9)^{hi})$	$C_{0}^{hi}$	$spread (C_{0}^{hi})$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$M_{0}^{e v e n}$	$spread (M_{0}^{e v e n})$	$M_{1}^{o dd}$	$spread (A_{0}^{l o})$	$spread (A_{0}^{hi})$		$H p r im e_{0}^{l o}$	$H p r im e_{0}^{hi}$
0	0	0	0	1	0	0	1	0	0	{0,1,2,3,4,5}	$M_{0}^{o dd}$	$spread (M_{0}^{o dd})$	$spread (M_{1}^{o dd})$	$spread (B_{0}^{l o})$	$spread (B_{0}^{hi})$	$Σ_{0} (A_{0})^{l o}$		$A_{1}^{l o}$	$A_{1} c a rry$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$M_{1}^{e v e n}$	$spread (M_{1}^{e v e n})$		$spread (C_{0}^{l o})$	$spread (C_{0}^{hi})$	$Σ_{0} (A_{0})^{hi}$		$A_{1}^{hi}$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$M_{1}^{o dd}$	$spread (M_{1}^{o dd})$

Steady-state:

sd_abcd	sd_efgh	ss0	ss1	s_maj	s_ch_neg	s_ch	s_a_new	s_e_new	s_h_prime	$a_{0}$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$	$a_{6}$	$a_{7}$	$a_{8}$	$a_{9}$
0	1	0	0	0	0	0	0	0	0	{0,1,2}	$F_{0} d (7)$	$spread (E_{0} d (7))$	$E_{0} b (5)^{l o}$	$spread (E_{0} b (5)^{l o})$	$E_{0} b (5)^{hi}$	$spread (E_{0} b (5)^{hi})$	$E_{0}^{l o}$	$spread (E_{0}^{l o})$
0	0	0	0	0	0	0	0	0	0	{0,1}	$E_{0} c (14)$	$spread (E_{0} c (14))$	$E_{0} a (6)^{l o}$	$spread (E_{0} a (6)^{l o})$	$E_{0} a (6)^{hi}$	$spread (E_{0} a (6)^{hi})$	$E_{0}^{hi}$	$spread (E_{0}^{hi})$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{0}^{e v e n}$	$spread (R_{0}^{e v e n})$	$spread (E_{0} b (5)^{l o})$	$spread (E_{0} b (5)^{hi})$
0	0	0	1	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{0}^{o dd}$	$spread (R_{0}^{o dd})$	$spread (R_{1}^{o dd})$	$spread (E_{0} d (7))$	$spread (E_{0} c (14))$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{1}^{e v e n}$	$spread (R_{1}^{e v e n})$	$spread (E_{0} a (6)^{l o})$	$spread (E_{0} a (6)^{hi})$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{1}^{o dd}$	$spread (R_{1}^{o dd})$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$P_{0}^{e v e n}$	$spread (P_{0}^{e v e n})$	$spread (E^{l o})$	$spread (E^{hi})$	$Q_{0}^{o dd}$	$K_{0}^{l o}$	$H_{0}^{l o}$	$W_{0}^{l o}$
0	0	0	0	0	0	1	0	0	1	{0,1,2,3,4,5}	$P_{0}^{o dd}$	$spread (P_{0}^{o dd})$	$spread (P_{1}^{o dd})$	$Σ_{1} (E_{0})^{l o}$	$Σ_{1} (E_{0})^{hi}$	$K_{0}^{hi}$	$H_{0}^{hi}$	$W_{0}^{hi}$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$P_{1}^{e v e n}$	$spread (P_{1}^{e v e n})$	$spread (F^{l o})$	$spread (F^{hi})$	$Q_{1}^{o dd}$	$P_{1}^{o dd}$	$H p r im e_{0}^{l o}$	$H p r im e_{0}^{hi}$	$H p r im e_{0} c a rry$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$P_{1}^{o dd}$	$spread (P_{1}^{o dd})$					$D_{0}^{l o}$	$E_{1}^{l o}$
0	0	0	0	0	0	0	0	1	0	{0,1,2,3,4,5}	$Q_{0}^{e v e n}$	$spread (Q_{0}^{e v e n})$	$spread (E_{n e g}^{l o})$	$spread (E_{n e g}^{hi})$	$spread (E^{l o})$		$D_{0}^{hi}$	$E_{1}^{hi}$	$E_{1} c a rry$
0	0	0	0	0	1	0	0	0	0	{0,1,2,3,4,5}	$Q_{0}^{o dd}$	$spread (Q_{0}^{o dd})$	$spread (Q_{1}^{o dd})$		$spread (E^{hi})$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$Q_{1}^{e v e n}$	$spread (Q_{1}^{e v e n})$	$spread (G^{l o})$	$spread (G^{hi})$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$Q_{1}^{o dd}$	$spread (Q_{1}^{o dd})$
1	0	0	0	0	0	0	0	0	0	{0,1,2}	$A_{0} b (11)$	$spread (A_{0} b (11))$	$A_{0} c (9)^{l o}$	$spread (A_{0} c (9)^{l o})$	$A_{0} c (9)^{mi d}$	$spread (A_{0} c (9)^{mi d})$	$A_{0}^{l o}$	$spread (A_{0}^{l o})$
0	0	0	0	0	0	0	0	0	0	{0,1}	$A_{0} d (10)$	$spread (A_{0} d (10))$	$A_{0} a (2)$	$spread (A_{0} a (2))$	$A_{0} c (9)^{hi}$	$spread (A_{0} c (9)^{hi})$	$A_{0}^{hi}$	$spread (A_{0}^{hi})$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{0}^{e v e n}$	$spread (R_{0}^{e v e n})$	$spread (c (9)^{l o})$	$spread (c (9)^{mi d})$
0	0	1	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{0}^{o dd}$	$spread (R_{0}^{o dd})$	$spread (R_{1}^{o dd})$	$spread (d (10))$	$spread (b (11))$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{1}^{e v e n}$	$spread (R_{1}^{e v e n})$	$spread (a (2))$	$spread (c (9)^{hi})$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$R_{1}^{o dd}$	$spread (R_{1}^{o dd})$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$M_{0}^{e v e n}$	$spread (M_{0}^{e v e n})$	$M_{1}^{o dd}$	$spread (A_{0}^{l o})$	$spread (A_{0}^{hi})$		$H p r im e_{0}^{l o}$	$H p r im e_{0}^{hi}$
0	0	0	0	1	0	0	1	0	0	{0,1,2,3,4,5}	$M_{0}^{o dd}$	$spread (M_{0}^{o dd})$	$spread (M_{1}^{o dd})$	$spread (B_{0}^{l o})$	$spread (B_{0}^{hi})$	$Σ_{0} (A_{0})^{l o}$		$A_{1}^{l o}$	$A_{1} c a rry$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$M_{1}^{e v e n}$	$spread (M_{1}^{e v e n})$		$spread (C_{0}^{l o})$	$spread (C_{0}^{hi})$	$Σ_{0} (A_{0})^{hi}$		$A_{1}^{hi}$
0	0	0	0	0	0	0	0	0	0	{0,1,2,3,4,5}	$M_{1}^{o dd}$	$spread (M_{1}^{o dd})$

Final digest:

s_digest	$a_{3}$	$a_{4}$	$a_{5}$	$a_{6}$	$a_{7}$	$a_{8}$
1	$A_{63}^{l o}$	$A_{63}^{hi}$	$A_{63}$	$B_{63}^{l o}$	$B_{63}^{hi}$	$B_{63}$
0	$C_{63}^{l o}$	$C_{63}^{hi}$	$C_{63}$	$C_{63}^{l o}$	$C_{63}^{hi}$	$C_{63}$
1	$E_{63}^{l o}$	$E_{63}^{hi}$	$E_{63}$	$G_{63}^{l o}$	$G_{63}^{hi}$	$G_{63}$
0	$F_{63}^{l o}$	$F_{63}^{hi}$	$F_{63}$	$H_{63}^{l o}$	$H_{63}^{hi}$	$H_{63}$

Background Material

This section covers the background material required to understand the Halo 2 proving system. It is targeted at an ELI15 (Explain It Like I'm 15) level; if you think anything could do with additional explanation, let us know!

Fields

A fundamental component of many cryptographic protocols is the algebraic structure known as a field. Fields are sets of objects (usually numbers) with two associated binary operators $+$ and $\times$ such that various field axioms hold. The real numbers $R$ are an example of a field with uncountably many elements.

Halo makes use of finite fields which have a finite number of elements. Finite fields are fully classified as follows:

if $F$ is a finite field, it contains $∣ F ∣ = p^{k}$ elements for some integer $k \geq 1$ and some prime $p$ ;
any two finite fields with the same number of elements are isomorphic. In particular, all of the arithmetic in a prime field $F_{p}$ is isomorphic to addition and multiplication of integers modulo $p$ , i.e. in $Z_{p}$ . This is why we often refer to $p$ as the modulus.

We'll write a field as $F_{q}$ where $q = p^{k}$ . The prime $p$ is called its characteristic. In the cases where $k > 1$ the field $F_{q}$ is a $k$ -degree extension of the field $F_{p}$ . (By analogy, the complex numbers $C = R (i)$ are an extension of the real numbers.) However, in Halo we do not use extension fields. Whenever we write $F_{p}$ we are referring to what we call a prime field which has a prime $p$ number of elements, i.e. $k = 1$ .

Important notes:

There are two special elements in any field: $0$ , the additive identity, and $1$ , the multiplicative identity.
The least significant bit of a field element, when represented as an integer in binary format, can be interpreted as its "sign" to help distinguish it from its additive inverse (negation). This is because for some nonzero element $a$ which has a least significant bit $0$ we have that $- a = p - a$ has a least significant bit $1$ , and vice versa. We could also use whether or not an element is larger than $(p - 1) /2$ to give it a "sign."

Finite fields will be useful later for constructing polynomials and elliptic curves. Elliptic curves are examples of groups, which we discuss next.

Groups

Groups are simpler and more limited than fields; they have only one binary operator $\cdot$ and fewer axioms. They also have an identity, which we'll denote as $1$ .

Any non-zero element $a$ in a group has an inverse $b = a^{- 1}$ , which is the unique element $b$ such that $a \cdot b = 1$ .

For example, the set of nonzero elements of $F_{p}$ forms a group, where the group operation is given by multiplication on the field.

(aside) Additive vs multiplicative notation

If $\cdot$ is written as $\times$ or omitted (i.e. $a \cdot b$ written as $ab$ ), the identity as $1$ , and inversion as $a^{- 1}$ , as we did above, then we say that the group is "written multiplicatively". If $\cdot$ is written as $+$ , the identity as $0$ or $O$ , and inversion as $- a$ , then we say it is "written additively".

It's conventional to use additive notation for elliptic curve groups, and multiplicative notation when the elements come from a finite field.

When additive notation is used, we also write

$[k] A = k times A + A + \dots + A$

for nonnegative $k$ and call this "scalar multiplication"; we also often use uppercase letters for variables denoting group elements. When multiplicative notation is used, we also write

$a^{k} = k times a \times a \times \dots \times a$

and call this "exponentiation". In either case we call the scalar $k$ such that $[k] g = a$ or $g^{k} = a$ the "discrete logarithm" of $a$ to base $g$ . We can extend scalars to negative integers by inversion, i.e. $[- k] A + [k] A = O$ or $a^{- k} \times a^{k} = 1$ .

The order of an element $a$ of a finite group is defined as the smallest positive integer $k$ such that $a^{k} = 1$ (in multiplicative notation) or $[k] a = O$ (in additive notation). The order of the group is the number of elements.

Groups always have a generating set, which is a set of elements such that we can produce any element of the group as (in multiplicative terminology) a product of powers of those elements. So if the generating set is $g_{1.. k}$ , we can produce any element of the group as $i = 1 \prod k g_{i}^{a_{i}}$ . There can be many different generating sets for a given group.

A group is called cyclic if it has a (not necessarily unique) generating set with only a single element — call it $g$ . In that case we can say that $g$ generates the group, and that the order of $g$ is the order of the group.

Any finite cyclic group $G$ of order $n$ is isomorphic to the integers modulo $n$ (denoted $Z / n Z$ ), such that:

the operation $\cdot$ in $G$ corresponds to addition modulo $n$ ;
the identity in $G$ corresponds to $0$ ;
some generator $g \in G$ corresponds to $1$ .

Given a generator $g$ , the isomorphism is always easy to compute in the $Z / n Z \to G$ direction; it is just $a \mapsto g^{a}$ (or in additive notation, $a \mapsto [a] g$ ). It may be difficult in general to compute in the $G \to Z / n Z$ direction; we'll discuss this further when we come to elliptic curves.

If the order $n$ of a finite group is prime, then the group is cyclic, and every non-identity element is a generator.

The multiplicative group of a finite field

We use the notation $F_{p}^{\times}$ for the multiplicative group (i.e. the group operation is multiplication in $F_{p}$ ) over the set $F_{p} - {0}$ .

A quick way of obtaining the inverse in $F_{p}^{\times}$ is $a^{- 1} = a^{p - 2}$ . The reason for this stems from Fermat's little theorem, which states that $a^{p} = a (mod p)$ for any integer $a$ . If $a$ is nonzero, we can divide by $a$ twice to get $a^{p - 2} = a^{- 1} .$

Let's assume that $α$ is a generator of $F_{p}^{\times}$ , so it has order $p - 1$ (equal to the number of elements in $F_{p}^{\times}$ ). Therefore, for any element in $a \in F_{p}^{\times}$ there is a unique integer $i \in {0.. p - 2}$ such that $a = α^{i}$ .

Notice that $a \times b$ where $a, b \in F_{p}^{\times}$ can really be interpreted as $α^{i} \times α^{j}$ where $a = α^{i}$ and $b = α^{j}$ . Indeed, it holds that $α^{i} \times α^{j} = α^{i + j}$ for all $0 \leq i, j < p - 1$ . As a result the multiplication of nonzero field elements can be interpreted as addition modulo $p - 1$ with respect to some fixed generator $α$ . The addition just happens "in the exponent."

This is another way to look at where $a^{p - 2}$ comes from for computing inverses in the field:

$p - 2 \equiv - 1 (mod p - 1),$

so $a^{p - 2} = a^{- 1}$ .

Montgomery's Trick

Montgomery's trick, named after Peter Montgomery (RIP) is a way to compute many group inversions at the same time. It is commonly used to compute inversions in $F_{p}^{\times}$ , which are quite computationally expensive compared to multiplication.

Imagine we need to compute the inverses of three nonzero elements $a, b, c \in F_{p}^{\times}$ . Instead, we'll compute the products $x = ab$ and $y = x c = ab c$ , and compute the inversion

$z = y^{p - 2} = \frac{1}{ab c} .$

We can now multiply $z$ by $x$ to obtain $\frac{1}{c}$ and multiply $z$ by $c$ to obtain $\frac{1}{ab}$ , which we can then multiply by $a, b$ to obtain their respective inverses.

This technique generalizes to arbitrary numbers of group elements with just a single inversion necessary.

Multiplicative subgroups

A subgroup of a group $G$ with operation $\cdot$ , is a subset of elements of $G$ that also form a group under $\cdot$ .

In the previous section we said that $α$ is a generator of the $(p - 1)$ -order multiplicative group $F_{p}^{\times}$ . This group has composite order, and so by the Chinese remainder theorem¹ it has strict subgroups. As an example let's imagine that $p = 11$ , and so $p - 1$ factors into $5 \cdot 2$ . Thus, there is a generator $β$ of the $5$ -order subgroup and a generator $γ$ of the $2$ -order subgroup. All elements in $F_{p}^{\times}$ , therefore, can be written uniquely as $β^{i} \cdot γ^{j}$ for some $i$ (modulo $5$ ) and some $j$ (modulo $2$ ).

If we have $a = β^{i} \cdot γ^{j}$ notice what happens when we compute

$a^{5} = (β^{i} \cdot γ^{j})^{5} = β^{i \cdot 5} \cdot γ^{j \cdot 5} = β^{0} \cdot γ^{j \cdot 5} = γ^{j \cdot 5};$

we have effectively "killed" the $5$ -order subgroup component, producing a value in the $2$ -order subgroup.

Lagrange's theorem (group theory) states that the order of any subgroup $H$ of a finite group $G$ divides the order of $G$ . Therefore, the order of any subgroup of $F_{p}^{\times}$ must divide $p - 1.$

PLONK-based proving systems like Halo 2 are more convenient to use with fields that have a large number of multiplicative subgroups with a "smooth" distribution (which makes the performance cliffs smaller and more granular as circuit sizes increase). The Pallas and Vesta curves specifically have primes of the form

$T \cdot 2^{S} = p - 1$

with $S = 32$ and $T$ odd (i.e. $p - 1$ has 32 lower zero-bits). This means they have multiplicative subgroups of order $2^{k}$ for all $k \leq 32$ . These 2-adic subgroups are nice for efficient FFTs, as well as enabling a wide variety of circuit sizes.

Square roots

In a field $F_{p}$ exactly half of all nonzero elements are squares; the remainder are non-squares or "quadratic non-residues". In order to see why, consider an $α$ that generates the $2$ -order multiplicative subgroup of $F_{p}^{\times}$ (this exists because $p - 1$ is divisible by $2$ since $p$ is a prime greater than $2$ ) and $β$ that generates the $t$ -order multiplicative subgroup of $F_{p}^{\times}$ where $p - 1 = 2 t$ . Then every element $a \in F_{p}^{\times}$ can be written uniquely as $α^{i} \cdot β^{j}$ with $i \in Z_{2}$ and $j \in Z_{t}$ . Half of all elements will have $i = 0$ and the other half will have $i = 1$ .

Let's consider the simple case where $p \equiv 3 (mod 4)$ and so $t$ is odd (if $t$ is even, then $p - 1$ would be divisible by $4$ , which contradicts $p$ being $3 (mod 4)$ ). If $a \in F_{p}^{\times}$ is a square, then there must exist $b = α^{i} \cdot β^{j}$ such that $b^{2} = a$ . But this means that

$a = (α^{i} \cdot β^{j})^{2} = α^{2 i} \cdot β^{2 j} = β^{2 j} .$

In other words, all squares in this particular field do not generate the $2$ -order multiplicative subgroup, and so since half of the elements generate the $2$ -order subgroup then at most half of the elements are square. In fact exactly half of the elements are square (since squaring each nonsquare element gives a unique square). This means we can assume all squares can be written as $β^{m}$ for some $m$ , and therefore finding the square root is a matter of exponentiating by $2^{- 1} (mod t)$ .

In the event that $p \equiv 1 (mod 4)$ then things get more complicated because $2^{- 1} (mod t)$ does not exist. Let's write $p - 1$ as $2^{k} \cdot t$ with $t$ odd. The case $k = 0$ is impossible, and the case $k = 1$ is what we already described, so consider $k \geq 2$ . $α$ generates a $2^{k}$ -order multiplicative subgroup and $β$ generates the odd $t$ -order multiplicative subgroup. Then every element $a \in F_{p}^{\times}$ can be written as $α^{i} \cdot β^{j}$ for $i \in Z_{2^{k}}$ and $j \in Z_{t}$ . If the element is a square, then there exists some $b = a$ which can be written $b = α^{i^{'}} \cdot β^{j^{'}}$ for $i^{'} \in Z_{2^{k}}$ and $j^{'} \in Z_{t}$ . This means that $a = b^{2} = α^{2 i^{'}} \cdot β^{2 j^{'}}$ , therefore we have $i \equiv 2 i^{'} (mod 2^{k})$ , and $j \equiv 2 j^{'} (mod t)$ . $i$ would have to be even in this case because otherwise it would be impossible to have $i \equiv 2 i^{'} (mod 2^{k})$ for any $i^{'}$ . In the case that $a$ is not a square, then $i$ is odd, and so half of all elements are squares.

In order to compute the square root, we can first raise the element $a = α^{i} \cdot β^{j}$ to the power $t$ to "kill" the $t$ -order component, giving

$a^{t} = α^{i t (mod 2^{k})} \cdot β^{j t (mod t)} = α^{i t (mod 2^{k})}$

and then raise this result to the power $t^{- 1} (mod 2^{k})$ to undo the effect of the original exponentiation on the $2^{k}$ -order component:

$(α^{i t mod 2^{k}})^{t^{- 1} (mod 2^{k})} = α^{i}$

(since $t$ is relatively prime to $2^{k}$ ). This leaves bare the $α^{i}$ value which we can trivially handle. We can similarly kill the $2^{k}$ -order component to obtain $β^{j \cdot 2^{- 1} (mod t)}$ , and put the values together to obtain the square root.

It turns out that in the cases $k = 2, 3$ there are simpler algorithms that merge several of these exponentiations together for efficiency. For other values of $k$ , the only known way is to manually extract $i$ by squaring until you obtain the identity for every single bit of $i$ . This is the essence of the Tonelli-Shanks square root algorithm and describes the general strategy. (There is another square root algorithm that uses quadratic extension fields, but it doesn't pay off in efficiency until the prime becomes quite large.)

Roots of unity

In the previous sections we wrote $p - 1 = 2^{k} \cdot t$ with $t$ odd, and stated that an element $α \in F_{p}^{\times}$ generated the $2^{k}$ -order subgroup. For convenience, let's denote $n := 2^{k} .$ The elements ${1, α, \dots, α^{n - 1}}$ are known as the $n$ th roots of unity.

The primitive root of unity, $ω,$ is an $n$ th root of unity such that $ω^{i} \neq = 1$ except when $i \equiv 0 (mod n)$ .

Important notes:

If $α$ is an $n$ th root of unity, $α$ satisfies $α^{n} - 1 = 0.$ If $α \neq = 1,$ then $1 + α + α^{2} + \dots + α^{n - 1} = 0.$
Equivalently, the roots of unity are solutions to the equation $X^{n} - 1 = (X - 1) (X - α) (X - α^{2}) \dots (X - α^{n - 1}) .$
$ω^{\frac{n}{2} + i} = - ω^{i}$ ("Negation lemma"). Proof: $ω^{n} = 1 ⟹ ω^{n} - 1 = 0 ⟹ (ω^{n /2} + 1) (ω^{n /2} - 1) = 0.$ Since the order of $ω$ is $n$ , $ω^{n /2} \neq = 1.$ Therefore, $ω^{n /2} = - 1.$
$(ω^{\frac{n}{2} + i})^{2} = (ω^{i})^{2}$ ("Halving lemma"). Proof: $(ω^{\frac{n}{2} + i})^{2} = ω^{n + 2 i} = ω^{n} \cdot ω^{2 i} = ω^{2 i} = (ω^{i})^{2} .$ In other words, if we square each element in the $n$ th roots of unity, we would get back only half the elements, ${(ω_{n}^{i})^{2}} = {ω_{n /2}}$ (i.e. the $\frac{n}{2}$ th roots of unity). There is a two-to-one mapping between the elements and their squares.

References

Friedman, R. (n.d.) "Cyclic Groups and Elementary Number Theory II" (p. 5).

Polynomials

Let $A (X)$ be a polynomial over $F_{p}$ with formal indeterminate $X$ . As an example,

$A (X) = a_{0} + a_{1} X + a_{2} X^{2} + a_{3} X^{3}$

defines a degree- $3$ polynomial. $a_{0}$ is referred to as the constant term. Polynomials of degree $n - 1$ have $n$ coefficients. We will often want to compute the result of replacing the formal indeterminate $X$ with some concrete value $x$ , which we denote by $A (x)$ .

In mathematics this is commonly referred to as "evaluating $A (X)$ at a point $x$ ". The word "point" here stems from the geometrical usage of polynomials in the form $y = A (x)$ , where $(x, y)$ is the coordinate of a point in two-dimensional space. However, the polynomials we deal with are almost always constrained to equal zero, and $x$ will be an element of some field. This should not be confused with points on an elliptic curve, which we also make use of, but never in the context of polynomial evaluation.

Important notes:

Multiplication of polynomials produces a product polynomial that is the sum of the degrees of its factors. Polynomial division subtracts from the degree. $de g (A (X) B (X)) = de g (A (X)) + de g (B (X)),$ $de g (A (X) / B (X)) = de g (A (X)) - de g (B (X)) .$
Given a polynomial $A (X)$ of degree $n - 1$ , if we obtain $n$ evaluations of the polynomial at distinct points then these evaluations perfectly define the polynomial. In other words, given these evaluations we can obtain a unique polynomial $A (X)$ of degree $n - 1$ via polynomial interpolation.
$[a_{0}, a_{1}, \dots, a_{n - 1}]$ is the coefficient representation of the polynomial $A (X)$ . Equivalently, we could use its evaluation representation $[(x_{0}, A (x_{0})), (x_{1}, A (x_{1})), \dots, (x_{n - 1}, A (x_{n - 1}))]$ at $n$ distinct points. Either representation uniquely specifies the same polynomial.

(aside) Horner's rule

Horner's rule allows for efficient evaluation of a polynomial of degree $n - 1$ , using only $n - 1$ multiplications and $n - 1$ additions. It is the following identity: $a_{0} + a_{1} X + a_{2} X^{2} + \dots + a_{n - 1} X^{n - 1} = a_{0} + X (a_{1} + X (a_{2} + \dots + X (a_{n - 2} + X a_{n - 1}))),$

Fast Fourier Transform (FFT)

The FFT is an efficient way of converting between the coefficient and evaluation representations of a polynomial. It evaluates the polynomial at the $n$ th roots of unity ${ω^{0}, ω^{1}, \dots, ω^{n - 1}},$ where $ω$ is a primitive $n$ th root of unity. By exploiting symmetries in the roots of unity, each round of the FFT reduces the evaluation into a problem only half the size. Most commonly we use polynomials of length some power of two, $n = 2^{k}$ , and apply the halving reduction recursively.

Motivation: Fast polynomial multiplication

In the coefficient representation, it takes $O (n^{2})$ operations to multiply two polynomials $A (X) \cdot B (X) = C (X)$ :

$A (X) B (X) C (X) = a_{0} + a_{1} X + a_{2} X^{2} + \dots + a_{n - 1} X^{n - 1}, = b_{0} + b_{1} X + b_{2} X^{2} + \dots + b_{n - 1} X^{n - 1}, = a_{0} \cdot (b_{0} + b_{1} X + b_{2} X^{2} + \dots + b_{n - 1} X^{n - 1}) + a_{1} X \cdot (b_{0} + b_{1} X + b_{2} X^{2} + \dots + b_{n - 1} X^{n - 1}) + \dots + a_{n - 1} X^{n - 1} \cdot (b_{0} + b_{1} X + b_{2} X^{2} + \dots + b_{n - 1} X^{n - 1}),$

where each of the $n$ terms in the first polynomial has to be multiplied by the $n$ terms of the second polynomial.

In the evaluation representation, however, polynomial multiplication only requires $O (n)$ operations:

$A B C : {(x_{0}, A (x_{0})), (x_{1}, A (x_{1})), \dots, (x_{n - 1}, A (x_{n - 1}))}, : {(x_{0}, B (x_{0})), (x_{1}, B (x_{1})), \dots, (x_{n - 1}, B (x_{n - 1}))}, : {(x_{0}, A (x_{0}) B (x_{0})), (x_{1}, A (x_{1}) B (x_{1})), \dots, (x_{n - 1}, A (x_{n - 1}) B (x_{n - 1}))},$

where each evaluation is multiplied pointwise.

This suggests the following strategy for fast polynomial multiplication:

Evaluate polynomials at all $n$ points;
Perform fast pointwise multiplication in the evaluation representation ( $O (n)$ );
Convert back to the coefficient representation.

The challenge now is how to evaluate and interpolate the polynomials efficiently. Naively, evaluating a polynomial at $n$ points would require $O (n^{2})$ operations (we use the $O (n)$ Horner's rule at each point):

$A (1) A (ω) A (ω^{2}) ⋮ A (ω^{n - 1}) = 111 ⋮ 1 1 ω ω^{2} ⋮ ω^{n - 1} 1 ω^{2} ω^{2 \cdot 2} ⋮ ω^{2 (n - 1)} \dots \dots \dots \dots 1 ω^{n - 1} ω^{2 \cdot (n - 1)} ⋮ ω^{(n - 1)^{2}} \cdot a_{0} a_{1} a_{2} ⋮ a_{n - 1} .$

For convenience, we will denote the matrices above as: $\hat{A} = V_{ω} \cdot A .$

( $\hat{A}$ is known as the Discrete Fourier Transform of $A$ ; $V_{ω}$ is also called the Vandermonde matrix.)

The (radix-2) Cooley-Tukey algorithm

Our strategy is to divide a DFT of size $n$ into two interleaved DFTs of size $n /2$ . Given the polynomial $A (X) = a_{0} + a_{1} X + a_{2} X^{2} + \dots + a_{n - 1} X^{n - 1},$ we split it up into even and odd terms:

$A_{even} A_{odd} = a_{0} + a_{2} X + \dots + a_{n - 2} X^{\frac{n}{2} - 1}, = a_{1} + a_{3} X + \dots + a_{n - 1} X^{\frac{n}{2} - 1} .$

To recover the original polynomial, we do $A (X) = A_{even} (X^{2}) + X A_{odd} (X^{2}) .$

Trying this out on points $ω_{n}^{i}$ and $ω_{n}^{\frac{n}{2} + i}$ , $i \in [0.. \frac{n}{2} - 1],$ we start to notice some symmetries:

$A (ω_{n}^{i}) A (ω_{n}^{\frac{n}{2} + i}) = A_{even} ((ω_{n}^{i})^{2}) + ω_{n}^{i} A_{odd} ((ω_{n}^{i})^{2}), = A_{even} ((ω_{n}^{\frac{n}{2} + i})^{2}) + ω_{n}^{\frac{n}{2} + i} A_{odd} ((ω_{n}^{\frac{n}{2} + i})^{2}) = A_{even} ((- ω_{n}^{i})^{2}) - ω_{n}^{i} A_{odd} ((- ω_{n}^{i})^{2}) \leftarrow (negation lemma) = A_{even} ((ω_{n}^{i})^{2}) - ω_{n}^{i} A_{odd} ((ω_{n}^{i})^{2}) .$

Notice that we are only evaluating $A_{even} (X)$ and $A_{odd} (X)$ over half the domain ${(ω_{n}^{0})^{2}, (ω_{n})^{2}, \dots, (ω_{n}^{\frac{n}{2} - 1})^{2}} = {ω_{n /2}^{i}}, i = [0.. \frac{n}{2} - 1]$ (halving lemma). This gives us all the terms we need to reconstruct $A (X)$ over the full domain ${ω^{0}, ω, \dots, ω^{n - 1}}$ : which means we have transformed a length- $n$ DFT into two length- $\frac{n}{2}$ DFTs.

We choose $n = 2^{k}$ to be a power of two (by zero-padding if needed), and apply this divide-and-conquer strategy recursively. By the Master Theorem¹, this gives us an evaluation algorithm with $O (n lo g_{2} n)$ operations, also known as the Fast Fourier Transform (FFT).

Inverse FFT

So we've evaluated our polynomials and multiplied them pointwise. What remains is to convert the product from the evaluation representation back to coefficient representation. To do this, we simply call the FFT on the evaluation representation. However, this time we also:

replace $ω^{i}$ by $ω^{- i}$ in the Vandermonde matrix, and
multiply our final result by a factor of $1/ n$ .

In other words: $A = \frac{1}{n} V_{ω^{- 1}} \cdot \hat{A} .$

(To understand why the inverse FFT has a similar form to the FFT, refer to Slide 13-1 of ². The below image was also taken from ².)

The Schwartz-Zippel lemma

The Schwartz-Zippel lemma informally states that "different polynomials are different at most points." Formally, it can be written as follows:

Let $p (x_{1}, x_{2}, \dots, x_{n})$ be a nonzero polynomial of $n$ variables with degree $d$ . Let $S$ be a finite set of numbers with at least $d$ elements in it. If we choose random $α_{1}, α_{2}, \dots, α_{n}$ from $S$ , $Pr [p (α_{1}, α_{2}, \dots, α_{n}) = 0] \leq \frac{d}{∣ S ∣} .$

In the familiar univariate case $p (X)$ , this reduces to saying that a nonzero polynomial of degree $d$ has at most $d$ roots.

The Schwartz-Zippel lemma is used in polynomial equality testing. Given two multi-variate polynomials $p_{1} (x_{1}, \dots, x_{n})$ and $p_{2} (x_{1}, \dots, x_{n})$ of degrees $d_{1}, d_{2}$ respectively, we can test if $p_{1} (α_{1}, \dots, α_{n}) - p_{2} (α_{1}, \dots, α_{n}) = 0$ for random $α_{1}, \dots, α_{n} \leftarrow S,$ where the size of $S$ is at least $∣ S ∣ \geq (d_{1} + d_{2}) .$ If the two polynomials are identical, this will always be true, whereas if the two polynomials are different then the equality holds with probability at most $\frac{m a x ( d _{1} , d _{2} )}{∣ S ∣}$ .

Vanishing polynomial

Consider the order- $n$ multiplicative subgroup $H$ with primitive root of unity $ω$ . For all $ω^{i} \in H, i \in [n - 1],$ we have $(ω^{i})^{n} = (ω^{n})^{i} = (ω^{0})^{i} = 1.$ In other words, every element of $H$ fulfils the equation

$Z_{H} (X) = X^{n} - 1 = (X - ω^{0}) (X - ω^{1}) (X - ω^{2}) \dots (X - ω^{n - 1}),$

meaning every element is a root of $Z_{H} (X) .$ We call $Z_{H} (X)$ the vanishing polynomial over $H$ because it evaluates to zero on all elements of $H .$

This comes in particularly handy when checking polynomial constraints. For instance, to check that $A (X) + B (X) = C (X)$ over $H,$ we simply have to check that $A (X) + B (X) - C (X)$ is some multiple of $Z_{H} (X)$ . In other words, if dividing our constraint by the vanishing polynomial still yields some polynomial $\frac{A ( X ) + B ( X ) - C ( X )}{Z _{H} ( X )} = H (X),$ we are satisfied that $A (X) + B (X) - C (X) = 0$ over $H .$

Lagrange basis functions

TODO: explain what a basis is in general (briefly).

Polynomials are commonly written in the monomial basis (e.g. $X, X^{2}, ... X^{n}$ ). However, when working over a multiplicative subgroup of order $n$ , we find a more natural expression in the Lagrange basis.

Consider the order- $n$ multiplicative subgroup $H$ with primitive root of unity $ω$ . The Lagrange basis corresponding to this subgroup is a set of functions ${L_{i}}_{i = 0}^{n - 1}$ , where

$L_{i} (ω^{j}) = {10 if i = j, otherwise.$

We can write this more compactly as $L_{i} (ω^{j}) = δ_{ij},$ where $δ$ is the Kronecker delta function.

Now, we can write our polynomial as a linear combination of Lagrange basis functions,

$A (X) = i = 0 \sum n - 1 a_{i} L_{i} (X), X \in H,$

which is equivalent to saying that $A (X)$ evaluates to $a_{0}$ at $ω^{0}$ , to $a_{1}$ at $ω^{1}$ , to $a_{2}$ at $ω^{2}, \dots,$ and so on.

When working over a multiplicative subgroup, the Lagrange basis function has a convenient sparse representation of the form

$L_{i} (X) = \frac{c _{i} \cdot ( X ^{n} - 1 )}{X - ω ^{i}},$

where $c_{i}$ is the barycentric weight. (To understand how this form was derived, refer to ³.) For $i = 0,$ we have $c = 1/ n ⟹ L_{0} (X) = \frac{1}{n} \frac{( X ^{n} - 1 )}{X - 1}$ .

Suppose we are given a set of evaluation points ${x_{0}, x_{1}, \dots, x_{n - 1}}$ . Since we cannot assume that the $x_{i}$ 's form a multiplicative subgroup, we consider also the Lagrange polynomials $L_{i}$ 's in the general case. Then we can construct:

$L_{i} (X) = j \neq = i \prod \frac{X - x _{j}}{x _{i} - x _{j}}, i \in [0.. n - 1] .$

Here, every $X = x_{j} \neq = x_{i}$ will produce a zero numerator term $(x_{j} - x_{j}),$ causing the whole product to evaluate to zero. On the other hand, $X = x_{i}$ will evaluate to $\frac{x _{i} - x _{j}}{x _{i} - x _{j}}$ at every term, resulting in an overall product of one. This gives the desired Kronecker delta behaviour $L_{i} (x_{j}) = δ_{ij}$ on the set ${x_{0}, x_{1}, \dots, x_{n - 1}}$ .

Lagrange interpolation

Given a polynomial in its evaluation representation

$A : {(x_{0}, A (x_{0})), (x_{1}, A (x_{1})), \dots, (x_{n - 1}, A (x_{n - 1}))},$

we can reconstruct its coefficient form in the Lagrange basis:

$A (X) = i = 0 \sum n - 1 A (x_{i}) L_{i} (X),$

where $X \in {x_{0}, x_{1}, \dots, x_{n - 1}} .$

References

Dasgupta, S., Papadimitriou, C. H., & Vazirani, U. V. (2008). "Algorithms" (ch. 2). New York: McGraw-Hill Higher Education.

Golin, M. (2016). "The Fast Fourier Transform and Polynomial Multiplication" [lecture notes], COMP 3711H Design and Analysis of Algorithms, Hong Kong University of Science and Technology.

Berrut, J. and Trefethen, L. (2004). "Barycentric Lagrange Interpolation."

Cryptographic groups

In the section Inverses and groups we introduced the concept of groups. A group has an identity and a group operation. In this section we will write groups additively, i.e. the identity is $O$ and the group operation is $+$ .

Some groups can be used as cryptographic groups. At the risk of oversimplifying, this means that the problem of finding a discrete logarithm of a group element $P$ to a given base $G$ , i.e. finding $x$ such that $P = [x] G$ , is hard in general.

Pedersen commitment

The Pedersen commitment [P99] is a way to commit to a secret message in a verifiable way. It uses two random public generators $G, H \in G,$ where $G$ is a cryptographic group of order $q$ . A random secret $r$ is chosen in $Z_{q}$ , and the message to commit to $m$ is from any subset of $Z_{q}$ . The commitment is

$c = Commit (m, r) = [m] G + [r] H .$

To open the commitment, the committer reveals $m$ and $r,$ thus allowing anyone to verify that $c$ is indeed a commitment to $m .$

Notice that the Pedersen commitment scheme is homomorphic:

$Commit (m, r) + Commit (m^{'}, r^{'}) = [m] G + [r] H + [m^{'}] G + [r^{'}] H = [m + m^{'}] G + [r + r^{'}] H = Commit (m + m^{'}, r + r^{'}) .$

Assuming the discrete log assumption holds, Pedersen commitments are also perfectly hiding and computationally binding:

hiding: the adversary chooses messages $m_{0}, m_{1} .$ The committer commits to one of these messages $c = Commit (m_{b}, r), b \in {0, 1} .$ Given $c,$ the probability of the adversary guessing the correct $b$ is no more than $\frac{1}{2}$ .
binding: the adversary cannot pick two different messages $m_{0} \neq = m_{1},$ and randomness $r_{0}, r_{1},$ such that $Commit (m_{0}, r_{0}) = Commit (m_{1}, r_{1}) .$

Vector Pedersen commitment

We can use a variant of the Pedersen commitment scheme to commit to multiple messages at once, $m = (m_{0}, \dots, m_{n - 1})$ . This time, we'll have to sample a corresponding number of random public generators $G = (G_{0}, \dots, G_{n - 1}),$ along with a single random generator $H$ as before (for use in hiding). Then, our commitment scheme is:

$Commit (m; r) = Commit ((m_{0}, \dots, m_{n - 1}); r) = [r] H + [m_{0}] G_{0} + \dots + [m_{n - 1}] G_{n - 1} = [r] H + i = 0 \sum n - 1 [m_{i}] G_{i} .$

TODO: is this positionally binding?

Diffie–Hellman

An example of a protocol that uses cryptographic groups is Diffie–Hellman key agreement [DH1976]. The Diffie–Hellman protocol is a method for two users, Alice and Bob, to generate a shared private key. It proceeds as follows:

Alice and Bob publicly agree on two prime numbers, $p$ and $G,$ where $p$ is large and $G$ is a primitive root $(mod p) .$ (Note that $g$ is a generator of the group $F_{p}^{\times} .$ )
Alice chooses a large random number $a$ as her private key. She computes her public key $A = [a] G (mod p),$ and sends $A$ to Bob.
Similarly, Bob chooses a large random number $b$ as his private key. He computes his public key $B = [b] G (mod p),$ and sends $B$ to Alice.
Now both Alice and Bob compute their shared key $K = [ab] G (mod p),$ which Alice computes as $K = [a] B (mod p) = [a] ([b] G) (mod p),$ and Bob computes as $K = [b] A (mod p) = [b] ([a] G) (mod p) .$

A potential eavesdropper would need to derive $K = [ab] g (mod p)$ knowing only $g, p, A = [a] G,$ and $B = [b] G$ : in other words, they would need to either get the discrete logarithm $a$ from $A = [a] G$ or $b$ from $B = [b] G,$ which we assume to be computationally infeasible in $F_{p}^{\times} .$

More generally, protocols that use similar ideas to Diffie–Hellman are used throughout cryptography. One way of instantiating a cryptographic group is as an elliptic curve. Before we go into detail on elliptic curves, we'll describe some algorithms that can be used for any group.

Multiscalar multiplication

TODO: Pippenger's algorithm

Reference: https://jbootle.github.io/Misc/pippenger.pdf

Elliptic curves

Elliptic curves constructed over finite fields are another important cryptographic tool.

We use elliptic curves because they provide a cryptographic group, i.e. a group in which the discrete logarithm problem (discussed below) is hard.

There are several ways to define the curve equation, but for our purposes, let $F_{p}$ be a large (255-bit) field, and then let the set of solutions $(x, y)$ to $y^{2} = x^{3} + b$ for some constant $b$ define the $F_{p}$ -rational points on an elliptic curve $E (F_{p})$ . These $(x, y)$ coordinates are called "affine coordinates". Each of the $F_{p}$ -rational points, together with a "point at infinity" $O$ that serves as the group identity, can be interpreted as an element of a group. By convention, elliptic curve groups are written additively.

"Three points on a line sum to zero, which is the point at infinity."

The group addition law is simple: to add two points together, find the line that intersects both points and obtain the third point, and then negate its $y$ -coordinate. The case that a point is being added to itself, called point doubling, requires special handling: we find the line tangent to the point, and then find the single other point that intersects this line and then negate. Otherwise, in the event that a point is being "added" to its negation, the result is the point at infinity.

The ability to add and double points naturally gives us a way to scale them by integers, called scalars. The number of points on the curve is the group order. If this number is a prime $q$ , then the scalars can be considered as elements of a scalar field, $F_{q}$ .

Elliptic curves, when properly designed, have an important security property. Given two random elements $G, H \in E (F_{p})$ finding $a$ such that $[a] G = H$ , otherwise known as the discrete log of $H$ with respect to $G$ , is considered computationally infeasible with classical computers. This is called the elliptic curve discrete log assumption.

If an elliptic curve group $G$ has prime order $q$ (like the ones used in Halo 2), then it is a finite cyclic group. Recall from the section on groups that this implies it is isomorphic to $Z / q Z$ , or equivalently, to the scalar field $F_{q}$ . Each possible generator $G$ fixes the isomorphism; then an element on the scalar side is precisely the discrete log of the corresponding group element with respect to $G$ . In the case of a cryptographically secure elliptic curve, the isomorphism is hard to compute in the $G \to F_{q}$ direction because the elliptic curve discrete log problem is hard.

It is sometimes helpful to make use of this isomorphism by thinking of group-based cryptographic protocols and algorithms in terms of the scalars instead of in terms of the group elements. This can make proofs and notation simpler.

For instance, it has become common in papers on proof systems to use the notation $[x]$ to denote a group element with discrete log $x$ , where the generator is implicit.

We also used this idea in the "distinct-x theorem", in order to prove correctness of optimizations for elliptic curve scalar multiplication in Sapling, and an endomorphism-based optimization in Appendix C of the original Halo paper.

Curve arithmetic

Point doubling

The simplest situation is doubling a point $(x_{0}, y_{0})$ . Continuing with our example $y^{2} = x^{3} + b$ , this is done first by computing the derivative $λ = \frac{d y}{d x} = \frac{3 x ^{2}}{2 y} .$

To obtain expressions for $(x_{1}, y_{1}) = (x_{0}, y_{0}) + (x_{0}, y_{0}),$ we consider

$\frac{- y _{1} - y _{0}}{x _{1} - x _{0}} = λ ⟹ - y_{1} = λ (x_{1} - x_{0}) + y_{0} ⟹ y_{1} = λ (x_{0} - x_{1}) - y_{0} .$

To get the expression for $x_{1},$ we substitute $y = λ (x_{0} - x) - y_{0}$ into the elliptic curve equation:

$y^{2} = x^{3} + b ⟹ (λ (x_{0} - x) - y_{0})^{2} = x^{3} + b ⟹ x^{3} - λ^{2} x^{2} + \dots = 0 \leftarrow (rearranging terms) = (x - x_{0}) (x - x_{0}) (x - x_{1}) \leftarrow (known roots x_{0}, x_{0}, x_{1}) = x^{3} - (x_{0} + x_{0} + x_{1}) x^{2} + \dots .$

Comparing coefficients for the $x^{2}$ term gives us $λ^{2} = x_{0} + x_{0} + x_{1} ⟹ x_{1} = λ^{2} - 2 x_{0} .$

Projective coordinates

This unfortunately requires an expensive inversion of $2 y$ . We can avoid this by arranging our equations to "defer" the computation of the inverse, since we often do not need the actual affine $(x^{'}, y^{'})$ coordinate of the resulting point immediately after an individual curve operation. Let's introduce a third coordinate $Z$ and scale our curve equation by $Z^{3}$ like so:

$Z^{3} y^{2} = Z^{3} x^{3} + Z^{3} b$

Our original curve is just this curve at the restriction $Z = 1$ . If we allow the affine point $(x, y)$ to be represented by $X = x Z$ , $Y = y Z$ and $Z \neq = 0$ then we have the homogenous projective curve

$Y^{2} Z = X^{3} + Z^{3} b .$

Obtaining $(x, y)$ from $(X, Y, Z)$ is as simple as computing $(X / Z, Y / Z)$ when $Z \neq = 0$ . (When $Z = 0,$ we are dealing with the point at infinity $O := (0 : 1 : 0)$ .) In this form, we now have a convenient way to defer the inversion required by doubling a point. The general strategy is to express $x^{'}, y^{'}$ as rational functions using $x = X / Z$ and $y = Y / Z$ , rearrange to make their denominators the same, and then take the resulting point $(X, Y, Z)$ to have $Z$ be the shared denominator and $X = x^{'} Z, Y = y^{'} Z$ .

Projective coordinates are often, but not always, more efficient than affine coordinates. There may be exceptions to this when either we have a different way to apply Montgomery's trick, or when we're in the circuit setting where multiplications and inversions are about equally as expensive (at least in terms of circuit size).

The following shows an example of doubling a point $(X, Y, Z) = (x Z, y Z, Z)$ without an inversion. Substituting with $X, Y, Z$ gives us $λ = \frac{3 x ^{2}}{2 y} = \frac{3 ( X / Z ) ^{2}}{2 ( Y / Z )} = \frac{3 X ^{2}}{2 Y Z}$

and gives us $x^{'} y^{'} = λ^{2} - 2 x = λ^{2} - \frac{2 X}{Z} = \frac{9 X ^{4}}{4 Y ^{2} Z ^{2}} - \frac{2 X}{Z} = \frac{9 X ^{4} - 8 X Y ^{2} Z}{4 Y ^{2} Z ^{2}} = \frac{18 X ^{4} Y Z - 16 X Y ^{3} Z ^{2}}{8 Y ^{3} Z ^{3}} = λ (x - x^{'}) - y = λ (\frac{X}{Z} - \frac{9 X ^{4} - 8 X Y ^{2} Z}{4 Y ^{2} Z ^{2}}) - \frac{Y}{Z} = \frac{3 X ^{2}}{2 Y Z} (\frac{X}{Z} - \frac{9 X ^{4} - 8 X Y ^{2} Z}{4 Y ^{2} Z ^{2}}) - \frac{Y}{Z} = \frac{3 X ^{3}}{2 Y Z ^{2}} - \frac{27 X ^{6} - 24 X ^{3} Y ^{2} Z}{8 Y ^{3} Z ^{3}} - \frac{Y}{Z} = \frac{12 X ^{3} Y ^{2} Z - 8 Y ^{4} Z ^{2} - 27 X ^{6} + 24 X ^{3} Y ^{2} Z}{8 Y ^{3} Z ^{3}}$

Notice how the denominators of $x^{'}$ and $y^{'}$ are the same. Thus, instead of computing $(x^{'}, y^{'})$ we can compute $(X, Y, Z)$ with $Z = 8 Y^{3} Z^{3}$ and $X, Y$ set to the corresponding numerators such that $X / Z = x^{'}$ and $Y / Z = y^{'}$ . This completely avoids the need to perform an inversion when doubling, and something analogous to this can be done when adding two distinct points.

Point addition

We now add two points with distinct $x$ -coordinates, $P = (x_{0}, y_{0})$ and $Q = (x_{1}, y_{1}),$ where $x_{0} \neq = x_{1},$ to obtain $R = P + Q = (x_{2}, y_{2}) .$ The line $\overline{PQ}$ has slope $λ = \frac{y _{1} - y _{0}}{x _{1} - x _{0}} ⟹ y - y_{0} = λ \cdot (x - x_{0}) .$

Using the expression for $\overline{PQ}$ , we compute $y$ -coordinate $- y_{2}$ of $- R$ as: $- y_{2} - y_{0} = λ \cdot (x_{2} - x_{0}) ⟹ y_{2} = λ (x_{0} - x_{2}) - y_{0} .$

Plugging the expression for $\overline{PQ}$ into the curve equation $y^{2} = x^{3} + b$ yields $y^{2} = x^{3} + b ⟹ (λ \cdot (x - x_{0}) + y_{0})^{2} = x^{3} + b ⟹ x^{3} - λ^{2} x^{2} + \dots = 0 \leftarrow (rearranging terms) = (x - x_{0}) (x - x_{1}) (x - x_{2}) \leftarrow (known roots x_{0}, x_{1}, x_{2}) = x^{3} - (x_{0} + x_{1} + x_{2}) x^{2} + \dots .$

Comparing coefficients for the $x^{2}$ term gives us $λ^{2} = x_{0} + x_{1} + x_{2} ⟹ x_{2} = λ^{2} - x_{0} - x_{1}$ .

Important notes:

There exist efficient formulae¹ for point addition that do not have edge cases (so-called "complete" formulae) and that unify the addition and doubling cases together. The result of adding a point to its negation using those formulae produces $Z = 0$ , which represents the point at infinity.
In addition, there are other models like the Jacobian representation where $(x, y) = (x Z^{2}, y Z^{3}, Z)$ where the curve is rescaled by $Z^{6}$ instead of $Z^{3}$ , and this representation has even more efficient arithmetic but no unified/complete formulae.
We can easily compare two curve points $(X_{1}, Y_{1}, Z_{1})$ and $(X_{2}, Y_{2}, Z_{2})$ for equality in the homogenous projective coordinate space by "homogenizing" their Z-coordinates; the checks become $X_{1} Z_{2} = X_{2} Z_{1}$ and $Y_{1} Z_{2} = Y_{2} Z_{1}$ .

Curve endomorphisms

Imagine that $F_{p}$ has a primitive cube root of unity, or in other words that $3∣ p - 1$ and so an element $ζ_{p}$ generates a $3$ -order multiplicative subgroup. Notice that a point $(x, y)$ on our example elliptic curve $y^{2} = x^{3} + b$ has two cousin points: $(ζ_{p} x, y), (ζ_{p}^{2} x, y)$ , because the computation $x^{3}$ effectively kills the $ζ$ component of the $x$ -coordinate. Applying the map $(x, y) \mapsto (ζ_{p} x, y)$ is an application of an endomorphism over the curve. The exact mechanics involved are complicated, but when the curve has a prime $q$ number of points (and thus a prime "order") the effect of the endomorphism is to multiply the point by a scalar in $F_{q}$ which is also a primitive cube root $ζ_{q}$ in the scalar field.

Curve point compression

Given a point on the curve $P = (x, y)$ , we know that its negation $- P = (x, - y)$ is also on the curve. To uniquely specify a point, we need only encode its $x$ -coordinate along with the sign of its $y$ -coordinate.

Serialization

As mentioned in the Fields section, we can interpret the least significant bit of a field element as its "sign", since its additive inverse will always have the opposite LSB. So we record the LSB of the $y$ -coordinate as sign.

Pallas and Vesta are defined over the $F_{p}$ and $F_{q}$ fields, which elements can be expressed in $255$ bits. This conveniently leaves one unused bit in a 32-byte representation. We pack the $y$ -coordinate sign bit into the highest bit in the representation of the $x$ -coordinate:

         <----------------------------------- x --------------------------------->
Enc(P) = [_ _ _ _ _ _ _ _] [_ _ _ _ _ _ _ _] ... [_ _ _ _ _ _ _ _] [_ _ _ _ _ _ _ sign]
          ^                <------------------------------------->                 ^
         LSB                              30 bytes                                MSB

The "point at infinity" $O$ that serves as the group identity, does not have an affine $(x, y)$ representation. However, it turns out that there are no points on either the Pallas or Vesta curve with $x = 0$ or $y = 0$ . We therefore use the "fake" affine coordinates $(0, 0)$ to encode $O$ , which results in the all-zeroes 32-byte array.

Deserialization

When deserializing a compressed curve point, we first read the most significant bit as ysign, the sign of the $y$ -coordinate. Then, we set this bit to zero to recover the original $x$ -coordinate.

If $x = 0, y = 0,$ we return the "point at infinity" $O$ . Otherwise, we proceed to compute $y = x^{3} + b .$ Here, we read the least significant bit of $y$ as sign. If sign == ysign, we already have the correct sign and simply return the curve point $(x, y)$ . Otherwise, we negate $y$ and return $(x, - y)$ .

Cycles of curves

Let $E_{p}$ be an elliptic curve over a finite field $F_{p},$ where $p$ is a prime. We denote this by $E_{p} / F_{p} .$ and we denote the group of points of $E_{p}$ over $F_{p},$ with order $q = # E (F_{p}) .$ For this curve, we call $F_{p}$ the "base field" and $F_{q}$ the "scalar field".

We instantiate our proof system over the elliptic curve $E_{p} / F_{p}$ . This allows us to prove statements about $F_{q}$ -arithmetic circuit satisfiability.

(aside) If our curve $E_{p}$ is over $F_{p},$ why is the arithmetic circuit instead in $F_{q}$ ? The proof system is basically working on encodings of the scalars in the circuit (or more precisely, commitments to polynomials whose coefficients are scalars). The scalars are in $F_{q}$ when their encodings/commitments are elliptic curve points in $E_{p} / F_{p}$ .

However, most of the verifier's arithmetic computations are over the base field $F_{p},$ and are thus efficiently expressed as an $F_{p}$ -arithmetic circuit.

(aside) Why are the verifier's computations (mainly) over $F_{p}$ ? The Halo 2 verifier actually has to perform group operations using information output by the circuit. Group operations like point doubling and addition use arithmetic in $F_{p}$ , because the coordinates of points are in $F_{p} .$

This motivates us to construct another curve with scalar field $F_{p}$ , which has an $F_{p}$ -arithmetic circuit that can efficiently verify proofs from the first curve. As a bonus, if this second curve had base field $E_{q} / F_{q},$ it would generate proofs that could be efficiently verified in the first curve's $F_{q}$ -arithmetic circuit. In other words, we instantiate a second proof system over $E_{q} / F_{q},$ forming a 2-cycle with the first:

TODO: Pallas-Vesta curves

Reference: https://github.com/zcash/pasta

Hashing to curves

Sometimes it is useful to be able to produce a random point on an elliptic curve $E_{p} / F_{p}$ corresponding to some input, in such a way that no-one will know its discrete logarithm (to any other base).

This is described in detail in the Internet draft on Hashing to Elliptic Curves. Several algorithms can be used depending on efficiency and security requirements. The framework used in the Internet Draft makes use of several functions:

hash_to_field: takes a byte sequence input and maps it to a element in the base field $F_{p}$
map_to_curve: takes an $F_{p}$ element and maps it to $E_{p}$ .

TODO: Simplified SWU

Reference: https://eprint.iacr.org/2019/403.pdf

References

Renes, J., Costello, C., & Batina, L. (2016, May). "Complete addition formulas for prime order elliptic curves." In Annual International Conference on the Theory and Applications of Cryptographic Techniques (pp. 403-428). Springer, Berlin, Heidelberg.

Polynomial commitment using inner product argument

We want to commit to some polynomial $p (X) \in F_{p} [X]$ , and be able to provably evaluate the committed polynomial at arbitrary points. The naive solution would be for the prover to simply send the polynomial's coefficients to the verifier: however, this requires $O (n)$ communication. Our polynomial commitment scheme gets the job done using $O (lo g n)$ communication.

`Setup`

Given a parameter $d = 2^{k},$ we generate the common reference string $σ = (G, G, H, F_{p})$ defining certain constants for this scheme:

$G$ is a group of prime order $p;$
$G \in G^{d}$ is a vector of $d$ random group elements;
$H \in G$ is a random group element; and
$F_{p}$ is the finite field of order $p .$

`Commit`

The Pedersen vector commitment $Commit$ is defined as

$Commit (σ, p (X); r) = ⟨ a, G ⟩ + [r] H,$

for some polynomial $p (X) \in F_{p} [X]$ and some blinding factor $r \in F_{p} .$ Here, each element of the vector $a_{i} \in F_{p}$ is the coefficient for the $i$ th degree term of $p (X),$ and $p (X)$ is of maximal degree $d - 1.$

`Open` (prover) and `OpenVerify` (verifier)

The modified inner product argument is an argument of knowledge for the relation

${((P, x, v); (a, r)) : P = ⟨ a, G ⟩ + [r] H, v = ⟨ a, b ⟩},$

where $b = (1, x, x^{2}, \dots, x^{d - 1})$ is composed of increasing powers of the evaluation point $x .$ This allows a prover to demonstrate to a verifier that the polynomial contained “inside” the commitment $P$ evaluates to $v$ at $x,$ and moreover, that the committed polynomial has maximum degree $d - 1.$

The inner product argument proceeds in $k = lo g_{2} d$ rounds. For our purposes, it is sufficient to know about its final outputs, while merely providing intuition about the intermediate rounds. (Refer to Section 3 in the Halo paper for a full explanation.)

Before beginning the argument, the verifier selects a random group element $U$ and sends it to the prover. We initialize the argument at round $k,$ with the vectors $a^{(k)} := a,$ $G^{(k)} := G$ and $b^{(k)} := b .$ In each round $j = k, k - 1, \dots, 1$ :

the prover computes two values $L_{j}$ and $R_{j}$ by taking some inner product of $a^{(j)}$ with $G^{(j)}$ and $b^{(j)}$ . Note that are in some sense "cross-terms": the lower half of $a$ is used with the higher half of $G$ and $b$ , and vice versa:

$L_{j} R_{j} = ⟨ a_{lo}^{(j)}, G_{hi}^{(j)} ⟩ + [l_{j}] H + [⟨ a_{lo}^{(j)}, b_{hi}^{(j)} ⟩] U = ⟨ a_{hi}^{(j)}, G_{lo}^{(j)} ⟩ + [r_{j}] H + [⟨ a_{hi}^{(j)}, b_{lo}^{(j)} ⟩] U$

the verifier issues a random challenge $u_{j}$ ;
the prover uses $u_{j}$ to compress the lower and higher halves of $a^{(j)}$ , thus producing a new vector of half the original length $a^{(j - 1)} = a_{hi}^{(j)} + a_{lo}^{(j)} \cdot u_{j}^{- 1} .$ The vectors $G^{(j)}$ and $b^{(j)}$ are similarly compressed to give $G^{(j - 1)}$ and $b^{(j - 1)}$ (using $u_{j}$ instead of $u_{j}^{- 1}$ ).
$a^{(j - 1)}$ , $G^{(j - 1)}$ and $b^{(j - 1)}$ are input to the next round $j - 1.$

Note that at the end of the last round $j = 1,$ we are left with $a := a^{(0)}$ , $G := G^{(0)}$ , $b := b^{(0)},$ each of length 1. The intuition is that these final scalars, together with the challenges ${u_{j}}$ and "cross-terms" ${L_{j}, R_{j}}$ from each round, encode the compression in each round. Since the prover did not know the challenges $U, {u_{j}}$ in advance, they would have been unable to manipulate the round compressions. Thus, checking a constraint on these final terms should enforce that the compression had been performed correctly, and that the original $a$ satisfied the relation before undergoing compression.

Note that $G, b$ are simply rearrangements of the publicly known $G, b,$ with the round challenges ${u_{j}}$ mixed in: this means the verifier can compute $G, b$ independently and verify that the prover had provided those same values.

Recursion

Alternative terms: Induction; Accumulation scheme; Proof-carrying data

However, the computation of $G$ requires a length- $2^{k}$ multiexponentiation $⟨ G, s ⟩,$ where $s$ is composed of the round challenges $u_{1}, \dots, u_{k}$ arranged in a binary counting structure. This is the linear-time computation that we want to amortise across a batch of proof instances. Instead of computing $G,$ notice that we can express $G$ as a commitment to a polynomial

$G = Commit (σ, g (X, u_{1}, \dots, u_{k})),$

where $g (X, u_{1}, \dots, u_{k}) := \prod_{i = 1}^{k} (u_{i} + u_{i}^{- 1} X^{2^{i - 1}})$ is a polynomial with degree $2^{k} - 1.$


	Since $G$ is a commitment, it can be checked in an inner product argument. The verifier circuit witnesses $G$ and brings $G, u_{1}, \dots, u_{k}$ out as public inputs to the proof $π .$ The next verifier instance checks $π$ using the inner product argument; this includes checking that $G = Commit (g (X, u_{1}, \dots, u_{k}))$ evaluates at some random point to the expected value for the given challenges $u_{1}, \dots, u_{k} .$ Recall from the previous section that this check only requires $lo g d$ work. At the end of checking $π$ and $G,$ the circuit is left with a new $G^{'},$ along with the $u_{1}^{'}, \dots, u_{k}^{'}$ challenges sampled for the check. To fully accept $π$ as valid, we should perform a linear-time computation of $G^{'} = ⟨ G, s^{'} ⟩$ . Once again, we delay this computation by witnessing $G^{'}$ and bringing $G^{'}, u_{1}^{'}, \dots, u_{k}^{'}$ out as public inputs to the proof $π^{'} .$ This goes on from one proof instance to the next, until we are satisfied with the size of our batch of proofs. We finally perform a single linear-time computation, thus deciding the validity of the whole batch.

Since

G

is a commitment, it can be checked in an inner product argument. The verifier circuit witnesses

G

and brings

G, u_{1}, \dots, u_{k}

out as public inputs to the proof

π .

The next verifier instance checks

π

using the inner product argument; this includes checking that

G = Commit (g (X, u_{1}, \dots, u_{k}))

evaluates at some random point to the expected value for the given challenges

u_{1}, \dots, u_{k} .

Recall from the previous section that this check only requires

lo g d

work.

At the end of checking

π

and

G,

the circuit is left with a new

G^{'},

along with the

u_{1}^{'}, \dots, u_{k}^{'}

challenges sampled for the check. To fully accept

π

as valid, we should perform a linear-time computation of

G^{'} = ⟨ G, s^{'} ⟩

. Once again, we delay this computation by witnessing

G^{'}

and bringing

G^{'}, u_{1}^{'}, \dots, u_{k}^{'}

out as public inputs to the proof

π^{'} .

This goes on from one proof instance to the next, until we are satisfied with the size of our batch of proofs. We finally perform a single linear-time computation, thus deciding the validity of the whole batch.

We recall from the section Cycles of curves that we can instantiate this protocol over a two-cycle, where a proof produced by one curve is efficiently verified in the circuit of the other curve. However, some of these verifier checks can actually be efficiently performed in the native circuit; these are "deferred" to the next native circuit (see diagram below) instead of being immediately passed over to the other curve.

The halo2 Book