# Variable-base scalar multiplication

In the Orchard circuit we need to check where and the scalar field is with .

We have and , for .

## Witness scalar

We're trying to compute for . Set and . Then we can compute

provided that , i.e. which covers the whole range we need because in fact .

Thus, given a scalar , we witness the boolean decomposition of (We use big-endian bit order for convenient input into the variable-base scalar multiplication algorithm.)

## Variable-base scalar multiplication

We use an optimized double-and-add algorithm, copied from "Faster variable-base scalar multiplication in zk-SNARK circuits" with some variable name changes:

Acc :=  T
for i from n-1 down to 0 {
P := k_{i+1} ? T : −T
Acc := (Acc + P) + Acc
}
return (k_0 = 0) ? (Acc - T) : Acc


It remains to check that the x-coordinates of each pair of points to be added are distinct.

When adding points in a prime-order group, we can rely on Theorem 3 from Appendix C of the Halo paper, which says that if we have two such points with nonzero indices wrt a given odd-prime order base, where the indices taken in the range are distinct disregarding sign, then they have different x-coordinates. This is helpful, because it is easier to reason about the indices of points occurring in the scalar multiplication algorithm than it is to reason about their x-coordinates directly.

So, the required check is equivalent to saying that the following "indexed version" of the above algorithm never asserts:

acc := 2
for i from n-1 down to 0 {
p = k_{i+1} ? 1 : −1
assert acc ≠ ± p
assert (acc + p) ≠ acc    // X
acc := (acc + p) + acc
assert 0 < acc ≤ (q-1)/2
}
if k_0 = 0 {
assert acc ≠ 1
acc := acc - 1
}


The maximum value of acc is:

    <--- n 1s --->
1011111...111111
= 1100000...000000 - 1


=

The assertion labelled X obviously cannot fail because . It is possible to see that acc is monotonically increasing except in the last conditional. It reaches its largest value when is maximal, i.e. .

So to entirely avoid exceptional cases, we would need . But we can use larger by if the last iterations use complete addition.

The first for which the algorithm using only incomplete addition fails is going to be , since . We need to make the wraparound technique above work.

sage: q = 0x40000000000000000000000000000000224698fc0994a8dd8c46eb2100000001
sage: 2^253 + 2^252 - 1 < (q-1)//2
False
sage: 2^252 + 2^251 - 1 < (q-1)//2
True


So the last three iterations of the loop () need to use complete addition, as does the conditional subtraction at the end. Writing this out using ⸭ for incomplete addition (as we do in the spec), we have:

Acc :=  T
for i from 253 down to 3 {
P := k_{i+1} ? T : −T
Acc := (Acc ⸭ P) ⸭ Acc
}
for i from 2 down to 0 {
P := k_{i+1} ? T : −T
Acc := (Acc + P) + Acc  // complete addition
}
return (k_0 = 0) ? (Acc + (-T)) : Acc  // complete addition


## Constraint program for optimized double-and-add

Define a running sum , where and:

The helper . After substitution of , and , this becomes:

Here, is assigned to a cell. This is unlike previous 's, which were implicitly derived from , but never actually assigned.

The bits are used in three further steps, using complete addition:

If the least significant bit we set otherwise we set . Then we return using complete addition.

Let

Output .

(Note that represents .)

We need six advice columns to witness . However, since are the same, we can perform two incomplete additions in a single row, reusing the same . We split the scalar bits used in incomplete addition into and halves and process them in parallel. This means that we effectively have two for loops:

• the first, covering the half for from down to , with a special case at ; and
• the second, covering the half for the remaining from down to , with a special case at .

For each and half, we have three sets of gates. Note that is going from ; is NOT indexing the rows.

This gate is only used on the first row (before the for loop). We check that are initialized to values consistent with the initial where

This gate is used on all rows corresponding to the for loop except the last.

where

This gate is used on the final iteration of the for loop, handling the special case where we check that the output has been witnessed correctly. where

We reuse the complete addition constraints to implement the final rounds of double-and-add. This requires two rows per round because we need 9 advice columns for each complete addition. In the 10th advice column we stash the other cells that we need to correctly implement the double-and-add:

• The base coordinate, so we can conditionally negate it as input to one of the complete additions.
• The running sum, which we constrain over two rows instead of sequentially.

### Constraints

In addition to the complete addition constraints, we define the following gate:

where .

where .

## Overflow check

cannot overflow for any , because it is a weighted sum of bits only up to , which is smaller than (and also ).

However, can overflow .

Since overflow can only occur in the final step that constrains , we have . It is then sufficient to also check that (so that ) and that . These conditions together imply that as an integer, and so as required.

Note: the bits do not represent a value reduced modulo , but rather a representation of the unreduced .

### Optimized check for

Since , we have

We may assume that .

Therefore,

Given , we prove equivalence of and as follows:

• shift the range by to give ;
• observe that is guaranteed to be in and therefore cannot overflow or underflow modulo ;
• using the fact that , observe that .

(We can see in a different way that this is correct by observing that it checks whether , so the upper bound is aligned as we would expect.)

Now, we can continue optimizing from :

Constraining to be all- or not-all- can be implemented almost "for free", as follows.

Recall that , so we have:

So are all exactly when .

Finally, we can merge the -bit decompositions for the and cases by checking that .

### Overflow check constraints

Let . The constraints for the overflow check are:

Define

Witness , and decompose as .

Then the needed gates are:

where can be computed by another running sum. Note that the factor of has no effect on the constraint, since the RHS is zero.

#### Running sum range check

We make use of a -bit lookup range check in the circuit to subtract the low bits of . The range check subtracts the first bits of and right-shifts the result to give