# Sinsemilla

## Overview

Sinsemilla is a collision-resistant hash function and commitment scheme designed to be efficient in algebraic circuit models that support lookups, such as PLONK or Halo 2.

The security properties of Sinsemilla are similar to Pedersen hashes; it is **not** designed to be used where a random oracle, PRF, or preimage-resistant hash is required. **The only claimed security property of the hash function is collision-resistance for fixed-length inputs.**

Sinsemilla is roughly 4 times less efficient than the algebraic hashes Rescue and Poseidon inside a circuit, but around 19 times more efficient than Rescue outside a circuit. Unlike either of these hashes, the collision resistance property of Sinsemilla can be proven based on cryptographic assumptions that have been well-established for at least 20 years. Sinsemilla can also be used as a computationally binding and perfectly hiding commitment scheme.

The general approach is to split the message into $k$-bit pieces, and for each piece, select from a table of $2_{k}$ bases in our cryptographic group. We combine the selected bases using a double-and-add algorithm. This ends up being provably as secure as a vector Pedersen hash, and makes advantageous use of the lookup facility supported by Halo 2.

## Description

This section is an outline of how Sinsemilla works: for the normative specification, refer to §5.4.1.9 Sinsemilla Hash Function in the protocol spec. The incomplete point addition operator, ⸭, that we use below is also defined there.

Let $G$ be a cryptographic group of prime order $q$. We write $G$ additively, with identity $O$, and using $[m]P$ for scalar multiplication of $P$ by $m$.

Let $k≥1$ be an integer chosen based on efficiency considerations (the table size will be $2_{k}$). Let $n$ be an integer, fixed for each instantiation, such that messages are $kn$ bits, where $2_{n}≤2q−1 $. We use zero-padding to the next multiple of $k$ bits if necessary.

$Setup$: Choose $Q$ and $P[0..2_{k}−1]$ as $2_{k}+1$ independent, verifiably random generators of $G$, using a suitable hash into $G$, such that none of $Q$ or $P[0..2_{k}−1]$ are $O$.

In Orchard, we define $Q$ to be dependent on a domain separator $D$. The protocol specification uses $Q(D)$ in place of $Q$ and $S(m)$ in place of $P[m]$.

$Hash(M)$:

- Split $M$ into $n$ groups of $k$ bits. Interpret each group as a $k$-bit little-endian integer $m_{i}$.
- let $Acc_{0}:=Q$
- for $i$ from $0$ up to $n−1$:
- let $Acc_{i+1}:=(Acc_{i}⸭P[m_{i+1}])⸭Acc_{i}$

- return $Acc_{n}$

Let $ShortHash(M)$ be the $x$-coordinate of $Hash(M)$. (This assumes that $G$ is a prime-order elliptic curve in short Weierstrass form, as is the case for Pallas and Vesta.)

It is slightly more efficient to express a double-and-add $[2]A+R$ as $(A+R)+A$. We also use incomplete additions: it is shown in the Sinsemilla security argument that in the case where $G$ is a prime-order short Weierstrass elliptic curve, an exceptional case for addition would lead to finding a discrete logarithm, which can be assumed to occur with negligible probability even for adversarial input.

### Use as a commitment scheme

Choose another generator $H$ independently of $Q$ and $P[0..2_{k}−1]$.

The randomness $r$ for a commitment is chosen uniformly on $[0,q)$.

Let $Commit_{r}(M)=Hash(M)⸭[r]H$.

Let $ShortCommit_{r}(M)$ be the $x-coordinate$ of $Commit_{r}(M)$. (This again assumes that $G$ is a prime-order elliptic curve in short Weierstrass form.)

Note that unlike a simple Pedersen commitment, this commitment scheme ($Commit$ or $ShortCommit$) is not additively homomorphic.

## Efficient implementation

The aim of the design is to optimize the number of bits that can be processed for each step of the algorithm (which requires a doubling and addition in $G$) for a given table size. Using a single table of size $2_{k}$ group elements, we can process $k$ bits at a time.

### Incomplete addition

In each step of Sinsemilla we want to compute $A_{i+1}:=(A_{i}⸭P_{i})⸭A_{i}$. Let $R_{i}:=A_{i}⸭P_{i}$ be the intermediate result such that $A_{i+1}:=A_{i}⸭R_{i}$. Recalling the incomplete addition formulae:

$x_{3}y_{3} =(x_{1}−x_{2}y_{1}−y_{2} )_{2}−x_{1}−x_{2}=x_{1}−x_{2}y_{1}−y_{2} ⋅(x_{1}−x_{3})−y_{1} $

Let $λ=x_{1}−x_{2}y_{1}−y_{2} $. Substituting the coordinates for each of the incomplete additions in turn, and rearranging, we get

$λ_{1,i}x_{R,i}y_{R,i} =x_{A,i}−x_{P,i}y_{A,i}−y_{P,i} ⟹y_{A,i}−y_{P,i}=λ_{1,i}⋅(x_{A,i}−x_{P,i})⟹y_{P,i}=y_{A,i}−λ_{1,i}⋅(x_{A,i}−x_{P,i})=λ_{1,i}−x_{A,i}−x_{P,i}=λ_{1,i}⋅(x_{A,i}−x_{R,i})−y_{A,i} $ and $λ_{2,i}x_{A,i+1}y_{A,i+1} =x_{A,i}−x_{R,i}y_{A,i}−y_{R,i} ⟹y_{A,i}−y_{R,i}=λ_{2,i}⋅(x_{A,i}−x_{R,i})⟹y_{A,i}−(λ_{1,i}⋅(x_{A,i}−x_{R,i})−y_{A,i})=λ_{2,i}⋅(x_{A,i}−x_{R,i})⟹2⋅y_{A,i}=(λ_{1,i}+λ_{2,i})⋅(x_{A,i}−x_{R,i})=λ_{2,i}−x_{A,i}−x_{R,i}=λ_{2,i}⋅(x_{A,i}−x_{A,i+1})−y_{A,i}. $

### Constraint program

Let $P={(j,x_{P[j]},y_{P[j]})forj∈{0..2_{k}−1}}$.

Input: $m_{1..=n}$. (The message words are 1-indexed here, as in the protocol spec, but we start the loop from $i=0$ so that $(x_{A,i},y_{A,i})$ corresponds to $Acc_{i}$ in the protocol spec.)

Output: $(x_{A,n},y_{A,n})$.

- $(x_{A,0},y_{A,0})=Q$
- for $i$ from $0$ up to $n−1$:
- $y_{P,i}=y_{A,i}−λ_{1,i}⋅(x_{A,i}−x_{P,i})$
- $x_{R,i}=λ_{1,i}−x_{A,i}−x_{P,i}$
- $2⋅y_{A,i}=(λ_{1,i}+λ_{2,i})⋅(x_{A,i}−x_{R,i})$
- $(m_{i+1},x_{P,i},y_{P,i})∈P$
- $λ_{2,i}=x_{A,i+1}+x_{R,i}+x_{A,i}$
- $λ_{2,i}⋅(x_{A,i}−x_{A,i+1})=y_{A,i}+y_{A,i+1}$

## PLONK / Halo 2 constraints

### Message decomposition

We have an $n$-bit message $m=m_{1}+2_{k}m_{2}+...+2_{k⋅(n−1)}m_{n}$. (Note that the message words are 1-indexed as in the protocol spec.)

Initialise the running sum $z_{0}=α$ and define $z_{i+1}:=2_{K}z_{i}−m_{i+1} $. We will end up with $z_{n}=0.$

Rearranging gives us an expression for each word of the original message $m_{i+1}=z_{i}−2_{k}⋅z_{i+1}$, which we can look up in the table. We position $z_{i}$ and $z_{i+1}$ in adjacent rows of the same column, so we can sequentially apply the constraint across the entire message.

In other words, $z_{n−i}=h=0∑i−1 2_{kh}⋅m_{h+1}$.

For a little-endian decomposition as used here, the running sum is initialized to the scalar and ends at 0. For a big-endian decomposition as used in variable-base scalar multiplication, the running sum would start at 0 and end with recovering the original scalar.

### Efficient packing

The running sum only applies to message words within a single field element. That means if $n≥PrimeField::NUM_BITS$ then we will need several disjoint running sums. A longer message can be constructed by splitting the message words across several field elements, and then running several instances of the constraints below.

The expression for $m_{i+1}$ above requires $n+1$ rows in the $z_{i}$ column, leaving a one-row gap in adjacent columns and making $Acc_{i}$ tricker to accumulate. In order to support chaining multiple field elements without a gap, we use a slightly more complicated expression for $m_{i+1}$ that includes a selector:

$m_{i+1}=z_{i}−2_{k}⋅q_{run,i}⋅z_{i+1}$

This effectively forces $z_{n}$ to zero for the last step of each element, which allows the cell that would have been $z_{n}$ to be used to reinitialize the running sum for the next element.

With this sorted out, the incomplete addition accumulator can eliminate $y_{A,i}$ almost entirely, by substituting for $x$ and $λ$ values in the current and next rows. The two exceptions are at the start of Sinsemilla (where we need to constrain the accumulator to have initial value $Q$), and the end (where we need to witness $y_{A,n}$ for use outside of Sinsemilla).

### Selectors

We need a total of four logical selectors to:

- Control the Sinsemilla gate and lookup.
- Distinguish between the last message word in a running sum and its earlier words.
- Mark the start of Sinsemilla.
- Mark the end of Sinsemilla.

We use regular selector columns for the Sinsemilla gate selector $q_{S1}$ and Sinsemilla start selector $q_{S4}.$ The other two selectors are synthesized from a single fixed column $q_{S2}$ as follows:

$q_{S3}q_{run} =q_{S2}⋅(q_{S2}−1)=q_{S2}−q_{S3} $

$q_{S2}012 q_{S3}002 q_{run}010 $

We set $q_{S2}$ to $1$ on most Sinsemilla rows, and $0$ for the last step of each element, except for the last element where it is set to $2$. We can then use $q_{S3}$ to toggle between constraining the substituted $y_{A,i+1}$ on adjacent rows, and the witnessed $y_{A,n}$ at the end of Sinsemilla:

$λ_{2,i}⋅(x_{A,i}−x_{A,i+1})=y_{A,i}+22−q_{S3} ⋅y_{A,i+1}+2q_{S3} ⋅y_{A,n}$

### Generator lookup table

The Sinsemilla circuit makes use of $2_{10}$ pre-computed random generators. These are loaded into a lookup table: $table_{idx}012⋮2_{10}−1 table_{x}x_{P[0]}x_{P[1]}x_{P[2]}⋮x_{P[2_{10}−1]} table_{y}y_{P[0]}y_{P[1]}y_{P[2]}⋮y_{P[2_{10}−1]} $

### Layout

$Step012⋮n−10_{′}1_{′}2_{′}⋮n−1_{′}n_{′} x_{A}x_{Q}x_{A,1}x_{A,2}⋮x_{A,n−1}x_{A,0}x_{A,1}x_{A,2}⋮x_{A,n−1}x_{A,n} x_{P}x_{P[m_{1}]}x_{P[m_{2}]}x_{P[m_{3}]}⋮x_{P[m_{n}]}x_{P[m_{1}]}x_{P[m_{2}]}x_{P[m_{3}]}⋮x_{P[m_{n}]} bitsz_{0}z_{1}z_{2}⋮z_{n−1}z_{0}z_{1}z_{2}⋮z_{n−1} λ_{1}λ_{1,0}λ_{1,1}λ_{1,2}⋮λ_{1,n−1}λ_{1,0}λ_{1,1}λ_{1,2}⋮λ_{1,n−1}y_{A,n} λ_{2}λ_{2,0}λ_{2,1}λ_{2,2}⋮λ_{2,n−1}λ_{2,0}λ_{2,1}λ_{2,2}⋮λ_{2,n−1} q_{S1}11111111110 q_{S2}11110111120 q_{S4}10000000000 fixed_y_Qy_{Q}0000000000 $

$x_{Q}$, $z_{0}$, $z_{0}$, etc. are copied in using equality constraints.

### Optimized Sinsemilla gate

$Fori∈[0,n),let x_{R,i}Y_{A,i}y_{P,i}m_{i+1}q_{run}q_{S3} ====== λ_{1,i}−x_{A,i}−x_{P,i}(λ_{1,i}+λ_{2,i})⋅(x_{A,i}−x_{R,i})Y_{A,i}/2−λ_{1,i}⋅(x_{A,i}−x_{P,i})z_{i}−q_{run,i}⋅z_{i+1}⋅2_{k}q_{S2}−q_{S3}q_{S2}⋅(q_{S2}−1) $

The Halo 2 circuit API can automatically substitute $y_{P,i}$, $x_{R,i}$, $Y_{A,i}$, and $Y_{A,i+1}$, so we don't need to do that manually.

- $x_{A,0}=x_{Q}$
- $2⋅y_{Q}=Y_{A,0}$
- for $i$ from $0$ up to $n−1$:
- $(m_{i+1},x_{P,i},y_{P,i})∈P$
- $λ_{2,i}=x_{A,i+1}+x_{R,i}+x_{A,i}$
- $4⋅λ_{2,i}⋅(x_{A,i}−x_{A,i+1})=2⋅Y_{A,i}+(2−q_{S3})⋅Y_{A,i+1}+2q_{S3}⋅y_{A,n}$

Note that each term of the last constraint is multiplied by $4$ relative to the constraint program given earlier. This is a small optimization that avoids divisions by $2$.

By gating the lookup expression on $q_{S1}$, we avoid the need to fill in unused cells with dummy values to pass the lookup argument. The optimized lookup value (using a default index of $0$) is:

$( q_{S1}⋅m_{i+1},q_{S1}⋅x_{P,i}+(1−q_{S1})⋅x_{P,0},q_{S1}⋅y_{P,i}+(1−q_{S1})⋅y_{P,0}) $

This increases the degree of the lookup argument to $6$.

$Degree4635 Constraintq_{S4}⋅(2⋅y_{Q}−Y_{A,0})=0q_{S1,i}⇒(m_{i+1},x_{P,i},y_{P,i})∈Pq_{S1,i}⋅(λ_{2,i}−(x_{A,i+1}+x_{R,i}+x_{A,i}))q_{S1,i}⋅(4⋅λ_{2,i}⋅(x_{A,i}−x_{A,i+1})−(2⋅Y_{A,i}+(2−q_{S3,i})⋅Y_{A,i+1}+2⋅q_{S3,i}⋅y_{A,n}))=0 $