Sinsemilla is a collision-resistant hash function and commitment scheme designed to be efficient in algebraic circuit models that support lookups, such as PLONK or Halo 2.
The security properties of Sinsemilla are similar to Pedersen hashes; it is not designed to be used where a random oracle, PRF, or preimage-resistant hash is required. The only claimed security property of the hash function is collision-resistance for fixed-length inputs.
Sinsemilla is roughly 4 times less efficient than the algebraic hashes Rescue and Poseidon inside a circuit, but around 19 times more efficient than Rescue outside a circuit. Unlike either of these hashes, the collision resistance property of Sinsemilla can be proven based on cryptographic assumptions that have been well-established for at least 20 years. Sinsemilla can also be used as a computationally binding and perfectly hiding commitment scheme.
The general approach is to split the message into -bit pieces, and for each piece, select from a table of bases in our cryptographic group. We combine the selected bases using a double-and-add algorithm. This ends up being provably as secure as a vector Pedersen hash, and makes advantageous use of the lookup facility supported by Halo 2.
This section is an outline of how Sinsemilla works: for the normative specification, refer to §184.108.40.206 Sinsemilla Hash Function in the protocol spec. The incomplete point addition operator, ⸭, that we use below is also defined there.
Let be a cryptographic group of prime order . We write additively, with identity , and using for scalar multiplication of by .
Let be an integer chosen based on efficiency considerations (the table size will be ). Let be an integer, fixed for each instantiation, such that messages are bits, where . We use zero-padding to the next multiple of bits if necessary.
: Choose and as independent, verifiably random generators of , using a suitable hash into , such that none of or are .
In Orchard, we define to be dependent on a domain separator . The protocol specification uses in place of and in place of .
- Split into groups of bits. Interpret each group as a -bit little-endian integer .
- for from up to :
Let be the -coordinate of . (This assumes that is a prime-order elliptic curve in short Weierstrass form, as is the case for Pallas and Vesta.)
It is slightly more efficient to express a double-and-add as . We also use incomplete additions: it is shown in the Sinsemilla security argument that in the case where is a prime-order short Weierstrass elliptic curve, an exceptional case for addition would lead to finding a discrete logarithm, which can be assumed to occur with negligible probability even for adversarial input.
Choose another generator independently of and .
The randomness for a commitment is chosen uniformly on .
Let be the of . (This again assumes that is a prime-order elliptic curve in short Weierstrass form.)
Note that unlike a simple Pedersen commitment, this commitment scheme ( or ) is not additively homomorphic.
The aim of the design is to optimize the number of bits that can be processed for each step of the algorithm (which requires a doubling and addition in ) for a given table size. Using a single table of size group elements, we can process bits at a time.
In each step of Sinsemilla we want to compute . Let be the intermediate result such that . Recalling the incomplete addition formulae:
Let . Substituting the coordinates for each of the incomplete additions in turn, and rearranging, we get
Input: . (The message words are 1-indexed here, as in the protocol spec, but we start the loop from so that corresponds to in the protocol spec.)
- for from up to :
We have an -bit message . (Note that the message words are 1-indexed as in the protocol spec.)
Initialise the running sum and define . We will end up with
Rearranging gives us an expression for each word of the original message , which we can look up in the table. We position and in adjacent rows of the same column, so we can sequentially apply the constraint across the entire message.
In other words, .
For a little-endian decomposition as used here, the running sum is initialized to the scalar and ends at 0. For a big-endian decomposition as used in variable-base scalar multiplication, the running sum would start at 0 and end with recovering the original scalar.
The running sum only applies to message words within a single field element. That means if then we will need several disjoint running sums. A longer message can be constructed by splitting the message words across several field elements, and then running several instances of the constraints below.
The expression for above requires rows in the column, leaving a one-row gap in adjacent columns and making tricker to accumulate. In order to support chaining multiple field elements without a gap, we use a slightly more complicated expression for that includes a selector:
This effectively forces to zero for the last step of each element, which allows the cell that would have been to be used to reinitialize the running sum for the next element.
With this sorted out, the incomplete addition accumulator can eliminate almost entirely, by substituting for and values in the current and next rows. The two exceptions are at the start of Sinsemilla (where we need to constrain the accumulator to have initial value ), and the end (where we need to witness for use outside of Sinsemilla).
We need a total of four logical selectors to:
- Control the Sinsemilla gate and lookup.
- Distinguish between the last message word in a running sum and its earlier words.
- Mark the start of Sinsemilla.
- Mark the end of Sinsemilla.
We use regular selector columns for the Sinsemilla gate selector and Sinsemilla start selector The other two selectors are synthesized from a single fixed column as follows:
We set to on most Sinsemilla rows, and for the last step of each element, except for the last element where it is set to . We can then use to toggle between constraining the substituted on adjacent rows, and the witnessed at the end of Sinsemilla:
The Sinsemilla circuit makes use of pre-computed random generators. These are loaded into a lookup table:
, , , etc. are copied in using equality constraints.
The Halo 2 circuit API can automatically substitute , , , and , so we don't need to do that manually.
- for from up to :
Note that each term of the last constraint is multiplied by relative to the constraint program given earlier. This is a small optimization that avoids divisions by .
By gating the lookup expression on , we avoid the need to fill in unused cells with dummy values to pass the lookup argument. The optimized lookup value (using a default index of ) is:
This increases the degree of the lookup argument to .