16bit table chip for SHA256
This chip implementation is based around a single 16bit lookup table. It requires a minimum of $2_{16}$ circuit rows, and is therefore suitable for use in larger circuits.
We target a maximum constraint degree of $9$. That will allow us to handle constraining carries and "small pieces" to a range of up to ${0..7}$ in one row.
Compression round
There are $64$ compression rounds. Each round takes 32bit values $A,B,C,D,E,F,G,H$ as input, and performs the following operations:
$Ch(E,F,G)Maj(A,B,C)Σ_{0}(A)Σ_{1}(E)H_{′}E_{new}A_{new} ======== (E∧F)⊕(¬E∧G)(A∧B)⊕(A∧C)⊕(B∧C)count(A,B,C)≥2(A⋙2)⊕(A⋙13)⊕(A⋙22)(E⋙6)⊕(E⋙11)⊕(E⋙25)H+Ch(E,F,G)+Σ_{1}(E)+K_{t}+W_{t}reduce_{6}(H_{′}+D)reduce_{7}(H_{′}+Maj(A,B,C)+Σ_{0}(A)) $
where $reduce_{i}$ must handle a carry $0≤carry<i$.
Define $spread$ as a table mapping a $16$bit input to an output interleaved with zero bits. We do not require a separate table for range checks because $spread$ can be used.
Modular addition
To implement addition modulo $2_{32}$, we note that this is equivalent to adding the operands using field addition, and then masking away all but the lowest 32 bits of the result. For example, if we have two operands $a$ and $b$:
$a⊞b=c,$
we decompose each operand (along with the result) into 16bit chunks:
$(a_{L}:Z_{2_{16}},a_{H}:Z_{2_{16}})⊞(b_{L}:Z_{2_{16}},b_{H}:Z_{2_{16}})=(c_{L}:Z_{2_{16}},c_{H}:Z_{2_{16}}),$
and then reformulate the constraint using field addition:
$carry⋅2_{32}+c_{H}⋅2_{16}+c_{L}=(a_{H}+b_{H})⋅2_{16}+a_{L}+b_{L}.$
More generally, any bitdecomposition of the output can be used, not just a decomposition into 16bit chunks. Note that this correctly handles the carry from $a_{L}+b_{L}$.
This constraint requires that each chunk is correctly rangechecked (or else an assignment could overflow the field).

The operand and result chunks can be constrained using $spread$, by looking up each chunk in the "dense" column within a subset of the table. This way we additionally get the "spread" form of the output for free; in particular this is true for the output of the bottomright $⊞$ which becomes $A_{new}$, and the output of the leftmost $⊞$ which becomes $E_{new}$. We will use this below to optimize $Maj$ and $Ch$.

$carry$ must be constrained to the precise range of allowed carry values for the number of operands. We do this with a small range constraint.
Maj function
$Maj$ can be done in $4$ lookups: $2spread∗2$ chunks
 As mentioned above, after the first round we already have $A$ in spread form $A_{′}$. Similarly, $B$ and $C$ are equal to the $A$ and $B$ respectively of the previous round, and therefore in the steady state we already have them in spread form $B_{′}$ and $C_{′}$. In fact we can also assume we have them in spread form in the first round, either from the fixed IV or from the use of $spread$ to reduce the output of the feedforward in the previous block.
 Add the spread forms in the field: $M_{′}=A_{′}+B_{′}+C_{′}$;
 We can add them as $32$bit words or in pieces; it's equivalent
 Witness the compressed even bits $M_{i}$ and the compressed odd bits $M_{i}$ for $i={0..1}$;
 Constrain $M_{′}=spread(M_{0})+2⋅spread(M_{0})+2_{32}⋅spread(M_{1})+2_{33}⋅spread(M_{1})$, where $M_{i}$ is the $Maj$ function output.
Note: by "even" bits we mean the bits of weight an evenpower of $2$, i.e. of weight $2_{0},2_{2},…$. Similarly by "odd" bits we mean the bits of weight an oddpower of $2$.
Ch function
TODO: can probably be optimized to $4$ or $5$ lookups using an additional table.
$Ch$ can be done in $8$ lookups: $4spread∗2$ chunks
 As mentioned above, after the first round we already have $E$ in spread form $E_{′}$. Similarly, $F$ and $G$ are equal to the $E$ and $F$ respectively of the previous round, and therefore in the steady state we already have them in spread form $F_{′}$ and $G_{′}$. In fact we can also assume we have them in spread form in the first round, either from the fixed IV or from the use of $spread$ to reduce the output of the feedforward in the previous block.
 Calculate $P_{′}=E_{′}+F_{′}$ and $Q_{′}=(evens−E_{′})+G_{′}$, where $evens=spread(2_{32}−1)$.
 We can add them as $32$bit words or in pieces; it's equivalent.
 $evens−E_{′}$ works to compute the spread of $¬E$ even though negation and $spread$ do not commute in general. It works because each spread bit in $E_{′}$ is subtracted from $1$, so there are no borrows.
 Witness $P_{i},P_{i},Q_{i},Q_{i}$ such that $P_{′}=spread(P_{0})+2⋅spread(P_{0})+2_{32}⋅spread(P_{1})+2_{33}⋅spread(P_{1})$, and similarly for $Q_{′}$.
 ${P_{i}+Q_{i}}_{i=0..1}$ is the $Ch$ function output.
Σ_0 function
$Σ_{0}(A)$ can be done in $6$ lookups.
To achieve this we first split $A$ into pieces $(a,b,c,d)$, of lengths $(2,11,9,10)$ bits respectively counting from the little end. At the same time we obtain the spread forms of these pieces. This can all be done in two PLONK rows, because the $10$ and $11$bit pieces can be handled using $spread$ lookups, and the $9$bit piece can be split into $3∗3$bit subpieces. The latter and the remaining $2$bit piece can be rangechecked by polynomial constraints in parallel with the two lookups, two small pieces in each row. The spread forms of these small pieces are found by interpolation.
Note that the splitting into pieces can be combined with the reduction of $A_{new}$, i.e. no extra lookups are needed for the latter. In the last round we reduce $A_{new}$ after adding the feedforward (requiring a carry of up to $7$ which is fine).
$(A⋙2)⊕(A⋙13)⊕(A⋙22)$ is equivalent to $(A⋙2)⊕(A⋙13)⊕(A⋘10)$:
Then, using $4$ more $spread$ lookups we obtain the result as the even bits of a linear combination of the pieces:
$R_{′}= (a(b(c4_{30}a4_{21}b4_{23}c ∣∣∣∣∣∣+++ dab4_{20}d4_{19}a4_{12}b ∣∣∣∣∣∣⇓+++ cda4_{11}c4_{9}d4_{10}a ∣∣∣∣∣∣+++ b)c)d)bcd ⊕⊕++ $
That is, we witness the compressed even bits $R_{i}$ and the compressed odd bits $R_{i}$, and constrain $R_{′}=spread(R_{0})+2⋅spread(R_{0})+2_{32}⋅spread(R_{1})+2_{33}⋅spread(R_{1})$ where ${R_{i}}_{i=0..1}$ is the $Σ_{0}$ function output.
Σ_1 function
$Σ_{1}(E)$ can be done in $6$ lookups.
To achieve this we first split $E$ into pieces $(a,b,c,d)$, of lengths $(6,5,14,7)$ bits respectively counting from the little end. At the same time we obtain the spread forms of these pieces. This can all be done in two PLONK rows, because the $7$ and $14$bit pieces can be handled using $spread$ lookups, the $5$bit piece can be split into $3$ and $2$bit subpieces, and the $6$bit piece can be split into $2∗3$bit subpieces. The four small pieces can be rangechecked by polynomial constraints in parallel with the two lookups, two small pieces in each row. The spread forms of these small pieces are found by interpolation.
Note that the splitting into pieces can be combined with the reduction of $E_{new}$, i.e. no extra lookups are needed for the latter. In the last round we reduce $E_{new}$ after adding the feedforward (requiring a carry of up to $6$ which is fine).
$(E⋙6)⊕(E⋙11)⊕(E⋙25)$ is equivalent to $(E⋙6)⊕(E⋙11)⊕(E⋘7)$.
Then, using $4$ more $spread$ lookups we obtain the result as the even bits of a linear combination of the pieces, in the same way we did for $Σ_{0}$:
$R_{′}= (a(b(c4_{26}a4_{27}b4_{18}c ∣∣∣∣∣∣+++ dab4_{19}d4_{21}a4_{13}b ∣∣∣∣∣∣⇓+++ cda4_{5}c4_{14}d4_{7}a ∣∣∣∣∣∣+++ b)c)d)bcd ⊕⊕++ $
That is, we witness the compressed even bits $R_{i}$ and the compressed odd bits $R_{i}$, and constrain $R_{′}=spread(R_{0})+2⋅spread(R_{0})+2_{32}⋅spread(R_{1})+2_{33}⋅spread(R_{1})$ where ${R_{i}}_{i=0..1}$ is the $Σ_{1}$ function output.
Block decomposition
For each block $M∈{0,1}_{512}$ of the padded message, $64$ words of $32$ bits each are constructed as follows:
 The first $16$ are obtained by splitting $M$ into $32$bit blocks $M=W_{0}∣∣W_{1}∣∣⋯∣∣W_{14}∣∣W_{15};$
 The remaining $48$ words are constructed using the formula: $W_{i}=σ_{1}(W_{i−2})⊞W_{i−7}⊞σ_{0}(W_{i−15})⊞W_{i−16},$ for $16≤i<64$.
Note: $0$based numbering is used for the $W$ word indices.
$σ_{0}(X)σ_{1}(X) == (X⋙7)⊕(X⋙18)⊕(X≫3)(X⋙17)⊕(X⋙19)⊕(X≫10) $
Note: $≫$ is a rightshift, not a rotation.
σ_0 function
$(X⋙7)⊕(X⋙18)⊕(X≫3)$ is equivalent to $(X⋙7)⊕(X⋘14)⊕(X≫3)$.
As above but with pieces $(a,b,c,d)$ of lengths $(3,4,11,14)$ counting from the little end. Split $b$ into two $2$bit subpieces.
$R_{′}= (0_{[3]}(b(c4_{28}b4_{21}c ∣∣∣∣∣∣++ dab4_{15}d4_{25}a4_{17}b ∣∣∣∣∣∣⇓+++ cda4_{4}c4_{11}d4_{14}a ∣∣∣∣∣∣+++ b)c)d)bcd ⊕⊕++ $
σ_1 function
$(X⋙17)⊕(X⋙19)⊕(X≫10)$ is equivalent to $(X⋘15)⊕(X⋘13)⊕(X≫10)$.
TODO: this diagram doesn't match the expression on the right. This is just for consistency with the other diagrams.
As above but with pieces $(a,b,c,d)$ of lengths $(10,7,2,13)$ counting from the little end. Split $b$ into $(3,2,2)$bit subpieces.
$R_{′}= (0_{[10]}(b(c4_{25}b4_{30}c ∣∣∣∣∣∣++ dab4_{9}d4_{15}a4_{23}b ∣∣∣∣∣∣⇓+++ cda4_{7}c4_{2}d4_{13}a ∣∣∣∣∣∣+++ b)c)d)bcd ⊕⊕++ $
Message scheduling
We apply $σ_{0}$ to $W_{1..48}$, and $σ_{1}$ to $W_{14..61}$. In order to avoid redundant applications of $spread$, we can merge the splitting into pieces for $σ_{0}$ and $σ_{1}$ in the case of $W_{14..48}$. Merging the piece lengths $(3,4,11,14)$ and $(10,7,2,13)$ gives pieces of lengths $(3,4,3,7,1,1,13)$.
If we can do the merged split in $3$ rows (as opposed to a total of $4$ rows when splitting for $σ_{0}$ and $σ_{1}$ separately), we save $35$ rows.
These might even be doable in $2$ rows; not sure. —Daira
We can merge the reduction mod $2_{32}$ of $W_{16..61}$ into their splitting when they are used to compute subsequent words, similarly to what we did for $A$ and $E$ in the round function.
We will still need to reduce $W_{62..63}$ since they are not split. (Technically we could leave them unreduced since they will be reduced later when they are used to compute $A_{new}$ and $E_{new}$  but that would require handling a carry of up to $10$ rather than $6$, so it's not worth the complexity.)
The resulting message schedule cost is:
 $2$ rows to constrain $W_{0}$ to $32$ bits
 This is technically optional, but let's do it for robustness, since the rest of the input is constrained for free.
 $13∗2$ rows to split $W_{1..13}$ into $(3,4,11,14)$bit pieces
 $35∗3$ rows to split $W_{14..48}$ into $(3,4,3,7,1,1,13)$bit pieces (merged with a reduction for $W_{16..48}$)
 $13∗2$ rows to split $W_{49..61}$ into $(10,7,2,13)$bit pieces (merged with a reduction)
 $4∗48$ rows to extract the results of $σ_{0}$ for $W_{1..48}$
 $4∗48$ rows to extract the results of $σ_{1}$ for $W_{14..61}$
 $2∗2$ rows to reduce $W_{62..63}$
 $=547$ rows.
Overall cost
For each round:
 $8$ rows for $Ch$
 $4$ rows for $Maj$
 $6$ rows for $Σ_{0}$
 $6$ rows for $Σ_{1}$
 $reduce_{6}$ and $reduce_{7}$ are always free
 $=24$ per round
This gives $24∗64=1792$ rows for all of "step 3", to which we need to add:
 $547$ rows for message scheduling
 $2∗8$ rows for $8$ reductions mod $2_{32}$ in "step 4"
giving a total of $2099$ rows.
Tables
We only require one table $spread$, with $2_{16}$ rows and $3$ columns. We need a tag column to allow selecting $(7,10,11,13,14)$bit subsets of the table for $Σ_{0..1}$ and $σ_{0..1}$.
spread
table
row  tag  table (16b)  spread (32b) 

$0$  0  0000000000000000  00000000000000000000000000000000 
$1$  0  0000000000000001  00000000000000000000000000000001 
$2$  0  0000000000000010  00000000000000000000000000000100 
$3$  0  0000000000000011  00000000000000000000000000000101 
...  0  ...  ... 
$2_{7}−1$  0  0000000001111111  00000000000000000001010101010101 
$2_{7}$  1  0000000010000000  00000000000000000100000000000000 
...  1  ...  ... 
$2_{10}−1$  1  0000001111111111  00000000000001010101010101010101 
...  2  ...  ... 
$2_{11}−1$  2  0000011111111111  00000000010101010101010101010101 
...  3  ...  ... 
$2_{13}−1$  3  0001111111111111  00000001010101010101010101010101 
...  4  ...  ... 
$2_{14}−1$  4  0011111111111111  00000101010101010101010101010101 
...  5  ...  ... 
$2_{16}−1$  5  1111111111111111  01010101010101010101010101010101 
For example, to do an $11$bit $spread$ lookup, we polynomialconstrain the tag to be in ${0,1,2}$. For the most common case of a $16$bit lookup, we don't need to constrain the tag. Note that we can fill any unused rows beyond $2_{16}$ with a duplicate entry, e.g. allzeroes.
Gates
Choice gate
Input from previous operations:
 $E_{′},F_{′},G_{′},$ 64bit spread forms of 32bit words $E,F,G$, assumed to be constrained by previous operations
 in practice, we'll have the spread forms of $E_{′},F_{′},G_{′}$ after they've been decomposed into 16bit subpieces
 $evens$ is defined as $spread(2_{32}−1)$
 $evens_{0}=evens_{1}=spread(2_{16}−1)$
E ∧ F
s_ch  $a_{0}$  $a_{1}$  $a_{2}$  $a_{3}$  $a_{4}$ 

0  {0,1,2,3,4,5}  $P_{0}$  $spread(P_{0})$  $spread(E_{lo})$  $spread(E_{hi})$ 
1  {0,1,2,3,4,5}  $P_{0}$  $spread(P_{0})$  $spread(P_{1})$  
0  {0,1,2,3,4,5}  $P_{1}$  $spread(P_{1})$  $spread(F_{lo})$  $spread(F_{hi})$ 
0  {0,1,2,3,4,5}  $P_{1}$  $spread(P_{1})$ 
¬E ∧ G
s_ch_neg  $a_{0}$  $a_{1}$  $a_{2}$  $a_{3}$  $a_{4}$  $a_{5}$ 

0  {0,1,2,3,4,5}  $Q_{0}$  $spread(Q_{0})$  $spread(E_{neg})$  $spread(E_{neg})$  $spread(E_{lo})$ 
1  {0,1,2,3,4,5}  $Q_{0}$  $spread(Q_{0})$  $spread(Q_{1})$  $spread(E_{hi})$  
0  {0,1,2,3,4,5}  $Q_{1}$  $spread(Q_{1})$  $spread(G_{lo})$  $spread(G_{hi})$  
0  {0,1,2,3,4,5}  $Q_{1}$  $spread(Q_{1})$ 
Constraints:
s_ch
(choice): $LHS−RHS=0$ $LHS=a_{3}ω_{−1}+a_{3}ω+2_{32}(a_{4}ω_{−1}+a_{4}ω)$
 $RHS=a_{2}ω_{−1}+2∗a_{2}+2_{32}(a_{2}ω+2∗a_{3})$
s_ch_neg
(negation):s_ch
with an extra negation check $spread$ lookup on $(a_{0},a_{1},a_{2})$
 permutation between $(a_{2},a_{3})$
Output: $Ch(E,F,G)=P_{odd}+Q_{odd}=(P_{0}+Q_{0})+2_{16}(P_{1}+Q_{1})$
Majority gate
Input from previous operations:
 $A_{′},B_{′},C_{′},$ 64bit spread forms of 32bit words $A,B,C$, assumed to be constrained by previous operations
 in practice, we'll have the spread forms of $A_{′},B_{′},C_{′}$ after they've been decomposed into $16$bit subpieces
s_maj  $a_{0}$  $a_{1}$  $a_{2}$  $a_{3}$  $a_{4}$  $a_{5}$ 

0  {0,1,2,3,4,5}  $M_{0}$  $spread(M_{0})$  $spread(A_{lo})$  $spread(A_{hi})$  
1  {0,1,2,3,4,5}  $M_{0}$  $spread(M_{0})$  $spread(M_{1})$  $spread(B_{lo})$  $spread(B_{hi})$ 
0  {0,1,2,3,4,5}  $M_{1}$  $spread(M_{1})$  $spread(C_{lo})$  $spread(C_{hi})$  
0  {0,1,2,3,4,5}  $M_{1}$  $spread(M_{1})$ 
Constraints:
s_maj
(majority): $LHS−RHS=0$ $LHS=spread(M_{0})+2⋅spread(M_{0})+2_{32}⋅spread(M_{1})+2_{33}⋅spread(M_{1})$
 $RHS=A_{′}+B_{′}+C_{′}$
 $spread$ lookup on $(a_{0},a_{1},a_{2})$
 permutation between $(a_{2},a_{3})$
Output: $Maj(A,B,C)=M_{odd}=M_{0}+2_{16}M_{1}$
Σ_0 gate
$A$ is a 32bit word split into $(2,11,9,10)$bit chunks, starting from the little end. We refer to these chunks as $(a(2),b(11),c(9),d(10))$ respectively, and further split $c(9)$ into three 3bit chunks $c(9)_{lo},c(9)_{mid},c(9)_{hi}$. We witness the spread versions of the small chunks.
$Σ_{0}(A) == (A⋙2)⊕(A⋙13)⊕(A⋙22)(A⋙2)⊕(A⋙13)⊕(A⋘10) $
s_upp_sigma_0  $a_{0}$  $a_{1}$  $a_{2}$  $a_{3}$  $a_{4}$  $a_{5}$  $a_{6}$ 

0  {0,1,2,3,4,5}  $R_{0}$  $spread(R_{0})$  $c(9)_{lo}$  $spread(c(9)_{lo})$  $c(9)_{mid}$  $spread(c(9)_{mid})$ 
1  {0,1,2,3,4,5}  $R_{0}$  $spread(R_{0})$  $spread(R_{1})$  $spread(d(10))$  $spread(b(11))$  $c(9)$ 
0  {0,1,2,3,4,5}  $R_{1}$  $spread(R_{1})$  $a(2)$  $spread(a(2))$  $c(9)_{hi}$  $spread(c(9)_{hi})$ 
0  {0,1,2,3,4,5}  $R_{1}$  $spread(R_{1})$ 
Constraints:
s_upp_sigma_0
($Σ_{0}$ constraint): $LHS−RHS+tag+decompose=0$
$tagdecomposeLHS === constrain_{1}(a_{0}ω_{−1})+constrain_{2}(a_{0}ω)a(2)+2_{2}b(11)+2_{13}c(9)_{lo}+2_{16}c(9)_{mid}+2_{19}c(9)_{hi}+2_{22}d(10)−Aspread(R_{0})+2⋅spread(R_{0})+2_{32}⋅spread(R_{1})+2_{33}⋅spread(R_{1}) $ $RHS= 4_{30}spread(a(2))4_{21}spread(b(11))4_{29}spread(c(9)_{hi}) +++ 4_{20}spread(d(10))4_{19}spread(a(2))4_{26}spread(c(9)_{mid}) +++ 4_{17}spread(c(9)_{hi})4_{9}spread(d(10))4_{23}spread(c(9)_{lo}) +++ 4_{14}spread(c(9)_{mid})4_{6}spread(c(9)_{hi})4_{12}spread(b(11)) +++ 4_{11}spread(c(9)_{lo})4_{3}spread(c(9)_{mid})4_{10}spread(a(2)) +++ spread(b(11))spread(c(9)_{lo})spread(d(10)) ++ $
 $spread$ lookup on $a_{0},a_{1},a_{2}$
 2bit range check and 2bit spread check on $a(2)$
 3bit range check and 3bit spread check on $c(9)_{lo},c(9)_{mid},c(9)_{hi}$
(see section Helper gates)
Output: $Σ_{0}(A)=R_{even}=R_{0}+2_{16}R_{1}$
Σ_1 gate
$E$ is a 32bit word split into $(6,5,14,7)$bit chunks, starting from the little end. We refer to these chunks as $(a(6),b(5),c(14),d(7))$ respectively, and further split $a(6)$ into two 3bit chunks $a(6)_{lo},a(6)_{hi}$ and $b$ into (2,3)bit chunks $b(5)_{lo},b(5)_{hi}$. We witness the spread versions of the small chunks.
$Σ_{1}(E) == (E⋙6)⊕(E⋙11)⊕(E⋙25)(E⋙6)⊕(E⋙11)⊕(E⋘7) $
s_upp_sigma_1  $a_{0}$  $a_{1}$  $a_{2}$  $a_{3}$  $a_{4}$  $a_{5}$  $a_{6}$  $a_{7}$ 

0  {0,1,2,3,4,5}  $R_{0}$  $spread(R_{0})$  $b(5)_{lo}$  $spread(b(5)_{lo})$  $b(5)_{hi}$  $spread(b(5)_{hi})$  $b(5)$ 
1  {0,1,2,3,4,5}  $R_{0}$  $spread(R_{0})$  $spread(R_{1})$  $spread(d(7))$  $spread(c(14))$  
0  {0,1,2,3,4,5}  $R_{1}$  $spread(R_{1})$  $a(6)_{lo}$  $spread(a(6)_{lo})$  $a(6)_{hi}$  $spread(a(6)_{hi})$  $a(6)$ 
0  {0,1,2,3,4,5}  $R_{1}$  $spread(R_{1})$ 
Constraints:
s_upp_sigma_1
($Σ_{1}$ constraint): $LHS−RHS+tag+decompose=0$
$tagdecomposeLHS === a_{0}ω_{−1}+constrain_{4}(a_{0}ω)a(6)_{lo}+2_{3}a(6)_{hi}+2_{6}b(5)_{lo}+2_{8}b(5)_{hi}+2_{11}c(14)+2_{25}d(7)−Espread(R_{0})+2⋅spread(R_{0})+2_{32}⋅spread(R_{1})+2_{33}⋅spread(R_{1}) $ $RHS= 4_{29}spread(a(6)_{hi})4_{29}spread(b(5)_{hi})4_{18}spread(c(14)) +++ 4_{26}spread(a(6)_{lo})4_{27}spread(b(5)_{lo})4_{15}spread(b(5)_{hi}) +++ 4_{19}spread(d(7))4_{24}spread(a(6)_{hi})4_{13}spread(b(5)_{lo}) +++ 4_{5}spread(c(14))4_{21}spread(a(6)_{lo})4_{10}spread(a(6)_{hi}) +++ 4_{2}spread(b(5)_{hi})4_{14}spread(d(7))4_{7}spread(a(6)_{lo}) +++ spread(b(5)_{lo})spread(c(14))spread(d(7)) ++ $
 $spread$ lookup on $a_{0},a_{1},a_{2}$
 2bit range check and 2bit spread check on $b(5)_{lo}$
 3bit range check and 3bit spread check on $a(6)_{lo},a(6)_{hi},b(4)_{hi}$
(see section Helper gates)
Output: $Σ_{1}(E)=R_{even}=R_{0}+2_{16}R_{1}$
σ_0 gate
v1
v1 of the $σ_{0}$ gate takes in a word that's split into $(3,4,11,14)$bit chunks (already constrained by message scheduling). We refer to these chunks respectively as $(a(3),b(4),c(11),d(14)).$ $b(4)$ is further split into two 2bit chunks $b(4)_{lo},b(4)_{hi}.$ We witness the spread versions of the small chunks. We already have $spread(c(11))$ and $spread(d(14))$ from the message scheduling.
$(X⋙7)⊕(X⋙18)⊕(X≫3)$ is equivalent to $(X⋙7)⊕(X⋘14)⊕(X≫3)$.
s_low_sigma_0  $a_{0}$  $a_{1}$  $a_{2}$  $a_{3}$  $a_{4}$  $a_{5}$  $a_{6}$ 

0  {0,1,2,3,4,5}  $R_{0}$  $spread(R_{0})$  $b(4)_{lo}$  $spread(b(4)_{lo})$  $b(4)_{hi}$  $spread(b(4)_{hi})$ 
1  {0,1,2,3,4,5}  $R_{0}$  $spread(R_{0})$  $spread(R_{1})$  $spread(c)$  $spread(d)$  $b(4)$ 
0  {0,1,2,3,4,5}  $R_{1}$  $spread(R_{1})$  $0$  $0$  $a$  $spread(a)$ 
0  {0,1,2,3,4,5}  $R_{1}$  $spread(R_{1})$ 
Constraints:
s_low_sigma_0
($σ_{0}$ v1 constraint): $LHS−RHS=0$
$LHS = spread(R_{0})+2⋅spread(R_{0})+2_{32}⋅spread(R_{1})+2_{33}⋅spread(R_{1}) $ $RHS= 4_{30}b(4)_{hi}4_{21}c(11) ++ 4_{15}d(14)4_{28}b(4)_{lo}4_{19}b(4)_{hi} +++ 4_{4}c(11)4_{25}a(3)4_{17}b(4)_{lo} +++ 4_{2}b(4)_{hi}4_{11}d(14)4_{14}a(3) +++ b(4)_{lo}c(11)d(14) ++ $
 check that
b
was properly split into subsections for 4bit pieces. $W_{b(4)lo}+2_{2}W_{b(4)hi}−W=0$
 2bit range check and 2bit spread check on $b(4)_{lo},b(4)_{hi}$
 3bit range check and 3bit spread check on $a(3)$
v2
v2 of the $σ_{0}$ gate takes in a word that's split into $(3,4,3,7,1,1,13)$bit chunks (already constrained by message scheduling). We refer to these chunks respectively as $(a(3),b(4),c(3),d(7),e(1),f(1),g(13)).$ We already have $spread(d(7)),spread(g(13))$ from the message scheduling. The 1bit $e(1),f(1)$ remain unchanged by the spread operation and can be used directly. We further split $b(4)$ into two 2bit chunks $b(4)_{lo},b(4)_{hi}.$ We witness the spread versions of the small chunks.
$(X⋙7)⊕(X⋙18)⊕(X≫3)$ is equivalent to $(X⋙7)⊕(X⋘14)⊕(X≫3)$.
s_low_sigma_0_v2  $a_{0}$  $a_{1}$  $a_{2}$  $a_{3}$  $a_{4}$  $a_{5}$  $a_{6}$  $a_{7}$ 

0  {0,1,2,3,4,5}  $R_{0}$  $spread(R_{0})$  $b(4)_{lo}$  $spread(b(4)_{lo})$  $b(4)_{hi}$  $spread(b(4)_{hi})$  
1  {0,1,2,3,4,5}  $R_{0}$  $spread(R_{0})$  $spread(R_{1})$  $spread(d(7))$  $spread(g(13))$  $b(4)$  $e(1)$ 
0  {0,1,2,3,4,5}  $R_{1}$  $spread(R_{1})$  $a(3)$  $spread(a(3))$  $c(3)$  $spread(c(3))$  $f(1)$ 
0  {0,1,2,3,4,5}  $R_{1}$  $spread(R_{1})$ 
Constraints:
s_low_sigma_0_v2
($σ_{0}$ v2 constraint): $LHS−RHS=0$
$LHS = spread(R_{0})+2⋅spread(R_{0})+2_{32}⋅spread(R_{1})+2_{33}⋅spread(R_{1}) $ $RHS= 4_{30}b(4)_{hi}4_{31}e(1) ++ 4_{16}g(13)4_{28}b(4)_{lo}4_{24}d(7) +++ 4_{15}f(1)4_{25}a(3)4_{21}c(3) +++ 4_{14}e(1)4_{12}g(13)4_{19}b(4)_{hi} +++ 4_{7}d(7)4_{11}f(1)4_{17}b(4)_{lo} +++ 4_{4}c(3)4_{10}e(1)4_{14}a(3) +++ 4_{2}b(4)_{hi}4_{3}d(7)4_{1}g(13) +++ b(4)_{lo}c(3)f(1) ++ $
 check that
b
was properly split into subsections for 4bit pieces. $W_{b(4)lo}+2_{2}W_{b(4)hi}−W=0$
 2bit range check and 2bit spread check on $b(4)_{lo},b(4)_{hi}$
 3bit range check and 3bit spread check on $a(3),c(3)$
σ_1 gate
v1
v1 of the $σ_{1}$ gate takes in a word that's split into $(10,7,2,13)$bit chunks (already constrained by message scheduling). We refer to these chunks respectively as $(a(10),b(7),c(2),d(13)).$ $b(7)$ is further split into $(2,2,3)$bit chunks $b(7)_{lo},b(7)_{mid},b(7)_{hi}.$ We witness the spread versions of the small chunks. We already have $spread(a(10))$ and $spread(d(13))$ from the message scheduling.
$(X⋙17)⊕(X⋙19)⊕(X≫10)$ is equivalent to $(X⋘15)⊕(X⋘13)⊕(X≫10)$.
s_low_sigma_1  $a_{0}$  $a_{1}$  $a_{2}$  $a_{3}$  $a_{4}$  $a_{5}$  $a_{6}$ 

0  {0,1,2,3,4,5}  $R_{0}$  $spread(R_{0})$  $b(7)_{lo}$  $spread(b(7)_{lo})$  $b(7)_{mid}$  $spread(b(7)_{mid})$ 
1  {0,1,2,3,4,5}  $R_{0}$  $spread(R_{0})$  $spread(R_{1})$  $spread(a(10))$  $spread(d(13))$  $b(7)$ 
0  {0,1,2,3,4,5}  $R_{1}$  $spread(R_{1})$  $c(2)$  $spread(c(2))$  $b(7)_{hi}$  $spread(b(7)_{hi})$ 
0  {0,1,2,3,4,5}  $R_{1}$  $spread(R_{1})$ 
Constraints:

s_low_sigma_1
($σ_{1}$ v1 constraint): $LHS−RHS=0$ $LHS = spread(R_{0})+2⋅spread(R_{0})+2_{32}⋅spread(R_{1})+2_{33}⋅spread(R_{1}) $ $RHS= 4_{29}b(7)_{hi}4_{30}c(2) ++ 4_{9}d(13)4_{27}b(7)_{mid}4_{27}b(7)_{hi} +++ 4_{7}c(2)4_{25}b(7)_{lo}4_{25}b(7)_{mid} +++ 4_{4}b(7)_{hi}4_{15}a(10)4_{23}b(7)_{lo} +++ 4_{2}b(7)_{mid}4_{2}d(13)4_{13}a(10) +++ b(7)_{lo}c(2)d(13) ++ $ 
check that
b
was properly split into subsections for 7bit pieces. $W_{b(7)lo}+2_{2}W_{b(7)mid}+2_{4}W_{b(7)hi}−W=0$

2bit range check and 2bit spread check on $b(7)_{lo},b(7)_{mid},c(2)$

3bit range check and 3bit spread check on $b(7)_{hi}$
v2
v2 of the $σ_{1}$ gate takes in a word that's split into $(3,4,3,7,1,1,13)$bit chunks (already constrained by message scheduling). We refer to these chunks respectively as $(a(3),b(4),c(3),d(7),e(1),f(1),g(13)).$ We already have $spread(d(7)),spread(g(13))$ from the message scheduling. The 1bit $e(1),f(1)$ remain unchanged by the spread operation and can be used directly. We further split $b(4)$ into two 2bit chunks $b(4)_{lo},b(4)_{hi}.$ We witness the spread versions of the small chunks.
$(X⋙17)⊕(X⋙19)⊕(X≫10)$ is equivalent to $(X⋘15)⊕(X⋘13)⊕(X≫10)$.
s_low_sigma_1_v2  $a_{0}$  $a_{1}$  $a_{2}$  $a_{3}$  $a_{4}$  $a_{5}$  $a_{6}$  $a_{7}$ 

0  {0,1,2,3,4,5}  $R_{0}$  $spread(R_{0})$  $b(4)_{lo}$  $spread(b(4)_{lo})$  $b(4)_{hi}$  $spread(b(4)_{hi})$  
1  {0,1,2,3,4,5}  $R_{0}$  $spread(R_{0})$  $spread(R_{1})$  $spread(d(7))$  $spread(g(13))$  $b(4)$  $e(1)$ 
0  {0,1,2,3,4,5}  $R_{1}$  $spread(R_{1})$  $a(3)$  $spread(a(3))$  $c(3)$  $spread(c(3))$  $f(1)$ 
0  {0,1,2,3,4,5}  $R_{1}$  $spread(R_{1})$ 
Constraints:
s_low_sigma_1_v2
($σ_{1}$ v2 constraint): $LHS−RHS=0$
$LHS = spread(R_{0})+2⋅spread(R_{0})+2_{32}⋅spread(R_{1})+2_{33}⋅spread(R_{1}) $ $RHS= 4_{25}d(7)4_{31}f(1) ++ 4_{22}c(3)4_{30}e(1) ++ 4_{20}b(4)_{hi}4_{23}d(7) ++ 4_{9}g(13)4_{18}b(4)_{lo}4_{20}c(3) +++ 4_{8}f(1)4_{15}a4_{18}b(4)_{hi} +++ 4_{7}e(1)4_{2}g(13)4_{16}b(4)_{lo} +++ d(7)4_{1}f(1)4_{13}a +++ e(1)g(13) + $
 check that
b
was properly split into subsections for 4bit pieces. $W_{b(4)lo}+2_{2}W_{b(4)hi}−W=0$
 2bit range check and 2bit spread check on $b(4)_{lo},b(4)_{hi}$
 3bit range check and 3bit spread check on $a(3),c(3)$
Helper gates
Small range constraints
Let $constrain_{n}(x)=∏_{i=0}(x−i)$. Constraining this expression to equal zero enforces that $x$ is in $[0..n].$
2bit range check
$(a−3)(a−2)(a−1)(a)=0$
sr2  $a_{0}$ 

1  a 
2bit spread
$l_{1}(a)+4∗l_{2}(a)+5∗l_{3}(a)−a_{′}=0$
ss2  $a_{0}$  $a_{1}$ 

1  a  a' 
with interpolation polynomials:
 $l_{0}(a)=(−3)(−2)(−1)(a−3)(a−2)(a−1) $ ($spread(00)=0000$)
 $l_{1}(a)=(−2)(−1)(1)(a−3)(a−2)(a) $ ($spread(01)=0001$)
 $l_{2}(a)=(−1)(1)(2)(a−3)(a−1)(a) $ ($spread(10)=0100$)
 $l_{3}(a)=(1)(2)(3)(a−2)(a−1)(a) $ ($spread(11)=0101$)
3bit range check
$(a−7)(a−6)(a−5)(a−4)(a−3)(a−2)(a−1)(a)=0$
sr3  $a_{0}$ 

1  a 
3bit spread
$l_{1}(a)+4∗l_{2}(a)+5∗l_{3}(a)+16∗l_{4}(a)+17∗l_{5}(a)+20∗l_{6}(a)+21∗l_{7}(a)−a_{′}=0$
ss3  $a_{0}$  $a_{1}$ 

1  a  a' 
with interpolation polynomials:
 $l_{0}(a)=(−7)(−6)(−5)(−4)(−3)(−2)(−1)(a−7)(a−6)(a−5)(a−4)(a−3)(a−2)(a−1) $ ($spread(000)=000000$)
 $l_{1}(a)=(−6)(−5)(−4)(−3)(−2)(−1)(1)(a−7)(a−6)(a−5)(a−4)(a−3)(a−2)(a) $ ($spread(001)=000001$)
 $l_{2}(a)=(−5)(−4)(−3)(−2)(−1)(1)(2)(a−7)(a−6)(a−5)(a−4)(a−3)(a−1)(a) $ ($spread(010)=000100$)
 $l_{3}(a)=(−4)(−3)(−2)(−1)(1)(2)(3)(a−7)(a−6)(a−5)(a−4)(a−2)(a−1)(a) $ ($spread(011)=000101$)
 $l_{4}(a)=(−3)(−2)(−1)(1)(2)(3)(4)(a−7)(a−6)(a−5)(a−3)(a−2)(a−1)(a) $ ($spread(100)=010000$)
 $l_{5}(a)=(−2)(−1)(1)(2)(3)(4)(5)(a−7)(a−6)(a−4)(a−3)(a−2)(a−1)(a) $ ($spread(101)=010001$)
 $l_{6}(a)=(−1)(1)(2)(3)(4)(5)(6)(a−7)(a−5)(a−4)(a−3)(a−2)(a−1)(a) $ ($spread(110)=010100$)
 $l_{7}(a)=(1)(2)(3)(4)(5)(6)(7)(a−6)(a−5)(a−4)(a−3)(a−2)(a−1)(a) $ ($spread(111)=010101$)
reduce_6 gate
Addition $(mod2_{32})$ of 6 elements
Input:
 $E$
 ${e_{i},e_{i}}_{i=0}$
 $carry$
Check: $E=e_{0}+e_{1}+e_{2}+e_{3}+e_{4}+e_{5}(mod32)$
Assume inputs are constrained to 16 bits.
 Addition gate (sa):
 $a_{0}+a_{1}+a_{2}+a_{3}+a_{4}+a_{5}+a_{6}−a_{7}=0$
 Carry gate (sc):
 $2_{16}a_{6}ω_{−1}+a_{6}+[(a_{6}−5)(a_{6}−4)(a_{6}−3)(a_{6}−2)(a_{6}−1)(a_{6})]=0$
sa  sc  $a_{0}$  $a_{1}$  $a_{2}$  $a_{3}$  $a_{4}$  $a_{5}$  $a_{6}$  $a_{7}$ 

1  0  $e_{0}$  $e_{1}$  $e_{2}$  $e_{3}$  $e_{4}$  $e_{5}$  $−carry∗2_{16}$  $E_{lo}$ 
1  1  $e_{0}$  $e_{1}$  $e_{2}$  $e_{3}$  $e_{4}$  $e_{5}$  $carry$  $E_{hi}$ 
Assume inputs are constrained to 16 bits.
 Addition gate (sa):
 $a_{0}ω_{−1}+a_{1}ω_{−1}+a_{2}ω_{−1}+a_{0}+a_{1}+a_{2}+a_{3}ω_{−1}−a_{3}=0$
 Carry gate (sc):
 $2_{16}a_{3}ω+a_{3}ω_{−1}=0$
sa  sc  $a_{0}$  $a_{1}$  $a_{2}$  $a_{3}$ 

0  0  $e_{0}$  $e_{1}$  $e_{2}$  $−carry∗2_{16}$ 
1  1  $e_{3}$  $e_{4}$  $e_{5}$  $E_{lo}$ 
0  0  $e_{0}$  $e_{1}$  $e_{2}$  $carry$ 
1  0  $e_{3}$  $e_{4}$  $e_{5}$  $E_{hi}$ 
reduce_7 gate
Addition $(mod2_{32})$ of 7 elements
Input:
 $E$
 ${e_{i},e_{i}}_{i=0}$
 $carry$
Check: $E=e_{0}+e_{1}+e_{2}+e_{3}+$