Polynomials

Let $A (X)$ be a polynomial over $F_{p}$ with formal indeterminate $X$ . As an example,

$A (X) = a_{0} + a_{1} X + a_{2} X^{2} + a_{3} X^{3}$

defines a degree- $3$ polynomial. $a_{0}$ is referred to as the constant term. Polynomials of degree $n - 1$ have $n$ coefficients. We will often want to compute the result of replacing the formal indeterminate $X$ with some concrete value $x$ , which we denote by $A (x)$ .

In mathematics this is commonly referred to as "evaluating $A (X)$ at a point $x$ ". The word "point" here stems from the geometrical usage of polynomials in the form $y = A (x)$ , where $(x, y)$ is the coordinate of a point in two-dimensional space. However, the polynomials we deal with are almost always constrained to equal zero, and $x$ will be an element of some field. This should not be confused with points on an elliptic curve, which we also make use of, but never in the context of polynomial evaluation.

Important notes:

Multiplication of polynomials produces a product polynomial that is the sum of the degrees of its factors. Polynomial division subtracts from the degree. $de g (A (X) B (X)) = de g (A (X)) + de g (B (X)),$ $de g (A (X) / B (X)) = de g (A (X)) - de g (B (X)) .$
Given a polynomial $A (X)$ of degree $n - 1$ , if we obtain $n$ evaluations of the polynomial at distinct points then these evaluations perfectly define the polynomial. In other words, given these evaluations we can obtain a unique polynomial $A (X)$ of degree $n - 1$ via polynomial interpolation.
$[a_{0}, a_{1}, \dots, a_{n - 1}]$ is the coefficient representation of the polynomial $A (X)$ . Equivalently, we could use its evaluation representation $[(x_{0}, A (x_{0})), (x_{1}, A (x_{1})), \dots, (x_{n - 1}, A (x_{n - 1}))]$ at $n$ distinct points. Either representation uniquely specifies the same polynomial.

(aside) Horner's rule

Horner's rule allows for efficient evaluation of a polynomial of degree $n - 1$ , using only $n - 1$ multiplications and $n - 1$ additions. It is the following identity: $a_{0} + a_{1} X + a_{2} X^{2} + \dots + a_{n - 1} X^{n - 1} = a_{0} + X (a_{1} + X (a_{2} + \dots + X (a_{n - 2} + X a_{n - 1}))),$

Fast Fourier Transform (FFT)

The FFT is an efficient way of converting between the coefficient and evaluation representations of a polynomial. It evaluates the polynomial at the $n$ th roots of unity ${ω^{0}, ω^{1}, \dots, ω^{n - 1}},$ where $ω$ is a primitive $n$ th root of unity. By exploiting symmetries in the roots of unity, each round of the FFT reduces the evaluation into a problem only half the size. Most commonly we use polynomials of length some power of two, $n = 2^{k}$ , and apply the halving reduction recursively.

Motivation: Fast polynomial multiplication

In the coefficient representation, it takes $O (n^{2})$ operations to multiply two polynomials $A (X) \cdot B (X) = C (X)$ :

$A (X) B (X) C (X) = a_{0} + a_{1} X + a_{2} X^{2} + \dots + a_{n - 1} X^{n - 1}, = b_{0} + b_{1} X + b_{2} X^{2} + \dots + b_{n - 1} X^{n - 1}, = a_{0} \cdot (b_{0} + b_{1} X + b_{2} X^{2} + \dots + b_{n - 1} X^{n - 1}) + a_{1} X \cdot (b_{0} + b_{1} X + b_{2} X^{2} + \dots + b_{n - 1} X^{n - 1}) + \dots + a_{n - 1} X^{n - 1} \cdot (b_{0} + b_{1} X + b_{2} X^{2} + \dots + b_{n - 1} X^{n - 1}),$

where each of the $n$ terms in the first polynomial has to be multiplied by the $n$ terms of the second polynomial.

In the evaluation representation, however, polynomial multiplication only requires $O (n)$ operations:

$A B C : {(x_{0}, A (x_{0})), (x_{1}, A (x_{1})), \dots, (x_{n - 1}, A (x_{n - 1}))}, : {(x_{0}, B (x_{0})), (x_{1}, B (x_{1})), \dots, (x_{n - 1}, B (x_{n - 1}))}, : {(x_{0}, A (x_{0}) B (x_{0})), (x_{1}, A (x_{1}) B (x_{1})), \dots, (x_{n - 1}, A (x_{n - 1}) B (x_{n - 1}))},$

where each evaluation is multiplied pointwise.

This suggests the following strategy for fast polynomial multiplication:

Evaluate polynomials at all $n$ points;
Perform fast pointwise multiplication in the evaluation representation ( $O (n)$ );
Convert back to the coefficient representation.

The challenge now is how to evaluate and interpolate the polynomials efficiently. Naively, evaluating a polynomial at $n$ points would require $O (n^{2})$ operations (we use the $O (n)$ Horner's rule at each point):

$A (1) A (ω) A (ω^{2}) ⋮ A (ω^{n - 1}) = 111 ⋮ 1 1 ω ω^{2} ⋮ ω^{n - 1} 1 ω^{2} ω^{2 \cdot 2} ⋮ ω^{2 (n - 1)} \dots \dots \dots \dots 1 ω^{n - 1} ω^{2 \cdot (n - 1)} ⋮ ω^{(n - 1)^{2}} \cdot a_{0} a_{1} a_{2} ⋮ a_{n - 1} .$

For convenience, we will denote the matrices above as: $\hat{A} = V_{ω} \cdot A .$

( $\hat{A}$ is known as the Discrete Fourier Transform of $A$ ; $V_{ω}$ is also called the Vandermonde matrix.)

The (radix-2) Cooley-Tukey algorithm

Our strategy is to divide a DFT of size $n$ into two interleaved DFTs of size $n /2$ . Given the polynomial $A (X) = a_{0} + a_{1} X + a_{2} X^{2} + \dots + a_{n - 1} X^{n - 1},$ we split it up into even and odd terms:

$A_{even} A_{odd} = a_{0} + a_{2} X + \dots + a_{n - 2} X^{\frac{n}{2} - 1}, = a_{1} + a_{3} X + \dots + a_{n - 1} X^{\frac{n}{2} - 1} .$

To recover the original polynomial, we do $A (X) = A_{even} (X^{2}) + X A_{odd} (X^{2}) .$

Trying this out on points $ω_{n}^{i}$ and $ω_{n}^{\frac{n}{2} + i}$ , $i \in [0.. \frac{n}{2} - 1],$ we start to notice some symmetries:

$A (ω_{n}^{i}) A (ω_{n}^{\frac{n}{2} + i}) = A_{even} ((ω_{n}^{i})^{2}) + ω_{n}^{i} A_{odd} ((ω_{n}^{i})^{2}), = A_{even} ((ω_{n}^{\frac{n}{2} + i})^{2}) + ω_{n}^{\frac{n}{2} + i} A_{odd} ((ω_{n}^{\frac{n}{2} + i})^{2}) = A_{even} ((- ω_{n}^{i})^{2}) - ω_{n}^{i} A_{odd} ((- ω_{n}^{i})^{2}) \leftarrow (negation lemma) = A_{even} ((ω_{n}^{i})^{2}) - ω_{n}^{i} A_{odd} ((ω_{n}^{i})^{2}) .$

Notice that we are only evaluating $A_{even} (X)$ and $A_{odd} (X)$ over half the domain ${(ω_{n}^{0})^{2}, (ω_{n})^{2}, \dots, (ω_{n}^{\frac{n}{2} - 1})^{2}} = {ω_{n /2}^{i}}, i = [0.. \frac{n}{2} - 1]$ (halving lemma). This gives us all the terms we need to reconstruct $A (X)$ over the full domain ${ω^{0}, ω, \dots, ω^{n - 1}}$ : which means we have transformed a length- $n$ DFT into two length- $\frac{n}{2}$ DFTs.

We choose $n = 2^{k}$ to be a power of two (by zero-padding if needed), and apply this divide-and-conquer strategy recursively. By the Master Theorem¹, this gives us an evaluation algorithm with $O (n lo g_{2} n)$ operations, also known as the Fast Fourier Transform (FFT).

Inverse FFT

So we've evaluated our polynomials and multiplied them pointwise. What remains is to convert the product from the evaluation representation back to coefficient representation. To do this, we simply call the FFT on the evaluation representation. However, this time we also:

replace $ω^{i}$ by $ω^{- i}$ in the Vandermonde matrix, and
multiply our final result by a factor of $1/ n$ .

In other words: $A = \frac{1}{n} V_{ω^{- 1}} \cdot \hat{A} .$

(To understand why the inverse FFT has a similar form to the FFT, refer to Slide 13-1 of ². The below image was also taken from ².)

The Schwartz-Zippel lemma

The Schwartz-Zippel lemma informally states that "different polynomials are different at most points." Formally, it can be written as follows:

Let $p (x_{1}, x_{2}, \dots, x_{n})$ be a nonzero polynomial of $n$ variables with degree $d$ . Let $S$ be a finite set of numbers with at least $d$ elements in it. If we choose random $α_{1}, α_{2}, \dots, α_{n}$ from $S$ , $Pr [p (α_{1}, α_{2}, \dots, α_{n}) = 0] \leq \frac{d}{∣ S ∣} .$

In the familiar univariate case $p (X)$ , this reduces to saying that a nonzero polynomial of degree $d$ has at most $d$ roots.

The Schwartz-Zippel lemma is used in polynomial equality testing. Given two multi-variate polynomials $p_{1} (x_{1}, \dots, x_{n})$ and $p_{2} (x_{1}, \dots, x_{n})$ of degrees $d_{1}, d_{2}$ respectively, we can test if $p_{1} (α_{1}, \dots, α_{n}) - p_{2} (α_{1}, \dots, α_{n}) = 0$ for random $α_{1}, \dots, α_{n} \leftarrow S,$ where the size of $S$ is at least $∣ S ∣ \geq (d_{1} + d_{2}) .$ If the two polynomials are identical, this will always be true, whereas if the two polynomials are different then the equality holds with probability at most $\frac{m a x ( d _{1} , d _{2} )}{∣ S ∣}$ .

Vanishing polynomial

Consider the order- $n$ multiplicative subgroup $H$ with primitive root of unity $ω$ . For all $ω^{i} \in H, i \in [n - 1],$ we have $(ω^{i})^{n} = (ω^{n})^{i} = (ω^{0})^{i} = 1.$ In other words, every element of $H$ fulfils the equation

$Z_{H} (X) = X^{n} - 1 = (X - ω^{0}) (X - ω^{1}) (X - ω^{2}) \dots (X - ω^{n - 1}),$

meaning every element is a root of $Z_{H} (X) .$ We call $Z_{H} (X)$ the vanishing polynomial over $H$ because it evaluates to zero on all elements of $H .$

This comes in particularly handy when checking polynomial constraints. For instance, to check that $A (X) + B (X) = C (X)$ over $H,$ we simply have to check that $A (X) + B (X) - C (X)$ is some multiple of $Z_{H} (X)$ . In other words, if dividing our constraint by the vanishing polynomial still yields some polynomial $\frac{A ( X ) + B ( X ) - C ( X )}{Z _{H} ( X )} = H (X),$ we are satisfied that $A (X) + B (X) - C (X) = 0$ over $H .$

Lagrange basis functions

TODO: explain what a basis is in general (briefly).

Polynomials are commonly written in the monomial basis (e.g. $X, X^{2}, ... X^{n}$ ). However, when working over a multiplicative subgroup of order $n$ , we find a more natural expression in the Lagrange basis.

Consider the order- $n$ multiplicative subgroup $H$ with primitive root of unity $ω$ . The Lagrange basis corresponding to this subgroup is a set of functions ${L_{i}}_{i = 0}^{n - 1}$ , where

$L_{i} (ω^{j}) = {10 if i = j, otherwise.$

We can write this more compactly as $L_{i} (ω^{j}) = δ_{ij},$ where $δ$ is the Kronecker delta function.

Now, we can write our polynomial as a linear combination of Lagrange basis functions,

$A (X) = i = 0 \sum n - 1 a_{i} L_{i} (X), X \in H,$

which is equivalent to saying that $A (X)$ evaluates to $a_{0}$ at $ω^{0}$ , to $a_{1}$ at $ω^{1}$ , to $a_{2}$ at $ω^{2}, \dots,$ and so on.

When working over a multiplicative subgroup, the Lagrange basis function has a convenient sparse representation of the form

$L_{i} (X) = \frac{c _{i} \cdot ( X ^{n} - 1 )}{X - ω ^{i}},$

where $c_{i}$ is the barycentric weight. (To understand how this form was derived, refer to ³.) For $i = 0,$ we have $c = 1/ n ⟹ L_{0} (X) = \frac{1}{n} \frac{( X ^{n} - 1 )}{X - 1}$ .

Suppose we are given a set of evaluation points ${x_{0}, x_{1}, \dots, x_{n - 1}}$ . Since we cannot assume that the $x_{i}$ 's form a multiplicative subgroup, we consider also the Lagrange polynomials $L_{i}$ 's in the general case. Then we can construct:

$L_{i} (X) = j \neq = i \prod \frac{X - x _{j}}{x _{i} - x _{j}}, i \in [0.. n - 1] .$

Here, every $X = x_{j} \neq = x_{i}$ will produce a zero numerator term $(x_{j} - x_{j}),$ causing the whole product to evaluate to zero. On the other hand, $X = x_{i}$ will evaluate to $\frac{x _{i} - x _{j}}{x _{i} - x _{j}}$ at every term, resulting in an overall product of one. This gives the desired Kronecker delta behaviour $L_{i} (x_{j}) = δ_{ij}$ on the set ${x_{0}, x_{1}, \dots, x_{n - 1}}$ .

Lagrange interpolation

Given a polynomial in its evaluation representation

$A : {(x_{0}, A (x_{0})), (x_{1}, A (x_{1})), \dots, (x_{n - 1}, A (x_{n - 1}))},$

we can reconstruct its coefficient form in the Lagrange basis:

$A (X) = i = 0 \sum n - 1 A (x_{i}) L_{i} (X),$

where $X \in {x_{0}, x_{1}, \dots, x_{n - 1}} .$

References

Dasgupta, S., Papadimitriou, C. H., & Vazirani, U. V. (2008). "Algorithms" (ch. 2). New York: McGraw-Hill Higher Education.

Golin, M. (2016). "The Fast Fourier Transform and Polynomial Multiplication" [lecture notes], COMP 3711H Design and Analysis of Algorithms, Hong Kong University of Science and Technology.

Berrut, J. and Trefethen, L. (2004). "Barycentric Lagrange Interpolation."

The halo2 Book