Dixon's factorization method

☆ Save On Wikipedia ↗

In number theory, Dixon's factorization method (also Dixon's random squares method[1] or Dixon's algorithm) is a general-purpose integer factorization algorithm; it is the prototypical factor base method. Unlike for other factor base methods, its run-time bound comes with a rigorous proof that does not rely on conjectures about the smoothness properties of the values taken by a polynomial.

The algorithm was designed by John D. Dixon, a mathematician at Carleton University, and was published in 1981.[2]

Basic idea

Dixon's method is based on finding a congruence of squares modulo the integer N which is intended to factor. We explain the basic idea here by first describing Fermat's factorization method, then how Dixon's method proceeds differently.

Fermat's factorization method finds such a congruence by selecting random or pseudo-random x values and hoping that the integer x2 mod N is a nontrivial perfect square (in the integers):

x 2 ≡ y 2 ( mod  N ) , x ≢ ± y ( mod  N ) . {\displaystyle x^{2}\equiv y^{2}\quad ({\hbox{mod }}N),\qquad x\not \equiv \pm y\quad ({\hbox{mod }}N).} {\displaystyle x^{2}\equiv y^{2}\quad ({\hbox{mod }}N),\qquad x\not \equiv \pm y\quad ({\hbox{mod }}N).}

For example, if N = 84923, (by starting at 292, the first number greater than N and counting up) the 5052 mod 84923 is 256, the square of 16. So (505 16)(505 + 16) = 0 mod 84923. Computing the greatest common divisor of 505 16 and N using Euclid's algorithm gives 163, which is a factor of N.

In practice, selecting random x values will take an impractically long time to find a congruence of squares, since there are only N squares less than N.

Dixon's method replaces the condition "is a nontrivial perfect square" with the much weaker one "has only small prime factors"; for example, there are 292 squares smaller than 84923; 662 numbers smaller than 84923 whose prime factors are only 2,3,5 or 7; and 4767 whose prime factors are all less than 30. (Such numbers are called 7-smooth and 30-smooth, respectively, since they are B-smooth with respect to some bound B.)

Once enough of these B-smooth values are found, linear algebra can be used to search for nontrivial perfect squares for Fermat's factorization method. If there are many numbers a 1 … a n {\displaystyle a_{1}\ldots a_{n}} {\displaystyle a_{1}\ldots a_{n}} whose squares can be factorized as a i 2 mod N = ∏ j = 1 m b j e i j {\displaystyle a_{i}^{2}\mod N=\prod _{j=1}^{m}b_{j}^{e_{ij}}} {\displaystyle a_{i}^{2}\mod N=\prod _{j=1}^{m}b_{j}^{e_{ij}}} for a fixed set b 1 … b m {\displaystyle b_{1}\ldots b_{m}} {\displaystyle b_{1}\ldots b_{m}} of small primes, linear algebra modulo 2 on the matrix e i j {\displaystyle e_{ij}} {\displaystyle e_{ij}} will give a subset of the a i {\displaystyle a_{i}} {\displaystyle a_{i}} whose squares combine to a product of small primes to an even power that is, a subset of the a i {\displaystyle a_{i}} {\displaystyle a_{i}} whose squares multiply to the square of a (hopefully different) number mod N.

Method

Suppose the composite number N is being factored. Bound B is chosen, and the factor base is identified (which is called P), the set of all primes less than or equal to B. Next, positive integers z are sought such that z2 mod N is B-smooth. Therefore we can write, for suitable exponents ai,

z 2  mod  N = ∏ p i ∈ P p i a i {\displaystyle z^{2}{\text{ mod }}N=\prod _{p_{i}\in P}p_{i}^{a_{i}}} {\displaystyle z^{2}{\text{ mod }}N=\prod _{p_{i}\in P}p_{i}^{a_{i}}}

When enough of these relations have been generated (it is generally sufficient that the number of relations be a few more than the size of P), the methods of linear algebra, such as Gaussian elimination, can be used to multiply together these various relations in such a way that the exponents of the primes on the right-hand side are all even:

z 1 2 z 2 2 ⋯ z k 2 ≡ ∏ p i ∈ P p i a i , 1 + a i , 2 + ⋯ + a i , k   ( mod N ) ( where  a i , 1 + a i , 2 + ⋯ + a i , k ≡ 0 ( mod 2 ) ) {\displaystyle {z_{1}^{2}z_{2}^{2}\cdots z_{k}^{2}\equiv \prod _{p_{i}\in P}p_{i}^{a_{i,1}+a_{i,2}+\cdots +a_{i,k}}\ {\pmod {N}}\quad ({\text{where }}a_{i,1}+a_{i,2}+\cdots +a_{i,k}\equiv 0{\pmod {2}})}} {\displaystyle {z_{1}^{2}z_{2}^{2}\cdots z_{k}^{2}\equiv \prod _{p_{i}\in P}p_{i}^{a_{i,1}+a_{i,2}+\cdots +a_{i,k}}\ {\pmod {N}}\quad ({\text{where }}a_{i,1}+a_{i,2}+\cdots +a_{i,k}\equiv 0{\pmod {2}})}}

This yields a congruence of squares of the form a2b2 (mod N), which can be turned into a factorization of N, N = gcd(a + b, N) × (N/gcd(a + b, N)). This factorization might turn out to be trivial (i.e. N = N × 1), which can only happen if a ≡ ±b (mod N), in which case another try must be made with a different combination of relations; but if a nontrivial pair of factors of N is reached, the algorithm terminates.

Pseudocode

This section is taken directly from Dixon (1981).

Dixon's algorithm

Initialization. Let L be a list of integers in the range [1, n], and let P = {p1, ..., ph} be the list of the h primes ≤ v. Let B and Z be initially empty lists (Z will be indexed by B).

Step 1. If L is empty, exit (algorithm unsuccessful). Otherwise, take the first term z from L, remove it from L, and proceed to Step 2.

Step 2. Compute w as the least positive remainder of z2 mod n. Factor w as:

w = w ′ ∏ i p i a i {\displaystyle w=w'\prod _{i}p_{i}^{a_{i}}} {\displaystyle w=w'\prod _{i}p_{i}^{a_{i}}}

where w has no factor in P. If w = 1, proceed to Step 3; otherwise, return to Step 1.

Step 3. Let a ← (a1, ..., ah). Add a to B and z to Z. If B has at most h elements, return to Step 1; otherwise, proceed to Step 4.

Step 4. Find the first vector c in B that is linearly dependent (mod 2) on earlier vectors in B. Remove c from B and z c {\displaystyle z_{c}} {\displaystyle z_{c}} from Z. Compute coefficients f b {\displaystyle f_{b}} {\displaystyle f_{b}} such that:

c ≡ ∑ b ∈ B f b b ( mod 2 ) {\displaystyle \mathbf {c} \equiv \sum _{b\in B}f_{b}\mathbf {b} {\pmod {2}}} {\displaystyle \mathbf {c} \equiv \sum _{b\in B}f_{b}\mathbf {b} {\pmod {2}}}

Define:

d = ( d 1 , … , d n ) ← 1 2 ( c + ∑ f b b ) {\displaystyle \mathbf {d} =(d_{1},\dots ,d_{n})\gets {\frac {1}{2}}\left(\mathbf {c} +\sum f_{b}\mathbf {b} \right)} {\displaystyle \mathbf {d} =(d_{1},\dots ,d_{n})\gets {\frac {1}{2}}\left(\mathbf {c} +\sum f_{b}\mathbf {b} \right)}

Proceed to Step 5.

Step 5. Compute:

x ← z c ∏ b z b f b , y ← ∏ i p i d i {\displaystyle x\gets z_{c}\prod _{b}z_{b}^{f_{b}},\quad y\gets \prod _{i}p_{i}^{d_{i}}} {\displaystyle x\gets z_{c}\prod _{b}z_{b}^{f_{b}},\quad y\gets \prod _{i}p_{i}^{d_{i}}}

so that:

x 2 ≡ ∏ i p i 2 d i = y 2 mod n . {\displaystyle x^{2}\equiv \prod _{i}p_{i}^{2d_{i}}=y^{2}\mod n.} {\displaystyle x^{2}\equiv \prod _{i}p_{i}^{2d_{i}}=y^{2}\mod n.}

If x ≡ y {\displaystyle x\equiv y} {\displaystyle x\equiv y} or x ≡ − y ( mod n ) {\displaystyle x\equiv -y{\pmod {n}}} {\displaystyle x\equiv -y{\pmod {n}}}, return to Step 1. Otherwise, return:

gcd ( n , x + y ) {\displaystyle \gcd(n,x+y)} {\displaystyle \gcd(n,x+y)}

which provides a nontrivial factor of n, and terminate successfully.

Step-by-step example

In this example, we factorize (n = 84923) using Dixon's algorithm. This example is lightly adapted from the LeetArxiv substack.[3] Credit is given to the original author.

  • Initialization:
    • Define a list of numbers L, ranging from 1 to 84923:
L = { 1 , … , 84923 } {\displaystyle L=\{1,\dots ,84923\}} {\displaystyle L=\{1,\dots ,84923\}}
  • Define a value v, which is the smoothness factor:
v = 7 {\displaystyle v=7} {\displaystyle v=7}
  • Define a list P containing all the prime numbers less than or equal to v:
P = 2 , 3 , 5 , 7 {\displaystyle P={2,3,5,7}} {\displaystyle P={2,3,5,7}}
  • Define B and Z, two empty lists. B is a list of powers, while Z is a list of accepted integers:
B = [ ] {\displaystyle B=[]} {\displaystyle B=[]}
Z = [ ] {\displaystyle Z=[]} {\displaystyle Z=[]}
  • Step 1: Iterating z {\displaystyle z} {\displaystyle z} values
    • Start a for loop that indexes the list L {\displaystyle L} {\displaystyle L}. The current element in L {\displaystyle L} {\displaystyle L} is labeled as z {\displaystyle z} {\displaystyle z}. The for loop exits at the end of the list, or when Step 5 breaks the loop.
int n = 84923;
for (int i = 1; i <= n; i++)
{
    int z = i;
    // Remaining steps go here.
    // Step 4 can trigger Step 5.
    // Step 5 can break the loop.
}
  • Step 2: Computing z 2 mod n {\displaystyle z^{2}\mod n} {\displaystyle z^{2}\mod n} and v-smooth Prime Factorization
    • To proceed, compute z 2 mod 84923 {\displaystyle z^{2}\mod 84923} {\displaystyle z^{2}\mod 84923} for each z, then express the result as a prime factorization.
1 2 mod 84923 ≡ 1 mod 84923 = 2 0 ⋅ 3 0 ⋅ 5 0 ⋅ 7 0 mod 84923 {\displaystyle 1^{2}\mod 84923\equiv 1\mod 84923=2^{0}\cdot 3^{0}\cdot 5^{0}\cdot 7^{0}\mod 84923} {\displaystyle 1^{2}\mod 84923\equiv 1\mod 84923=2^{0}\cdot 3^{0}\cdot 5^{0}\cdot 7^{0}\mod 84923}
⋮ {\displaystyle \vdots } {\displaystyle \vdots }
513 2 mod 84923 = 8400 mod 84923 = 2 4 ⋅ 3 1 ⋅ 5 2 ⋅ 7 1 mod 84923 {\displaystyle 513^{2}\mod 84923=8400\mod 84923=2^{4}\cdot 3^{1}\cdot 5^{2}\cdot 7^{1}\mod 84923} {\displaystyle 513^{2}\mod 84923=8400\mod 84923=2^{4}\cdot 3^{1}\cdot 5^{2}\cdot 7^{1}\mod 84923}
⋮ {\displaystyle \vdots } {\displaystyle \vdots }
537 2 mod 84923 = 33600 mod 84923 = 2 6 ⋅ 3 1 ⋅ 5 2 ⋅ 7 1 mod 84923 {\displaystyle 537^{2}\mod 84923=33600\mod 84923=2^{6}\cdot 3^{1}\cdot 5^{2}\cdot 7^{1}\mod 84923} {\displaystyle 537^{2}\mod 84923=33600\mod 84923=2^{6}\cdot 3^{1}\cdot 5^{2}\cdot 7^{1}\mod 84923}
538 2 mod 84923 = 34675 mod 84923 = 5 2 ⋅ 19 1 ⋅ 73 1 mod 84923 {\displaystyle 538^{2}\mod 84923=34675\mod 84923=5^{2}\cdot 19^{1}\cdot 73^{1}\mod 84923} {\displaystyle 538^{2}\mod 84923=34675\mod 84923=5^{2}\cdot 19^{1}\cdot 73^{1}\mod 84923}
This step continues for all values of z in the range.
  • Step 3: Appending v-smooth results
    • If z 2 mod 84923 {\displaystyle z^{2}\mod 84923} {\displaystyle z^{2}\mod 84923} is 7-smooth, then append its powers to list B {\displaystyle B} {\displaystyle B} and append z {\displaystyle z} {\displaystyle z} to list Z {\displaystyle Z} {\displaystyle Z}.
    • If B {\displaystyle B} {\displaystyle B} has at most h = 4 {\displaystyle h=4} {\displaystyle h=4} elements, return to Step 1. Otherwise, proceed to Step 4.
    • For example, after 537 iterations, we have:
Z = { 1 , … , 513 , 537 } {\displaystyle Z=\{1,\ldots ,513,537\}} {\displaystyle Z=\{1,\ldots ,513,537\}}
B = { [ 0 , 0 , 0 , 0 ] , … , [ 4 , 1 , 2 , 1 ] , [ 6 , 1 , 2 , 1 ] } {\displaystyle B=\{[0,0,0,0],\ldots ,[4,1,2,1],[6,1,2,1]\}} {\displaystyle B=\{[0,0,0,0],\ldots ,[4,1,2,1],[6,1,2,1]\}}
and we proceed on to Step 4 since we would have more than h = 4 {\displaystyle h=4} {\displaystyle h=4} vectors in B {\displaystyle B} {\displaystyle B}.
  • Step 4: This step is split into two parts.
    • Part 1: Finding B {\displaystyle B} {\displaystyle B} modulo 2
B = ( 0 0 0 0 4 1 2 1 6 1 2 1 ) mod 2 ≡ B = ( 0 0 0 0 0 1 0 1 0 1 0 1 ) {\displaystyle B={\begin{pmatrix}0&0&0&0\\4&1&2&1\\6&1&2&1\end{pmatrix}}\mod 2\equiv B={\begin{pmatrix}0&0&0&0\\0&1&0&1\\0&1&0&1\end{pmatrix}}} {\displaystyle B={\begin{pmatrix}0&0&0&0\\4&1&2&1\\6&1&2&1\end{pmatrix}}\mod 2\equiv B={\begin{pmatrix}0&0&0&0\\0&1&0&1\\0&1&0&1\end{pmatrix}}}
  • Part 2: Finding a row combination of B {\displaystyle B} {\displaystyle B} that sums to even numbers
For example, summing Row 2 {\displaystyle 2} {\displaystyle 2} and Row 3 {\displaystyle 3} {\displaystyle 3} gives us a vector of even numbers.
R 2 = { 0 , 1 , 0 , 1 } {\displaystyle R_{2}=\{0,1,0,1\}} {\displaystyle R_{2}=\{0,1,0,1\}} and R 3 = { 0 , 1 , 0 , 1 } {\displaystyle R_{3}=\{0,1,0,1\}} {\displaystyle R_{3}=\{0,1,0,1\}}
then
R 2 + R 3 = { 0 , 1 , 0 , 1 } + { 0 , 1 , 0 , 1 } {\displaystyle R_{2}+R_{3}=\{0,1,0,1\}+\{0,1,0,1\}} {\displaystyle R_{2}+R_{3}=\{0,1,0,1\}+\{0,1,0,1\}}
R 2 + R 3 = { 0 , 2 , 0 , 2 } {\displaystyle R_{2}+R_{3}=\{0,2,0,2\}} {\displaystyle R_{2}+R_{3}=\{0,2,0,2\}}.


  • Step 5: This step is split into four parts.
    • Part 1: Computing x
      • Multiply the corresponding z {\displaystyle z} {\displaystyle z} values for the rows found in Step 4, mod n {\displaystyle n} {\displaystyle n} to get x {\displaystyle x} {\displaystyle x}.
Rows 2 and 3 correspond with 513 and 537, so x = ( 513 ⋅ 537 ) = 20712 mod 84923 {\displaystyle x=(513\cdot 537)=20712\mod 84923} {\displaystyle x=(513\cdot 537)=20712\mod 84923}
  • Part 2: Computing y {\displaystyle y} {\displaystyle y}
  • Add the rows found in Step 4.
  • Divide by 2. (Since these represent exponents, this effectively takes a square root.)
  • Apply as exponents to the primes in P {\displaystyle P} {\displaystyle P} to get y {\displaystyle y} {\displaystyle y}.
  • Return to Step 1 if x = ± y {\displaystyle x=\pm y} {\displaystyle x=\pm y}.
Find the half the sum of Rows 2 and 3:
4 1 2 1 + 6 1 2 1 10 2 4 2 ÷ 2 5 1 2 1 {\displaystyle {\begin{array}{ccccc}&4&1&2&1\\+&6&1&2&1\\\hline &10&2&4&2\\\div 2\\\hline &5&1&2&1\end{array}}} {\displaystyle {\begin{array}{ccccc}&4&1&2&1\\+&6&1&2&1\\\hline &10&2&4&2\\\div 2\\\hline &5&1&2&1\end{array}}}
Apply those as exponents to P = { 2 , 3 , 5 , 7 } {\displaystyle P=\{2,3,5,7\}} {\displaystyle P=\{2,3,5,7\}}:
y = 2 5 ⋅ 3 1 ⋅ 5 2 ⋅ 7 1 = 16800 {\displaystyle y=2^{5}\cdot 3^{1}\cdot 5^{2}\cdot 7^{1}=16800} {\displaystyle y=2^{5}\cdot 3^{1}\cdot 5^{2}\cdot 7^{1}=16800}
  • Since x ≠ ± y {\displaystyle x\neq \pm y} {\displaystyle x\neq \pm y}, we proceed to Part 3.
  • Part 3: Computing x + y {\displaystyle x+y} {\displaystyle x+y} and x − y {\displaystyle x-y} {\displaystyle x-y} where x = 20712 {\displaystyle x=20712} {\displaystyle x=20712} and y = 16800 {\displaystyle y=16800} {\displaystyle y=16800}
x + y = 20712 + 16800 = 37512 {\displaystyle x+y=20712+16800=37512} {\displaystyle x+y=20712+16800=37512}
x − y = 20712 − 16800 = 3912 {\displaystyle x-y=20712-16800=3912} {\displaystyle x-y=20712-16800=3912}
  • Part 4: Computing gcd ( x + y , n ) {\displaystyle \gcd(x+y,n)} {\displaystyle \gcd(x+y,n)} and gcd ( x − y , n ) {\displaystyle \gcd(x-y,n)} {\displaystyle \gcd(x-y,n)} where n = 84923 {\displaystyle n=84923} {\displaystyle n=84923}, x + y = 292281 {\displaystyle x+y=292281} {\displaystyle x+y=292281} and x − y = 258681 {\displaystyle x-y=258681} {\displaystyle x-y=258681}
gcd ( 37512 , 84923 ) = 521 gcd ( 3912 , 84923 ) = 163 {\displaystyle {\begin{array}{ll}\gcd(37512,84923)=521\\\gcd(3912,84923)=163\end{array}}} {\displaystyle {\begin{array}{ll}\gcd(37512,84923)=521\\\gcd(3912,84923)=163\end{array}}}

Quick check shows 84923 = 521 ⋅ 163 {\displaystyle 84923=521\cdot 163} {\displaystyle 84923=521\cdot 163}.

The pseudocode says to only calculate x + y {\displaystyle x+y} {\displaystyle x+y} and return gcd ( x + y , n ) {\displaystyle \gcd(x+y,n)} {\displaystyle \gcd(x+y,n)}. This is because it is the faster GCD to compute, and once you have it, dividing n {\displaystyle n} {\displaystyle n} by the result is faster than computing gcd ( x − y , n ) {\displaystyle \gcd(x-y,n)} {\displaystyle \gcd(x-y,n)}.

Optimizations

The quadratic sieve is an optimization of Dixon's method. It selects values of x close to the square root of N such that x2 modulo N is small, thereby largely increasing the chance of obtaining a smooth number.

Other ways to optimize Dixon's method include using a better algorithm to solve the matrix equation, taking advantage of the sparsity of the matrix: a number z cannot have more than log 2 ⁡ z {\displaystyle \log _{2}z} {\displaystyle \log _{2}z} factors, so each row of the matrix is almost all zeros. In practice, the block Lanczos algorithm is often used. Also, the size of the factor base must be chosen carefully: if it is too small, it will be difficult to find numbers that factorize completely over it, and if it is too large, more relations will have to be collected.

A more sophisticated analysis, using the approximation that a number has all its prime factors less than N 1 / a {\displaystyle N^{1/a}} {\displaystyle N^{1/a}} with probability about a − a {\displaystyle a^{-a}} {\displaystyle a^{-a}} (an approximation to the Dickman–de Bruijn function), indicates that choosing too small a factor base is much worse than too large, and that the ideal factor base size is some power of exp ⁡ ( log ⁡ N log ⁡ log ⁡ N ) {\displaystyle \exp \left({\sqrt {\log N\log \log N}}\right)} {\displaystyle \exp \left({\sqrt {\log N\log \log N}}\right)}.

The optimal complexity of Dixon's method is

O ( exp ⁡ ( 2 2 log ⁡ n log ⁡ log ⁡ n ) ) {\displaystyle O\left(\exp \left(2{\sqrt {2}}{\sqrt {\log n\log \log n}}\right)\right)} {\displaystyle O\left(\exp \left(2{\sqrt {2}}{\sqrt {\log n\log \log n}}\right)\right)}

in big-O notation, or

L n [ 1 / 2 , 2 2 ] {\displaystyle L_{n}[1/2,2{\sqrt {2}}]} {\displaystyle L_{n}[1/2,2{\sqrt {2}}]}

in L-notation.

References

  1. Kleinjung, Thorsten; et al. (2010). "Factorization of a 768-Bit RSA Modulus". Advances in Cryptology – CRYPTO 2010. Lecture Notes in Computer Science. Vol. 6223. pp. 333–350. doi:10.1007/978-3-642-14623-7_18. ISBN 978-3-642-14622-0. S2CID 11556080.
  2. Dixon, J. D. (1981). "Asymptotically fast factorization of integers" (PDF). Math. Comp. 36 (153): 255–260. doi:10.1090/S0025-5718-1981-0595059-1. JSTOR 2007743.
  3. Kibicho, Murage (2025). Hand-Written Paper Implementation: Asymptotically Fast Factorization of Integers.