3 Mar, 22 tweets, 4 min read
The security of RSA is based on the fact that it is fairly easy to multiply two numbers (given 13 and 17, find 13*17 = 221) but much harder to do the reverse: given a number, find its prime factors.
There are several algorithms that can be used to factor these numbers. The easiest one is called trial division, which, pretty much exactly as it says, just attempts to divide by one number after another until it finds a factor or has to give up.
This is not very efficient. To factor a number n, it takes O(sqrt(n)) steps. We usually discuss these algorithms in terms of the number of digits of the number (which you get by taking the logarithm), and in those terms the complexity looks like O(10^(k/2)), where k is the number of digits.
An algorithm like that we call exponential.
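As a minimal sketch (our own illustration, not from the thread), trial division looks like this:

```python
def trial_division(n):
    """Return the prime factorization of n by trying divisors up to sqrt(n)."""
    factors = []
    d = 2
    while d * d <= n:      # any composite n has a factor <= sqrt(n)
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:              # whatever remains after dividing out is prime
        factors.append(n)
    return factors
```

For the RSA toy example above, `trial_division(221)` returns `[13, 17]` after only a handful of divisions; the O(sqrt(n)) cost only bites for much larger n.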
There are more sophisticated algorithms; the most important ones are called Pollard-Rho, the Quadratic Sieve, and the Number Field Sieve.
Pollard-Rho is much faster than trial division, but still exponential, so it doesn't really change the nature of the problem: every single bit you add to the key size makes things twice as hard.
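A sketch of the classic Floyd-cycle variant of Pollard-Rho (our own illustration; it assumes n is an odd composite, and retries on the occasional failed round):

```python
import math
import random

def pollard_rho(n):
    """Return one nontrivial factor of a composite n (Floyd cycle detection)."""
    if n % 2 == 0:
        return 2
    while True:
        c = random.randrange(1, n)        # random polynomial x^2 + c
        x = y = random.randrange(2, n)
        d = 1
        while d == 1:
            x = (x * x + c) % n           # tortoise: one step
            y = (y * y + c) % n
            y = (y * y + c) % n           # hare: two steps
            d = math.gcd(abs(x - y), n)
        if d != n:                        # d == n means this round failed; retry
            return d
```

The expected running time is about O(n^(1/4)) group operations, which is far better than trial division's O(n^(1/2)) but still exponential in the number of digits.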
Now the Quadratic Sieve and the Number Field Sieve are very different beasts compared to that. They are what's called subexponential: for large enough numbers, they are always going to be faster than the exponential algorithms.
They are however also superpolynomial, so they will be slower than a polynomial algorithm would be, if the numbers are large enough.
When discussing these kinds of algorithms, we use what's called the L-notation, which measures where between polynomial and exponential an algorithm falls. That notation has two parameters, one called alpha and one called c.
We usually disregard c and focus on alpha, as that captures the important bit. Alpha = 0 is a polynomial algorithm, alpha = 1 an exponential one. The Quadratic Sieve has alpha = 1/2; the Number Field Sieve works out to alpha = 1/3.
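Spelling out what the thread compresses, the standard definition of the L-notation is:

```latex
% L-notation: interpolates between polynomial (alpha = 0) and exponential (alpha = 1)
L_n[\alpha, c] = \exp\!\bigl( (c + o(1)) \, (\ln n)^{\alpha} \, (\ln \ln n)^{1-\alpha} \bigr)
% alpha = 0: exp(c ln ln n) = (ln n)^c  -- polynomial in the size of n
% alpha = 1: exp(c ln n) = n^c          -- fully exponential
% Quadratic Sieve: L_n[1/2, 1];  Number Field Sieve: L_n[1/3, (64/9)^{1/3}]
```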
Making alpha smaller has a disproportionate effect on RSA key sizes, which then have to grow much more to stay secure. At alpha = 0, RSA would be completely broken, but even an advance to an alpha smaller than one third would affect which key sizes can be considered safe.
Now to the paper. The paper makes several claims, but strangely does not discuss this all-important asymptotic runtime complexity. There are some hints that the author thinks it might be polynomial, but it's not stated outright.
Instead it lists explicit numbers of arithmetic instructions required to factor 400- and 800-bit numbers. It doesn't, however, include any evidence for these numbers, such as actual numbers that have been factored.
The numbers, on the other hand, vary by the version of the paper, but would put factoring such an 800-bit number on the time scale of minutes on a single core. The current factoring record with the Number Field Sieve, for a number of that size, required about 2700 computer core-years.
Such a bold claim requires extraordinary proof. The algorithm proposed, however, has been around for a while and does not seem to have been substantially modified by the paper. The paper cites efforts that factored a 43-bit number in thirty seconds.
With the main source stating that, while improvements have been made, factoring even a 64-bit number is still more or less infeasible with this algorithm.
The problem is, a 43-bit number can be factored by trial division in half that time.
Even a 64-bit number can still be factored by trial division in a semi-reasonable timespan (although you definitely want to switch to Pollard-Rho at that point). This points to the actual implementation not even having subexponential runtime.
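To put a rough number on that claim, here is a small back-of-the-envelope experiment (our own; the starting points for the primes are illustrative picks, not values from the paper): build a 43-bit semiprime and factor it by pure trial division, timing the search.

```python
import time

def is_prime(m):
    """Primality by trial division; fine for numbers around a few million."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def next_prime(m):
    while not is_prime(m):
        m += 1
    return m

# Two ~21.5-bit primes give a 43-bit semiprime (starting points chosen arbitrarily).
p = next_prime(2_800_000)
q = next_prime(3_100_000)
n = p * q

def smallest_factor(n):
    """Smallest prime factor of n by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

start = time.perf_counter()
f = smallest_factor(n)
elapsed = time.perf_counter() - start
print(f"{n} = {f} * {n // f}, found in {elapsed:.2f}s")
```

Even in interpreted Python, the roughly three million trial divisions finish in well under thirty seconds on ordinary hardware, which is the point of the comparison above.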
The paper itself does not address this issue at all, and happily computes runtime estimates for very large numbers without providing sufficient reason to back them up. It has some glaring holes, with parameters that significantly influence the runtime not being discussed at all.
And that isn't even to speak of the editorial shortcomings of the paper. It has random bits of German, unrelated to the content, strewn in; it defines the most basic concepts that would usually be assumed known by the reader, while leaving out important details.
In short, it does not prove what the abstract claims it proves.
Appended to add: there has been further analysis of the paper (especially the new version, which is more readable and predates this thread), including both an implementation and theoretical discussion. They confirm the initial suspicion: the algorithm is not subexponential.
There is a mistake in the given proof, where a factor of n! is dropped, changing the runtime complexity.


# More from @SchmiegSophie

4 Mar
It'll need some category theory, but I'll try.
So the question was: why are elliptic curves in some sense the "most abstract" group possible for Diffie-Hellman-style dlog-based schemes? For this we need to look at the dlog and its inverse, the canonical map of ℤ into any given group with a marked element (the generator).
That canonical map sends an integer to the generator applied to itself that many times (for negative integers, we take the inverse). Canonical means that, given the generator and the information that 1 is mapped to it, the map cannot have any other form.
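As a concrete sketch (our own illustration, using the multiplicative group mod p as the target group; the function name is ours):

```python
def canonical_map(n, g, p):
    """The canonical map Z -> (Z/pZ)*: send n to g^n, using g's inverse for n < 0."""
    if n >= 0:
        return pow(g, n, p)
    # Python 3.8+: pow(g, -1, p) computes the modular inverse of g mod p
    return pow(pow(g, -1, p), -n, p)

# The map is fully determined by sending 1 to the generator g: being a group
# homomorphism forces n to go to g * g * ... * g (n times), i.e. g^n.
```

For example, with the generator g = 2 of (ℤ/11ℤ)*, the map sends 3 to 2^3 = 8 and -1 to the inverse of 2, which is 6.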
3 Mar
The more closely I read the paper, the worse it gets. I'm gonna stop now.
Once you read an approximation of e in order to define ln in a math paper, it starts being hard to take the rest seriously, even though in theory it shouldn't impact your assessment…
Here is why I'm saying it hasn't been proofread by anyone at all
3 Mar
To elaborate: after having read this paper in two versions, plus a German master's thesis that seems to describe the same algorithm without any of the outrageous claims, but with the same performance numbers for factoring around 10^20, I mostly think it's some sort of mistake, but 🤷🏼‍♀️
There is some other weirdness to the various papers, like explaining LLL in an amount of detail that is somewhat unusual for a research paper, while completely glossing over other parts of the claims.
The master's thesis claims that numbers beyond 10^20 are not feasible, pointing to an asymptotic runtime nowhere near the L[1/3, c] range that the Number Field Sieve achieves. I don't see where the paper improves on the results of the master's thesis, though.