Programmer's Guide To Theory - Where Do The Big Os Come From
Written by Mike James   
Monday, 16 March 2020

Finding the Best Algorithm

We have reached a stage where the reasoning undergoes a subtle change that is often neglected in textbooks. Up until now we have been talking about algorithms belonging to P, but we would really like to answer a slightly more general question. Is it possible to find an algorithm that achieves some result in polynomial time or is such a thing impossible? This shifts the emphasis from finding the order of any given algorithm to finding the order of the best algorithm for the job, which is much more difficult.

You can also ask what the best algorithm is if you specify your computing resources - a Turing machine, for example, or a finite state machine and so on. There are algorithms that run in P for some classes of machine but not for others. This is where things can get complicated and subtle. For the moment we will ignore the type of computer used for the algorithm, apart from saying that it is a standard machine with random access memory - a RAM machine. 

It is interesting to note, however, that algorithms that run in sub-polynomial time can change their complexity when run on different types of machine.

How Fast Can You Multiply?

So the really interesting questions are not about how fast any particular algorithm is, but what is the fastest algorithm for any particular task. For example, it isn't difficult to show that multiplying two numbers together using the usual “shift and add” algorithm is O(n^2) where n is the number of bits used to represent the numbers. This means that multiplication is certainly in P, but is O(n^2) the best we can do? Is there an O(n) algorithm?
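As a rough illustration of where the O(n^2) comes from, here is a minimal sketch of shift-and-add multiplication in Python (the function name is just for illustration). The loop runs once per bit of one operand, and each shifted addition can touch up to n bits of the other, giving of the order of n^2 bit operations in total.

```python
def shift_and_add_multiply(a, b):
    """Multiply two non-negative integers with the schoolbook
    shift-and-add method, examining one bit of b at a time."""
    result = 0
    shift = 0
    while b:                        # one iteration per bit of b: about n iterations
        if b & 1:                   # if the current bit of b is set...
            result += a << shift    # ...add a shifted copy of a, an O(n)-bit addition
        b >>= 1
        shift += 1
    return result

print(shift_and_add_multiply(1234, 5678))   # 7006652, the same as 1234 * 5678
```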

Certainly you can't produce an algorithm that is faster than O(n) because you at least have to look at each bit, but can you actually reach O(n)? Before you spend a lot of time on the problem, I'd better tell you that it is difficult and the best practical algorithm that anyone has come up with is the odd-looking O(n log n log log n), which is a lot better than O(n^2) but still not as good as O(n). A recent paper has proposed an O(n log n) algorithm, but for it to be worth using n has to be huge, well beyond anything reasonable - see Galactic Algorithms in Chapter 17. Is there a faster practical algorithm? Who knows? You would either need a proof that multiplying faster than the stated algorithm is impossible, or you would need to demonstrate a faster algorithm.

You can see that in a very deep sense the order of the best algorithm tells you a lot about the nature of the task being performed. Is it just an easy task that happens to look difficult in this form? Or is it irreducibly difficult? This relates back to Kolmogorov complexity and finding the smallest program that generates a sequence. In this case we need the fastest program.


Prime Testing

The example of multiplication is easy to understand and it is easy to prove that it is in P by describing an O(n^2) algorithm. So, while we might want to argue about how fast it can be done, there is no question that it is in P.

Now consider an equally innocent looking problem - proving that a number is prime, that is, that it has no factors other than 1 and itself. For smallish numbers this is a trivial job. For example, 12 isn't a prime because it can be divided by 2 and 3, to name just two candidates, but 13 is prime because it has no such factors.

The simplest method of proving that any number, z say, is prime is to try to divide it by 2, 3 and then all the odd numbers up to √z. If you analyze this method you will discover that it takes of the order of 2^(n/2) trial divisions, where n is the number of bits used to represent the number. This is exponential in n and so not in P, i.e. prime testing using this algorithm isn't in P.
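Here is a minimal sketch of that trial-division test in Python (the function name is just for illustration). Each individual division is cheap, but the number of candidate divisors grows like 2^(n/2) for an n-bit number, so every extra bit multiplies the work by roughly 1.4 and every extra ten bits by roughly 32.

```python
import math

def is_prime_trial_division(z):
    """Test z for primality by trying every odd candidate divisor
    up to sqrt(z) - roughly 2**(n/2) trials for an n-bit z."""
    if z < 2:
        return False
    if z % 2 == 0:
        return z == 2          # 2 is the only even prime
    for divisor in range(3, math.isqrt(z) + 1, 2):
        if z % divisor == 0:
            return False        # found a factor, so z isn't prime
    return True                 # no factor up to sqrt(z), so z is prime

print(is_prime_trial_division(13))   # True
print(is_prime_trial_division(12))   # False
```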

This doesn't prove that testing for primes is or isn't in P, because there might be a better way of doing the job. However, given the apparently exponential cost of factoring a number, it seems very reasonable to guess that a polynomial time algorithm for testing primality isn't possible.

However, the problem has since been solved and we now know that testing a number for primality is in P - i.e. there is a polynomial time algorithm that will tell you whether or not a number is prime. This wasn't an easy result and we still don't know if there is an even faster algorithm. The current fastest deterministic algorithm runs in roughly O((log z)^6) operations, where z is the number being tested - in other words, time polynomial in the number of bits needed to represent z.

You should be able to see that the problem is that if you can find an algorithm for a task that runs in polynomial time then you have proved that the problem is in P, but if you can't it might be that you just haven't looked hard enough. In this case we tackled proving primality by factoring, which was exponential time, but it turned out that there was a very non-obvious algorithm that could prove primality or not without factoring the number.

It is also worth thinking for a moment about what this means for the related task of factoring a number. To prove that a number is non-prime you simply have to find a factor, and searching for a factor seems to require exponential time. If the number is a prime you have to search through all of the possible factors to prove that it doesn't have one. At first sight proving that a number is prime seems harder than proving that a number isn't prime, and yet we have just claimed that proving a number prime is in P.

This is very strange, as the property of being prime is defined in terms of factors and yet somehow you can prove it without finding any factors at all. As far as factoring is concerned, all the proof of primality gives you is a clear decision on how many factors there are, i.e. zero for a prime and at least two for a non-prime. It tells you nothing about what the factors are, and this is good from the point of view of the many cryptographic procedures that rely on the fact that factoring is hard.

To summarize: finding factors appears to require exponential time, but the question "does this number have zero or more than zero factors?" is answerable in polynomial time.

The route that we took to find a polynomial time algorithm for testing a number for primality is also interesting. At first we invented probabilistic tests for primality, i.e. tests that only show that a number is prime with a given probability. This is an unusual idea that deserves further explanation. Currently the most useful of the probabilistic tests is Rabin's strong pseudoprimality test, usually called the Miller-Rabin test.

If the number we want to test is n, assumed odd, first we have to find d and s such that n-1 = 2^s·d where d is odd (this is easy to do - just keep dividing n-1 by 2 until the result is odd; try it). Next we pick a random x between 2 and n-2 and form the sequence:

x^d, x^{2d}, x^{4d}, x^{8d}, ..., x^{2^s d} = x^{n-1}

all calculated modulo n.
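Both steps are easy to try out. Here is a minimal Python sketch (the function names are just for illustration): decompose strips factors of 2 out of n-1 to find s and d, and miller_rabin_sequence then builds the sequence by repeated squaring modulo n.

```python
def decompose(n):
    """Write n - 1 as (2**s) * d with d odd, by repeatedly halving."""
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    return s, d

def miller_rabin_sequence(n, x):
    """Return [x^d, x^(2d), x^(4d), ..., x^(2^s d)] modulo n,
    where the final term is x^(n-1) mod n."""
    s, d = decompose(n)
    value = pow(x, d, n)             # x^d mod n
    sequence = [value]
    for _ in range(s):
        value = (value * value) % n  # square to get the next term
        sequence.append(value)
    return sequence

print(miller_rabin_sequence(13, 2))  # [8, 12, 1] - a -1 (i.e. 12) before the end, as expected for a prime
print(miller_rabin_sequence(15, 2))  # [8, 4] - fails the condition, so 15 is certainly not prime
```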

Once you have the sequence of values - called a "Miller-Rabin sequence" - you can be sure that if n is prime the sequence either starts with a 1 or has a -1 somewhere before the end. Once a 1 or a -1 appears, every subsequent value is 1 (squaring either gives 1 modulo n), so you can state the condition as:

For a prime the Miller-Rabin sequence either consists entirely of ones or it has a minus one followed by nothing but ones.

This sounds conclusive, but some non-prime numbers also have sequences that satisfy this condition. Note that if the sequence doesn't satisfy the condition you have a certain proof that the number isn't prime. It is the case where the condition is satisfied that is inconclusive.

At this point it looks like a hopeless task to say any more about the number if the sequence satisfies the condition. It might be prime or it might not. However, there is a proof that if n isn't prime then at least three quarters of the x values that you could pick produce sequences that don't satisfy the condition.

To be more exact, if the number isn't prime then, for a randomly chosen x, there is at least a 75% probability that the sequence fails the condition - it neither starts with a 1 nor contains a -1. So after one test that satisfies the condition you can conclude that there is at most a 25% chance that the number isn't prime. After a second test that also satisfies the condition, the chance that the number isn't prime drops to at most 6.25% or, putting it the other way, the probability that it is prime is at least 93.75%. If you keep testing you can make the probability as close to one as you want.

For example, if you get sequences that satisfy the condition for 100 random values of x then the probability that you have a non-prime number is at most (1/4)^100, which is roughly 10^-60 - so exceedingly unlikely that you are safe in concluding that the number is indeed prime.

Such is the strange reasoning behind randomized testing for primes, and many other randomized algorithms. They never produce certainty, but what is the practical difference between a probability of 10^-60 and a certain proof? There is more chance of a machine malfunction giving you the wrong answer while computing a certain proof of primality than of the strong pseudoprimality test failing!
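Putting the pieces together, the repeated test looks something like the following minimal Python sketch (the function name, the default of 40 rounds and the use of Python's random module are illustrative choices). Each round picks a fresh random x; a single round that fails the condition proves the number composite, while k rounds that all satisfy it leave an error probability of at most (1/4)^k.

```python
import random

def is_probably_prime(n, rounds=40):
    """Strong pseudoprimality test: error probability at most (1/4)**rounds."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    d, s = n - 1, 0                   # write n - 1 as (2**s) * d with d odd
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        x = random.randrange(2, n - 1)
        value = pow(x, d, n)          # first term of the Miller-Rabin sequence
        if value in (1, n - 1):
            continue                  # condition satisfied for this x
        for _ in range(s - 1):
            value = (value * value) % n
            if value == n - 1:        # a -1 appeared, condition satisfied
                break
        else:
            return False              # condition failed: certainly composite
    return True                       # passed every round: almost certainly prime

print(is_probably_prime(2**127 - 1))  # True - this Mersenne number is prime
print(is_probably_prime(2**128 + 1))  # False - this Fermat number is composite
```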

The existence of a probability-based algorithm that ran in polynomial time was an indication that there might just be a deterministic algorithm in P. This turned out to be true. Interestingly, the probability-based test is still the one used in practice because it is so much quicker than the deterministic test for numbers of practical size. A great deal of computer science is only useful as theory even when it appears to be practical.

Summary

  • Algorithms scale in different ways as the number of items they have to deal with changes. For small values details of implementation matter, but as the number gets bigger they reveal their true nature.

  • The way algorithms scale is summed up in the "Big O" notation where O(f(n)) means that for large n the algorithm scales like f(n).

  • We are mostly concerned with average performance, but worst and best case performance are also of interest.

  • Many algorithms run in O(n^x), so-called polynomial time, and these can be considered to be "reasonable" algorithms, even if for any particular algorithm the time to compute it might be out of reach.

  • Algorithms that run in O(a^n) or worse are called exponential time algorithms and increase their demands so fast that they quickly become impractical.

  • The difference between polynomial time and exponential time is so large that it is qualitative.

  • The way algorithms scale results from their programmatic structure. Polynomial algorithms are equivalent to a fixed number of nested loops. Exponential algorithms are equivalent to a variable number of nested loops.

  • Proving that an algorithm is as fast as it possibly could be is very difficult and in general we have to regard the complexity of the best known algorithm as just an upper bound on the difficulty of the task - a better algorithm may be waiting to be discovered.

  • An example of this discovery of faster algorithms is proving primality. Because of its link with factoring it was first thought to require exponential time, but then a polynomial time probabilistic test was found, followed later by a more complex deterministic polynomial time test.

 

A Programmers Guide To Theory

Now available as a paperback and ebook from Amazon.


Contents

  1. What Is Computer Science?
    Part I What Is Computable?
  2. What Is Computation?
  3. The Halting Problem
  4. Finite State Machines
    Extract 1: Finite State Machines
  5. Practical Grammar
  6. Numbers, Infinity and Computation
    Extract 1: Numbers 
    Extract 2: Aleph Zero The First Transfinite
    Extract 3: In Search Of Aleph-One
    Extract 4: Transcendental Numbers
  7. Kolmogorov Complexity and Randomness
    Extract 1: Kolmogorov Complexity
  8. The Algorithm of Choice
  9. Gödel’s Incompleteness Theorem ***NEW!
  10. Lambda Calculus
    Part II Bits, Codes and Logic
  11. Information Theory 
  12. Splitting the Bit 
  13. Error Correction 
  14. Boolean Logic
    Part III Computational Complexity
  15. How Hard Can It Be?
    Extract 1: Where Do The Big Os Come From
  16. Recursion
    Extract 1: What Is Recursion
    Extract 2: Why Recursion
  17. NP Versus P Algorithms
    Extract 1: NP & Co-NP
    Extract 2: NP Complete




