What is even stranger is that this is a 'divide and conquer' type algorithm and so it is likely to be fast. You can show that this method will (on average) find the median of n elements in a time proportional to 2n - which is much better than performing a full sort.

An implementation of this median-finding method in C# might help understand exactly what is going on.

The function split implements the scanning action.

The n values are in the array a[ ], x is the splitting value, the left scan pointer is i and the right scan pointer is j. Notice that the two scan pointers have to be passed as ref because their updated values are used by the calling program to discover where they crossed.

private void split ( float[] a, int n, float x, ref int i,ref int j) {

do the left and right scan

do {

while (a[i] < x)i++; while (a[j] > x)j--;

Once the two scans stop we swap values if they are in the wrong part:

float t = a[i]; a[i] = a[j]; a[j] = t; and continue the scan until the pointers cross:

} while (i <= j); }

A function that makes use of the basic split method to find the median is:

private void median( float[] a, int n, ref int k) { int L = 0; int R = n-1; k = n / 2; int i;int j; while (L < R) { float x = a[k]; i = L; j = R; split(a, n, x,ref i,ref j); if (j < k) L = i; if (k < i) R = j; } }

This initially sets the scan to the whole of the array and x to the middle value. It then calls split repeatedly on appropriate portions of the array until the two pointers meet in the middle of the array when the value in a[k] is the median.

Notice that this will only give you median if n is odd and it doesn't work at all if there are duplicate values in the array.

If you are interested in finding percentiles other than the median then all you have to do is change the initial value of k. The value that you end up with in a[k] is always bigger than the number of element to its left and smaller than the number of elements to its right.

For example, if you set k=n/3 then you will find a value that divides the array into 1/3rd, 2/3rds.

Don't bother to use this algorithm unless the number of values is large. For small arrays almost any method will do and sorting is invariably simpler.

Even when it isn't practical, this quick median method is always beautiful.

Listing

float[] a = { 1, 2, 3, 0, 8, 9, 5 }; int k = 0; median(a, 7, ref k);

private void split( float[] a, int n, float x, ref int i, ref int j) { do { while (a[i] < x ) i++; while (a[j] > x) j--; float t = a[i]; a[i] = a[j]; a[j] = t; } while (i < j); }

private void median( float[] a, int n, ref int k) { int L = 0; int R = n - 1; k = n / 2; int i; int j; while (L < R) { float x = a[k]; i = L; j = R; split(a, n, x, ref i, ref j); if (j <= k) L = i; if (i >= k) R = j; } } }

Charles Babbage was born in the eighteenth century - the age of the Industrial Revolution. The calculating machines he invented, although never fully realized in his lifetime, are rightly seen a [ ... ]

What are the limits to computation? The computer science theory of computation can be intimidating because of its use of logic but taking a programmer's approach makes it seem much simpler. So if you [ ... ]

A friend of mine was writing a program to find the median. What's themedian?The simple answer is that it is the valuethatis biggerthanhalf of the values in a list. In a senseitisa middle value and it is important not only in statistics butalso inlots of everyday programming tasks. For example, if youwant tochoosea value to split a set of elements intotwoequal groups then the mean is the break point. The most obvious wayof findingthe median of a set of numbers is to sort the listinto orderand then look at the one half way down the list. Theonly trouble is that sorting isn't a very fast operation but atleast the algorithm is simple and stable.

Thereisanother way to find a median that isbothfastand fascinating. If you are a collector of algorithms this is one you shouldhave pinned on the wall. It was invented byC.A.RHoare and is closely related to Quicksort another of his mindboggling algorithms.Itisbased on a partitioningmethodthatmoves elementsofthe array around to produce two regions-aleft portionthat is smaller than of equal to a given value xanda right portion that is larger than or equal to x.

---------------------

|<= x |>=x|

---------------------

Youcan think up lots of methods of partioning an array inthis way but the simplest is:

1) start a scan off from the left of the array and stop whenyou findavalue that isn't smaller than x. You know it isinthe wrong place so remember where you got to and..

2) start a scan off from the right of the array and stop when you findavalue that isn't bigger than x. You know itisinthe wrong place so swap it with the value that you found in step 1.

Continuethe scans from where they left off. When the twoscans meetthe array is correctly partioned - because ever elementto the left of the where they met is smaller than or equal to x,it wouldhave been moved if it wasn't and every thing to theright is larger than or equal to x for the same reason.

Thispartioning by scanning is a nice algorithm but what hasit gotto do with the median? Well imagine that you choose avalue inthemiddleof the array and perform the scan.Ifbysome changethe two portions produced were of equal size xwouldbe the median. That is if x is the median the partion is

---------------------

|<= x |>=x|

---------------------

^

half

way

Of course you would be very lucky to have found the median in the middleof the array and in most cases this wouldn't work.Ifx was smaller than the median then the two scans would cross before the middle of the array -

---------------------

| <= x |>=x|

---------------------

^

half

way

Inthis case you can pick the element that is now in themiddle of the array (it must have changed but can you see why?) as a new valueof x and repeat the procedure again but only on theright handportion.You only have to re-partion theportiontothe right because the portion to the left is already smaller than the new value of x - but can you see why?

If the initial value of x was bigger than the median thepicture would be

---------------------

|<= x|>=x |

---------------------

^

half

way

andbythe same argument you can pick the newelementinthe middleandrepeat the partitioning but only onthelefthand portion.

If you repeat this partioning process you will eventually finda pair of scans the do meet in the middle and the value of x is the median. I don't know about you but I get an uneasy feelingabout thisalgorithm.Ifyoureally wanttomakesurethatyou understandthealgorithm follow it to the pointthatyoucan understandhow and why the value in the middle changesateach scan and why you only have to re-partion one of the portions.It is enough to keep you awake at night. Eventually the value stored at the middle of the array becomes the median and the values that it is bigger than or equal to are to the left an the valuesthat itis smaller than or equal to are to the right.Verystrange. What is even stranger is that this is a `divide and conquer' type algorithmand so it is likely to be fast. You can show thatthe this method will (on average( find the median of n elements ina time proportional to 2n - which is much better than performinga full sort.

Animplementation of this median finding method inQuickBASIC might help understand exactly what is going on.

Subroutinesplit implements the scanning action. Then%values are in the array a(), x is the splitting value and the leftscan pointer is i% and the right scan pointer is j%.

SUB split (a(), n%, x, i%, j%)

DO

DO WHILE a(i%) < x

i% = i% + 1

LOOP

DO WHILE x < a(j%)

j% = j% - 1

LOOP

IF i% <= j% THEN

SWAP a(i%), a(j%)

i% = i% + 1

j% = j% - 1

END IF

LOOP UNTIL i% > j%

END SUB

The first WHILE loop scans from the left and the second fromthe right.When two suitable values are found they are swappedover and the pointers are moved on past the values. The scan continues until the pointers cross.

The subroutine that makes use of split is

SUB median (a(), n%, k%)

l% = 1: r% = n%

k% = n% / 2

DO WHILE l% < r%

x = a(k%)

i% = l%: j% = r%

CALL split(a(), n%, x, i%, j%)

IF j% < k% THEN l% = i%

IF k% < i% THEN r% = j%

LOOP

END SUB

Thisinitially sets the scan to the whole of the array and xto themiddle value. It then calls split repeatedly onappropriate portionsof the array until the two pointers meet in themiddle of the array when the value in a(k%) is the median.

Ifyouareinterested in finding percentilesotherthanthe median then all you have to do is change the initial value of k%. The value that you end up with in a(k%) is always bigger than the numberofelement to its left and smaller thanthenumberof elementsto its right. For example, if you set k%=n%/3 thenyou will find a value that divides the array into 1/3rd 2/3rds and so on.

Don't bother to use this algorithm unless the number of values is large. For small arrays almost any method will do and sortingis invariablysimpler.Even when it isn't practicalquickmedian method is always beautiful.