I like problems that look simple and turn out to be really difficult. It's the way that something simple can hide a complexity that you never guessed at. Fortunately for me the universe seems to be built in this way!

One particularly fascinating problem that also has applications in cryptography is the knapsack or sum partitioning problem. It is very like the challenge posed on some quiz shows where the contestants have use a randomly picked set of numbers and the four arithmetic operators to get as close as possible to a target.

On the face of it the knapsack problem is even simpler than this – given a set of positive integers 3, 5, 6, 10, 34 say, find a subset that sums to exactly to a given value, 50 say. In this case you should spot at once that 34+10 is 44 and 44+6 is 50 so the subset in question is 6,10,34.

This is called the knapsack problem because it is the same as trying to pack a knapsack with a range of items, i.e. the positive integers, so that it is just full, i.e. reaches the value in question.

Easy as I said but now try to write a computer program that, given an arbitrary set of *n* positive integers, finds a subset that sums to a given integer.There are many clever ways of doing this but the most direct is simply to examine all the possible ways of adding the numbers up. This approach may not be clever but it demonstrates why the task is difficult and reveals a number of deeper principles.

The first problem that you face is that you need to try every possible subset of *n* numbers. If you want all possible subsets of three numbers then the easiest way in C# is to write three for loops:

for (int i = 1; i <= 3; i++)

for (int j = 1; j <= 3; j++)

for (int k = 1; k <= 3; k++)

Console.WriteLine(i.ToString()

+ "," + j.ToString() +

"," + k.ToString());

Actually this generates more than the subsets because it generates values like 1,1,1 and 1,2,3 and 3,2,1 – but it works and you can get rid of the sillies and the duplicates easily. The problem is that for *n* numbers you can't have a variable number of for loops. Whenever you find yourself looking at a variable number of for loops you have to solve the problem using recursion.

You could say that it a problem involving *n* objects requiring *n* nested loops to solve it is a "smell" that recursion is in the air.

Before moving on to the knapsack problem let's implement a simple function that uses recursion to create N nested loops - where N is a variable.

The function we need creates a new loop each time it is called, i.e. increasing the number of loops and hence the nesting by one.

void Loop(int nest, int N)

{

The first parameter records the current nesting level and the second the desired nesting level. That is, the function will create N nested loops.

If the current nesting level is equal or greater than N we don't want to create another loop:

if (nest >= N)return;

If we do need a new loop add one to the nesting level and create a new loop:

nest++;

int i;

for (i = 0; i < N; i++)

{

To see that the nested loops are doing what we think they are the loop prints its nesting level and current value:

Console.WriteLine("Nesting " +

nest.ToString() +

" Value " + i.ToString());

Finally we use the Loop function once again, i.e. recursively to create another loop inside the current loop:

Loop(nest,N);

}

Console.WriteLine();

nest--;

When the loop ends we simple throw a new line and decrement the nesting count.

The complete function is:

void Loop(int nest, int N)

{

if (nest >= N)return;

int i;

nest++;

for (i = 0; i < N; i++)

{

Console.WriteLine("Nesting " +

nest.ToString() + " Value " +

i.ToString());

Loop(nest,N);

}

Console.WriteLine();

nest--;

}

and to use it you simple set the current nesting level to 0 i.e. there are no loops when you start and the desired nesting level to three say:

Loop(0,3);

The result in this case is:

Nesting 1 Value 0

Nesting 2 Value 0

Nesting 3 Value 0

Nesting 3 Value 1

Nesting 3 Value 2

Nesting 2 Value 1

Nesting 3 Value 0

Nesting 3 Value 1

Nesting 3 Value 2

Nesting 2 Value 2

Nesting 3 Value 0

Nesting 3 Value 1

Nesting 3 Value 2

...

and so on

Notice that the highest nested loop index changes fastest which is what you would expect of a set of nested loops.

<ASIN:3540402861>

<ASIN:0471816523>

<ASIN:1848000693>

Before we can create a Knapsack function we need some data to test it.

All we have to do is to load an array with a reasonable set of integers to try to make the target up from and it then calls the recursive SUM function which tries to find a subset:

int N = 50;

Random R = new Random();

int Target = R.Next(1,N * N);

int[] A = new int[N];

for (int i = 0; i < N; i++)

A[i] = R.Next(1,N * N / 2);

Console.WriteLine("Target: " +

Target.ToString());

int CurrentSum = 0;

Sum(0,N,ref CurrentSum,A,Target);

Console.WriteLine("done");

Sum is recursive but not difficult to follow – it is just a set of nested for loops one per call of the function and is based on the Loop function listed earlier.

Now as well as nest and N we also need the CurrentSum, i.e. what the selected elements add up to, the array, and the Target value:

void Sum(int nest, int N,

ref int CurrentSum, int[] A,int Target)

{

if (nest >= N)return;

int i;

nest++;

for (i = 0; i < N; i++)

{

The CurrentSum has to be passed by ref because it is changed by the function and this change needs to be returned to show what the result so far is. It is also what brings all of the loops to a stop. When we find a set of values that add to the Target that's enough, i.e. we are looking for the first solution not all of the solutions.

The logic in each loop is to first test to see if adding the next selected element will take the CurrentSum over the Target. If so we simply move on to the next element in the loop and see if this is worth considering:

if (CurrentSum + A[i] > Target)

continue;

If the current element can be added to the CurrentSum without exceeding the Target then we update the CurrentSum and call the function to find the next element using the next nested loop:

CurrentSum += A[i];

Sum(nest,N,ref CurrentSum,A,Target);

When this function returns it has either tested all of the elements and failed or it has found the correct sum, i.e. CurrentSum equals the Target and we test for this:

if (CurrentSum == Target)

{

Console.WriteLine(nest.ToString() +

"," + A[i].ToString());

break;

}

If the CurrentSum is equal to the target we print the current index and the array element that was used in the sum and break out of the loop. Notice that once the CurrentSum is equal to the target it stays equal to the target and all of the loops come to an end by executing the break. If the inner loop fails to find a solution we have to correct CurrentSum by subtracting the failed element and moving on to try the next element in the loop.

CurrentSum -= A[i];

}

nest--;

}

If you run it and it finds a solution it prints the elements of the array. If it fails to find a solution you just see the magic word "done" – you can't always make the target up from a subset of the values.

There are minor problems in this program – in particular it can generate repeated values in the set of numbers but this gets increasingly unlikely as N gets bigger.

If you try this program out you will find that at first everything works quickly with answers flashing up after a few seconds. However if you try N=30,000 say then things slow to a stop. The problem is that the knapsack algorithm slows down as 2^N which makes the problem for 30,000 number about 10^10000 (yes 1 followed by 10,000 zeros) times longer to solve than the 1000 number problem – so don't wait for a solution too long.

It is the huge number of different combinations that have to be considered that makes the task so difficult. However, how do you know that there isn't a short cut that means you don't have to consider every possible combination?

There are algorithms that attempt to solve the problem more quickly. For example there are variations on the greedy algorithm - pick the first number closest to the target, then the next closest and so on. A more sophisticated greedy algorithm is based on dynamic programming. The key factor in all of these algorithms is that they usually find the solution in reasonable time but there is no promise that they will ever find the solution without examining every possible combination. They may be advanced but they are still "heuristics", i.e. algorithms that usually work.

The knapsack problem is the basis for one method of public key cryptography. A set of numbers in a given order is made public and any user can code a message by multiplying each number by a zero or a one according to the messages binary representation. All of the values are added up to give a single value and this is of course the "target" value in a knapsack problem using the given set of numbers. Of course you can't decode the message because this would mean solving the knapsack problem for very very large N. However the intended recipient of the message can decode it because they have a "shadow" set of numbers with a special properties that make the solution of the knapsack problem very easy. So anyone can code a message using the public numbers but only the recipient can decode it without solving the full knapsack problem. Unfortunately a few years ago it was discovered that the need to use a "shadow" set of values made the knapsack problem posed by the public values rather easier to solve than was originally thought – i.e. there was a way of doing the job much faster than 2^N. So now the knapsack code is no longer used but it is still interesting and with a better set of shadow values might even be resurrected one day.

{loadposition moreprojectsART}

<ASIN:9810221207>

<ASIN:059651610X>

<ASIN:3540613560>

]]>In computer science and discrete mathematics a graph is a set of points – called vertices or nodes and a set of connecting lines called paths, arcs or edges. This definition of a graph includes the familiar temperature charts etc but only as a special case. In this sense a graph is anything from a hierarchy chart to a computer network.

*A simple graph of vertices and arcs*

Graphs are important in computer science as a basic tool of theory and they are of practical importance. Before we can go on to explain why we need some additional definitions:

- A path is a route from one vertex to another via arcs.
- If there is a path between every pair of vertices the graph is a "connected graph".
- A path from a given vertex and back again is called a cycle.

*A connected graph showing a path and a cycle*

Now we can define a tree – a tree is a connected graph with no cycles. There are many equivalent definitions of a tree including the one that gives it its name – a root node connected to leaf nodes by branches.

*Check that this is a connected graph with no cycles – aka a tree!*

A slightly more advanced form of graph is the network. A network is a graph which has weights assigned to each of its paths. You can think of each weight as either being a distance or a cost associated with the path. With networks you can ask questions such as "find the shortest path", "find the shortest round trip" and so on.

One important version of this "shortest" type of question is – what is the shortest connector. That is, the sub-graph with the shortest total distance that connects all of the vertices.

It doesn't take much to see the shortest sub-graph that connects all of the vertices is going to be a tree because if it contains any cycles you can get a shorter graph by deleting at least one arc without altering the connectivity. As a result this problem is often called finding the "minimum spanning tree", MST.

Why is the MST important?

Consider the situation that you have to connect three computer sites into a network. All you care about is that every computer is connected and the cost of connection is proportional to distance. In this case the minimum spanning tree is going to be a good starting point for a practical implementation of the network.

Another, slightly more esoteric, example of the usefulness of an MST is that it provides an upper bound to the travelling salesman problem. The travelling salesman's round route has to be shorter than twice the MST for the network.

<ASIN:1584883960>

<ASIN:0201000237>

How do you find a minimum spanning tree given a network? It turns out that there are two general algorithms – Prim's and Kruskal's. Of the two Prim's is the easier to implement and to understand, so it makes a very good starting place to understand any graph algorithm. It is also an excellent example of a "greedy" algorithm, i.e one that always takes the best option at each step, that actually works!

This algorithm was developed in 1930 by Czech mathematician Vojtěch Jarník and independently in 1957 by computer scientist Robert C. Prim and was rediscovered by Edsger Dijkstra in 1959:

0) Place all of the vertices in an "excluded" set and initialise an "included" set to be empty.

1) Select any vertex as the starting vertex of the tree. Place this vertex in the "included" set.

2) Find a vertex which is closest to any vertex in the "included" set. If there is more than one such vertex select one at random.

3) Repeat 1 and 2 until all of the vertices are in the included set and the excluded set is empty.

The result is a minimum spanning tree – as long as you remember to store which path between which pair of nodes was the shortest distance at each step!

The above may seem simple but there are some interesting problems in implementing any graph algorithm using C#.

The first is how are we to represent a graph or network. The simplest solution is to use a two-dimensional array to record the distance between each node. For example:

Network[a,b];

would be used to store the distance between vertex a and vertex b. Notice that this matrix is symmetric and the diagonal is zero because the distance between "a" and "a" is zero.

The next problem is how to generate a network to try the algorithm out on?

In a real world problem the network would be entered as data to the array but it is a common requirement to generate valid examples of any data structure that a program is working with.

In this case the problem isn't difficult in that all we have to do is generate a set of random numbers between particular limits and make sure that the array is symmetrical and has a zero diagonal. This would produce a fully connected network which is difficult to represent graphically. To produce something that had fewer connections per vertex it would be necessary to set to zero entries picked at random.

While on the subject of representing the network graphically it is worth commenting that the distances in the network matrix don't have to be Euclidean distances. For example they might well be costs or subjective judgements. The algorithm works even if the distances aren't Euclidean but you can't draw the vertices in a 2D space with the given distances between them. This is a problem only if we want a to produce a graphical display – but in this case we do!

The solution is to work the other way around. First generate the vertices as random points in a 2D space and then work out the distances between them to store in the network matrix. If we also keep the locations of the points then drawing the network is even easier.

I did consider implementing a fully object-oriented version of the algorithm but this seems to go against the idea of a "classic" algorithm. The object-oriented approach works well when the problem breaks down into natural groupings of methods and property but it is often less good at representing algorithms. In this case the focus is on the algorithm, i.e. exactly how the code implements the idea, and this is best achieved by breaking the algorithm down into functions.

Starting with a new C# WPF project we need to add a button labelled "Generate Network" and a Canvas object. A Canvas object is often a good thing to use when you need to draw using x,y co-ordinates but there are other ways of doing the same job.

Moving to the code. We need an array to hold the co-ordinates of the points:

private Point[] Positions =

new Point[size];

size is a constant that sets the size of the network in number of vertices.

const int size = 10;

The array to hold the network data can be defined as

private Single[,] Network =

new Single[size, size];

We are also going to need a random number generator throughout the program so creating a global object is a good idea:

private Random R = new Random();

We can now write a subroutine that fills both Position and Network with random values:

private void setnet(

Single[,] Net,Point[] Pos)

{

int maxlength = (int)(Math.Min(

canvas1.Width,

canvas1.Height) * 0.9);

int minlength = maxlength / size;

for (int i = 0; i < size; i++)

{

Pos[i].X =

R.Next(minlength, maxlength);

Pos[i].Y =

R.Next(minlength, maxlength);

for (int j = 0; j <= i; j++)

{

Net[i, j] = distance(Pos[i], Pos[j]);

Net[j, i] = Net[i, j];

if (i == j) Net[i, j] = 0;

}

}

}

The points are generated randomly with co-ordinates between minlength and maxlength which are derived from the size of the Canvas object.

Notice that the distance array is made to be symmetric and the diagonal is also set to zero. The distance between points is computed using an additional function.

The distance function is simply the usual Euclidean formula:

private Single distance(Point a, Point b)

{

return (Single)Math.Sqrt((a.X - b.X)*

(a.X - b.X) + (a.Y - b.Y) *

(a.Y - b.Y));

}

Recall that C# has no power operator and simply multiplying things together is better than having to use the Maths object to square a value.

After generating the network it would be nice to view it. A drawing routine is quite easy – as long as you don't worry too much about efficiency. We can simply use Line objects to draw lines on the Canvas.

First we clear the Canvas of any child nodes it may already contain:

private void shownet(Single[,] Net)

{

canvas1.Children.Clear();

Next we create a new Line object for each pair of points that have a non-zero distance between them:

for (int i = 0; i < size; i++)

{

for (int j = 0; j < i; j++)

{

if (Net[i, j] != 0)

{

myLine = new Line();

myLine.Stroke = Brushes.Black;

myLine.X1 = Positions[i].X;

myLine.X2 = Positions[j].X;

myLine.Y1 = Positions[i].Y;

myLine.Y2 = Positions[j].Y;

myLine.StrokeThickness = 1;

canvas1.Children.Add(myLine);

}

}

}

Next we can plot suitable marker shapes at the position of each node:

Rectangle myMarker;

for (int i = 0; i < size; i++)

{

myMarker = new Rectangle();

myMarker.Stroke =Brushes.Black;

myMarker.Fill = Brushes.Red;

myMarker.Height = 10;

myMarker.Width = 10;

myMarker.SetValue(

Canvas.TopProperty,

Positions[i].Y - myMarker.Height / 2);

myMarker.SetValue(

Canvas.LeftProperty,

Positions[i].X - myMarker.Width / 2);

canvas1.Children.Add(myMarker);

}

}

Notice the way the that the position of the rectangle is set using an attached property on the Canvas object. This is a feature of WPF that beginners often find difficult. Attached properties are a good idea from the implementation point of view but they often don't help with readability and clarity. You might also ask why the Line object sets its own position without the need for an attached property and yet most of the other WFP shapes can't.

To try these out place a button on the form, if you already haven't and enter the following as the button's click routine:

private void button1_Click(

object sender,

RoutedEventArgs e)

{

canvas1.Width = 600;

canvas1.Height = 600;

setnet(Network, Positions);

shownet(Network);

}

You can now try the program out and see if it really does generate reasonable graphs. It should! You can size the canvas control to make best use of the space and the graph will adjust itself to fill it, more or less. It is fun just to see the range of graphs that can be produced.

*A typical random graph with ten vertices*

<ASIN:1565924533>

<ASIN:1584883960>

Now we have to move on to implement Prim's algorithm proper. As well as just implementing the method, an animated display will be used to show the paths as they are being considered for inclusion. Place a second button on the form and change its caption to "Find MST". It's click routine is simply

private void button2_Click(

object sender, RoutedEventArgs e)

{

prims();

}

The work is all done by the prims function.

The first thing we need are two arrays to store the list of vertices that have been included and are still to be processed:

void prims()

{

int[] included = new int[size];

int[] excluded = new int[size];

We also need another network array to hold the distances that form the minimum spanning tree:

Single[,] finished =

new Single[size, size];

Two integer variables are used to hold the start and finish node numbers of each path added to the tree:

int start = 0;

int finish = 0;

The two arrays need to be initialised:

for (int i = 0; i < size; i++)

{

excluded[i] = i;

included[i] = -1;

}

At the start of the algorithm all of the vertices are excluded, i.e. vertices 0 to size-1, and none of the vertices are included, i.e. all entries -1. Notice that -1 is used to flag the fact that there is no entry in the array.

The first point in the MST is selected at random:

included[0] = excluded[R.Next(size)];

excluded[included[0]] = -1;

Notice that the vertex that has been added to "included" has to be removed from "excluded" by setting that element to -1.

At last we can start the loop that finds the closest vertex to the currently included set and adds it to "included".

for (int n = 1; n < size; n++)

{

closest(n, ref start, ref finish,

included, excluded);

where *start* and *finish* are the indices of the vertices of the path that is to be added to the tree. That is *finish *is the closest of the currently excluded vertices and *start* is the vertex in the included set it is closest to - adding the arc from *start *to *finish *to the MST.

To be clear, the function closest finds the closest vertex listed in the excluded array to any of the vertices listed in the included array.

The end point of the path is the new vertex that has to be added to the included set, i.e the vertex indexed by *start *is already in the included array:

included[n] = excluded[finish];

and hence removed from the excluded set:

excluded[finish] = -1;

The distance associated with the path also has to be added to the "finished" array to indicate that it has to be drawn as part of the MST:

finished[included[n], included[start]] =

Network[included[n], included[start]];

and the finished array has to be made symmetric just in case someone wants to make use of the upper triangle rather than the lower:

finished[included[start], included[n]] =

Network[included[start], included[n]];

}

and finally we can show the finished MST:

shownet(finished);

}

All that is needed to complete the program is the "closest" function:

private void closest(int n,

ref int start, ref int finish,

int[] included, int[] excluded)

{

Single smallest = -1;

for (int i = 0; i < n; i++)

{

for (int j = 0; j < size; j++)

{

if (excluded[j] == -1) continue;

if (smallest == -1) smallest =

Network[included[i], excluded[j]];

if (Network[included[i], excluded[j]]

> smallest) continue;

smallest =

Network[included[i], excluded[j]];

start = i;

finish = j;

}

}

}

This is essentially a search loop for the smallest value but it has to take into account whether the vertex is valid or not. A vertex that isn't in the excluded list isn't valid and one with a zero distance isn't connected to the node in question and once again isn't valid. Apart from this complication each node in the excluded list is used to find the smallest distance.

If you put all of this together and run the program you will discover that not only can you generate random graphs but also their MSTs.

You can learn a great deal by watching it. If you press the "Find MST" button again the whole process is repeated from a different starting point. Notice that the MST isn't unique so if you discover that starting from a different vertex gives you a different MST this isn't an error.

*A minimum spanning tree.*

You can add additional facilities to this program quite easily. For example, a display of the total length of the MST would be interesting and a slider control to modify the number of arcs would also be worth the effort. You could also animate the display to show the arcs being selected one at a time.

If you would like the code for this project then register and click on CodeBin.

{loadposition moreprojectsART}

<ASIN:1584885491>

]]>