Programmer's Python Data - A Custom Data Class

Written by Mike James

Monday, 15 January 2024

Article Index
Programmer's Python Data - A Custom Data Class
Node Numbering
Equality and Hash
The Complete Tree

Page 3 of 4

You can also define methods that implement comparisons between instances of your custom object or with some other data type. The comparison methods are all very obvious, for example __eq__ for == and __ne__ for !=. Notice, however, that unlike the general operators there is no need for a right and a left version as x < y is either x.__lt__(y) or y.__gt__(x).

Implementing a comparison operation is usually only problematic because working out what it means for a complex object can be difficult. For example, you might want to define two trees as being equal if they have the same nodes with the same values. To implement this we have to define the __eq__ method:

def __eq__(self,otherTree):
    if len(self)!=len(otherTree):
        return False
    for i in range(0,len(self)):
            if self.getNode(i).value!= 
                      otherTree.getNode(i).value:
                return False
    return True

The first test checks that the trees are the same size, if not then False is returned. If they are the same size each node and value is checked for equality. If any are different then False is returned and if they are all equal then True is returned. After this you can write things like:

tree1=Tree(0)
for i in range(1,15):
    tree1.appendBreathFirst(i)
tree2=Tree(0)
for i in range(1,15):
    tree2.appendBreathFirst(i)
print(tree1==tree2)
tree1[10]=42
print(tree1==tree2)

which displays True followed by False.

Notice that the method will fail if the two trees have the same number of nodes, but not the same structure. However, as the only way implemented to build a Tree always gives the same structure for the same number of nodes, this isn’t currently a problem. Also notice that != works perfectly without having to implement __ne__ as the default __ne__ method calls __eq_ and negates the result.

Defining __eq__ is a necessary step in making your data object hashable and hence usable as a key in a dictionary. It is unlikely, but not impossible, that you would want to use a Tree as a key, but if you do then two trees that are considered equal by __eq__ should give the same hash and hence act like identical keys, even if they are different instances. Of course, the other condition on a hash value is that it shouldn’t change and this implies that the object it is being computed on is immutable. For the sake of an example, we will assume that Tree is immutable.

We have already defined __eq__, which is required for a hash function, but if we don’t also define a __hash__ method then the object isn’t hashable because the default __hash__ is marked as not implemented. So if you create an __eq__ method you have to define a __hash__ method if you want your object to be hashable.

If you don’t define an __eq__ method then you get default methods which only consider an object equal to itself and return a hash value based on this.

The good news is that you don’t have to invent a basic hash function as you can use the supplied __hash__ function defined on the components of your data object. The built-in hash function accepts a single object and returns its hash value. However, a tuple already has a built-in __hash__ function which combines all of its elements into a single hash value. So the simplest way to get a hash for a custom object is to put together a tuple of its values and anything that defines it as unique and hash the tuple. For example, in the case of Tree, we have already defined equality to mean that the same nodes have the same values, so for a hash we should use all of the node values:

    def __hash__(self):
        temp=[]
        for i in range(0,len(self)):
            temp.append(self.getNode(i).value)
        return hash(tuple(temp))

All that this method does is to build up a list of all of the node values and then call hash with this converted to a tuple. The result is that you get the same hash for trees that are considered equal by the __eq__ method.

What about trees that have the same values but stored in different nodes? These test as not-equal, but the tuple that is produced contains the same values albeit in a different order. Fortunately, the fact that tuples are compared in position order means that the two tuples are not equal and so the hash value produced doesn’t have to be equal and so our overall hash function works.

If you try this out you will find that two trees that test equal always give the same hash value. but trees that differ even in the order of their values give different values with a high probability:

print(tree1==tree2)
print(hash(tree1))
print(hash(tree2))
tree1[2],tree1[3]=tree1[3],tree1[2]
print(tree1==tree2)
print(hash(tree1))
print(hash(tree2))

The first test proves True and the two trees are equal and hence so are the hash functions. Then the values in nodes 2 and 3 are swapped in tree1 and the two trees test as not-equal and the two hash values are different – but tree2 still has the same hash value.

Even though Tree is not immutable, you can now write:

d={tree1:"tree1",tree2:"tree2"}
print(d[tree1])
print(d[tree2])

If tree1 and tree2 are not equal you will see tree1, tree2 displayed, but if they are equal you will see tree2, tree2.

In chapter but not in extract

Abstract Classes, Duck Typing and Interfaces
Abstract Base Classes ABC
Abstract Properties
Registering and Defining Subclasses
Using ABC – numbers and collections
Dynamically Adding Attributes

<< Prev - Next >>

Last Updated ( Monday, 15 January 2024 )