Programmer's Python Data - Bits and BigNum

Written by Mike James

Monday, 25 March 2024


Bits are at the bottom of it all, but Python is high-level, so how do you work with bits in Python? Find out how it all works in this extract from Programmer's Python: Everything is Data.


Bits & Bit Manipulation

In previous chapters we have focused on sophisticated high-level data abstractions, but while it is true to say that in Python everything is an object, it is universally true that in programming everything is a bit pattern.

What exactly does this mean?

The Bit Pattern

At the lowest level computers work in binary. More accurately they work with just two states signified by 0 and 1. Any data stored in the computer has to be in terms of a pattern of 0s and 1s – this is the only possibility.

We often think of these patterns as binary numbers, but this is just convention. In reality a bit pattern can be used to represent anything you want. For example, you could designate the bit pattern 101 as representing the letter A. You can also read the same pattern as a binary number, in which case it is 5 in decimal. This is often assumed to be the more fundamental interpretation of the bit pattern 101, but only because most computers have hardware that performs binary arithmetic.

This fact leads us to think that binary numbers are somehow fundamental, but it would be perfectly possible, if not reasonable, to build a computer with no hardware dedicated to binary arithmetic. Then we would have to write programs that manipulated bit patterns as if they were numbers, just as we already write programs that manipulate bit patterns as if they were text or the states of a set of on/off switches.

The fundamental data entity is the bit pattern and what makes it useful is how we decide to interpret and manipulate it.
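To make the point concrete, here is a small sketch that reads the one pattern 101 three ways. The code table that maps 101 to the letter A is hypothetical, standing in for the arbitrary designation described above; int(string, 2) supplies the binary-number reading.

```python
# The same three-bit pattern read three different ways.
pattern = "101"  # the raw bit pattern, written as a string of 0s and 1s

# 1. As a binary number: int() with base 2 applies the binary interpretation.
as_number = int(pattern, 2)

# 2. As the states of three on/off switches, leftmost character first.
as_switches = [bit == "1" for bit in pattern]

# 3. As a character, using a made-up code table (hypothetical, illustration only).
made_up_code = {"101": "A"}
as_letter = made_up_code[pattern]

print(as_number)    # 5
print(as_switches)  # [True, False, True]
print(as_letter)    # A
```

Nothing in the bits themselves prefers one reading over another; only the code that consumes them decides.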

Hexadecimal

Having said that bit patterns are not necessarily binary numbers, it has to be added that we often do make use of the correspondence to specify or communicate a bit pattern. That is, the bit pattern 101 is often treated as if it was identical to the decimal number 5 – because this is the bit pattern that represents 5 in binary. As a result we spend a lot of time learning about binary representations and how to perform binary arithmetic. While this is important there is a much better way to specify bit patterns using hexadecimal numbers.

The basic idea is that instead of counting up to 9 with unique symbols we count up to 15 using the symbols:

0 1 2 3 4 5 6 7 8 9 A B C D E F

There is no particular reason for using A to F but then again there is no particular reason for using 0 to 9 apart from custom and practice – any symbols would do the job.

The reason hex is important is that you can convert each hex symbol into a unique 4-bit pattern:

0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
0000	0001	0010	0011	0100	0101	0110	0111	1000	1001	1010	1011	1100	1101	1110	1111

If you know binary you will recognize each bit pattern as the binary number that corresponds to the hexadecimal symbol. That is, hex 5 is 0101 which is also 5 in decimal. The hex value E is 1110 which is the binary representation of 14.
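The table above can be regenerated with Python's built-in format() function: the "X" format gives the upper-case hex digit and "04b" gives the value in binary, padded with leading zeros to four places. This is just one convenient way to produce it.

```python
# Build the hex-digit to 4-bit-pattern table.
table = {format(value, "X"): format(value, "04b") for value in range(16)}

for digit, bits in table.items():
    print(digit, bits)  # e.g. "5 0101" and "E 1110"
```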

You can use hex symbols to specify a bit pattern of any length simply by concatenating them. For example, suppose you want to specify the bit pattern 0111100. Starting from the rightmost four bits, 1100, we can use the table to work out that this is C. The remaining bits are 011, which we pad on the left with a zero to make a group of four, 0011, and this corresponds to 3. So the bit pattern is specified in hex as 0x3C. The leading 0x is the standard way of writing a hex literal in many languages, including Python. This works in both directions, so any sequence of hex symbols specifies a bit pattern. For example, 0xABCDEF specifies:

101010111100110111101111
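Because each hex digit stands for exactly four bits, expanding hex into a bit pattern is pure concatenation. The helper below, hex_to_bits, is a hypothetical name used only for this sketch; it reproduces the 24-bit pattern above and the 0x3C example.

```python
# Expand a string of hex digits into its bit pattern, one digit (4 bits) at a time.
def hex_to_bits(hex_digits):
    # Each hex digit maps to exactly four bits, so simply concatenate the groups.
    return "".join(format(int(d, 16), "04b") for d in hex_digits)

print(hex_to_bits("3C"))      # 00111100
print(hex_to_bits("ABCDEF"))  # 101010111100110111101111
print(0x3C == 0b0111100)      # True: same bit pattern, two notations
```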

Converting Binary

Hex makes specifying bit patterns much easier than alternatives such as decimal. After all, what is the decimal representation of 0111100? To answer this question you have to convert the whole value, bit by bit, to get 60. You can't convert binary to decimal, or vice versa, a fixed number of bits or digits at a time. That shortcut only works if the base you are converting to is a power of two, so it works for octal (base eight) and hex (base sixteen).
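The contrast can be shown in a few lines. The hypothetical helper bits_to_hex below pads a binary string on the left to a multiple of four bits and converts each group independently, which is exactly the shortcut that decimal lacks; for decimal we fall back on converting the whole value with int().

```python
# Convert a binary string to hex four bits at a time (right-aligned groups).
def bits_to_hex(bits):
    # Pad on the left with zeros so the length is a multiple of four.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    # Each group converts on its own, independently of the others.
    return "".join(format(int(g, 2), "X") for g in groups)

print(bits_to_hex("0111100"))  # 3C
# Decimal has no such per-group mapping: the whole value must be converted.
print(int("0111100", 2))       # 60
```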

If you want to use octal literals, start the value with 0o, where the second character is the lower-case letter o. That is, 0o7 is 7 in octal, and 0o100 is 100 in octal, which is 64 in decimal.
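A quick check of the octal examples. The built-in oct() is the octal counterpart of hex() and bin(), returning a string complete with the 0o prefix.

```python
a = 0o7
b = 0o100
print(a, b)    # 7 64
print(oct(b))  # 0o100
```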

Octal has the same property as hex in being used to specify bit patterns, but in this case each octal symbol determines three bits:

0	1	2	3	4	5	6	7
000	001	010	011	100	101	110	111

Octal can be easier to use in some situations, but notice that you need more octal symbols than hex symbols to specify a given bit pattern. Similarly, you can specify a binary literal using the prefix 0b. For example, 0b101 is decimal 5.
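This sketch checks the three-bits-per-octal-digit rule, the 0b prefix, and the point about length. The 12-bit value is an arbitrary illustration: it needs four octal digits but only three hex digits.

```python
# Each octal digit stands for exactly three bits: 5 -> 101, 2 -> 010.
print(bin(0o52))  # 0b101010
print(0b101)      # 5

# The same 12-bit pattern in octal (groups of 3) and hex (groups of 4).
value = 0b101011001111
print(oct(value), hex(value))  # 0o5317 0xacf
```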

You might think that the easiest way to specify a bit pattern is to use a binary literal, but typing a lot of zeros and ones is very error-prone. Hex is a shorter representation and hence easier to type in correctly and easier to verify.
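One aid that goes beyond the text: since Python 3.6 (PEP 515) you can put underscores between the digits of a numeric literal, which makes long binary literals easier to check, and the "_b" format specifier inserts an underscore every four binary digits on output. The mask value here is arbitrary.

```python
# Underscores group the bits into nibbles; they do not change the value.
mask_bin = 0b1111_0000_1010_0101
mask_hex = 0xF0A5
print(mask_bin == mask_hex)    # True
print(format(mask_hex, "_b"))  # 1111_0000_1010_0101
```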

Bit Patterns In Python

In most languages the standard way to work with bit patterns is to use a fixed-size integer. For example, in C a typical integer variable is 16 or 32 bits, and this means every bit pattern you work with is exactly 16 or 32 bits long. This leads not so much to problems as to a particular way of thinking. You tend to think of working with bits at a fixed size, converting down if fewer bits are needed or combining multiple variables if more bits are needed. This makes working with bits more difficult until you get used to it.

Python, on the other hand, doesn’t have a fixed-size integer. Instead, as discussed in Chapter 2, the storage allocated to an integer, referred to as a “bignum”, grows as needed to store the value it holds. As a result you can do integer arithmetic in Python without worrying about overflow until you run out of memory to hold all of the bits of the result.
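A short sketch of what "grows as needed" means in practice. int.bit_length() reports how many bits the value occupies. The last two lines are only one possible way, assumed here for illustration, of imitating a 32-bit unsigned C integer: discarding the high bits by taking the remainder modulo 2**32.

```python
# Python ints never overflow: 2**100 is held exactly.
big = 2 ** 100
print(big)               # 1267650600228229401496703205376
print(big.bit_length())  # 101

# Imitating a 32-bit unsigned wrap-around has to be done explicitly.
wrapped = (2 ** 32 + 5) % 2 ** 32
print(wrapped)           # 5
```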

You can also use hex and binary literals to set up bit patterns using the bignum representation. As a result you can set up and work with bit patterns of any size, limited only by the amount of memory the machine has. If you are familiar with other languages and their use of bit patterns, this is very different. If you want to limit the number of bits you are working with, you have to do so explicitly, as will be explained later. What all this means is that when you are working with bit patterns you have an effectively unlimited number of bits. For example:

a = 0xAAAAAAAAAAAAAAAAAA
print(a)
print(hex(a))
print(bin(a))

displays:

3148244321913096809130
0xaaaaaaaaaaaaaaaaaa
0b101010101010101010101010101010101010101010101010101010101010101010101010

As you would expect, each hex A is 1010, which makes a run of As a convenient way to create an alternating bit pattern for testing.
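Because each A contributes 1010, an alternating pattern of any length can be built from a string of As. The helper name alternating is hypothetical; the sketch also shows format(value, "b"), which gives the binary digits without the 0b prefix that bin() adds.

```python
# Build an alternating 1010... bit pattern from n hex digits.
def alternating(n_hex_digits):
    # Each "A" contributes the four bits 1010.
    return int("A" * n_hex_digits, 16)

print(alternating(18))              # 3148244321913096809130, as above
print(bin(alternating(2)))          # 0b10101010
print(format(alternating(2), "b"))  # 10101010 (no prefix)
```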

Notice that hex(value) converts a bignum to a hex string and bin(value) converts it to a binary string. Both return strings, not numeric values, but the strings are correctly formatted as literals, including the 0x or 0b prefix. To convert a correctly formatted literal string back to an integer you can use the function:

int(string, base)

This converts the string to an integer assuming that it is a representation using the specified base. If you specify zero for the base then the format of the string determines the base. For example:

a = 0xAAAAAAAAAAAAAAAAAA
print(a)
print(int(hex(a),0))
print(int(bin(a),0))

displays 3148244321913096809130 three times. If the string isn’t a valid literal with the prefix 0x, 0o or 0b, then you have to specify the base explicitly, i.e. 16, 8 or 2.
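A sketch of the explicit-base case. A string of bare hex digits, such as the one format() produces, needs base 16; with an explicit base of 16, int() also accepts an optional 0x prefix, while base 0 insists on a prefix and raises ValueError without one.

```python
s_hex = "aaaa"               # bare hex digits, no 0x prefix
print(int(s_hex, 16))        # 43690
print(int("0xaaaa", 16))     # 43690: the prefix is optional when the base is 16
print(int("0b1010", 0))      # 10: base 0 reads the base from the prefix

try:
    int(s_hex, 0)            # no prefix, so base 0 cannot work out the base
except ValueError as err:
    print("ValueError:", err)

# format() without a prefix pairs naturally with an explicit base.
digits = format(43690, "x")
print(digits, int(digits, 16) == 43690)  # aaaa True
```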


Last Updated ( Tuesday, 26 March 2024 )
