Bytes are the most primitive of data types and hence universal, but can you manipulate them? Find out how it all works in this extract from my new book Programmer's Python: Everything is Data.
Programmer's Python Everything is Data
Is now available as a print book: Amazon
Contents
- Python – A Lightning Tour
- The Basic Data Type – Numbers
Extract: Bignum
- Truthy & Falsey
- Dates & Times
Extract Naive Dates
- Sequences, Lists & Tuples
Extract Sequences
- Strings
Extract Unicode Strings
- Regular Expressions
- The Dictionary
Extract The Dictionary
- Iterables, Sets & Generators
Extract Iterables
- Comprehensions
Extract Comprehensions
- Data Structures & Collections
Extract Stacks, Queues and Deques
Extract Named Tuples and Counters
- Bits & Bit Manipulation
Extract Bits and BigNum
- Bytes
Extract Bytes And Strings
Extract Byte Manipulation
- Binary Files
- Text Files
- Creating Custom Data Classes
Extract A Custom Data Class
- Python and Native Code
Extract Native Code
- Appendix I Python in Visual Studio Code
- Appendix II C Programming Using Visual Studio Code
In chapter but not in this extract
- Bytes
- Bytes and Bytearray
- Bytes As Strings
- Decode Encode
Byte Manipulation
The need to perform bit manipulation on multiple bytes is a common requirement. There are two ways to approach this problem. We could convert the bytes to a single bignum representation, perform the bitwise operation and then convert back. Alternatively we could process the sequence directly, using for loops, to produce a new sequence.
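The second approach needs nothing more than a loop over the bytes. Here is a minimal sketch, assuming the operation is a bitwise AND with a mask of the same length; and_bytes is a hypothetical helper name, and zip silently stops at the shorter sequence if the lengths differ.

```python
# Sketch of the "process the sequence directly" approach: AND each byte
# with the corresponding byte of a mask. and_bytes is a hypothetical name.

def and_bytes(data: bytes, mask: bytes) -> bytes:
    result = bytearray()          # mutable, so we can append as we go
    for x, m in zip(data, mask):  # pair up corresponding bytes
        result.append(x & m)      # x & m is always in the range 0..255
    return bytes(result)

data = bytes([0xFF, 0xAA, 0x55])
mask = bytes([0x0F, 0xF0, 0xFF])
print(and_bytes(data, mask).hex())  # 0fa055

# The same loop written as a generator expression
print(bytes(x & m for x, m in zip(data, mask)).hex())  # 0fa055
```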
If you want to convert a byte sequence to a bignum you can use the from_bytes class method:
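int.from_bytes(bytes, byteorder, *, signed=False)
where bytes is a bytes or bytearray object (any bytes-like object, or an iterable of values in the range 0 to 255, also works) and byteorder, which can be 'big' or 'little', determines the order in which the bytes are combined to create the integer. From Python 3.11 byteorder defaults to 'big'; in earlier versions it must always be given.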
This matter of order is something we have been able to ignore up to this point, but no longer. The problem is: where is the most significant byte, at the start of the sequence or at the end? This is the well-known “endian” problem and it is a fundamental choice in computer architecture. Each byte is generally stored in its own memory location, but to make use of a multi-byte value you have to assemble the bytes into a single bit pattern, and there are two ways of doing this: most significant byte first or least significant byte first. For example, consider:
myBytes=bytes([0xAA,0x55])
as a possible representation of a two-byte integer. Our two choices are to take the first element as the most significant byte:
(myBytes[0] << 8) | myBytes[1] == 0xAA55
this is big endian, or we could take the last element as the most significant byte:
(myBytes[1] << 8) | myBytes[0] == 0x55AA
which is little endian. You can see that the selection of big or little endian produces two very different integer values and two very different bit patterns.
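To see exactly what the two orders mean, the following sketch assembles an integer by hand, shifting in one byte at a time, and checks the result against int.from_bytes. The names combine_big and combine_little are hypothetical, chosen for illustration.

```python
def combine_big(data: bytes) -> int:
    """First byte is the most significant (big endian)."""
    value = 0
    for b in data:
        value = (value << 8) | b  # make room, then add the next byte
    return value

def combine_little(data: bytes) -> int:
    """Last byte is the most significant (little endian)."""
    value = 0
    for b in reversed(data):      # walk from the most significant end
        value = (value << 8) | b
    return value

myBytes = bytes([0xAA, 0x55])
print(hex(combine_big(myBytes)))     # 0xaa55
print(hex(combine_little(myBytes)))  # 0x55aa
# The built-in conversion agrees with both hand-written versions
print(combine_big(myBytes) == int.from_bytes(myBytes, byteorder='big'))        # True
print(combine_little(myBytes) == int.from_bytes(myBytes, byteorder='little'))  # True
```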
The endian problem occurs whenever you have to put a sequence of bytes, or other discrete bit patterns, together to form a larger bit pattern. For example:
myBytes = bytes([0xFF,0xAA,0x55])
bits = int.from_bytes(myBytes,byteorder = 'big')
print(hex(bits))
displays:
0xffaa55
and changing to byteorder='little' displays:
0x55aaff
If you want to use the byte order that the current machine uses for its memory access, specify byteorder=sys.byteorder, remembering to import sys first.
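A short sketch of using the machine's native order. sys.byteorder is either 'little' or 'big', so the value printed depends on the hardware the code runs on.

```python
import sys

# 'little' on x86, for example; 'big' on big-endian hardware
print(sys.byteorder)

myBytes = bytes([0xFF, 0xAA, 0x55])
native = int.from_bytes(myBytes, byteorder=sys.byteorder)
# 0x55aaff on a little-endian machine, 0xffaa55 on a big-endian one
print(hex(native))
```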
To convert the bignum back to a bytes object you can use the to_bytes int method:
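int.to_bytes(length, byteorder, *, signed=False)
Again you have to specify the byteorder and the number of bytes in the result, which is a bytes object rather than a bytearray. From Python 3.11 length defaults to 1 and byteorder to 'big'; in earlier versions both must be given. For example: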
myBytes=bits.to_bytes(3,byteorder='big')
print(myBytes)
displays:
b'\xff\xaaU'
The need to specify the number of bytes is irritating because if you ask for too few, so that the integer cannot be represented in that many bytes, an OverflowError is raised. To generate as many bytes as needed you can use the int method bit_length, which returns the number of bits needed to represent the value in binary, ignoring the sign and any leading zeros. To convert this into the number of bytes needed to accommodate this number of bits we can use:
(bit_length()+7)//8
Using this we can rewrite the previous example as:
myBytes = bits.to_bytes((bits.bit_length()+7)//8, byteorder = 'big')
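For a value of zero, bit_length() returns 0, so the expression above asks for zero bytes and to_bytes returns an empty bytes object. The following sketch wraps the calculation in a hypothetical helper, to_minimal_bytes, that forces at least one byte (a design choice, not something to_bytes does for you), and also shows the OverflowError you get when asking for too few bytes.

```python
def to_minimal_bytes(value: int, byteorder: str = 'big') -> bytes:
    """Convert a non-negative int to the fewest bytes that hold it.

    Hypothetical helper: uses (bit_length() + 7) // 8 but never asks
    for fewer than one byte, so zero becomes a single zero byte.
    """
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, byteorder=byteorder)

bits = 0xFFAA55
print(to_minimal_bytes(bits))  # b'\xff\xaaU'
print(to_minimal_bytes(0))     # b'\x00'

# Asking for too few bytes raises OverflowError
try:
    bits.to_bytes(2, byteorder='big')
except OverflowError:
    print('OverflowError: 0xffaa55 needs 3 bytes')
```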
Finally we have to deal with the problem of negative values. In most cases you can ignore this because you are only interested in working with bit patterns, and bit patterns are usually extended using zero bits. The only exception is when the bit pattern really is an integer value in two’s complement form.
When converting from bytes to bignums, setting the signed parameter to True makes from_bytes interpret the bytes as a two's complement value: if the most significant bit is 1, the result is negative. As a side effect, any leading ones appear to vanish, because they are treated as sign bits and Python displays a negative integer as a minus sign followed by its magnitude. For example:
myBytes=bytes([0xFF,0xAA,0x55])
bits=int.from_bytes(myBytes,byteorder='big',signed=True)
print(hex(bits))
displays:
-0x55ab
which, in two's complement, is equivalent to:
FFFF AA55
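with as many leading ones as required by the operation. Notice that, as far as bitwise operations are concerned, the bit pattern isn’t changed when stored in the bignum: Python treats a negative integer as if it had an unlimited number of leading ones, so the low 24 bits are still FF AA 55.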
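The following sketch checks this interpretation: with signed=True, a sequence of n bytes whose top bit is set is read as its unsigned value minus 2**(8*n), and masking the negative result with the right number of one bits recovers the stored pattern.

```python
myBytes = bytes([0xFF, 0xAA, 0x55])
bits = int.from_bytes(myBytes, byteorder='big', signed=True)
print(hex(bits))  # -0x55ab

# Top bit is set, so the signed value is the unsigned value minus 2**(8*n)
n = len(myBytes)
unsigned = int.from_bytes(myBytes, byteorder='big')
print(bits == unsigned - (1 << (8 * n)))  # True

# Masking with 24 one bits recovers the original pattern
print(hex(bits & 0xFFFFFF))    # 0xffaa55
# A 32-bit mask shows the extra leading ones
print(hex(bits & 0xFFFFFFFF))  # 0xffffaa55
```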
Going the other way, from an integer to a bytes object, works in much the same way, but if you try to convert a negative integer without signed=True an OverflowError is raised, because a negative integer can only be represented in two's complement. For example:
bits=-1
myBytes=bits.to_bytes(1,byteorder='big',signed=True)
print(myBytes.hex())
displays ff, because -1 is ff in 8-bit two's complement.
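A few more cases, as a sketch: a wider signed conversion sign-extends with ff bytes, an unsigned conversion of a negative value fails, and masking first is one way to get the raw bit pattern without using signed=True. Note that signed=True also narrows the range that fits, so 128 no longer fits in a single byte.

```python
bits = -1
print(bits.to_bytes(1, byteorder='big', signed=True).hex())  # ff
print(bits.to_bytes(2, byteorder='big', signed=True).hex())  # ffff (sign-extended)

# Without signed=True a negative value cannot be converted
try:
    bits.to_bytes(1, byteorder='big')
except OverflowError:
    print('OverflowError')

# To get just the bit pattern, mask to the width and convert unsigned
print((bits & 0xFF).to_bytes(1, byteorder='big').hex())  # ff

# A signed byte holds -128..127, so 128 does not fit
try:
    (128).to_bytes(1, byteorder='big', signed=True)
except OverflowError:
    print('OverflowError')
```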
In most cases when doing byte manipulation you can ignore problems with negative numbers because you can treat everything as positive integers.
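Putting the pieces together, here is a minimal sketch of the bignum approach for two whole-sequence operations, inversion and left shift. The helper names are hypothetical. The mask is what keeps everything positive: Python's ~ operator returns a negative int and a left shift can grow the value past n bytes, but ANDing with n bytes' worth of one bits brings the result back into range for an unsigned to_bytes.

```python
def _mask(n: int) -> int:
    """An int with 8*n one bits, i.e. n bytes' worth."""
    return (1 << (8 * n)) - 1

def invert_bytes(data: bytes) -> bytes:
    n = len(data)
    value = int.from_bytes(data, byteorder='big')
    # ~value is negative; the mask keeps only the low 8*n bits
    return (~value & _mask(n)).to_bytes(n, byteorder='big')

def shift_left_bytes(data: bytes, k: int) -> bytes:
    n = len(data)
    value = int.from_bytes(data, byteorder='big')
    # Bits shifted past the top byte are discarded by the mask
    return ((value << k) & _mask(n)).to_bytes(n, byteorder='big')

myBytes = bytes([0xFF, 0xAA, 0x55])
print(invert_bytes(myBytes).hex())         # 0055aa
print(shift_left_bytes(myBytes, 4).hex())  # faa550
```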