Programmer's Python Data - Native Code

Written by Mike James

Monday, 20 March 2023

Working with C structs is the most challenging part of working with C from Python, but the Structure ctype makes things easier than you might expect. You can create arrays of Structures and you can include other Structures as fields within a new Structure. It all works, and to understand it all you have to keep in mind is that the Structure class simply converts each of its fields into a byte sequence and stores it in its buffer. The buffer is then transferred to the C program, i.e. passed by value, for it to work with.

One complication that you need to be aware of is that the fields in a struct often need to be padded with additional bytes to make them line up on address boundaries. The ctypes module will make use of the default alignments and byte order, but you can override this if you need to. However, this isn’t a common requirement. All that really matters is that you realize that the byte sequence produced by ctypes Structure might not be what you expect due to additional padding bytes being added.
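A small experiment makes the effect of padding visible. This is only a sketch: the class names Padded and Packed are invented for illustration, and the sizes in the comments are what you would typically see on mainstream desktop platforms where a 32-bit integer is aligned on a four-byte boundary:

```python
import ctypes

class Padded(ctypes.Structure):
    # Default (native) alignment, as a C compiler would normally lay it out
    _fields_ = [("flag", ctypes.c_char),
                ("value", ctypes.c_int32)]

class Packed(ctypes.Structure):
    # _pack_ overrides the default alignment: no padding between fields
    _pack_ = 1
    _fields_ = [("flag", ctypes.c_char),
                ("value", ctypes.c_int32)]

print(ctypes.sizeof(Padded), Padded.value.offset)   # typically 8 4
print(ctypes.sizeof(Packed), Packed.value.offset)   # 5 1
```

The one-byte flag field is followed by three padding bytes in the default layout, so the byte sequence sent to C is longer than the sum of the field sizes.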

A more common complication is the need to pass the entire struct by reference, that is pass a pointer to a struct. In this case the C function would read:

__declspec(dllexport) float updateScore(struct Person*);
float updateScore(struct Person *p)
{
    p->name[0]='m';
    p->score=p->score+1.0;
    return p->score;
}

and the only change to the Python program is the need to pass the Structure by reference:

me=Person(b"Mike",42,3.4)
lib.updateScore.restype=ctypes.c_float
s=lib.updateScore(ctypes.byref(me))

If you run the modified program you should find the same results.
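If you don't have the compiled library to hand, you can still see what passing by reference means by working through a pointer in pure Python. The following is a sketch only. It assumes a Person layout with a 25-character name array, which may not match the definition used earlier in the chapter, and it uses ctypes.pointer to mimic what updateScore does on the C side:

```python
import ctypes

# Assumed layout - the chapter's own Person definition may differ
class Person(ctypes.Structure):
    _fields_ = [("name", ctypes.c_char * 25),
                ("id", ctypes.c_int),
                ("score", ctypes.c_float)]

me = Person(b"Mike", 42, 3.4)

# pointer() creates a full pointer object; byref() is a lighter-weight
# alternative that can only be used as a function argument
p = ctypes.pointer(me)

# Do what updateScore does, but via the pointer from Python
p.contents.name = b"m" + p.contents.name[1:]
p.contents.score += 1.0

print(me.name)    # b'mike'
print(me.score)   # approximately 4.4
```

The changes made through the pointer show up in me because both refer to the same buffer, which is exactly what happens when the C function modifies the struct it was passed a pointer to.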

As well as Structure, you can use Union to define storage that can be treated as one of a number of possible structures. This is one of the many ways that C reuses memory in different forms. Essentially, a Union follows the same form as a Structure, but each of its fields is a different data type and all of the Union fields share the same area of memory. The area of memory used is the size of the largest of the data types and, of course, only one of the data types can be stored at any given time. The idea is that the same block of memory, and hence the same variable, can be used for any of the data types. Which data type you get depends on which field you reference and this also gives us a way to convert between representations. For example, we can define a Union that has an int and a float field:

class Conv(ctypes.Union):
    _fields_=[("integer",ctypes.c_int32),
              ("float",ctypes.c_float)]

Both c_int32 and c_float use four bytes so the Union is four bytes in size. You can now create an instance and store a value in one of the fields:

conv=Conv()
conv.integer=42
print(conv.integer)
print(conv.float)

displays:

42
5.885453550164232e-44

In other words, the bit pattern for 42 represents a very small floating point number.
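You can check that nothing more than a reinterpretation of the same four bytes is going on by doing the conversion a second way. This sketch uses the standard struct module, in native byte order, as an independent check on the Conv union defined above:

```python
import ctypes
import struct

class Conv(ctypes.Union):
    _fields_ = [("integer", ctypes.c_int32),
                ("float", ctypes.c_float)]

conv = Conv()
conv.integer = 42

# Pack 42 as a 32-bit int, then unpack the same bytes as a 32-bit float
via_struct = struct.unpack("=f", struct.pack("=i", 42))[0]
print(conv.float == via_struct)   # True

# The conversion works in the other direction as well
conv.float = 1.0
print(hex(conv.integer))          # 0x3f800000
```

The second result is the familiar IEEE 754 single-precision bit pattern for 1.0.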

Unions are not usually used for representation conversion. More often they are used because a function can process either a simple data type or a Structure, and a Union lets a single field support multiple meanings. For example, you might have a Union that can store either an integer or a float score:

class Score(ctypes.Union):
    _fields_=[("integer",ctypes.c_int32),
              ("float",ctypes.c_float)]

then you can create a Person structure that has a score that is either integer or float:

class Person(ctypes.Structure):
    _fields_= [("name",ctypes.c_char_p),
               ("id",ctypes.c_int),
               ("score",Score)]

You can use this to store either sort of score:

me=Person(b"Mike",42,Score(float=3.14))
print(me.score.float)

There has to be some way for the program to know which interpretation to use and this is usually based on some other field in the structure. For example perhaps people with ids>100 have float scores.
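A sketch of how this might look in Python follows. The helper read_score is hypothetical and the rule that ids above 100 have float scores is just the example convention mentioned above, not anything ctypes provides:

```python
import ctypes

class Score(ctypes.Union):
    _fields_ = [("integer", ctypes.c_int32),
                ("float", ctypes.c_float)]

class Person(ctypes.Structure):
    _fields_ = [("name", ctypes.c_char_p),
                ("id", ctypes.c_int),
                ("score", Score)]

def read_score(person):
    # Hypothetical convention: ids above 100 carry float scores
    if person.id > 100:
        return person.score.float
    return person.score.integer

a = Person(b"Mike", 42, Score(integer=7))
b = Person(b"Ann", 101, Score(float=2.5))
print(read_score(a))   # 7
print(read_score(b))   # 2.5
```

Reading the wrong field doesn't raise an error; you simply get the other interpretation of the same bytes, which is why the choice has to be driven by something the program can test.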

In general, you don’t have to implement a Union to call a C function that uses one. The reason is that for a given call only one of the types defined in the Union is used. In this case, you can simply pass the type that is going to be used.

For example, suppose we have the C struct:

struct Person{
    char *name;
    int  id;
    union{int myinteger;
          float myfloat;
          } score;
     };

You can see that the score field is now a union of an int and a float. We could use the previous ctypes Structure which included a Union, but as we know that the function we are calling is going to treat Person.score as a float, we can use a Structure definition without a Union:

class Person(ctypes.Structure):
    _fields_= [("name",ctypes.c_char_p),
               ("id",ctypes.c_int),
               ("score",ctypes.c_float)]
me=Person(b"Mike",42,3.14)
print(me.score)

This works and it works even if the C function decides to treat the float as an int. The result might not make sense, but all the C function needs is four bytes.
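To convince yourself that the two definitions really are interchangeable as far as the C function is concerned, you can compare their layouts. In this sketch the names PersonUnion and PersonFloat are invented so that both versions can exist side by side:

```python
import ctypes

class Score(ctypes.Union):
    _fields_ = [("integer", ctypes.c_int32),
                ("float", ctypes.c_float)]

class PersonUnion(ctypes.Structure):
    _fields_ = [("name", ctypes.c_char_p),
                ("id", ctypes.c_int),
                ("score", Score)]

class PersonFloat(ctypes.Structure):
    _fields_ = [("name", ctypes.c_char_p),
                ("id", ctypes.c_int),
                ("score", ctypes.c_float)]

# Same overall size and the score field starts at the same offset
print(ctypes.sizeof(PersonUnion) == ctypes.sizeof(PersonFloat))    # True
print(PersonUnion.score.offset == PersonFloat.score.offset)        # True

# The same value stored either way reads back identically
u = PersonUnion(b"Mike", 42, Score(float=3.14))
f = PersonFloat(b"Mike", 42, 3.14)
print(u.score.float == f.score)                                    # True
```

As both versions put four bytes for the score in the same place, the C function cannot tell which one was used to build the buffer.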

There are a few more variations in ways of using structures, but this covers most of the things you will encounter. One complication that is worth looking at is how to define structures that contain pointers to structures.

In chapter but not in this extract

Pointers
Callbacks
Memory Manipulation
Error Handling
Calling System Functions
Windows
Linux

The General Approach

In practice, working out how to call a function and how to process its return value is a matter of stepwise refinement. The definitions of functions given in the C documentation generally contain types and macros that are not defined at the same location. In fact, they are sometimes not even defined in the documentation because they are simply declared as system-dependent. You can generally work out what they are by searching the documentation, but if this fails, locating and reading the C header files where they are defined is the surest way to a correct definition.

Even when you know the definition, C programmers are capable of creating data types that can be baffling to non-C programmers – pointers to pointers, arrays of pointers, arrays of structs that contain pointers, and so on. In all cases the key idea is that even the most complex type definition has to reduce to a sequence of bytes with a very simple meaning. Considering what the memory layout should be and what each group of bytes means is the surest way of making the function call work.
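One way to put this into practice is to ask ctypes where it has placed each field and compare the result with what the C header implies. The following is a minimal sketch: the Point and Shape structures and the layout helper are made up for the purpose and it only handles ordinary fields, not bit fields:

```python
import ctypes

class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int32),
                ("y", ctypes.c_int32)]

class Shape(ctypes.Structure):
    _fields_ = [("kind", ctypes.c_uint8),
                ("origin", Point),
                ("next", ctypes.c_void_p)]

def layout(struct_type):
    # Return (name, offset, size) for each field of a Structure class
    rows = []
    for name, *_ in struct_type._fields_:
        field = getattr(struct_type, name)
        rows.append((name, field.offset, field.size))
    return rows

for name, offset, size in layout(Shape):
    print(f"{name:8} offset={offset:2} size={size}")
print("total:", ctypes.sizeof(Shape))
```

If the offsets and total size agree with what the C compiler produces for the same struct, you can be reasonably confident that the byte sequence you are passing means what the function expects.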

Summary

Python often needs to use existing code written in other languages, usually C, and this is achieved using the ctypes module.
The code that the ctypes module works with is stored in a shared library – a DLL under Windows and a .so file under Linux.
Before you can use a shared library you have to load it using one of the library classes. If the library loads successfully, you can use the functions it exports via the new attributes of the class.
The ctypes module provides a range of classes that connect Python data types to C data types.
The Python data is usually stored in the value attribute and the C data is stored in an internal buffer ready to be sent to any C functions you call.
As well as the basic data types, ctypes also provides ways of creating classes that wrap C arrays, strings, structs and unions.
Arrays in C are passed as pointers, so a ctypes array is passed to a C function by reference, either directly or using byref.
Working with strings in C is particularly challenging because C doesn’t automatically handle Unicode. You have to select between char and wchar_t and implement your own encoding/decoding technique.
Structures are passed by value, but you can pass fields that are pointers or a pointer to the entire structure.
You can also create Pointers to pass the address of buffers to C functions.
Callbacks are pointers to functions which the C function can call at a later time.
Under Windows, C errors are converted to Python exceptions.
As well as calling custom C functions, you can also call system-provided functions and here the problem is usually trying to work out the data types in use.
In principle, you can always arrange to call a C function from Python because its parameters are always nothing but byte sequences. As long as you construct a meaningful byte sequence, you can pass it to the function.

Programmer's Python
Everything is Data

Is now available as a print book: Amazon

Last Updated ( Wednesday, 22 March 2023 )
