Programmer's Python Data - Native Code
Written by Mike James   
Monday, 20 March 2023

Working with C structs is the most challenging part of working with C from Python, but the ctypes Structure class makes things easier than you might expect. You can create arrays of Structures and you can include other Structures as fields within a new Structure. It all works, and to understand it all you have to keep in mind is that the Structure class simply converts each of the fields into a byte sequence and stores it in its buffer. The buffer is then transferred to the C program, i.e. passed by value, for it to work with.

One complication that you need to be aware of is that the fields in a struct often need to be padded with additional bytes to make them line up on address boundaries. The ctypes module uses the platform's default alignment and byte order, but you can override this if you need to. However, this isn’t a common requirement. All that really matters is that you realize that the byte sequence produced by a ctypes Structure might not be what you expect because padding bytes have been added.
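The effect of padding is easy to see by comparing the size of a Structure with the total size of its fields. The following is a minimal sketch. The Padded class and its field names are invented for illustration, and the figures in the comments assume a typical platform where a 32-bit int is aligned on a four-byte boundary:

```python
import ctypes

class Padded(ctypes.Structure):
    # A one-byte field followed by a four-byte field
    _fields_ = [("flag", ctypes.c_char),
                ("value", ctypes.c_int32)]

# Total size of the fields taken on their own
field_bytes = ctypes.sizeof(ctypes.c_char) + ctypes.sizeof(ctypes.c_int32)

print(field_bytes)             # 5
print(ctypes.sizeof(Padded))   # 8 with default alignment
print(Padded.value.offset)     # 4, so three padding bytes follow flag
```

The offset attribute of a field descriptor is a convenient way to check that the Python layout agrees with what the C compiler produces.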

A more common complication is the need to pass the entire struct by reference, that is, to pass a pointer to the struct. In this case the C function would read:

__declspec(dllexport) float updateScore(struct Person*);
float updateScore(struct Person *p)
{
    /* p points at the caller's struct, so these changes are seen by the caller */
    p->name[0]='m';
    p->score=p->score+1.0;
    return p->score;
}

and the only change to the Python program is the need to pass the Structure by reference:

me=Person(b"Mike",42,3.4)
lib.updateScore.restype=ctypes.c_float
s=lib.updateScore(ctypes.byref(me))

If you run the modified program you should find the same results.
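If you don't have the compiled library to hand, you can still see the effect of passing by reference by letting a Python callback stand in for the C function. This is only a sketch: the Person layout with a 25-character name array is an assumption, as is the update_score stand-in, which mimics what the C updateScore does:

```python
import ctypes

class Person(ctypes.Structure):
    # Assumed layout: the name is stored inline as a char array
    _fields_ = [("name", ctypes.c_char * 25),
                ("id", ctypes.c_int),
                ("score", ctypes.c_float)]

# Prototype matching: float updateScore(struct Person *p)
UpdateScore = ctypes.CFUNCTYPE(ctypes.c_float, ctypes.POINTER(Person))

@UpdateScore
def update_score(p):
    # p is a pointer, so p.contents is the caller's Structure, not a copy
    person = p.contents
    person.name = b"m" + person.name[1:]
    person.score += 1.0
    return person.score

me = Person(b"Mike", 42, 3.4)
s = update_score(ctypes.byref(me))
print(me.name, me.score, s)
```

After the call, me.name starts with a lower case m and me.score has increased by one, which shows that the function worked on the original buffer.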

As well as Structure, you can use Union to define storage that can be treated as one of a number of possible data types. This is one of the many ways that C reuses memory in different forms. A Union is defined in the same way as a Structure, but all of its fields share the same area of memory. The area of memory used is the size of the largest of the data types and, of course, only one of the data types can be stored at any given time. The idea is that the same block of memory, and hence the same variable, can be used for any of the data types. Which data type you get depends on which field you reference, and this also gives us a way to convert between representations. For example, we can define a Union that has an int and a float field:

class Conv(ctypes.Union):
    _fields_=[("integer",ctypes.c_int32),
              ("float",ctypes.c_float)]

Both c_int32 and c_float use four bytes so the Union is four bytes in size. You can now create an instance and store a value in one of the fields:

conv=Conv()
conv.integer=42
print(conv.integer)
print(conv.float)

displays:

42
5.885453550164232e-44

In other words, the bit pattern for 42 represents a very small floating point number.
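You can check that this really is a reinterpretation of the same four bytes by doing the conversion another way. The sketch below, which assumes that c_float is an IEEE 754 single-precision value, uses the standard struct module to pack 42 as an int and unpack it as a float, and also runs the Union in the other direction:

```python
import ctypes
import struct

class Conv(ctypes.Union):
    _fields_ = [("integer", ctypes.c_int32),
                ("float", ctypes.c_float)]

# Float to int: 1.0 has the bit pattern 0x3f800000 in IEEE 754 single precision
conv = Conv(float=1.0)
print(hex(conv.integer))                      # 0x3f800000

# Int to float without a Union: pack as int, unpack the same bytes as float
via_struct = struct.unpack("=f", struct.pack("=i", 42))[0]
print(via_struct == Conv(integer=42).float)   # True
```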

Unions are not usually used for representation conversion. More often they are used so that a single field can hold one of several types of data, with the function that processes it deciding which one is present. For example, you could have a Union that can store either an integer or a float score:

class Score(ctypes.Union):
    _fields_=[("integer",ctypes.c_int32),
              ("float",ctypes.c_float)]

then you can create a Person structure that has a score that is either integer or float:

class Person(ctypes.Structure):
    _fields_= [("name",ctypes.c_char_p),
               ("id",ctypes.c_int),
               ("score",Score)]

You can use this to store either sort of score:

me=Person(b"Mike",42,Score(float=3.14))
print(me.score.float)

There has to be some way for the program to know which interpretation to use and this is usually based on some other field in the structure. For example, perhaps people with an id greater than 100 have float scores.
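A rule like this is easy to wrap in a small helper. The following sketch assumes the id-greater-than-100 convention suggested above. The get_score function is hypothetical and not part of ctypes:

```python
import ctypes

class Score(ctypes.Union):
    _fields_ = [("integer", ctypes.c_int32),
                ("float", ctypes.c_float)]

class Person(ctypes.Structure):
    _fields_ = [("name", ctypes.c_char_p),
                ("id", ctypes.c_int),
                ("score", Score)]

def get_score(person):
    # Hypothetical rule: ids above 100 store a float, the rest an integer
    if person.id > 100:
        return person.score.float
    return person.score.integer

mike = Person(b"Mike", 42, Score(integer=7))
ann = Person(b"Ann", 101, Score(float=2.5))
print(get_score(mike))   # 7
print(get_score(ann))    # 2.5
```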

In general, you don’t have to implement a Union to call a C function that uses one. The reason is that for a given call only one of the types defined in the Union is used. In this case, you can simply pass the type that is going to be used.

For example, suppose we have the C struct:

struct Person{
    char *name;
    int  id;
    union{
        int myinteger;
        float myfloat;
    } score;
};

You can see that the score field is now a union of an int and a float. We could use the previous ctypes Structure which included a Union, but as we know that the function we are calling is going to treat Person.score as a float, we can use a Structure definition without a Union:

class Person(ctypes.Structure):
    _fields_=[("name",ctypes.c_char_p),
              ("id",ctypes.c_int),
              ("score",ctypes.c_float)]
me=Person(b"Mike",42,3.14)
print(me.score)

This works and it works even if the C function decides to treat the float as an int. The result might not make sense, but all the C function needs is four bytes.
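One way to convince yourself that the two definitions are interchangeable is to compare their layouts. In the sketch below, PersonUnion and PersonFloat are illustrative names for the two versions of the Structure:

```python
import ctypes

class Score(ctypes.Union):
    _fields_ = [("integer", ctypes.c_int32),
                ("float", ctypes.c_float)]

class PersonUnion(ctypes.Structure):
    _fields_ = [("name", ctypes.c_char_p),
                ("id", ctypes.c_int),
                ("score", Score)]

class PersonFloat(ctypes.Structure):
    _fields_ = [("name", ctypes.c_char_p),
                ("id", ctypes.c_int),
                ("score", ctypes.c_float)]

# Both versions occupy the same number of bytes with score in the same place
same_size = ctypes.sizeof(PersonUnion) == ctypes.sizeof(PersonFloat)
same_offset = PersonUnion.score.offset == PersonFloat.score.offset
print(same_size, same_offset)

me = PersonFloat(b"Mike", 42, 3.14)
# Reinterpret a copy of the same bytes using the Union version
copy = PersonUnion.from_buffer_copy(bytes(me))
print(copy.id, copy.score.float == me.score)
```

Because the byte sequences are identical, the C function cannot tell which of the two Python definitions was used to build its argument.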

There are a few more variations in ways of using structures, but this covers most of the things you will encounter. One complication that is worth looking at is how to define structures that contain pointers to structures.
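A common case is a struct with a field that points to another struct of the same type. A minimal sketch, using an invented Node type, declares the class first and assigns _fields_ afterwards so that the definition can refer to itself:

```python
import ctypes

class Node(ctypes.Structure):
    pass

# _fields_ is set after the class exists so that it can mention Node
Node._fields_ = [("value", ctypes.c_int),
                 ("next", ctypes.POINTER(Node))]

tail = Node(2)                        # next defaults to a NULL pointer
head = Node(1, ctypes.pointer(tail))  # next points at tail

print(head.next.contents.value)       # 2
print(bool(tail.next))                # False, a NULL pointer is falsey
```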

In chapter but not in this extract

  • Pointers
  • Callbacks
  • Memory Manipulation
  • Error Handling
  • Calling System Functions
  • Windows
  • Linux

The General Approach

In practice, working out how to call a function and how to process its return value is a matter of stepwise refinement. The definitions of functions given in the C documentation generally contain types and macros that are not defined at the same location. In fact, they are sometimes not even defined in the documentation because they are simply declared as system-dependent. You can generally work out what they are by searching the documentation, but if this fails, locating and reading the C header files where they are defined is the surest way to a correct definition.

Even when you know the definition, C programmers are capable of creating data types that can be baffling to non-C programmers – pointers to pointers, arrays of pointers, arrays of structs that contain pointers, and so on. In all cases the key idea is that even the most complex type definition has to reduce to a sequence of bytes with a very simple meaning. Considering what the memory layout should be and what each group of bytes means is the surest way of making the function call work.
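When a call isn't behaving, it can help to look at the bytes that ctypes is actually going to send. The sketch below uses an invented Record type, declared little-endian so that the output doesn't depend on the byte order of the machine, and dumps its buffer:

```python
import ctypes

class Record(ctypes.LittleEndianStructure):
    _fields_ = [("kind", ctypes.c_int16),
                ("count", ctypes.c_int32)]

rec = Record(1, 2)
raw = bytes(rec)        # copy of the Structure's internal buffer
print(raw.hex(" "))     # 01 00 00 00 02 00 00 00

# The same bytes can be turned back into a Structure
again = Record.from_buffer_copy(raw)
print(again.kind, again.count)   # 1 2
```

The first two bytes are kind, the next two are padding and the last four are count. Comparing a dump like this with the layout the C header implies is often the quickest way to find a mismatch.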

 

Summary

  • Python often needs to use existing code written in other languages, usually C, and this is achieved using the ctypes module.

  • The code that the ctypes module works with is stored in a shared library – a DLL under Windows and a .so file under Linux.

  • Before you can use a shared library you have to load it using one of the library classes. If the library loads successfully, you can use the functions it exports via the new attributes of the class.

  • The ctypes module provides a range of classes that connect Python data types to C data types.

  • The Python data is usually stored in the value attribute and the C data is stored in an internal buffer ready to be sent to any C functions you call.

  • As well as the basic data types, ctypes also provides ways of creating classes that wrap C arrays, strings, structs and unions.

  • Arrays in C are passed as a pointer to their first element, and a ctypes array is passed to a C function by reference, either directly or using byref.

  • Working with strings in C is particularly challenging because C doesn’t automatically handle Unicode. You have to select between char and wchar_t and implement your own encoding/decoding technique.

  • Structures are passed by value, but you can pass fields that are pointers or a pointer to the entire structure.

  • You can also create Pointers to pass the address of buffers to C functions.

  • Callbacks are pointers to functions which the C function can call at a later time.

  • Under Windows, C errors are converted to Python exceptions.

  • As well as calling custom C functions, you can also call system-provided functions and here the problem is usually trying to work out the data types in use.

  • In principle, you can always arrange to call a C function from Python because its parameters are always nothing but byte sequences. As long as you construct a meaningful byte sequence, you can pass it to the function.

 

Programmer's Python: Everything is Data is now available as a print book.


Last Updated ( Wednesday, 22 March 2023 )