Programmer's Python Data - Text Files & CSV
Written by Mike James   
Tuesday, 10 June 2025

Files are fundamental to computing and text files are human readable - most of the time. Find out how to understand and work with CSV files in this extract from Programmer's Python: Everything is Data.


While text mode is usually regarded as the simpler option for using files, it can be argued that it is the more complex one because of the variations that are possible in data representation and meaning. Most people also suggest that you should use text mode for your custom files because they are human readable and can be edited using nothing but a text editor. This is an advantage over binary format files, but it also allows users to modify files manually, often with unexpected outcomes and errors. It also used to be argued that binary files were better because they were more compact, and hence faster to work with and used less space. Today this is hardly an advantage, as storage is no longer in short supply.

The current situation is that text files do have the advantage of being human readable and editable, but this isn’t always desirable. Binary files for internal consumption still have advantages and, even if you don’t create them yourself, you cannot avoid encountering them.

If you do decide to use text files to store data you have the problem of extracting the data and converting it to internal data types. In most cases this requires the file to have a fixed format so that you can parse it. What this means is that discussing text files leads on naturally to the consideration of standard data file formats and in this chapter we also look at CSV, JSON, XML and pickle.
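As a rough illustration of the parsing problem, the following sketch reads a file in a made-up fixed format, one record per line with whitespace-separated fields, and converts each field to an internal type. The file name, the field layout and the variable names are assumptions invented for the example; they are not from the book.

```python
import tempfile
from pathlib import Path

# Hypothetical scratch file and record layout: "name count price" per line.
path = Path(tempfile.gettempdir()) / "records_demo.txt"
path.write_text("widget 3 2.5\ngadget 10 0.75\n", encoding="utf8")

records = []
with path.open(mode="rt", encoding="utf8") as f:
    for line in f:
        # Every field arrives as a string and has to be converted explicitly.
        name, count, price = line.split()
        records.append((name, int(count), float(price)))

print(records)
path.unlink()  # remove the scratch file
```

The conversion only works because every line follows the same layout, which is why standard formats such as CSV are worth using.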

Opening a Text File

A text file is nothing more than a binary file that is treated as if it were an encoding of a text string. You can achieve the same result using a binary file and explicit calls to decode/encode, but opening a file in text mode performs this automatically and the read and write methods both work in terms of Python strings.
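The equivalence can be sketched as follows. The file is written and read in binary mode with explicit encode and decode calls, and then read again in text mode. The file name and the sample string are assumptions chosen for the illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical scratch file, used only for this illustration.
path = Path(tempfile.gettempdir()) / "text_vs_binary_demo.txt"

text = "Café €5"  # contains non-ASCII characters

# Binary mode: encode the string to bytes ourselves before writing.
with path.open(mode="wb") as f:
    f.write(text.encode("utf8"))

# Binary mode: read the raw bytes and decode them ourselves.
with path.open(mode="rb") as f:
    raw = f.read()
from_binary = raw.decode("utf8")

# Text mode: open() performs the decoding for us.
with path.open(mode="rt", encoding="utf8") as f:
    from_text = f.read()

print(from_binary == from_text)  # the two routes give the same string
path.unlink()  # remove the scratch file
```

Note that the sample string contains no line endings. Text mode also translates newlines by default, which binary mode does not.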

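If you open a file in text mode you should specify the encoding in use, i.e. how the bytes in the file represent the Unicode text, by setting the encoding parameter in the call to open. If you omit it, the default encoding depends on the platform and the Python version, and it may not be the one you expect.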

For example:

open(path, mode='rt', encoding="utf8")

or

path.open(mode='rt', encoding="utf8")

to work with UTF-8 encoding.

Once the file has been opened in text mode the write method accepts a string and the read method returns a string. It really is this simple: the string is encoded to a UTF-8 byte stream when written to the file and the bytes are decoded back to a Unicode string when read from the file, for example:

from pathlib import Path

path = Path("hello.txt")  # hypothetical file name for illustration
with path.open(mode="wt", encoding="utf8") as f:
    f.write("Hello World")
with path.open(mode="rt", encoding="utf8") as f:
    myString = f.read()
print(myString)

In this case we write a string to the file and then read the entire file back, which is simply the string we wrote. In general, there will be a conversion between representations and so there is the possibility that the conversion cannot be performed, i.e. that a character in one of the encodings cannot be represented in the other. When this happens the default is to raise a UnicodeEncodeError or a UnicodeDecodeError, both of which are subclasses of ValueError. You can control what happens by setting the errors parameter in the open function, which works in exactly the same way as in decode/encode, see Chapter 6.
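A minimal sketch of the errors parameter in action is shown below. It writes a byte sequence that is valid Latin-1 but not valid UTF-8 and then reads it back as UTF-8 using three different error handlers. The file name and the variable names are assumptions made for the example.

```python
import tempfile
from pathlib import Path

# Hypothetical scratch file, used only for this illustration.
path = Path(tempfile.gettempdir()) / "errors_demo.txt"

# b'caf\xe9' is valid Latin-1, but the lone 0xE9 byte is not valid UTF-8.
path.write_bytes("café".encode("latin-1"))

# Default (errors="strict"): decoding fails with a subclass of ValueError.
try:
    with path.open(mode="rt", encoding="utf8") as f:
        f.read()
    strict_failed = False
except ValueError as e:
    strict_failed = True
    print(type(e).__name__)

# errors="replace": the bad byte becomes U+FFFD, the replacement character.
with path.open(mode="rt", encoding="utf8", errors="replace") as f:
    replaced = f.read()

# errors="ignore": the bad byte is silently dropped.
with path.open(mode="rt", encoding="utf8", errors="ignore") as f:
    ignored = f.read()

print(repr(replaced), repr(ignored))
path.unlink()  # remove the scratch file
```

Both "replace" and "ignore" lose information, so they are best reserved for cases where an approximate result is acceptable.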


