Files are fundamental to computing and text files are human readable, most of the time. Find out how to understand and work with CSV files in this extract from Programmer's Python: Everything is Data.
While text mode is usually regarded as the simpler option for using files, there is an argument that it is the more complex because of the variations that are possible in data representation and meaning. Most people also suggest that you should use text mode for your custom files because they are human readable and can be edited using nothing but a text editor. This is an advantage over binary format files, but it also makes it possible for users to modify files manually, often with unexpected outcomes and errors. It used to be argued that binary files were better because they were more compact and hence faster to work with and used less space. Today this is hardly an advantage, as storage is no longer in short supply.
The current situation is that text files do have the advantage of being human readable and editable, but this isn’t always desirable. Binary files for internal consumption still have advantages and, even if you don’t create them yourself, you cannot avoid encountering them.
If you do decide to use text files to store data you have the problem of extracting the data and converting it to internal data types. In most cases this requires the file to have a fixed format so that you can parse it. What this means is that discussing text files leads on naturally to the consideration of standard data file formats and in this chapter we also look at CSV, JSON, XML and pickle.
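To make the parsing problem concrete, here is a minimal sketch that assumes a made-up fixed format of name, age and height separated by commas. The record and field names are illustrative only, and a simple split like this would not cope with quoted fields of the kind that real CSV files can contain.

```python
# A made-up record in a hypothetical "name,age,height" format.
line = "Ada,36,1.70"

# Split the text into fields, then convert each one to an internal type.
name, age, height = line.split(",")
record = (name, int(age), float(height))

print(record)  # ('Ada', 36, 1.7)
```

The conversion only works because the position and type of every field is fixed in advance, which is exactly what a standard file format provides.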
Opening a Text File
A text file is nothing more than a binary file that is treated as if it were an encoding of a text string. You can achieve the same result using a binary file and explicit calls to decode/encode, but opening a file in text mode performs this automatically, and read and write both work in terms of Python strings.
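The equivalence between text mode and binary mode plus explicit encode/decode can be checked with a short sketch. The file name and the temporary directory are assumptions made only so that the example is self-contained.

```python
import tempfile
from pathlib import Path

text = "Hello W\u00f6rld"  # includes a non-ASCII character

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.txt"  # hypothetical file name

    # Binary mode: we encode the string to bytes ourselves.
    with path.open(mode="wb") as f:
        f.write(text.encode("utf8"))

    # Text mode: open() decodes the bytes for us.
    with path.open(mode="rt", encoding="utf8") as f:
        from_text_mode = f.read()

    # Binary mode again: we decode the bytes explicitly.
    with path.open(mode="rb") as f:
        from_binary_mode = f.read().decode("utf8")

print(from_text_mode == from_binary_mode == text)  # True
```

The test string deliberately contains no line endings, because text mode can also translate newlines, which is a separate issue from encoding.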
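When you open a file in text mode you should specify the encoding in use, i.e. how the bytes in the file represent the Unicode text, using the encoding parameter in the call to open. If you omit it, Python falls back to a default that can depend on the platform and locale, which may not be the encoding the file actually uses.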
For example:
open(path, mode='rt', encoding="utf8")
or
path.open(mode='rt', encoding="utf8")
to work with UTF-8 encoding.
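To see why the encoding matters, the following sketch writes the same string using two different encodings and looks at the bytes that end up in the file. The file name, the sample string and the choice of Latin-1 as the second encoding are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

text = "caf\u00e9"  # "café", the last character is not ASCII

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.txt"  # hypothetical file name

    # Write the string as UTF-8 and read back the raw bytes.
    with path.open(mode="wt", encoding="utf8") as f:
        f.write(text)
    utf8_bytes = path.read_bytes()

    # Write the same string as Latin-1 and read back the raw bytes.
    with path.open(mode="wt", encoding="latin-1") as f:
        f.write(text)
    latin1_bytes = path.read_bytes()

print(utf8_bytes)    # b'caf\xc3\xa9'
print(latin1_bytes)  # b'caf\xe9'
```

The same four characters produce five bytes in one case and four in the other, so a reader has to use the same encoding as the writer to recover the original string.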
Once the file has been opened in text mode the write method accepts a string and the read method returns a string. It really is this simple: the string is encoded to a stream of UTF-8 bytes when written to the file and the bytes are decoded back to a Unicode string when read from the file, for example:
from pathlib import Path
path = Path("hello.txt")  # example file name, any writable path will do
with path.open(mode="wt", encoding="utf8") as f:
    f.write("Hello World")
with path.open(mode="rt", encoding="utf8") as f:
    myString = f.read()
print(myString)
In this case we write a string to the file and then read the entire file back, which is simply the string we wrote. In general, there is a conversion between representations and so there is the possibility that the conversion cannot be performed, i.e. that a character in one of the encodings cannot be represented in the other. In this case, by default, a UnicodeError exception, which is a subclass of ValueError, is raised. You can control what happens by setting the errors parameter in the open function, which works in exactly the same way as in decode/encode, see Chapter 6.
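A short sketch of how the errors parameter changes the outcome, assuming an ASCII-encoded file and a string that contains one non-ASCII character. The file name and sample text are illustrative only.

```python
import tempfile
from pathlib import Path

text = "caf\u00e9"  # the last character cannot be represented in ASCII

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.txt"  # hypothetical file name

    # Default behavior (errors="strict"): the write raises an exception.
    try:
        with path.open(mode="wt", encoding="ascii") as f:
            f.write(text)
        strict_failed = False
    except UnicodeEncodeError as e:
        strict_failed = True
        is_value_error = isinstance(e, ValueError)

    # errors="replace": the unencodable character is written as "?".
    with path.open(mode="wt", encoding="ascii", errors="replace") as f:
        f.write(text)
    replaced_bytes = path.read_bytes()

    # errors="ignore" on reading: undecodable bytes are silently dropped.
    path.write_bytes(text.encode("utf8"))
    with path.open(mode="rt", encoding="ascii", errors="ignore") as f:
        ignored_text = f.read()

print(strict_failed, is_value_error)  # True True
print(replaced_bytes)                 # b'caf?'
print(ignored_text)                   # caf
```

Both "replace" and "ignore" lose information, so they are best reserved for cases where a damaged result is preferable to an exception.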