Programmer's Python Data - Files and Paths
Written by Mike James
Monday, 17 February 2025
Opening Files

At the most fundamental level, all files are binary files in the sense that the data they store is just bytes, and interpreting what the bytes mean is up to you. A text file is just a binary file whose bytes are interpreted as text using some specific encoding. Text files are generally considered easier to work with and hence preferable, but they too have their difficulties. As binary files are more fundamental, they are a good place to start.

To work with a file you have to open it:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

Each of the parameters determines how the file is opened, and the two most important are file and mode. If the open is successful, it returns a file object which can be used to work with the file.

File and Path

The file parameter determines which file is opened. In simple cases you can think of this as the file name, with the assumption that the file is stored in the current directory. To find or set the current directory you can use the os module's getcwd() and chdir(path) functions, although the more recent pathlib module, see later, has better alternatives. For example:

import os
print(os.getcwd())
open("myFile")

will display the current working directory and attempt to open a file called myFile. This will fail with a FileNotFoundError exception unless myFile already exists in that directory.

In general, however, the file has to be specified using a full absolute or relative path. You should be familiar with the idea of a path, but exactly what it means depends very much on the operating system you are using. At a trivial level, Windows uses a different separator (\) from Linux (/), but this slight difference is the cause of many problems. Python provides the pathlib module to let you work with paths in a reasonably system-independent way.
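The following is a minimal sketch of these points, assuming Python 3. It uses os.chdir, the standard function for changing the working directory, together with a throwaway directory from the tempfile module so that no real files are touched. The name myFile is just the illustrative name from the text. The UTF-8 round trip shows that the same stored bytes can be read raw in 'rb' mode or decoded for you in text mode.

```python
import os
import tempfile

# Work inside a throwaway directory so the example never touches real files.
with tempfile.TemporaryDirectory() as tmp:
    original_cwd = os.getcwd()
    os.chdir(tmp)                          # set the current working directory
    try:
        print(os.getcwd())                 # now the temporary directory

        # Opening a file that doesn't exist, in the default 'r' mode, raises.
        try:
            open("myFile")
        except FileNotFoundError as err:
            print("open failed:", err)

        # Store some text as UTF-8 encoded bytes.
        with open("myFile", "wb") as f:
            f.write("caf\u00e9".encode("utf-8"))

        # Binary mode: you get the raw bytes and interpreting them is up to you.
        with open("myFile", "rb") as f:
            raw = f.read()

        # Text mode: Python decodes the bytes using the given encoding.
        with open("myFile", "r", encoding="utf-8") as f:
            text = f.read()
    finally:
        os.chdir(original_cwd)             # restore cwd before the directory is deleted

print(raw)    # b'caf\xc3\xa9'  (5 bytes)
print(text)   # café           (4 characters)
```

The byte count and the character count differ because the accented character occupies two bytes in UTF-8. This is exactly the interpretation step that text mode performs for you.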
At its most basic, a path is a sequence of directory names that leads you to a final target directory and then to the name of a file stored in that directory. So, for example:

myDir1, myDir2, myDir3, myFile

specifies that you start at myDir1, move to myDir2 stored in it, move to myDir3 stored in myDir2 and finally find myFile stored in myDir3. You can see why it is called a path, because it is exactly that: a path through the directories stored on a device.

Today this idea has been generalized to situations where the directories are any names that specify a location. For example, a URL is a web address which can be thought of as a path that takes you through a list of names that finally specifies a particular resource. A UNC path is a list of directories starting at the name of a file share. More technically, a path specifies a particular node in a tree structure by listing, in order, each node that you have to traverse to reach it.

Of course, it all depends on where the initial directory is taken to be located, i.e. where the path starts from. The convention is that if the path starts with a leading separator then it is absolute and starts at the root of the directory system or tree, wherever that might be. If the path starts with a name then it is relative to the current directory, i.e. the start is the current directory. Unix/Linux systems always start from /, the root directory. Windows systems also have drive letters like A:/ to indicate which disk drive to use, and a fully absolute Windows path needs both the drive and the leading separator.

There are also two special symbols, . and .. . The single dot means “this directory”. It is generally used to refer to the working directory or to make it more obvious that a path references a directory. The double dot means “the parent directory”, i.e. the directory one level above. You can use double dots, as in ../../myDir, to move back up the directory tree.
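The path conventions above can be checked with a small sketch. It uses the pure path classes from pathlib, PurePosixPath and PureWindowsPath, so it behaves identically on any operating system. It also uses posixpath.normpath from the standard library, which resolves . and .. as a purely textual operation. The directory names are illustrative only.

```python
import posixpath
from pathlib import PurePosixPath, PureWindowsPath

# A path is an ordered sequence of names through the directory tree.
p = PurePosixPath("myDir1", "myDir2", "myDir3", "myFile")
print(p.parts)        # ('myDir1', 'myDir2', 'myDir3', 'myFile')

# A leading separator makes a path absolute; a leading name makes it relative.
print(PurePosixPath("/myDir1/myFile").is_absolute())     # True
print(PurePosixPath("myDir1/myFile").is_absolute())      # False

# Under Windows a fully absolute path also needs a drive.
print(PureWindowsPath("C:/myDir1/myFile").is_absolute())  # True
print(PureWindowsPath("/myDir1/myFile").is_absolute())    # False

# normpath treats "." as "this directory" and ".." as "the parent",
# working on the text alone without consulting the file system.
normalized = posixpath.normpath("/home/user/docs/../pics/./a.png")
print(normalized)     # /home/user/pics/a.png

# pathlib drops single dots but deliberately keeps "..".
kept = PurePosixPath("myDir1/./myDir2/../myFile")
print(kept)           # myDir1/myDir2/../myFile
```

pathlib keeps the double dot because collapsing it textually can change the meaning of a path when symbolic links are involved. Path.resolve() resolves it by consulting the actual file system instead.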
Pathlib

To work with a path in a system-independent way you need to use the Path class from the pathlib module:

pathlib.Path(*pathsegments)

Here pathsegments are the names to be used in the path, or other Path objects to be joined together, each passed as a separate argument. Path objects are immutable and hence hashable. There is also the PurePath class, which lets you manipulate paths without accessing the file system. In most cases, however, you will need Path, which works with paths specific to the system the program is running on. For example:

import pathlib
path = pathlib.Path("myDir1", "myDir2", "myDir3", "myFile")
print(path)

displays:

myDir1\myDir2\myDir3\myFile

on a Windows system and

myDir1/myDir2/myDir3/myFile

on a Unix/Linux system. You can see that, apart from the different separators, there are no differences.

You can also build up a path using /, the path operator:

path = pathlib.Path("myDir1")
path = path / "myDir2" / "myDir3" / "myFile"

One operand must be a Path object and the other can be another Path object or a string. Alternatively you can use the joinpath method to add additional path elements.

The string representation of a Path object is the raw file system path as a string. If you include a leading root separator, it is rendered using the separator of whichever system is in use. If you include a drive letter, e.g. A:/myDir, it will be included with the correct separators under Windows or Linux but, of course, it is only likely to make sense under Windows. In most cases it is a good idea to use an absolute path that doesn't specify a drive letter, or a path relative to the current working directory.
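Because Path always adopts the conventions of the machine it runs on, this sketch uses its pure counterparts PurePosixPath and PureWindowsPath to show both systems' output side by side on any machine. On your own system, pathlib.Path automatically picks the appropriate flavour. The path names are the illustrative ones from the text, and the set at the end is only there to demonstrate hashability.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Path(*pathsegments): segments are separate arguments, so unpack a list with *.
segments = ["myDir1", "myDir2", "myDir3", "myFile"]
posix = PurePosixPath(*segments)
windows = PureWindowsPath(*segments)
print(posix)      # myDir1/myDir2/myDir3/myFile
print(windows)    # myDir1\myDir2\myDir3\myFile

# The / operator: one operand must be a path, the other may be a str.
built = PurePosixPath("myDir1") / "myDir2" / "myDir3" / "myFile"
also = "myDir1" / PurePosixPath("myDir2", "myDir3", "myFile")   # str on the left works too
joined = PurePosixPath("myDir1").joinpath("myDir2", "myDir3", "myFile")
print(built == also == joined == posix)   # True

# A leading root separator is rendered in the target system's style.
print(PureWindowsPath("/myDir"))          # \myDir

# Paths are immutable and hashable, so equal paths collapse in a set.
seen = {built, also, joined}
print(len(seen))                          # 1
```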
Last Updated ( Monday, 17 February 2025 )