Best practice: working with paths in Python
Same problem: list of folders and drives
In the last article, we used a recursive function of fewer than 10 lines to solve the problem of scanning folders and ranking files by modified date and size.
Now I will raise the bar and show you how you could have done better.
Combining paths with Pathlib
Old ideas in a new guise?
The previous solution with path joining looked like this:
path_file = os.sep.join([path_dir, filename])
The advantage of this approach is that it is operating-system agnostic, and you do not need to concatenate strings with the + operator or string formatting.
However, it is easy to make a mistake here, for example by accidentally specifying the directory path with a trailing separator.
path_dir: str = r"C:/Users/sselt/Documents/blog_demo/"  # trailing separator
filename: str = "some_file"
path_file = os.sep.join([path_dir, filename])
# On Windows: C:/Users/sselt/Documents/blog_demo/\some_file
Although this code runs, the joined path ends up with a wrong separator, which can cause an error when the path is used. Such errors can occur whenever users far from the code edit paths in configuration files, regardless of conventions.
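The trailing-separator pitfall can be reproduced on any operating system. The following is only a minimal sketch: the helper name join_naive and the folder name "blog_demo" are made up for illustration.

```python
import os


def join_naive(path_dir: str, filename: str) -> str:
    """Join the way the previous solution did: plain string joining."""
    return os.sep.join([path_dir, filename])


# Hypothetical directory value that already ends with a separator,
# as it might arrive from a configuration file.
dir_with_trailing_sep = "blog_demo" + os.sep

path_file = join_naive(dir_with_trailing_sep, "some_file")

# The separator now appears twice in a row inside the joined string.
has_doubled_separator = (os.sep * 2) in path_file
print(path_file, has_doubled_separator)
```

String joining has no idea what a path is, so it cannot notice that the separator is already there.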
Python 3.4 has a better solution: the pathlib module.
It wraps the file and folder functions of the os module in an object-oriented approach.
Let me remind you that the old version looked like this:
import os
path = "C:/Users/sselt/Documents/blog_demo/"
os.path.isdir(path)
os.path.isfile(path)
os.path.getsize(path)
And here’s an alternative:
from pathlib import Path
path: Path = Path("C:/Users/sselt/Documents/blog_demo/")
path.is_dir()
path.is_file()
path.stat().st_size
Both options give the same result. So why is the second option better?
Object-oriented and more robust
The calls are now object-oriented. Whether you like that is a matter of taste, but personally I like this approach. Here we have an object, the path definition, which has attributes and methods.
However, the example with operator overloading is more interesting in this case:
filename: Path = Path("some_file.txt")
path: Path = Path("C:/Users/sselt/Documents/blog_demo")
print( path / filename )
# C:\Users\sselt\Documents\blog_demo\some_file.txt
At first glance, dividing one path by another looks invalid. However, the Path object overloads the / operator so that it joins the two parts into a single path.
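To try the operator on any operating system, you can use the pure path classes of pathlib, which only manipulate the text of a path and never touch the file system. This is a small sketch; the POSIX path "/home/sselt/blog_demo" is a hypothetical example and does not come from the original setup.

```python
from pathlib import PurePosixPath, PureWindowsPath

filename = "some_file.txt"

# Pure paths work on any OS because they never access the file system.
windows_file = PureWindowsPath("C:/Users/sselt/Documents/blog_demo") / filename
posix_file = PurePosixPath("/home/sselt/blog_demo") / filename  # hypothetical path

print(windows_file)  # C:\Users\sselt\Documents\blog_demo\some_file.txt
print(posix_file)    # /home/sselt/blog_demo/some_file.txt

# The operator also works when the plain string is on the left-hand side.
also_posix = "/home/sselt" / PurePosixPath("blog_demo")
```

The right-hand operand may be a plain string, so wrapping the file name in a Path is optional.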
In addition to this syntactic sugar, Path objects also catch other common errors:
filename: Path = Path("some_file.txt")
# path with a superfluous trailing separator
path: Path = Path("C:/Users/sselt/Documents/blog_demo/")
# path with a doubled separator
path: Path = Path("C:/Users/sselt/Documents/blog_demo//")
# path with a wild mix of separators (raw string, because "\U" would otherwise start a Unicode escape)
path: Path = Path(r"C:\Users/sselt\Documents/blog_demo")
# on Windows, all variants lead to the same result
print(path / filename)
# C:\Users\sselt\Documents\blog_demo\some_file.txt
This option is not only nicer, but also more robust against malformed input. In addition, the code is not tied to a specific operating system. It only defines a generic Path object, which is instantiated as a WindowsPath on Windows and as a PosixPath on Linux.
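Both claims can be checked with a short sketch. It assumes nothing beyond the standard library; PureWindowsPath is used here so that the Windows normalization rules can be observed on any system.

```python
import os
from pathlib import Path, PosixPath, PureWindowsPath, WindowsPath

# Path() picks the concrete class that matches the running system.
concrete_class = type(Path("blog_demo"))
expected_class = WindowsPath if os.name == "nt" else PosixPath
print(concrete_class is expected_class)

# PureWindowsPath applies the Windows rules without touching the file system.
variants = [
    "C:/Users/sselt/Documents/blog_demo/",    # trailing separator
    "C:/Users/sselt/Documents/blog_demo//",   # doubled separator
    r"C:\Users/sselt\Documents/blog_demo",    # mixed separators
]
joined = {str(PureWindowsPath(v) / "some_file.txt") for v in variants}
print(joined)  # a set with a single element
```

Note that a backslash is an ordinary file name character on POSIX systems, so the mixed variant only normalizes this way under the Windows rules.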
Most functions that expect a string as a path can work directly with a Path object. In the rare cases where they cannot, you can simply convert the object with str(path).
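As a rough illustration of both cases, the sketch below passes a path object to a standard library function and then converts it to a string explicitly. The relative path "blog_demo/some_file.txt" is only an example value; a pure POSIX path is used so the result is the same on every system.

```python
import os
from pathlib import PurePosixPath

path = PurePosixPath("blog_demo") / "some_file.txt"

# Functions that follow the os.PathLike protocol (Python 3.6+) accept the object.
name_from_object = os.path.basename(path)

# For APIs that insist on a plain string, convert explicitly.
as_text = str(path)
as_fs_text = os.fspath(path)  # same string, obtained via the os.PathLike protocol

print(name_from_object, as_text)
```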
Path handling with os.walk
In my last article, I used os.listdir, os.path.isdir and a recursive function to iterate over the path tree and separate files from folders.
But os.walk offers a better solution. This function does not build a list; it returns an iterator that can be consumed one entry at a time. Each entry contains the path of a folder, the names of its subfolders, and a list of all files in that folder. The traversal descends into subfolders on its own, so you get all the files in one call.
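To see what os.walk yields, here is a small self-contained sketch that builds a throwaway folder tree in a temporary directory and walks it. The file and folder names ("a.txt", "sub", "b.txt") are invented for the demonstration.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("a")
    (root / "sub" / "b.txt").write_text("bb")

    walker = os.walk(root)
    is_lazy = not isinstance(walker, list)  # an iterator, not a finished list

    found = {}
    # Each entry is a tuple: (folder path, subfolder names, file names).
    for dirpath, dirnames, filenames in walker:
        relative_folder = Path(dirpath).relative_to(root).as_posix()
        found[relative_folder] = sorted(filenames)

print(found)  # {'.': ['a.txt'], 'sub': ['b.txt']}
```

Unpacking the tuple into three names tends to read better than indexing it with row[0] and row[2].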
Better solution with os.walk and Pathlib
If you combine the above two methods, you get a solution that is simpler, completely independent of the operating system, resistant to incorrect path formats, and without explicit recursion:
import os
from pathlib import Path

filesurvey = []
for row in os.walk(path):  # each row describes the contents of one folder
    for filename in row[2]:  # row[2] is a list of file names
        full_path: Path = Path(row[0]) / filename  # row[0] is the folder path
        filesurvey.append([path, filename, full_path.stat().st_mtime, full_path.stat().st_size])
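One possible variation, offered only as a sketch: wrap the loop in a function, call stat() once per file, and store the folder of each file instead of the root path. The function name survey_files and the test files are hypothetical; the ranking by size reuses the goal from the previous article.

```python
import os
import tempfile
from pathlib import Path


def survey_files(root):
    """Return [folder, filename, mtime, size] rows for every file below root."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            info = (Path(dirpath) / filename).stat()  # one stat() call per file
            rows.append([dirpath, filename, info.st_mtime, info.st_size])
    return rows


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "small.txt").write_text("a")
    (root / "sub" / "large.txt").write_text("a" * 100)

    survey = survey_files(root)
    # Rank the files by size, largest first (column 3 holds st_size).
    largest_first = sorted(survey, key=lambda row: row[3], reverse=True)
    names_by_size = [row[1] for row in largest_first]

print(names_by_size)  # ['large.txt', 'small.txt']
```

Sorting on column 2 instead of column 3 would rank the same rows by modification time.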
If you manage to improve this option, feel free to tell me about it. I would love your feedback!
The first part of the article can be found here…
Translation of the article was prepared on the eve of the start of the course “Python Developer. Basic”…