Best practice: working with paths in Python
Same problem: list of folders and drives
In the last article, we used a recursive function of fewer than 10 lines to solve the problem of scanning folders and ranking files by modification date and size.
Now I will raise the bar and show you how you could have done better.
Combining paths with pathlib
Old ideas in a new guise?
The previous solution with path joining looked like this:
path_file = os.sep.join([path_dir, filename])
The advantage of this approach is that it is operating-system agnostic, and you do not need to concatenate strings with the + operator or string formatting.
However, it is easy to make a mistake here, for example by specifying the directory path with a trailing separator.
path_dir: str = r"C:/Users/sselt/Documents/blog_demo/"  # trailing separator
filename: str = "some_file"
path_file = os.sep.join([path_dir, filename])
# On Windows: C:/Users/sselt/Documents/blog_demo/\some_file
Although this code runs, the stray separator can cause an error when the resulting path is used. Such errors can occur whenever users who are far removed from the code edit paths in configuration files, regardless of conventions.
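The pitfall can be reproduced on any operating system by fixing the separator explicitly. The sketch below is only an illustration: it uses the standard posixpath module so that the separator is always /, and the directory /home/user/blog_demo/ is a hypothetical stand-in. For comparison it also shows posixpath.join, which is not the approach used in the earlier solution.

```python
import posixpath

# Hypothetical directory with a trailing separator, as a user might type it.
path_dir = "/home/user/blog_demo/"
filename = "some_file"

# Joining with the bare separator keeps the trailing one: doubled separator.
naive = posixpath.sep.join([path_dir, filename])
print(naive)   # /home/user/blog_demo//some_file

# posixpath.join only inserts a separator when one is missing.
joined = posixpath.join(path_dir, filename)
print(joined)  # /home/user/blog_demo/some_file
```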
Since Python 3.4 there is a better solution: the pathlib module. It covers the file and folder functions of the os module using an object-oriented approach.
Let me remind you that the old version looked like this:
import os

path = "C:/Users/sselt/Documents/blog_demo/"
os.path.isdir(path)
os.path.isfile(path)
os.path.getsize(path)
And here’s an alternative:
from pathlib import Path

path: Path = Path("C:/Users/sselt/Documents/blog_demo/")
path.is_dir()
path.is_file()
path.stat().st_size
Both options give the same result. So why is the second option better?
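The claim that both styles agree can be checked without relying on a real folder. The following is a minimal sketch that assumes a throwaway temporary directory and a hypothetical file name demo.txt, neither of which comes from the original example.

```python
import os
import tempfile
from pathlib import Path

# Work in a throwaway directory so the sketch does not depend on a real path.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp, "demo.txt")  # hypothetical file name
    demo_file.write_text("hello")      # 5 characters, 5 bytes

    # Old style: functions from os.path that take a string.
    old_style = (
        os.path.isdir(tmp),
        os.path.isfile(str(demo_file)),
        os.path.getsize(str(demo_file)),
    )

    # New style: methods on Path objects.
    new_style = (
        Path(tmp).is_dir(),
        demo_file.is_file(),
        demo_file.stat().st_size,
    )

print(old_style == new_style)  # True
```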
Object-oriented and more robust
The calls are now mostly object-oriented. Whether you like that is a matter of taste, but personally I like this approach. Here we have an object, such as the path defined above, that has attributes and methods.
However, operator overloading is the more interesting example here:
filename: Path = Path("some_file.txt")
path: Path = Path("C:/Users/sselt/Documents/blog_demo")
print(path / filename)
# C:\Users\sselt\Documents\blog_demo\some_file.txt
At first glance, dividing one path by another looks invalid. However, the path object overloads the division operator so that it joins the two parts into a single path.
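How a class can give the / operator this meaning is easiest to see with a toy example. The class MiniPath below is purely hypothetical and is not how pathlib is implemented; it only sketches the mechanism, which is the special method __truediv__.

```python
class MiniPath:
    """Toy stand-in for a path object; not how pathlib is implemented."""

    def __init__(self, text: str) -> None:
        # Drop trailing separators so joins never double them.
        self.text = text.rstrip("/")

    def __truediv__(self, other: "MiniPath | str") -> "MiniPath":
        # Python calls this method for `self / other`.
        other_text = other.text if isinstance(other, MiniPath) else other
        return MiniPath(f"{self.text}/{other_text.lstrip('/')}")

    def __str__(self) -> str:
        return self.text


combined = MiniPath("C:/Users/sselt/Documents/blog_demo/") / "some_file.txt"
print(combined)  # C:/Users/sselt/Documents/blog_demo/some_file.txt
```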
In addition to this syntactic sugar, path objects also catch other common mistakes:
filename: Path = Path("some_file.txt")
# path with a superfluous trailing separator
path: Path = Path("C:/Users/sselt/Documents/blog_demo/")
# path with a doubled separator
path: Path = Path("C:/Users/sselt/Documents/blog_demo//")
# path with wildly mixed separators (raw string, so "\U" is not read as an escape)
path: Path = Path(r"C:\Users/sselt\Documents/blog_demo")
# on Windows, all variants lead to the same result
print(path / filename)
# C:\Users\sselt\Documents\blog_demo\some_file.txt
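The normalisation can also be checked on a non-Windows machine. This sketch swaps in PureWindowsPath, which applies Windows path rules on any operating system without touching the file system; using it here is my choice for reproducibility, not part of the original example.

```python
from pathlib import PureWindowsPath

filename = PureWindowsPath("some_file.txt")

variants = [
    "C:/Users/sselt/Documents/blog_demo",    # clean
    "C:/Users/sselt/Documents/blog_demo/",   # trailing separator
    "C:/Users/sselt/Documents/blog_demo//",  # doubled separator
    r"C:\Users/sselt\Documents/blog_demo",   # mixed separators
]

# Collect the joined results in a set: identical strings collapse to one entry.
results = {str(PureWindowsPath(v) / filename) for v in variants}
print(results)
```

If every variant is normalised to the same path, the set contains exactly one element.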
This option is not only nicer but also more robust against malformed input. In addition, the code is not tied to a specific operating system. It only defines a generic path object, which is instantiated as WindowsPath on Windows and as the corresponding POSIX class on Linux.
Most functions that expect a string as a path can work directly with the path object. In rare cases, you may need to convert the object to a string first.
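A short sketch of both points follows. The relative path blog_demo/some_file.txt is hypothetical, and the conversion via str() and os.fspath() is what I would reach for, assuming the receiving API insists on a plain string.

```python
import os
from pathlib import Path, PurePath

path = Path("blog_demo") / "some_file.txt"  # hypothetical relative path

# Path() picks the concrete class for the running operating system.
print(type(path).__name__)

# The concrete classes share the PurePath base, so code can stay generic.
print(isinstance(path, PurePath))

# For an API that insists on a plain string, convert explicitly.
as_text = str(path)
print(as_text == os.fspath(path))
```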
Path handling with os.walk
In my last article, I used os.path.isdir and a recursive function to iterate over the path tree and to distinguish files from folders.
os.walk offers a better solution. This function does not build a list but returns a generator that can be consumed one entry at a time. Each entry is a tuple containing the path of a folder, the names of its subfolders, and the names of the files in it. The traversal descends into subfolders on its own, so you get all the files with one call.
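To make the shape of these entries concrete, here is a small sketch that walks a temporary folder. The layout (a.txt at the top and sub/b.txt below it) is invented for the illustration.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one file at the top, one in a subfolder.
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("a")
    (root / "sub" / "b.txt").write_text("bb")

    walked = []
    # Each step yields (folder path, subfolder names, file names).
    for dirpath, dirnames, filenames in os.walk(root):
        relative = Path(dirpath).relative_to(root).as_posix()
        walked.append((relative, sorted(dirnames), sorted(filenames)))

for entry in walked:
    print(entry)
# ('.', ['sub'], ['a.txt'])
# ('sub', [], ['b.txt'])
```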
Better solution with os.walk and Pathlib
If you combine the above two methods, you get a solution that is simpler, completely independent of the operating system, resistant to incorrect path formats, and without explicit recursion:
import os
from pathlib import Path

filesurvey = []
for row in os.walk(path):  # each row holds the contents of one folder
    for filename in row[2]:  # row[2] is the list of file names
        full_path: Path = Path(row[0]) / Path(filename)  # row[0] is the folder path
        filesurvey.append([path, filename, full_path.stat().st_mtime, full_path.stat().st_size])
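As a usage sketch, the same loop can be wrapped in a function and its result ranked by size. The function name survey and the test files are hypothetical. Unlike the snippet above, this variant records the folder of each file rather than the root path and calls stat() only once per file; both are my own changes.

```python
import os
import tempfile
from pathlib import Path


def survey(root):
    """Return [folder, filename, mtime, size] rows for every file under root."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            info = (Path(dirpath) / filename).stat()  # one stat call per file
            rows.append([dirpath, filename, info.st_mtime, info.st_size])
    return rows


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical test data: two files of different sizes.
    Path(tmp, "small.txt").write_text("x")
    Path(tmp, "nested").mkdir()
    Path(tmp, "nested", "large.txt").write_text("x" * 100)

    # Rank by size, largest first (index 3 is the size column).
    by_size = sorted(survey(tmp), key=lambda row: row[3], reverse=True)

print([row[1] for row in by_size])  # ['large.txt', 'small.txt']
```

Sorting by index 2 instead would rank the files by modification time.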
If you manage to improve this option, feel free to tell me about it. I would love your feedback!
The first part of the article can be found here…
This translation was prepared ahead of the start of the course “Python Developer. Basic”…