Extracting a filename from a path in Python is simple and can be done using only the standard library. However, there are some quirks to be aware of to ensure predictable behavior and cross-platform interoperability.
Python’s os module provides a long list of functions for handling paths. The os.path.basename and os.path.split functions both come in particularly handy for this task.
However, these functions don’t always behave the same on Windows and Linux machines. Let’s consider some common cases and where troubles might pop up.
OS Module: Common Uses and Common Issues
Python’s os module contains many useful path-related tools and supports cross-platform use. As noted in the os module’s documentation:
Programs that import and use ‘os’ stand a better chance of being portable between different platforms.
Among the many useful path tools in the os module are the os.path.basename and os.path.split functions. Each of these can return the filename from a path in Python, but both fail somewhat unexpectedly in some cases. In the following example, the two functions find the filename easily:
    import os

    # Define a test file
    FILE = "C:/path/to/my/file.ext"

    # Print example filename outputs
    print(os.path.basename(FILE))
    print(os.path.split(FILE))

    # Output
    # file.ext
    # ('C:/path/to/my', 'file.ext')
Note that the os.path.split function returns a tuple of the form (head, tail). For direct filename extraction, capture just the tail, either by indexing into the return value, as in os.path.split(FILE)[1], or by unpacking the values into variables, as in head, tail = os.path.split(FILE).
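As a small sketch of the two options just described (the variable names here are only illustrative), both forms should give the same tail as os.path.basename:

```python
import os

FILE = "C:/path/to/my/file.ext"

# Option 1: index into the (head, tail) tuple
tail_by_index = os.path.split(FILE)[1]

# Option 2: unpack the tuple into two variables
head, tail = os.path.split(FILE)

print(tail_by_index)                   # file.ext
print(tail)                            # file.ext
print(tail == os.path.basename(FILE))  # True
```

Unpacking is handy when the directory portion is needed as well; indexing is shorter when only the filename matters.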
Trailing Slash Issues
The first area where this approach fails is when a file path ends with a trailing slash (forward, back, double or single). Consider the code from above again, but this time with a trailing slash on the filepath string:
    import os

    # Define a test file
    FILE = "C:/path/to/my/file.ext/"  # <-- note the trailing slash

    # Print example filename outputs
    print(os.path.basename(FILE))
    print(os.path.split(FILE))

    # Output
    # (an empty line)
    # ('C:/path/to/my/file.ext', '')
In the case of the first approach, os.path.basename(FILE), the return value is an empty string. Trailing slashes produce empty filename values when using the os module. With trailing forward slashes, whether single or doubled, this output is produced on both Linux and Windows machines. On Windows, a trailing backslash (written as the escaped "\\" in a Python string) gives the same empty result. This is the first hiccup in using the os module, but there are also some issues when moving between platforms.
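The trailing-slash behavior can be checked with a short loop. This is only a sketch with illustrative names; it sticks to forward slashes because those are treated as separators on both Linux and Windows:

```python
import os

# Paths ending in one or two forward slashes
PATHS = ["C:/path/to/my/file.ext/", "C:/path/to/my/file.ext//"]

# Collect (basename, (head, tail)) for each path
results = [(os.path.basename(p), os.path.split(p)) for p in PATHS]

for name, (head, tail) in results:
    # Both the basename and the tail come back empty,
    # while the head still ends with the filename
    print(repr(name), repr(head), repr(tail))
```

In both cases the filename is not lost; it simply ends up at the end of the head instead of in the tail.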
Linux Issue with Windows-Formatted Paths
The code above will also fail on a Linux machine trying to process a Windows-formatted path. For example, the call os.path.basename("C:\\path\\to\\my\\file.ext") returns the entire input unchanged: 'C:\\path\\to\\my\\file.ext'.
Similarly, calling os.path.split("C:\\path\\to\\my\\file.ext") results in the following unwanted output: ('', 'C:\\path\\to\\my\\file.ext'), where the first tuple value (the head) is the empty string and the second value (the tail) is the full file path rather than the filename.
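The Linux behavior can be reproduced on any platform by calling posixpath, the standard-library module that backs os.path on Linux, directly. The following is a minimal sketch with illustrative variable names:

```python
import posixpath

WINDOWS_PATH = "C:\\path\\to\\my\\file.ext"

# posixpath only recognizes "/" as a separator, so the backslashes are
# treated as ordinary characters in one long filename
separator = posixpath.sep
name = posixpath.basename(WINDOWS_PATH)
head, tail = posixpath.split(WINDOWS_PATH)

print(separator)             # /
print(name == WINDOWS_PATH)  # True
print(repr(head))            # ''
print(tail == WINDOWS_PATH)  # True
```

Because the path contains no forward slash, there is nothing for the POSIX implementation to split on.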
ntpath Module: Cross-Platform Fallback
Under the hood, the os module actually imports ntpath to handle path functions on Windows machines. This can be observed in the following lines of code from the os module:
    elif 'nt' in _names:
        name = 'nt'
        linesep = '\r\n'
        from nt import *
        try:
            from nt import _exit
            __all__.append('_exit')
        except ImportError:
            pass
        import ntpath as path

        import nt
        __all__.extend(_get_exports_list(nt))
        del nt

        try:
            from nt import _have_functions
        except ImportError:
            pass
The line import ntpath as path is the one most relevant to the discussion here. It instructs Python to expose the ntpath module as os.path on Windows machines. Essentially, whether you import ntpath or os.path, you’ll be getting the same functionality on a Windows machine.
Note: ntpath treats both / and \ as directory separators.
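The note above can be checked directly, since ntpath can be imported on any platform. This is only a sketch, and the sample paths are illustrative:

```python
import ntpath

# ntpath declares a primary and an alternative separator
primary = ntpath.sep
alternative = ntpath.altsep
print(repr(primary), repr(alternative))  # '\\' '/'

# Forward slashes, backslashes, and a mix of the two all split the same way
SAMPLES = [
    "C:/path/to/my/file.ext",
    "C:\\path\\to\\my\\file.ext",
    "C:\\path/to\\my/file.ext",
]
names = [ntpath.basename(p) for p in SAMPLES]
print(names)  # ['file.ext', 'file.ext', 'file.ext']
```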
What happens on a Linux machine though? Let’s look at the source code for the os module once again, this time just above the prior code:
    if 'posix' in _names:
        name = 'posix'
        linesep = '\n'
        from posix import *
        try:
            from posix import _exit
            __all__.append('_exit')
        except ImportError:
            pass
        import posixpath as path

        try:
            from posix import _have_functions
        except ImportError:
            pass

        import posix
        __all__.extend(_get_exports_list(posix))
        del posix
The line import posixpath as path is the one of interest here, if for no other reason than to demonstrate that Python’s os.path module operates differently on Linux and Windows machines.
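One way to see which implementation is in use on the current machine is to compare os.path against the two modules. This is a sketch that assumes a Windows, Linux, or macOS interpreter, and the variable name expected is illustrative:

```python
import ntpath
import os
import posixpath

# os.name is 'nt' on Windows and 'posix' on Linux and macOS
expected = ntpath if os.name == "nt" else posixpath

print(os.name)              # platform identifier
print(os.path.__name__)     # name of the module behind os.path
print(os.path is expected)  # True on the platforms assumed above
```

Since os.path is just a reference to one of these two modules, importing ntpath or posixpath explicitly gives the same functions regardless of the host platform.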
Armed with that knowledge, the goal becomes creating a solution that produces the desired result on both platforms: one that handles Windows-formatted paths on Linux and also deals with trailing slashes.
Custom ntpath Function to Extract Filename
Importing the ntpath module explicitly seems the best approach to offering unified functionality between Windows and Linux machines. However, that alone does not settle how to deal with trailing slashes. Consider the following code:
    import ntpath

    # Test path
    FILE = "C:\\path\\to\\my\\file\\"

    # Print examples from filepath
    print(ntpath.split(FILE))
    print(repr(ntpath.basename(FILE)))

    # Returned values (same on Windows and Linux)
    # ('C:\\path\\to\\my\\file', '')
    # ''
These results are identical on both Windows and Linux machines but are still not what we want. However, this gets us one step closer to a unified solution.
Notice that the ntpath.split function returns the file path without the trailing slash as the head of its tuple. This allows for the creation of a custom function with a simple conditional that handles all of these cases with ease:
    import ntpath


    def extract_filename(filepath: str) -> str:
        """
        Given a filepath string, extract the filename regardless of file
        separator type or presence in the trailing position.

        Args:
            filepath: the filepath string from which to extract the filename

        Returns:
            str object representing the filename
        """
        head, tail = ntpath.split(filepath)
        # An empty tail means the path ended with a separator, so fall back
        # to the last component of the head
        return tail or ntpath.basename(head)


    # Test paths
    PATHS = [
        "/path/to/my/filename",
        "/path/to/my/filename/",
        "path/to/my/filename",
        "path/to/my/filename/",
        "\\path\\to\\my\\filename",
        "\\path\\to\\my\\filename\\",
        "path\\to\\my\\filename",
        "path\\to\\my\\filename\\",
    ]

    # Get filename for all paths
    print([extract_filename(p) for p in PATHS])

    # Produces the following output
    # ['filename', 'filename', 'filename', 'filename', 'filename', 'filename', 'filename', 'filename']
This approach handles both forward and backslashes, handles file paths with trailing slashes, and produces the same output on both Windows and Linux machines. For those interested, this code is available on GitHub.
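A few edge cases are worth trying as well. The sketch below repeats the same two-line logic so that it runs on its own; the inputs are illustrative, and treating a bare drive root or an empty string as having no filename is an assumption about the desired behavior rather than something the function enforces:

```python
import ntpath


def extract_filename(filepath: str) -> str:
    """Same logic as the function above, repeated so this snippet runs alone."""
    head, tail = ntpath.split(filepath)
    return tail or ntpath.basename(head)


# Doubled trailing slashes and mixed separators still resolve to the name
print(repr(extract_filename("path/to/my/filename//")))    # 'filename'
print(repr(extract_filename("path\\to/my\\filename\\")))  # 'filename'

# A bare name has no directory part to strip
print(repr(extract_filename("filename")))                 # 'filename'

# Inputs with no final component produce an empty string
print(repr(extract_filename("C:\\")))                     # ''
print(repr(extract_filename("")))                         # ''
```

Callers that need to tell "no filename" apart from a valid result can check for the empty string explicitly.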
Discussion
Python is one of my favorite languages for handling file paths. It has several tools built into the standard library that greatly simplify creating path variables for common system paths, project paths, and system files.
As illustrated above, Python makes it easy to get a filename from a path, provided you are ready to handle some quirks.
Cross-platform compatibility becomes an issue when dealing with Windows-formatted file paths that use backslashes, since os.path on Linux machines does not treat the backslash as a separator. However, falling back on the ntpath module and using a little bit of conditional logic can solve the problems discussed here.