How To Get a Filename From a Path in Python


Extracting a filename from a path in Python is simple and requires only the standard library. However, there are some quirks to be aware of to ensure predictable behavior and cross-platform interoperability.

Python’s os module provides a long list of functions for handling paths. The os.path.basename and os.path.split functions are particularly useful for this task.

However, these functions don’t always behave the same on Windows and Linux machines. Let’s consider some common cases and where troubles might pop up.

OS Module: Common Uses + Common Issues

Python’s os module contains many useful path-related tools and supports cross-platform use. As noted in the os module’s documentation:

Programs that import and use ‘os’ stand a better chance of being portable between different platforms.

Among the many useful path tools in the os module are the os.path.basename and os.path.split functions. Each of these can return the filename from a path in Python but fail somewhat unexpectedly in some cases. In the following examples, these two functions find the filename easily:

import os

# Define a test file
FILE = "C:/path/to/my/file.ext"

# Print example filename outputs
print(os.path.basename(FILE))
print(os.path.split(FILE))

# Output
>>> file.ext
>>> ('C:/path/to/my', 'file.ext')

Note that the os.path.split function returns a tuple of the form (head, tail). For direct filename extraction, capture just the tail, either by indexing into the return value, as in os.path.split(FILE)[1], or by unpacking the values into variables, as in head, tail = os.path.split(FILE).
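As a minimal sketch of the two options just described, the following uses the same test path as above. The variable name filename_by_index is only illustrative.

```python
import os

FILE = "C:/path/to/my/file.ext"

# Option 1: index into the (head, tail) tuple
filename_by_index = os.path.split(FILE)[1]

# Option 2: unpack the tuple into two variables
head, tail = os.path.split(FILE)

print(filename_by_index)  # file.ext
print(tail)               # file.ext
```

Unpacking is handy when the directory portion is also needed; indexing is shorter when only the filename matters.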

Trailing Slash Issues

The first area where this approach fails is when a file path ends with a trailing slash (forward, back, double or single). Consider the code from above again, but this time with a trailing slash on the filepath string:

# Define a test file
FILE = "C:/path/to/my/file.ext/"  # <--note the trailing slash

# Print example filename outputs
print(os.path.basename(FILE))
print(os.path.split(FILE))

# Output
>>> 
>>> ('C:/path/to/my/file.ext', '')

With the first approach, os.path.basename(FILE), the return value is an empty string, and os.path.split(FILE) returns an empty tail. A trailing forward slash produces this result on both Linux and Windows machines; on Windows, a trailing backslash (written as an escaped double backslash in a Python string literal) does the same. This is the first hiccup in using the os module, but there are also some issues when moving between platforms.
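One way to check the trailing-slash behavior for both platforms from a single machine is to call the two standard-library modules that back os.path directly: posixpath (used on Linux) and ntpath (used on Windows), both covered later in this article. This is only a sketch, and the results dictionary is an illustrative name.

```python
import ntpath
import posixpath

FILE = "C:/path/to/my/file.ext/"  # trailing forward slash

# posixpath backs os.path on Linux; ntpath backs os.path on Windows
results = {module.__name__: module.basename(FILE) for module in (posixpath, ntpath)}

print(results)  # {'posixpath': '', 'ntpath': ''}
```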

Linux Issue with Windows-Formatted Paths

The code above will also fail on a Linux machine trying to process a Windows-formatted path. For example, the command os.path.basename("C:\\path\\to\\my\\file.ext") will produce the following output: "C:\\path\\to\\my\\file.ext".

Similarly, using the os.path.split("C:\\path\\to\\my\\file.ext") command will result in the following unwanted output: ('', 'C:\\path\\to\\my\\file.ext') where the first tuple value (the head) is the empty string and the second value (the tail) is the full file path rather than the filename.
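The Linux behavior can be reproduced on any platform by calling posixpath, the module that os.path resolves to on Linux, directly. The sketch below assumes nothing beyond the standard library; WINDOWS_PATH and the result names are illustrative.

```python
import posixpath

WINDOWS_PATH = "C:\\path\\to\\my\\file.ext"

# posixpath only recognizes "/" as a separator, so nothing is split off
basename_result = posixpath.basename(WINDOWS_PATH)
split_result = posixpath.split(WINDOWS_PATH)

print(basename_result == WINDOWS_PATH)  # True: the whole path comes back
print(split_result[0] == "")            # True: the head is empty
```

Because the backslash is an ordinary filename character on Linux, the entire string is treated as a single path component.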

ntpath Module: Cross-Platform Fallback

Under the hood of the os module, one will find that it actually imports ntpath to handle path functions on Windows machines. This can be observed by the following lines of code from the os module:

elif 'nt' in _names:
    name = 'nt'
    linesep = '\r\n'
    from nt import *
    try:
        from nt import _exit
        __all__.append('_exit')
    except ImportError:
        pass
    import ntpath as path

    import nt
    __all__.extend(_get_exports_list(nt))
    del nt

    try:
        from nt import _have_functions
    except ImportError:
        pass

The import ntpath as path line is the most relevant to the discussion here. It instructs Python to import the ntpath module as os.path on Windows machines. Essentially, whether you import ntpath or os.path, you’ll be getting the same functionality on a Windows machine.

Note: ntpath treats both / and \ as directory separators
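A quick sketch of that note: ntpath can be imported on any platform, and it splits on forward slashes, backslashes, and a mix of the two. The list name STYLES is illustrative.

```python
import ntpath

# The same path written with forward, back, and mixed separators
STYLES = [
    "C:/path/to/my/file.ext",
    "C:\\path\\to\\my\\file.ext",
    "C:\\path/to\\my/file.ext",
]

names = [ntpath.basename(p) for p in STYLES]
print(names)  # ['file.ext', 'file.ext', 'file.ext']
```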

What happens on a Linux machine though? Let’s look at the source code for the os module once again, this time just above the prior code:

if 'posix' in _names:
    name = 'posix'
    linesep = '\n'
    from posix import *
    try:
        from posix import _exit
        __all__.append('_exit')
    except ImportError:
        pass
    import posixpath as path

    try:
        from posix import _have_functions
    except ImportError:
        pass

    import posix
    __all__.extend(_get_exports_list(posix))
    del posix

The import posixpath as path line is the one of interest here, if for no other reason than to demonstrate that Python’s os.path module behaves differently on Linux and Windows machines.
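To see which module os.path resolves to on a given machine, a small check like the following can help. It is a sketch that assumes a standard CPython install where os.name is either "posix" or "nt"; backing_module is an illustrative name.

```python
import ntpath
import os
import posixpath

# os.name is 'nt' on Windows and 'posix' on Linux and macOS
backing_module = ntpath if os.name == "nt" else posixpath

print(os.name)
print(os.path.__name__)           # 'ntpath' or 'posixpath'
print(os.path is backing_module)  # os.path is the module object itself
```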

Armed with that knowledge, the goal now becomes that of creating a solution to produce the desired result on both platforms—handling Windows-formatted paths on Linux and dealing with trailing slashes.

Custom ntpath Function to Extract Filename

Importing the ntpath module explicitly seems the best approach to offering unified functionality between Windows and Linux machines. However, it’s unclear as to the best approach for dealing with the trailing slashes. Consider the following code:

import ntpath

# Test path
FILE = "C:\\path\\to\\my\\file\\"

# Print examples from filepath
print(ntpath.split(FILE))
print(repr(ntpath.basename(FILE)))

# Output (same on Windows and Linux)
>>> ('C:\\path\\to\\my\\file', '')
>>> ''

These results are identical on both Windows and Linux machines, but still produce unwanted results. However, it gets us one step closer to a unified solution.

Notice that the head returned by the ntpath.split function is the file path without the trailing slash. This allows for a custom function with a simple conditional that handles all of the cases above:

import ntpath


def extract_filename(filepath: str) -> str:
    """
    Given a filepath string, extract the filename regardless of file separator type
    or presence in the trailing position.
    Args:
        filepath: the filepath string from which to extract the filename
    Returns:
        str object representing the file
    """
    head, tail = ntpath.split(filepath)
    return tail or ntpath.basename(head)

# Test paths
PATHS = [
    "/path/to/my/filename",
    "/path/to/my/filename/",
    "path/to/my/filename",
    "path/to/my/filename/",
    "\\path\\to\\my\\filename",
    "\\path\\to\\my\\filename\\",
    "path\\to\\my\\filename",
    "path\\to\\my\\filename\\"
]

# Get filename for all paths
print([extract_filename(p) for p in PATHS])

# Produces the following output
>>> ['filename', 'filename', 'filename', 'filename', 'filename', 'filename', 'filename', 'filename']

This approach handles both forward and backslashes, handles file paths with trailing slashes, and produces the same output on both Windows and Linux machines. For those interested, this code is available on GitHub.
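A few edge cases are worth probing before relying on the function. The sketch below repeats the same function body so it runs on its own; the EDGE_CASES inputs are hypothetical and are not part of the original test list.

```python
import ntpath


def extract_filename(filepath: str) -> str:
    """Same logic as the function above, repeated so this sketch is self-contained."""
    head, tail = ntpath.split(filepath)
    return tail or ntpath.basename(head)


# Hypothetical inputs mapped to the value the function returns for each
EDGE_CASES = {
    "C:\\path/to\\my/filename": "filename",  # mixed separators
    "path/to/my/filename//": "filename",     # repeated trailing slashes
    "filename": "filename",                  # no directory part at all
    "/": "",                                 # root only: nothing to extract
    "": "",                                  # empty input
}

for path, expected in EDGE_CASES.items():
    result = extract_filename(path)
    print(f"{path!r:30} -> {result!r}")
    assert result == expected
```

One trade-off to keep in mind: a backslash is a legal character in a Linux filename, so a file literally named odd\name would be reported as name by this approach.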

Discussion

Python is one of my favorite languages for handling file paths. It has several tools built into the standard library that greatly simplify creating path variables for common system paths, project paths, and system files.

As illustrated above, Python makes it easy to get a filename from a path, provided you are ready to handle some quirks.

Cross-platform compatibility becomes an issue when dealing with Windows-formatted paths that use backslashes, which Linux machines do not treat as separators. However, falling back on the ntpath module and using a little conditional logic can solve such problems.

Zαck West
Entrepreneur, programmer, designer, and lifelong learner. Can be found taking notes from Mother Nature when not hammering away at the keyboard.