Text Wrapping in Python with the Standard textwrap Module

Python's standard text-processing ability often goes unnoticed in the shadows of more complex natural language libraries. Learn how to leverage the power of the standard textwrap library to format text like a pro!
text wrapper

Text wrapping is a useful tool when processing textual languages in applications of natural language processing, data analysis, and even art or design work. The Python textwrap module provides convenient functions to help create a text wrapping effect.

This module also provides several related functions that can be used to normalize whitespace, remove or add indentations, prefix and suffix text, and a myriad of other uses. In addition, the textwrap module features a TextWrapper class that provides similar functionality with more memory-efficient consideration.

Text Wrapping in Python

The examples in this article use text from Alexander Dumas’ The Count of Monte Cristo which is now part of the public domain and freely available from Project Gutenberg in a number of formats.  The following text comes from the first three paragraphs of the first chapter.

On the 24th of February, 1815, the look-out at Notre-Dame de la Garde
signalled the three-master, the _Pharaon_ from Smyrna, Trieste, and
Naples.

As usual, a pilot put off immediately, and rounding the Château d’If,
got on board the vessel between Cape Morgiou and Rion island.

Immediately, and according to custom, the ramparts of Fort Saint-Jean
were covered with spectators; it is always an event at Marseilles for a
ship to come into port, especially when this ship, like the _Pharaon_,
has been built, rigged, and laden at the old Phocee docks, and belongs
to an owner of the city.

Using the following code, I quickly got an idea of the average line length of this text.

import statistics

# Calculate the average line length
avg_length = statistics.mean([len(x) for x in text.splitlines()])

>>> 48.083333333333336

This isn’t pertinent to anything I’m doing but will help guide values passed for width during the wrapper for a better visual demonstration. After all, the effect of wrapping a 48 character wide string of text to 50 characters wouldn’t be very visually dramatic.

Standalone Functions vs. TextWrapper Class

The Python textwrap module provides a series of methods as standalone functions or as class methods via the TextWrapper class. The official documentation advises one to us the TextWrapper class as such:

” If you’re just wrapping or filling one or two text strings, the convenience functions should be good enough; otherwise, you should use an instance of TextWrapper for efficiency.”

In other words, no need to hog memory if you’re just tweaking a sentence or two. We’ll take a look at each of these types of uses starting with the TextWrapper class since several of the instance attributes defined there are relevant to the standalone functions.

Stand-Alone Functions

The following functions are provided as standalone and provide useful textual utilities. Contrary to what the module’s name might imply, these functions address many common use-cases other than wrapping lines of text in python to fixed lengths. However, that’s where we’ll start!

wrap

“Returns a list of output lines, without final newlines. If the wrapped output has no content, the returned list is empty.” — TextWrapper Documentation

This function provides the most characterizing utility of the textwrap module—by wrapping text to a limited number of characters. This function accepts either a string of text or a collection of texts, a width argument and any other instance arguments available to the TextWrapper class.

Using the sample text, the wrap function, and a width argument of 25 produces the following output:

# Wrap text to lines of 25 characters
textwrap.wrap(text=sample_text, width=25)

# The result is a list of 25-character long strings (last item may be shorter)
>>> ['On the 24th of February,', '1815, the look-out at', 'Notre-Dame de la Garde', 'signalled the three-', 'master, the _Pharaon_', 'from Smyrna, Trieste, and', 'Naples.  As usual, a', 'pilot put off', 'immediately, and rounding', 'the Château d’If, got on', 'board the vessel between', 'Cape Morgiou and Rion', 'island.  Immediately, and', 'according to custom, the', 'ramparts of Fort Saint-', 'Jean were covered with', 'spectators; it is always', 'an event at Marseilles', 'for a ship to come into', 'port, especially when', 'this ship, like the', '_Pharaon_, has been', 'built, rigged, and laden', 'at the old Phocee docks,', 'and belongs to an owner', 'of the city.']

fill

“Wraps the single paragraph in text, and returns a single string containing the wrapped paragraph” — TextWrapper Documentation

After having used the wrap function, one might consider the utility of a collection of substrings. I tend to find my final use-case to be for a block of text more often than a collection of substrings.

In other words, I want the text wrapping to result in a single block of text of shorter average line length. I tend to use something along the line of text_block = "\n".join(lines) to accomplish this goal—which is exactly what the fill function does:

# Create a wrap of 25-characters width
textwrap.fill(text=text, width=25)

# Resulting Text
On the 24th of February,
1815, the look-out at
Notre-Dame de la Garde
signalled the three-
master, the _Pharaon_
from Smyrna, Trieste, and
Naples.  As usual, a
pilot put off
immediately, and rounding
the Château d’If, got on
board the vessel between
Cape Morgiou and Rion
island.  Immediately, and
according to custom, the
ramparts of Fort Saint-
Jean were covered with
spectators; it is always
an event at Marseilles
for a ship to come into
port, especially when
this ship, like the
_Pharaon_, has been
built, rigged, and laden
at the old Phocee docks,
and belongs to an owner
of the city.

Note: The fill function takes the same arguments as the wrap function. The only difference is that it joins the resulting lines into a single block of text.

shorten

“Collapse and truncate the given text to fit in the given width.” — TextWrapper documentation

The shorten function can be regarded as an excerpt-generating function. It takes a string as an argument and truncates said string to a specific number of characters. During the process of truncation, the shorten function also normalizes whitespace such that the resulting text contains only singular space characters. Check out this ASCII lookup table for reference on whitespace character values.

The shorten function is the first example of standalone functions where keyword arguments other than text and width are notably useful. Using the placeholder argument here specifies what characters will be appended to the resulting text. Personally, I find an ellipse (…) to be visually appropriate as seen in the following example:

# Use the shorten function to create a concatenated string
textwrap.shorten(text=sample_text, width=75, placeholder="...")

# The resulting concatenated text with the ellipsis added
# via the "placeholder" argument.
>>> On the 24th of February, 1815, the look-out at Notre-Dame de la Garde...

Note: If the normalization of whitespace results in a text shorter than the specified width that text will be returned in its entirety. An all-whitespace text will return an empty string and also not add the placeholder.

dedent

“Remove any common leading whitespace from every line in text.” — TextWrapper Documentation

The dedent function took a moment to wrap my head around. Its purpose is to normalize leading whitespace for entire blocks of text — not individual lines. The common case here is for intending text found in things like method documentation, string formatting, and etc. This function won’t work with the current sample text but consider the following case of the indented docstring from the dedent source code:

def dedent(text):
    """Remove any common leading whitespace from every line in `text`.

    This can be used to make triple-quoted strings line up with the left
    edge of the display, while still presenting them in the source code
    in indented form.

    Note that tabs and spaces are both treated as whitespace, but they
    are not equal: the lines "  hello" and "\\thello" are
    considered to have no common leading whitespace.

    Entirely blank lines are normalized to a newline character.
    """
    ...

Copy/pasting this indented docstring text would result in lines of text that all shared a leading sequence of 4 single-space characters (NOT A TAB!) Manually removal of this indention is easy enough and does little to justify the dependent function’s existence.

Imagine using the docstring of every function in the standard library (for NLP analysis maybe) in a way that required removal of indentions. Manual removal would be functionally intractable. Using the dedent function would ensure that all indentions were removed during preprocessing. A quick example

# Another excerpt of our sample text, with poorly formatted entry
# using blank lines and indendation
string = """
    On the 24th of February, 1815, the look-out at Notre-Dame de la Garde
    signalled the three-master, the _Pharaon_ from Smyrna, Trieste, and
    Naples.
    """
# Dedent the text
textwrap.dedent(text=string)
>>> 

On the 24th of February, 1815, the look-out at Notre-Dame de la Garde
signalled the three-master, the _Pharaon_ from Smyrna, Trieste, and
Naples.

Note this sample of Dumas’ work has a newline character at the beginning and end that are not removed. The newline character is not common to all the lines of text therefore isn’t removed during the dedent functions algorithm. Only the indented space shared by all lines is removed.

indent

“Add prefix to the beginning of selected lines in text.” — TextWrapper Documentation

The indent function works as one would expect; it adds whatever string is specified as the suffix argument to a body of text. Consider the result of using the indent function on our original sample text:

# Prefix text lines with three hyphens and a space
textwrap.indent(text=sample_text, prefix="--- ")

>>>
--- On the 24th of February, 1815, the look-out at Notre-Dame de la Garde
--- signalled the three-master, the _Pharaon_ from Smyrna, Trieste, and
--- Naples.

--- As usual, a pilot put off immediately, and rounding the Château d’If,
--- got on board the vessel between Cape Morgiou and Rion island.

--- Immediately, and according to custom, the ramparts of Fort Saint-Jean
--- were covered with spectators; it is always an event at Marseilles for a
--- ship to come into port, especially when this ship, like the _Pharaon_,
--- has been built, rigged, and laden at the old Phocee docks, and belongs
--- to an owner of the city.

By default, the indent function will add the value of prefix to any line other than whitespace only lines. In other words, the prefix isn’t added to the blank lines between paragraphs in our sample text.

The indent function also allows a callable parameter to be passed in as a predicate argument. This is intended to provide logical control for which lines are given the prefix. For example, if one were to insist on prefixing blank lines as well the following syntax could be used:

# Add three hyphens and space as prefix to ALL lines including blank lines
textwrap.indent(text=sample_text, prefix='--- ', predicate=lambda line: True)

>>>
--- On the 24th of February, 1815, the look-out at Notre-Dame de la Garde
--- signalled the three-master, the _Pharaon_ from Smyrna, Trieste, and
--- Naples.
--- 
--- As usual, a pilot put off immediately, and rounding the Château d’If,
--- got on board the vessel between Cape Morgiou and Rion island.
--- 
--- Immediately, and according to custom, the ramparts of Fort Saint-Jean
--- were covered with spectators; it is always an event at Marseilles for a
--- ship to come into port, especially when this ship, like the _Pharaon_,
--- has been built, rigged, and laden at the old Phocee docks, and belongs
--- to an owner of the city.

Other creative approaches might be for such cases where a line contains a series of characters such that one would want visual notification. For example, each line containing the word “and”:

# Add a leading asterisk to any line containing the word 'and'
textwrap.indent(text=sample_text, prefix='* ', predicate=lambda line: 'and' in line)

>>>
On the 24th of February, 1815, the look-out at Notre-Dame de la Garde
* signalled the three-master, the _Pharaon_ from Smyrna, Trieste, and
Naples.

* As usual, a pilot put off immediately, and rounding the Château d’If,
* got on board the vessel between Cape Morgiou and Rion island.

* Immediately, and according to custom, the ramparts of Fort Saint-Jean
were covered with spectators; it is always an event at Marseilles for a
ship to come into port, especially when this ship, like the _Pharaon_,
* has been built, rigged, and laden at the old Phocee docks, and belongs
to an owner of the city.

One possible extension of this function I’d like to see in the future is support for conditionally choosing between multiple prefixes. For example, adding a 4-space series to indent every line without the word and or a 3-space series with a leading asterisk if so. Currently, one would be required to do something similar to this:

# Iterate each line, adding the prefix conditionally
text = ""
for line in sample_text.splitlines():
    text += "*-- " if 'and' in line else "--- "
    text += line + '\n'

>>>
--- On the 24th of February, 1815, the look-out at Notre-Dame de la Garde
*-- signalled the three-master, the _Pharaon_ from Smyrna, Trieste, and
--- Naples.
--- 
*-- As usual, a pilot put off immediately, and rounding the Château d’If,
*-- got on board the vessel between Cape Morgiou and Rion island.
--- 
*-- Immediately, and according to custom, the ramparts of Fort Saint-Jean
--- were covered with spectators; it is always an event at Marseilles for a
--- ship to come into port, especially when this ship, like the _Pharaon_,
*-- has been built, rigged, and laden at the old Phocee docks, and belongs
--- to an owner of the city.

The end result here is a more visually balanced representation of the * denoting the inclusion of the word and within a line from our sample text.

TextWrapper Class

The TextWrapper Class is to be used for larger bodies of text where memory efficiency may be pertinent. For example, a 1 Million word corpus being pre-processed for NLP analysis. Keyword arguments passed during instantiation are available to class objects as instance attributes.  The TextWrapper class also provides wrap and fill as public methods.

Final Thoughts

The textwrap module contains some pretty basic text manipulation tools. Very rarely do I find that a text-based project can be fully addressed via this module. However, its wrap and dedent functions are both very useful and offer the benefit of being included in the standard Python library. I find myself using them frequently enough to stay familiarized with their inner workings.

If I could change anything about this module it would be having the placeholder argument named suffix, append, or end to better convey the use case. The placeholder syntax implies removal upon input—like the placeholder attribute in an HTML form field. Oddly enough, the indent method chooses to use the suffix syntax.

I would also like to see a more concise method to reveal a wrapped version of a TextWrapper object. For example, to view the text in wrapped form one must pass the text to the public method as such: text.wrap(text=self.text). I’d like to see something more along the lines of text.show() which would still confer the benefits of accessing the original text via the text.text attribute. I guess everyone is a critic!

Zack West
Entrepreneur, programmer, designer, and lifelong learner. Can be found taking notes from Mother Nature when not hammering away at the keyboard.