Text wrapping is a useful tool when processing text in natural language processing, data analysis, and even art or design work. The Python textwrap module provides convenient functions for wrapping text.
This module also provides several related functions that can be used to normalize whitespace, remove or add indentation, prefix and suffix text, and more. In addition, the textwrap module features a TextWrapper class that provides the same functionality in a reusable form, which is more efficient when many strings are wrapped with the same settings.
Text Wrapping in Python
The examples in this article use text from Alexandre Dumas’ The Count of Monte Cristo, which is now part of the public domain and freely available from Project Gutenberg in a number of formats. The following text comes from the first three paragraphs of the first chapter.
On the 24th of February, 1815, the look-out at Notre-Dame de la Garde signalled the three-master, the _Pharaon_ from Smyrna, Trieste, and Naples. As usual, a pilot put off immediately, and rounding the Château d’If, got on board the vessel between Cape Morgiou and Rion island. Immediately, and according to custom, the ramparts of Fort Saint-Jean were covered with spectators; it is always an event at Marseilles for a ship to come into port, especially when this ship, like the _Pharaon_, has been built, rigged, and laden at the old Phocee docks, and belongs to an owner of the city.
Using the following code, I quickly got an idea of the average line length of this text.
import statistics

# Calculate the average line length
avg_length = statistics.mean([len(x) for x in sample_text.splitlines()])

>>> 48.083333333333336
This number isn’t essential to anything that follows, but it helps guide the values passed for width so that the wrapping is visually obvious. After all, wrapping text whose lines average 48 characters to a width of 50 characters wouldn’t be very dramatic.
Standalone Functions vs. TextWrapper Class
The Python textwrap module provides its functionality either as standalone functions or as methods of the TextWrapper class. The official documentation advises when to use the TextWrapper class:
“If you’re just wrapping or filling one or two text strings, the convenience functions should be good enough; otherwise, you should use an instance of TextWrapper for efficiency.”
In other words, there’s no need to set up a reusable wrapper object if you’re just tweaking a sentence or two. We’ll take a look at both approaches, starting with the standalone functions; keep in mind that they accept the TextWrapper instance attributes as keyword arguments.
Stand-Alone Functions
The following functions are provided as standalone and offer useful textual utilities. Contrary to what the module’s name might imply, these functions address many common use cases other than wrapping lines of text to fixed lengths. However, that’s where we’ll start!
wrap
“Returns a list of output lines, without final newlines. If the wrapped output has no content, the returned list is empty.” — TextWrapper Documentation
This function provides the defining utility of the textwrap module: wrapping text so that no line exceeds a given number of characters. It accepts a single string of text, a width argument, and any keyword arguments that correspond to the instance attributes of the TextWrapper class.
Using the sample text, the wrap function, and a width argument of 25 produces the following output:
# Wrap text to lines of at most 25 characters
textwrap.wrap(text=sample_text, width=25)

# The result is a list of strings, each at most 25 characters long
>>> ['On the 24th of February,',
 '1815, the look-out at',
 'Notre-Dame de la Garde',
 'signalled the three-',
 'master, the _Pharaon_',
 'from Smyrna, Trieste, and',
 'Naples.  As usual, a',
 'pilot put off',
 'immediately, and rounding',
 'the Château d’If, got on',
 'board the vessel between',
 'Cape Morgiou and Rion',
 'island.  Immediately, and',
 'according to custom, the',
 'ramparts of Fort Saint-',
 'Jean were covered with',
 'spectators; it is always',
 'an event at Marseilles',
 'for a ship to come into',
 'port, especially when',
 'this ship, like the',
 '_Pharaon_, has been',
 'built, rigged, and laden',
 'at the old Phocee docks,',
 'and belongs to an owner',
 'of the city.']
fill
“Wraps the single paragraph in text, and returns a single string containing the wrapped paragraph” — TextWrapper Documentation
After using the wrap function, one might question the utility of a list of substrings. I find that my final use case calls for a single block of text more often than a collection of lines.
In other words, I want the wrapping to produce one block of text with shorter lines. I tend to use something along the lines of text_block = "\n".join(lines) to accomplish this goal, which is exactly what the fill function does:
# Wrap the text to a width of 25 characters
textwrap.fill(text=sample_text, width=25)

# Resulting text
On the 24th of February,
1815, the look-out at
Notre-Dame de la Garde
signalled the three-
master, the _Pharaon_
from Smyrna, Trieste, and
Naples.  As usual, a
pilot put off
immediately, and rounding
the Château d’If, got on
board the vessel between
Cape Morgiou and Rion
island.  Immediately, and
according to custom, the
ramparts of Fort Saint-
Jean were covered with
spectators; it is always
an event at Marseilles
for a ship to come into
port, especially when
this ship, like the
_Pharaon_, has been
built, rigged, and laden
at the old Phocee docks,
and belongs to an owner
of the city.
Note: The fill function takes the same arguments as the wrap function. The only difference is that it joins the resulting lines into a single block of text.
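The relationship between the two functions can be checked directly. The following is a minimal sketch that uses a short stand-in sentence (the variable names are my own, not part of the module) and two of the TextWrapper keyword arguments, initial_indent and subsequent_indent:

```python
import textwrap

# Stand-in sentence used only for this illustration.
sentence = (
    "As usual, a pilot put off immediately, and rounding the "
    "Château d'If, got on board the vessel."
)

lines = textwrap.wrap(sentence, width=25)
block = textwrap.fill(sentence, width=25)

# fill() gives the same result as joining the wrap() output with newlines.
assert block == "\n".join(lines)

# No wrapped line is longer than the requested width.
assert all(len(line) <= 25 for line in lines)

# Both functions accept the TextWrapper keyword arguments.
# The indent strings count toward the width.
indented = textwrap.fill(
    sentence,
    width=25,
    initial_indent="> ",
    subsequent_indent="  ",
)
print(indented)
```

Because the indent strings count toward width, the indented block still fits within 25 columns.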
shorten
“Collapse and truncate the given text to fit in the given width.” — TextWrapper documentation
The shorten function can be regarded as an excerpt-generating function. It takes a string as an argument and truncates it, on a word boundary, so that the result fits within a given number of characters. Before truncating, the shorten function also normalizes whitespace so that the resulting text contains only single space characters. An ASCII lookup table is a handy reference for whitespace character values.
The shorten function is the first standalone function where keyword arguments other than text and width are notably useful. The placeholder argument specifies the characters appended to the text when it is truncated. Personally, I find an ellipsis (…) to be visually appropriate, as seen in the following example:
# Use the shorten function to create a truncated string
textwrap.shorten(text=sample_text, width=75, placeholder="...")

# The resulting truncated text with the ellipsis added
# via the "placeholder" argument.
>>> On the 24th of February, 1815, the look-out at Notre-Dame de la Garde...
Note: If the normalization of whitespace results in a text shorter than the specified width that text will be returned in its entirety. An all-whitespace text will return an empty string and also not add the placeholder.
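These edge cases are easy to confirm. Here is a small sketch with a made-up, messy input string (the variable names are illustrative only):

```python
import textwrap

# Illustrative input with irregular whitespace, including a tab and a newline.
messy = "  On the   24th of\n\tFebruary,   1815  "

# Whitespace is collapsed first; the result already fits within the width,
# so the whole text comes back and no placeholder is added.
short_enough = textwrap.shorten(messy, width=75, placeholder="...")

# With a narrower width the text is cut on a word boundary, and the
# placeholder counts toward the width.
truncated = textwrap.shorten(messy, width=20, placeholder="...")

# All-whitespace input yields an empty string, with no placeholder.
empty = textwrap.shorten("   \n\t ", width=10, placeholder="...")

print(repr(short_enough))
print(repr(truncated))
print(repr(empty))
```

Note that the length of the truncated result, placeholder included, never exceeds width.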
dedent
“Remove any common leading whitespace from every line in text.” — TextWrapper Documentation
The dedent function took me a moment to wrap my head around. Its purpose is to remove leading whitespace that is common to every line in a block of text, not to strip each line individually. The common case is indented text found in things like method documentation and multi-line string literals. This function won’t change the current sample text, but consider the indented docstring from the dedent source code:
def dedent(text):
    """Remove any common leading whitespace from every line in `text`.

    This can be used to make triple-quoted strings line up with the left
    edge of the display, while still presenting them in the source code
    in indented form.

    Note that tabs and spaces are both treated as whitespace, but they
    are not equal: the lines "  hello" and "\\thello" are
    considered to have no common leading whitespace.

    Entirely blank lines are normalized to a newline character.
    """
    ...
Copying and pasting this indented docstring would produce lines of text that all share a leading sequence of 4 space characters (NOT A TAB!). Manual removal of this indentation is easy enough and does little to justify the dedent function’s existence.
Imagine using the docstring of every function in the standard library (for NLP analysis, maybe) in a way that required removing the indentation. Manual removal would be intractable. Using the dedent function would ensure that all common indentation was removed during preprocessing. A quick example:
# Another excerpt of our sample text, with a poorly formatted entry
# using blank lines and indentation
string = """
    On the 24th of February, 1815, the look-out at Notre-Dame de la Garde
    signalled the three-master, the _Pharaon_ from Smyrna, Trieste, and
    Naples.
"""

# Dedent the text
textwrap.dedent(text=string)

>>>
On the 24th of February, 1815, the look-out at Notre-Dame de la Garde
signalled the three-master, the _Pharaon_ from Smyrna, Trieste, and
Naples.
Note that this sample of Dumas’ work has a newline character at the beginning and at the end, and neither is removed. The dedent function only strips leading whitespace that is shared by all lines; it leaves line breaks alone.
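The following sketch, built on a made-up indented string, illustrates three behaviors: only the shared margin is removed, any extra indentation on an individual line survives, and tabs never match spaces. Stripping the surrounding newlines with str.strip is my own addition and is not something dedent does:

```python
import textwrap

# Illustrative block: every line shares a 4-space margin and the
# third line carries 4 additional spaces of its own.
indented = """
    On the 24th of February, 1815, the look-out
    at Notre-Dame de la Garde signalled
        the three-master.
"""

dedented = textwrap.dedent(indented)

# The leading and trailing newlines are still present after dedent();
# strip them separately if they are unwanted.
cleaned = dedented.strip("\n")
print(cleaned)

# A tab and four spaces share no common prefix, so nothing is removed.
mixed = "    hello\n\tworld"
unchanged = textwrap.dedent(mixed)
```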
indent
“Add prefix to the beginning of selected lines in text.” — TextWrapper Documentation
The indent function works as one would expect; it adds whatever string is specified as the prefix argument to the beginning of the lines in a body of text. Consider the result of using the indent function on our original sample text:
# Prefix text lines with three hyphens and a space
textwrap.indent(text=sample_text, prefix="--- ")

>>>
--- On the 24th of February, 1815, the look-out at Notre-Dame de la Garde
--- signalled the three-master, the _Pharaon_ from Smyrna, Trieste, and
--- Naples.

--- As usual, a pilot put off immediately, and rounding the Château d’If,
--- got on board the vessel between Cape Morgiou and Rion island.

--- Immediately, and according to custom, the ramparts of Fort Saint-Jean
--- were covered with spectators; it is always an event at Marseilles for a
--- ship to come into port, especially when this ship, like the _Pharaon_,
--- has been built, rigged, and laden at the old Phocee docks, and belongs
--- to an owner of the city.
By default, the indent function adds the value of prefix to every line that does not consist solely of whitespace. In other words, the prefix isn’t added to the blank lines between paragraphs in our sample text.
The indent function also accepts a callable as the predicate argument. It is called with each line and controls which lines are given the prefix. For example, if one were to insist on prefixing blank lines as well, the following syntax could be used:
# Add three hyphens and a space as prefix to ALL lines, including blank lines
textwrap.indent(text=sample_text, prefix='--- ', predicate=lambda line: True)

>>>
--- On the 24th of February, 1815, the look-out at Notre-Dame de la Garde
--- signalled the three-master, the _Pharaon_ from Smyrna, Trieste, and
--- Naples.
--- 
--- As usual, a pilot put off immediately, and rounding the Château d’If,
--- got on board the vessel between Cape Morgiou and Rion island.
--- 
--- Immediately, and according to custom, the ramparts of Fort Saint-Jean
--- were covered with spectators; it is always an event at Marseilles for a
--- ship to come into port, especially when this ship, like the _Pharaon_,
--- has been built, rigged, and laden at the old Phocee docks, and belongs
--- to an owner of the city.
Other creative uses include flagging lines that contain a particular sequence of characters. For example, marking each line that contains “and” (the check below is a plain substring test, so a word such as “island” also matches):
# Add a leading asterisk to any line containing the word 'and'
textwrap.indent(text=sample_text, prefix='* ', predicate=lambda line: 'and' in line)

>>>
On the 24th of February, 1815, the look-out at Notre-Dame de la Garde
* signalled the three-master, the _Pharaon_ from Smyrna, Trieste, and
Naples.

* As usual, a pilot put off immediately, and rounding the Château d’If,
* got on board the vessel between Cape Morgiou and Rion island.

* Immediately, and according to custom, the ramparts of Fort Saint-Jean
were covered with spectators; it is always an event at Marseilles for a
ship to come into port, especially when this ship, like the _Pharaon_,
* has been built, rigged, and laden at the old Phocee docks, and belongs
to an owner of the city.
One possible extension of this function I’d like to see in the future is support for conditionally choosing between multiple prefixes. For example, adding the prefix "--- " to every line without the word “and” and the equally wide prefix "*-- " to every line that contains it. Currently, one would be required to do something similar to this:
# Iterate over each line, adding the prefix conditionally
text = ""
for line in sample_text.splitlines():
    text += "*-- " if 'and' in line else "--- "
    text += line + '\n'

>>>
--- On the 24th of February, 1815, the look-out at Notre-Dame de la Garde
*-- signalled the three-master, the _Pharaon_ from Smyrna, Trieste, and
--- Naples.
--- 
*-- As usual, a pilot put off immediately, and rounding the Château d’If,
*-- got on board the vessel between Cape Morgiou and Rion island.
--- 
*-- Immediately, and according to custom, the ramparts of Fort Saint-Jean
--- were covered with spectators; it is always an event at Marseilles for a
--- ship to come into port, especially when this ship, like the _Pharaon_,
*-- has been built, rigged, and laden at the old Phocee docks, and belongs
--- to an owner of the city.
The end result is a more visually balanced layout in which the * marks each line of our sample text that contains the word “and”.
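If this pattern comes up often, the loop can be folded into a small reusable helper. The sketch below is only one way to do it; indent_by_rule and choose_prefix are hypothetical names of my own and are not part of the textwrap module:

```python
def indent_by_rule(text, choose_prefix):
    """Prefix every line of text with choose_prefix(line).

    Hypothetical helper, not part of textwrap. Unlike textwrap.indent,
    the callable returns the prefix itself instead of True or False.
    """
    # keepends=True preserves the original line endings, so a missing
    # final newline is not silently added.
    return "".join(
        choose_prefix(line) + line
        for line in text.splitlines(keepends=True)
    )


# Stand-in excerpt used only for this illustration.
excerpt = "Trieste, and\nNaples.\n\nAs usual\n"

marked = indent_by_rule(
    excerpt,
    lambda line: "*-- " if "and" in line else "--- ",
)
print(marked)
```

Returning the prefix from the callable keeps the choice between any number of prefixes in one place.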
TextWrapper Class
The TextWrapper class is intended for cases where many strings are wrapped with the same settings and efficiency may be pertinent, for example a 1-million-word corpus being preprocessed for NLP analysis. Keyword arguments passed during instantiation are available on the resulting object as instance attributes. The TextWrapper class also provides wrap and fill as public methods.
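As a minimal sketch of that workflow, the example below configures one TextWrapper instance and reuses it for several strings. The paragraphs are shortened stand-ins for a larger corpus, and the variable names are my own:

```python
import textwrap

# Configure the wrapper once; keyword arguments become instance attributes.
wrapper = textwrap.TextWrapper(width=25, initial_indent="  ")

# Shortened stand-ins for a larger collection of paragraphs.
paragraphs = [
    "On the 24th of February, 1815, the look-out at Notre-Dame "
    "de la Garde signalled the three-master.",
    "As usual, a pilot put off immediately.",
]

# Reuse the same instance for every paragraph.
wrapped = [wrapper.fill(paragraph) for paragraph in paragraphs]
print("\n\n".join(wrapped))

# Attributes can be changed between calls without creating a new object.
wrapper.width = 40
wider = wrapper.wrap(paragraphs[0])
```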
Final Thoughts
The textwrap module contains some pretty basic text manipulation tools. Very rarely do I find that a text-based project can be fully addressed via this module. However, its wrap and dedent functions are both very useful and offer the benefit of being included in the standard Python library. I find myself using them frequently enough to stay familiar with their inner workings.
If I could change anything about this module, it would be renaming the placeholder argument to suffix, append, or end to better convey the use case. The name placeholder implies removal upon input, like the placeholder attribute in an HTML form field. The indent function, by contrast, uses the more descriptive name prefix for its argument.
I would also like to see a more concise method to reveal a wrapped version of a TextWrapper object. For example, to view the text in wrapped form one must pass the text to the public method as such: text.wrap(text=self.text). I’d like to see something more along the lines of text.show(), which would still confer the benefits of accessing the original text via the text.text attribute. I guess everyone is a critic!
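In the meantime, something close to that wish can be approximated with a small subclass. This is only a sketch: StoredTextWrapper, its text attribute, and its show method are hypothetical names of my own and are not part of the textwrap module.

```python
import textwrap


class StoredTextWrapper(textwrap.TextWrapper):
    """Hypothetical TextWrapper subclass that remembers its text."""

    def __init__(self, text, **kwargs):
        # Pass the usual TextWrapper keyword arguments through unchanged.
        super().__init__(**kwargs)
        self.text = text

    def show(self):
        """Return the stored text wrapped with the current settings."""
        return self.fill(self.text)


stored = StoredTextWrapper(
    "On the 24th of February, 1815, the look-out at Notre-Dame "
    "de la Garde signalled the three-master.",
    width=25,
)
print(stored.show())
```

The original string remains available through stored.text, and changing stored.width alters the next call to show.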