The Python bytearray() function converts strings or collections of integers into a mutable sequence of bytes. It provides developers the usual methods Python affords to both mutable and byte data types. Python’s bytearray() built-in allows for high-efficiency manipulation of data in several common situations.
The bytearray() function’s powerful features do come with some responsibility. Developers must be mindful of encodings, be aware of the source data format, and have a basic working knowledge of common character sets like ASCII.
In this article, you’ll learn the rationale, common use cases, advanced use cases, and potential pitfalls of Python’s
bytearray() built-in function. There’s even a section with some notable quirks at the bottom.
TL;DR: Python’s bytearray() function converts strings and sequences of integers into a mutable sequence of bytes, letting developers update (mutate) data in place rather than building a new object for every change.
    # Define a string
    x = "Stay gold Ponyboy."

    # Try to mutate it
    x[5:9] = 'silver'

    # Throws the following error
    >>> TypeError: 'str' object does not support item assignment

    # Define a byte array
    y = bytearray(x, encoding='ascii')

    # Try to mutate it
    y[5:] = bytearray('silver Ponyboy.', encoding='ascii')

    # Print the resulting value, decoded
    # to an ascii-string
    print(y.decode(encoding='ascii'))

    # Results in mutated string
    >>> Stay silver Ponyboy.
The bytearray object was introduced in Python 2.6 and has served as a ‘mutable counterpart’ to the bytes object ever since. Unlike the bytes object, the bytearray object has no literal notation and must be instantiated directly. For example, b'string' is a literal for a bytes object, whereas developers must use the syntax bytearray('string', encoding='ascii') to create a new bytearray object. I regard this as only a minor inconvenience given the benefits Python’s bytearray data type provides developers.
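As a minimal sketch of that difference (the variable names here are illustrative only), the snippet below creates a bytes object from a literal, builds the equivalent bytearray two ways, and shows that only the bytearray accepts item assignment:

```python
# A bytes object can be written as a literal...
raw = b"string"

# ...but a bytearray has to be built by calling bytearray().
from_text = bytearray("string", encoding="ascii")
from_bytes = bytearray(b"string")  # wrapping a bytes literal also works

print(type(raw).__name__)       # bytes
print(from_text == from_bytes)  # True
print(from_text == raw)         # True: comparison is by content

# bytes is immutable, so item assignment fails
try:
    raw[0] = ord("S")
except TypeError as exc:
    print("bytes:", exc)

# bytearray is mutable, so the same assignment succeeds
from_bytes[0] = ord("S")
print(from_bytes)               # bytearray(b'String')
```

Wrapping a bytes literal is often the shortest spelling when the data is plain ASCII, since no encoding argument is needed.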
Data like strings are regarded as immutable (more on that below), meaning they can’t be changed directly. To change (mutate) a string, developers must create a new string and assign that new version to the variable. Consider the following example:
    # Create a string
    a = "string one"
    print(a, "-", id(a))
    >>> string one - 3236722728688

    # 'mutate' string
    a = "string two"
    print(a, "-", id(a))
    >>> string two - 3236720675504
Note that our string has been changed but now has a different id (in CPython, its memory address). That means that a now refers to a different object in memory, and our original object is left for the garbage collector. That’s all well and good when one is changing around a few strings.
Such easy string manipulation is one of the language features that make Python such a popular programming language. Unfortunately, this approach isn’t efficient for changing large numbers of such data. That’s where the power of the bytearray() function shines.
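For contrast with the string example, here is a small sketch of the same change made through a bytearray. The variable names are illustrative; the point is that the object’s identity does not change when its contents do:

```python
buf = bytearray("string one", encoding="ascii")
before = id(buf)

# Slice assignment changes the contents of the existing object
buf[7:] = b"two"
after = id(buf)

print(buf.decode("ascii"))  # string two
print(before == after)      # True: still the same object
```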
The official Python documentation describes the intended usage of bytearray() as follows:
The bytearray class is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has, see Bytes and Bytearray Operations.
The reference to the Bytes and Bytearray Operations documentation should not be passed over either. That reference provides the following elaboration:
Since 2 hexadecimal digits correspond precisely to a single byte, hexadecimal numbers are a commonly used format for describing binary data.
This will make more sense in just a bit when we take a look at cases where the
bytearray() function displays escaped character sequences. For now, let’s consider some common cases and basic syntax!
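To make the hexadecimal remark concrete, the sketch below uses the built-in bytearray.fromhex() and hex() methods to move between pairs of hex digits and the bytes they describe:

```python
# Each pair of hex digits describes exactly one byte
data = bytearray.fromhex("42 79 74 65")

print(data)        # bytearray(b'Byte')
print(list(data))  # [66, 121, 116, 101]
print(data.hex())  # 42797465

# Bytes without a printable form keep their hex escape when displayed
print(bytearray.fromhex("01ff"))  # bytearray(b'\x01\xff')
```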
Basic Use & Syntax
The bytearray() function returns a new array of bytes. This can result from an input string or a collection of integers in the range 0-255 (inclusive). If you are familiar with binary notation, you may have already guessed that this range is exactly what eight bits can represent.
bytearray() takes three possible arguments:
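- source – the data to be converted to an array of bytes, either a string or a collection of integers.
- encoding – required when source is a string; must be omitted when source is a collection of integers.
- errors – an optional parameter specifying how encoding errors are handled when source is a string (for example 'strict', 'ignore', or 'replace'). The standard error handlers are listed in the codecs module documentation.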
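The errors argument is easiest to understand by example. The following sketch (the sample text is an arbitrary choice) encodes a string containing a non-ASCII character with the ascii codec under three different error handlers:

```python
text = "caf\u00e9"  # "café": the last character is not ASCII

# Default handler ('strict'): encoding fails
try:
    bytearray(text, encoding="ascii")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)

# 'replace' substitutes a question mark for the offending character
replaced = bytearray(text, encoding="ascii", errors="replace")
print(replaced)  # bytearray(b'caf?')

# 'ignore' silently drops it
ignored = bytearray(text, encoding="ascii", errors="ignore")
print(ignored)   # bytearray(b'caf')
```

Both lenient handlers lose information, so they are best reserved for cases where that loss is acceptable.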
Let’s take the
bytearray() function for a quick spin to demonstrate the basic use of these arguments:
    # create a bytearray from a string
    >>> bytearray("byte-sized")
    TypeError: string argument without an encoding
Well, that didn’t go well. Bytes are essentially just numerical representations of information. Without knowing how to interpret that information, Python will make a fuss. Arguably, a default encoding could be used but that would cause another set of problems. Let’s see what happens when we let Python know how to interpret the data:
    # create a bytearray from a string, specify encoding
    >>> bytearray("Byte-sized", encoding='ascii')
    bytearray(b'Byte-sized')
Now we’ve got a bytearray object in memory, and Python signifies as much by showing a bytes literal within parentheses (more on that below). Let’s take a look at how the bytearray() function handles a series of integer values. Remember, only values in the range of 0 <= val <= 255 are considered valid here.
    # create a bytearray of integers from ordinal values of a string
    >>> bytearray([ord(x) for x in "Byte-sized"], encoding='ascii')
    TypeError: encoding without a string argument
Well, that didn’t go well either. When given a series of numbers, Python doesn’t need an encoding, and in fact rejects one, because each value in the range 0-255 already is a byte. No character set is attached to those values: when a bytearray is displayed, values that match printable ASCII characters are shown as those characters and everything else is shown as an escape sequence. Let’s try this again:
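Values above 127 are where the choice of character set starts to matter. The sketch below (values chosen purely for illustration) shows that such a byte is displayed as an escape sequence and only becomes a character once a decoding such as Latin-1 is explicitly applied:

```python
high = bytearray([72, 105, 233])

# 233 has no printable ASCII form, so it is shown as an escape
print(high)  # bytearray(b'Hi\xe9')

# Decoding as Latin-1 maps every byte value to one character
text = high.decode("latin-1")
print(text == "Hi\u00e9")  # True

# ASCII only covers 0-127, so the same bytes cannot be decoded as ASCII
try:
    high.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)  # False
```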
    # create an array of ints from a string
    >>> nums = [ord(x) for x in "Byte-sized"]
    >>> nums
    [66, 121, 116, 101, 45, 115, 105, 122, 101, 100]

    # Instantiate a byte array
    >>> bytearray(nums)
    bytearray(b'Byte-sized')
Now we’ve got a
bytearray object from a list of integer values! But what can we do with this? This is where the
bytearray() function starts demonstrating its utility as a mutable object. We’ll cover what mutable vs. immutable means in just a second. Before we get there, however, let’s stop for a second and consider what the difference between a byte string and a byte array might be.
Byte Strings vs. Byte Arrays
At this point, you may be wondering what the difference between a sequence of bytes and a collection of bytes might be. In a word—mutability. Both represent data in binary format, both can easily handle two-digit hexadecimal inputs, and both rely on developers to be mindful of encodings to ensure proper representation of numerical values and the resulting translations. So what gives?
Bytes are treated with more finality than bytearray objects. That is, they are suited for applications where one doesn’t need to change the data often. For example, a bytes object would be better suited for storing data received from a network transmission. For large transmissions, maybe even as an array of bytes objects (but not a bytearray). A bytearray object is better suited for situations where frequent manipulation of data is expected. For example, the updating of data received from a network transmission before retransmission.
bytearray objects also provide methods afforded to mutable objects such as sizing, slicing, member assignment, insertion, reversal, and many more.
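A few of those mutable-sequence operations can be sketched as follows. The packet layout used here is made up for illustration and does not correspond to any real protocol:

```python
# Hypothetical message: a header, a 4-digit counter, and a payload
packet = bytearray(b"HDR:0001;payload")

# Member assignment: overwrite a single byte
packet[7] = ord("2")        # counter becomes 0002

# Growing the array in place
packet.append(ord("!"))     # add one byte at the end
packet.extend(b"\r\n")      # add several bytes at the end
packet.insert(0, ord(">"))  # add one byte at the front

# Shrinking the array in place
del packet[-2:]             # drop the trailing CRLF again

print(packet)               # bytearray(b'>HDR:0002;payload!')
print(len(packet))          # 18

# Slicing returns a new bytearray; reverse() works in place on it
tail = packet[-8:]
tail.reverse()
print(tail)                 # bytearray(b'!daolyap')
```

None of these calls would work on a bytes object, which offers slicing and len() but no in-place modification.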
To fully appreciate the difference between Python’s bytes and bytearray objects we’ll need to consider the difference between mutable and immutable data types. Don’t worry, it’s a fairly simple concept, and a basic example makes it easy to grasp!
Mutable vs. Immutable
In Python, strings, numbers, and tuples are immutable—meaning their values can’t be changed. They can be copied or reassigned but not manipulated directly. Data structures like lists, dictionaries, and sets are mutable and allow much more efficient manipulation of their constituent parts.
Specifically, updating an element in the middle of a list changes the list in place, whereas “updating” the middle of a string means building an entirely new string, so the old and new versions briefly exist side by side. Representing immutable data such as a string as a byte array brings the in-place behavior of a list to that data, which can improve efficiency in certain contexts.
Let’s review our original TL;DR example from above, this time with a little more explanation:
    # Define a string
    x = "stay gold Ponyboy."

    # Try to mutate it
    x[5:9] = 'silver'
    >>> TypeError: 'str' object does not support item assignment

    # Define a byte array
    y = bytearray(x, encoding='ascii')

    # Try to mutate it
    y[5:] = bytearray('silver Ponyboy.', encoding='ascii')
    y
    >>> bytearray(b'stay silver Ponyboy.')

    # assign new value to z, check type
    z = y.decode(encoding='ascii')
    z, type(z)
    >>> ('stay silver Ponyboy.', <class 'str'>)
Here we’ve shown that trying to directly change the text of our string throws a TypeError. Storing that string as a byte array allows direct manipulation of the individual character values. After manipulation, we can then convert the resulting values to a string via the decode() method, making sure to specify the proper encoding.
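Why the encoding passed to decode() matters can be seen in a short sketch. The sample text is arbitrary; the bytes are produced with UTF-8 and then decoded both with the matching codec and with a mismatched one:

```python
data = bytearray("caf\u00e9", encoding="utf-8")  # "café"

# The accented character occupies two bytes in UTF-8
print(len(data))  # 5
print(data)       # bytearray(b'caf\xc3\xa9')

right = data.decode("utf-8")    # same codec used for encoding
wrong = data.decode("latin-1")  # mismatched codec, no error raised

print(right == "caf\u00e9")  # True
print(len(right))            # 4
print(len(wrong))            # 5: each byte became its own character
```

The mismatched decode does not fail; it quietly yields different text, which is why such bugs tend to surface late.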
Is this an example of efficient use of the bytearray() function? Probably not. Imagine instead having to update large sequences of textual data, as in many database applications or when consuming large amounts of network data. The ability to modify data in place, without allocating a fresh copy for every change, becomes essential. That’s where byte arrays really shine.
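The following is a rough sketch of that scenario under simple assumptions: a one-megabyte buffer of placeholder bytes and a five-byte patch in the middle. The names and sizes are arbitrary. With bytes, the patch has to be expressed as the construction of a whole new object; with a bytearray, a same-length slice assignment overwrites the five bytes where they are:

```python
SIZE = 1_000_000
immutable = b"." * SIZE
mutable = bytearray(immutable)  # one-time copy: the initial overhead

# bytes: the "edit" builds a brand-new object of the full size
patched = immutable[:500_000] + b"HELLO" + immutable[500_005:]

# bytearray: overwrite five bytes in the existing buffer
mutable[500_000:500_005] = b"HELLO"

print(patched == mutable)    # True: same resulting content
print(len(mutable))          # 1000000
print(patched is immutable)  # False: bytes produced a new object
```

For a single edit the difference is negligible; it grows with the number of edits applied to the same buffer.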
TL;DR – representing a string as an array of bytes allows one to manipulate it more efficiently. There is an initial overhead cost in memory but a decreasing cost as manipulative operations amass. Always mind your encoding.
The bytearray() function deals with some pretty nuanced utility, so it’s no surprise that this built-in sometimes exhibits spooky behavior. Below are a few cases where the bytearray() function behaves in ways that might need some explanation. These certainly aren’t bugs but should be kept in mind nonetheless.
Escaped Character Sequences
Given the bytearray() function’s restriction of valid integer arguments to the range 0-255, one might expect some quirky behavior to arise. A ValueError gets thrown anytime a non-compliant value is attempted, but that’s expected. What’s quirky is how Python will represent ordinal values for non-printable characters. Consider the following:
    # create an array of nums
    >>> nums = [1, 2, 3, 4, 5]

    # Create bytearray by reference
    >>> bytearray(nums)
    bytearray(b'\x01\x02\x03\x04\x05')

    # Declare a bytearray of list literal
    >>> bytearray([1, 2, 3, 4, 5])
    bytearray(b'\x01\x02\x03\x04\x05')

    # Declare a bytearray with some known printable ASCII values
    >>> print(bytearray([65, 1, 97, 2, 66, 3, 98]))
    bytearray(b'A\x01a\x02B\x03b')
bytearray() displays escaped character sequences (note the \x prefix) regardless of how the array of integers is passed in. However, integer values that match the ordinal values of printable characters are represented as those characters. In other words: 1 is displayed as \x01 while 65 is displayed as A, since it’s the ordinal value of a printable character. Check out the article on Python’s ord() function to get a feel for that interoperability.
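Both behaviors described in this section, the range check and the display rule, can be probed with a short sketch. The helper name is_valid is hypothetical and exists only for this illustration:

```python
def is_valid(values):
    """Return True if bytearray() accepts every integer in values."""
    try:
        bytearray(values)
    except ValueError:
        return False
    return True

print(is_valid([0, 255]))  # True: both ends of the range are allowed
print(is_valid([256]))     # False: too large for one byte
print(is_valid([-1]))      # False: negative values are rejected

# Around the edges of printable ASCII: 32 (space) and 126 (~) are
# shown as characters, while 31 and 127 are shown as escapes
edges = bytearray([31, 32, 126, 127])
print(edges)               # bytearray(b'\x1f ~\x7f')
```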
bytearray() can be initialized as an empty array, or as a zero-filled array of a given length by passing a single integer argument. Consider the following:
    # Create an empty bytearray
    >>> bytearray()
    bytearray(b'')

    # Create a byte array of size 5
    >>> bytearray(5)
    bytearray(b'\x00\x00\x00\x00\x00')

    # Create a byte array with single
    # element 5
    >>> bytearray([5])
    bytearray(b'\x05')
Note here that passing the integer value of
5 as an argument initializes the
bytearray() object with five values of
null—represented by the escaped character sequence
\x00. The documentation covering valid
bytearray() arguments isn’t entirely clear on this point in my opinion and can lead to some confusion. Note the last example places the 5 in Python’s bracketed list notation. This results in a
bytearray() object of length one with the single escaped character sequence of
\x05. Again, not a bug but certainly quirky. Note:
null is not the same as
The bytearray() function provides an easy-to-use utility for memory-efficient manipulation of data. Developers of network applications, large string-processing use cases, and programs that encode and decode large swaths of data can all stand in admiration of its high-level accessibility of low-level data.
Python certainly isn’t the canonical language for developing memory-efficient applications, at least not directly. The bytearray() function makes the case that, while not the canonical choice, Python is up to the task if needed. For more insight into the handling of binary data check out the article What’s a Byte Stream Anyway?