Input Field Separators (IFS): Turning Strings into Words

Ever wondered how your favorite shell knows how to read data word-by-word or line-by-line? A special shell variable, IFS, controls this behavior.

Internal Field Separators (IFS) are characters, or sequences of characters, that mark where a larger sequence of characters should be split into tokens. Tokenization is most commonly done on the space, tab, and newline characters, which mark the boundaries between distinct words in a body of text.

For example, standard IFS-mediated tokenization of the string the\slazy\sbrown\sfox (where each \s marks a single space character) produces the collection of individual words (a.k.a. tokens) [the, lazy, brown, fox].
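The same tokenization can be reproduced directly in bash. The short sketch below assumes bash specifically (the here-string <<< and read -a are bash features, not POSIX sh); the variable names sentence and words are illustrative only. The read builtin splits its input line using the current IFS, which here is the default.

```shell
#!/bin/bash
# Tokenize a string using the default IFS (space, tab, newline).
# read -a stores each resulting word in a separate array element.
sentence="the lazy brown fox"
read -r -a words <<< "$sentence"

# Print each token on its own line.
for word in "${words[@]}"; do
  echo "Token: $word"
done
```

Running it prints four lines, one per token, matching the [the, lazy, brown, fox] collection above.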

IFS in Bash & Other Shells

Command line interpreters, commonly referred to as shells, provide users with a means of interacting with a system. In bash specifically, IFS is a special shell variable (not a reserved keyword) that holds the Internal Field Separator characters bash uses when splitting text into words. Because IFS is a variable, its value can be changed. By default, it contains three characters, in this order: space, tab, and newline (\s, \t, \n).
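One way to inspect the default value, sketched here under the assumption of a fresh, unmodified bash session, is to print IFS with printf's %q format, which shows invisible characters in quoted form. The second half of the sketch shows that tab and newline split words just as a space does; the variable names text and count are illustrative.

```shell
#!/bin/bash
# Show IFS in an unambiguous, quoted form so the invisible
# space, tab, and newline characters become visible.
printf 'IFS is: %q\n' "$IFS"

# Tab and newline are part of the default IFS, so they act as
# word boundaries exactly like a space does.
text=$'alpha\tbeta\ngamma delta'
count=0
for word in $text; do
  count=$((count + 1))
  echo "Word $count: $word"
done
```

In an unmodified session the first line should show the space, tab, and newline characters, and the loop should report four words: alpha, beta, gamma, and delta.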

To see an example of IFS in action, consider the following bash script:

#!/bin/bash

# define a string variable
string="one:two three four-five"

# iterate over each "word" as 
# determined by use of IFS
for word in $string; do

  # print each token
  echo "Word: $word"
done

Here we iterate over every “word” in the string one:two three four-five; the boundaries between words are determined by the value of IFS. Since the single space character (\s) is part of the default IFS value, the string is split at each space, producing the following output:

Word: one:two
Word: three
Word: four-five

Here the string has been tokenized (a.k.a. split) at each occurrence of a single space, producing three distinct sequences of characters. Because : and - are not in the default IFS, one:two and four-five each remain a single word. This is the common case and illustrates why the default IFS value was chosen. However, as mentioned, the IFS variable can be assigned a custom value. Consider the same script as above with a custom IFS value added:

#!/bin/bash

# define a string variable
string="one:two three four-five"

# define custom IFS value
IFS=":-"

# iterate over word boundaries
for word in $string; do
  echo "Word: $word"
done

Word boundaries are now defined by either : or -, so single spaces are treated as ordinary characters within a word. The output of this script is as follows:

Word: one
Word: two three four
Word: five

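Here the entire two three four sequence is treated as a single “word” because the IFS value no longer contains a space. Instead, the string is split on the : and - characters. Note that because the assignment is made at the top level of the script, the custom IFS value stays in effect for the rest of the script.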
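If the custom separators are only needed for a single split, bash lets an assignment prefix a command so that it applies to that command alone. The sketch below assumes bash (read -a and <<< are bash features) and uses the illustrative array name fields; it splits the same string on : and - while leaving the global IFS at its default.

```shell
#!/bin/bash
string="one:two three four-five"

# Prefixing read with an IFS assignment changes IFS only for this
# one command; the script's global IFS is left untouched.
IFS=':-' read -r -a fields <<< "$string"

for field in "${fields[@]}"; do
  echo "Field: $field"
done

# The global IFS still holds its default value, so ordinary word
# splitting elsewhere in the script behaves as before.
printf 'Global IFS: %q\n' "$IFS"
```

This should yield the same three fields as the script above (one, two three four, and five), with the difference that later unquoted expansions in the script are still split on whitespace.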

Final Thoughts

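The concept of an internal field separator appears across many programming languages and tools. In some cases, the term is written as Input Field Separator, which is arguably a rough equivalent, albeit slightly more specific. A number of exploits have abused IFS, classically by setting it in the environment of a privileged program that runs a shell command, so that the command line is split in unexpected places. Most modern shells mitigate this, for example by ignoring any IFS value inherited from the environment and resetting it to the default at startup.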

Zαck West
Full-Stack Software Engineer with 10+ years of experience. Expertise in developing distributed systems, implementing object-oriented models with a focus on semantic clarity, driving development with TDD, enhancing interfaces through thoughtful visual design, and developing deep learning agents.