Text interpretation in Bash

Text is considered a “universal interface” for Unix systems. As you will see, Bash has a particular way of interpreting the text we send it.

We cannot, for example, simply type “Create a new directory called ‘Documents’” and expect Bash to know what’s going on:

user@host:~$ Create a new directory called 'Documents'
Create: command not found

Bash expects text to arrive in a particular form. Some words, such as mkdir, refer to programs. And some symbols, such as $, ~, and *, will be interpreted by Bash to mean something much more expansive than just individual characters.

How can Bash tell the difference between commands, symbols, and “plain text”? Bash has a syntax that defines how it will interpret the text characters we send it, just as English has a syntax in which the next two sentences have the same words but a different interpretation based on punctuation:

“That’s what,” he said.

That’s what he said.

But just as it’s a bad idea to teach children their first language by focusing on the rules of grammar, it’s not productive to simply learn Bash by memorizing its particular grammar and syntax: you should write programs and see what happens.

However, it is useful to explain some of the initial concepts of how Bash interprets our commands and data, as a way to prepare for the seemingly rudimentary way Unix handles text. Most of these concepts will make more sense after you’ve read about piping, redirection, and variables.

Literal values

In programming, a literal value can be thought of as: what you see is what you get.

In the script below, I call the mkdir command three separate times. However, it creates four directories, not three:

user@host:~$ mkdir 42
user@host:~$ mkdir orange apples
user@host:~$ mkdir "42 bottles of beer"

In the animated GIF below, I’m running these commands in OS X so you can see how it affects the file system, graphically:

<img src="http://www.compciv.org/files/images/cli/mkdir-literal-strings.gif" alt="img" />

So what were the characters, or text strings, that were interpreted by the shell as literal values?

  • 42
  • orange
  • apples
  • 42 bottles of beer (including the space characters)

And what text characters were not interpreted as literal values?

  • mkdir: this was interpreted as the command to make a new directory
  • The space characters between mkdir and the directory names passed to it
  • The space character between orange and apples, hence the creation of two separate directories
  • The quotation marks around 42 bottles of beer

Values separated by spaces

If you come from a modern operating system, such as Windows or OS X, you’ve probably seen that it’s possible to create files or directories with space characters in their names, for example, the My Documents & Settings directory on your C:\ drive.

So how did mkdir know that I wanted to make two separate directories instead of one called orange apples? It didn’t. Bash split the text at the space and handed mkdir two separate arguments. To get a single directory name that contains spaces, we have to explicitly enclose it in quotation marks, either single or double:

user@host:~$ mkdir 'apples and oranges' "sun and lollipops"

Without quotation marks, Bash will interpret each run of characters separated by spaces as a separate “word” or token. So mkdir dogs cats will be treated as three different tokens: the mkdir command and its two arguments, dogs and cats.
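One way to see this splitting for yourself is to count how many arguments a command receives. The sketch below uses a small helper function, count_args, which is a hypothetical name introduced here for illustration and is not part of the lesson; it simply prints the number of arguments Bash passed to it:

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper (not from the lesson): it prints the
# number of arguments it received, after Bash has split the line into tokens.
count_args() {
  echo "$#"
}

count_args dogs cats                # two unquoted words: 2 arguments
count_args 'apples and oranges'     # one single-quoted string: 1 argument
count_args "sun and lollipops" 42   # one quoted string plus one word: 2 arguments
```

Swapping count_args for mkdir would create exactly that many directories in each case.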

Quotes to enclose literal values

Both apostrophes (single quotes) and quotation marks (double quotes) can be used to denote a string of text (whether it contains spaces or newlines) as a single literal value. Whichever one you start with, be sure to end with it:

user@host:~$ echo 'Jimmy says "Hello"'
Jimmy says "Hello"
user@host:~$ echo "Jimmy's friend doesn't respond"
Jimmy's friend doesn't respond

Single vs. double quotation marks

However, when using double quotation marks, certain special characters, such as the dollar sign denoting a variable, will be interpreted by the shell and expanded.

In the single-quotation-mark version, the entire text string passed to echo is interpreted literally:

user@host:~$ some_number=42
user@host:~$ echo 'There are $some_number bottles of beer'
There are $some_number bottles of beer

In the double-quotation-mark version, the shell sees the $ and replaces the variable some_number with its actual value, 42:

user@host:~$ some_number=42
user@host:~$ echo "There are $some_number bottles of beer"
There are 42 bottles of beer

A technical aside: in the old days of computing, it was easy to assume that file names (and the names of programs and commands) would never have a space in them. That has since changed. Even so, most programs and commands designed for Unix-like systems still adhere to this “unsophisticated filename” mentality (quite sensibly, in my opinion), while allowing users to use the aforementioned quotation marks to delimit fancy file names.

But keep in mind that the real world isn’t that simple, and not knowing that can lead to a lot of problems. For example, watch me create four new directories on my OS X system via mkdir:

~$ mkdir cats dogs
~$ mkdir "This is the end, my friend"
~$ mkdir "You
>
> never name a directory like
> this."
~$ ls
This is the end, my friend
You never name a directory like this.
cats
dogs
# Note: I have removed the apostrophe from the output here for formatting purposes

As an animated GIF:

<img src="http://www.compciv.org/files/images/cli/mkdir-annoying.gif" alt="img" />

Suffice it to say that most programmers don’t expect a filename to contain newlines, and that assumption is the source of many comical or critical system errors (and sometimes both). That’s why, later in this course, we move on to more sophisticated text-handling environments, such as Python.

The importance of double quotation marks

A vital purpose of double quotation marks will be evident in later examples of variable usage. If a variable contains a space-separated value, such as Documents and Settings, wrapping a variable in double quotation marks prevents the variable’s space-separated values from being interpreted separately, which can cause unpleasant unexpected effects.

Again, this will make more sense when we see how variables are used. But suppose the dir_name variable has been set to “Documents and Settings”, and compare the effects of the three mkdir calls below:

user@host:~$ dir_name='Documents and Settings'
user@host:~$ echo $dir_name
Documents and Settings
user@host:~$ mkdir '$dir_name'
user@host:~$ mkdir "$dir_name"
user@host:~$ mkdir $dir_name

  • The first call is simply incorrect: because the variable is wrapped in single quotation marks, mkdir creates a directory with the literal name $dir_name.
  • The second call, with $dir_name inside double quotation marks, behaves as expected. The shell expands $dir_name to the string Documents and Settings, and mkdir creates a single directory with that name.
  • The third call, with $dir_name passed as an unquoted argument to mkdir, causes three directories to be created: Documents, and, and Settings.

Here is an animated GIF showing which directories are created unexpectedly as a result of a variable containing a value with spaces:

<img src="http://www.compciv.org/files/images/cli/mkdir-quoted-args.gif" alt="img" />
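The same three cases can also be checked without creating any directories. This is a minimal sketch that relies on a hypothetical helper named count_args (a name assumed here, not taken from the lesson) to report how many arguments each form of $dir_name produces:

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
  echo "$#"
}

dir_name='Documents and Settings'

count_args '$dir_name'   # single quotes: 1 argument, the literal text $dir_name
count_args "$dir_name"   # double quotes: 1 argument, Documents and Settings
count_args $dir_name     # unquoted: split on spaces into 3 arguments
```

When in doubt, wrapping a variable in double quotation marks is usually the safer habit, since it keeps the value together as one argument.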

Line-by-line interpretation

So, with the interactive command line, the shell usually expects to execute a command every time you press Enter (i.e. send a newline character):

sunet_id@corn30:~$ echo Hello
Hello
sunet_id@corn30:~$

There are some exceptions, such as when quoted values include newline characters (that is, what is sent when you press Enter). And there are special characters we can use to change this line-by-line behavior, although these exist more or less for human readability.

Using backslashes to split a command into multiple lines

For a single command containing so many characters that it causes a line wrap, it’s useful to split it into multiple lines. Again, this is for human readability, since the computer doesn’t mind either way.

Terminating a line with a backslash will tell the shell that the command continues on the following line (notice how the prompt changes to a right-angle bracket):

sunet_id@corn30:~$ echo Hello \
> world
Hello world

Note: Make sure that the backslash is the last character on the line you want to continue. That is, press Enter immediately after the backslash; do not put a space or any other character after the backslash on the same line.
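The same technique works inside a script file. As a rough sketch, the echo command below is split across three lines, yet Bash treats it as one command; the variable name greeting is only an illustration:

```shell
#!/usr/bin/env bash
# Each backslash must be the very last character on its line.
# Bash joins the three lines into a single echo command before running it.
greeting=$(echo Hello \
  wonderful \
  world)

echo "$greeting"
```

The indentation on the continued lines is optional; it is there only to make the continuation easier to spot.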

Unwanted multi-line commands

Using the backslash at the end of a line is how we explicitly tell Bash: “Hey, don’t do anything yet, we will continue this command on the next line.” However, it is quite easy for typos to cause us to continue a command onto the next line by accident. This happens most often with unclosed quotation marks or parentheses:

sunet_id@corn30:~$ echo "How are you world?
>
> ksdfljsadklfj
> "
How are you world?

ksdfljsadklfj

<img src="http://www.compciv.org/files/images/cli/echo-hello-oops-quotes.gif" alt="img" />

Tip: If you unintentionally end up in this situation and can’t find your way out, press Ctrl-C to exit limbo and return to the standard prompt.

Semicolon to separate short commands on a single line

When you have multiple commands that are so short that they don’t seem to deserve their own lines, you can use the semicolon to separate them, and Bash will execute each one in turn, as if you had put the commands on their own lines:

user@host:/tmp$ pwd; mkdir stuff; cd stuff; pwd
/tmp
/tmp/stuff

As a GIF:

<img src="http://www.compciv.org/files/images/cli/semi-colon-commands.gif" alt="img" />

Double ampersands to execute commands conditionally

Using the double ampersand also allows you to join commands on a single line. However, && differs from ; in that if a command fails, the subsequent commands will not be executed:

user@host:/$ pwd && mkdir stuff && cd stuff && pwd
/
mkdir: cannot create directory ‘stuff’: Permission denied

As a GIF:

<img src="http://www.compciv.org/files/images/cli/ampersand-commands.gif" alt="img" />

Using double ampersands is considered a good practice when doing something destructive right after a command that may not succeed. Consider these two commands (but do not run them on your own system):

# Dangerous:
user@host:/$ cd junk; rm -f *

# Safe:
user@host:/$ cd junk && rm -f *

What happens when the junk directory exists? The cd (change directory) command succeeds, and then the rm command deletes all the files it contains. But what happens when junk doesn’t exist? Where is the shell when the cd fails? And where will rm unexpectedly be doing its business?
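To answer those questions safely, here is a sketch that uses pwd in place of rm, so nothing is deleted. The directory name no_such_dir is assumed not to exist, and each check runs inside $( ), a subshell, so the cd attempts do not move the rest of the script:

```shell
#!/usr/bin/env bash
# Work in a fresh, empty temporary directory so the outcome is predictable.
start_dir=$(mktemp -d)
cd "$start_dir" || exit 1

# With a semicolon, pwd runs even though cd failed, so it reports the
# directory we were already in. An rm here would delete files in the wrong place.
with_semicolon=$(cd no_such_dir 2>/dev/null; pwd)

# With &&, the failed cd stops the chain and pwd never runs.
# The trailing || true only keeps the failure from aborting a script run with set -e.
with_ampersands=$(cd no_such_dir 2>/dev/null && pwd) || true

echo "semicolon version ran pwd in: $with_semicolon"
echo "ampersand version printed: '$with_ampersands'"
```

The first message shows the directory the script started in, while the second shows an empty string, because the command after && was skipped.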

Comments with the pound sign

This feature won’t be particularly useful to you until you start writing shell script files. But the pound sign can be used to tell Bash to ignore all the characters to its right on that line. This can be used to annotate your code:

user@host:/tmp/x$ # I hope this works
user@host:/tmp/x$ mkdir new_dir
user@host:/tmp/x$ # I hope it worked
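In a script file, comments usually look something like the sketch below. The contents and the variable names greeting and message are made up for illustration. Note that a pound sign inside quotation marks is ordinary text, not the start of a comment:

```shell
#!/usr/bin/env bash
# A whole-line comment: Bash ignores everything on this line.
greeting="Hello"   # A trailing comment: ignored from the pound sign onward.

# Inside quotation marks, the pound sign is a literal character.
message="$greeting # this is not a comment"
echo "$message"
```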

Multi-line data

The line-by-line nature of how Bash processes data makes it an inelegant system for processing data that spans more than one line.

For example, take the following HTML snippet:

<h1>This is a headline</h1>

It is trivial (though clumsy) to extract the text, This is a headline, between the h1 tags using grep with Perl-compatible regular expressions (the -P option, which GNU grep provides):

echo '<h1>This is a headline</h1>' | grep -oP '(?<=<h1>)(.+?)(?=</h1>)'

However, if the data looks like this:

<h1>This is a headline
</h1>

Then things get complicated. Standard grep, for example, matches one line at a time and won’t work with text patterns that span newline characters, although we do have access to the awk and sed text-processing tools.
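As one possible workaround, sketched under the assumption that the snippet contains a single h1 element and that its line breaks can be treated as spaces, tr can first fold the newlines away so that sed sees the whole element on one line. The variable names html and headline are placeholders:

```shell
#!/usr/bin/env bash
# A multi-line HTML snippet, stored with its newline character intact.
html='<h1>This is a
headline</h1>'

# tr turns each newline into a space, then sed keeps only the text
# between the opening and closing h1 tags.
headline=$(printf '%s\n' "$html" | tr '\n' ' ' | sed -E 's/.*<h1>(.*)<\/h1>.*/\1/')

echo "$headline"
```

This quickly becomes fragile with real-world HTML, which is part of the reason for moving to a more capable text-handling environment later on.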

Heredocs

While it is possible to use quotation marks to enclose multi-line strings:

echo "hey
you
what's going on?"

this quickly becomes cumbersome when the strings themselves contain literal quotation marks, which then have to be escaped with backslashes, as in the case of HTML:

echo "
<p class=\"note\">
  John told me, \"This site is the
  <a href=\"http://example.com\" target=\"_blank\">
  best\"
  </a>
</p>
"

When using a “Heredoc string”, we can specify that some other delimiter be used to denote the beginning and end of a string (note that we use cat now, instead of echo). Heredocs are a great way to include multi-line text, such as rows of data, along with our script file.

cat <<EOF
<p class="note">
  John told me, "This site is the
  <a href="http://example.com" target="_blank">
  best"
  </a>
</p>
EOF

The “limit string”, which in the above case is EOF, marks where the string ends. EOF is the traditional choice, although any sequence of characters can be used, provided that these conditions are met:

  1. The limit string comes right after the << operator.
  2. At the end of the Heredoc string, the limit string is on its own line, with no whitespace between it and the beginning of the line.
  3. The limit string is unique enough that it has no chance of appearing inside the Heredoc string.

So this is good:

cat <<THISISMYHEREDOC
hello
THISISMYHEREDOC

The following example is incorrect, because the closing EOF is indented and Bash never finds the end of the Heredoc:

cat <<EOF
hello there
  EOF

Putting a space between << and the limit string, as in the following example, is best avoided for consistency, although Bash does accept it:

cat << EOF
hello there
EOF

Send a Heredoc to a file

The notation is a bit strange, but think of it as cat feeding stuff.html what it gets from the << operator:

cat > stuff.html <<EOF
<html>
  <h1>
    <a href="http://example.com">An example</a>
  </h1>
</html>
EOF

Make a literal Heredoc that is not interpreted

By default, a Heredoc that contains special symbols and sequences, such as $ before a variable name, will have those sequences expanded, just as they would be in a normal double-quoted string. To avoid this, place the EOF inside single quotes:

world="LADEEDAH"

# This is interpreted
cat <<EOF
Hello $world
EOF
# Output:
# Hello LADEEDAH

# Prevent interpretation:
cat <<'EOF'
Hello $world
EOF
# Output:
# Hello $world

Earlier, I said that the limit string, for example EOF, has to be exactly the same at the beginning and at the end of the Heredoc. The exception is for certain special symbols, such as single quotation marks. In other words, you can start a Heredoc with 'EOF' and end it with EOF.

Assigning a Heredoc to a variable

Use the read command (read this elaboration on StackOverflow):

read -r -d '' some_variable <<'EOF'
<html>
  <h1>
    <a href="http://example.com">An example</a>
  </h1>
</html>
EOF
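Here is a small sketch of that pattern in use; the variable name page and its contents are only illustrative. One detail worth knowing is that read reports a non-zero exit status here, because it reaches the end of the Heredoc without finding the empty delimiter, so the sketch appends || true in case the script runs under set -e:

```shell
#!/usr/bin/env bash
# -r keeps backslashes literal; -d '' makes read consume the whole Heredoc
# instead of stopping at the first newline.
read -r -d '' page <<'EOF' || true
<h1>
  An example
</h1>
EOF

# Quote the variable so the newlines stored in it are preserved.
echo "$page"
```

Leaving the quotation marks off (echo $page) would let Bash split the value on whitespace and print everything on one line.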

Read more about Heredocs here:

  • GNU Reference for Heredocs
  • StackOverflow: How to assign a heredoc value to a variable in Bash?
  • TLDP: Here Documents
