In "How do I bring HEREDOC text into a shell script variable?", someone reports a problem using a here document with a quoted delimiter word inside $(...) command substitution, where a backslash \ at the end of a line inside the document triggers newline-joining line continuation, while the same here document outside command substitution works as expected.
Here is a simplified example document:

    cat <<'EOT'
    abc ` def
    ghi \
    jkl
    EOT

This includes one backtick and one backslash at the end of a line. The delimiter is quoted, so no expansions occur inside the body. In all Bourne-alikes I can find this outputs the contents verbatim. If I put the same document inside a command substitution as follows:

    x=$(cat <<'EOT'
    abc ` def
    ghi \
    jkl
    EOT
    )
    echo "$x"

then they no longer behave identically:
- dash, ash, zsh, ksh93, BusyBox ash, mksh, and SunOS 5.10 POSIX sh all give the verbatim contents of the document, as before.
- Bash 3.2 gives a syntax error for an unmatched backtick. With matched backticks, it attempts to run the contents as a command.
- Bash 4.3 collapses "ghi" and "jkl" onto a single line, but has no error. The --posix option does not affect this. Kusalananda tells me (thanks!) that pdksh behaves the same way.
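For anyone who wants to repeat the comparison, here is a rough sketch of how it could be automated. It writes the test case to a temporary file and runs it under whichever of the listed shells happen to be on PATH. The shell names in the loop, the helper name check_shell, and the use of mktemp are my own assumptions rather than part of the original report, and printf stands in for echo so that no shell's echo gets a chance to interpret the backslash.

```shell
#!/bin/sh
# Write the test case to a temporary script. The outer here-document is
# not inside a command substitution, so its body is stored verbatim.
script=$(mktemp) || exit 1
trap 'rm -f "$script"' EXIT

cat >"$script" <<'OUTER'
x=$(cat <<'EOT'
abc ` def
ghi \
jkl
EOT
)
printf '%s\n' "$x"
OUTER

# The verbatim body, built without any here-document.
expected=$(printf '%s\n' 'abc ` def' 'ghi \' 'jkl')

# check_shell NAME: run the test script under NAME and report the result.
# (Hypothetical helper; shells that are not installed are skipped.)
check_shell() {
    command -v "$1" >/dev/null 2>&1 || { echo "$1: not installed"; return 0; }
    if actual=$("$1" "$script" 2>/dev/null) && [ "$actual" = "$expected" ]; then
        echo "$1: verbatim"
    else
        echo "$1: differs or fails"
    fi
}

for sh in dash bash zsh ksh mksh; do
    check_shell "$sh"
done
```

A shell that joins the lines, and one that rejects the script with a syntax error, both show up as "differs or fails"; telling the two apart would need the error output, which this sketch discards.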
In the original question, I said this was a bug in Bash's parser. Is it? [Update: yes] The relevant text from POSIX (all from the Shell Command Language definition) that I can find is:
- §2.6.3 Command Substitution:
With the $(command) form, all characters following the open parenthesis to the matching closing parenthesis constitute the command. Any valid shell script can be used for command, except a script consisting solely of redirections which produces unspecified results.
- §2.7.4 Here-Document:
If any part of word is quoted, the delimiter shall be formed by performing quote removal on word, and the here-document lines shall not be expanded.
- §2.2.1 Escape Character (Backslash):
If a <newline> follows the <backslash>, the shell shall interpret this as line continuation. The <backslash> and <newline> shall be removed before splitting the input into tokens.
- §2.3 Token Recognition:
When an io_here token has been recognized by the grammar (see Shell Grammar), one or more of the subsequent lines immediately following the next NEWLINE token form the body of one or more here-documents and shall be parsed according to the rules of Here-Document.
When it is not processing an io_here, the shell shall break its input into tokens by applying the first applicable rule below to the next character in its input. ...
...
- If the current character is <backslash>, single-quote, or double-quote and it is not quoted, it shall affect quoting for subsequent characters up to the end of the quoted text. The rules for quoting are as described in Quoting. During token recognition no substitutions shall be actually performed, and the result token shall contain exactly the characters that appear in the input (except for <newline> joining), unmodified, including any embedded or enclosing quotes or substitution operators, between the <quotation-mark> and the end of the quoted text.
My interpretation of this is that all characters after $( until the terminating ) comprise the shell script, verbatim; a here document appears, so here-document processing occurs instead of ordinary tokenisation; the here document then has a quoted delimiter, meaning that its contents are processed verbatim; and the escape character never comes into it. I can see an argument, however, that this case is simply not addressed, and both behaviours are permissible. It's possible that I've skipped over some relevant text somewhere, too.
- Is this situation made clearer elsewhere?
- What should a portable script be able to rely on (in theory)?
- Is the specific treatment given by any of these shells (Bash 3.2/Bash 4.3/everyone else) mandated by the standard? Forbidden? Permitted?
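Whatever the standard turns out to say, one way to sidestep the problem in practice seems to be keeping the here document out of the command substitution altogether. The sketch below puts it in a function, so it is parsed as an ordinary top-level here document, and only the function call appears inside $(...). The function name doc is made up for illustration, and I am assuming (not asserting from the standard) that this avoids the divergent parsing, since every shell tested handled the here document correctly outside command substitution.

```shell
#!/bin/sh
# The here-document lives in a function body, outside any $(...),
# where the tested shells all treat the quoted-delimiter body verbatim.
doc() {
cat <<'EOT'
abc ` def
ghi \
jkl
EOT
}

# Only the function call is inside the command substitution.
x=$(doc)
printf '%s\n' "$x"
```

This is a workaround rather than an answer: it does not settle which behaviour the standard requires.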