bash – Shell script read missing last line

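I have an ... odd issue with a bash shell script that I was hoping to get some insight on.

My team is working on a script that iterates through the lines in a file and checks for content in each one. We had a bug where, when run via the automated process that sequences different scripts together, the last line wasn't being seen.

The code used to iterate over the lines in the file (name stored in DATAFILE) was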

cat "$DATAFILE" | while read line

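We could run the script from the command line and it would see every line in the file, including the last one, just fine. However, when run by the automated process (which runs the script that generates the DATAFILE just prior to the script in question), the last line is never seen.

We updated the code to use the following to iterate over the lines, and the problem cleared up: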

for line in `cat "$DATAFILE"`

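Note: DATAFILE has no newline ever written at the end of the file.

My question is two-part... Why would the last line not be seen by the original code, and why would this change make a difference?

The only thing I could come up with as to why the last line would not be seen was:

>The previous process, which writes the file, was relying on the process ending to close the file descriptor.
>The problem script was starting up and opening the file fast enough that, while the previous process had "ended", it hadn't "shut down/cleaned up" enough for the system to close the file descriptor automatically for it.

That being said, it seems like if you have 2 commands in a shell script, the first one should be completely shut down by the time the script runs the second one.

Any insight into the questions, especially the first one, would be very much appreciated.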

The C standard says that text files must end with a newline, or the data after the last newline may not be read properly.

ISO/IEC 9899:2011 §7.21.2 Streams

A text stream is an ordered sequence of characters composed into lines, each line
consisting of zero or more characters plus a terminating new-line character. Whether the
last line requires a terminating new-line character is implementation-defined. Characters
may have to be added, altered, or deleted on input and output to conform to differing
conventions for representing text in the host environment. Thus, there need not be a one-to-
one correspondence between the characters in a stream and those in the external
representation. Data read in from a text stream will necessarily compare equal to the data
that were earlier written out to that stream only if: the data consist only of printing
characters and the control characters horizontal tab and new-line; no new-line character is
immediately preceded by space characters; and the last character is a new-line character.
Whether space characters that are written out immediately before a new-line character
appear when read in is implementation-defined.

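I would not have expected a missing newline at the end of the file to cause trouble in bash (or any Unix shell), but that does seem to be the problem, reproducibly ($ is the prompt in this output):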

$ echo xxx\\c
xxx$ { echo abc; echo def; echo ghi; echo xxx\\c; } > y
$ cat y
abc
def
ghi
xxx$
$ while read line; do echo $line; done < y
abc
def
ghi
$ bash -c 'while read line; do echo $line; done < y'
abc
def
ghi
$ ksh -c 'while read line; do echo $line; done < y'
abc
def
ghi
$ zsh -c 'while read line; do echo $line; done < y'
abc
def
ghi
$ for line in $(<y); do echo $line; done      # Preferred notation in bash
abc
def
ghi
xxx
$ for line in $(cat y); do echo $line; done   # UUOC Award pending
abc
def
ghi
xxx
$

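It is also not limited to bash: Korn shell (ksh) and zsh behave like that too. I live, I learn; thanks for raising the issue.

As demonstrated in the code above, the cat command reads the whole file. The for line in `cat $DATAFILE` technique collects all the output and replaces arbitrary sequences of white space with a single blank (I conclude that each line in the file contains no blanks).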
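
To make the difference between the two loops concrete, here is a minimal sketch. The file contents and the variable names (datafile, for_count, read_count) are made up for illustration; the second line deliberately contains a blank and the file has no trailing newline.

```shell
#!/bin/bash
# Hypothetical sample data: three logical lines, the second contains a
# blank, and the last one is not terminated by a newline.
datafile=$(mktemp)
printf 'abc\ndef ghi\nxxx' > "$datafile"

# The unquoted command substitution is split on white space (default IFS),
# so this loop iterates over words, not lines.
for_count=0
for word in $(cat "$datafile"); do
    for_count=$((for_count + 1))
done

# read consumes one newline-terminated line per call. On the unterminated
# final segment it returns non-zero, so the loop body never runs for it.
read_count=0
while read -r line; do
    read_count=$((read_count + 1))
done < "$datafile"

echo "for: $for_count items, while read: $read_count lines"
rm -f "$datafile"
```

With this input neither loop sees exactly three lines: the for loop counts four words and the plain while read loop counts two lines. So the for loop only stands in for line iteration when no line contains white space.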

Tested on Mac OS X 10.7.5.

What does POSIX say?

The POSIX read command specification says:

The read utility shall read a single line from standard input.

By default, unless the -r option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of a <newline>. If a <newline> follows the <backslash>, the read utility shall interpret this as line continuation. The <backslash> and <newline> shall be removed before splitting the input into fields. All other unescaped <backslash> characters shall be removed after splitting the input into fields.

If standard input is a terminal device and the invoking shell is interactive, read shall prompt for a continuation line when it reads an input line ending with a <backslash> <newline>, unless the -r option is specified.

The terminating <newline> (if any) shall be removed from the input and the results shall be split into fields as in the shell for the results of parameter expansion (see Field Splitting); […]

Note the '(if any)' (emphasis added in the quote)! It seems to me that if there is no newline, it should still read the result. On the other hand, it also says:

STDIN

The standard input shall be a text file.

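Then you get back to the debate about whether a file that does not end with a newline is a text file or not.

However, the rationale on the same page documents: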

Although the standard input is required to be a text file, and therefore will always end with a <newline> (unless it is an empty file), the processing of continuation lines when the -r option is not used can result in the input not ending with a <newline>. This occurs if the last line of the input file ends with a <backslash> <newline>. It is for this reason that “if any” is used in “The terminating <newline> (if any) shall be removed from the input” in the description. It is not a relaxation of the requirement for standard input to be a text file.

This rationale must mean that a text file is supposed to end with a newline.

The POSIX definition of a text file is:

07001 Text File

A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify “text files” in their STDIN or INPUT FILES sections.

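This does not stipulate 'ends with a <newline>' directly, but it does defer to the C standard.

A solution to the 'no terminal newline' problem

Note Gordon Davisson's answer. A simple test shows that his observation is accurate: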

$ while read line; do echo $line; done < y; echo $line
abc
def
ghi
xxx
$
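
The transcript above shows that read still stores the unterminated segment even though the loop ends without processing it. A small sketch of why, as far as I can tell: read returns a non-zero status when it hits end of file before a newline, but it has already assigned whatever it read. The names tmpfile, segment and status are only for illustration.

```shell
#!/bin/bash
# Hypothetical input: a single segment with no terminating newline.
tmpfile=$(mktemp)
printf 'xxx' > "$tmpfile"

# Capture read's exit status without letting a failure abort the script.
status=0
read -r segment < "$tmpfile" || status=$?

# The status is non-zero (end of file reached before a newline),
# yet the variable holds the data that was read.
echo "status=$status segment=$segment"
rm -f "$tmpfile"
```

The while loop only looks at the exit status, so it stops before the body can run for that last segment, which is the behaviour seen in the transcript.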

Therefore, his technique of:

while read line || [ -n "$line" ]; do echo $line; done < y

or:

cat y | while read line || [ -n "$line" ]; do echo $line; done

will work for files without a newline at the end (at least on my machine).
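
For reuse in a script, the same idiom could be wrapped roughly as follows. This is a sketch, not part of the original answer: print_lines is a hypothetical helper name, and IFS= and -r are additions on my part so that leading white space and backslashes survive unchanged.

```shell
#!/bin/bash
# print_lines is a hypothetical helper: it prints every line of the file
# named by $1, including a final segment that lacks a newline.
print_lines() {
    # IFS= preserves leading/trailing white space; -r keeps backslashes literal.
    # When read fails at end of file, the [ -n "$line" ] test lets the body
    # run once more if an unterminated segment was stored in line.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done < "$1"
}

# Hypothetical sample file with no trailing newline.
sample=$(mktemp)
printf 'abc\n  d e f\nxxx' > "$sample"

output=$(print_lines "$sample")
printf '%s\n' "$output"
rm -f "$sample"
```

Redirecting the file into the loop, rather than piping from cat, also keeps the loop in the current shell, so variables set inside it remain visible afterwards.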

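I am still surprised to find that the shells drop the last segment (it can't be called a line because it doesn't end with a newline) of the input, but there might be sufficient justification in POSIX for doing so. And clearly it is best to ensure that your text files really are text files ending with a newline.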
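
One way to make sure a generated file ends with a newline before it is read is sketched below. ensure_trailing_newline is a hypothetical helper name, and the approach (checking the last byte with tail -c 1) is an assumption of mine rather than something from the original discussion.

```shell
#!/bin/bash
# ensure_trailing_newline is a hypothetical helper: it appends a newline
# to the file named by $1 only when the file is non-empty and its last
# byte is not already a newline.
ensure_trailing_newline() {
    # Command substitution strips a trailing newline, so the result is
    # empty exactly when the last byte is a newline.
    if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
}

# Hypothetical data file written without a final newline.
datafile=$(mktemp)
printf 'abc\nxxx' > "$datafile"

ensure_trailing_newline "$datafile"
ensure_trailing_newline "$datafile"   # second call changes nothing

count=0
while read -r line; do
    count=$((count + 1))
done < "$datafile"
echo "lines seen: $count"
rm -f "$datafile"
```

After the fix-up a plain while read loop sees both lines, and calling the helper repeatedly does not add blank lines.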
