bash: merge lines of a txt file using shell script

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/19061509/

Date: 2020-09-18 08:05:04. Source: igfitidea

Tags: bash, shell, unix, vi

Asked by arunmoezhi

I invoke a program from shell script and it creates an output file with this format:

aaaaa\
bbbbb\
ccccc\

I would like to change this to:

aaaaabbbbbccccc

In the vi editor I can just do ggVGJ and then replace every \ with "". But I want to get this done via a script.

Answered by Steve

Here's one way using GNU sed:

sed ':a; N; $!ba; s/\\\n//g; s/\\$//' file

Another way, using awk, may give you better performance:

awk '{ sub ("\\\\$", ""); printf "%s", $0 } END { print "" }' file

Results:

aaaaabbbbbccccc

Explanation:

The awk solution removes the trailing backslash from each line (via substitution) and prints the line with printf, without a newline character. The END block (which runs after all input has been read) then prints a single newline character.

The sed solution instead creates a label called a and appends the next line of input to the pattern space. $!ba means 'if not at the last line of input, branch to label a', so the whole file ends up in the pattern space. The first substitution then removes each backslash-newline pair from the pattern space. The second substitution removes the last, trailing backslash. This solution should be fast for small files, but probably won't be any faster than the awk one for the same file. Although ... it was faster to write.
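To check that the sed and awk approaches described above agree, they can be run side by side on a throwaway copy of the sample input. This is only a minimal sketch, assuming GNU sed and a POSIX awk; the temporary file and the variable names sed_out and awk_out are made up for illustration and are not part of the original answer.

```shell
#!/bin/sh
# Hypothetical sample file; its contents mirror the input from the question.
sample=$(mktemp)
printf '%s\n' 'aaaaa\' 'bbbbb\' 'ccccc\' > "$sample"

# GNU sed: loop until every line is in the pattern space, then strip each
# backslash-newline pair and finally the last trailing backslash.
sed_out=$(sed ':a; N; $!ba; s/\\\n//g; s/\\$//' "$sample")

# awk: strip the trailing backslash from each line and print it without a
# newline; END prints the single closing newline.
awk_out=$(awk '{ sub(/\\$/, ""); printf "%s", $0 } END { print "" }' "$sample")

echo "sed: $sed_out"
echo "awk: $awk_out"
rm -f "$sample"
```

Both lines should show aaaaabbbbbccccc. The awk variant here uses a regex literal (/\\$/) rather than a string, which needs fewer backslashes to express the same pattern.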

Answered by janos

Here's one way using sed and tr:

sed 's/\\$//' < sample.txt | tr -d '\n'

If you want to add a newline too, you can add an echo at the end:

sed 's/\\$//' < sample.txt | tr -d '\n'; echo

If you want the whole thing to be one unit, for example to use in a ... && ... || ... construct, then you can group the two steps like this:

{ sed 's/\\$//' < sample.txt | tr -d '\n'; echo; }
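As a rough illustration of why the grouping matters, the sketch below places the braced group inside a && / || chain and inside a command substitution. The temporary file, the -s test, and the variable name joined are assumptions added for the example, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical sample file; its contents mirror the input from the question.
sample=$(mktemp)
printf '%s\n' 'aaaaa\' 'bbbbb\' 'ccccc\' > "$sample"

# The braces make the pipeline plus the echo act as one command, so the
# && and || operators apply to the group as a whole.
[ -s "$sample" ] && { sed 's/\\$//' < "$sample" | tr -d '\n'; echo; } || echo "empty input"

# The group can also be captured; command substitution drops the final newline.
joined=$({ sed 's/\\$//' < "$sample" | tr -d '\n'; echo; })
rm -f "$sample"
```

Without the braces, the || branch would bind only to the trailing echo rather than to the whole sed and tr step.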

Answered by Digital Trauma

Another way, using pure bash:

$ cat file.txt
aaaaa\
bbbbb\
ccccc\
$ { cat file.txt ; echo; } | while read line; do echo "$line"; done
aaaaabbbbbccccc
$

This works because the bash read command actually deals with the \ continuation automatically (use the -r switch to read to disable this behavior). The echo after the cat is necessary for this example because the last line of your sample text ends in \, so the read command doesn't think it has got to the end of a line and doesn't output anything. The echo just inserts an empty line at the end of the stream to clean this up.
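The difference between read with and without -r can be seen by running both over the same input. This is a minimal sketch under a few assumptions: the temporary file and the variable names joined and raw are invented for the example, and printf is used instead of echo so the output does not depend on how a given shell's echo treats backslashes.

```shell
#!/bin/sh
# Hypothetical sample file; its contents mirror the input from the question.
sample=$(mktemp)
printf '%s\n' 'aaaaa\' 'bbbbb\' 'ccccc\' > "$sample"

# Without -r, read treats backslash-newline as a line continuation.
# The extra echo supplies the newline that ends the joined line.
joined=$({ cat "$sample"; echo; } | while read line; do printf '%s\n' "$line"; done)

# With -r, backslashes are kept and each input line is read on its own.
raw=$(while IFS= read -r line; do printf '%s\n' "$line"; done < "$sample")

printf '%s\n' "$joined"
printf '%s\n' "$raw"
rm -f "$sample"
```

The first printf should show the single joined line, while the second shows the three original lines with their backslashes intact.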

Answered by Cyber Oliveira

I guess this solution is the smallest:

$ cat tmp.txt
aaaaa\
bbbbb\
ccccc\

$ cat tmp.txt | tr -d '\\\r\n'
aaaaabbbbbccccc

Answered by Digital Trauma

This is a really ugly hack, but you could use the gcc preprocessor:

$ cat file.txt
aaaaa\
bbbbb\
ccccc\
$ cat file.txt | gcc -xc -E -P -w - | grep .
aaaaabbbbbccccc
$

Why is this risky? If your input text happened to contain preprocessor directives, then they would get interpreted, resulting in a mess.

Answered by Kent

Try this line:

awk -F'\\\\$' '{printf "%s", $1}END{print ""}' file

Answered by iamauser

One with awk and sed:

sed 's/\\$//g' file | awk '{printf "%s", $0}'

The sed command removes the backslash at the end of each line. $ denotes the end of the line, so \\$ matches a backslash that is immediately followed by the end of the line. Since the backslash is treated as a metacharacter in sed, you need an extra \ to escape it. Piping the output of sed to awk's printf prints the multiple lines as one. $0 represents the entire line.
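The two stages of that pipeline can be looked at separately to see what each one contributes. This is a sketch only; the temporary file and the variable names stage1 and stage2 are hypothetical and were added for the example.

```shell
#!/bin/sh
# Hypothetical sample file; its contents mirror the input from the question.
sample=$(mktemp)
printf '%s\n' 'aaaaa\' 'bbbbb\' 'ccccc\' > "$sample"

# Stage 1: sed strips the trailing backslash; the lines are still separate.
stage1=$(sed 's/\\$//' "$sample")

# Stage 2: awk prints each line ($0) without adding a newline, joining them.
stage2=$(sed 's/\\$//' "$sample" | awk '{ printf "%s", $0 }')

printf '%s\n' "$stage1"
printf '%s\n' "$stage2"
rm -f "$sample"
```

Note that the awk stage as written emits no final newline, so when the pipeline is run directly at a prompt, the prompt appears on the same line as the output.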