Using awk (or sed) to remove newlines based on first character of next line

Source: Stack Overflow, http://stackoverflow.com/questions/2208059/ (licensed under CC BY-SA 4.0; attribute it to the original authors)

Tags: bash, shell, sed, awk

Asked by Mike

Here's my situation: I had a big text file that I wanted to pull certain information from. I used sed to pull all the relevant information based on regexps, but each "piece" of information I pulled is on a separate line. I'd like each "record" to be on its own line so it can be easily imported into a DB.
Here's a sample of my data right now:

92831,499,000
,0644321
79217,999,000
,5417178
,PK91622
,PK90755

Ideally, I would want this output to look like:

92831,499,000 ,0644321
79217,999,000 ,5417178 ,PK91622
79217,999,000 ,5417178 ,PK90755

This may be harder to do, so I would settle for that last "record" appearing only once, with the additional "PK..." as the 4th "field" of that line.
In the end, the simplest approach I could think of is this: if a line starts with a comma (^,), the newline before it should be removed. I'm not too familiar with awk though, so if you could give me a start on this it would really be appreciated! Thanks!

Accepted answer by Mike

Well, I guess I should have taken a closer look at using records in awk when I was trying to figure this out last night... 10 minutes after looking at them I got it working. For anyone interested, here's how I did it: in my original sed script I put an extra newline in front of the beginning of each record, so there's now a blank line separating each one. I then use the following awk command:

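awk 'BEGIN { RS = ""; FS = "\n" }   # blank-line-separated records, one field per line
{
    if (NF >= 3)                    # records with fewer than 3 lines print nothing
        for (i = 3; i <= NF; i++)
            print $1, $2, $i        # one output line per extra field
}'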

and it works like a charm outputting exactly the way I wanted!
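
A minimal sketch of the whole pipeline on the sample data from the question. The original sed script is not shown, so `add_separators` is a hypothetical stand-in for the step that puts a blank line in front of each record. The `else` branch, which prints records shorter than three lines, is an addition that the command above does not have.

```shell
#!/bin/sh
# Sample data from the question, one item per line.
sample() {
    printf '%s\n' '92831,499,000' ',0644321' '79217,999,000' \
        ',5417178' ',PK91622' ',PK90755'
}

# Assumed stand-in for the sed change: emit a blank line before every
# line that does not start with a comma (except the very first line).
add_separators() {
    awk '!/^,/ && NR > 1 { print "" } { print }'
}

# Paragraph mode: each blank-line-separated block is one record and
# each line of the block is one field.
join_records() {
    awk 'BEGIN { RS = ""; FS = "\n" }
    {
        if (NF >= 3) {
            for (i = 3; i <= NF; i++)
                print $1, $2, $i
        } else {
            $1 = $1     # rebuild the record so fields are joined with OFS
            print
        }
    }'
}

sample | add_separators | join_records
```

On the sample this should print the three lines asked for in the question, with fields separated by a space (awk's default OFS).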

Answer by Demosthenex

$ perl -0pe 's/\n,/,/g' < test.dat
92831,499,000,0644321
79217,999,000,5417178,PK91622,PK90755

Translation: read the whole file at once rather than line by line, and replace each newline that is followed by a comma with just a comma.

Shortest code here!

Answer by Paused until further notice.

sedsed -d -n ':t;/^,/!x;H;n;/^,/{x;$!bt;x;H};x;s/\n//g;p;${x;/^,/!p}' filename

Answer by ephemient

Without special-casing field 3, easy.

awk '
    !/^,/   { if (NR > 1) print x ; x = $0 }
    /^,/    { x = x OFS $0 }
    END     { if (NR) print x }
'

With special-casing, it is more complex but still not too hard.

awk '
    !/^,/   { if (n && n < 3) print x ; x = $0 ; n = 1 }
    /^,/    { if (++n > 2) { print x, $0 } else { x = x OFS $0 } }
    END     { if (n && n < 3) print x }
'

Answer by potong

This might work for you:

# sed ':a;N;s/\n,/,/;ta;P;D' test.dat | sed 's/,/\n/5;s/\(.*,\).*\n/&\1/'
92831,499,000,0644321
79217,999,000,5417178,PK91622
79217,999,000,5417178,PK90755

Explanation:

This comes in two parts:

Append the next line and then, if the appended line begins with a ",", delete the embedded newline \n and start again. If not, print up to the newline and then delete up to the newline. Repeat.

Replace the 5th "," with a newline. Then insert the first four fields between the embedded newline and the sixth field.
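
The two parts can be run separately to see the intermediate result. This is a sketch on the sample data from the question, assuming GNU sed (labels followed by `;` and `\n` in the replacement are not portable to every sed). The function names `sample`, `stage1` and `stage2` are only illustrative.

```shell
#!/bin/sh
# Sample data from the question, one item per line.
sample() {
    printf '%s\n' '92831,499,000' ',0644321' '79217,999,000' \
        ',5417178' ',PK91622' ',PK90755'
}

# Part 1: join every line that starts with a comma onto the line before it.
stage1() {
    sed ':a;N;s/\n,/,/;ta;P;D'
}

# Part 2: turn the 5th comma into a newline, then copy everything up to
# the last comma before that newline onto the start of the new line.
stage2() {
    sed 's/,/\n/5;s/\(.*,\).*\n/&\1/'
}

echo 'After part 1:'
sample | stage1
echo 'After part 2:'
sample | stage1 | stage2
```

Part 1 alone should give two lines, the second ending in `,PK91622,PK90755`. Part 2 splits only at the 5th comma, so it appears to handle just one extra "PK..." field per record; a record with more of them would keep the rest on its last line.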