bash: remove newlines with awk (or sed) based on the first character of the next line

Question

Asked by Mike

Here's my situation: I had a big text file that I wanted to pull certain information from. I used sed to pull out all the relevant information based on regexps, but each "piece" of information I pulled is on a separate line. I'd like each "record" to be on its own line so it can be easily imported into a DB.
Here's a sample of my data right now:

92831,499,000
,0644321
79217,999,000
,5417178
,PK91622
,PK90755

Ideally, I would want this output to look like:

92831,499,000 ,0644321
79217,999,000 ,5417178 ,PK91622
79217,999,000 ,5417178 ,PK90755

This may be harder to do, so I would settle for that last "record" appearing only once, with the additional "PK..." value as the 4th "field" of that line.
In the end, the simplest approach I could think of is this: if a line starts with a comma (^,), the newline before it should be removed. I'm not too familiar with awk, though, so if you could give me a start on this, it would really be appreciated! Thanks!
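Below is a minimal sketch of the rule described in the question. It assumes a POSIX awk. Instead of buffering whole records, it prints every line without its newline and emits the newline only when the following line does not start with a comma. The function name join_comma_lines is hypothetical.

```shell
# join_comma_lines (hypothetical name): read lines on stdin and remove the
# newline in front of every line that starts with a comma.
join_comma_lines() {
    awk '
        # A line that does not start with "," begins a new record, so end the
        # previous record first (but not before the very first line).
        NR > 1 && !/^,/ { printf "\n" }
        # Print the line itself without a trailing newline.
        { printf "%s", $0 }
        # Terminate the last record, if there was any input at all.
        END { if (NR > 0) printf "\n" }
    '
}

# Usage with the sample data from the question:
printf '%s\n' '92831,499,000' ',0644321' '79217,999,000' \
    ',5417178' ',PK91622' ',PK90755' | join_comma_lines
```

With the sample data this prints 92831,499,000,0644321 and 79217,999,000,5417178,PK91622,PK90755, which is the simpler "settle for" output. Because nothing is buffered beyond the current line, it also copes with very large files.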

Answer 1

Accepted answer by Mike

Well, I guess I should have taken a closer look at records in awk when I was trying to figure this out last night... 10 minutes after looking at them, I got it working. For anyone interested, here's how I did it: in my original sed script I put an extra newline in front of the beginning of each record, so there's now a blank line separating each one. I then use the following awk command:

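awk 'BEGIN { RS = ""; FS = "\n" }    # paragraph mode: blank lines separate records, each line is a field
{
    if (NF >= 3) {
        for (i = 3; i <= NF; i++)    # repeat fields 1 and 2 in front of each further field
            print $1, $2, $i
    } else {
        $1 = $1                      # rebuild $0 with OFS so a 1- or 2-line record is still printed
        print
    }
}'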

and it works like a charm, outputting exactly the way I wanted!
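The two steps above can be sketched end to end as follows. The original sed script is not shown, so add_blank_lines is a hypothetical stand-in. It only inserts the blank line in front of each record, meaning every line that does not start with a comma. split_records is the paragraph-mode awk command. It assumes that 1- and 2-line records should be printed as they are rather than dropped. Both function names are hypothetical.

```shell
# add_blank_lines (hypothetical): stand-in for the unshown sed change; puts an
# empty line in front of every record, i.e. every line not starting with ",".
add_blank_lines() {
    awk 'NR > 1 && !/^,/ { print "" } { print }'
}

# split_records (hypothetical): RS = "" makes blank lines separate records and
# FS = "\n" makes each line of a record one field. Fields 1 and 2 are repeated
# in front of every further field; shorter records are joined with OFS.
split_records() {
    awk 'BEGIN { RS = ""; FS = "\n" }
    {
        if (NF < 3) {
            $1 = $1          # rebuild the record so fields are joined with OFS
            print
        } else {
            for (i = 3; i <= NF; i++)
                print $1, $2, $i
        }
    }'
}

printf '%s\n' '92831,499,000' ',0644321' '79217,999,000' \
    ',5417178' ',PK91622' ',PK90755' | add_blank_lines | split_records
```

With the sample data from the question, this yields the three lines of the desired output. That includes the space that the default OFS puts before each comma.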

Answer 2

Answer by Demosthenex

$ perl -0pe 's/\n,/,/g' < test.dat
92831,499,000,0644321
79217,999,000,5417178,PK91622,PK90755

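Translation: read the whole input at once instead of line by line. The -0 option sets the input record separator to the NUL character, so an ordinary text file arrives as a single record. Then replace every newline that is followed by a comma with just the comma.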
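For comparison, the same slurp-and-substitute idea can be sketched in POSIX awk. The function name slurp_join is hypothetical. Like perl -0, it holds the whole input in memory before doing the replacement.

```shell
# slurp_join (hypothetical): collect the whole input into one string, then
# replace every newline that is followed by a comma with just the comma.
slurp_join() {
    awk '
        { buf = buf $0 "\n" }            # keep the newline that awk stripped off
        END {
            gsub(/\n,/, ",", buf)        # every "\n," becomes ","
            printf "%s", buf
        }
    '
}

printf '%s\n' '92831,499,000' ',0644321' '79217,999,000' \
    ',5417178' ',PK91622' ',PK90755' | slurp_join
```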

Shortest code here!

Answer 3

Answer by Paused until further notice.

sed -n ':t;/^,/!x;H;n;/^,/{x;$!bt;x;H};x;s/\n//g;p;${x;/^,/!p}' filename

Answer 4

Answer by ephemient

Without special-casing field 3, it's easy:

awk '
    !/^,/   { if (NR > 1) print x ; x = $0 }
    /^,/    { x = x OFS $0 }
    END     { if (NR) print x }
'

With special-casing of field 3, it's more complex but still not too hard:

awk '
    !/^,/   { if (n && n < 3) print x ; x = $0 ; n = 1 }
    /^,/    { if (++n > 2) { print x, $0 } else { x = x OFS $0 } }
    END     { if (n && n < 3) print x }
'

Answer 5

Answer by potong

This might work for you:

$ sed ':a;N;s/\n,/,/;ta;P;D' test.dat | sed 's/,/\n/5;s/\(.*,\).*\n/&\1/'
92831,499,000,0644321
79217,999,000,5417178,PK91622
79217,999,000,5417178,PK90755

Explanation:

This comes in two parts:

Append the next line, and if the appended line begins with a comma, delete the embedded newline (\n) and start again. If not, print up to the newline, then delete up to the newline. Repeat.

Replace the 5th comma with a newline. Then insert the first four fields (with their trailing comma) between the embedded newline and the sixth field.
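Here is a generalised sketch of the second step in awk. It assumes every joined line starts with the same four comma-separated fields as in the sample: three from the first record line plus one from the second. The function name expand_extra_fields is hypothetical. Lines with at most 5 fields pass through unchanged. For longer lines, it prints the first four fields followed by each field after the 4th in turn, whereas the sed command above splits only at the 5th comma.

```shell
# expand_extra_fields (hypothetical): assumes the first four comma-separated
# fields form the shared prefix of every output line.
expand_extra_fields() {
    awk 'BEGIN { FS = OFS = "," }
    NF <= 5 { print; next }              # nothing to expand
    {
        for (i = 5; i <= NF; i++)        # one output line per extra field
            print $1, $2, $3, $4, $i
    }'
}

printf '%s\n' '92831,499,000,0644321' \
    '79217,999,000,5417178,PK91622,PK90755' | expand_extra_fields
```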
