How to remove new lines within double quotes?
Source: http://stackoverflow.com/questions/29150640/ (StackOverflow). This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Kenny Basuki
How can I remove a newline that falls inside a pair of " characters in a file?
For example:
"one",
"three
four",
"seven"
So I want to remove the \n between three and four. Should I use a regular expression, or do I have to read the file character by character with a program?
Answer by Wintermute
To handle specifically those newlines that are inside double-quoted strings and leave the ones outside them alone, using GNU awk (for RT):
gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file
This works by splitting the file along " characters and removing the newlines in every other block. With a file containing
"one",
"three
four",
12,
"seven"
this will give the result
"one",
"threefour",
12,
"seven"
Note that it does not handle escape sequences. If strings in the input data can contain \", such as "He said: \"this is a direct quote.\"", then it will not work as desired.
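For input that does contain \" inside strings, one possible workaround is to walk each line character by character and track two flags: whether we are inside a quoted string, and whether the previous character was a backslash. The sketch below is not part of the original answer. It uses plain awk (no gawk features), the sample input and the variable names inq and esc are made up for illustration, and it assumes a backslash only escapes characters inside a quoted string.

```shell
# Hypothetical sample: the second string contains escaped quotes and a line break.
input='"one",
"He said: \"three
four\"",
"seven"'

joined=$(printf '%s\n' "$input" | awk '
{
    n = length($0)
    for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        if (esc)                    esc = 0      # character after a backslash is literal
        else if (inq && c == "\\")  esc = 1      # backslash inside a string escapes the next character
        else if (c == "\"")         inq = !inq   # unescaped quote toggles the in-string state
    }
    esc = 0                                      # an escape never carries over to the next line
    # Drop the newline only while a quoted string is still open.
    printf "%s%s", $0, (inq ? "" : "\n")
}')

printf '%s\n' "$joined"
```

Like the gawk command, this joins the pieces without inserting a space. Printing " " instead of "" in the last printf would separate them.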
Answer by fedorqui 'SO stop harming'
A line that starts with " begins a new block, so you can print the block stored so far and start a new one. If a line does not start with ", accumulate its content into a variable and print it later on:
$ awk '/^"/ {if (f) print f; f=$0; next} {f=f FS $0} END {print f}' file
"one",
"three four",
"seven"
Since we are always printing the previous block of text, note the need for END to print the last stored value after processing the full file.
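The accumulate-and-flush idea can be easier to follow when the awk program is spread over several lines. The following is only an annotated sketch of that approach, run against the sample data from the question. The shell variables input and merged are names chosen here for illustration.

```shell
# Sample data from the question.
input='"one",
"three
four",
"seven"'

merged=$(printf '%s\n' "$input" | awk '
/^"/ {                  # this line opens a new quoted block
    if (f) print f      # flush the block collected so far, if any
    f = $0              # start collecting the new block
    next
}
{ f = f FS $0 }         # continuation line: append it after a space (FS)
END { print f }         # flush the block that is still stored at end of input
')

printf '%s\n' "$merged"
```

Because the joining string is FS, the continuation is separated from the first part by a single space, so the middle record comes out as "three four",.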
Answer by hek2mgl
You can use sed for that:
sed -r '/^"[^"]+$/{:a;N;/",/!ba;s/\n/ /g}' text
The command searches for lines which start with a double quote but don't contain another double quote: /^"[^"]+$/
If such a line is found, a label :a is defined to mark the start of a loop. Using the N command, we append another line from the input to the current buffer. If the buffer still doesn't contain the closing double quote followed by a comma (/",/!), we jump back to label a using ba. Otherwise the loop ends.
Once the closing quote is found, all newlines in the buffer get replaced by a space (s/\n/ /g) and the buffer gets printed automatically by sed.
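The loop described above can also be written as a multi-line sed script with one command per line. This is a sketch rather than the answer's exact command: it uses -E instead of -r (both select extended regular expressions in GNU sed), reads the sample data from the question through a pipe, and the shell variable names are chosen here for illustration.

```shell
# Sample data from the question.
input='"one",
"three
four",
"seven"'

result=$(printf '%s\n' "$input" | sed -E '
# A line that opens a quote but does not close it starts the loop.
/^"[^"]+$/ {
    # Label marking the start of the loop.
    :a
    # Append the next input line to the pattern space.
    N
    # No closing quote followed by a comma yet: branch back to a.
    /",/!ba
    # Closing quote found: turn the embedded newlines into spaces.
    s/\n/ /g
}
')

printf '%s\n' "$result"
```

Since the loop ends only on ",, a multi-line string that is the last field of the file (closing quote without a trailing comma) would make sed read to the end of the input.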
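Answer by Sobrique
A simplistic solution:
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>) {
    chomp;
    # a line that starts with a quote begins a new record
    if (m/^\"/) { print "\n"; }
    print;
}
__DATA__
"one",
"three
four",
"seven"
But taking the specific case of csv style data, I'd suggest using a perl module called Text::CSV, which parses CSV properly and treats the 'element with a linefeed' as part of the preceding row.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new( { binary => 1 } );
open( my $input, "<", "input.csv" ) or die $!;
while ( my $row = $csv->getline($input) ) {
    for (@$row) {
        #remove linefeeds in each 'element'.
        s/\n/ /g;
        #print this specific element ('naked' e.g. without quotes).
        print;
        print ",";
    }
    print "\n";
}
close($input);
The page also carries a longer Text::CSV-based fragment whose accompanying answer text is not preserved. It reads every row, strips line breaks and non-ASCII characters from each field, and writes the rows back out fully quoted. logger, $source_feed_date_file, $dest_feed_date_file, $file and $s1 are defined elsewhere in that script, and try/catch presumably comes from a module such as Try::Tiny:
my $csv_in = 'Text::CSV'->new({ binary    => 1,
                                sep_char  => ";",
                                auto_diag => 1
                              })
    or die "CANNOT USE CSV: " . 'Text::CSV'->error_diag;
my $csv_out = 'Text::CSV'->new({ binary       => 1,
                                 eol          => "\n",
                                 sep_char     => ";",
                                 always_quote => 1,
                                 auto_diag    => 1
                               })
    or die "CANNOT USE CSV: " . 'Text::CSV'->error_diag;
logger('LOG-3','PROCESSING FILE :'."\n".$source_feed_date_file);
try {
    # Inbound file reader with no encoding specified ==>
    open(my $CSV_FILE, '<', $source_feed_date_file);
    # Outbound file writer with UTF8 encoding ==>
    open(my $fh, '>:encoding(UTF-8)', $dest_feed_date_file);
    my $rx = 0;
    while (my $row = $csv_in->getline($CSV_FILE)) {
        # \0 (NUL) is an assumption: this alternative was garbled in the source
        s/\n|\r|\0|[^\x00-\x7F]//g for @$row;
        $csv_out->print($fh, $row);
        # progress indicator every 1000 rows
        if ($rx % 1000 == 0) {
            print "$rx \n";
        }
        $rx += 1;
    }
    print "Total Number Of Records processed:";
    print $rx;
    my $e1 = time();
    printf("\n\nTime elapsed for %s : %.2f\n", $file, $e1 - $s1);
} catch {
    my $e = shift;
    print $e;
    logger('LOG-4','PROCESSING FAILED FOR FILE :'."\n".$source_feed_date_file);
    exit 1;
};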
Answer by Mario Grünwald
Tested in bash.
Purpose: replace a newline inside double quotes by a literal \n.
Works for Unix newlines (\n), Windows newlines (\r\n) and the reversed \n\r sequence.
echo -e '"line1\nline2"'
"line1
line2"
echo -e '"line1\nline2"' | gawk -v RS='"' 'NR % 2 == 0 { gsub(/\r?\n\r?/, "\\n") } { printf("%s%s", $0, RT) }'
"line1\nline2"