bash: replace newlines between double quotes with a space
Original URL: http://stackoverflow.com/questions/26406281/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Replace new line character between double quotes with space
Asked by Tom123456
I want to read data row by row, and wherever I find a double quote I want to replace each newline character with a space until the second (closing) double quote is encountered, like:
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To local testing
Rohit 3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
In the data above, the second row opens a double quote that is only closed on the third line, so these two lines need to be merged with a single space, as below:
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To local testing Rohit 3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
Answered by anubhava
You can use this gnu-awk one-liner:
awk -v RS='"[^"]*"' -v ORS= '{gsub(/\n/, " ", RT); print $0 RT}' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To local testing Rohit 3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
RS='"[^"]*"'
- The input record separator is set to the regex "[^"]*", so every double-quoted string acts as a separator.
-v ORS=
- The output record separator is set to the empty string.
gsub(/\n/, " ", RT)
- Replace newlines with a space in RT, the text matched by the input record separator.
And here is a perl one-liner:
perl -0pe 's/"[^\n"]*"(*SKIP)(*F)|("[^"\n]*)\n([^"]*")/$1 $2/g' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To local testing Rohit 3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
Answered by terdon
This will work for the simple case in your example:
$ perl -00pe 's/(\n[^"]*"[^"]+)\n(.+?")/$1 $2/gm' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To local testing Rohit 3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
Caveats
- This will load the entire file into memory, which might be a problem depending on the size of the file.
- It doesn't deal with quoted fields that span more than two lines (that is, more than one embedded newline).
Explanation
-00 : paragraph mode. Records are split on blank lines, so a file with no blank lines (like the sample) is read as a single string; -0777 would slurp the file unconditionally.
-pe : print each input "line" (a single "line" here, because of the -00) after applying the script given by -e to it.
(\n[^"]*"[^"]+)\n(.+?") : match a newline (used to indicate the start of a line), followed by as many non-" characters as possible ([^"]*), then a ", followed by only non-" characters until the next newline ([^"]+\n), and then everything up to the first quote. The parentheses capture the matched strings.
$1 $2 : the replacement. The matched text is replaced with the first captured group, a space, and then the second.
gm : the g makes the replacement global; the m makes ^ and $ match at line boundaries (this pattern uses neither, so it has no effect here).
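Both caveats above come from matching against the whole file at once. As a rough sketch of a line-by-line alternative (this swaps in sed and is not part of the answer; the function name join_quoted_lines is made up for illustration), the idea is to keep appending the next line while the text read so far contains an odd number of double quotes, which means a quoted field is still open:

```shell
#!/bin/sh
# join_quoted_lines is a hypothetical helper name, not from the answer above.
# Assumption: quotes are balanced per row, as in ordinary CSV.
join_quoted_lines() {
  # :a       label used as the loop target
  # /.../    matches when the pattern space holds an ODD number of quotes
  # $!N      if this is not the last line, append the next input line
  # s/\n/ /  turn the embedded newline into a single space
  # $!ba     if more input remains, loop and test the quote count again
  sed -e ':a' \
      -e '/^[^"]*\("[^"]*"[^"]*\)*"[^"]*$/{' \
      -e '$!N' \
      -e 's/\n/ /' \
      -e '$!ba' \
      -e '}' "$@"
}
```

It can be used as `join_quoted_lines file` or at the end of a pipe. Only one logical row is held in memory at a time, and a quoted field may span any number of lines.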
Answered by Tiago Lopo
This one-liner will do:
perl -F'' -0 -ane 'map {$_ eq q(") && {$seen=$seen?0:1}; $seen && $_ eq "\n" &&{$_=" "}; print} @F'
or:
perl -F'' -0 -ane ' foreach $char(@F){ $char eq q(") && {$seen= $seen ? 0 : 1}; $seen && $char eq "\n" && { $char=" "}; print $char}'
In action:
$ perl -F'' -0 -ane ' foreach $char(@F){ $char eq q(") && {$seen= $seen ? 0 : 1}; $seen && $char eq "\n" && { $char=" "}; print $char}' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To local testing Rohit 3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
Answered by choroba
Perl to the rescue:
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;
# binary => 1 lets fields contain embedded newlines
my $csv = 'Text::CSV'->new({ binary => 1,
                             eol    => "\n",
                           })
    or die "Cannot use CSV: " . 'Text::CSV'->error_diag;
# The input file name is taken from the first command-line argument
open my $CSV, '<:utf8', shift or die $!;
while (my $row = $csv->getline($CSV)) {
    s/\n/ /g for @$row;          # replace newlines inside every field
    $csv->print(*STDOUT, $row);
}
Gives the expected output when run with
remove-newlines.pl input.csv > output.csv
Answered by Jbar
A solution using (I think) a bashism (NOT POSIX; it shouldn't work in shells other than bash):
function fixmylines {
    local line fullline
    # IFS= and -r keep leading whitespace and backslashes intact
    while IFS= read -r line ; do
        if [[ "$line" =~ ^[0-9a-f]{16}, ]] ; then
            # a new row starts: print the previous one, if any
            [ "$fullline" ] && echo "$fullline"
            fullline="$line"
        else
            # continuation line: append it, separated by a space
            fullline+=" $line"
        fi
    done
    echo "$fullline"
}
Then you may pipe your data to this function (" | fixmylines ").
Note: it uses the regexp "^[0-9a-f]{16}," to determine the beginning of a line.
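If the rows do not start with a recognizable ID, that regexp is of no help. A minimal pure-bash sketch of an alternative (not from any of the answers; the function name join_quoted_chars is made up for illustration) tracks the quote state character by character, the same toggle idea as Tiago Lopo's Perl one-liners. It reads all of stdin into a variable, so it is probably only suitable for small inputs:

```shell
#!/usr/bin/env bash
# join_quoted_chars is a hypothetical helper name used for illustration.
join_quoted_chars() {
  local data ch out='' inq=0 i
  # Read all of stdin; the trailing "x" protects final newlines from
  # being stripped by command substitution, and is removed right after.
  data=$(cat; printf x)
  data=${data%x}
  for ((i = 0; i < ${#data}; i++)); do
    ch=${data:i:1}
    # Every double quote toggles the "inside quotes" flag
    [[ $ch == '"' ]] && inq=$((1 - inq))
    # Inside quotes, a newline becomes a space
    if ((inq)) && [[ $ch == $'\n' ]]; then
      ch=' '
    fi
    out+=$ch
  done
  printf '%s' "$out"
}
```

As with fixmylines, data is piped in: `some_command | join_quoted_chars`.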