Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original: http://stackoverflow.com/questions/26406281/

Time: 2020-09-18 11:35:04  Source: igfitidea

Replace new line character between double quotes with space

Tags: regex, bash, shell, unix, awk

Asked by Tom123456

I want to read the data row by row, and wherever I find an opening double quote I want to replace every newline character with a space until the closing double quote is encountered. For example:

090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing
Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology

In the data above, the second row opens a double quote that is only closed on the third line, so these two lines need to be merged with a single space, as below:

090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology

Answered by anubhava

You can use this gnu-awk one-liner:

awk -v RS='"[^"]*"' -v ORS= '{gsub(/\n/, " ", RT); print $0 RT}' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology

  • -v RS='"[^"]*"': the input record separator is set to the regex "[^"]*", so every double-quoted string acts as a separator.
  • -v ORS=: the output record separator is set to the empty string.
  • gsub(/\n/, " ", RT): replaces newlines with spaces in RT, the text that matched the input record separator; print $0 RT then emits the record followed by the cleaned-up quoted string.

And here is a Perl one-liner:

perl -0pe 's/"[^\n"]*"(*SKIP)(*F)|("[^"\n]*)\n([^"]*")/$1 $2/g' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology

Answered by terdon

This will work for the simple case in your example:

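$ perl -00pe 's/(\n[^"]*"[^"]+)\n(.+?")/$1 $2/gm' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology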

Caveats

  • This will load the entire file into memory and that might be a problem, depending on the size of the file.
  • It doesn't handle a quoted field that contains more than one line break.

Explanation

  • -00: read the input in paragraph mode; the sample contains no blank lines, so the whole file is treated as a single string.
  • -pe: print each input line (a single "line" here, because of the -00) after applying the script given by -e to it.
  • (\n[^"]*"[^"]+)\n(.+?"): match a newline (used to indicate the start of a line), followed by as many non-" characters as possible ([^"]*), then a ", followed by only non-" characters until the next newline ([^"]+\n), and then everything up to the first quote. The parentheses are there so we can capture the strings matched.
  • $1 $2: this is the replacement; it prints the two captured groups, so the matched pattern is replaced with the first group, a space, and then the second.
  • gm: the g makes the replacement global, and the m allows multiline strings (strictly, m only changes how ^ and $ match, and this pattern uses neither).

Answered by Tiago Lopo

This one-liner will do:

perl -F'' -0 -ane ' foreach $char(@F){  $char eq q(") && {$seen= $seen ? 0 : 1}; $seen  && $char eq "\n" && { $char=" "}; print $char}'

or:

perl -F'' -0 -ane 'map {$_ eq q(") && {$seen=$seen?0:1}; $seen && $_ eq "\n" &&{$_=" "}; print} @F'

In action:

$ perl -F'' -0 -ane ' foreach $char(@F){  $char eq q(") && {$seen= $seen ? 0 : 1}; $seen  && $char eq "\n" && { $char=" "}; print $char}' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
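
The character loop in this answer toggles a flag at every double quote and rewrites newlines while the flag is set. The same quote-parity idea can be applied line by line, which avoids holding the whole file in memory. The following is a minimal sketch in POSIX awk, not code from the original answer; the function name join_quoted_awk is made up for illustration.

```shell
# Hypothetical helper; reads file arguments or stdin.
join_quoted_awk() {
  awk '{
    n = gsub(/"/, "&")        # count the double quotes on this line
    if (n % 2) inq = !inq     # an odd count flips the inside-quotes state
    # inside a quoted field, end the line with a space instead of a newline
    printf "%s%s", $0, (inq ? " " : "\n")
  }' "$@"
}

# A quoted field spread over three lines, followed by a plain row.
printf '1,"a\nb\nc",x\n2,plain,y\n' | join_quoted_awk
```

This prints 1,"a b c",x followed by 2,plain,y. Doubled quotes ("") used as CSV escapes keep the count even, so they do not disturb the state, but an unbalanced quote anywhere in the input would merge all of the lines that follow it.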

Answered by choroba

Perl to the rescue:

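#!/usr/bin/perl
use warnings;
use strict;

use Text::CSV;
# binary => 1 allows embedded newlines inside quoted fields
my $csv = 'Text::CSV'->new({ binary => 1,
                             eol => "\n",
                           })
    or die "Cannot use CSV: " . 'Text::CSV'->error_diag;

# the input file name is taken from the first command-line argument
open my $CSV, '<:utf8', shift or die $!;
while (my $row = $csv->getline($CSV)) {
    s/\n/ /g for @$row;          # replace newlines in every field
    $csv->print(*STDOUT, $row);
}

Gives the expected output when run with:

remove-newlines.pl input.csv > output.csv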

Answered by Jbar

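A solution using (I think) a bashism (not POSIX, so it should not be expected to work in shells other than bash):

function fixmylines {
  local line fullline
  # IFS= and -r keep leading whitespace and backslashes intact
  while IFS= read -r line ; do
    if [[ "$line" =~ ^[0-9a-f]{16}, ]] ; then
      # a new record starts: flush the previous one, if any
      [ "$fullline" ] && echo "$fullline"
      fullline="$line"
    else
      # continuation line: join it to the current record with a space
      fullline+=" $line"
    fi
  done
  echo "$fullline"
}

Then you may pipe your data to this function (" | fixmylines ").

Note: it uses the regexp "^[0-9a-f]{16}," to determine the beginning of a line.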
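
To see how the "^[0-9a-f]{16}," test separates record starts from continuation lines, here is a small sketch that applies the same regular expression to two lines from the question. It uses grep -E so that it runs in any POSIX shell, whereas the function in this answer uses bash's [[ =~ ]]. The helper name is_record_start is made up for illustration.

```shell
# Hypothetical helper: succeeds when a line starts with a 16-digit hex ID and a comma.
is_record_start() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{16},'
}

for line in \
  '090033ec82ae0c07,Initiated,NA,"To   local testing' \
  'Rohit  3 to 4.",Julienne B Orr,Oncology'
do
  if is_record_start "$line"; then
    echo "start:        $line"
  else
    echo "continuation: $line"
  fi
done
```

The first line is reported as a start and the second as a continuation. The approach depends on every record beginning with such an ID; a continuation line that happens to begin with 16 hex digits and a comma would be treated as a new record.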