bash: replace newlines between double quotes with a space

Question

Asked by Tom123456

I want to read data row by row, and wherever I find a double quote I want to replace each newline character with a space until the closing double quote is encountered, for example:

090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing
Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology

In the data above, the second row opens a double quote and the closing double quote is on the third line, so these two lines need to be merged with a single space, as below:

090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology

Answer 1

Answered by anubhava

You can use this GNU awk one-liner:

awk -v RS='"[^"]*"' -v ORS= '{gsub(/\n/, " ", RT); print $0 RT}' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology

-v RS='"[^"]*"': the input record separator is set to the regex "[^"]*", so every double-quoted string acts as a separator.
-v ORS=: the output record separator is set to the empty string.
gsub(/\n/, " ", RT): replaces newlines with spaces in RT, the text matched by the input record separator (RT is specific to GNU awk).
print $0 RT: prints the record followed by the cleaned-up quoted string.

And here is a Perl one-liner:

perl -0pe 's/"[^\n"]*"(*SKIP)(*F)|("[^"\n]*)\n([^"]*")/$1 $2/g' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology

Answer 2

Answered by terdon

This will work for the simple case in your example:

$ perl -00pe 's/(\n[^"]*"[^"]+)\n(.+?")/$1 $2/gm' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology

Caveats

This will load the entire file into memory, which might be a problem depending on the size of the file.
It doesn't deal with a quoted field that continues over more than one line break.

Explanation

-00: paragraph mode. Records are separated by blank lines, so the sample, which has none, is read as a single string (use -0777 to slurp a file unconditionally).
-pe: print each input record (a single record here, because of the -00) after applying the script given by -e to it.
(\n[^"]*"[^"]+)\n(.+?"): match a newline (used to indicate the start of a line), followed by as many non-" characters as possible ([^"]*), then a ", followed by only non-" characters until the next newline ([^"]+\n), and then everything up to the next quote (.+?"). The parentheses are there so we can capture the strings matched.
$1 $2: the replacement. The matched text is replaced with the first captured group, a space, and then the second captured group.
gm: g makes the replacement global; m makes ^ and $ match at line boundaries (this pattern uses neither anchor, so m has no effect here).
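The caveats above concern memory use and quoted fields that run over several lines. As a rough alternative, here is a minimal line-by-line sketch in bash; it is not part of the answer. The name join_quoted is hypothetical, and the sketch assumes that an odd number of double quotes in the buffered record means a quoted field is still open (CSV-style doubled quotes "" keep the count even).

```shell
#!/usr/bin/env bash
# join_quoted: hypothetical helper name, used only for this illustration.
# Reads stdin line by line; while the buffered record contains an odd number
# of double quotes, the next line is appended with a single space.
join_quoted() {
  local line buf="" quotes q='"'
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ -n $buf ]]; then
      buf+=" $line"              # continuation of an open quoted field
    else
      buf=$line                  # start of a new record
    fi
    quotes=${buf//[^$q]/}        # strip everything except the double quotes
    if (( ${#quotes} % 2 == 0 )); then
      printf '%s\n' "$buf"       # quotes balanced: the record is complete
      buf=""
    fi
  done
  # Unterminated quote at end of input: emit whatever was buffered.
  [[ -n $buf ]] && printf '%s\n' "$buf"
  return 0
}

# Demo on the sample from the question.
join_quoted <<'EOF'
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing
Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
EOF
```

Only one record is held in memory at a time, and a field may span any number of lines. A pure-bash loop is slow on large files, though, so this is better suited to modest inputs.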

Answer 3

Answered by Tiago Lopo

This one-liner will do:

perl -F'' -0 -ane ' foreach $char(@F){  $char eq q(") && {$seen= $seen ? 0 : 1}; $seen  && $char eq "\n" && { $char=" "}; print $char}'

or:

perl -F'' -0 -ane 'map {$_ eq q(") && {$seen=$seen?0:1}; $seen && $_ eq "\n" &&{$_=" "}; print} @F'

In action:

$ perl -F'' -0 -ane ' foreach $char(@F){  $char eq q(") && {$seen= $seen ? 0 : 1}; $seen  && $char eq "\n" && { $char=" "}; print $char}' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology

Answer 4

Answered by choroba

Perl to the rescue:

#!/usr/bin/perl
use warnings;
use strict;

use Text::CSV;
# binary => 1 lets fields contain embedded newlines;
# eol => "\n" terminates every printed row with a newline.
my $csv = 'Text::CSV'->new({ binary => 1,
                             eol => "\n",
                           })
    or die "Cannot use CSV: " . 'Text::CSV'->error_diag;

# The input file name is the first command-line argument.
open my $CSV, '<:utf8', shift or die $!;
while (my $row = $csv->getline($CSV)) {
    # Replace the newlines inside each field with spaces.
    s/\n/ /g for @$row;
    $csv->print(*STDOUT, $row);
}

Gives the expected output when run with

remove-newlines.pl input.csv > output.csv

Answer 5

Answered by Jbar

A solution using (I think) bashisms (not POSIX, so it shouldn't work in shells other than bash):

function fixmylines {
  local line fullline
  # IFS= and -r keep leading whitespace and backslashes intact
  while IFS= read -r line ; do
    if [[ "$line" =~ ^[0-9a-f]{16}, ]] ; then
      # a new record starts: flush the previous one
      [ "$fullline" ] && echo "$fullline"
      fullline="$line"
    else
      # continuation line: join it with a single space
      fullline+=" $line"
    fi
  done
  [ "$fullline" ] && echo "$fullline"
}

Then you can pipe your data into this function (... | fixmylines).

Note: it uses the regexp ^[0-9a-f]{16}, to detect the beginning of a record.
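To try the function quickly, it can be fed the question's sample through a here-document. This is only a sketch: the function is repeated so that the snippet runs on its own, and the record-start pattern (16 lowercase hexadecimal digits followed by a comma) is the answer's assumption about the ID column, not something the data format guarantees.

```shell
#!/usr/bin/env bash
# Same approach as the function above, repeated so the snippet is self-contained.
function fixmylines {
  local line fullline
  while IFS= read -r line ; do
    if [[ "$line" =~ ^[0-9a-f]{16}, ]] ; then
      # Line starts with a record ID: print the previous record, start a new one.
      [ "$fullline" ] && echo "$fullline"
      fullline="$line"
    else
      # No record ID: treat the line as a continuation of the current record.
      fullline+=" $line"
    fi
  done
  [ "$fullline" ] && echo "$fullline"
}

# Prints the three merged rows from the question.
fixmylines <<'EOF'
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing
Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
EOF
```

Because the function never looks at the quotes themselves, it also joins continuation lines that fall outside quoted fields, and it would split a record wrongly if a continuation line happened to begin with 16 hexadecimal digits and a comma.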
