bash: How to remove new lines within double quotes?

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/29150640/

Date: 2020-09-18 12:34:20  Source: igfitidea

How to remove new lines within double quotes?

Tags: regex, bash, file, ubuntu, newline

Asked by Kenny Basuki

How can I remove a newline that falls inside double quotes (") in a file?

For example:

"one", 
"three
four",
"seven"

So I want to remove the \n between three and four. Should I use a regular expression, or do I have to write a program that reads the file character by character?

Answered by Wintermute

To handle only the newlines that are inside double-quoted strings and leave those outside them alone, use GNU awk (needed for RT):

gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file

This works by splitting the file on " characters and removing newlines in every other block, that is, in the blocks that lie between an opening and a closing quote. With a file containing

"one",
"three
four",
12,
"seven"

this will give the result

"one",
"threefour",
12,
"seven"
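The same "every other block" idea can also be sketched without gawk's RT, for systems that only have a POSIX awk. This is an illustration under assumptions rather than part of the original answer: the function name join_quoted_newlines is hypothetical, and the sketch tracks whether a quoted string is still open by counting the quotes on each line.

```shell
# Hypothetical helper: join lines that are split inside a double-quoted string.
join_quoted_newlines() {
    awk '{
        n = gsub(/"/, "&")            # gsub returns how many quotes this line has
        if (n % 2) inside = !inside   # an odd count flips the in-quotes state
        # Suppress the newline while a quoted string is still open.
        printf "%s%s", $0, (inside ? "" : "\n")
    }' "$@"
}

# Usage with the sample data from above.
printf '%s\n' '"one",' '"three' 'four",' '12,' '"seven"' | join_quoted_newlines
```

Like the gawk command, this sketch treats every " as a string delimiter, so it shares the same limitation with escaped quotes.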

Note that this does not handle escape sequences. If strings in the input data can contain \", as in "He said: \"this is a direct quote.\"", it will not work as intended.
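One possible way around the escape-sequence limitation, sketched here with POSIX awk, is to ignore backslash-escaped quotes when deciding whether a string is still open. This is not from the original answer: the function name join_quoted_newlines_esc is hypothetical, and the sketch assumes that \\ and \" are the only escapes that matter.

```shell
# Hypothetical helper: like the quote-parity approach, but \" does not
# count as a string delimiter.
join_quoted_newlines_esc() {
    awk '{
        t = $0                        # work on a copy; $0 is printed unchanged
        gsub(/\\\\/, "", t)           # drop escaped backslashes first
        gsub(/\\"/, "", t)            # then drop escaped quotes
        n = gsub(/"/, "&", t)         # count the remaining, unescaped quotes
        if (n % 2) inside = !inside   # an odd count flips the in-quotes state
        printf "%s%s", $0, (inside ? "" : "\n")
    }' "$@"
}

# Usage: the first string contains escaped quotes and an embedded newline.
printf '%s\n' '"He said: \"this is a ' 'direct quote.\"",' '"seven"' |
    join_quoted_newlines_esc
```

For data that really is CSV, where quotes are usually escaped by doubling them, a proper CSV parser is the safer choice.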

Answered by fedorqui 'SO stop harming'

You can treat every line that starts with " as the beginning of a new block: print the previously stored block and start storing the new line. If a line does not start with ", append its content to the stored variable so that it is printed later on:

$ awk '/^"/ {if (f) print f; f=$0; next} {f=f FS $0} END {print f}' file
"one", 
"three four",
"seven"

Since each block of text is printed only when the next one begins, the END block is needed to print the last stored value after the whole file has been processed.

Answered by hek2mgl

You can use sed for that:

sed -r '/^"[^"]+$/{:a;N;/",/!ba;s/\n/ /g}' text

The command searches for lines that start with a double quote but do not contain another double quote: /^"[^"]+$/

If such a line is found, the label :a marks the start of a loop. The N command appends the next input line to the pattern space. If the pattern space still does not contain the closing double quote followed by a comma (/",/!), ba branches back to label a; otherwise the loop ends.

Once the closing quote is found, all newlines in the pattern space are replaced by spaces (s/\n/ /g), and sed prints the pattern space automatically.
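The steps described above can be spelled out as a multi-line sed script with one command per line. This is a sketch rather than the answer's exact command: the function name join_with_sed is hypothetical, and -E is assumed here as the more widely supported spelling of -r.

```shell
# Hypothetical helper wrapping the loop from the one-liner, one step per line.
join_with_sed() {
    sed -E '/^"[^"]+$/{
        # Loop start: append the next input line to the pattern space.
        :a
        N
        # No closing quote followed by a comma yet: branch back to a.
        /",/!ba
        # Closing quote found: turn the embedded newlines into spaces.
        s/\n/ /g
    }' "$@"
}

# Usage with data shaped like the sample from the question.
printf '%s\n' '"one",' '"three' 'four",' '"seven"' | join_with_sed
```

Because the loop only stops at a quote followed by a comma, a multi-line string in the last field of the file would need a different end condition.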

Answered by Sobrique

A simplistic solution:

#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {
    chomp;
    # A line that starts with a double quote begins a new record.
    if (m/^\"/) { print "\n"; }
    print;
}


__DATA__
"one", 
"three
four",
"seven"

But for the specific case of CSV-style data, I'd suggest using a Perl module called Text::CSV, which parses CSV properly and treats an element containing a linefeed as part of the preceding row.

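#!/usr/bin/perl

use strict;
use warnings;

use Text::CSV;

my $csv = Text::CSV->new( { binary => 1 } );

open( my $input, "<", "input.csv" ) or die $!;

while ( my $row = $csv->getline($input) ) {
    for (@$row) {
        #remove linefeeds in each 'element'. 
        s/\n/ /g;
        #print this specific element ('naked' e.g. without quotes). 
        print;
        print ",";
    }
    print "\n";
}
close($input);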

Answered by Mario Grünwald

Tested in bash.

Purpose: replace each newline inside double quotes with a literal \n.

Works for Unix newlines (\n), Windows newlines (\r\n), and the reversed sequence \n\r, which the answer calls the Mac newline. A bare \r, as used by classic Mac OS, is not matched by the pattern.

echo -e '"line1\nline2"'

"line1
line2"

echo -e '"line1\nline2"' | gawk -v RS='"' 'NR % 2 == 0 { gsub(/\r?\n\r?/, "\\n") } { printf("%s%s", $0, RT) }'

"line1\nline2"
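For systems without gawk, the same purpose can be sketched with a POSIX awk that reads line by line. This is an illustration under assumptions, not the answer's method: the function name escape_quoted_newlines is hypothetical, and only \n and \r\n line endings are handled.

```shell
# Hypothetical helper: replace newlines inside double quotes with a literal \n.
escape_quoted_newlines() {
    awk '{
        cr = sub(/\r$/, "")           # strip the CR of a CRLF ending, remember it
        n = gsub(/"/, "&")            # count the double quotes on this line
        if (n % 2) inside = !inside   # an odd count flips the in-quotes state
        ending = (cr ? "\r\n" : "\n") # restore the original ending outside quotes
        printf "%s%s", $0, (inside ? "\\n" : ending)
    }' "$@"
}

# Usage: a Unix newline and a Windows newline inside quotes.
printf '"line1\nline2"\n' | escape_quoted_newlines
printf '"line1\r\nline2"\n' | escape_quoted_newlines
```

Both calls should print the quoted string on a single line with a literal \n between line1 and line2.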