bash: Find and replace spaces using sed in the Mac terminal

Question

Asked by Leonna Sylvester

I have a .CSV file with over 500,000 lines that I need to:

find all 'space double quote space' sequences and replace with nothing
find all 'space double quote' sequences and replace with nothing
find all double quotes and replace with nothing

Example of .CSV line:

"DISH Hartford & New Haven  (Hartford)", "206", "FBNHD", " 06028", " East Windsor Hill", "CT", "Hartford County"

Required output:

DISH Hartford & New Haven  (Hartford),206,FBNHD,06028,East Windsor Hill,CT,Hartford County

I need to remove all double quotes (") and spaces in front of and behind the commas (,).

I've tried

$ cd /Users/Leonna/Downloads/
$ cat bs-B2Bformat.csv | sed s/ " //g

This gives me the 'command incomplete' greater-than (>) prompt, so I then tried:

$ cat bs-B2Bformat.csv | sed s/ " //g
sed: 1: "s/": unterminated substitute pattern
$ cat bs-B2Bformat.csv |sed s/ \" //g
sed: 1: "s/": unterminated substitute pattern
$

There are too many lines for me to edit in Excel (Excel won't load all the lines) or even a text editor. How can I fix this?

Answer 1

Answered by brunocodutra

Quoted from here:

For POSIX compliance, use the character class [[:space:]] instead of \s, since the latter is a GNU sed extension.

Based on that, I would suggest the following, which, as Jonathan Leffler pointed out, is portable across GNU and BSD implementations.

sed -E 's/[[:space:]]?"[[:space:]]?//g' path/to/file

The -E flag enables extended regular expressions on BSD implementations. On GNU sed it was long an undocumented option, but, as discussed here, it is accepted for compatibility with BSD sed.

Quoted from the manual for BSD sed:

-E Interpret regular expressions as extended (modern) regular expressions rather than basic regular expressions (BRE's).

Applying the above command on a file containing the following single line

"DISH Hartford & New Haven (Hartford)", "206", "FBNHD", " 06028", " East Windsor Hill", "CT", "Hartford County"

it yields

DISH Hartford & New Haven (Hartford),206,FBNHD,06028,East Windsor Hill,CT,Hartford County
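
To check the pattern before running it over the whole file, it can be tried on the sample line first. This is a minimal sketch; the variable names line and result are only for illustration. Note that ? removes at most one whitespace character on each side of a quote, so a field padded with two or more spaces would keep the extra ones (replacing ? with * would cover that case).

```shell
# Sample line taken from the question.
line='"DISH Hartford & New Haven (Hartford)", "206", "FBNHD", " 06028", " East Windsor Hill", "CT", "Hartford County"'

# Remove every double quote together with at most one whitespace
# character immediately before and after it.
result=$(printf '%s\n' "$line" | sed -E 's/[[:space:]]?"[[:space:]]?//g')

printf '%s\n' "$result"
```

Spaces inside a field, such as those in East Windsor Hill, are untouched because they are not adjacent to a quote.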

Answer 2

Answered by Shylo Hana

This should do it:

sed -i 's/\(\s\|\)"\(\|\s\)//g' bs-B2Bformat.csv
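
Note that \s and \| are GNU extensions, and BSD sed (the default on macOS) expects a backup-suffix argument after -i, so this command is best treated as GNU-specific. A sketch of a variant that avoids -i altogether, assuming a hypothetical demo file path and borrowing the POSIX character class from Answer 1:

```shell
# Hypothetical demo file; substitute the path to the real CSV.
csv="${TMPDIR:-/tmp}/sed-demo.csv"
printf '%s\n' '"a", "1", " b"' > "$csv"

# Write the cleaned text to a temporary file, then replace the original
# only if sed succeeded. No -i flag, \s, or \| is needed.
sed -E 's/[[:space:]]?"[[:space:]]?//g' "$csv" > "$csv.tmp" && mv "$csv.tmp" "$csv"

cat "$csv"
```

Writing to a separate file first also means the original stays intact if sed stops with an error partway through.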

Answer 3

Answered by iamauser

This works for me. Is this what you want?

 sed -e 's|", "|,|g' -e 's|^"||g' -e 's|"$||g' file.csv

 echo '"DISH Hartford & New Haven (Hartford)", "206", "FBNHD", " 06028", " East Windsor Hill", "CT", "Hartford County"' | sed -e 's|", "|,|g' -e 's|^"||g' -e 's|"$||g'

 DISH Hartford & New Haven (Hartford),206,FBNHD, 06028, East Windsor Hill,CT,Hartford County
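
As the output shows, the spaces that sit inside the quotes before 06028 and East Windsor Hill are left in place. One possible adjustment, offered as a sketch rather than as the answerer's method, lets each expression also consume optional spaces around the quotes (the variable names are illustrative):

```shell
line='"DISH Hartford & New Haven (Hartford)", "206", "FBNHD", " 06028", " East Windsor Hill", "CT", "Hartford County"'

# 1) collapse quote-comma-quote separators, including spaces around them
# 2) drop the opening quote (and spaces after it) at the start of the line
# 3) drop the closing quote (and spaces before it) at the end of the line
cleaned=$(printf '%s\n' "$line" |
  sed -e 's| *" *, *" *|,|g' -e 's|^" *||' -e 's| *"$||')

printf '%s\n' "$cleaned"
```

Because every pattern is anchored on a quote, spaces in the middle of a field are not affected.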

Answer 4

Answered by Birei

One way is to use Python and its csv module:

import csv
import sys

## Open the file given as the first argument (newline='' as the csv docs recommend).
with open(sys.argv[1], 'r', newline='') as f:

    ## Create the csv reader and writer. Do not quote fields in the output.
    reader = csv.reader(f, skipinitialspace=True)
    writer = csv.writer(sys.stdout, quoting=csv.QUOTE_NONE, escapechar='\\')

    ## Read the file row by row, strip leading and trailing whitespace
    ## from each field, and print.
    for row in reader:
        row = [field.strip() for field in row]
        writer.writerow(row)

Run it like:

python3 script.py csvfile

That yields:

DISH Hartford & New Haven  (Hartford),206,FBNHD,06028,East Windsor Hill,CT,Hartford County

Answer 5

Answered by Nashenas

What all of the current answers seemed to miss:

$ cat bs-B2Bformat.csv | sed s/ " //g
sed: 1: "s/": unterminated substitute pattern
$ cat bs-B2Bformat.csv |sed s/ \" //g
sed: 1: "s/": unterminated substitute pattern
$

The problem in the above is missing single quotes. It should have been:

$ cat bs-B2Bformat.csv | sed 's/ " //g'
                             ^        ^

Without the single quotes, bash splits the command at the spaces and passes three separate arguments (at least in the \" case), so sed saw its first argument as just s/.

Edit: FYI, single quotes are not required; they just make this case easier. If you want to use double quotes, just escape the one you want to keep for matching:

$ cat bs-B2Bformat.csv | sed "s/ \" //g"
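
The word splitting described above can be observed directly by counting the arguments the shell hands over in each form. This is a small sketch; count_args is a hypothetical helper, not part of sed or bash.

```shell
# Hypothetical helper: print how many arguments the shell passed in.
count_args() { echo "$#"; }

unquoted=$(count_args s/ \" //g)    # split at the unquoted spaces: s/  "  //g
single=$(count_args 's/ " //g')     # single quotes keep it as one argument
double=$(count_args "s/ \" //g")    # double quotes with an escaped " also give one

echo "$unquoted $single $double"    # prints: 3 1 1
```

With three arguments, sed treats only the first one (s/) as its script, which is where the "unterminated substitute pattern" message comes from.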
