bash: Counting the commas in a line in bash

Question

Asked by Stuart Woodward

Sometimes I receive a CSV file which has a carriage return inside a cell. This is not an acceptable format to a program that will use it as input.

In order to detect if an input line is split, I determined that a bad line would not have the expected number of commas in it. Is there a bash or other common unix command line tool that would allow me to count the commas in the line? If necessary, I can write a Python or Perl program to do it, but if possible, I'd like to add a line or two to an existing bash script to cause it to fail if the comma count is wrong. Any ideas?

Answer 1

Answered by lanzz

Strip everything but the commas, and then count number of characters left:

$ echo foo,bar,baz | tr -cd , | wc -c
2
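
To make an existing script fail on a bad line, as the question asks, the same pipeline can be run once per line. This is only a minimal sketch: the helper names `count_commas` and `check_csv` are hypothetical, and it assumes the expected comma count is known in advance and that no field contains a quoted comma.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count the commas in a single string.
count_commas() {
    # tr -cd , deletes every character except commas; wc -c counts what is left.
    # The arithmetic expansion strips the padding some wc implementations add.
    echo $(( $(printf '%s' "$1" | tr -cd , | wc -c) ))
}

# Hypothetical helper: return non-zero if any line of file $1
# does not contain exactly $2 commas.
check_csv() {
    local file=$1 expected=$2 line n=0
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        if [ "$(count_commas "$line")" -ne "$expected" ]; then
            echo "line $n: expected $expected commas" >&2
            return 1
        fi
    done < "$file"
}
```

In a script this could be used as `check_csv inputfile 2 || exit 1`.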

Answer 2

Answered by Jon Lin

To count the number of times a comma appears, you can use something like awk:

string='foo,bar,baz'   # example value; use a line of input from the CSV file
echo "$string" | awk -F "," '{print NF-1}'

But this really isn't sufficient to determine whether a field has carriage returns in it. Fields can have commas inside as long as they're surrounded by quotes.
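
One way to account for quoted commas is to walk the line character by character and skip commas that fall between double quotes. The following is a rough sketch rather than a full CSV parser; the function name `count_unquoted_commas` is hypothetical, and it assumes fields are quoted with `"` and that a literal quote inside a field is written as `""`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count only the commas outside double-quoted sections.
count_unquoted_commas() {
    local s=$1 i c in_quotes=0 count=0
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        if [[ $c == '"' ]]; then
            # Each quote toggles the state; a doubled quote ("") toggles
            # twice and so leaves the state unchanged.
            in_quotes=$(( 1 - in_quotes ))
        elif [[ $c == ',' && $in_quotes -eq 0 ]]; then
            count=$(( count + 1 ))
        fi
    done
    echo "$count"
}

count_unquoted_commas 'a,"b,c",d'   # prints 2, where NF-1 would give 3
```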

Answer 3

Answered by Paused until further notice.

In pure Bash:

while IFS=, read -ra array
do
    echo "$((${#array[@]} - 1))"
done < inputfile

or

while read -r line
do
    count=${line//[^,]}
    echo "${#count}"
done < inputfile
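
The two loops are not guaranteed to agree on every line. As a small sketch, the hypothetical helpers `count_by_split` and `count_by_strip` below wrap each approach for a single string; on a line that ends in a comma, the field-splitting version can report one fewer because `read` may drop a trailing empty field, so the strip-and-measure version is probably the safer choice for validation.

```shell
#!/usr/bin/env bash

# Hypothetical helper: split on commas and count the fields, as in the first loop.
count_by_split() {
    local -a fields
    IFS=, read -ra fields <<< "$1"
    echo "$(( ${#fields[@]} - 1 ))"
}

# Hypothetical helper: delete everything except commas and measure
# the length of what is left, as in the second loop.
count_by_strip() {
    local only_commas=${1//[^,]}
    echo "${#only_commas}"
}

count_by_split 'a,b,c'   # both helpers report 2 for this line
count_by_strip 'a,b,c'
count_by_split 'a,b,'    # may come out lower than the real comma count
count_by_strip 'a,b,'    # counts both commas
```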

Answer 4

Answered by ceving

Try Perl:

$ perl -ne 'print 0+@{[/,/g]},"\n"'
a
0
a,a
1
a,a,a,a,a
4

Answer 5

Answered by D Bro

Depending on what you are trying to do with the CSV data, it may be helpful to use a wrapper script like csvquote to temporarily replace the problematic newlines (and commas) inside quoted fields, then restore them. For instance:

csvquote inputfile.csv | wc -l

and

csvquote inputfile.csv | cut -d, -f1 | csvquote -u

may be the sort of thing you're looking for. See https://github.com/dbro/csvquote for the code and more information.

Answer 6

Answered by Hunter McMillen

Just remove all of the carriage returns:

tr -d '\r' < old_file > new_file