Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/37613688/

Date: 2020-09-18 14:43:54  Source: igfitidea

How do I convert a csv to a binary file with a bash command?

Tags: bash, csv, binaryfiles

Asked by JVE999

I have a csv file which is just a simple comma-separated list of numbers. I want to convert this csv file into a binary file (just a sequence of bytes, with each number from the csv file encoded as a binary value).

The reason I am doing this is to be able to import audio data from a spreadsheet of values. When importing (I am using Audacity), I have a few formats to choose from for the binary file:

Encoding:
Signed 8, 24, 16, or 32 bit PCM
Unsigned 8 bit PCM
32 bit or 64 bit float
U-Law
A-Law
GSM 6.10
12, 16, or 24 bit DWVW
VOX ADPCM

Byte Order:
No endianness
Big endian
Little endian

I was leaning toward big-endian 32-bit float to keep things simple. I wanted to keep things as simple as possible, so I was thinking bash would be the best tool.
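For reference, the conversion asked for here (comma-separated numbers to big-endian 32-bit floats) can be sketched in Python with the standard `csv` and `struct` modules. This is a minimal sketch, not taken from either answer; the names `csv_to_be_float32` and `convert_file` and any file names are hypothetical.

```python
import csv
import io
import struct


def csv_to_be_float32(text):
    """Convert comma-separated numbers to big-endian 32-bit float bytes.

    Any number of values per line is accepted; empty fields are skipped.
    """
    values = [
        float(field)  # float() tolerates surrounding whitespace
        for row in csv.reader(io.StringIO(text))
        for field in row
        if field.strip()
    ]
    # '>' = big-endian, 'f' = 32-bit IEEE 754 float, one 'f' per value.
    return struct.pack('>%df' % len(values), *values)


def convert_file(in_path, out_path):
    """Hypothetical helper: read a CSV file and write the raw binary file."""
    with open(in_path, 'rt', newline='') as f:
        data = csv_to_be_float32(f.read())
    with open(out_path, 'wb') as out:
        out.write(data)
```

With the four-line example file used in the first answer below, `csv_to_be_float32` yields the bytes `3f 80 00 00 40 06 66 66 40 4c cc cd 40 89 99 9a`, which should then be imported as 32-bit float, big endian.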

Answer by Dummy00001

I have a csv file which is just a simple comma-separated list of numbers. I want to convert this csv file into a binary file [...]

I was moving along the lines of big endian 32-bit float to keep things simple.

Not sure how to do it in pure bash (I actually doubt it is doable: bash arithmetic is integer-only, and encoding a float as binary is not a standard conversion).

But here it is with a simple Perl one-liner:

$ cat example1.csv
1.0
2.1
3.2
4.3

$ cat example1.csv | perl -ne 'print pack("f>*", split(/\s*,\s*/))' > example1.bin

$ hexdump -C < example1.bin
00000000  3f 80 00 00 40 06 66 66  40 4c cc cd 40 89 99 9a  |?...@.ff@L..@...|
00000010

It uses Perl's pack function with f to convert floats to binary, and the > modifier to store them in big-endian (BE) byte order. (I have also added the split in case there are multiple numbers per CSV line.)
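For comparison, Python's standard `struct` module uses similar byte-order prefixes: `>f` packs a big-endian 32-bit float and `<f` a little-endian one. This small sketch (not from the answer) shows that the two orders are byte-reversed, and that 2.1 reads back slightly differently because it is not exactly representable as a 32-bit float.

```python
import struct

# Big-endian: most significant byte first, as in the hexdump above.
be = struct.pack('>f', 2.1)
# Little-endian: the same four bytes in reverse order.
le = struct.pack('<f', 2.1)

print(be.hex())  # 40066666
print(le.hex())  # 66660640

# Unpacking shows the value actually stored in 32 bits (close to, but not exactly, 2.1).
(back,) = struct.unpack('>f', be)
print(back)
```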

P.S. The command to convert integers to 16-bit shorts with native endianness:

perl -ne 'print pack("s*", split(/\s*,\s*/))'

Use "s>*" for BE, or "s<*" for LE, instead of "s*".
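The same three choices have rough equivalents in Python's `struct` format prefixes: `=` for native byte order (with standard sizes), `>` for big-endian and `<` for little-endian, with `h` for a signed 16-bit short. A small sketch; the helper name `pack_shorts` is hypothetical.

```python
import struct


def pack_shorts(values, order='='):
    """Pack integers as signed 16-bit shorts.

    order: '=' native byte order, '>' big-endian (BE), '<' little-endian (LE).
    """
    return struct.pack('%s%dh' % (order, len(values)), *values)


print(pack_shorts([1, 2, 3], '>').hex())  # 000100020003
print(pack_shorts([1, 2, 3], '<').hex())  # 010002000300
```

Unlike a silent wrap-around, `struct.pack` raises `struct.error` for values outside [-32768, 32767], so out-of-range input is caught early.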

P.P.S. If it is audio data, you can also check out the sox tool. I haven't used it in ages, but IIRC it can convert almost anything PCM-like between practically any formats, while also applying effects.

Answer by Brian Cain

I would recommend Python over bash. For this particular task, it's simpler/saner IMO.

#!/usr/bin/env python

import array

with open('input.csv', 'rt') as f:
    text = f.read()
    # Treat newlines as separators too, and skip empty fields
    # (e.g. the one left by a trailing newline).
    entries = [x for x in text.replace('\n', ',').split(',') if x.strip()]
    values = [int(x) for x in entries]
    # Apply a scale factor here: if your input goes from [-100, 100],
    #   you may need to scale it into [-32768, 32767] for
    #   signed 16-bit PCM, e.g.:
    #   scale = 32767 / 100
    #   values = [int(round(val * scale)) for val in values]

with open('output.pcm', 'wb') as out:
    pcm_vals = array.array('h', values)  # 16-bit signed, native byte order
    pcm_vals.tofile(out)
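The scaling step mentioned in the comments could be factored out roughly as below. This sketch assumes symmetric peak scaling (an input of `peak` maps to 32767) and clips anything outside the signed 16-bit range; the helper name `scale_to_int16` and its `peak` parameter are hypothetical, not part of the answer.

```python
def scale_to_int16(values, peak):
    """Scale numbers in [-peak, peak] to signed 16-bit PCM sample values.

    Results outside [-32768, 32767] are clipped.
    """
    scaled = []
    for val in values:
        sample = int(round(val * 32767 / peak))
        scaled.append(max(-32768, min(32767, sample)))
    return scaled


print(scale_to_int16([-100, 0, 25, 100], peak=100))  # [-32767, 0, 8192, 32767]
```

The returned list can be passed straight to `array.array('h', ...)`. Mapping 0 to 0 keeps silence at silence, which is why symmetric scaling was chosen over stretching [-peak, peak] onto the full [-32768, 32767] range.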

You could also use Python's wave module instead of just writing raw PCM.
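A minimal sketch of the `wave` alternative, assuming mono audio, 16-bit samples and a hypothetical 44100 Hz sample rate. Because the WAV header records the format, Audacity (or any player) can open the file without asking for the encoding and byte order. The name `write_wav` is hypothetical.

```python
import array
import io
import wave


def write_wav(dest, values, rate=44100):
    """Write signed 16-bit integer samples as a mono WAV file.

    dest may be a file name or a binary file-like object.
    """
    with wave.open(dest, 'wb') as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 2 bytes = 16 bits per sample
        w.setframerate(rate)   # assumed sample rate
        w.writeframes(array.array('h', values).tobytes())


# Example: write to an in-memory buffer instead of a file on disk.
buf = io.BytesIO()
write_wav(buf, [1, 2, 3, 4, 5, 6, 7])
```

Calling `write_wav('output.wav', values)` with the `values` list from the script above would write a file on disk instead.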

Here's how the example above works:

$ echo 1,2,3,4,5,6,7 > input.csv
$ ./so_pcm.py
$ xxd output.pcm
0000000: 0100 0200 0300 0400 0500 0600 0700       ..............

xxd shows the binary values. The output used my machine's native endianness (little-endian).
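If the importer expects a specific byte order rather than whatever the machine uses, the array can be byte-swapped before writing. A sketch assuming big-endian output is wanted and that the `'h'` type code is 2 bytes (true on mainstream platforms); the helper name `to_be_int16_bytes` is hypothetical.

```python
import array
import sys


def to_be_int16_bytes(values):
    """Return signed 16-bit samples as big-endian bytes on any host."""
    pcm_vals = array.array('h', values)  # native byte order
    if sys.byteorder == 'little':
        pcm_vals.byteswap()  # reverse the bytes of each 2-byte item
    return pcm_vals.tobytes()


print(to_be_int16_bytes([1, 2, 3]).hex())  # 000100020003
```

To force little-endian output instead, swap only when `sys.byteorder == 'big'`.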
