Python: Interpreting WAV Data

Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/2226853/

Time: 2020-11-04 00:07:12  Source: igfitidea

Interpreting WAV Data

Tags: python, audio, pcm

Asked by SapphireSun

I'm trying to write a program to display PCM data. I've been very frustrated trying to find a library with the right level of abstraction, but I've found the python wave library and have been using that. However, I'm not sure how to interpret the data.

The wave.getparams function returns (2 channels, 2 bytes, 44100 Hz, 96333 frames, No compression, No compression). This all seems cheery, but then I tried printing a single frame: '\xc0\xff\xd0\xff', which is 4 bytes. I suppose it's possible that a frame is 2 samples, but the ambiguities do not end there.

96333 frames * 2 samples/frame * (1/44.1k sec/sample) = 4.3688 seconds

However, iTunes reports the time as closer to 2 seconds and calculations based on file size and bitrate are in the ballpark of 2.7 seconds. What's going on here?

Additionally, how am I to know if the bytes are signed or unsigned?

Many thanks!

Answered by SapphireSun

Thank you for your help! I got it working and I'll post the solution here for everyone to use in case some other poor soul needs it:

import wave
import struct

def pcm_channels(wave_file):
    """Given a file-like object or file path representing a wave file,
    decompose it into its constituent PCM data streams.

    Input: A file like object or file path
    Output: A list of lists of integers representing the PCM coded data stream channels
        and the sample rate of the channels (mixed rate channels not supported)
    """
    stream = wave.open(wave_file,"rb")

    num_channels = stream.getnchannels()
    sample_rate = stream.getframerate()
    sample_width = stream.getsampwidth()
    num_frames = stream.getnframes()

    raw_data = stream.readframes( num_frames ) # Returns byte data
    stream.close()

    total_samples = num_frames * num_channels

    if sample_width == 1: 
        fmt = "%iB" % total_samples # read unsigned chars
    elif sample_width == 2:
        fmt = "<%ih" % total_samples # read little-endian signed 2 byte shorts (WAV byte order)
    else:
        raise ValueError("Only supports 8 and 16 bit audio formats.")

    integer_data = struct.unpack(fmt, raw_data)
    del raw_data # Keep memory tidy (who knows how big it might be)

    channels = [ [] for time in range(num_channels) ]

    for index, value in enumerate(integer_data):
        bucket = index % num_channels
        channels[bucket].append(value)

    return channels, sample_rate
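
The bucketing loop at the end of pcm_channels can also be written with extended slices, because sample i of an interleaved stream belongs to channel i % num_channels. A minimal sketch, assuming a made-up four-frame stereo stream; the name deinterleave and the sample values are illustrative and not part of the original post:

```python
def deinterleave(samples, num_channels):
    """Split an interleaved sample sequence into one list per channel."""
    # Channel c owns samples c, c + num_channels, c + 2 * num_channels, ...
    return [list(samples[c::num_channels]) for c in range(num_channels)]


# Hypothetical interleaved stereo data: L0, R0, L1, R1, ...
interleaved = (10, -10, 20, -20, 30, -30, 40, -40)
left, right = deinterleave(interleaved, 2)
print(left)   # [10, 20, 30, 40]
print(right)  # [-10, -20, -30, -40]
```

Slicing produces the same per-channel lists as the enumerate loop, without a Python-level append for every sample.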

Answered by Alex Martelli

"Two channels" means stereo, so it makes no sense to sum each channel's duration; you're off by a factor of two (2.18 seconds, not 4.37). As for signedness, as explained for example here, and I quote:

8-bit samples are stored as unsigned bytes, ranging from 0 to 255. 16-bit samples are stored as 2's-complement signed integers, ranging from -32768 to 32767.

This is part of the specs of the WAV format (actually of its superset RIFF) and thus not dependent on what library you're using to deal with a WAV file.
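
To make the signedness rule concrete, here is a small sketch that decodes the single frame printed in the question with the standard struct module. It assumes the frame holds two little-endian 16-bit samples (left, then right), which matches the parameters reported in the question:

```python
import struct

frame = b"\xc0\xff\xd0\xff"  # the single frame printed in the question

# "<" = little-endian (WAV byte order), "h" = signed 16-bit integer.
left, right = struct.unpack("<2h", frame)
print(left, right)  # -64 -48

# Reading the same bytes as unsigned ("H") gives values near 65535 instead,
# which is how a quiet signal looks if the signedness is guessed wrong.
print(struct.unpack("<2H", frame))  # (65472, 65488)

# 8-bit WAV samples are unsigned, with silence at 128 rather than 0.
print(struct.unpack("<2B", b"\x80\xff"))  # (128, 255)
```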

Answered by Justin Peel

I know that an answer has already been accepted, but I did some things with audio a while ago and you have to unpack the wave doing something like this.

import struct
pcmdata = struct.unpack("<%dh" % wavedatalength, wavedata)  # wavedatalength = number of 16-bit samples in wavedata

Also, one package that I used was called PyAudio, though I still had to use the wave package with it.

Answered by John La Rooy

Each sample is 16 bits and there are 2 channels, so the frame takes 4 bytes.
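
This can be checked with the wave module itself. The sketch below builds a tiny in-memory stereo, 16-bit file (the frame contents are just the question's frame repeated, chosen for illustration) and confirms that one frame is nchannels * sampwidth bytes:

```python
import io
import wave

# Build a tiny in-memory stereo, 16-bit WAV (contents are illustrative).
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(2)
    writer.setsampwidth(2)      # bytes per sample
    writer.setframerate(44100)
    writer.writeframes(b"\xc0\xff\xd0\xff" * 3)  # three identical frames

buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    frame_size = reader.getnchannels() * reader.getsampwidth()
    one_frame = reader.readframes(1)
    n_frames = reader.getnframes()

print(frame_size)      # 4
print(len(one_frame))  # 4
```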

Answered by mhawke

The duration is simply the number of frames divided by the number of frames per second. From your data this is: 96333 / 44100 = 2.18 seconds.
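
As a quick check of that arithmetic, using only the numbers reported in the question (the variable names are my own):

```python
n_frames = 96333      # from getnframes() in the question
frame_rate = 44100    # from getframerate(), in frames per second

duration = n_frames / frame_rate
print(round(duration, 2))  # 2.18

# Multiplying by the channel count treats the two stereo channels as if they
# played one after the other, which gives the question's figure of about 4.37.
print(round(2 * duration, 2))  # 4.37
```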

Answered by rudolfbyker

Building upon this answer, you can get a good performance boost by using numpy.fromstring (numpy.frombuffer in current NumPy, where the binary mode of fromstring is deprecated) or numpy.fromfile. Also see this answer.

Here is what I did:

import numpy as np

def interpret_wav(raw_bytes, n_frames, n_channels, sample_width, interleaved=True):

    if sample_width == 1:
        dtype = np.uint8 # unsigned char
    elif sample_width == 2:
        dtype = np.int16 # signed 2-byte short
    else:
        raise ValueError("Only supports 8 and 16 bit audio formats.")

    # np.frombuffer replaces the deprecated binary mode of np.fromstring.
    # It returns a read-only view of raw_bytes instead of a copy.
    channels = np.frombuffer(raw_bytes, dtype=dtype)

    if interleaved:
        # channels are interleaved, i.e. sample N of channel M follows sample N of channel M-1 in raw data
        channels.shape = (n_frames, n_channels)
        channels = channels.T
    else:
        # channels are not interleaved. All samples from channel M occur before all samples from channel M+1
        channels.shape = (n_channels, n_frames)

    return channels

Assigning a new value to shape will throw an error if it requires data to be copied in memory. This is a good thing, since you want to use the data in place (using less time and memory overall). The ndarray.T attribute also does not copy (i.e. returns a view) if possible, but I'm not sure how you ensure that it does not copy.
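
One way to check that no copy was made is np.shares_memory, which reports whether two arrays overlap in memory. A minimal sketch, assuming a small stand-in array rather than real WAV data:

```python
import numpy as np

raw = np.arange(6, dtype=np.int16)  # stand-in for decoded samples: L0, R0, L1, R1, L2, R2
frames = raw.reshape(3, 2)          # (n_frames, n_channels); a view for contiguous input
channels = frames.T                 # (n_channels, n_frames)

# True when the two arrays overlap in memory, i.e. no copy was made.
print(np.shares_memory(raw, channels))  # True

# Writing through the view changes the original buffer as well.
channels[1, 0] = 99                 # channel 1 (right), frame 0
print(raw[1])                       # 99
```

Because the transposed array is only a view with swapped strides, it is no longer C-contiguous; code that needs contiguous per-channel data would have to copy explicitly, for example with np.ascontiguousarray.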

Reading directly from the file with np.fromfile will be even better, but you would have to skip the header using a custom dtype. I haven't tried this yet.
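
A rough sketch of the np.fromfile idea, swapping the custom dtype for a different technique: it walks the RIFF chunk list to locate the data chunk, then passes that position through the offset argument of np.fromfile. It assumes uncompressed 16-bit PCM in a plain RIFF/WAVE file; the helper names find_data_chunk and read_int16_wav are hypothetical and not from the original answer:

```python
import struct

import numpy as np


def find_data_chunk(path):
    """Return (offset, size) of the 'data' chunk by walking the RIFF chunks."""
    with open(path, "rb") as f:
        riff, _, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no data chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"data":
                return f.tell(), chunk_size
            # Skip this chunk; chunks are padded to an even number of bytes.
            f.seek(chunk_size + (chunk_size & 1), 1)


def read_int16_wav(path, n_channels):
    """Read 16-bit PCM samples straight from disk, one row per channel."""
    offset, size = find_data_chunk(path)
    samples = np.fromfile(path, dtype="<i2", count=size // 2, offset=offset)
    return samples.reshape(-1, n_channels).T
```

The channel count still has to come from somewhere, for example wave.open(path).getnchannels(), since this sketch does not parse the fmt chunk.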
