Python: What are chunks, samples and frames when using pyaudio

Question

Asked by shiva

After going through the documentation of pyaudio and reading some other articles on the web, I am confused if my understanding is correct.

This is the code for audio recording found on pyaudio's site:

import pyaudio
import wave

CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "output.wav"

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print("* recording")

frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)

print("* done recording")

stream.stop_stream()
stream.close()
p.terminate()

and if I add these lines then I am able to play whatever I recorded:

play=pyaudio.PyAudio()
stream_play=play.open(format=FORMAT,
                      channels=CHANNELS,
                      rate=RATE,
                      output=True)
for data in frames: 
    stream_play.write(data)
stream_play.stop_stream()
stream_play.close()
play.terminate()

"RATE" is the number of samples collected per second.
"CHUNK" is the number of frames in the buffer.
Each frame will have 2 samples as "CHANNELS=2".
Size of each sample is 2 bytes, calculated using the function: pyaudio.get_sample_size(pyaudio.paInt16).
Therefore size of each frame is 4 bytes.
In the "frames" list, the size of each element must be 1024*4 bytes; for example, the size of frames[0] must be 4096 bytes. However, sys.getsizeof(frames[0]) returns 4133, while len(frames[0]) returns 4096.
The for loop executes int(RATE / CHUNK * RECORD_SECONDS) times, and I can't understand why. Here is the same question answered by "Ruben Sanchez", but I can't be sure whether it is correct, as he says CHUNK=bytes. According to his explanation, it should be int(RATE / (CHUNK*2) * RECORD_SECONDS), as (CHUNK*2) is the number of samples read into the buffer with each iteration.
Finally, when I write print frames[0], it prints gibberish, because it tries to treat the string as ASCII-encoded, which it is not; it is just a stream of bytes. So how do I print this stream of bytes in hexadecimal using the struct module? And if I later replace each hexadecimal value with values of my choice, will it still produce playable sound?

Everything I wrote above is my own understanding, and much of it may be wrong.

Answer 1

Answered by Matthias

"RATE" is the "sampling rate", i.e. the number of frames per second.
"CHUNK" is the (arbitrarily chosen) number of frames per block that the (potentially very long) signal is split into in this example.
Yes, each frame will have 2 samples as "CHANNELS=2", but the term "samples" is seldom used in this context (because it is confusing).
Yes, the size of each sample is 2 bytes (= 16 bits) in this example.
Yes, the size of each frame is 4 bytes.
Yes, each element of "frames" should be 4096 bytes. sys.getsizeof() reports the storage space needed by the Python interpreter for the object, which is typically a bit more than the actual size of the raw data; len() gives the size of the raw data itself.
RATE * RECORD_SECONDS is the number of frames that should be recorded. Since the for loop is not repeated for each frame but only for each chunk, the number of frames has to be divided by the chunk size CHUNK to get the number of loop iterations. This has nothing to do with samples, so there is no factor of 2 involved.
If you really want to see the hexadecimal values, you can try something like [hex(x) for x in frames[0]] (this works in Python 3, where iterating over bytes yields integers). If you want to get the actual 2-byte numbers, use the format string '<h' (little-endian signed 16-bit, since paInt16 is a signed format) with the struct module.
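
As a rough illustration of the last point, here is a minimal Python 3 sketch that prints a chunk in hexadecimal, decodes it with the struct module, and packs modified values back into bytes. It does not touch pyaudio at all: `chunk` is a hand-made stand-in for `frames[0]`, the names `SAMPLE_WIDTH`, `frames_decoded`, `halved` and `new_chunk` are made up for this example, and little-endian byte order is an assumption that holds on most desktop machines.

```python
import struct

CHANNELS = 2
SAMPLE_WIDTH = 2  # bytes per sample for a 16-bit format such as paInt16

# Hypothetical stand-in for frames[0]: two stereo frames of signed
# 16-bit samples, assumed to be little-endian.
chunk = struct.pack('<4h', 1, -2, 300, -32768)

# The raw bytes as one hexadecimal string (bytes.hex() needs Python 3.5+).
print(chunk.hex())

# Decode every 2-byte sample: '<' = little-endian, 'h' = signed 16-bit.
count = len(chunk) // SAMPLE_WIDTH
samples = struct.unpack('<%dh' % count, chunk)
print(samples)

# The samples are interleaved, so group them into (left, right) frames.
frames_decoded = [samples[i:i + CHANNELS] for i in range(0, count, CHANNELS)]
print(frames_decoded)

# Change the values (here: halve every sample) and pack them back into bytes.
halved = [s // 2 for s in samples]
new_chunk = struct.pack('<%dh' % count, *halved)
print(len(new_chunk) == len(chunk))
```

Any byte string whose length is a multiple of the frame size is a valid sequence of 16-bit frames, so a chunk rebuilt this way can be written to an output stream just like the original. What it sounds like depends entirely on the values you choose.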

You might be interested in my tutorial about reading WAV files with the wave module, which covers some of your questions in more detail: http://nbviewer.jupyter.org/github/mgeier/python-audio/blob/master/audio-files/audio-files-with-wave.ipynb
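
To tie the numbers from the question and the answer together, the arithmetic can be checked with a short Python 3 sketch that needs no audio hardware. `SAMPLE_WIDTH = 2` is taken from the question (the value reported there for pyaudio.get_sample_size(pyaudio.paInt16)); the other derived names, and `fake_chunk` as a stand-in for `frames[0]`, are introduced only for this example.

```python
import sys

CHUNK = 1024
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 5
SAMPLE_WIDTH = 2  # bytes per sample, as reported in the question for paInt16

bytes_per_frame = CHANNELS * SAMPLE_WIDTH   # one sample per channel
bytes_per_chunk = CHUNK * bytes_per_frame   # what len(frames[0]) should be
total_frames = RATE * RECORD_SECONDS        # frames in 5 seconds of audio

# The loop runs once per chunk, not once per frame, and int() rounds down.
num_chunks = int(RATE / CHUNK * RECORD_SECONDS)
recorded_frames = num_chunks * CHUNK
missing_frames = total_frames - recorded_frames

print(bytes_per_frame, bytes_per_chunk)
print(total_frames, num_chunks, recorded_frames, missing_frames)

# len() counts the raw bytes; sys.getsizeof() also counts the interpreter's
# per-object overhead, so it reports a somewhat larger number.
fake_chunk = bytes(bytes_per_chunk)
print(len(fake_chunk), sys.getsizeof(fake_chunk) > len(fake_chunk))
```

Because of the rounding, the loop reads 215 chunks, which is 340 frames (a few milliseconds) short of a full 5 seconds.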
