Original source: http://stackoverflow.com/questions/35970282/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
What are chunks, samples and frames when using pyaudio
Asked by shiva
After going through the documentation of pyaudio and reading some other articles on the web, I am confused if my understanding is correct.
This is the code for audio recording found on pyaudio's site:
import pyaudio
import wave

CHUNK = 1024              # frames read per call to stream.read()
FORMAT = pyaudio.paInt16  # 16-bit samples
CHANNELS = 2
RATE = 44100              # frames per second
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "output.wav"

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print("* recording")

frames = []

# One iteration per chunk, not per frame
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)

print("* done recording")

stream.stop_stream()
stream.close()
p.terminate()
and if I add these lines then I am able to play whatever I recorded:
play = pyaudio.PyAudio()
stream_play = play.open(format=FORMAT,
                        channels=CHANNELS,
                        rate=RATE,
                        output=True)

# Write the recorded chunks back out in order
for data in frames:
    stream_play.write(data)

stream_play.stop_stream()
stream_play.close()
play.terminate()
- "RATE" is the number of samples collected per second.
- "CHUNK" is the number of frames in the buffer.
- Each frame will have 2 samples as "CHANNELS=2".
- Size of each sample is 2 bytes, calculated using the function pyaudio.get_sample_size(pyaudio.paInt16).
- Therefore size of each frame is 4 bytes.
- In the "frames" list, size of each element must be 1024*4 bytes, for example, size of frames[0] must be 4096 bytes. However, sys.getsizeof(frames[0]) returns 4133, but len(frames[0]) returns 4096.
- The for loop executes int(RATE / CHUNK * RECORD_SECONDS) times, and I can't understand why. The same question was answered by "Ruben Sanchez", but I can't be sure if it's correct, as he says CHUNK=bytes. According to his explanation, it must be int(RATE / (CHUNK*2) * RECORD_SECONDS), as (CHUNK*2) is the number of samples read into the buffer with each iteration.
- Finally, when I write print frames[0], it prints gibberish, as it tries to treat the string as ASCII encoded, which it is not; it is just a stream of bytes. So how do I print this stream of bytes in hexadecimal using the struct module? And if I later replace each of the hexadecimal values with values of my choice, will it still produce a playable sound?
Whatever I wrote above is my understanding of things, and much of it may be wrong.
Answered by Matthias
- "RATE" is the "sampling rate", i.e. the number of frames per second.
- "CHUNK" is the (arbitrarily chosen) number of frames that the (potentially very long) signal is split into in this example.
- Yes, each frame will have 2 samples as "CHANNELS=2", but the term "samples" is seldom used in this context (because it is confusing).
- Yes, size of each sample is 2 bytes (= 16 bits) in this example.
- Yes, size of each frame is 4 bytes.
- Yes, each element of "frames" should be 4096 bytes. sys.getsizeof() reports the storage space needed by the Python interpreter, which is typically a bit more than the actual size of the raw data.
- RATE * RECORD_SECONDS is the number of frames that should be recorded. Since the for loop is not repeated for each frame but only for each chunk, the number of frames has to be divided by the chunk size CHUNK to get the number of loop iterations. This has nothing to do with samples, so there is no factor of 2 involved.
- If you really want to see the hexadecimal values, you can try something like [hex(x) for x in frames[0]] (this works in Python 3, where iterating over a bytes object yields integers). If you want to get the actual 2-byte numbers, use the struct module with the format string '<h' (signed 16-bit little-endian, which matches paInt16; '<H' reads the same bytes as unsigned values).
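The arithmetic in the list above can be checked without any audio hardware. The following is a minimal sketch, assuming Python 3 and interleaved 16-bit stereo data; the `chunk` bytes object is a synthetic stand-in for one element of `frames`, and names such as `SAMPLE_WIDTH`, `bytes_per_frame` and `num_reads` are made up for illustration.

```python
import struct
import sys

CHUNK = 1024         # frames per read
CHANNELS = 2         # samples per frame
SAMPLE_WIDTH = 2     # bytes per sample for 16-bit audio (assumed to match paInt16)
RATE = 44100         # frames per second
RECORD_SECONDS = 5

bytes_per_frame = CHANNELS * SAMPLE_WIDTH        # one sample per channel
bytes_per_chunk = CHUNK * bytes_per_frame        # what len(frames[0]) reports
total_frames = RATE * RECORD_SECONDS             # frames we would like to record
num_reads = int(RATE / CHUNK * RECORD_SECONDS)   # loop iterations, one per chunk

# Synthetic stand-in for frames[0]: 2048 signed 16-bit values, little-endian.
samples = [i - 1024 for i in range(CHUNK * CHANNELS)]
chunk = struct.pack('<%dh' % len(samples), *samples)

# len() counts the raw bytes; sys.getsizeof() adds the interpreter's object overhead.
print(len(chunk), sys.getsizeof(chunk))

# Hexadecimal view of the first two frames (8 bytes).
print(chunk[:8].hex())

# Decode to integers; with interleaved data, even indices are one channel, odd the other.
unpacked = struct.unpack('<%dh' % (len(chunk) // SAMPLE_WIDTH), chunk)
left, right = unpacked[0::2], unpacked[1::2]

# Change the values (here: halve the amplitude) and pack them back into bytes.
halved = [s // 2 for s in unpacked]
repacked = struct.pack('<%dh' % len(halved), *halved)
```

As far as the data format is concerned, any bytes object whose length is a multiple of the frame size can be written to the output stream, so modified values still play; whether the result sounds meaningful depends on the values chosen. Note also that `num_reads * CHUNK` is slightly smaller than `total_frames`, because the division is truncated.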
You might be interested in my tutorial about reading WAV files with the wave module, which covers some of your questions in more detail: http://nbviewer.jupyter.org/github/mgeier/python-audio/blob/master/audio-files/audio-files-with-wave.ipynb
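The standard-library wave module also counts in frames, which makes it a convenient way to confirm the sizes discussed above. This is a rough sketch, assuming Python 3; it writes two chunks of synthetic silence (not a real recording) to a temporary file and reads them back. The temporary path and the `frames` contents are placeholders.

```python
import os
import struct
import tempfile
import wave

CHANNELS = 2
SAMPLE_WIDTH = 2   # bytes per sample (16-bit)
RATE = 44100
CHUNK = 1024

# Two synthetic chunks of silence standing in for the recorded `frames` list.
one_chunk = struct.pack('<%dh' % (CHUNK * CHANNELS), *([0] * (CHUNK * CHANNELS)))
frames = [one_chunk, one_chunk]

path = os.path.join(tempfile.mkdtemp(), "output.wav")

# Write the raw chunks; the header stores channels, sample width and frame rate.
with wave.open(path, "wb") as wf:
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(SAMPLE_WIDTH)
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))

# Read it back: getnframes() counts frames, readframes(n) returns n frames as bytes.
with wave.open(path, "rb") as wf:
    info = (wf.getnchannels(), wf.getsampwidth(), wf.getframerate())
    n_frames = wf.getnframes()
    first = wf.readframes(CHUNK)

print(info, n_frames, len(first))
```

Reading CHUNK frames returns CHUNK * CHANNELS * SAMPLE_WIDTH bytes, the same relationship that holds for each element of `frames` in the recording loop.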