Original URL: http://stackoverflow.com/questions/30619740/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Downsampling wav audio file
Question by d3cr1pt0r
I have to downsample a wav file from 44100Hz to 16000Hz without using any external Python libraries, so preferably wave and/or audioop. I tried just changing the wav file's frame rate to 16000 with the setframerate function, but that just slows down the entire recording. How can I downsample the audio file to 16kHz and keep the audio the same length?
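Why setframerate alone slows the recording down: a WAV file's playing time is its number of frames divided by its frame rate, and setframerate only changes the rate stored in the header, not the frames themselves. A quick check of that arithmetic, assuming a hypothetical 10-second clip recorded at 44100Hz:

```python
def duration_seconds(nframes, framerate):
    """Playing time of a WAV stream: number of frames divided by frames per second."""
    return nframes / framerate


nframes = 441000  # hypothetical 10 s recording at 44100 Hz

original = duration_seconds(nframes, 44100)
relabelled = duration_seconds(nframes, 16000)  # same frames, only the header rate changed

print(f"original: {original:.2f} s, relabelled: {relabelled:.2f} s")
# To keep 10 s at 16000 Hz, the sample data itself must shrink to 10 * 16000 frames
print("frames needed at 16 kHz:", round(original * 16000))
```

The relabelled file plays for about 27.56 s instead of 10 s, which is the slowdown described in the question; real downsampling has to produce 160000 new frames.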
Answer by jcoppens
You can use resample in scipy. It's a bit of a headache, because some type conversion is needed between the bytestring native to Python and the arrays scipy works on. There's another headache: the wave module in Python doesn't tell you whether the data is signed or not, only the sample width (8 or 16 bits). By the WAV convention, 8-bit PCM samples are unsigned and 16-bit samples are signed little-endian. It should work for both widths, but I haven't tested it.
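To make the bytestring/array conversion and the signedness rule concrete, here is a minimal standard-library sketch. The helper names pcm_bytes_to_samples and samples_to_pcm_bytes are hypothetical (not part of wave or scipy), and the sketch assumes the usual WAV convention: 8-bit samples are unsigned with silence at 128, 16-bit samples are signed little-endian.

```python
import array
import sys


def pcm_bytes_to_samples(raw, sampwidth):
    """Convert raw PCM frames (as returned by wave.readframes) to ints centred on 0."""
    if sampwidth == 1:
        # 8-bit WAV samples are unsigned, with silence at 128
        return [b - 128 for b in raw]
    if sampwidth == 2:
        # 16-bit WAV samples are signed little-endian
        samples = array.array('h')
        samples.frombytes(raw)
        if sys.byteorder == 'big':
            samples.byteswap()
        return samples.tolist()
    raise ValueError("this sketch only handles 8- and 16-bit PCM")


def samples_to_pcm_bytes(samples, sampwidth):
    """Inverse of pcm_bytes_to_samples: round, clip to the valid range, and pack."""
    if sampwidth == 1:
        return bytes(min(255, max(0, int(round(s)) + 128)) for s in samples)
    if sampwidth == 2:
        out = array.array('h', (min(32767, max(-32768, int(round(s)))) for s in samples))
        if sys.byteorder == 'big':
            out.byteswap()
        return out.tobytes()
    raise ValueError("this sketch only handles 8- and 16-bit PCM")
```

With numpy, the equivalent reads are np.frombuffer(raw, np.uint8) for 8-bit data and np.frombuffer(raw, '<i2') for 16-bit data.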
Here's a small program that converts 8-bit and 16-bit mono audio from 44.1kHz to 16kHz. If you have stereo, or use other formats, it shouldn't be that difficult to adapt. Edit the input/output names at the start of the code; I never got around to using command-line arguments.
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
#  downsample.py
#
#  Copyright 2015 John Coppens <[email protected]>
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
#  MA 02110-1301, USA.
#
#

inwave = "sine_44k.wav"
outwave = "sine_16k.wav"

import wave
import numpy as np
import scipy.signal as sps


class DownSample():
    def __init__(self):
        self.in_rate = 44100.0
        self.out_rate = 16000.0

    def open_file(self, fname):
        try:
            self.in_wav = wave.open(fname)
        except (OSError, wave.Error):
            print("Cannot open wav file (%s)" % fname)
            return False

        if self.in_wav.getframerate() != self.in_rate:
            print("Frame rate is not %d (it's %d)" %
                  (self.in_rate, self.in_wav.getframerate()))
            return False

        self.in_nframes = self.in_wav.getnframes()
        print("Frames: %d" % self.in_wav.getnframes())

        # WAV convention: 8-bit samples are unsigned, 16-bit samples are signed little-endian
        if self.in_wav.getsampwidth() == 1:
            self.nptype = np.dtype(np.uint8)
        elif self.in_wav.getsampwidth() == 2:
            self.nptype = np.dtype('<i2')
        else:
            print("Only 8- and 16-bit samples are supported")
            return False
        return True

    def resample(self, fname):
        self.out_wav = wave.open(fname, "w")
        self.out_wav.setframerate(self.out_rate)
        self.out_wav.setnchannels(self.in_wav.getnchannels())
        self.out_wav.setsampwidth(self.in_wav.getsampwidth())
        self.out_wav.setnframes(1)  # the header is corrected when the file is closed

        print("Nr output channels: %d" % self.out_wav.getnchannels())

        audio = self.in_wav.readframes(self.in_nframes)
        samples = np.frombuffer(audio, self.nptype)

        # Size the output from the number of samples, not the number of bytes
        nroutsamples = round(len(samples) * self.out_rate / self.in_rate)
        print("Nr output samples: %d" % nroutsamples)

        audio_out = sps.resample(samples, nroutsamples)

        # Round and clip to the valid integer range before converting back
        info = np.iinfo(self.nptype)
        audio_out = np.clip(np.round(audio_out), info.min, info.max).astype(self.nptype)

        self.out_wav.writeframes(audio_out.tobytes())
        self.out_wav.close()


def main():
    ds = DownSample()
    if not ds.open_file(inwave):
        return 1
    ds.resample(outwave)
    return 0


if __name__ == '__main__':
    main()
```
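For stereo input, one way to adapt the program is to split the interleaved frames (L0, R0, L1, R1, ...) into one sequence per channel, resample each channel separately, and interleave the results again. A minimal sketch of just the split/merge step in plain Python; deinterleave and interleave are hypothetical helper names, and the per-channel resampling is left to whichever resampler you use:

```python
def deinterleave(samples, nchannels):
    """Split interleaved samples [L0, R0, L1, R1, ...] into one list per channel."""
    if len(samples) % nchannels:
        raise ValueError("sample count is not a multiple of the channel count")
    return [list(samples[ch::nchannels]) for ch in range(nchannels)]


def interleave(channels):
    """Merge per-channel lists of equal length back into one interleaved list."""
    if len({len(ch) for ch in channels}) > 1:
        raise ValueError("all channels must have the same length")
    return [s for frame in zip(*channels) for s in frame]
```

With numpy the same split is samples.reshape(-1, nchannels), and scipy.signal.resample takes an axis argument (axis=0 by default), so all channels can then be resampled in a single call.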
Answer by d3cr1pt0r
Thank you all for your answers. I already found a solution and it works very well. Here is the whole function:
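```python
import os
import wave
import audioop  # deprecated in Python 3.11 and removed in Python 3.13


def downsampleWav(src, dst, inrate=44100, outrate=16000, inchannels=2, outchannels=1):
    if not os.path.exists(src):
        print('Source not found!')
        return False

    # os.path.dirname() is '' for a bare file name; only create real directories
    dst_dir = os.path.dirname(dst)
    if dst_dir and not os.path.exists(dst_dir):
        os.makedirs(dst_dir)

    try:
        s_read = wave.open(src, 'r')
        s_write = wave.open(dst, 'w')
    except (OSError, wave.Error):
        print('Failed to open files!')
        return False

    n_frames = s_read.getnframes()
    data = s_read.readframes(n_frames)

    try:
        # ratecv returns (fragment, state); width 2 means 16-bit samples are assumed
        converted, _ = audioop.ratecv(data, 2, inchannels, inrate, outrate, None)
        if inchannels == 2 and outchannels == 1:
            # keep the left channel only (left factor 1, right factor 0)
            converted = audioop.tomono(converted, 2, 1, 0)
    except audioop.error:
        print('Failed to downsample wav')
        return False

    try:
        s_write.setparams((outchannels, 2, outrate, 0, 'NONE', 'Uncompressed'))
        s_write.writeframes(converted)
    except (OSError, wave.Error):
        print('Failed to write wav')
        return False

    try:
        s_read.close()
        s_write.close()
    except (OSError, wave.Error):
        print('Failed to close wav files')
        return False

    return True
```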
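Note that audioop was deprecated in Python 3.11 and removed in Python 3.13, so this function needs an older interpreter. Also, the tomono(..., 2, 1, 0) call keeps only the left channel. If you'd rather average both channels, or need a mono downmix without audioop, a minimal standard-library sketch for 16-bit little-endian stereo could look like this (stereo16_to_mono16 is a hypothetical name, and the default 0.5/0.5 mix is an assumption):

```python
import array
import sys


def stereo16_to_mono16(raw, lfactor=0.5, rfactor=0.5):
    """Mix interleaved 16-bit little-endian stereo bytes down to mono bytes.

    Each output sample is lfactor * left + rfactor * right, rounded and
    clipped to the 16-bit range (the same idea as audioop.tomono).
    """
    samples = array.array('h')
    samples.frombytes(raw)
    if sys.byteorder == 'big':
        samples.byteswap()  # WAV sample data is little-endian
    mono = array.array('h', (
        max(-32768, min(32767, round(lfactor * left + rfactor * right)))
        for left, right in zip(samples[0::2], samples[1::2])
    ))
    if sys.byteorder == 'big':
        mono.byteswap()
    return mono.tobytes()
```

Passing lfactor=1, rfactor=0 reproduces the left-channel-only behaviour of the function above.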
Answer by wafflecat
You can use Librosa's load() function:
```python
import librosa

y, s = librosa.load('test.wav', sr=8000)  # Downsample 44.1kHz to 8kHz; s is the new rate
```
The extra effort to install Librosa is probably worth the peace of mind.
Pro-tip: when installing Librosa on Anaconda, you need to install ffmpeg as well, so:
```sh
pip install librosa
conda install -c conda-forge ffmpeg
```
This avoids the NoBackendError() error.
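librosa.load only returns the samples (a floating-point array, nominally between -1 and 1) and the sampling rate; it doesn't write a file. If you want to save the downsampled result as 16-bit PCM using only the standard library, you can scale, clip and pack the floats yourself. A minimal sketch, assuming mono samples in [-1, 1]; write_float_wav is a hypothetical helper, not a librosa function:

```python
import array
import sys
import wave


def write_float_wav(path, samples, rate):
    """Write mono float samples (nominally in [-1.0, 1.0]) to a 16-bit PCM WAV file."""
    pcm = array.array('h', (
        # scale to the 16-bit range, round, and clip anything outside [-1, 1]
        max(-32768, min(32767, round(float(s) * 32767)))
        for s in samples
    ))
    if sys.byteorder == 'big':
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(int(rate))
        w.writeframes(pcm.tobytes())
```

For example, after the load call above, write_float_wav('test_8k.wav', y, s) would write the 8kHz result (the output file name is only an illustration).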
Answer by Jeremy Cochoy
To downsample (also called decimate) your signal, i.e. reduce the sampling rate, or to upsample it (increase the sampling rate), you need to interpolate between your data points.
The idea is that you need to somehow draw a curve between your points, and then take values from this curve at the new sampling rate. This is because you want to know the value of the sound wave at times that weren't sampled, so you have to estimate that value one way or another. The only case where subsampling is easy is when you divide the sampling rate by an integer $k$: you just take buckets of $k$ samples and keep only the first one of each (ideally after low-pass filtering, otherwise frequencies above the new Nyquist limit alias into the result). But this won't answer your question, since 44100/16000 is not an integer. See the picture below, where a curve is sampled at two different scales.
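Keeping one sample per bucket of $k$ takes only a few lines of plain Python. The second function below averages each bucket first; a k-sample average is only a crude low-pass filter (real decimators such as scipy.signal.decimate use proper filters), but it shows why filtering matters. Both helper names are hypothetical:

```python
def decimate_naive(samples, k):
    """Keep the first sample of every bucket of k samples (no filtering)."""
    return samples[::k]


def decimate_boxcar(samples, k):
    """Average each bucket of k samples, then keep one value per bucket.

    The k-sample average acts as a crude low-pass filter, which reduces
    (but does not eliminate) aliasing compared with plain sample dropping.
    An incomplete final bucket is discarded.
    """
    n = len(samples) - len(samples) % k
    return [sum(samples[i:i + k]) / k for i in range(0, n, k)]
```

With the alternating signal [1, -1, 1, -1, 1, -1] and k = 2, plain dropping returns [1, 1, 1] (the highest frequency has aliased into a constant), while averaging returns [0.0, 0.0, 0.0].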
You could do it by hand if you understand the principle, but I strongly recommend using a library. The reason is that interpolating the right way is neither easy nor obvious.
You could use linear interpolation (connect the points with straight lines), polynomial interpolation (e.g. connect three points with a piece of a quadratic), or (sometimes the best for sound) a Fourier transform, interpolating in the frequency domain. Since the Fourier transform isn't something you want to rewrite by hand, if you want good downsampling/upsampling, use a library such as scipy.
See the following picture for two upsampled curves produced by different scipy algorithms. The "resample" function uses the Fourier transform.
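Linear interpolation, the simplest of those options, could be sketched like this in plain Python. resample_linear is a hypothetical helper, and it applies no anti-aliasing filter, so when downsampling it will alias, which is one more reason to prefer a library:

```python
def resample_linear(samples, in_rate, out_rate):
    """Resample a list of samples by linear interpolation between neighbours.

    Output sample j sits at time j / out_rate, i.e. at the fractional input
    position j * in_rate / out_rate; its value is read off the straight line
    joining the two nearest input samples. No anti-aliasing filter is applied.
    """
    if not samples:
        return []
    n_out = round(len(samples) * out_rate / in_rate)
    step = in_rate / out_rate
    last = len(samples) - 1
    out = []
    for j in range(n_out):
        pos = j * step
        i = min(int(pos), last)
        frac = pos - i
        nxt = samples[min(i + 1, last)]  # hold the last sample past the end
        out.append(samples[i] + frac * (nxt - samples[i]))
    return out
```

For example, resample_linear([0, 10, 20, 30], 2, 4) doubles the rate and returns [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 30.0]; the final value is held because there is no later sample to interpolate towards.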
In my case, I was loading a 44100Hz wave file and needed data sampled at 48000Hz, so I wrote the following few lines to load my data:
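```python
# Imports
from scipy.io import wavfile
import scipy.signal as sps

# Path to your input file (placeholder, replace with your own)
path = "input.wav"

# Your new sampling rate
new_rate = 48000

# Read file
sampling_rate, data = wavfile.read(path)

# Resample data (Fourier method, along axis 0, so stereo arrays of shape (n, 2) work too)
number_of_samples = round(len(data) * float(new_rate) / sampling_rate)
data = sps.resample(data, number_of_samples)  # note: the result is a float array
```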
Notice that you can also use the decimate method if you are only downsampling (by an integer factor) and want something faster than the Fourier method.
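Since 44100Hz to 16000Hz is not an integer ratio, decimate alone can't do it. For rational ratios, scipy.signal.resample_poly(x, up, down) upsamples by up, applies a low-pass filter, and downsamples by down; the smallest up/down pair follows from the greatest common divisor of the two rates. A sketch of that arithmetic (poly_factors is a hypothetical name):

```python
from math import gcd


def poly_factors(in_rate, out_rate):
    """Smallest integers (up, down) with out_rate / in_rate == up / down."""
    g = gcd(in_rate, out_rate)
    return out_rate // g, in_rate // g


print(poly_factors(44100, 16000))  # (160, 441)
print(poly_factors(44100, 48000))  # (160, 147)
```

So for the question's case the scipy call would be sps.resample_poly(data, 160, 441).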
Answer by Gowtham S
I tried using Librosa, but for some reason, even after running y, s = librosa.load('test.wav', sr=16000) and librosa.output.write_wav(filename, y, sr), the sound files are not saved with the given sample rate (16000, downsampled from 44kHz). But pydub works well. It's an awesome library by jiaaro; I used the following commands:
```python
from pydub import AudioSegment as am

filepath = "test.wav"  # placeholder: the file to convert (it is overwritten)

sound = am.from_file(filepath, format='wav', frame_rate=22050)
sound = sound.set_frame_rate(16000)
sound.export(filepath, format='wav')
```
The code above reads the file (which has a frame rate of 22050), changes its rate to 16000, and the export function overwrites the existing file with the new frame rate. It works better than librosa for me. I'm also looking for a way to compare the speed of the two packages, but haven't figured it out yet since I have very little data.
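Whichever library you use, you can confirm what actually ended up on disk by reading the header back: a correctly downsampled copy reports the new frame rate and the same duration as the original. In the librosa attempt above, one thing worth checking is that load stores the new rate in s while write_wav is called with sr; note also that the librosa.output module was removed in librosa 0.8. A standard-library sketch of the check (wav_info is a hypothetical helper):

```python
import wave


def wav_info(path):
    """Return (frame_rate, channels, sample_width_bytes, duration_seconds) from a WAV header."""
    with wave.open(path, 'rb') as w:
        rate = w.getframerate()
        return rate, w.getnchannels(), w.getsampwidth(), w.getnframes() / rate
```

Comparing wav_info() for the input and output files shows at a glance whether only the rate changed or the duration changed too (the latter is the slowdown described in the question).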