Downsampling a wav audio file in Python

Question

Asked by d3cr1pt0r

I have to downsample a wav file from 44100 Hz to 16000 Hz without using any external Python libraries, so preferably with wave and/or audioop. I tried just changing the wav file's frame rate to 16000 using the setframerate function, but that just slows down the entire recording. How can I downsample the audio file to 16 kHz and keep the audio the same length?

Answer 1

Answered by jcoppens

You can use resample in scipy. It's a bit of a headache, because some type conversion is needed between the byte string native to Python and the arrays needed by scipy. There's another headache: the wave module in Python gives no way to tell whether the data is signed or not (only whether it is 8 or 16 bits). It might (should) work for both, but I haven't tested it.
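
The type conversion mentioned above can be sketched as follows. This is a minimal sketch that assumes uncompressed PCM data, where the WAV convention is that 8-bit samples are unsigned and 16-bit samples are signed little-endian. The helper names `frames_to_array` and `array_to_frames` are hypothetical and not part of any library.

```python
import numpy as np

# Assumption: uncompressed PCM WAV. Sample width 1 -> unsigned 8-bit,
# sample width 2 -> signed 16-bit little-endian.
WAV_DTYPES = {1: np.dtype("u1"), 2: np.dtype("<i2")}

def frames_to_array(raw, sampwidth):
    """Interpret the bytes returned by wave's readframes() as samples."""
    return np.frombuffer(raw, dtype=WAV_DTYPES[sampwidth])

def array_to_frames(samples, sampwidth):
    """Round, clip and pack samples back into bytes for writeframes()."""
    dtype = WAV_DTYPES[sampwidth]
    info = np.iinfo(dtype)
    clipped = np.clip(np.round(samples), info.min, info.max)
    return clipped.astype(dtype).tobytes()

# Two 16-bit samples, +1000 and -1000, as they would appear in a wav file
raw = (1000).to_bytes(2, "little", signed=True) + \
      (-1000).to_bytes(2, "little", signed=True)
samples = frames_to_array(raw, 2)
```

Clipping before the cast matters because a resampler can overshoot the original range slightly, and an unclipped cast to an integer type would wrap around.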

Here's a small program which converts 8- and 16-bit mono files from 44.1 kHz to 16 kHz. If you have stereo, or use other formats, it shouldn't be that difficult to adapt. Edit the input/output names at the start of the code. I never got around to using command line arguments.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
#  downsample.py
#  
#  Copyright 2015 John Coppens <[email protected]>
#  
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#  
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#  
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
#  MA 02110-1301, USA.
#  
#

inwave = "sine_44k.wav"
outwave = "sine_16k.wav"

import wave
import numpy as np
import scipy.signal as sps

class DownSample():
    def __init__(self):
        self.in_rate = 44100.0
        self.out_rate = 16000.0

    def open_file(self, fname):
        try:
            self.in_wav = wave.open(fname)
        except:
            print("Cannot open wav file (%s)" % fname)
            return False

        if self.in_wav.getframerate() != self.in_rate:
            print("Frame rate is not %d (it's %d)" % \
                  (self.in_rate, self.in_wav.getframerate()))
            return False

        self.in_nframes = self.in_wav.getnframes()
        print("Frames: %d" % self.in_wav.getnframes())

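        # PCM WAV stores 8-bit samples unsigned and 16-bit samples signed
        if self.in_wav.getsampwidth() == 1:
            self.nptype = np.uint8
        elif self.in_wav.getsampwidth() == 2:
            self.nptype = np.int16
        else:
            print("Unsupported sample width: %d bytes" % self.in_wav.getsampwidth())
            return False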

        return True

    def resample(self, fname):
        self.out_wav = wave.open(fname, "w")
        self.out_wav.setframerate(self.out_rate)
        self.out_wav.setnchannels(self.in_wav.getnchannels())
        self.out_wav.setsampwidth (self.in_wav.getsampwidth())
        self.out_wav.setnframes(1)

        print("Nr output channels: %d" % self.out_wav.getnchannels())

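        audio = self.in_wav.readframes(self.in_nframes)
        # Count samples, not bytes: 16-bit data uses two bytes per sample
        samples = np.frombuffer(audio, dtype=self.nptype)
        nroutsamples = int(round(len(samples) * self.out_rate / self.in_rate))
        print("Nr output samples: %d" % nroutsamples)

        audio_out = sps.resample(samples, nroutsamples)
        # Clip before casting so resampling overshoot cannot wrap around
        info = np.iinfo(self.nptype)
        audio_out = np.clip(np.round(audio_out), info.min, info.max).astype(self.nptype)

        self.out_wav.writeframes(audio_out.tobytes())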

        self.out_wav.close()

def main():
    ds = DownSample()
    if not ds.open_file(inwave): return 1
    ds.resample(outwave)
    return 0

if __name__ == '__main__':
    main()

Answer 2

Answered by d3cr1pt0r

Thank you all for your answers. I have already found a solution and it works very well. Here is the whole function.

import audioop  # in the standard library up to Python 3.12; removed in 3.13
import os
import wave

def downsampleWav(src, dst, inrate=44100, outrate=16000, inchannels=2, outchannels=1):
    if not os.path.exists(src):
        print('Source not found!')
        return False

    # os.path.dirname() is empty when dst has no directory part
    dst_dir = os.path.dirname(dst)
    if dst_dir and not os.path.exists(dst_dir):
        os.makedirs(dst_dir)

    try:
        s_read = wave.open(src, 'r')
        s_write = wave.open(dst, 'w')
    except Exception:
        print('Failed to open files!')
        return False

    n_frames = s_read.getnframes()
    data = s_read.readframes(n_frames)

    try:
        # The 2 is the sample width in bytes (16-bit audio).
        # ratecv() returns (converted fragment, state); only the fragment is needed.
        converted, _ = audioop.ratecv(data, 2, inchannels, inrate, outrate, None)
        if inchannels == 2 and outchannels == 1:
            # Factors (1, 0) keep the left channel and drop the right one
            converted = audioop.tomono(converted, 2, 1, 0)
    except Exception:
        print('Failed to downsample wav')
        return False

    try:
        s_write.setparams((outchannels, 2, outrate, 0, 'NONE', 'Uncompressed'))
        s_write.writeframes(converted)
    except Exception:
        print('Failed to write wav')
        return False

    try:
        s_read.close()
        s_write.close()
    except Exception:
        print('Failed to close wav files')
        return False

    return True
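
To check that a conversion like the one above kept the length of the audio, one option is to compare durations computed from the wav headers, since duration equals the number of frames divided by the frame rate. The following is a minimal standard-library sketch; `wav_duration` and `make_silence` are hypothetical helper names, and the in-memory files only stand in for real recordings.

```python
import io
import wave

def wav_duration(f):
    """Return the duration in seconds of a wav file (path or file object)."""
    with wave.open(f, "rb") as w:
        return w.getnframes() / w.getframerate()

def make_silence(rate, seconds):
    """Build an in-memory 16-bit mono wav file of silence (for the demo only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(rate * seconds))
    buf.seek(0)
    return buf

before = wav_duration(make_silence(44100, 2))
after = wav_duration(make_silence(16000, 2))
```

A properly downsampled file has fewer frames but the same duration. Changing only the frame rate in the header keeps the frame count, which is why the recording in the question played back slower.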

Answer 3

Answered by wafflecat

You can use Librosa's load() function:

import librosa
y, s = librosa.load('test.wav', sr=8000)  # Downsample 44.1kHz to 8kHz

The extra effort to install Librosa is probably worth the peace of mind.

Pro-tip: when installing Librosa on Anaconda, you need to install ffmpeg as well, so

pip install librosa
conda install -c conda-forge ffmpeg

This saves you from the NoBackendError() error.

Answer 4

Answered by Jeremy Cochoy

To downsample your signal (also called decimating, it means reducing the sampling rate) or to upsample it (increasing the sampling rate), you need to interpolate between your data points.

The idea is that you need to somehow draw a curve between your points, and then take values from this curve at the new sampling rate. This is because you want to know the value of the sound wave at a time that wasn't sampled, so you have to guess this value one way or another. The only case where subsampling would be easy is when you divide the sampling rate by an integer $k$. In this case, you just have to take buckets of $k$ samples and keep only the first one. But this doesn't apply to your question, since 44100/16000 is not an integer. See the picture below where you have a curve sampled at two different scales.
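
The integer-factor case described above can be sketched in a few lines. This is only an illustration: `keep_every_kth` is a hypothetical helper name, and keeping every k-th sample without low-pass filtering first can alias high frequencies into the result, which is one reason a library routine is usually preferable.

```python
def keep_every_kth(samples, k):
    """Naive decimation: keep the first sample of each bucket of k samples."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return samples[::k]

# 44100 Hz -> 22050 Hz is an integer factor (k = 2)
halved = keep_every_kth([0, 1, 2, 3, 4, 5, 6, 7], 2)

# 44100 Hz -> 16000 Hz is not: the ratio is 2.75625
ratio = 44100 / 16000
```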

You could do it by hand if you understand the principle, but I strongly recommend that you use a library. The reason is that interpolating the right way is neither easy nor obvious.

You could use a linear interpolation (connect points with a line), a quadratic interpolation (connect three points with a piece of a polynomial), or (sometimes the best for sound) use a Fourier transform and interpolate in frequency space. Since a Fourier transform isn't something you want to rewrite by hand, use a library if you want good subsampling/upsampling. See the following picture for two upsampling curves produced by different algorithms from scipy. The resample function uses the Fourier transform.
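
For illustration, the linear interpolation option can be written with the standard library alone. This is a minimal sketch and not a recommended production resampler: `resample_linear` is a hypothetical name, the samples are assumed to be a plain list of numbers for one channel, and no low-pass filter is applied, so downsampling this way can alias.

```python
def resample_linear(samples, in_rate, out_rate):
    """Resample one channel by drawing straight lines between input samples."""
    if not samples:
        return []
    n_out = int(round(len(samples) * out_rate / in_rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * in_rate / out_rate      # output time, in input-sample units
        left = int(pos)
        if left >= last:                  # past the final input sample: hold it
            out.append(float(samples[last]))
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out

up = resample_linear([0, 10], 1, 2)             # doubling the rate
down = resample_linear(list(range(441)), 44100, 16000)
```

Each output sample is taken at time `i / out_rate`, which corresponds to position `i * in_rate / out_rate` in the input, so the duration of the signal is preserved while the number of samples changes.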

I ran into this myself: I was loading a 44100 Hz wave file and needed data sampled at 48000 Hz, so I wrote the following few lines to load my data:

    # Imports
    from scipy.io import wavfile
    import scipy.signal as sps

    # Your new sampling rate
    new_rate = 48000

    # Read file
    sampling_rate, data = wavfile.read(path)

    # Resample data
    number_of_samples = round(len(data) * float(new_rate) / sampling_rate)
    data = sps.resample(data, number_of_samples)

Note that you can also use the decimate function if you are only downsampling by an integer factor and want something faster than the Fourier method.
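
Since scipy.signal.decimate takes an integer factor, it does not directly cover 44100 Hz to 16000 Hz, where the ratio reduces to 160/441. One option for such rational ratios is scipy.signal.resample_poly, which upsamples, low-pass filters and downsamples in one step. Note that this swaps in a different function from the one named above; the wrapper name `resample_rational` and the test tone are assumptions made for the sketch.

```python
from math import gcd

import numpy as np
import scipy.signal as sps

def resample_rational(data, in_rate, out_rate):
    """Resample by the reduced fraction out_rate/in_rate using a polyphase filter."""
    g = gcd(in_rate, out_rate)
    up, down = out_rate // g, in_rate // g   # 160 and 441 for 44100 -> 16000
    return sps.resample_poly(data, up, down)

# One second of a 440 Hz tone sampled at 44100 Hz
t = np.arange(44100) / 44100.0
tone = np.sin(2 * np.pi * 440 * t)

out = resample_rational(tone, 44100, 16000)
```

One second of input yields 16000 output samples, so the duration is unchanged at the new rate.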

Answer 5

Answered by Gowtham S

I tried using Librosa, but for some reason, even after running y, s = librosa.load('test.wav', sr=16000) and librosa.output.write_wav(filename, y, sr), the sound files were not saved with the given sample rate (16000, downsampled from 44 kHz). But pydub works well. It is an awesome library by jiaaro. I used the following commands:

from pydub import AudioSegment as am
sound = am.from_file(filepath, format='wav', frame_rate=22050)
sound = sound.set_frame_rate(16000)
sound.export(filepath, format='wav')

The code above reads the file (here with a frame_rate of 22050), changes its rate to 16000, and the export function then overwrites the existing file with the version at the new frame_rate. It works better than librosa for me. I am looking for ways to compare the speed of the two packages, but I haven't figured that out yet since I have very little data.

Reference: https://github.com/jiaaro/pydub/issues/232
