How do I use audio sample data from Java Sound?

Declaration: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license and attribute it to the original authors (not me): Stack Overflow, original address: http://stackoverflow.com/questions/26824663/


How do I use audio sample data from Java Sound?

java, audio, javasound, audio-processing, javax.sound.sampled

Asked by Radiodef

This question is usually asked as a part of another question but it turns out that the answer is long. I've decided to answer it here so I can link to it elsewhere.


Although I'm not aware of a way that Java can produce audio samples for us at this time, if that changes in the future, this can be a place for it. I know that JavaFX has some stuff like this, for example AudioSpectrumListener, but still not a way to access samples directly.




I'm using javax.sound.sampled for playback and/or recording, but I'd like to do something with the audio.


Perhaps I'd like to display it visually or process it in some way.


How do I access audio sample data to do that with Java Sound?


See also:


Answered by Radiodef

Well, the simplest answer is that at the moment Java can't produce sample data for the programmer.


This quote is from the official tutorial:


There are two ways to apply signal processing:

  • You can use any processing supported by the mixer or its component lines, by querying for Controlobjects and then setting the controls as the user desires. Typical controls supported by mixers and lines include gain, pan, and reverberation controls.

  • If the kind of processing you need isn't provided by the mixer or its lines, your program can operate directly on the audio bytes, manipulating them as desired.

This page discusses the first technique in greater detail, because there is no special API for the second technique.


Playback with javax.sound.sampled largely acts as a bridge between the file and the audio device. The bytes are read in from the file and sent off.

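For context, the bridge loop looks something like this. It's only a sketch of the typical javax.sound.sampled playback pattern; the file name is a placeholder and error handling is omitted:

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.SourceDataLine;

public class PlaybackBridge {
    public static void main(String[] args) throws Exception {
        // "audio.wav" is a placeholder path; any PCM WAV file works
        AudioInputStream ais = AudioSystem.getAudioInputStream(new File("audio.wav"));
        AudioFormat fmt = ais.getFormat();

        try (SourceDataLine line = AudioSystem.getSourceDataLine(fmt)) {
            line.open(fmt);
            line.start();

            // The entire "bridge": bytes go from the file to the device untouched
            byte[] buf = new byte[4096];
            for (int n; (n = ais.read(buf)) > -1; ) {
                line.write(buf, 0, n);
            }
            line.drain();
        }
    }
}
```

To get at the samples, you intercept buf inside that loop and decode it as described in the rest of this answer.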

Don't assume the bytes are meaningful audio samples! Unless you happen to have an 8-bit AIFF file, they aren't. (On the other hand, if the samples are definitely 8-bit signed, you can do arithmetic with them. Using 8-bit is one way to avoid the complexity described here, if you're just playing around.)


So instead, I'll enumerate the types of AudioFormat.Encoding and describe how to decode them yourself. This answer will not cover how to encode them, but it's included in the complete code example at the bottom. Encoding is mostly just the decoding process in reverse.


This is a long answer but I wanted to give a thorough overview.




A Little About Digital Audio


Generally when digital audio is explained, we're referring to Linear Pulse-Code Modulation (LPCM).


A continuous sound wave is sampled at regular intervals and the amplitudes are quantized to integers of some scale.


Shown here is a sine wave sampled and quantized to 4-bit:


[image: lpcm_graph, a sine wave sampled and quantized to 4-bit]

(Notice that the most positive value in two's complement representation is 1 less in magnitude than the most negative value. This is a minor detail to be aware of. For example, if you're clipping audio and forget this, the positive clips will overflow.)


When we have audio on the computer, we have an array of these samples. A sample array is what we want to turn the byte array into.


To decode PCM samples, we don't care much about the sample rate or number of channels, so I won't be saying much about them here. Channels are usually interleaved, so that if we had an array of them, they'd be stored like this:


Index 0: Sample 0 (Left Channel)
Index 1: Sample 0 (Right Channel)
Index 2: Sample 1 (Left Channel)
Index 3: Sample 1 (Right Channel)
Index 4: Sample 2 (Left Channel)
Index 5: Sample 2 (Right Channel)
...

In other words, for stereo, the samples in the array just alternate between left and right.

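Once you have a float[] of decoded samples, splitting the interleaved stereo data back into separate channels is just a loop. This is my own illustrative helper, not part of the original answer's code:

```java
public final class ChannelUtil {
    /**
     * Splits an interleaved stereo array [L0, R0, L1, R1, ...]
     * into a pair of per-channel arrays { left, right }.
     */
    public static float[][] deinterleaveStereo(float[] interleaved) {
        int frames = interleaved.length / 2;
        float[] left  = new float[frames];
        float[] right = new float[frames];
        for (int f = 0; f < frames; f++) {
            left[f]  = interleaved[2 * f];     // even indices hold the left channel
            right[f] = interleaved[2 * f + 1]; // odd indices hold the right channel
        }
        return new float[][] { left, right };
    }
}
```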



Some Assumptions


All of the code examples will assume the following declarations:


  • byte[] bytes; : the byte array, read from the AudioInputStream.
  • float[] samples; : the output sample array that we're going to fill.
  • float sample; : the sample we're currently working on.
  • long temp; : an interim value used for general manipulation.
  • int i; : the position in the byte array where the current sample's data starts.

We'll normalize all of the samples in our float[] array to the range -1f <= sample <= 1f. All of the floating-point audio I've seen comes this way and it's pretty convenient.


If our source audio doesn't already come like that (as is for e.g. integer samples), we can normalize them ourselves using the following:


sample = (float) (sample / fullScale(bitsPerSample));

Where fullScale is 2^(bitsPerSample - 1), i.e. Math.pow(2, bitsPerSample - 1).

fullScale2 bitsPerSample-1在哪里,即Math.pow(2, bitsPerSample-1)



How do I coerce the byte array into meaningful data?


The byte array contains the sample frames split up, all in a line. This is actually very straightforward except for something called endianness, which is the ordering of the bytes in each sample packet.

byte数组包含拆分成一行的样本帧。这实际上非常简单,除了称为endianness 的东西,它是byte每个样本数据包中 s的顺序。

Here's a diagram. This sample (packed into a byte array) holds the decimal value 9999:


 24-bit sample as big-endian:

 bytes[i]     bytes[i + 1]  bytes[i + 2]
 00000000     00100111      00001111

 24-bit sample as little-endian:

 bytes[i]     bytes[i + 1]  bytes[i + 2]
 00001111     00100111      00000000

They hold the same binary values; however, the byte orders are reversed.


  • In big-endian, the more significant bytes come before the less significant bytes.
  • In little-endian, the less significant bytes come before the more significant bytes.

WAV files are stored in little-endian order and AIFF files are stored in big-endian order. Endianness can be obtained from AudioFormat.isBigEndian.

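Before decoding anything, it can help to dump the properties of the AudioFormat you actually got. A small sketch of my own:

```java
import javax.sound.sampled.AudioFormat;

public final class FormatInfo {
    // Summarizes the format properties that matter for decoding samples.
    public static String describe(AudioFormat fmt) {
        return "encoding=" + fmt.getEncoding()
             + " rate=" + fmt.getSampleRate()
             + " bits=" + fmt.getSampleSizeInBits()
             + " channels=" + fmt.getChannels()
             + " bigEndian=" + fmt.isBigEndian();
    }
}
```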

To concatenate the bytes and put them into our long temp variable, we:


  1. Bitwise AND each byte with the mask 0xFF (which is 0b1111_1111) to avoid sign extension when the byte is automatically promoted. (byte, char and short are promoted to int when arithmetic is performed on them.) See also: What does "value & 0xff" do in Java?
  2. Bit-shift each byte into position.
  3. Bitwise OR the bytes together.

Here's a 24-bit example:

这是一个 24 位示例:

long temp;
if (isBigEndian) {
    temp = (
          ((bytes[i    ] & 0xffL) << 16)
        | ((bytes[i + 1] & 0xffL) <<  8)
        |  (bytes[i + 2] & 0xffL)
    );
} else {
    temp = (
           (bytes[i    ] & 0xffL)
        | ((bytes[i + 1] & 0xffL) <<  8)
        | ((bytes[i + 2] & 0xffL) << 16)
    );
}

Notice that the shift order is reversed based on endianness.


This can also be generalized to a loop, which can be seen in the full code at the bottom of this answer. (See the unpackAnyBit and packAnyBit methods.)


Now that we have the bytes concatenated together, we can take a few more steps to turn them into a sample. The next steps depend on the actual encoding.


How do I decode Encoding.PCM_SIGNED?


The two's complement sign must be extended. This means that if the most significant bit (MSB) is set to 1, we fill all the bits above it with 1s. The arithmetic right-shift (>>) will do the filling for us automatically if the sign bit is set, so I usually do it this way:


int bitsToExtend = Long.SIZE - bitsPerSample;
float sample = (temp << bitsToExtend) >> bitsToExtend;

(Where Long.SIZE is 64. If our temp variable wasn't a long, we'd use something else. If we used e.g. int temp instead, we'd use 32.)


To understand how this works, here's a diagram of sign-extending 8-bit to 16-bit:


 11111111 is the byte value -1, but the upper bits of the short are 0.
 Shift the byte's MSB in to the MSB position of the short.

 0000 0000 1111 1111
 <<                8
 ───────────────────
 1111 1111 0000 0000

 Shift it back and the right-shift fills all the upper bits with 1s.
 We now have the short value of -1.

 1111 1111 0000 0000
 >>                8
 ───────────────────
 1111 1111 1111 1111

Positive values (that had a 0 in the MSB) are left unchanged. This is a nice property of the arithmetic right-shift.


Then normalize the sample, as described in Some Assumptions.

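Wrapped up as a method (the same helper appears in the full code at the bottom), with a couple of spot checks:

```java
public final class SignExtendDemo {
    // Sign-extends a bitsPerSample-wide two's complement value held in a long.
    public static long extendSign(long temp, int bitsPerSample) {
        int bitsToExtend = Long.SIZE - bitsPerSample;
        // The left shift places the sample's sign bit in the long's MSB;
        // the arithmetic right shift then copies it down through the upper bits.
        return (temp << bitsToExtend) >> bitsToExtend;
    }
}
```

For example, extendSign(0xFFL, 8) yields -1 while extendSign(0x7FL, 8) stays 127.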

You might not need to write explicit sign-extension if your code is simple


Java does sign extension automatically when converting from one integral type to a larger type, for example byte to int. If you know that your input and output format are always signed, you can use the automatic sign extension while concatenating bytes in the earlier step.


Recall from the section above (How do I coerce the byte array into meaningful data?) that we used b & 0xFF to prevent sign extension from occurring. If you just remove the & 0xFF from the highest byte, sign extension will happen automatically.


For example, the following decodes signed, big-endian, 16-bit samples:


for (int i = 0; i < bytes.length - 1; i += 2) {
    int sample = (bytes[i] << 8)        // high byte is sign-extended
               | (bytes[i + 1] & 0xFF); // low byte is not
    // ...
}

How do I decode Encoding.PCM_UNSIGNED?


We turn it in to a signed number. Unsigned samples are simply offset so that, for example:


  • An unsigned value of 0 corresponds to the most negative signed value.
  • An unsigned value of 2^(bitsPerSample - 1) corresponds to the signed value of 0.
  • An unsigned value of 2^bitsPerSample - 1 corresponds to the most positive signed value.

So this turns out to be pretty simple. Just subtract the offset:


float sample = (float) (temp - fullScale(bitsPerSample));

Then normalize the sample, as described in Some Assumptions.

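As a method (matching the unsignedToSigned helper in the full code below), applied before normalization:

```java
public final class UnsignedPcm {
    // Converts an unsigned sample to two's complement by subtracting
    // the mid-point offset 2^(bitsPerSample - 1).
    public static long unsignedToSigned(long temp, int bitsPerSample) {
        return temp - (1L << (bitsPerSample - 1));
    }
}
```

For 8-bit audio this maps 0 to -128, 128 to 0, and 255 to 127.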

How do I decode Encoding.PCM_FLOAT?


This is new since Java 7.


In practice, floating-point PCM is typically either IEEE 32-bit or IEEE 64-bit and already normalized to the range ±1.0. The samples can be obtained with the utility methods Float#intBitsToFloat and Double#longBitsToDouble.


// IEEE 32-bit
float sample = Float.intBitsToFloat((int) temp);
// IEEE 64-bit
double sampleAsDouble = Double.longBitsToDouble(temp);
float sample = (float) sampleAsDouble; // or just use double for arithmetic

How do I decode Encoding.ULAWand Encoding.ALAW?


These are companding compression codecs that are more common in telephones and such. They're supported by javax.sound.sampled, I assume, because they're used by Sun's Au format. (However, they're not limited to just this type of container. For example, WAV can contain these encodings.)


You can conceptualize A-law and μ-law as if they're a floating-point format. These are PCM formats but the range of values is non-linear.


There are two ways to decode them. I'll show the way which uses the mathematical formula. You can also decode them by manipulating the binary directly, which is described in this blog post, but it's more esoteric-looking.


For both, the compressed data is 8-bit. Standardly A-law is 13-bit when decoded and μ-law is 14-bit when decoded; however, applying the formula yields a range of ±1.0.


Before you can apply the formula, there are three things to do:


  1. Some of the bits are standardly inverted for storage due to reasons involving data integrity.
  2. They're stored as sign and magnitude (rather than two's complement).
  3. The formula also expects a range of ±1.0, so the 8-bit value has to be scaled.

For μ-law, all the bits are inverted, so:


temp ^= 0xffL; // 0xff == 0b1111_1111

(Note that we can't use ~, because we don't want to invert the high bits of the long.)


For A-law, every other bit is inverted, so:


temp ^= 0x55L; // 0x55 == 0b0101_0101

(XOR can be used to do inversion. See How do you set, clear and toggle a bit?)


To convert from sign and magnitude to two's complement, we:


  1. Check to see if the sign bit was set.
  2. If so, clear the sign bit and negate the number.
// 0x80 == 0b1000_0000
if ((temp & 0x80L) != 0) {
    temp ^= 0x80L;
    temp = -temp;
}

Then scale the encoded numbers, the same way as described in Some Assumptions:


sample = (float) (temp / fullScale(8));

Now we can apply the expansion.


The μ-law formula translated to Java is then:


sample = (float) (
    signum(sample)
        *
    (1.0 / 255.0)
        *
    (pow(256.0, abs(sample)) - 1.0)
);

The A-law formula translated to Java is then:


float signum = signum(sample);
sample = abs(sample);

if (sample < (1.0 / (1.0 + log(87.7)))) {
    sample = (float) (
        sample * ((1.0 + log(87.7)) / 87.7)
    );
} else {
    sample = (float) (
        exp((sample * (1.0 + log(87.7))) - 1.0) / 87.7
    );
}

sample = signum * sample;


Here's the full example code for the SimpleAudioConversion class.


package mcve.audio;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioFormat.Encoding;

import static java.lang.Math.*;

/**
 * <p>Performs simple audio format conversion.</p>
 *
 * <p>Example usage:</p>
 *
 * <pre>{@code  AudioInputStream ais = ... ;
 * SourceDataLine  line = ... ;
 * AudioFormat      fmt = ... ;
 *
 * // do setup
 *
 * for (int blen = 0; (blen = ais.read(bytes)) > -1;) {
 *     int slen;
 *     slen = SimpleAudioConversion.decode(bytes, samples, blen, fmt);
 *
 *     // do something with samples
 *
 *     blen = SimpleAudioConversion.encode(samples, bytes, slen, fmt);
 *     line.write(bytes, 0, blen);
 * }}</pre>
 *
 * @author Radiodef
 * @see <a href="http://stackoverflow.com/a/26824664/2891664">Overview on Stack Overflow</a>
 */
public final class SimpleAudioConversion {
    private SimpleAudioConversion() {}

    /**
     * Converts from a byte array to an audio sample float array.
     *
     * @param bytes   the byte array, filled by the AudioInputStream
     * @param samples an array to fill up with audio samples
     * @param blen    the return value of AudioInputStream.read
     * @param fmt     the source AudioFormat
     *
     * @return the number of valid audio samples converted
     *
     * @throws NullPointerException if bytes, samples or fmt is null
     * @throws ArrayIndexOutOfBoundsException
     *         if bytes.length is less than blen or
     *         if samples.length is less than blen / bytesPerSample(fmt.getSampleSizeInBits())
     */
    public static int decode(byte[]      bytes,
                             float[]     samples,
                             int         blen,
                             AudioFormat fmt) {
        int   bitsPerSample = fmt.getSampleSizeInBits();
        int  bytesPerSample = bytesPerSample(bitsPerSample);
        boolean isBigEndian = fmt.isBigEndian();
        Encoding   encoding = fmt.getEncoding();
        double    fullScale = fullScale(bitsPerSample);

        int i = 0;
        int s = 0;
        while (i < blen) {
            long temp = unpackBits(bytes, i, isBigEndian, bytesPerSample);
            float sample = 0f;

            if (encoding == Encoding.PCM_SIGNED) {
                temp = extendSign(temp, bitsPerSample);
                sample = (float) (temp / fullScale);

            } else if (encoding == Encoding.PCM_UNSIGNED) {
                temp = unsignedToSigned(temp, bitsPerSample);
                sample = (float) (temp / fullScale);

            } else if (encoding == Encoding.PCM_FLOAT) {
                if (bitsPerSample == 32) {
                    sample = Float.intBitsToFloat((int) temp);
                } else if (bitsPerSample == 64) {
                    sample = (float) Double.longBitsToDouble(temp);
                }
            } else if (encoding == Encoding.ULAW) {
                sample = bitsToMuLaw(temp);

            } else if (encoding == Encoding.ALAW) {
                sample = bitsToALaw(temp);
            }

            samples[s] = sample;

            i += bytesPerSample;
            s++;
        }

        return s;
    }

    /**
     * Converts from an audio sample float array to a byte array.
     *
     * @param samples an array of audio samples to encode
     * @param bytes   an array to fill up with bytes
     * @param slen    the return value of the decode method
     * @param fmt     the destination AudioFormat
     *
     * @return the number of valid bytes converted
     *
     * @throws NullPointerException if samples, bytes or fmt is null
     * @throws ArrayIndexOutOfBoundsException
     *         if samples.length is less than slen or
     *         if bytes.length is less than slen * bytesPerSample(fmt.getSampleSizeInBits())
     */
    public static int encode(float[]     samples,
                             byte[]      bytes,
                             int         slen,
                             AudioFormat fmt) {
        int   bitsPerSample = fmt.getSampleSizeInBits();
        int  bytesPerSample = bytesPerSample(bitsPerSample);
        boolean isBigEndian = fmt.isBigEndian();
        Encoding   encoding = fmt.getEncoding();
        double    fullScale = fullScale(bitsPerSample);

        int i = 0;
        int s = 0;
        while (s < slen) {
            float sample = samples[s];
            long temp = 0L;

            if (encoding == Encoding.PCM_SIGNED) {
                temp = (long) (sample * fullScale);

            } else if (encoding == Encoding.PCM_UNSIGNED) {
                temp = (long) (sample * fullScale);
                temp = signedToUnsigned(temp, bitsPerSample);

            } else if (encoding == Encoding.PCM_FLOAT) {
                if (bitsPerSample == 32) {
                    temp = Float.floatToRawIntBits(sample);
                } else if (bitsPerSample == 64) {
                    temp = Double.doubleToRawLongBits(sample);
                }
            } else if (encoding == Encoding.ULAW) {
                temp = muLawToBits(sample);

            } else if (encoding == Encoding.ALAW) {
                temp = aLawToBits(sample);
            }

            packBits(bytes, i, temp, isBigEndian, bytesPerSample);

            i += bytesPerSample;
            s++;
        }

        return i;
    }

    /**
     * Computes the block-aligned bytes per sample of the audio format,
     * using Math.ceil(bitsPerSample / 8.0).
     * <p>
     * Round towards the ceiling because formats that allow bit depths
     * in non-integral multiples of 8 typically pad up to the nearest
     * integral multiple of 8. So for example, a 31-bit AIFF file will
     * actually store 32-bit blocks.
     *
     * @param  bitsPerSample the return value of AudioFormat.getSampleSizeInBits
     * @return The block-aligned bytes per sample of the audio format.
     */
    public static int bytesPerSample(int bitsPerSample) {
        return (int) ceil(bitsPerSample / 8.0); // optimization: ((bitsPerSample + 7) >>> 3)
    }

    /**
     * Computes the largest magnitude representable by the audio format,
     * using Math.pow(2.0, bitsPerSample - 1). Note that for two's complement
     * audio, the largest positive value is one less than the return value of
     * this method.
     * <p>
     * The result is returned as a double because in the case that
     * bitsPerSample is 64, a long would overflow.
     *
     * @param bitsPerSample the return value of AudioFormat.getSampleSizeInBits
     * @return the largest magnitude representable by the audio format
     */
    public static double fullScale(int bitsPerSample) {
        return pow(2.0, bitsPerSample - 1); // optimization: (1L << (bitsPerSample - 1))
    }

    private static long unpackBits(byte[]  bytes,
                                   int     i,
                                   boolean isBigEndian,
                                   int     bytesPerSample) {
        switch (bytesPerSample) {
            case  1: return unpack8Bit(bytes, i);
            case  2: return unpack16Bit(bytes, i, isBigEndian);
            case  3: return unpack24Bit(bytes, i, isBigEndian);
            default: return unpackAnyBit(bytes, i, isBigEndian, bytesPerSample);
        }
    }

    private static long unpack8Bit(byte[] bytes, int i) {
        return bytes[i] & 0xffL;
    }

    private static long unpack16Bit(byte[]  bytes,
                                    int     i,
                                    boolean isBigEndian) {
        if (isBigEndian) {
            return (
                  ((bytes[i    ] & 0xffL) << 8)
                |  (bytes[i + 1] & 0xffL)
            );
        } else {
            return (
                   (bytes[i    ] & 0xffL)
                | ((bytes[i + 1] & 0xffL) << 8)
            );
        }
    }

    private static long unpack24Bit(byte[]  bytes,
                                    int     i,
                                    boolean isBigEndian) {
        if (isBigEndian) {
            return (
                  ((bytes[i    ] & 0xffL) << 16)
                | ((bytes[i + 1] & 0xffL) <<  8)
                |  (bytes[i + 2] & 0xffL)
            );
        } else {
            return (
                   (bytes[i    ] & 0xffL)
                | ((bytes[i + 1] & 0xffL) <<  8)
                | ((bytes[i + 2] & 0xffL) << 16)
            );
        }
    }

    private static long unpackAnyBit(byte[]  bytes,
                                     int     i,
                                     boolean isBigEndian,
                                     int     bytesPerSample) {
        long temp = 0;

        if (isBigEndian) {
            for (int b = 0; b < bytesPerSample; b++) {
                temp |= (bytes[i + b] & 0xffL) << (
                    8 * (bytesPerSample - b - 1)
                );
            }
        } else {
            for (int b = 0; b < bytesPerSample; b++) {
                temp |= (bytes[i + b] & 0xffL) << (8 * b);
            }
        }

        return temp;
    }

    private static void packBits(byte[]  bytes,
                                 int     i,
                                 long    temp,
                                 boolean isBigEndian,
                                 int     bytesPerSample) {
        switch (bytesPerSample) {
            case  1: pack8Bit(bytes, i, temp);
                     break;
            case  2: pack16Bit(bytes, i, temp, isBigEndian);
                     break;
            case  3: pack24Bit(bytes, i, temp, isBigEndian);
                     break;
            default: packAnyBit(bytes, i, temp, isBigEndian, bytesPerSample);
                     break;
        }
    }

    private static void pack8Bit(byte[] bytes, int i, long temp) {
        bytes[i] = (byte) (temp & 0xffL);
    }

    private static void pack16Bit(byte[]  bytes,
                                  int     i,
                                  long    temp,
                                  boolean isBigEndian) {
        if (isBigEndian) {
            bytes[i    ] = (byte) ((temp >>> 8) & 0xffL);
            bytes[i + 1] = (byte) ( temp        & 0xffL);
        } else {
            bytes[i    ] = (byte) ( temp        & 0xffL);
            bytes[i + 1] = (byte) ((temp >>> 8) & 0xffL);
        }
    }

    private static void pack24Bit(byte[]  bytes,
                                  int     i,
                                  long    temp,
                                  boolean isBigEndian) {
        if (isBigEndian) {
            bytes[i    ] = (byte) ((temp >>> 16) & 0xffL);
            bytes[i + 1] = (byte) ((temp >>>  8) & 0xffL);
            bytes[i + 2] = (byte) ( temp         & 0xffL);
        } else {
            bytes[i    ] = (byte) ( temp         & 0xffL);
            bytes[i + 1] = (byte) ((temp >>>  8) & 0xffL);
            bytes[i + 2] = (byte) ((temp >>> 16) & 0xffL);
        }
    }

    private static void packAnyBit(byte[]  bytes,
                                   int     i,
                                   long    temp,
                                   boolean isBigEndian,
                                   int     bytesPerSample) {
        if (isBigEndian) {
            for (int b = 0; b < bytesPerSample; b++) {
                bytes[i + b] = (byte) (
                    (temp >>> (8 * (bytesPerSample - b - 1))) & 0xffL
                );
            }
        } else {
            for (int b = 0; b < bytesPerSample; b++) {
                bytes[i + b] = (byte) ((temp >>> (8 * b)) & 0xffL);
            }
        }
    }

    private static long extendSign(long temp, int bitsPerSample) {
        int bitsToExtend = Long.SIZE - bitsPerSample;
        return (temp << bitsToExtend) >> bitsToExtend;
    }

    private static long unsignedToSigned(long temp, int bitsPerSample) {
        return temp - (long) fullScale(bitsPerSample);
    }

    private static long signedToUnsigned(long temp, int bitsPerSample) {
        return temp + (long) fullScale(bitsPerSample);
    }

    // mu-law constant
    private static final double MU = 255.0;
    // A-law constant
    private static final double A = 87.7;
    // natural logarithm of A
    private static final double LN_A = log(A);

    private static float bitsToMuLaw(long temp) {
        temp ^= 0xffL;
        if ((temp & 0x80L) != 0) {
            temp = -(temp ^ 0x80L);
        }

        float sample = (float) (temp / fullScale(8));

        return (float) (
            signum(sample)
                *
            (1.0 / MU)
                *
            (pow(1.0 + MU, abs(sample)) - 1.0)
        );
    }

    private static long muLawToBits(float sample) {
        double sign = signum(sample);
        sample = abs(sample);

        sample = (float) (
            sign * (log(1.0 + (MU * sample)) / log(1.0 + MU))
        );

        long temp = (long) (sample * fullScale(8));

        if (temp < 0) {
            temp = -temp ^ 0x80L;
        }

        return temp ^ 0xffL;
    }

    private static float bitsToALaw(long temp) {
        temp ^= 0x55L;
        if ((temp & 0x80L) != 0) {
            temp = -(temp ^ 0x80L);
        }

        float sample = (float) (temp / fullScale(8));

        float sign = signum(sample);
        sample = abs(sample);

        if (sample < (1.0 / (1.0 + LN_A))) {
            sample = (float) (sample * ((1.0 + LN_A) / A));
        } else {
            sample = (float) (exp((sample * (1.0 + LN_A)) - 1.0) / A);
        }

        return sign * sample;
    }

    private static long aLawToBits(float sample) {
        double sign = signum(sample);
        sample = abs(sample);

        if (sample < (1.0 / A)) {
            sample = (float) ((A * sample) / (1.0 + LN_A));
        } else {
            sample = (float) ((1.0 + log(A * sample)) / (1.0 + LN_A));
        }

        sample *= sign;

        long temp = (long) (sample * fullScale(8));

        if (temp < 0) {
            temp = -temp ^ 0x80L;
        }

        return temp ^ 0x55L;
    }
}
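As a small usage sketch of the decoded output (my own example, not part of the original class): once decode has filled the samples array, you can analyze it directly, e.g. to find the peak amplitude of a buffer:

```java
public final class SampleStats {
    // Peak absolute amplitude over the first slen samples,
    // where slen is the return value of SimpleAudioConversion.decode.
    public static float peak(float[] samples, int slen) {
        float max = 0f;
        for (int s = 0; s < slen; s++) {
            max = Math.max(max, Math.abs(samples[s]));
        }
        return max;
    }
}
```

Because the samples are normalized to ±1.0, a peak near 1.0 means the buffer is close to clipping regardless of the source bit depth.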

Answered by Carlos Rendon

This is how you get the actual sample data from the currently playing sound. The other excellent answer will tell you what the data means. I haven't tried it on any OS other than my Windows 10 machine, so YMMV. For me it pulls the current system default recording device. On Windows, set it to "Stereo Mix" instead of "Microphone" to capture the playing sound. You may have to toggle "Show Disabled Devices" to see "Stereo Mix".


import javax.sound.sampled.*;

public class SampleAudio {

    private static long extendSign(long temp, int bitsPerSample) {
        int extensionBits = 64 - bitsPerSample;
        return (temp << extensionBits) >> extensionBits;
    }

    public static void main(String[] args) throws LineUnavailableException {
        float sampleRate = 8000;
        int sampleSizeBits = 16;
        int numChannels = 1; // Mono
        AudioFormat format = new AudioFormat(sampleRate, sampleSizeBits, numChannels, true, true);
        TargetDataLine tdl = AudioSystem.getTargetDataLine(format);
        tdl.open(format);
        tdl.start();
        if (!tdl.isOpen()) {
            System.exit(1);
        }
        byte[] data = new byte[(int) sampleRate * 10];
        int read = tdl.read(data, 0, (int) sampleRate * 10);
        if (read > 0) {
            for (int i = 0; i < read-1; i = i + 2) {
                long val = ((data[i] & 0xffL) << 8L) | (data[i + 1] & 0xffL);
                long valf = extendSign(val, 16);
                System.out.println(i + "\t" + valf);
            }
        }
        tdl.close();
    }
}