Why doesn't more Java code use PipedInputStream / PipedOutputStream?

Question

Asked by Steven Huwig

I recently discovered this idiom, and I am wondering if there is something I am missing. I've never seen it used. Nearly all Java code I've worked with in the wild favors slurping data into a string or buffer, rather than something like this example (which uses HttpClient and the XML APIs):

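    import java.io.IOException;
    import java.io.PipedInputStream;
    import java.io.PipedOutputStream;
    import org.apache.commons.httpclient.methods.InputStreamRequestEntity;
    import org.apache.commons.httpclient.methods.PostMethod;
    import org.w3c.dom.Document;
    import org.w3c.dom.ls.LSOutput;
    import org.w3c.dom.ls.LSSerializer;

    final LSOutput output; // XML stuff initialized elsewhere
    final LSSerializer serializer;
    final Document doc;
    // ...
    PostMethod post; // HttpClient post request
    final PipedOutputStream source = new PipedOutputStream();
    PipedInputStream sink = new PipedInputStream(source); // connect the two ends
    // ...
    // Writer thread: serialize the DOM into the pipe.
    executor.execute(new Runnable() {
        public void run() {
            try {
                output.setByteStream(source);
                serializer.write(doc, output);
            } finally {
                try {
                    source.close(); // always signal end-of-stream to the reader
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }
        }
    });

    // Calling thread: HttpClient reads the request body from the pipe as it sends it.
    post.setRequestEntity(new InputStreamRequestEntity(sink));
    int status = httpClient.executeMethod(post);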

That code uses a Unix-pipe-style technique to avoid keeping multiple copies of the XML data in memory. It uses the HTTP POST output stream and the DOM Load/Save API to serialize an XML Document as the content of the HTTP request. As far as I can tell, it minimizes memory use with very little extra code (just the few lines for the Runnable, PipedInputStream, and PipedOutputStream).
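
To make the idiom concrete without a server, here is a minimal JDK-only sketch (Java 9+ for readAllBytes). Assumptions: the HttpClient call is replaced by simply draining the pipe on the calling thread, and serializeViaPipe is a hypothetical helper name. It closes the pipe via try-with-resources so the reader always sees end-of-stream, and it checks the producer's Future so a serialization failure surfaces instead of looking like a truncated body.

```java
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class PipedXmlDemo {

    // Hypothetical helper: serializes doc on a worker thread, returns what the reader saw.
    static String serializeViaPipe(Document doc) throws Exception {
        DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        LSOutput output = ls.createLSOutput();

        PipedOutputStream source = new PipedOutputStream();
        PipedInputStream sink = new PipedInputStream(source); // connect the two ends

        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Producer: writes into the pipe; try-with-resources closes it even on failure.
            Future<?> producer = executor.submit(() -> {
                try (PipedOutputStream out = source) {
                    output.setByteStream(out);
                    serializer.write(doc, output);
                }
                return null;
            });
            // Consumer: stands in for HttpClient draining the request entity.
            byte[] body = sink.readAllBytes();
            producer.get(); // rethrows any exception from the producer thread
            return new String(body, StandardCharsets.UTF_8);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("order")).setTextContent("42");
        System.out.println(serializeViaPipe(doc));
    }
}
```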

So, what's wrong with this idiom? If there's nothing wrong with this idiom, why haven't I seen it?

EDIT: to clarify, PipedInputStream and PipedOutputStream replace the boilerplate buffer-by-buffer copy that shows up everywhere, and they also allow you to process incoming data concurrently with writing out the processed data. They don't use OS pipes.
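
For reference, the "boilerplate buffer-by-buffer copy" is typically a loop like the one below. This is a sketch with an arbitrary 8 KiB buffer and a hypothetical class name. It only works when the data is already available as an InputStream; the piped pair is one way to obtain an InputStream from an API that can only write to an OutputStream. Since Java 9, InputStream.transferTo(OutputStream) does the same job as this loop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class CopyLoop {

    // The classic copy: read a chunk, write that many bytes, repeat until EOF (-1).
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("hello, pipes".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(in, out);
        System.out.println(copied + " bytes: " + out.toString("UTF-8"));
    }
}
```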

Answer 1

Accepted answer by matt b

From the Javadocs:

Typically, data is read from a PipedInputStream object by one thread and data is written to the corresponding PipedOutputStream by some other thread. Attempting to use both objects from a single thread is not recommended, as it may deadlock the thread.
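
A small sketch of why the Javadoc warns against single-threaded use, assuming the JDK's default 1024-byte pipe buffer: one thread writes more than the buffer holds before reading, so the write can never complete. The demo runs that single thread on an executor with a timeout only so it can detect the hang and interrupt it instead of freezing; blocksOnSingleThread is a hypothetical name.

```java
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SingleThreadPipe {

    // Writes `size` bytes and then reads them back, all on ONE thread.
    // Returns true if that thread was still blocked after timeoutMillis.
    static boolean blocksOnSingleThread(int size, long timeoutMillis) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<?> task = worker.submit(() -> {
                PipedOutputStream out = new PipedOutputStream();
                PipedInputStream in = new PipedInputStream(out); // default buffer: 1024 bytes
                out.write(new byte[size]); // blocks once the buffer is full: nobody is reading
                out.close();
                while (in.read() != -1) { } // only reached if the write fit in the buffer
                return null;
            });
            try {
                task.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException e) {
                task.cancel(true); // interrupt the stuck write so the worker thread can exit
                return true;
            }
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("100 bytes blocked:  " + blocksOnSingleThread(100, 500));
        System.out.println("4096 bytes blocked: " + blocksOnSingleThread(4096, 500));
    }
}
```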

This may partially explain why it is not more commonly used.

I'd assume another reason is that many developers do not understand its purpose / benefit.

Answer 2

Answer by kdgregory

In your example you're creating two threads to do work that could be done by one, and introducing I/O delays into the mix.

Do you have a better example? Or did I just answer your question?

To pull some of the comments (at least my view of them) into the main response:

Concurrency introduces complexity into an application. Instead of dealing with a single linear flow of data, you now have to be concerned about sequencing of independent data flows. In some cases, the added complexity may be justified, particularly if you can leverage multiple cores/CPUs to do CPU-intensive work.
If you are in a situation where you can benefit from concurrent operations, there's usually a better way to coordinate the flow of data between threads. For example, passing objects between threads using a concurrent queue, rather than wrapping the piped streams in object streams.
Where a piped stream may be a good solution is when you have multiple threads performing text processing, à la a Unix pipeline (e.g., grep | sort).
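
As a rough illustration of that grep | sort case, here is a two-stage sketch: a "grep" stage on its own thread writes matching lines into a pipe, and a "sort" stage on the calling thread reads them. The class and method names are hypothetical, and Java 9+ is assumed for List.of. Note that sort has to see all of its input before producing output, so only the filtering overlaps with the reading.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GrepSortPipeline {

    // Stage 1 ("grep") runs on its own thread; stage 2 ("sort") runs on the caller.
    static List<String> grepThenSort(List<String> lines, String pattern) throws Exception {
        PipedOutputStream grepOut = new PipedOutputStream();
        PipedInputStream sortIn = new PipedInputStream(grepOut);

        Thread grep = new Thread(() -> {
            try (PrintWriter w = new PrintWriter(
                    new OutputStreamWriter(grepOut, StandardCharsets.UTF_8))) {
                for (String line : lines) {
                    if (line.contains(pattern)) {
                        w.println(line);
                    }
                }
            } // closing the writer closes the pipe, which the reader sees as EOF
        });
        grep.start();

        List<String> result = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(sortIn, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                result.add(line);
            }
        }
        grep.join();
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws Exception {
        List<String> log = List.of("b error", "a ok", "c error", "a error");
        System.out.println(grepThenSort(log, "error")); // [a error, b error, c error]
    }
}
```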

In this specific example, the piped stream allows use of an existing RequestEntity implementation class provided by HttpClient. I believe a better solution is to create a new implementation class, as below, because the example is ultimately a sequential operation that cannot benefit from the complexity and overhead of a concurrent implementation. While I show the RequestEntity as an anonymous class, reusability would argue for making it a top-level class.

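import java.io.IOException;
import java.io.OutputStream;

import org.apache.commons.httpclient.methods.RequestEntity;

post.setRequestEntity(new RequestEntity()
{
    // -1: the length is not known in advance; the body is generated while it is sent.
    public long getContentLength()
    {
        return -1;
    }

    public String getContentType()
    {
        return "text/xml";
    }

    // The body is produced on the fly, so it cannot be replayed (e.g. on a retry).
    public boolean isRepeatable()
    {
        return false;
    }

    // Called by HttpClient with the request's output stream: serialize the DOM
    // directly onto it, on the calling thread, with no pipe and no second thread.
    public void writeRequest(OutputStream out) throws IOException
    {
        output.setByteStream(out);
        serializer.write(doc, output);
    }
});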
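
The same push-style shape can be tried without HttpClient. In this sketch, BodyWriter and send are hypothetical stand-ins for RequestEntity.writeRequest and executeMethod, and a ByteArrayOutputStream pretends to be the connection. The point is that the serializer writes straight into the stream the "client" supplies, on one thread, with no pipe.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class PushStyleBody {

    // Hypothetical stand-in for RequestEntity: the client hands us its stream.
    interface BodyWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    // Hypothetical stand-in for the HTTP client: it owns the stream and calls back into the body.
    static byte[] send(BodyWriter body) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream(); // pretend this is the socket
        body.writeTo(wire);
        return wire.toByteArray();
    }

    // Serializes the DOM directly onto whatever stream the client provides.
    static BodyWriter xmlBody(Document doc) {
        return out -> {
            DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
            LSSerializer serializer = ls.createLSSerializer();
            LSOutput output = ls.createLSOutput();
            output.setByteStream(out);
            serializer.write(doc, output);
        };
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("order")).setTextContent("42");
        System.out.println(new String(send(xmlBody(doc)), StandardCharsets.UTF_8));
    }
}
```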

Answer 3

Answer by Brian Matthews

I too only discovered the PipedInputStream/PipedOutputStream classes recently.

I am developing an Eclipse plug-in that needs to execute commands on a remote server via SSH. I am using JSch, and its Channel API reads from an input stream and writes to an output stream. But I need to feed commands through the input stream and read the responses from the output stream. That's where PipedInput/OutputStream comes in.

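import java.io.PipedInputStream;
import java.io.PipedOutputStream;

import com.jcraft.jsch.Channel;

Channel channel; // obtained elsewhere, e.g. from session.openChannel("shell")

// Commands: we write to channelOutputStream; the channel reads them from the connected PipedInputStream.
PipedOutputStream channelOutputStream = new PipedOutputStream();
channel.setInputStream(new PipedInputStream(channelOutputStream));

// Responses: the channel writes into the connected PipedOutputStream; we read them from channelInputStream.
PipedInputStream channelInputStream = new PipedInputStream();
channel.setOutputStream(new PipedOutputStream(channelInputStream));

channel.connect();

// Write commands to channelOutputStream
// Read responses from channelInputStream

channel.disconnect();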

Answer 4

Answer by Peter Lawrey

So, what's wrong with this idiom? If there's nothing wrong with this idiom, why haven't I seen it?
EDIT: to clarify, PipedInputStream and PipedOutputStream replace the boilerplate buffer-by-buffer copy that shows up everywhere, and they also allow you to process incoming data concurrently with writing out the processed data. They don't use OS pipes.

You have stated what it does but haven't stated why you are doing this.

If you believe this will reduce resource usage (CPU/memory) or improve performance, it won't do either. However, it will make your code more complex.

Basically, you have a solution without a problem for it to solve.

Answer 5

Answer by Adrian Pronk

I tried using these classes a while back for something; I forget the details. But I did discover that their implementation is fatally flawed. I can't remember what it was, but I have a sneaking memory that it may have been a race condition which meant that they occasionally deadlocked. (And yes, of course I was using them in separate threads: they simply aren't usable in a single thread and weren't designed to be.)

I might have a look at their source code and see if I can spot what the problem might have been.

Answer 6

Answer by StaxMan

Also, back to the original example: no, it does not exactly minimize memory usage either. DOM tree(s) get built and in-memory buffering is done; while that is better than full byte-array replicas, it's not that much better. But buffering in this case will be slower, and an extra thread is also created: you cannot use a PipedInput/OutputStream pair from within a single thread.

Sometimes PipedXxxStreams are useful, but the reason they are not used more is because quite often they are not the right solution. They are ok for inter-thread communication, and that's where I have used them for what that's worth. It's just that there aren't that many use cases for this, given how SOA pushes most such boundaries to be between services, instead of between threads.

Answer 7

Answer by Guido Medina

java.io pipes have too much context switching (per byte read/write), and their java.nio counterpart requires some NIO background and proper use of channels and so on. So here is my own implementation of pipes using a blocking queue, which for a single producer/consumer will perform fast and scale well:

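import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.*;

public class QueueOutputStream extends OutputStream
{
  private static final int DEFAULT_BUFFER_SIZE=1024;
  // Unique sentinel instance, compared by reference to detect end of stream.
  private static final byte[] END_SIGNAL=new byte[]{};

  // Unbounded queue: the producer never blocks, so there is no back-pressure.
  private final BlockingQueue<byte[]> queue=new LinkedBlockingDeque<>();
  private final byte[] buffer;

  private boolean closed=false;
  private int count=0;

  public QueueOutputStream()
  {
    this(DEFAULT_BUFFER_SIZE);
  }

  public QueueOutputStream(final int bufferSize)
  {
    if(bufferSize<=0){
      throw new IllegalArgumentException("Buffer size <= 0");
    }
    this.buffer=new byte[bufferSize];
  }

  // Hands a copy of the buffered bytes to the consumer and resets the buffer.
  private synchronized void flushBuffer()
  {
    if(count>0){
      final byte[] copy=new byte[count];
      System.arraycopy(buffer,0,copy,0,count);
      queue.offer(copy);
      count=0;
    }
  }

  private void ensureOpen() throws IOException
  {
    if(closed){
      throw new IOException("Stream is closed");
    }
  }

  @Override
  public synchronized void write(final int b) throws IOException
  {
    ensureOpen();
    if(count>=buffer.length){
      flushBuffer();
    }
    buffer[count++]=(byte)b;
  }

  // Bulk copy instead of OutputStream's default byte-at-a-time loop.
  @Override
  public synchronized void write(final byte[] b, final int off, final int len) throws IOException
  {
    ensureOpen();
    if(off<0||len<0||len>b.length-off){
      throw new IndexOutOfBoundsException();
    }
    int pos=off;
    int remaining=len;
    while(remaining>0){
      if(count>=buffer.length){
        flushBuffer();
      }
      final int n=Math.min(remaining,buffer.length-count);
      System.arraycopy(b,pos,buffer,count,n);
      count+=n;
      pos+=n;
      remaining-=n;
    }
  }

  // Makes buffered bytes visible to the consumer without closing the stream.
  @Override
  public synchronized void flush()
  {
    flushBuffer();
  }

  @Override
  public synchronized void close() throws IOException
  {
    if(closed){
      return; // idempotent: enqueue END_SIGNAL only once
    }
    flushBuffer();
    queue.offer(END_SIGNAL);
    closed=true;
  }

  // Drains the queue into outputStream on the executor until END_SIGNAL arrives.
  public Future<Void> asyncSendToOutputStream(final ExecutorService executor, final OutputStream outputStream)
  {
    return executor.submit(
            new Callable<Void>()
            {
              @Override
              public Void call() throws Exception
              {
                try{
                  byte[] chunk=queue.take();
                  while(chunk!=END_SIGNAL){
                    outputStream.write(chunk);
                    chunk=queue.take();
                  }
                  outputStream.flush();
                } catch(Exception e){
                  close(); // QueueOutputStream.this.close(): make further writes fail
                  throw e;
                } finally{
                  outputStream.close();
                }
                return null;
              }
            }
    );
  }
}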

Answer 8

Answer by Robert Christian

Here's a use case where pipes make sense:

Suppose you have a third-party lib, such as an XSLT mapper or crypto lib, that has an interface like this: doSomething(inputStream, outputStream). And you do not want to buffer the result before sending it over the wire. Apache and other clients disallow direct access to the wire output stream. The closest you can get is obtaining the output stream (at an offset, after the headers are written) in a request entity object. But since this is under the hood, it's still not enough to pass an input stream and output stream to the third-party lib. Pipes are a good solution to this problem.
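
A sketch of that use case with JDK classes only, assuming an identity javax.xml.transform.Transformer as the third-party "doSomething(in, out)" and a hypothetical consume(InputStream) method as the client that only accepts an InputStream (Java 9+ for readAllBytes). The transform runs on a worker thread writing into the pipe while the consumer reads the other end; the 64 KiB pipe size is an arbitrary choice to reduce hand-offs between the threads.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TransformToInputStream {

    // Runs transform(in -> out) on a worker thread and returns what the consumer read.
    static String transformAndConsume(byte[] xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer(); // identity
        PipedOutputStream transformOut = new PipedOutputStream();
        PipedInputStream clientIn = new PipedInputStream(transformOut, 64 * 1024);

        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> job = executor.submit(() -> {
                try (PipedOutputStream out = transformOut) { // close => consumer sees EOF
                    transformer.transform(new StreamSource(new ByteArrayInputStream(xml)),
                                          new StreamResult(out));
                }
                return null;
            });
            String consumed = consume(clientIn);
            job.get(); // propagate any TransformerException
            return consumed;
        } finally {
            executor.shutdown();
        }
    }

    // Hypothetical client that can only pull from an InputStream.
    static String consume(InputStream in) throws Exception {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<a><b>1</b></a>".getBytes(StandardCharsets.UTF_8);
        System.out.println(transformAndConsume(xml));
    }
}
```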

Incidentally, I wrote an inversion of Apache's HTTP Client API [PipedApacheClientOutputStream] which provides an OutputStream interface for HTTP POST using Apache Commons HTTP Client 4.3.4. This is an example where piped streams might make sense.

Answer 9

Answer by DWoldrich

PipedInputStream and PipedOutputStream will sleep their thread for 1 second whenever they block waiting for the other side to read from a full buffer or write into an empty one. Do not use them.
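
In the OpenJDK sources, the 1-second figure comes from both ends polling with wait(1000); a reader blocked on an empty pipe is woken when the writer flushes, closes, or fills the buffer, not on every write. A sketch of the usual mitigation, assuming that implementation: call flush() after each write, which the PipedOutputStream Javadoc says notifies readers that bytes are waiting. roundTrip is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

public class PipeFlushDemo {

    // Hypothetical helper: sends each message from a writer thread, returns what the reader got.
    static String roundTrip(String... messages) throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);

        Thread writer = new Thread(() -> {
            try (PipedOutputStream o = out) {
                for (String m : messages) {
                    o.write(m.getBytes(StandardCharsets.UTF_8));
                    o.flush(); // wake a reader blocked on the empty pipe now, not on its next poll
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        writer.start();

        // Collect raw bytes and decode once, so multi-byte characters are never split.
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        int n;
        while ((n = in.read(buf)) != -1) {
            received.write(buf, 0, n);
        }
        writer.join();
        return new String(received.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("alpha ", "beta ", "gamma"));
    }
}
```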
