Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/35000998/

How do I turn a JSON file into a Java 8 Object Stream?

Tags: java, arrays, json, java-8, bigdata

Asked by Witbrock

I have a very large (> 1GB) JSON file containing an array (it's confidential, but this is a proxy):

 [
        {
            "date": "August 17, 2015",
            "hours": 7,
            "minutes": 10
        },
        {
            "date": "August 19, 2015",
            "hours": 4,
            "minutes": 46
        },
        {
            "date": "August 19, 2015",
            "hours": 7,
            "minutes": 22
        },
        {
            "date": "August 21, 2015",
            "hours": 4,
            "minutes": 48
        },
        {
            "date": "August 21, 2015",
            "hours": 6,
            "minutes": 1
        }
    ]

I've used JSON2POJO to produce a "Sleep" object definition.
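
The generated class is not shown in the post. As a rough sketch, a bean matching the sample JSON might look like the following; the field types, accessors, and `toString` format are assumptions based only on the three keys in the sample, not the actual generated code.

```java
// Hypothetical sketch of a Sleep bean; the real generated class is not shown in the post.
class Sleep {
    private String date;   // e.g. "August 17, 2015"
    private int hours;
    private int minutes;

    public Sleep() { }     // no-arg constructor, as bean-style data binding usually expects

    public String getDate() { return date; }
    public void setDate(String date) { this.date = date; }

    public int getHours() { return hours; }
    public void setHours(int hours) { this.hours = hours; }

    public int getMinutes() { return minutes; }
    public void setMinutes(int minutes) { this.minutes = minutes; }

    @Override
    public String toString() {
        return "Sleep{date='" + date + "', hours=" + hours + ", minutes=" + minutes + "}";
    }
}
```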

Now, one could use Jackson's ObjectMapper to just convert the file to an array, and then use Arrays.stream(ARRAY). Except that this crashes (yes, it's a BIG file).

The obvious thing is to use Jackson's Streaming API. But that's super low level. In particular, I still want Sleep objects.

How do I use the Jackson streaming JSON reader and my Sleep.java class to generate a Java 8 Stream of Sleep objects?

Accepted answer by Witbrock

I couldn't find a good solution to this, and I needed one for a particular case: I had a >1GB JSON file (a top-level JSON array with tens of thousands of largish objects), and using the normal Jackson mapper caused crashes when accessing the resulting Java object array.

The examples I found for the Jackson Streaming API lost the object mapping that is so appealing, and certainly didn't allow access to the objects via the (obviously appropriate) Java 8 Stream API.

The code is now on GitHub

Here's a quick example of use:

 //Use the JSON File included as a resource
 ClassLoader classLoader = SleepReader.class.getClassLoader();
 File dataFile = new File(classLoader.getResource("example.json").getFile());

 //Simple example of getting the Sleep Objects from that JSON
 new JsonArrayStreamDataSupplier<>(dataFile, Sleep.class) //An Iterator<Sleep>; call getStream() for a Stream
                .forEachRemaining(nightsRest -> {
                    System.out.println(nightsRest.toString());
                });

Here's some JSON from example.json

   [
    {
        "date": "August 17, 2015",
        "hours": 7,
        "minutes": 10
    },
    {
        "date": "August 19, 2015",
        "hours": 4,
        "minutes": 46
    },
    {
        "date": "August 19, 2015",
        "hours": 7,
        "minutes": 22
    },
    {
        "date": "August 21, 2015",
        "hours": 4,
        "minutes": 48
    },
    {
        "date": "August 21, 2015",
        "hours": 6,
        "minutes": 1
    }
]

and, in case you don't want to go to GitHub (you should), here's the wrapper class itself:

    /**
 * @license APACHE LICENSE, VERSION 2.0 http://www.apache.org/licenses/LICENSE-2.0
 * @author Michael Witbrock
 */
package com.michaelwitbrock.jacksonstream;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class JsonArrayStreamDataSupplier<T> implements Iterator<T> {
    /*
    * This class wraps the Jackson streaming API for arrays (a common kind of 
    * large JSON file) in a Java 8 Stream. The initial motivation was that 
    * mapping a very large JSON file (> 1GB) to a Java array with a default
    * ObjectMapper was crashing for me. And there didn't seem to be good example 
    * code for handling Jackson streams as Java 8 streams, which seems natural.
    */

    static ObjectMapper mapper = new ObjectMapper();
    JsonParser parser;
    boolean maybeHasNext = false;
    int count = 0;
    JsonFactory factory = new JsonFactory();
    private Class<T> type;

    public JsonArrayStreamDataSupplier(File dataFile, Class<T> type) {
        this.type = type;
        try {
            // Setup and get into a state to start iterating
            parser = factory.createParser(dataFile);
            parser.setCodec(mapper);
            JsonToken token = parser.nextToken();
            if (token == null) {
                throw new RuntimeException("Can't get any JSON Token from "
                        + dataFile.getAbsolutePath());
            }

            // the first token is supposed to be the start of array '['
            if (!JsonToken.START_ARRAY.equals(token)) {
                // return or throw exception
                maybeHasNext = false;
                throw new RuntimeException("Expected the start of a JSON array in "
                        + dataFile.getAbsolutePath());
            }
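            // Positioned just inside the array, so iteration can begin
            maybeHasNext = true;
        } catch (IOException e) {
            // Wrap I/O and parse failures instead of silently swallowing them
            throw new RuntimeException("Can't read JSON from "
                    + dataFile.getAbsolutePath(), e);
        }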
    }

    /*
    This method returns the stream, and is the only method other 
    than the constructor that should be used.
    */
    public Stream<T> getStream() {
        return StreamSupport.stream(Spliterators.spliteratorUnknownSize(this, 0), false);
    }

    /* The remaining methods are what enables this to be passed to the spliterator generator, 
       since they make it an Iterator.
    */
    @Override
    public boolean hasNext() {
        if (!maybeHasNext) {
            return false; // didn't get started
        }
        try {
            return (parser.nextToken() == JsonToken.START_OBJECT);
        } catch (Exception e) {
            System.out.println("Ex" + e);
            return false;
        }
    }

    @Override
    public T next() {
        try {
            JsonNode n = parser.readValueAsTree();
            //Because we can't send T as a parameter to the mapper
            T node = mapper.convertValue(n, type);
            return node;
        } catch (IOException | IllegalArgumentException e) {
            System.out.println("Ex" + e);
            return null;
        }

    }


}
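
One caveat with the wrapper above: hasNext() advances the parser on every call, so calling it twice without an intervening next() would skip an element. A minimal sketch of one way around this is to buffer a single element, as below. `LookAheadIterator` and its `Supplier`-based reader are hypothetical names, and a plain list iterator stands in for the JSON parser so the sketch runs without Jackson. In the wrapper, the reader would presumably advance the parser and bind one object, returning null at the end of the array.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper: buffers one element so hasNext() can be called repeatedly.
class LookAheadIterator<T> implements Iterator<T> {
    private final Supplier<T> reader; // returns the next element, or null at end of input
    private T buffered;               // element read ahead by hasNext(), if any
    private boolean finished;

    LookAheadIterator(Supplier<T> reader) {
        this.reader = reader;
    }

    @Override
    public boolean hasNext() {
        if (buffered == null && !finished) {
            buffered = reader.get();       // advance the source at most once per element
            finished = (buffered == null);
        }
        return buffered != null;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = buffered;
        buffered = null;
        return result;
    }

    // Same Spliterators-based wrapping as getStream() in the answer.
    Stream<T> stream() {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(this, Spliterator.ORDERED | Spliterator.NONNULL),
                false);
    }
}

class LookAheadDemo {
    public static void main(String[] args) {
        // Stand-in for the JSON parser: an iterator over three strings.
        Iterator<String> source = Arrays.asList("a", "b", "c").iterator();
        LookAheadIterator<String> it =
                new LookAheadIterator<>(() -> source.hasNext() ? source.next() : null);

        it.hasNext();
        it.hasNext(); // a second call does not skip an element
        List<String> all = it.stream().collect(Collectors.toList());
        System.out.println(all); // prints [a, b, c]
    }
}
```

Throwing NoSuchElementException from next() at the end of input, rather than returning null, also matches the Iterator contract.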

Answer by Chris

Remove implementation of Iterator

I think you can get rid of the whole Iterator implementation by using Jackson's API.

The catch here is that readValuesAs can return an iterator. The only thing I did not figure out completely is why I have to consume the JSON array start token before I can let Jackson do its work.

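import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.InputStream;
import java.util.Spliterators;
import java.util.function.Supplier;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class InputStreamJsonArrayStreamDataSupplier<T> implements Supplier<Stream<T>> {

    private final ObjectMapper mapper = new ObjectMapper();
    private final JsonParser jsonParser;
    private final Class<T> type;

    // The original snippet used an undeclared variable `data`; it is assumed
    // here to be the InputStream holding the JSON, passed in by the caller.
    public InputStreamJsonArrayStreamDataSupplier(InputStream data, Class<T> type) throws IOException {
        this.type = type;

        // Setup and get into a state to start iterating
        jsonParser = mapper.getFactory().createParser(data);
        jsonParser.setCodec(mapper);
        JsonToken token = jsonParser.nextToken();
        if (JsonToken.START_ARRAY.equals(token)) {
            // if it is started with START_ARRAY it's ok
            token = jsonParser.nextToken();
        }
        if (!JsonToken.START_OBJECT.equals(token)) {
            throw new RuntimeException("Can't get any JSON object from input " + data);
        }
    }

    @Override
    public Stream<T> get() {
        try {
            // readValuesAs returns an Iterator<T> that binds one element at a time
            return StreamSupport.stream(
                    Spliterators.spliteratorUnknownSize(jsonParser.readValuesAs(type), 0), false);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}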
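
Neither version above closes the parser once the stream has been consumed. One possible approach, sketched here with JDK types only, is to register a close handler on the stream and consume it in a try-with-resources block. `ClosingStreams` and `streamOf` are hypothetical names; since JsonParser implements Closeable, it could presumably be passed as the resource.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class ClosingStreams {
    // Hypothetical helper: a sequential stream that closes `resource` when the stream is closed.
    static <T> Stream<T> streamOf(Iterator<T> iterator, Closeable resource) {
        return StreamSupport
                .stream(Spliterators.spliteratorUnknownSize(iterator, 0), false)
                .onClose(() -> {
                    try {
                        resource.close();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
    }

    public static void main(String[] args) {
        // Stand-ins for the value iterator and the parser.
        Iterator<String> values = Arrays.asList("x", "y").iterator();
        Closeable resource = () -> System.out.println("closed");

        // try-with-resources calls Stream.close(), which runs the onClose handler.
        try (Stream<String> stream = streamOf(values, resource)) {
            stream.forEach(System.out::println);
        }
    }
}
```

Note that terminal operations such as forEach do not close a stream on their own; the handler runs only when close() is called, which is why the try-with-resources block matters here.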