org.apache.thrift.transport.TTransportException error while reading a large JSON file in Zeppelin Scala

Note: this content is from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/36835122/

Tags: json, scala, apache-spark, apache-zeppelin

Asked by Kiran Shashi

I am trying to read a large JSON file (1.5 GB) using Zeppelin and Scala.

Zeppelin runs Spark in local mode, installed on Ubuntu on a VM with 10 GB of RAM. I have allotted 8 GB to spark.executor.memory.
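
For context, that property typically lives in Zeppelin's Spark interpreter settings or in conf/spark-defaults.conf. A sketch of the setup described above, with the master URL being an assumption:

spark.master           local[*]
spark.executor.memory  8g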

My code is as follows:

val inputFileWeather="/home/shashi/incubator-zeppelin-master/data/ai/weather.json"
val temp=sqlContext.read.json(inputFileWeather)
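
(A side note for files this size: with no schema given, sqlContext.read.json first scans the whole 1.5 GB file just to infer one. A sketch of supplying an explicit schema up front, with hypothetical field names since weather.json's structure is not shown, avoids that extra pass:)

import org.apache.spark.sql.types._

// Hypothetical schema -- the real fields of weather.json are not shown in the question.
val weatherSchema = StructType(Seq(
  StructField("station", StringType),
  StructField("date", StringType),
  StructField("temperature", DoubleType)
))

// With an explicit schema, Spark skips the schema-inference scan over the file.
val tempWithSchema = sqlContext.read.schema(weatherSchema).json(inputFileWeather)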

I am getting the following error:

org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:241)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:225)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:229)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:229)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:171)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:328)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Answered by user1314742

The error you got is due to a problem running the Spark interpreter, so Zeppelin could not connect to the interpreter process.

You have to check your logs located in /PATH/TO/ZEPPELIN/logs/*.out to know exactly what is happening. Perhaps in the interpreter logs you will see an OOM (OutOfMemoryError).

I think that 8 GB of executor memory on a VM with 10 GB of RAM is unreasonable (and how many executors are you starting?). You have to consider the driver memory as well.
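
One detail worth noting: in local mode the executor runs inside the driver JVM, so spark.executor.memory is effectively ignored and it is the driver's heap that matters. A rough budget for a 10 GB VM, with all values being assumptions, might look like this in conf/spark-defaults.conf:

# Leave roughly 2 GB for the OS and the Zeppelin server itself (assumed split).
spark.driver.memory   6g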

Answered by Aditya Bangard

Increase the driver memory in the Spark interpreter, i.e. spark.driver.memory. By default it is 1 GB.
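
Two common places to set it, with the 4g value being an assumption for the 10 GB VM described above: as a spark.driver.memory property in Zeppelin's Spark interpreter settings, or via spark-submit options in conf/zeppelin-env.sh:

export SPARK_SUBMIT_OPTIONS="--driver-memory 4g"

After changing either one, restart the Spark interpreter from the Zeppelin UI so the new setting takes effect.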