Querying Hive using Python: getting the error below

Contributor

How do I resolve the following error?

ERROR - Unknown Exception:07<class 'py4j.protocol.Py4JJavaError'>An error occurred while calling o1140.insertInto.
: org.apache.spark.SparkException: Job aborted.

Guru

@pankshiv1809 I see you are getting a Spark-related error. Are you running this through Spark or as a Hive query?

Contributor

Hi Asish, thanks for reviewing. We are using a Python script with Spark parameters. I am also sharing the complete error log for further analysis:

 

Unknown Exception:07<class 'py4j.protocol.Py4JJavaError'>An error occurred while calling o1140.insertInto.
: org.apache.spark.SparkException: Job aborted.
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:224)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:154)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:664)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:664)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:664)
        at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:322)
        at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:308)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 21 in stage 107.0 failed 4 times, most recent failure: Lost task 21.3 in stage 107.0 (TID 17545, NDC3HDPPRODDN13.vodafoneidea.com, executor 75): java.io.FileNotFoundException: File does not exist: /warehouse/tablespace/external/hive/dim_cd_db.db/dim_subs_language_ivr/circle_id=14/part-00000-f53b4b78-8256-4040-9195-7af54c3365c8.c000.snappy.orc
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
        at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:158)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1931)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:426)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)

Guru

This looks like more of a Spark issue, but I see the following error:

 java.io.FileNotFoundException: File does not exist: /warehouse/tablespace/external/hive/dim_cd_db.db/dim_subs_language_ivr/circle_id=14/part-00000-f53b4b78-8256-4040-9195-7af54c3365c8.c000.snappy.orc

 

Can you please check whether the file is present:

hdfs dfs -ls /warehouse/tablespace/external/hive/dim_cd_db.db/dim_subs_language_ivr/circle_id=14/part-00000-f53b4b78-8256-4040-9195-7af54c3365c8.c000.snappy.orc

Please also run the following and let us know the result:

msck repair table dim_subs_language_ivr

Try running the same query in Hive Beeline and check whether the issue persists.
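If it helps, here is a rough Python sketch of the two checks above: it pulls the missing path out of the error text, tests whether it exists in HDFS, and builds the repair statement. The function names are made up for illustration. It assumes the usual Hive warehouse layout (`<db>.db/<table>/<partition>=<value>/...`) to guess the database and table from the path, and that the `hdfs` CLI is on the PATH of the machine where it runs. Treat it as a starting point, not a verified fix.

```python
import re
import shutil
import subprocess

# Matches the HDFS path reported by a FileNotFoundException in a Spark log.
MISSING_FILE_RE = re.compile(
    r"java\.io\.FileNotFoundException: File does not exist: (\S+)"
)


def extract_missing_path(error_text):
    """Return the HDFS path named in the exception, or None if absent."""
    match = MISSING_FILE_RE.search(error_text)
    return match.group(1) if match else None


def split_hive_path(path):
    """Guess (database, table, partitions) from a warehouse path.

    Assumption: the path follows the <db>.db/<table>/<key>=<value>/... layout.
    Returns None when no '<db>.db' component is found.
    """
    parts = [p for p in path.split("/") if p]
    db_index = next((i for i, p in enumerate(parts) if p.endswith(".db")), None)
    if db_index is None or db_index + 1 >= len(parts):
        return None
    database = parts[db_index][: -len(".db")]
    table = parts[db_index + 1]
    partitions = dict(
        p.split("=", 1) for p in parts[db_index + 2:] if "=" in p
    )
    return database, table, partitions


def hdfs_path_exists(path):
    """Run 'hdfs dfs -test -e <path>'; None if the hdfs CLI is unavailable."""
    if shutil.which("hdfs") is None:
        return None
    result = subprocess.run(["hdfs", "dfs", "-test", "-e", path])
    return result.returncode == 0


def msck_repair_statement(database, table):
    """Build the statement to run in Beeline (not executed here)."""
    return f"MSCK REPAIR TABLE {database}.{table}"


if __name__ == "__main__":
    log_line = (
        "java.io.FileNotFoundException: File does not exist: "
        "/warehouse/tablespace/external/hive/dim_cd_db.db/"
        "dim_subs_language_ivr/circle_id=14/"
        "part-00000-f53b4b78-8256-4040-9195-7af54c3365c8.c000.snappy.orc"
    )
    missing = extract_missing_path(log_line)
    database, table, partitions = split_hive_path(missing)
    print("exists in HDFS:", hdfs_path_exists(missing))
    print("partition:", partitions)
    print(msck_repair_statement(database, table))
```

The script only prints the MSCK statement; you would still run it yourself in Beeline, as suggested above.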

 

Contributor

Sure, Asish. Let me check, and I will update you with the logs and the respective output.

Guru

@pankshiv1809 Were you able to fix this? Please click "Accept as solution" if this worked.

Contributor

Hi Asish,

 

I have implemented the steps, and my team will now work on the dependent downstream flow. I will update you if any issue comes up with it.

Guru

Thank you, @pankshiv1809.