
Spark-LLAP on HDP 2.6.2 throws error on describe()


Explorer

I am running Spark-LLAP to access Hive on HDP 2.6.2.

I created a simple DataFrame by joining two ORC tables, ran describe() on it, and got the exception below. Any help on what could be going wrong?

Message: Job aborted due to stage failure: Task 171 in stage 4.0 failed 4 times, most recent failure: Lost task 171.3 in stage 4.0 (TID 715, wx0599.danskenet.net, executor 50): java.lang.ArrayIndexOutOfBoundsException: -2
	at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getServiceInstanceRandom(LlapBaseInputFormat.java:290)
	at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getServiceInstance(LlapBaseInputFormat.java:240)
	at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getRecordReader(LlapBaseInputFormat.java:129)
	at org.apache.hadoop.hive.llap.LlapRowInputFormat.getRecordReader(LlapRowInputFormat.java:52)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:252)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:251)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:211)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.HadoopRDD$HadoopMapPartitionsWithSplitRDD.compute(HadoopRDD.scala:410)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
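
For what it is worth, the index -2 looks like the classic pattern of a negative hash code taken modulo a collection size. A minimal Scala sketch of that failure mode (my own illustration only, not the actual LLAP source):

// Illustration only: how an index like -2 can arise when a (possibly
// negative) hash is taken modulo a collection size.
object NegativeModuloDemo {
  def main(args: Array[String]): Unit = {
    val daemons = Array("llap0", "llap1", "llap2")
    val hash = -5                                      // e.g. a negative hashCode
    val buggyIdx = hash % daemons.length               // -2: % keeps the sign of the dividend
    val safeIdx = Math.floorMod(hash, daemons.length)  // 1: always in [0, length)
    println(s"buggy=$buggyIdx safe=$safeIdx")
    // daemons(buggyIdx) would throw ArrayIndexOutOfBoundsException: -2
    println(daemons(safeIdx))
  }
}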
3 REPLIES

Re: Spark-LLAP on HDP 2.6.2 throws error on describe()

Rising Star

I am not able to reproduce.

Can you please share the exact steps you performed to get this exception? I.e., the tables' DDL, how you created them, which Spark-LLAP version you're using, the Spark code you ran to trigger it, and so on.

Thanks.


Re: Spark-LLAP on HDP 2.6.2 throws error on describe()

Explorer

Strangely, the same code succeeds when run a second time, and the issue appears only for certain tables.

But let me share more details.

Below is the Spark code I ran. We are using the Spark-LLAP build that shipped with HDP 2.6.2; I am not sure of the exact version, but the jar is named spark-llap-assembly-1.0.0.2.6.2.0-205, if the version is encoded in there somewhere.

import org.apache.spark.sql.functions.{count, sum}

// remindedTable and allTxnTable hold the Hive table names (defined elsewhere)
val custDF = spark.table(remindedTable).selectExpr("acc as acc_tmp").coalesce(10)
val txnDF = spark.table(allTxnTable).where("year = 2015")
val txnScopeDF = txnDF.join(custDF, txnDF("acc") === custDF("acc_tmp")).drop("acc_tmp")
val custOutflowStatDF = txnScopeDF.where("debit_flag = 1").groupBy("acc").agg(sum("blps").as("sum"), count("blps").as("cnt"))
custOutflowStatDF.describe().show()

The remindedTable was created in Hive using a CTAS from other Hive ORC tables, I believe. The larger table, allTxnTable, is partitioned. I can't share the entire CREATE TABLE statement for security reasons, but the relevant parts are below.

CREATE TABLE allTxnTable(
...
)
partitioned by (colName decimal(15,2))
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
TBLPROPERTIES (
  'orc.create.index'='true', 'orc.bloom.filter.columns'='*','orc.bloom.filter.fpp'='0.10');
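
As a crude stopgap, since a second run of the same code succeeds, I can get the job through by retrying the action once; retryOnce below is my own helper, not a Spark API:

// Hypothetical stopgap, given that the second run succeeds:
// retry the failing action once. retryOnce is my own helper, not part of Spark.
def retryOnce[T](action: => T): T =
  try action
  catch {
    case e: org.apache.spark.SparkException =>
      println(s"First attempt failed (${e.getMessage}); retrying once...")
      action
  }

retryOnce(custOutflowStatDF.describe().show())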

Re: Spark-LLAP on HDP 2.6.2 throws error on describe()

Rising Star

Sorry, but I am still unable to reproduce the issue...
