Apache Kylin with Spark

I have set up Kylin on HDP and now want to build the cube using Spark instead of MapReduce. The cube builds with MapReduce, but the build hangs when I try to use Spark. I tried setting the Hive execution engine to Spark instead of Tez, but that does not work and fails with java.lang.NoClassDefFoundError: org/apache/spark/SparkConf. Is there any way to run Hive on Spark in HDP, or any suggestion for how to run Kylin using Spark? Help is much appreciated.

Contributor

Which versions of Kylin and HDP are you running? Please provide the full error trace for analysis. Also, please set SPARK_HOME to KYLIN_HOME/spark before starting Kylin.
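
A minimal sketch of that last step, assuming a bash shell on the Kylin node. The install path below is hypothetical; use wherever Kylin is actually unpacked. The restart only runs if kylin.sh is found.

```shell
# Hypothetical install path; adjust to your own Kylin location.
export KYLIN_HOME="${KYLIN_HOME:-/usr/local/apache-kylin-2.5.2-bin-hbase1x}"

# Point Spark at the copy bundled with Kylin, as suggested above.
export SPARK_HOME="$KYLIN_HOME/spark"
echo "SPARK_HOME=$SPARK_HOME"

# Restart Kylin so it picks up the variable (skipped if Kylin is not here).
if [ -x "$KYLIN_HOME/bin/kylin.sh" ]; then
  "$KYLIN_HOME/bin/kylin.sh" stop
  "$KYLIN_HOME/bin/kylin.sh" start
fi
```

Exporting the variable in the same shell session that starts Kylin matters, because a variable set in another terminal is not visible to the Kylin process.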


The Kylin version is 2.5.2 and the HDP version is 2.6.5. I set SPARK_HOME to KYLIN_HOME/spark.

I followed this document: http://kylin.apache.org/docs20/tutorial/cube_spark.html

Instead of copying spark-assembly-1.6.3-hadoop2.6.0.jar, I built a fat jar of Spark 2.2 and copied it to the /kylin/spark directory in HDFS.

As mentioned, there is no error, and I am able to log in to Kylin. But when I choose Spark as the execution engine for building the cube, it still runs a MapReduce job instead of Spark. Any suggestion on how to change the application type to Spark? Also note that in hive-site.xml I have changed the execution engine from Tez to Spark.

Contributor

Hello Somnath,

Kylin 2.5 needs Spark 2.1 (not Spark 2.2). The guide for Kylin 2.5 is at https://kylin.apache.org/docs/tutorial/cube_spark.html (the link you provided above is the tutorial for Kylin 2.0, which does not match your version).

Please try this:

1) Build the assembly jar from the Spark 2.1 that ships with Kylin;

2) Copy the sample cube to a new one, edit it, and change the "engine type" to "spark".

Just give it a try; it should be a minor issue, as many users are using Spark now.
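
Step 1 can be sketched roughly as follows. This is an illustration, not a verified procedure: the Kylin install path is hypothetical, the archive name spark-libs.jar is only a convention, and the HDFS directory /kylin/spark/ is the one mentioned earlier in this thread.

```shell
# Hypothetical install path; adjust to your own Kylin location.
KYLIN_HOME="${KYLIN_HOME:-/usr/local/apache-kylin-2.5.2-bin-hbase1x}"
SPARK_JARS="$KYLIN_HOME/spark/jars"

if [ -d "$SPARK_JARS" ]; then
  # Bundle the jars of Kylin's own Spark into one uncompressed (0) archive.
  jar cv0f spark-libs.jar -C "$SPARK_JARS" .
  # Upload it to HDFS, overwriting any archive built from another Spark version.
  hadoop fs -mkdir -p /kylin/spark/
  hadoop fs -put -f spark-libs.jar /kylin/spark/
else
  echo "No Spark jars found under $SPARK_JARS; check KYLIN_HOME" >&2
fi
```

As far as I recall from the Kylin tutorial, kylin.properties then needs kylin.engine.spark-conf.spark.yarn.archive set to the full HDFS URI of the uploaded archive. The namenode host and port in that URI are specific to your cluster.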

Thanks, Shaofeng, for your quick response. I am able to run it in Spark now, but the job fails at the "Convert Cuboid Data to HFile" step.

I am getting the error below now.

Caused by: java.lang.RuntimeException: Could not create  interface org.apache.hadoop.hbase.regionserver.MetricsRegionServerSourceFactory Is the hadoop compatibility jar on the classpath?
	at org.apache.hadoop.hbase.CompatibilitySingletonFactory.getInstance(CompatibilitySingletonFactory.java:73)
	at org.apache.hadoop.hbase.io.MetricsIO.<init>(MetricsIO.java:31)
	at org.apache.hadoop.hbase.io.hfile.HFile.<clinit>(HFile.java:192)
	... 15 more
Caused by: java.util.NoSuchElementException
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:365)
	at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
	at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
	at org.apache.hadoop.hbase.CompatibilitySingletonFactory.getInstance(CompatibilitySingletonFactory.java:59)
	... 17 more

19/01/07 17:02:23 WARN TaskSetManager: Lost task 1.1 in stage 1.0 (TID 13, data1.hdplab.com, executor 1): java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.hfile.HFile
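
The NoSuchElementException from ServiceLoader in the trace above suggests that no jar on the executor classpath registers an implementation of MetricsRegionServerSourceFactory. A rough way to check this, assuming the Spark jars live under KYLIN_HOME/spark/jars (the install path is hypothetical) and that unzip is available:

```shell
# Hypothetical location of the Spark jars that Kylin ships to the executors.
SPARK_JARS="${SPARK_JARS:-${KYLIN_HOME:-/usr/local/apache-kylin-2.5.2-bin-hbase1x}/spark/jars}"

# ServiceLoader looks for a provider file named after the interface.
SERVICE="META-INF/services/org.apache.hadoop.hbase.regionserver.MetricsRegionServerSourceFactory"

found=0
for jar_file in "$SPARK_JARS"/*.jar; do
  [ -f "$jar_file" ] || continue
  # List the jar contents and look for the provider file.
  if unzip -l "$jar_file" 2>/dev/null | grep -q "$SERVICE"; then
    echo "provider found in: $jar_file"
    found=$((found + 1))
  fi
done
echo "jars providing the factory: $found"
```

If the count is zero, the HBase Hadoop compatibility jar that the error message asks about is probably missing from that directory.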


Thanks, Shaofeng, for your help in solving this. I also added hbase-common.* and hbase-client, as I was getting other class-not-found errors.
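
For anyone hitting the same errors, here is a sketch of how such jars might be added. It assumes the HBase client libraries sit in the usual HDP location and that the jars belong in KYLIN_HOME/spark/jars. Both paths are assumptions, and the two compat jar patterns are inferred from the "hadoop compatibility jar" hint in the error message rather than taken from a reply in this thread.

```shell
# Both paths are assumptions; adjust them to your cluster.
KYLIN_HOME="${KYLIN_HOME:-/usr/local/apache-kylin-2.5.2-bin-hbase1x}"
HBASE_LIB="${HBASE_LIB:-/usr/hdp/current/hbase-client/lib}"
SPARK_JARS="$KYLIN_HOME/spark/jars"

copied=0
for pattern in 'hbase-hadoop-compat-*.jar' 'hbase-hadoop2-compat-*.jar' \
               'hbase-common-*.jar' 'hbase-client-*.jar'; do
  # $pattern is left unquoted on purpose so the shell expands the glob.
  for jar_file in "$HBASE_LIB"/$pattern; do
    [ -f "$jar_file" ] || continue    # glob matched nothing
    [ -d "$SPARK_JARS" ] || continue  # Kylin's Spark directory not present
    cp "$jar_file" "$SPARK_JARS/" && copied=$((copied + 1))
  done
done
echo "copied $copied jar(s) into $SPARK_JARS"
```

If the Spark jars were already packaged into an archive and uploaded to HDFS, that archive would need to be rebuilt and uploaded again for the executors to see the new jars.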

Thanks again!!