
Apache Kylin with Spark


I have set up Kylin on HDP and now want to build the cube using Spark instead of MapReduce. The cube builds fine with MapReduce, but when I try to use Spark, the job hangs. I also tried setting Hive's execution engine to Spark instead of Tez, but that does not work and I get the error java.lang.NoClassDefFoundError: org/apache/spark/SparkConf. Is there any way to run Hive on Spark in HDP, or any suggestion on how to run Kylin with Spark? Help is much appreciated.

1 ACCEPTED SOLUTION


Hello Somnath,

This is a known issue, recorded in https://issues.apache.org/jira/browse/KYLIN-3607.

The workaround is: "After I added hbase-hadoop2-compat-*.jar and hbase-hadoop-compat-*.jar into $KYLIN_HOME/spark/jars, then it worked."

The two jar files can be found in HBase's lib folder. Since you have already built the Spark assembly jar, you will need to re-package it and upload it to HDFS again.

After doing that, just resume Kylin's failed job; it will re-submit the Spark job and should succeed.


6 REPLIES


What versions of Kylin and HDP are you running? Please provide the full error trace for analysis. Also, please set SPARK_HOME to $KYLIN_HOME/spark before starting Kylin.
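Setting that variable typically looks like the following sketch (the install path is an assumption; adjust it to your own layout):

```shell
# Point Kylin at its bundled Spark before starting it.
# The KYLIN_HOME path below is an assumed install location.
export KYLIN_HOME=/usr/local/apache-kylin-2.5.2-bin
export SPARK_HOME=$KYLIN_HOME/spark
echo $SPARK_HOME
```

After exporting these, start Kylin as usual with $KYLIN_HOME/bin/kylin.sh start.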



Kylin version is 2.5.2 and HDP version is 2.6.5. I set SPARK_HOME to $KYLIN_HOME/spark.

I followed this document: http://kylin.apache.org/docs20/tutorial/cube_spark.html

Instead of copying the spark-assembly-1.6.3-hadoop2.6.0.jar, I built a fat jar of Spark 2.2 and copied it to the /kylin/spark directory in HDFS.

As mentioned, there is no error and I am able to log in to Kylin. But when I choose Spark as the execution engine for building the cube, it still runs a MapReduce job by default instead of Spark. Any suggestion on how to change the application type to Spark? Note also that in hive-site.xml I changed the execution engine from Tez to Spark.
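For what it's worth, Hive's execution engine in hive-site.xml does not control which engine Kylin uses for cube builds; Kylin selects its build engine per cube, and its Spark job is configured through kylin.engine.spark-conf.* entries in kylin.properties. A sketch of such entries (values are illustrative; verify the exact property names against your Kylin 2.5 configuration reference):

```properties
# kylin.properties (illustrative values; check the Kylin 2.5 docs)
kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.submit.deployMode=cluster
kylin.engine.spark-conf.spark.executor.memory=4G
# Location of the Spark assembly jar uploaded to HDFS (assumed path)
kylin.engine.spark-conf.spark.yarn.archive=hdfs:///kylin/spark/spark-libs.jar
```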


Hello Somnath,

Kylin 2.5 needs Spark 2.1 (not Spark 2.2). The guide for Kylin 2.5 is at https://kylin.apache.org/docs/tutorial/cube_spark.html (the link you provided above is the tutorial for Kylin 2.0, which does not match your version).

Please try this:

1) build the assembly jar from the spark 2.1 that shipped with Kylin;

2) copy the sample cube to a new one, edit it, and then change the "engine type" to "spark";

Give it a try; it should be a minor issue, as many users are running Spark now.
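For step 1, the packaging procedure in the Kylin tutorial looks roughly like this sketch (the install path and HDFS location are assumptions; follow the linked guide for the exact steps):

```shell
# Bundle the jars of the Spark that ships with Kylin into a single
# archive, then upload it to HDFS so the YARN executors can use it.
export KYLIN_HOME=/usr/local/apache-kylin-2.5.2-bin   # assumed install path
jar cv0f spark-libs.jar -C $KYLIN_HOME/spark/jars/ .
hadoop fs -mkdir -p /kylin/spark/
hadoop fs -put spark-libs.jar /kylin/spark/
```

Then point kylin.engine.spark-conf.spark.yarn.archive in kylin.properties at the uploaded spark-libs.jar.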


Thanks Shaofeng for your quick response. I am able to run with Spark now, but the job fails at the "Convert Cuboid Data to HFile" step.

I am getting the error below:

Caused by: java.lang.RuntimeException: Could not create  interface org.apache.hadoop.hbase.regionserver.MetricsRegionServerSourceFactory Is the hadoop compatibility jar on the classpath?
	at org.apache.hadoop.hbase.CompatibilitySingletonFactory.getInstance(CompatibilitySingletonFactory.java:73)
	at org.apache.hadoop.hbase.io.MetricsIO.<init>(MetricsIO.java:31)
	at org.apache.hadoop.hbase.io.hfile.HFile.<clinit>(HFile.java:192)
	... 15 more
Caused by: java.util.NoSuchElementException
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:365)
	at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
	at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
	at org.apache.hadoop.hbase.CompatibilitySingletonFactory.getInstance(CompatibilitySingletonFactory.java:59)
	... 17 more

19/01/07 17:02:23 WARN TaskSetManager: Lost task 1.1 in stage 1.0 (TID 13, data1.hdplab.com, executor 1): java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.hfile.HFile

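The accepted workaround amounts to something like the following sketch (the HBase lib path shown is a typical HDP 2.6.5 location and is an assumption; adjust it for your cluster):

```shell
# Copy HBase's hadoop-compatibility jars into Kylin's bundled Spark,
# then rebuild the assembly jar and re-upload it so executors see them.
export KYLIN_HOME=/usr/local/apache-kylin-2.5.2-bin   # assumed install path
cp /usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat-*.jar "$KYLIN_HOME/spark/jars/"
cp /usr/hdp/current/hbase-client/lib/hbase-hadoop-compat-*.jar  "$KYLIN_HOME/spark/jars/"
jar cv0f spark-libs.jar -C $KYLIN_HOME/spark/jars/ .
hadoop fs -put -f spark-libs.jar /kylin/spark/
```

After that, resume the failed job from Kylin's Monitor page so it re-submits the Spark step.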


Thanks Shaofeng for your help in solving this. I also added hbase-common-*.jar and hbase-client-*.jar, as I was getting other class-not-found errors.

Thanks again!!