Created 01-04-2019 11:42 AM
I have set up Kylin on HDP and now want to build the cube using Spark instead of MapReduce. The cube builds fine with MapReduce, but when I try to use Spark the build hangs. I also tried setting the Hive execution engine to Spark instead of Tez, but that did not work and I got the error java.lang.NoClassDefFoundError: org/apache/spark/SparkConf. Is there any way to run Hive on Spark in HDP, or any suggestion on how to run Kylin with Spark? Help is much appreciated.
Created 01-05-2019 09:55 AM
What's the version of Kylin and HDP that you're running? Please provide the full error trace for analysis. Also, please set SPARK_HOME to KYLIN_HOME/spark before starting up Kylin.
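For reference, a minimal sketch of that startup sequence (the KYLIN_HOME path is an assumption; point it at your actual install directory):

```shell
# Hypothetical install path; adjust to your actual Kylin directory.
export KYLIN_HOME=/usr/local/apache-kylin-2.5.2-bin

# Use the Spark that ships with Kylin, as suggested above.
export SPARK_HOME=$KYLIN_HOME/spark

# Start Kylin only after both environment variables are in place.
$KYLIN_HOME/bin/kylin.sh start
```

Setting SPARK_HOME before startup ensures Kylin submits Spark jobs with its bundled, version-matched Spark rather than whatever Spark is on the cluster path.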
Created 01-07-2019 09:42 AM
Kylin version is 2.5.2 and HDP version is 2.6.5. I set SPARK_HOME to KYLIN_HOME/spark.
I followed this document: http://kylin.apache.org/docs20/tutorial/cube_spark.html
Instead of copying the spark-assembly-1.6.3-hadoop2.6.0.jar, I built a fat jar of Spark 2.2 and copied it to the /kylin/spark directory in HDFS.
As mentioned, there is no error and I am able to log in to Kylin. But when I choose Spark as the execution engine for building the cube, it still runs a MapReduce job by default instead of Spark. Any suggestion on how to change the application type to Spark? Also note that in hive-site.xml I have changed the execution engine from Tez to Spark.
Created 01-07-2019 10:19 AM
Hello Somnath,
Kylin 2.5 needs Spark 2.1, not Spark 2.2. The guide for Kylin 2.5 is at https://kylin.apache.org/docs/tutorial/cube_spark.html (the link you provided is the tutorial for Kylin 2.0, which does not match your version).
Please try this:
1) build the assembly jar from the Spark 2.1 that ships with Kylin;
2) copy the sample cube to a new one, edit it, and change the "Engine Type" to "Spark".
Give it a try; it should be a minor issue, as many users are running Spark now.
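For step 1, the Kylin 2.5 tutorial linked above packages the bundled Spark jars into one assembly and uploads it to HDFS, roughly like this (the install path and NameNode address are assumptions for your cluster):

```shell
export KYLIN_HOME=/usr/local/apache-kylin-2.5.2-bin   # hypothetical path

# Package the jars of the Spark 2.1 bundled with Kylin into one assembly.
jar cv0f spark-libs.jar -C $KYLIN_HOME/spark/jars/ .

# Upload the assembly to HDFS so the YARN executors can fetch it.
hadoop fs -mkdir -p /kylin/spark/
hadoop fs -put -f spark-libs.jar /kylin/spark/
```

Then point Kylin at the uploaded archive in kylin.properties, e.g. kylin.engine.spark-conf.spark.yarn.archive=hdfs://namenode:8020/kylin/spark/spark-libs.jar (replace the NameNode host and port with your cluster's values).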
Created 01-07-2019 05:14 PM
Thanks Shaofeng for your quick response. I am able to run with Spark now, but the job fails at the "Convert Cuboid Data to HFile" step.
Getting the below error now:
Caused by: java.lang.RuntimeException: Could not create interface org.apache.hadoop.hbase.regionserver.MetricsRegionServerSourceFactory Is the hadoop compatibility jar on the classpath?
    at org.apache.hadoop.hbase.CompatibilitySingletonFactory.getInstance(CompatibilitySingletonFactory.java:73)
    at org.apache.hadoop.hbase.io.MetricsIO.<init>(MetricsIO.java:31)
    at org.apache.hadoop.hbase.io.hfile.HFile.<clinit>(HFile.java:192)
    ... 15 more
Caused by: java.util.NoSuchElementException
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:365)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
    at org.apache.hadoop.hbase.CompatibilitySingletonFactory.getInstance(CompatibilitySingletonFactory.java:59)
    ... 17 more
19/01/07 17:02:23 WARN TaskSetManager: Lost task 1.1 in stage 1.0 (TID 13, data1.hdplab.com, executor 1): java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.hfile.HFile
Created 01-08-2019 01:29 AM
Hello Somnath,
This is a known issue and recorded in https://issues.apache.org/jira/browse/KYLIN-3607.
The workaround is: "After I added hbase-hadoop2-compat-*.jar and hbase-hadoop-compat-*.jar into $KYLIN_HOME/spark/jars, then it worked."
The two jar files can be found in HBase's lib folder. Since you already built the Spark assembly jar, you will need to re-package it and upload it to HDFS again.
After doing that, just resume Kylin's failed job; it will re-submit the Spark job. Ideally, it will then succeed.
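A sketch of the workaround, assuming a typical HDP layout where the HBase libs live under /usr/hdp/current/hbase-client/lib (both paths below are assumptions; adjust to your cluster):

```shell
export KYLIN_HOME=/usr/local/apache-kylin-2.5.2-bin   # hypothetical path
HBASE_LIB=/usr/hdp/current/hbase-client/lib           # assumed HDP location

# Add the two HBase compatibility jars to Kylin's bundled Spark.
cp $HBASE_LIB/hbase-hadoop-compat-*.jar  $KYLIN_HOME/spark/jars/
cp $HBASE_LIB/hbase-hadoop2-compat-*.jar $KYLIN_HOME/spark/jars/

# Re-package the Spark assembly jar and overwrite the copy in HDFS.
jar cv0f spark-libs.jar -C $KYLIN_HOME/spark/jars/ .
hadoop fs -put -f spark-libs.jar /kylin/spark/
```

After the upload, resume the failed job from the Kylin web UI; the re-submitted Spark step will pick up the new assembly.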
Created 01-08-2019 04:38 PM
Thanks Shaofeng for your help in solving this. I also added hbase-common-*.jar and hbase-client-*.jar, as I was getting other class-not-found errors.
Thanks again!!