Member since: 06-09-2016
Posts: 529
Kudos Received: 129
Solutions: 104
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1733 | 09-11-2019 10:19 AM |
| | 9325 | 11-26-2018 07:04 PM |
| | 2486 | 11-14-2018 12:10 PM |
| | 5320 | 11-14-2018 12:09 PM |
| | 3145 | 11-12-2018 01:19 PM |
05-16-2018
07:18 PM
@David Sandoval What version of HDP are you running this with? I believe the missing class was only added starting with HDP 2.6.1. I also noticed you are using Spark 2.1 with Scala 2.10 - Spark 2.1.0 is built against Scala 2.11, so you should change that as well. HTH

*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
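As a quick way to confirm which Spark and Scala versions your session is actually running, a minimal sketch like this can be pasted into spark-shell or a Zeppelin %spark paragraph (the printed values will of course vary with your build):

// print the Spark version of the running session
println(spark.version)
// print the Scala version the shell was launched with, e.g. "version 2.11.8"
println(util.Properties.versionString)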
05-16-2018
06:09 PM
@Clément Dumont - If I'm correct, those errors are showing up during the Ambari startup operations for HDFS and Hive. This means Ambari is trying to reach the Ranger Admin UI and failing to communicate for some reason. Ambari uses the following configuration settings for the URL: ranger.plugin.hdfs.policy.rest.url for HDFS and ranger.plugin.hive.policy.rest.url for Hive. I suggest you check under HDFS > Configs that ranger.plugin.hdfs.policy.rest.url points correctly to the Ranger UI URL, and likewise under Hive > Configs that ranger.plugin.hive.policy.rest.url points correctly to the Ranger UI URL. HTH

*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
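As a basic connectivity sanity check from the affected host, a minimal Scala sketch like the one below can confirm whether the Ranger Admin UI is reachable at all (the URL is a placeholder - substitute your actual ranger.plugin.hdfs.policy.rest.url value):

import java.net.{HttpURLConnection, URL}

// placeholder - use the value of ranger.plugin.hdfs.policy.rest.url
val rangerUrl = new URL("http://ranger-admin.example.com:6080")
val conn = rangerUrl.openConnection().asInstanceOf[HttpURLConnection]
conn.setConnectTimeout(5000)  // fail fast if the host is unreachable
conn.setRequestMethod("GET")
println(s"Ranger Admin responded with HTTP ${conn.getResponseCode}")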
05-15-2018
09:17 PM
@Abdul Rahim It seems the problem you have now is different. It looks like the input data is not being split on commas. Make sure the map _.split(",") is actually working, because it seems it is not in your case. Also, please accept the other answer I provided, as it solved the original parsing issue you had.
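As a sanity check, you can try the split on a single sample line before running the full job (a minimal sketch with a made-up line in the expected five-column format):

// made-up sample line - replace with a real line from your file
val line = "1,Fruit of the Loom Girls Socks,7.97,0.6,8.57"
val fields = line.split(",").map(_.trim)
require(fields.length == 5, s"expected 5 fields but got ${fields.length}")
fields.foreach(println)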
05-15-2018
02:24 PM
1 Kudo
@Abdul Rahim The error is caused because you are parsing a string that contains a double into a long. Instead, you should parse it into a double. The following code works fine for me:

// case class describing one CSV row
case class Person(index: Long, item: String, cost: Double, Tax: Double, Total: Double)

// read the CSV, trim each field, and map the columns onto Person
val peopleDs = sc.textFile("hdpcd/Samplecsv")
  .map(_.split(",").map(_.trim))
  .map(attributes => Person(attributes(0).toLong, attributes(1), attributes(2).toDouble, attributes(3).toDouble, attributes(4).toDouble))
  .toDF()

peopleDs.createOrReplaceTempView("people")
val res = spark.sql("SELECT * FROM people")
res.collect()
Results:

defined class Person
peopleDs: org.apache.spark.sql.DataFrame = [index: bigint, item: string ... 3 more fields]
res: org.apache.spark.sql.DataFrame = [index: bigint, item: string ... 3 more fields]
res24: Array[org.apache.spark.sql.Row] = Array([1,Fruit of the Loom Girls Socks,7.97,0.6,8.57], [2,Rawlings Little League Baseball,2.97,0.22,3.19], [3,Secret Antiperspirant,1.29,0.1,1.39], [4,Deadpool DVD,14.96,1.12,16.08], [5,Maxwell House Coffee 28 oz,7.28,0.55,7.83], [6,Banana Boat Sunscreen,6.68,0.5,7.18], [7,Wrench Set,10.0,0.75,10.75], [8,M and Mz,8.98,0.67,9.65], [9,Bertoli Alfredo Sauce,2.12,0.16,2.28], [10,Large Paperclips,6.19,0.46,6.65])

Note: If you comment on this post, make sure you tag my name. And if you found this answer helped address your question, please take a moment to login and click the "accept" link on the answer.
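For completeness, here is a minimal REPL illustration of why the original code failed - a string holding a double cannot be parsed with .toLong (the res numbering will vary):

scala> "7.97".toDouble
res0: Double = 7.97

scala> "7.97".toLong
java.lang.NumberFormatException: For input string: "7.97"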
05-08-2018
06:06 PM
@Bhushan Kandalkar The above steps look good to me. Do you see any errors in hiveserver2.log?
05-08-2018
05:59 PM
@Khouloud Landari Do you see it get stuck after those "WARN Service SparkUI could not bind to port 4041" messages? If that is the case, I think the problem may be that it is unable to start an application on YARN. What happens is that Spark 2 pyspark launches a YARN application on your cluster, and that is probably what is failing. Try this command and let me know if it works:

SPARK_MAJOR_VERSION=2 pyspark --master local --verbose

I would also advise you to check the Resource Manager logs; the RM logs can be found on the RM host under /var/log/hadoop-yarn. They will probably show what the problem is with YARN and why your zeppelin user is not able to start applications on the Hadoop cluster. HTH
05-08-2018
02:19 PM
@Bhushan Kandalkar Did you add the hive certificate to the Knox host cacerts and restart Knox? This may help resolve the problem.

# open a console to the knox host
# run the following command to locate the jdk used by knox
ps -ef | grep -i knox
# run the following command to import the hive certificate into the default cacerts truststore
keytool -import -file hive.crt -keystore /<knox_jdk_path>/jre/lib/security/cacerts -storepass changeit -alias hive
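If you want to verify that the certificate actually landed in that truststore, a minimal Scala sketch like the one below checks for the alias (this assumes a Scala REPL is available on the host, that the store password is the default changeit, and reuses the same placeholder path as the keytool command above):

import java.io.FileInputStream
import java.security.KeyStore

// same placeholder path as in the keytool command - substitute the real one
val cacertsPath = "/<knox_jdk_path>/jre/lib/security/cacerts"
val ks = KeyStore.getInstance(KeyStore.getDefaultType)
ks.load(new FileInputStream(cacertsPath), "changeit".toCharArray)  // assumed default truststore password
println(s"hive alias present: ${ks.containsAlias("hive")}")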
Note: If you add any comments to this post, please make sure you tag my name. Also, if you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
05-08-2018
01:49 PM
@RAUI I did run it more than once; I edited the previous comment mentioning the same. No errors even after several executions of the same code. I'm using Spark 2.2.0 on HDP 2.6.4. Could you provide the full error stack? Also, did you use a specific location for the database? And are you running with master yarn or local?
05-08-2018
01:05 PM
@RAUI What version of HDP and Spark are you using? I tested the same using HDP 2.6.4 on Zeppelin and it works fine with Spark 2. I ran the following code more than once and it always completed with no errors:

spark.sql("show databases").show
spark.sql("CREATE DATABASE IF NOT EXISTS abc LOCATION '/user/zeppelin/abc.db'")
+------------+
|databaseName|
+------------+
| abc|
| default|
+------------+
res27: org.apache.spark.sql.DataFrame = []

Please provide the full error stack and details of the Spark/HDP version you are using. Note: Please tag me if you add a comment to this post, using the @ symbol and my name.
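Also worth checking on your side: confirm where the database actually lives, since a stale or inaccessible location can make a re-run fail (a minimal sketch, assuming the same Spark 2 session as above):

// prints the database name, description and location
spark.sql("DESCRIBE DATABASE abc").show(truncate = false)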
05-08-2018
12:52 PM
1 Kudo
@Khouloud Landari The error message is very generic. To help find the solution, please provide the following:
1. Check /var/log/zeppelin/zeppelin-interpreter-spark2-spark-zeppelin-*.log and copy the pieces you consider worth sharing.
2. From the Zeppelin UI > Interpreter, take a screenshot of the spark2 interpreter configuration and share it. Also try restarting the interpreter and check whether that helps.
3. Run the following command from the Zeppelin host and copy the console output to this post: SPARK_MAJOR_VERSION=2 pyspark --master yarn --verbose
With this information we should be able to draw further conclusions as to what is causing the issue. Note: If you add a comment to this post, please make sure you tag me using the @ symbol and my name. That way I will know you have updated the post with more information.