Member since: 11-04-2016
Posts: 74
Kudos Received: 16
Solutions: 7

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3247 | 02-28-2019 03:22 AM |
| | 2884 | 02-01-2019 01:15 AM |
| | 4119 | 04-16-2018 03:38 AM |
| | 32485 | 09-16-2017 04:36 AM |
| | 9044 | 09-11-2017 02:43 PM |
09-11-2017 02:43 PM
I had to link hive-site.xml into the Spark conf dir manually:

```bash
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export SPARK_CONF_DIR=$SPARK_HOME/conf
sudo ln -s /etc/hive/conf/hive-site.xml $SPARK_CONF_DIR
```

I am not sure why this is not already in the Spark conf. Problem has been solved.
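For anyone hitting the same thing: a quick sanity check I would run to confirm the link took effect (assuming a spark2-shell session as in my earlier post) is to list the databases again; all the warehouse databases should now show up instead of just `default`:

```scala
// With hive-site.xml linked into the Spark conf dir, the session should
// connect to the Hive metastore and see every warehouse database.
spark.sql("show databases").show()

// "bikeshare" is the example database from earlier in this thread.
spark.catalog.listTables("bikeshare").show()
```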
09-11-2017 11:47 AM
1 Kudo
Hi, you can change your hue.ini via Cloudera Manager (take a look at the beeswax section: https://github.com/cloudera/hue/blob/master/desktop/conf.dist/hue.ini):

```ini
[beeswax]
# A limit to the number of rows that can be downloaded from a query before it is truncated.
# A value of -1 means there will be no limit.
## download_row_limit=100000
```

Just navigate to your Hue service -> Configuration -> search for "hue_safety_valve.ini". As you can see, the rest is pretty easy.

PS: I was always wondering where this 100K was coming from, so thanks for pointing it out 🙂

Cheers!
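For instance, to lift the limit entirely, a snippet like the following should work in the safety valve, since the comment above says -1 means no limit (adjust the value to whatever cap you actually want):

```ini
[beeswax]
# -1 removes the download cap; any positive integer sets a new limit.
download_row_limit=-1
```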
09-11-2017 09:18 AM
Hi, I switched to Spark 2.2 and the latest version of Livy, and now I am having a problem with Hive. For instance, this works in spark2-shell:

```scala
scala> spark.sql("select * from bikeshare.trips")
res5: org.apache.spark.sql.DataFrame = [tripid: int, duration: int ... 9 more fields]
```

But the same gives me an error in a Hue notebook (Spark 2.2).

In Zeppelin:

```
spark.sql("select * from bikeshare.trips")
org.apache.spark.sql.AnalysisException: Table or view not found: `bikeshare`.`trips`; line 1 pos 14;
```

In Hue:

```
spark.sql("select * from bikshare.trips")
'Project [*]
+- 'UnresolvedRelation `bikshare`.`trips`
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:82)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:78)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:78)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:52)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:66)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623)
  ... 47 elided
```

spark.catalog shows an empty table list:

```
spark.catalog.listTables.show
+----+--------+-----------+---------+-----------+
|name|database|description|tableType|isTemporary|
+----+--------+-----------+---------+-----------+
+----+--------+-----------+---------+-----------+
```

Listing the databases in spark-shell displays all of them, but in Hue/Livy:

```
spark.sql("show databases").show
+------------+
|databaseName|
+------------+
|     default|
+------------+
```

So of course bikeshare doesn't exist from Livy:

```
spark.sql("use bikshare")
org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'bikshare' not found;
```

Is there something I am missing here? It used to work in Spark 1.6 with sqlContext. It's as if it couldn't load the metastore metadata about the warehouse directories etc.

Update: Hive Service for Spark 2 is enabled.

Many thanks, Maziyar
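For what it's worth, here is a check I could run to narrow this down (assuming Spark 2.x exposes the catalog implementation through the session conf, which I have not verified): ask the session which catalog it was built with. If it reports in-memory rather than hive, the session never connected to the metastore at all:

```scala
// "hive" means the session is wired to the Hive metastore;
// "in-memory" means Spark fell back to its built-in catalog.
spark.conf.get("spark.sql.catalogImplementation")
```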
Labels:
- Apache Hive
- Apache Spark
- Cloudera Hue
08-05-2017 12:57 PM
1 Kudo
Hello, I have a problem with Spark 2.2 (latest CDH 5.12.0) and saving a DataFrame into a Hive table.

Things I can do:
1. I can easily read Hive tables in Spark 2.2.
2. I can saveAsTable in Spark 1.6 into a Hive table and read it from Spark 2.2.
3. I can write.saveAsTable in Spark 2.2 and see the files and data inside the Hive table.

Things I cannot do in Spark 2.2:
4. When I read a Hive table saved by Spark 2.2 in spark2-shell, it shows empty rows. It has all the fields and the schema, but no data.

I don't understand what could cause this problem. Any help would be appreciated.

Example:

```scala
scala> val df = sc.parallelize(
     |   Seq(
     |     ("first", Array(2.0, 1.0, 2.1, 5.4)),
     |     ("test", Array(1.5, 0.5, 0.9, 3.7)),
     |     ("choose", Array(8.0, 2.9, 9.1, 2.5))
     |   ), 3
     | ).toDF
df: org.apache.spark.sql.DataFrame = [_1: string, _2: array<double>]

scala> df.show
+------+--------------------+
|    _1|                  _2|
+------+--------------------+
| first|[2.0, 1.0, 2.1, 5.4]|
|  test|[1.5, 0.5, 0.9, 3.7]|
|choose|[8.0, 2.9, 9.1, 2.5]|
+------+--------------------+

scala> df.write.saveAsTable("database.test")

scala> val savedDF = spark.sql("SELECT * FROM database.test")
res45: org.apache.spark.sql.DataFrame = [_1: string, _2: array<double>]

scala> savedDF.show
+---+---+
| _1| _2|
+---+---+
+---+---+

scala> savedDF.count
res55: Long = 0
```

Thanks
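For what it's worth, here is a diagnostic I could run next (a sketch, not a known fix): check where the catalog thinks the table lives versus where the files actually landed, and refresh the cached file listing in case it is stale:

```scala
// Show the table's Location/serde as the catalog records them; a surprising
// Location would explain getting the schema back but zero rows.
spark.sql("DESCRIBE FORMATTED database.test").show(100, truncate = false)

// If files changed behind the catalog's back, refresh the cached listing.
spark.catalog.refreshTable("database.test")
```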
Labels:
- Apache Hive
- Apache Spark
07-29-2017 02:56 PM
It worked! I just ran the script on the node that is my Livy server. Thank you 🙂
07-28-2017 06:20 AM
1 Kudo
Hello, I have both Spark 1.6 and Spark 2.2 installed in my cluster through CDH. Normally my Livy server starts with the default Spark 1.6, but now I want to start Livy with Spark 2.2, which I figured I could do by changing SPARK_HOME to point at SPARK2.

Previously:

```bash
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export SPARK_CONF_DIR=$SPARK_HOME/conf
```

New:

```bash
export SPARK_HOME=/opt/cloudera/parcels/SPARK2
export SPARK_CONF_DIR=$SPARK_HOME/meta
```

But this leads to an error, which is understandable, since Spark 2 in Cloudera is spark2-submit:

```
Exception in thread "main" java.io.IOException: Cannot run program "/opt/cloudera/parcels/SPARK2/bin/spark-submit": error=2, No such file or directory
```

Is there any way to configure Livy to find the right spark2-submit rather than the default name? I looked everywhere in the config and the code, but maybe I missed something.

Many thanks, Maziyar
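One workaround I am considering (an unverified assumption about the parcel layout, so treat it as a sketch): the SPARK2 parcel appears to keep a plain spark-submit under lib/spark2/bin, so pointing SPARK_HOME there might let Livy find it under the default name:

```bash
# Hypothetical: the SPARK2 parcel nests the actual client under lib/spark2,
# where bin/spark-submit exists under its usual (non-suffixed) name.
export SPARK_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2
export SPARK_CONF_DIR=/etc/spark2/conf
```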
Labels:
- Apache Spark
- Cloudera Hue
02-06-2017 11:15 AM
Same for me on CDH 5.10 and the latest Livy. Everything else is OK as long as Hive is not involved.
02-04-2017 04:59 AM
OK, here is the only thing that worked: I updated the DBS table in the warehouse database in MySQL with the correct URI. After that, alter table .. set location worked on all the existing tables. So I am not sure whether there is a bug in "/usr/lib/cmf/service/hive/hive.sh" when you use "Update Hive Metastore NameNodes", or whether that command should only be used when you have HA enabled (I didn't!). Either way, it is what added the duplicate ports. Best, Maziyar
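For reference, the manual fix amounted to something like this against the metastore database (a sketch using the standard Hive metastore schema, where DBS.DB_LOCATION_URI holds each database's URI, and the duplicated-port value from my metatool attempt below; back up the metastore before running anything like it):

```sql
-- Strip the duplicated port that "Update Hive Metastore NameNodes" introduced.
UPDATE DBS
SET DB_LOCATION_URI = REPLACE(DB_LOCATION_URI,
                              'hdfs://hadoop-master-1:8020:8020',
                              'hdfs://hadoop-master-1:8020');
```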
02-04-2017 04:34 AM
I also tried metatool to update the location, but it didn't work:

```
hive --config /etc/hive/conf/conf.server --service metatool -updateLocation "hdfs://hadoop-master-1:8020" "hdfs://hadoop-master-1:8020:8020"
Initializing HiveMetaTool..
HiveMetaTool:A valid host is required in both old-loc and new-loc
```

OK, now I have tried everything possible. There is no way to update the location, nor to drop the tables.
02-04-2017 03:56 AM
I have more info. After upgrading to CDH 5.10, I ran "Update Hive Metastore NameNodes" from Cloudera Manager, and that is what introduced the duplicate ports (as HiveMetaTool shows). I checked with the new table that was working: after updating the metastore NameNodes, it now has the duplicate port in its URI too. Is there a way to fix this in /usr/lib/cmf/service/hive/hive.sh? Many thanks, Maziyar