Member since: 11-04-2016
Posts: 74
Kudos Received: 16
Solutions: 7

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3247 | 02-28-2019 03:22 AM |
| | 2884 | 02-01-2019 01:15 AM |
| | 4119 | 04-16-2018 03:38 AM |
| | 32485 | 09-16-2017 04:36 AM |
| | 9044 | 09-11-2017 02:43 PM |
09-11-2017 02:43 PM
I had to link hive-site.xml into the Spark conf dir manually:

```bash
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export SPARK_CONF_DIR=$SPARK_HOME/conf
sudo ln -s /etc/hive/conf/hive-site.xml $SPARK_CONF_DIR
```

I am not sure why this is not already in the Spark conf. Problem has been solved.
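For anyone hitting the same thing: a quick sanity check I would run to confirm the link took effect (assuming a spark2-shell session as in my earlier post) is to list the databases again; all the warehouse databases should now show up instead of just `default`:

```scala
// With hive-site.xml linked into the Spark conf dir, the session should
// connect to the Hive metastore and see every warehouse database.
spark.sql("show databases").show()

// "bikeshare" is the example database from earlier in this thread.
spark.catalog.listTables("bikeshare").show()
```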
09-11-2017 11:47 AM
1 Kudo
Hi, you can change your hue.ini via Cloudera Manager (take a look at the beeswax section: https://github.com/cloudera/hue/blob/master/desktop/conf.dist/hue.ini):

```ini
[beeswax]
# A limit to the number of rows that can be downloaded from a query before it is truncated.
# A value of -1 means there will be no limit.
## download_row_limit=100000
```

Just navigate to your Hue service -> Configuration -> search for "hue_safety_valve.ini". As you can see, the rest is pretty easy.

PS: I was always wondering where this 100K was coming from, so thanks for pointing it out 🙂

Cheers!
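For instance, to lift the limit entirely, a snippet like the following should work in the safety valve, since the comment above says -1 means no limit (adjust the value to whatever cap you actually want):

```ini
[beeswax]
# -1 removes the download cap; any positive integer sets a new limit.
download_row_limit=-1
```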
09-11-2017 09:18 AM
Hi, I switched to Spark 2.2 and the latest version of Livy, and now I am having a problem with Hive. For instance, this works in spark2-shell:

```scala
scala> spark.sql("select * from bikeshare.trips")
res5: org.apache.spark.sql.DataFrame = [tripid: int, duration: int ... 9 more fields]
```

But the same gives me an error in a Hue notebook (Spark 2.2).

In Zeppelin:

```
spark.sql("select * from bikeshare.trips")
org.apache.spark.sql.AnalysisException: Table or view not found: `bikeshare`.`trips`; line 1 pos 14;
```

In Hue:

```
spark.sql("select * from bikshare.trips")
'Project [*]
+- 'UnresolvedRelation `bikshare`.`trips`
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:82)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:78)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:78)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:52)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:66)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623)
  ... 47 elided
```

spark.catalog shows an empty table list:

```
spark.catalog.listTables.show
+----+--------+-----------+---------+-----------+
|name|database|description|tableType|isTemporary|
+----+--------+-----------+---------+-----------+
+----+--------+-----------+---------+-----------+
```

Listing the databases in spark-shell displays all of them, but in Hue/Livy:

```
spark.sql("show databases").show
+------------+
|databaseName|
+------------+
|     default|
+------------+
```

So of course bikeshare doesn't exist from Livy:

```
spark.sql("use bikshare")
org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'bikshare' not found;
```

Is there something I am missing here? It used to work in Spark 1.6 with sqlContext. It's as if it couldn't load the metastore metadata about the warehouse directories etc.

Update: Hive Service for Spark 2 is enabled.

Many thanks, Maziyar
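For what it's worth, here is a check I could run to narrow this down (assuming Spark 2.x exposes the catalog implementation through the session conf, which I have not verified): ask the session which catalog it was built with. If it reports in-memory rather than hive, the session never connected to the metastore at all:

```scala
// "hive" means the session is wired to the Hive metastore;
// "in-memory" means Spark fell back to its built-in catalog.
spark.conf.get("spark.sql.catalogImplementation")
```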
Labels:
- Apache Hive
- Apache Spark
- Cloudera Hue
08-05-2017 12:57 PM
1 Kudo
Hello, I have a problem with Spark 2.2 (latest CDH 5.12.0) and saving a DataFrame into a Hive table.

Things I can do:
1. I can easily read Hive tables in Spark 2.2.
2. I can saveAsTable in Spark 1.6 into a Hive table and read it from Spark 2.2.
3. I can write.saveAsTable in Spark 2.2 and see the files and data inside the Hive table.

Things I cannot do in Spark 2.2:
4. When I read a Hive table saved by Spark 2.2 in spark2-shell, it shows empty rows. It has all the fields and the schema, but no data.

I don't understand what could cause this problem. Any help would be appreciated.

Example:

```scala
scala> val df = sc.parallelize(
     |   Seq(
     |     ("first", Array(2.0, 1.0, 2.1, 5.4)),
     |     ("test", Array(1.5, 0.5, 0.9, 3.7)),
     |     ("choose", Array(8.0, 2.9, 9.1, 2.5))
     |   ), 3
     | ).toDF
df: org.apache.spark.sql.DataFrame = [_1: string, _2: array<double>]

scala> df.show
+------+--------------------+
|    _1|                  _2|
+------+--------------------+
| first|[2.0, 1.0, 2.1, 5.4]|
|  test|[1.5, 0.5, 0.9, 3.7]|
|choose|[8.0, 2.9, 9.1, 2.5]|
+------+--------------------+

scala> df.write.saveAsTable("database.test")

scala> val savedDF = spark.sql("SELECT * FROM database.test")
res45: org.apache.spark.sql.DataFrame = [_1: string, _2: array<double>]

scala> savedDF.show
+---+---+
| _1| _2|
+---+---+
+---+---+

scala> savedDF.count
res55: Long = 0
```

Thanks
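For what it's worth, here is a diagnostic I could run next (a sketch, not a known fix): check where the catalog thinks the table lives versus where the files actually landed, and refresh the cached file listing in case it is stale:

```scala
// Show the table's Location/serde as the catalog records them; a surprising
// Location would explain getting the schema back but zero rows.
spark.sql("DESCRIBE FORMATTED database.test").show(100, truncate = false)

// If files changed behind the catalog's back, refresh the cached listing.
spark.catalog.refreshTable("database.test")
```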
Labels:
- Apache Hive
- Apache Spark
07-29-2017 02:56 PM
It worked! I just ran the script on the node that is my Livy server. Thank you 🙂
07-28-2017 06:20 AM
1 Kudo
Hello, I have both Spark 1.6 and Spark 2.2 installed in my cluster through CDH. Normally my Livy server starts with the default Spark 1.6, but now I want to start Livy with Spark 2.2, which I figured I could do by changing SPARK_HOME to point at SPARK2.

Previously:

```bash
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export SPARK_CONF_DIR=$SPARK_HOME/conf
```

New:

```bash
export SPARK_HOME=/opt/cloudera/parcels/SPARK2
export SPARK_CONF_DIR=$SPARK_HOME/meta
```

But this leads to an error, which is understandable, since Spark 2 in Cloudera is spark2-submit:

```
Exception in thread "main" java.io.IOException: Cannot run program "/opt/cloudera/parcels/SPARK2/bin/spark-submit": error=2, No such file or directory
```

Is there any way to configure Livy to find the right spark2-submit rather than the default name? I looked everywhere in the config and the code, but maybe I missed something.

Many thanks, Maziyar
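One workaround I am considering (an unverified assumption about the parcel layout, so treat it as a sketch): the SPARK2 parcel appears to keep a plain spark-submit under lib/spark2/bin, so pointing SPARK_HOME there might let Livy find it under the default name:

```bash
# Hypothetical: the SPARK2 parcel nests the actual client under lib/spark2,
# where bin/spark-submit exists under its usual (non-suffixed) name.
export SPARK_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2
export SPARK_CONF_DIR=/etc/spark2/conf
```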
Labels:
- Apache Spark
- Cloudera Hue
02-06-2017 11:15 AM
Same for me on CDH 5.10 and the latest Livy. Everything else is OK as long as Hive is not involved.
02-04-2017 04:59 AM
OK, here is the only thing that worked: I updated the DBS table in the warehouse database in MySQL with the correct URI. After that, alter table .. set location worked on all the existing tables. So I am not sure whether there is a bug in "/usr/lib/cmf/service/hive/hive.sh" when you use "Update Hive Metastore NameNodes", or whether that command should only be used when you have HA enabled (I didn't!). Either way, it is what added the duplicate ports. Best, Maziyar
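For reference, the manual fix amounted to something like this against the metastore database (a sketch using the standard Hive metastore schema, where DBS.DB_LOCATION_URI holds each database's URI, and the duplicated-port value from my metatool attempt below; back up the metastore before running anything like it):

```sql
-- Strip the duplicated port that "Update Hive Metastore NameNodes" introduced.
UPDATE DBS
SET DB_LOCATION_URI = REPLACE(DB_LOCATION_URI,
                              'hdfs://hadoop-master-1:8020:8020',
                              'hdfs://hadoop-master-1:8020');
```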
02-04-2017 04:34 AM
I also tried metatool to update the location, but it didn't work:

```
hive --config /etc/hive/conf/conf.server --service metatool -updateLocation "hdfs://hadoop-master-1:8020" "hdfs://hadoop-master-1:8020:8020"
Initializing HiveMetaTool..
HiveMetaTool:A valid host is required in both old-loc and new-loc
```

OK, now I have tried everything possible. There is no way to update the location, nor to drop the tables.
02-04-2017 03:56 AM
I have more info. After upgrading to CDH 5.10, I ran "Update Hive Metastore NameNodes" from Cloudera Manager, and that is what introduced the duplicate ports (as HiveMetaTool shows). I checked with the new table that was working: after updating the metastore NameNodes, it now has the duplicate port in its URI too. Is there a way to fix this in /usr/lib/cmf/service/hive/hive.sh? Many thanks, Maziyar