Created 06-01-2018 09:43 PM
I'm running spark-sql under the Hortonworks HDP 2.6.4 Sandbox environment on a VirtualBox VM. When I run SQL in pyspark via spark.sql("SELECT query details").show(), the column headings and borders appear by default. However, when I run queries from the spark-sql> prompt, no column headings are displayed by default, and I can't find any print.column.heading-type setting under the Spark2 configs in Ambari. Anyone know how to turn column header names on in spark-sql?
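For reference, the pyspark side looks roughly like this (a minimal sketch assuming the sparkpluralsight.customers table that appears later in the thread; in the pyspark shell the spark SparkSession already exists):

from pyspark.sql import SparkSession

# In the pyspark shell, `spark` already exists; the builder call below only
# matters if this snippet is run as a standalone script.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# DataFrame.show() prints column headings and ASCII borders on its own,
# independent of the hive.cli.print.header setting used by the CLI.
spark.sql("SELECT * FROM sparkpluralsight.customers").show()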
Created 06-01-2018 10:24 PM
@Kai Chaza Try to run spark-sql like this:
$ SPARK_MAJOR_VERSION=2 spark-sql --conf "spark.hadoop.hive.cli.print.header=true"

spark-sql> select * from test.test3_falbani;
id name
1 Felix
2 Jhon
Time taken: 3.015 seconds
You can also add the above config, spark.hadoop.hive.cli.print.header=true, to the Custom spark-defaults section using Ambari. I haven't been able to find a setting for the borders yet, but perhaps the above helps you find a solution.
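In spark-defaults form that setting is just a single whitespace-separated key/value line, e.g.:

spark.hadoop.hive.cli.print.header true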
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
Created 06-02-2018 12:35 AM
Hi Felix, thanks for responding. I ran spark-sql with the command you suggested above, but it's not working. I then added the line "spark.hadoop.hive.cli.print.header=true" to my custom Spark settings in Ambari, and I also changed the default "false" value of "hive.cli.print.header" in the advanced Hive settings in Ambari. I have restarted both Spark and Hive and I'm still not getting any column headings.
[root@sandbox-hdp ~]# SPARK_MAJOR_VERSION=2 spark-sql --conf "spark.hadoop.hive.cli.print.header=true"
SPARK_MAJOR_VERSION is set to 2, using Spark2
18/06/02 00:13:45 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/06/02 00:13:45 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
18/06/02 00:13:48 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
spark-sql> use sparkpluralsight;
Time taken: 2.701 seconds
spark-sql> select * from customers;
2222 Emily WA
1111 John WA
3333 Ricky WA
4444 Jane CA
5555 Amit NJ
6666 Nina NY
Time taken: 3.106 seconds, Fetched 6 row(s)
spark-sql>
Created 06-02-2018 02:13 AM
@Kai Chaza I was testing with HDP 2.6.5, sorry about that. For HDP 2.6.4 I was able to make it work by adding the following property:
<property>
  <name>hive.cli.print.header</name>
  <value>true</value>
</property>
to /etc/spark2/conf/hive-site.xml (you have to edit the file manually, not via Ambari).
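For orientation, the property goes inside that file's existing <configuration> element, roughly like this (a sketch; the file's other properties are omitted here and should be left as they are):

<configuration>
  <!-- ... existing properties unchanged ... -->
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
</configuration>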
Here are the results in HDP 2.6.4:
SPARK_MAJOR_VERSION=2 spark-sql
SPARK_MAJOR_VERSION is set to 2, using Spark2
18/06/02 02:06:33 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/06/02 02:06:33 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
spark-sql> select * from test;
id name
1 Felix
2 Jhon
Time taken: 3.321 seconds, Fetched 2 row(s)
Please test it and let me know if this works for you!
Created 06-02-2018 12:33 PM
It works, thanks a mill Felix. I had to make the change manually in the hive-site.xml file as per your suggestion and bypass Ambari. When I rebooted I still had column names.
spark-sql> use sparkpluralsight;
Response code
Time taken: 2.14 seconds
spark-sql> select * from customers;
ID NAME ADDRESS
2222 Emily WA
1111 John WA
3333 Ricky WA
4444 Jane CA
5555 Amit NJ
6666 Nina NY
Time taken: 2.815 seconds, Fetched 6 row(s)
spark-sql>
Created 06-04-2018 03:01 PM
@Kai Chaza Glad to know it worked for you as well. Please take a moment to login and click the "accept" link on the answer.
Created 06-02-2018 02:43 AM
@Kai Chaza I was checking the code differences between HDP 2.6.5 and HDP 2.6.4 and found the following two bug fixes, which are present in HDP 2.6.5 but not in HDP 2.6.4:
[SPARK-21637][SPARK-21451]
These explain why the --conf "spark.hadoop.hive.cli.print.header=true" option is not working on HDP 2.6.4 while it works fine on HDP 2.6.5.
Created 06-02-2018 12:57 AM
It works inline in hive at the command prompt, but it doesn't work in spark-sql at the command prompt.
Hive works fine:
[root@sandbox-hdp ~]# hive
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.
Logging initialized using configuration in file:/etc/hive/2.6.4.0-91/0/hive-log4j.properties
hive> USE SparkPluralsight;
OK
Time taken: 2.988 seconds
hive> SET hive.cli.print.header=true;
hive> SELECT * FROM Customers;
OK
customers.id customers.name customers.address
2222 Emily WA
1111 John WA
3333 Ricky WA
4444 Jane CA
5555 Amit NJ
6666 Nina NY
Time taken: 1.008 seconds, Fetched: 6 row(s)
hive>
But spark-sql doesn't work:
spark-sql> USE SparkPluralsight;
Time taken: 0.028 seconds
spark-sql> SET spark.hadoop.hive.cli.print.header=true;
spark.hadoop.hive.cli.print.header true
Time taken: 0.016 seconds, Fetched 1 row(s)
spark-sql> SELECT * FROM Customers;
2222 Emily WA
1111 John WA
3333 Ricky WA
4444 Jane CA
5555 Amit NJ
6666 Nina NY
Time taken: 0.212 seconds, Fetched 6 row(s)
spark-sql>