Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

How can I display column headings in spark-sql

avatar
Explorer

I'm running spark-sql under the Hortonworks HDP 2.6.4 Sandbox environment on a Virtualbox VM. Now, when I run SQL code in pyspark, which I'm running under spark.sql("SELECT query details").show(), the column headings and borders appear as default. However, when I run spark-sql queries from the spark-sql> prompt, there are no column headings showing as a default display, and I can't find any print.column.heading type settings in Ambari under spark2 config. Anyone know how to turn column header names on in spark-sql?

7 REPLIES 7

avatar

@Kai Chaza Try to run spark-sql like this:

$ SPARK_MAJOR_VERSION=2 spark-sql --conf "spark.hadoop.hive.cli.print.header=true"

spark-sql> select * from test.test3_falbani;
id	name
1	Felix
2	Jhon
Time taken: 3.015 seconds

You can also add the above config spark.hadoop.hive.cli.print.header=true to the Custom spark-defaults using ambari. I haven't been able to find one for the borders still but perhaps the above helps you in finding the the solution.

HTH

*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.

avatar
Explorer

Hi Felix, thanks for responding. I ran spark-sql with the above command prompt as you suggested but it's not working. I then added the line "spark.hadoop.hive.cli.print.header=true" to my custom Ambari settings under spark, and I also changed the default "false" value in advanced hive Ambari settings in "hive.cli.print.header". I have restarted both spark-sql and hive and I'm still not getting any column headings showing.

[root@sandbox-hdp ~]# SPARK_MAJOR_VERSION=2 spark-sql --conf "spark.hadoop.hive.cli.print.header=true"
SPARK_MAJOR_VERSION is set to 2, using Spark2
18/06/02 00:13:45 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/06/02 00:13:45 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
18/06/02 00:13:48 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
spark-sql> use sparkpluralsight;
Time taken: 2.701 seconds
spark-sql> select * from customers;
2222    Emily   WA
1111    John    WA
3333    Ricky   WA
4444    Jane    CA
5555    Amit    NJ
6666    Nina    NY
Time taken: 3.106 seconds, Fetched 6 row(s)
spark-sql>



avatar

@Kai Chaza I was testing with HDP 2.6.5, sorry about that. For HDP 2.6.4 I was able to make it work by adding the following property:

<property>
	<name>hive.cli.print.header</name>
	<value>true</value>
</property>

To the /etc/spark2/conf/hive-site.xml (you have to edit the file manually not via ambari)

Here are the results in HDP 2.6.4:

SPARK_MAJOR_VERSION=2 spark-sql
SPARK_MAJOR_VERSION is set to 2, using Spark2
18/06/02 02:06:33 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/06/02 02:06:33 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
spark-sql> select * from test;
id      name
1	Felix
2Jhon
Time taken: 3.321 seconds, Fetched 2 row(s)

Please test it and let me know if this works for you!

avatar
Explorer

It works, thanks a mill Felix. I had to make the change manually in the hive-site.xml file as per your suggestion and bypass Ambari. When I rebooted I still had column names.

spark-sql> use sparkpluralsight;
Response code
Time taken: 2.14 seconds
spark-sql> select * from customers;
ID      NAME    ADDRESS
2222    Emily   WA
1111    John    WA
3333    Ricky   WA
4444    Jane    CA
5555    Amit    NJ
6666    Nina    NY
Time taken: 2.815 seconds, Fetched 6 row(s)
spark-sql>



avatar

@Kai Chaza Glad to know it worked for you as well. Please take a moment to login and click the "accept" link on the answer.

avatar

@Kai Chaza I was checking the code differences between HDP 2.6.5 and HDP 2.6.4. And found the following 2 bugs fixes which are present on HDP 2.6.5 but not on HDP 2.6.4

[SPARK-21637][SPARK-21451]

These explain why the --conf "spark.hadoop.hive.cli.print.header=true" is not working on 2.6.4 while it works fine on HDP 2.6.5.

avatar
Explorer

It works inline in hive at the command prompt, but it doesn't work in spark-sql at the command prompt.

Hive works fine:

[root@sandbox-hdp ~]# hive
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.
Logging initialized using configuration in file:/etc/hive/2.6.4.0-91/0/hive-log4j.properties
hive> USE SparkPluralsight;
OK
Time taken: 2.988 seconds
hive> SET hive.cli.print.header=true;
hive> SELECT * FROM Customers;
OK
customers.id    customers.name  customers.address
2222    Emily   WA
1111    John    WA
3333    Ricky   WA
4444    Jane    CA
5555    Amit    NJ
6666    Nina    NY
Time taken: 1.008 seconds, Fetched: 6 row(s)
hive>


But spark-sql doesn't work:

spark-sql> USE SparkPluralsight;
Time taken: 0.028 seconds
spark-sql> SET spark.hadoop.hive.cli.print.header=true;
spark.hadoop.hive.cli.print.header      true
Time taken: 0.016 seconds, Fetched 1 row(s)
spark-sql> SELECT * FROM Customers;
2222    Emily   WA
1111    John    WA
3333    Ricky   WA
4444    Jane    CA
5555    Amit    NJ
6666    Nina    NY
Time taken: 0.212 seconds, Fetched 6 row(s)
spark-sql>