hive context save file

Expert Contributor

Hello,

I use HiveContext to load and manipulate data stored in ORC format. How can I save the results of a SQL query to a file on HDFS?

Could you help me, please?

Here is my HiveContext code. I would like to save the result of the hive_context query to a file on my HDFS:

Thank you in advance.

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext()
hive_context = HiveContext(sc)
qvol = hive_context.table("<bdd_name>.<table_name>")
qvol.registerTempTable("qvol_temp")
hive_context.sql("select * from qvol_temp limit 10").show()
1 ACCEPTED SOLUTION

Super Guru

@alain TSAFACK please use saveAsHadoopFile, which will write to HDFS:

saveAsHadoopFile(<file-name>, <file output format>, compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")

or

hive_context.write.format("orc").save("test_orc")

Expert Contributor

Thank you. However, here are the errors raised by both saveAsHadoopFile and .write.format:

4795-untitled.png

4796-untitled2.png

This means that neither of these two attributes is recognized by HiveContext.

Thank you!

Super Guru

saveAsHadoopFile applies to RDDs, not to DataFrames. Can you try hive_context.write.format("orc").save("test_orc")?

Expert Contributor

I tried with hive_context.write.format("orc").save("test_orc") but I receive this error:

>>> hive_context.write.format("orc").save("hdfs://dev/datalake/app/des/dev/transformer/test_orc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'HiveContext' object has no attribute 'write'

Thanks

Super Guru

Could you please modify your program as follows and see whether you still get an exception?

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext
sc = SparkContext()
sqlContext = HiveContext(sc)
sqlContext.sql("select * from default.aaa limit 3").write.format("orc").save("test_orc2")
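
The key point in the snippet above is that sql() returns a DataFrame, and it is the DataFrame (not the HiveContext) that exposes .write. As a minimal sketch of the same idea, the call can be wrapped in a small helper. The name save_query_result and its fmt and mode parameters are assumptions made for illustration and are not part of Spark; only sql(), write, format(), mode() and save() are Spark calls.

```python
def save_query_result(context, query, path, fmt="orc", mode="error"):
    """Run `query` on a HiveContext/SQLContext and write the result to `path`.

    Hypothetical helper for illustration; not a Spark API.
    """
    # sql() returns a DataFrame; the writer hangs off the DataFrame,
    # which is why hive_context.write raised AttributeError.
    df = context.sql(query)
    # mode="error" fails if the target path already exists;
    # "overwrite", "append" and "ignore" are the other save modes.
    df.write.format(fmt).mode(mode).save(path)
    return df


# Example usage (needs a Spark installation with Hive support):
#   from pyspark import SparkContext
#   from pyspark.sql import HiveContext
#   sc = SparkContext()
#   hive_context = HiveContext(sc)
#   save_query_result(hive_context,
#                     "select * from default.aaa limit 3",
#                     "test_orc2")
```

A relative path such as "test_orc2" is resolved against the default file system, so on a cluster configured for HDFS it typically lands under the user's HDFS home directory; a full hdfs:// URI can be passed instead.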

Expert Contributor
Hello, thank you.
It works now, and I also managed to write a Parquet file. I am now also looking to save the result as a CSV file and as text, if possible.
Regards
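
For the CSV and text case, one possible approach (a sketch that is not taken from this thread) is to turn each row into one line of text and write the underlying RDD with saveAsTextFile. The helper names row_to_csv_line and save_as_csv_text are hypothetical, and the code assumes Python 3. On Spark 2.0 and later the built-in df.write.csv(path) is usually simpler.

```python
import csv
import io


def row_to_csv_line(row):
    """Format one row (a Row or any tuple) as a single CSV line.

    Hypothetical helper; None becomes an empty field and values
    containing commas or quotes are quoted by the csv module.
    """
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    # writerow() appends a line terminator; saveAsTextFile adds its own.
    return buf.getvalue().rstrip("\r\n")


def save_as_csv_text(df, path):
    """Write a DataFrame as plain-text CSV lines under `path`.

    Assumes `df` is a Spark DataFrame; the output is a directory of
    part files, as with any saveAsTextFile call.
    """
    df.rdd.map(row_to_csv_line).saveAsTextFile(path)


# Example usage (needs a running Spark context):
#   result = hive_context.sql("select * from qvol_temp limit 10")
#   save_as_csv_text(result, "test_csv")
```

No header line is written with this approach, so the column names have to be tracked separately if they are needed.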

Expert Contributor
All good, I found it!
Thank you.