
HiveContext: save query results to a file


Hello,

I use HiveContext to load and manipulate data stored in ORC format. How can I save the results of a SQL query to a file in HDFS?

Could you help me, please?

Here is my HiveContext code. I would like to save the query results to a file on HDFS:

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext()
hive_context = HiveContext(sc)
qvol = hive_context.table("<bdd_name>.<table_name>")
qvol.registerTempTable("qvol_temp")
hive_context.sql("select * from qvol_temp limit 10").show()

Thank you in advance.
Accepted solution:

@alain TSAFACK Please use saveAsHadoopFile, which writes to HDFS:

saveAsHadoopFile(<file-name>, "<file output format>", compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")

or

hive_context.write.format("orc").save("test_orc")


Thank you. But both saveAsHadoopFile and .write.format raise errors (two screenshots of the errors were attached), which means HiveContext does not recognize either attribute.

Thank you!

saveAsHadoopFile applies to RDDs, not to DataFrames. Can you try hive_context.write.format("orc").save("test_orc")?
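Because saveAsHadoopFile and the other RDD save methods are not available on a DataFrame, one way to reach them is through the DataFrame's `rdd` attribute. The following is a minimal sketch under stated assumptions: it swaps in `RDD.saveAsTextFile` (simpler than saveAsHadoopFile, which needs key/value pairs and an OutputFormat class), the helper names `row_to_line` and `save_as_text` are hypothetical, and no CSV quoting is done, so field values are assumed not to contain the separator.

```python
def row_to_line(row, sep=","):
    """Join the fields of one Row (a tuple subclass) into a delimited string.

    None becomes an empty field. No quoting or escaping is performed, so
    values containing `sep` would corrupt the output (assumption: they don't).
    """
    return sep.join("" if value is None else str(value) for value in row)


def save_as_text(df, path, sep=",", codec=None):
    """Write a DataFrame as plain-text lines through its underlying RDD.

    `df.rdd` exposes the DataFrame as an RDD of Row objects, which makes the
    RDD-only save methods available. saveAsTextFile is used here instead of
    saveAsHadoopFile. `codec` is an optional Hadoop codec class name such as
    "org.apache.hadoop.io.compress.GzipCodec" (None means uncompressed).
    """
    lines = df.rdd.map(lambda row: row_to_line(row, sep))
    lines.saveAsTextFile(path, compressionCodecClass=codec)
```

For example, with the `qvol` DataFrame from the question, `save_as_text(qvol, "hdfs:///tmp/qvol_txt")` would write one comma-separated line per row (the path is a placeholder). Spark writes a directory of part files at that path, not a single file.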


I tried hive_context.write.format("orc").save("test_orc"), but I get this error:

>>> hive_context.write.format("orc").save("hdfs://dev/datalake/app/des/dev/transformer/test_orc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'HiveContext' object has no attribute 'write'

Thanks

Could you please modify your program as follows and check whether you still get an exception? The write attribute belongs to the DataFrame returned by sql(), not to HiveContext itself:

from pyspark import SparkContext
from pyspark.sql import HiveContext
sc = SparkContext()
sqlContext = HiveContext(sc)
# sql() returns a DataFrame; .write is defined on the DataFrame, not on HiveContext
sqlContext.sql("select * from default.aaa limit 3").write.format("orc").save("test_orc2")
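The working pattern above can be wrapped in a small reusable helper. This is a hedged sketch: `save_query_result` is a hypothetical name, and the `mode` argument (DataFrameWriter.mode) is an addition not used in the thread. Its default, "error", makes the write fail if the target path already exists, which matches Spark's default save behavior.

```python
def save_query_result(sql_context, query, path, fmt="orc", mode="error"):
    """Run `query` through a HiveContext (or SQLContext) and save the result.

    HiveContext has no `write` attribute: sql() returns a DataFrame, and the
    DataFrame's `write` property returns a DataFrameWriter.
    `mode` is one of "error" (fail if `path` exists), "overwrite",
    "append" or "ignore".
    """
    df = sql_context.sql(query)                 # DataFrame with the query result
    df.write.format(fmt).mode(mode).save(path)  # e.g. fmt="orc" or "parquet"
    return df
```

For example, `save_query_result(sqlContext, "select * from default.aaa limit 3", "test_orc2")` reproduces the call above. A relative path such as `test_orc2` is typically resolved against the user's home directory on the default filesystem (usually `/user/<name>` on HDFS); passing a full `hdfs://` URI makes the target explicit. As with any Spark save, the result is a directory of part files.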

Hello, thank you.
That works. I also found how to save as a Parquet file. Now I would also like to save as CSV and as plain text, if possible.
Regards
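For the CSV and Parquet follow-up, a sketch along the same lines, assuming Spark 2.0 or later for the built-in `csv` source (on Spark 1.x, CSV output required the external spark-csv package, format name `com.databricks.spark.csv`). The helper names are hypothetical. For plain text, `DataFrameWriter.text` (Spark 1.6+) expects a DataFrame with a single string column.

```python
def save_as_csv(df, path, header=True, mode="error"):
    """Write a DataFrame as CSV via the built-in "csv" source (Spark 2.0+).

    The "header" option writes the column names as the first line of each
    part file.
    """
    (df.write
       .format("csv")
       .option("header", "true" if header else "false")
       .mode(mode)
       .save(path))


def save_as_parquet(df, path, mode="error"):
    """Write a DataFrame as Parquet using DataFrameWriter.parquet."""
    df.write.mode(mode).parquet(path)
```

For example, `save_as_csv(sqlContext.sql("select * from default.aaa limit 3"), "test_csv")` would produce a directory of CSV part files with a header line in each.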

Never mind, I found it!
Thank you.