hive context save file
Labels: Apache Hive
Created ‎06-07-2016 07:05 AM
Hello
I work with HiveContext to load and manipulate data stored in ORC format. I would now like to know how to save the results of a SQL query to a file on HDFS.
Can you please help me?
Thank you in advance.
Here is my HiveContext code; I would like to save the query results to a file on my HDFS:
from pyspark.sql import HiveContext
from pyspark import SparkContext

sc = SparkContext()
hive_context = HiveContext(sc)

qvol = hive_context.table("<bdd_name>.<table_name>")
qvol.registerTempTable("qvol_temp")
hive_context.sql("select * from qvol_temp limit 10").show()
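For reference, in Spark 1.x both hive_context.table(...) and hive_context.sql(...) return DataFrames, which is what the replies below rely on. A minimal sketch, reusing the names above, to confirm what you are holding before trying to save anything:

result = hive_context.sql("select * from qvol_temp limit 10")   # a DataFrame, not raw rows
print(type(qvol))        # <class 'pyspark.sql.dataframe.DataFrame'>
print(type(result))      # same class
result.printSchema()     # inspect the columns before writing to HDFS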
Created ‎06-07-2016 07:11 AM
@alain TSAFACK please use saveAsHadoopFile, which will write to HDFS:
saveAsHadoopFile("<file-name>", "<file output format>", compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")
or
hive_context.write.format("orc").save("test_orc")
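For clarity, a minimal sketch of how each route is typically used in Spark 1.x, reusing the table names from the question (the output paths are placeholders): the RDD save methods (saveAsHadoopFile, or the simpler saveAsTextFile shown here) operate on an RDD, while .write is an attribute of a DataFrame rather than of the HiveContext itself.

# RDD route: turn the query result into an RDD of strings and write gzipped text
rows = hive_context.sql("select * from qvol_temp limit 10") \
                   .rdd.map(lambda row: ",".join(str(c) for c in row))
rows.saveAsTextFile("/tmp/qvol_txt",
                    compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")

# DataFrame route: call .write on the DataFrame, not on hive_context
df = hive_context.table("<bdd_name>.<table_name>")
df.write.format("orc").save("/tmp/qvol_orc")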
Created on ‎06-07-2016 07:36 AM - edited ‎08-19-2019 03:14 AM
Thank you. But here are the errors generated by the two methods, saveAsHadoopFile and .write.format:
This means that these two attributes are not recognized by HiveContext.
Thank you !!!
Created ‎06-07-2016 07:38 AM
saveAsHadoopFile is applicable to RDDs, not DataFrames. Can you try hive_context.write.format("orc").save("test_orc")?
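A small clarification on where these methods live (a sketch, assuming Spark 1.x): .write belongs to the DataFrame returned by hive_context.table() or hive_context.sql(), not to the HiveContext object itself, so the call needs a DataFrame in front of it.

df = hive_context.table("<bdd_name>.<table_name>")   # DataFrame
df.write.format("orc").save("test_orc")              # .write is a DataFrame attribute
# df.rdd gives the underlying RDD, which is where saveAsHadoopFile / saveAsTextFile apply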
Created ‎06-07-2016 07:49 AM
I tried with hive_context.write.format("orc").save("test_orc") but I receive this error:
>>> hive_context.write.format("orc").save("hdfs://dev/datalake/app/des/dev/transformer/test_orc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'HiveContext' object has no attribute 'write'
Thanks
Created ‎06-07-2016 11:37 AM
Could you please modify your program in this way and see if you still get any exception?
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

sc = SparkContext()
sqlContext = HiveContext(sc)
sqlContext.sql("select * from default.aaa limit 3").write.format("orc").save("test_orc2")
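As a follow-up, a minimal variant of the same program that writes to an explicit HDFS path, overwrites any previous output, and reads the result back to verify (the HDFS path here is hypothetical):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext()
sqlContext = HiveContext(sc)

result = sqlContext.sql("select * from default.aaa limit 3")
result.write.format("orc").mode("overwrite") \
      .save("hdfs:///user/alain/test_orc2")          # hypothetical HDFS path

check = sqlContext.read.format("orc").load("hdfs:///user/alain/test_orc2")
check.show()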