Support Questions

nanyim_alain · ‎06-07-2016

Hello

I work with HiveContext to load and manipulate data stored in ORC format. How can I save the results of a SQL query to a file in HDFS?

Can you help me, please?

Here is my HiveContext code. I would like to save the contents of hive_context to a file in my HDFS:

Thank you in advance.

from pyspark.sql import HiveContext
from pyspark import SparkContext
sc = SparkContext()
hive_context = HiveContext(sc)
qvol = hive_context.table("<bdd_name>.<table_name>")
qvol.registerTempTable("qvol_temp")
hive_context.sql("select * from qvol_temp limit 10").show()

rajkumar_singh · ‎06-07-2016

@alain TSAFACK please use saveAsHadoopFile, which will write to HDFS:

saveAsHadoopFile(<file-name>, <file output format>, compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")

or

hive_context.write.format("orc").save("test_orc")

nanyim_alain · ‎06-07-2016

Thank you. However, both saveAsHadoopFile and .write.format generate errors:

This means that these two attributes are not recognized by HiveContext.

Thank you !!!

rajkumar_singh · ‎06-07-2016

saveAsHadoopFile applies to RDDs, not to DataFrames. Can you try hive_context.write.format("orc").save("test_orc")?

nanyim_alain · ‎06-07-2016

I tried hive_context.write.format("orc").save("test_orc"), but I receive this error:

>>> hive_context.write.format("orc").save("hdfs://dev/datalake/app/des/dev/transformer/test_orc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'HiveContext' object has no attribute 'write'

Thanks

rajkumar_singh · ‎06-07-2016

Could you please modify your program in this way and see if you still get any exception?

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext
sc = SparkContext()
sqlContext = HiveContext(sc)
sqlContext.sql("select * from default.aaa limit 3").write.format("orc").save("test_orc2")
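
The snippet above works because sql() returns a DataFrame, and write is an attribute of the DataFrame, not of the HiveContext. That is why the earlier hive_context.write call raised AttributeError. As a minimal sketch of the same pattern, the helper below wraps the two steps. The name save_query_result and its parameters are hypothetical, not part of PySpark; it only assumes that the context passed in has a sql() method returning a DataFrame.

```python
def save_query_result(sql_context, query, path, fmt="orc", mode="error"):
    """Run a SQL query and write its result to `path` (for example an HDFS path).

    sql_context: a HiveContext or SQLContext (anything with a .sql() method).
    fmt:  output format understood by the DataFrame writer, e.g. "orc" or "parquet".
    mode: save mode, e.g. "error" (fail if the path exists) or "overwrite".
    """
    # .sql() returns a DataFrame; the writer lives on the DataFrame,
    # not on the context object itself.
    df = sql_context.sql(query)
    df.write.format(fmt).mode(mode).save(path)
    return df
```

With the sqlContext from the post above, a call would look like save_query_result(sqlContext, "select * from default.aaa limit 3", "test_orc2"). Passing fmt="parquet" writes Parquet files instead of ORC.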

nanyim_alain · ‎06-08-2016

Hello, thank you.

It works. I also found how to save as a Parquet file. I am now also looking to save as a CSV file and as text, if possible.

Cordially
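
For the CSV and text question, the thread does not record which approach was finally used. One version-independent option is to turn each row into a line of text and write it with the RDD method saveAsTextFile. The sketch below assumes Python 3, and the helper names row_to_csv_line and save_as_csv_text are hypothetical. Depending on the Spark version, built-in alternatives may be available: Spark 2.0 and later provide df.write.csv(path), while on Spark 1.x CSV output was commonly done through the external spark-csv package.

```python
import csv
import io


def row_to_csv_line(row):
    """Format one row (any iterable of values) as a single CSV line.

    The csv module handles quoting, so values containing commas or
    quotes stay readable; None is written as an empty field.
    """
    buf = io.StringIO()
    csv.writer(buf).writerow(list(row))
    # Drop the line terminator added by the writer; saveAsTextFile
    # adds its own newline after every element.
    return buf.getvalue().rstrip("\r\n")


def save_as_csv_text(df, path):
    """Write a DataFrame as plain-text CSV lines under `path`."""
    # df.rdd is an RDD of Row objects; each Row iterates like a tuple.
    df.rdd.map(row_to_csv_line).saveAsTextFile(path)
```

The output is a directory of part files rather than a single file, which is how Spark writes any distributed result to HDFS.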

nanyim_alain · ‎06-09-2016

It's good, I found it!

Thank you.
