Member since: 04-20-2016
Posts: 8
Kudos Received: 1
Solutions: 0
01-26-2019
08:07 AM
Another approach we follow in our project is not to insert the data into Hive directly from Spark, but instead to do the following (a rough sketch is shown below):
1. Read the input CSV file in Spark and transform the data according to the requirement.
2. Save the transformed data back into an output CSV file in HDFS.
3. Push the data from the output CSV into Hive using a hive -f or hive -e command from the shell.
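A minimal sketch of these steps, assuming Spark 2.x in spark-shell; the paths, column names, and table name are placeholders to adjust for your own environment:

// 1. Read the input CSV from HDFS (path and header option are assumptions)
val input = spark.read.option("header", "true").csv("hdfs:///data/input/input.csv")

// 2. Example transformation (the column names used here are placeholders)
val transformed = input.filter(input("status") === "ACTIVE").select("id", "name", "amount")

// Write the transformed data back to HDFS as CSV
transformed.write.option("header", "false").csv("hdfs:///data/output/output_csv")

// 3. From the shell, load the output files into an existing Hive table whose
//    row format matches the CSV delimiter, e.g.:
//    hive -e "LOAD DATA INPATH 'hdfs:///data/output/output_csv' INTO TABLE my_table"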
01-11-2017
05:32 PM
I have been studying the compression codecs and found out which codec compresses the most, which compresses the least, and which is the slowest among the available codecs, but I could not find out which is the fastest. Tom White's book only mentions that LZO, LZ4 and Snappy are faster than gzip; it does not say which of the three is fastest. The Cloudera documentation likewise only notes that Snappy is faster than LZO, and again recommends testing on your own data to measure the time LZO and Snappy take to compress and decompress. Searching Google, I found a presentation which claims, based on tests on some data, that LZ4 is the fastest of the three, but I am not sure about it as the document's authenticity cannot be verified: http://www.slideshare.net/Hadoop_Summit/singh-kamat-june27425pmroom210c So, can someone help me identify which is the fastest compression codec among LZO, LZ4 and Snappy? (A rough sketch of the kind of timing test I could run on my own data is below.)
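For reference, this is the kind of rough timing test I had in mind, following the "measure on your own data" suggestion. It uses the plain Hadoop codec API in Scala; the buffer size and sample data are placeholders, and it assumes the native Snappy and LZ4 libraries are available on the host (LZO would need the separate hadoop-lzo package, so it is omitted here):

import java.io.ByteArrayOutputStream
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.compress.{CompressionCodec, Lz4Codec, SnappyCodec}
import org.apache.hadoop.util.ReflectionUtils

// Time how long a codec takes to compress one in-memory buffer (milliseconds).
def timeCompress(codec: CompressionCodec, data: Array[Byte]): Long = {
  val start = System.nanoTime()
  val out = codec.createOutputStream(new ByteArrayOutputStream())
  out.write(data)
  out.close()
  (System.nanoTime() - start) / 1000000
}

val conf = new Configuration()

// Placeholder input: ~16 MB of random letters. For a meaningful comparison this
// should be replaced with a sample of the real data from the cluster.
val data = Array.fill[Byte](16 * 1024 * 1024)(('a' + scala.util.Random.nextInt(26)).toByte)

val codecs = Seq(
  "Snappy" -> ReflectionUtils.newInstance(classOf[SnappyCodec], conf),
  "LZ4"    -> ReflectionUtils.newInstance(classOf[Lz4Codec], conf)
)

codecs.foreach { case (name, codec) =>
  println(s"$name: ${timeCompress(codec, data)} ms")
}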
Labels:
- Apache Hive
- HDFS
08-18-2016
10:56 AM
Data can be read directly from the Oracle DB and written out in JSON format using Spark, so there is no need for Sqoop in between:

val DF1 = sqlContext.read.format("jdbc")
  .option("url", "<connection string>")
  .option("dbtable", "<table name>")
  .option("user", "<user name>")
  .option("password", "<password>")
  .load()
DF1.write.format("org.apache.spark.sql.json").save("<path>")

This post is under the wrong section, but I wanted to show how data can be loaded directly from an Oracle DB and saved in JSON format without Sqoop.