Member since: 04-20-2016
Posts: 8
Kudos Received: 1
Solutions: 0
01-26-2019 08:07 AM
Another approach to inserting the data, which we follow in our project, is not to insert into Hive directly from Spark, but instead to:
1. Read the input CSV file in Spark and transform the data according to the requirement.
2. Save the transformed data back to an output CSV file in HDFS.
3. Push the data from the output CSV into Hive using a hive -f or hive -e command from the shell (a sketch of this flow follows below).
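A minimal Scala/Spark sketch of this flow, assuming hypothetical paths (hdfs:///data/input.csv, hdfs:///data/output_csv), a hypothetical status column standing in for the real transformation, and a hypothetical target table my_db.my_table:

```scala
import org.apache.spark.sql.SparkSession

object CsvToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-transform")
      .getOrCreate()

    // 1. Read the input CSV file from HDFS (path is illustrative).
    val input = spark.read
      .option("header", "true")
      .csv("hdfs:///data/input.csv")

    // 2. Transform the data as required; a simple filter on a
    //    hypothetical "status" column stands in for the real logic.
    val transformed = input.filter(input("status") === "ACTIVE")

    // 3. Save the transformed data back to an output CSV in HDFS.
    transformed.write
      .option("header", "false")
      .mode("overwrite")
      .csv("hdfs:///data/output_csv")

    spark.stop()
  }
}

// The final step then runs from the shell, e.g.:
//   hive -e "LOAD DATA INPATH '/data/output_csv' INTO TABLE my_db.my_table"
```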
08-16-2017 03:07 AM
Great suggestion, thanks for that! But I would like to ask one question: if I want struct or array fields in the target, how should I transform the MySQL data so that it fits the HCatalog schema? The need here is simply to have nested data from the other collection instead of a foreign-key representation. Currently we are using sqoop import only, and we are trying to modify the query so that it is accepted by the HCatalog schema. Thanks & Regards, Mahendra
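Not a confirmed answer to the Sqoop question, but one way to get nested fields is to build them in Spark after a flat import and write the result to the Hive table. A minimal Scala sketch, assuming hypothetical staging.customers and staging.orders tables already imported from MySQL:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_list, struct}

object NestedImport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("nested-import")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical flat tables imported from MySQL.
    val customers = spark.table("staging.customers") // (customer_id, name)
    val orders    = spark.table("staging.orders")    // (customer_id, order_id, amount)

    // Fold each customer's orders into an array<struct<order_id, amount>>
    // column, so the target holds nested data instead of a foreign key.
    val nested = customers
      .join(orders, Seq("customer_id"), "left")
      .groupBy("customer_id", "name")
      .agg(collect_list(struct("order_id", "amount")).as("orders"))

    nested.write.mode("overwrite").saveAsTable("warehouse.customers_nested")

    spark.stop()
  }
}
```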
01-12-2017 08:35 AM
Hi, The speed of the compression codec is only part of the story; you should also consider how well the codec is supported across the Hadoop stack. Gaining slightly faster compression at the expense of compatibility is probably not a good trade-off. Snappy, for example, is supported by pretty much all of the stack, whereas LZ4 is not currently supported by Impala. If in doubt, I would stick with Snappy: it is reasonably fast, and it is splittable when used inside container formats such as Parquet, ORC, Avro, or SequenceFile. If performance is an issue, you are likely to find greater benefit focusing on other parts of the stack than on data compression. Regards, Jim
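As an illustration, a minimal Scala/Spark sketch of requesting Snappy when writing Parquet; the warehouse.events source table and the output path are hypothetical, while the compression option is the standard DataFrame writer option:

```scala
import org.apache.spark.sql.SparkSession

object SnappyWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("snappy-write")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical source table.
    val df = spark.table("warehouse.events")

    // Snappy is fast and widely supported; inside a container format
    // like Parquet, the file's block structure keeps the data splittable.
    df.write
      .option("compression", "snappy")
      .parquet("hdfs:///warehouse/events_snappy")

    spark.stop()
  }
}
```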