Created 03-07-2019 09:09 AM
Hi,
Is there any advantage to importing data from DB2 into Hive directly with Sqoop, compared to importing it into HDFS first and then loading it into a Hive table? Which approach is recommended or preferred?
Thanks
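The minimal Python sketch below makes the two approaches concrete. It only assembles the two Sqoop 1 command lines and does not run them. The JDBC URL, table names and HDFS path are hypothetical placeholders, and credentials are omitted. As we understand Sqoop 1, `--hive-import` still writes the files to HDFS first and then has Hive load them. The direct route therefore mostly saves you from writing the Hive DDL and load step yourself, rather than skipping the HDFS write.

```python
"""Assemble the two Sqoop 1 import command lines being compared (sketch only).

Hypothetical placeholders: JDBC_URL, SOURCE_TABLE, the HDFS path and the Hive
table name. Credentials (--username / -P) are left out for brevity.
"""
import shlex
from typing import List

JDBC_URL = "jdbc:db2://db2host:50000/SAMPLEDB"  # hypothetical DB2 endpoint
SOURCE_TABLE = "SALES"                          # hypothetical DB2 table


def sqoop_to_hdfs(target_dir: str) -> List[str]:
    """HDFS-only import: Sqoop writes comma-delimited text files under target_dir."""
    return [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--table", SOURCE_TABLE,
        "--target-dir", target_dir,
        "--as-textfile",
        "--fields-terminated-by", ",",
    ]


def sqoop_to_hive(hive_table: str) -> List[str]:
    """Direct Hive import: Sqoop writes to HDFS, then creates/loads the Hive table."""
    return [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--table", SOURCE_TABLE,
        "--hive-import",
        "--hive-table", hive_table,
        "--as-textfile",
        "--fields-terminated-by", ",",
    ]


if __name__ == "__main__":
    # Print shell-quoted versions that could be pasted into a terminal.
    print(shlex.join(sqoop_to_hdfs("/data/raw/sales")))
    print(shlex.join(sqoop_to_hive("staging.sales_txt")))
```

The field delimiter is set explicitly in both commands so that a Hive table defined later over the same files can use `FIELDS TERMINATED BY ','`.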
Created 03-07-2019 08:36 PM
Created on 03-12-2019 09:33 AM - edited 03-12-2019 10:51 AM
Thanks for the response.
I want to store the data in Parquet format. Sqoop can write Parquet, but Hive does not recognize the Parquet files it produces. So with either option, I have to Sqoop the data in text format and then move it into another Hive table stored as Parquet.
Option 1: Sqoop to HDFS in text format, where the HDFS location is the location of a Hive external table. Then insert into a partitioned Hive table stored as Parquet.
Option 2: Sqoop into a Hive internal (managed) table in text format. Then insert into a partitioned Hive table stored as Parquet.
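The second stage of option 1 can be sketched as follows. This Python helper only renders the HiveQL and does not connect to Hive. It assumes the Sqoop job wrote comma-delimited text into `hdfs_dir`. The table names, column names and types used in the example are hypothetical. It creates an external text table over that directory and a partitioned Parquet table, then fills the latter with a dynamic-partition insert.

```python
from typing import List, Tuple

Column = Tuple[str, str]  # (column name, Hive type)


def option1_hiveql(staging_table: str, target_table: str, hdfs_dir: str,
                   columns: List[Column], partition_col: str) -> str:
    """Render HiveQL that moves Sqoop's text output into a partitioned Parquet table."""
    types = dict(columns)
    if partition_col not in types:
        raise ValueError(f"unknown partition column: {partition_col}")
    data_cols = [(n, t) for n, t in columns if n != partition_col]

    all_defs = ", ".join(f"{n} {t}" for n, t in columns)
    data_defs = ", ".join(f"{n} {t}" for n, t in data_cols)
    # Dynamic partitioning takes the partition value from the LAST select column.
    select_list = ", ".join([n for n, _ in data_cols] + [partition_col])

    statements = [
        # External text table laid over the directory Sqoop wrote (--target-dir).
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {staging_table} ({all_defs}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        f"STORED AS TEXTFILE LOCATION '{hdfs_dir}'",
        # Final table: the partition column is declared only in PARTITIONED BY.
        f"CREATE TABLE IF NOT EXISTS {target_table} ({data_defs}) "
        f"PARTITIONED BY ({partition_col} {types[partition_col]}) STORED AS PARQUET",
        "SET hive.exec.dynamic.partition=true",
        "SET hive.exec.dynamic.partition.mode=nonstrict",
        f"INSERT OVERWRITE TABLE {target_table} PARTITION ({partition_col}) "
        f"SELECT {select_list} FROM {staging_table}",
    ]
    return ";\n".join(statements) + ";"


if __name__ == "__main__":
    cols = [("sale_id", "BIGINT"), ("amount", "DECIMAL(12,2)"), ("sale_date", "STRING")]
    print(option1_hiveql("staging.sales_txt", "warehouse.sales",
                         "/data/raw/sales", cols, "sale_date"))
```

For option 2, the staging table would be the managed table that `--hive-import` creates rather than an external one. The final `INSERT OVERWRITE ... PARTITION` step would stay the same. The main difference between the options is who creates the staging table and where its files live.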
Based on your response, the direct Hive import does its Hive step after the import to HDFS anyway, so I think option 1 lets me avoid that extra work (and time). Agree?
Created 03-13-2019 07:00 PM