Hi! I need to load Oracle data with Sqoop. The data must be available in Hive/Impala and in Spark batch jobs, so I think Parquet with Snappy compression is the best format. In Hive, the table must be partitioned by a date key (e.g. 2018-01-01). Once a day, I will drop some partitions (a date window) and run Sqoop again to reload the dropped days along with any new records. What would be the best way to do this? I have tried several approaches without success. Thanks a lot, guys!
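
To make the daily cycle above concrete, here is a minimal Python sketch that only builds the commands for one date window and does not run anything. It is one possible approach, not a confirmed solution. It assumes an external Hive table partitioned by a string column, with one HDFS directory per day. The table names, the `key_date` / `KEY_DATE` column names, the JDBC URL and the base directory are all hypothetical placeholders, and credentials options are left out.

```python
from datetime import date, timedelta


def date_window(start, end):
    """Return every date from start to end, inclusive."""
    days = (end - start).days
    return [start + timedelta(days=i) for i in range(days + 1)]


def drop_partition_sql(table, day):
    """Hive DDL that drops one daily partition if it exists."""
    return (f"ALTER TABLE {table} DROP IF EXISTS "
            f"PARTITION (key_date='{day.isoformat()}')")


def add_partition_sql(table, day, base_dir):
    """Hive DDL that registers the reloaded directory as a partition."""
    d = day.isoformat()
    return (f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (key_date='{d}') "
            f"LOCATION '{base_dir}/key_date={d}'")


def sqoop_import_args(jdbc_url, oracle_table, day, base_dir):
    """Argument list for a Sqoop import of a single day into its own directory."""
    next_day = day + timedelta(days=1)
    # KEY_DATE is an assumed Oracle DATE column; adjust to the real schema.
    where = (f"KEY_DATE >= DATE '{day.isoformat()}' "
             f"AND KEY_DATE < DATE '{next_day.isoformat()}'")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", oracle_table,
        "--where", where,
        "--target-dir", f"{base_dir}/key_date={day.isoformat()}",
        "--delete-target-dir",          # clear the old files for that day
        "--as-parquetfile",
        "--compression-codec", "snappy",
        "--num-mappers", "1",
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    url = "jdbc:oracle:thin:@//dbhost:1521/ORCL"
    base = "/data/warehouse/sales"
    for day in date_window(date(2018, 1, 1), date(2018, 1, 3)):
        print(drop_partition_sql("sales", day))
        print(" ".join(sqoop_import_args(url, "SALES", day, base)))
        print(add_partition_sql("sales", day, base))
```

Each day in the window gets three steps: drop the partition, re-import that day into its own directory, then add the partition back. After the Hive metadata changes, Impala would likely need a `REFRESH` on the table before it sees the reloaded partitions.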