Created 01-23-2017 04:11 PM
This is a superb article, and I have the following clarifications:
Clarification 1: base_table is not an external table, so it is not clear how the data gets loaded into base_table during the first run. Could you please provide input on this? We are not using any LOAD DATA or INSERT INTO statement.
I think that after running the statement below, we have to manually load the data from the files present in /user/hive/incremental_table into base_table using a LOAD DATA statement:
sqoop import --connect jdbc:teradata://{host name or ip address}/Database=retail --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username dbc --password dbc --table SOURCE_TBL --target-dir /user/hive/incremental_table -m 1
Clarification 2: During the first run only base_table will be loaded, and there is no need to run the Reconcile, Compact, and Purge steps since we do not have incremental data yet. Please correct me if I am wrong.
Created 01-23-2017 05:15 PM
@vamsi valiveti In the Reconcile step, an external Hive table is created that points to the location in HDFS. This creates a base_table containing the data imported by Sqoop.
On Clarification 2, you are correct. The base_table is only to import the initial data present in the source table. The rest of the data will be loaded incrementally.
Also see this question for another suggested approach. I have not tried this one yet but thought I would mention it:
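To make the Reconcile-step answer above concrete, the external table might be declared roughly as follows. This is only a sketch: the column names and types are assumptions borrowed from the DDL later in this thread, and the LOCATION is the Sqoop target directory mentioned earlier; only the general pattern (external table over the import directory, no separate LOAD needed) is the point here.

```sql
-- Sketch: an external Hive table over the Sqoop target directory.
-- The imported delimited files become queryable immediately;
-- no LOAD DATA or INSERT statement is required.
CREATE EXTERNAL TABLE base_table_ext (
  id STRING,
  field1 STRING,
  modified_date DATE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/incremental_table';
```

Because the table is external, dropping it later (e.g. during the Purge step) removes only the metadata, not the files in HDFS.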
Created 01-24-2017 09:32 AM
Hi @cduby, thanks for the input. For loading the base table we are following:
sqoop import --> create external table --> load the data into base_table (ORC format) from the external table.
One small clarification on this: when I run sqoop import it creates many part-m-* files plus a _SUCCESS file and other files, since a MapReduce job is triggered. But only the part-m-* files contain data for the external table, so do I need to delete the other files (_SUCCESS, job.xml), or is there some other option to make the external table skip these files?
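As far as I know, Hive's input format treats files whose names start with `_` or `.` as hidden and skips them, so _SUCCESS is usually ignored automatically. If you still want to clean the directory explicitly, the filtering logic is sketched below against a throwaway local temp directory (an assumption for illustration; against HDFS you would list with `hdfs dfs -ls` and delete with `hdfs dfs -rm` instead):

```shell
# Simulate a Sqoop/MapReduce output directory locally.
tmp=$(mktemp -d)
touch "$tmp/part-m-00000" "$tmp/_SUCCESS" "$tmp/job.xml"

# Keep only the part-* data files; remove everything else.
for f in "$tmp"/*; do
  case "$(basename "$f")" in
    part-*) ;;      # data file produced by the mappers: keep it
    *) rm "$f" ;;   # _SUCCESS, job.xml, etc.: remove it
  esac
done

ls "$tmp"   # only part-m-00000 remains
```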
sqoop import --connect jdbc:teradata://{host name}/Database=retail --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username dbc --password dbc --table SOURCE_TBL --target-dir /user/hive/base_table -m 1
CREATE TABLE base_table ( id STRING, field1 STRING, modified_date DATE) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS ORC;
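The DDL above covers the ORC table itself; the "load from external" step in the flow described earlier might look like the sketch below. The staging table name `base_table_ext` and its LOCATION are assumptions for illustration, reusing the column list and the Sqoop --target-dir from this thread:

```sql
-- Assumed staging table over the Sqoop output directory (delimited text).
CREATE EXTERNAL TABLE base_table_ext (
  id STRING,
  field1 STRING,
  modified_date DATE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/base_table';

-- Copy the text data into the ORC-backed base_table.
INSERT OVERWRITE TABLE base_table SELECT * FROM base_table_ext;
```

The INSERT rewrites the data in ORC format under base_table's own warehouse location, so the external staging table can be dropped afterwards without losing the ORC copy.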