
Loading of Base table on FOUR STEP STRATEGY FOR INCREMENTAL UPDATES IN APACHE HIVE

Contributor

This is a superb article, and I have the following clarifications:

https://community.hortonworks.com/articles/29410/four-step-strategy-for-incremental-updates-in-apac....

Clarification 1: base_table is not an external table, and it is not clear how the data is loaded into base_table during the first run. Could you please provide input on this? We are not using any LOAD DATA or INSERT INTO statement.

I think that after running the statement below, we have to manually load the data from the files present in /user/hive/incremental_table/incremental_table into base_table using a LOAD DATA statement (sketched below the command).

sqoop import --connect jdbc:teradata://{host name or ip address}/Database=retail --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username dbc --password dbc --table SOURCE_TBL --target-dir /user/hive/incremental_table -m 1
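For example, I am thinking of something like the following (just a sketch of my assumption; I am also not sure a plain LOAD DATA would work once base_table is stored as ORC, since LOAD DATA only moves files into the table directory and does not convert the delimited text):

-- hypothetical manual load of the Sqoop output directory into base_table
LOAD DATA INPATH '/user/hive/incremental_table' INTO TABLE base_table;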

Clarification 2: During the first run, only base_table will be loaded, and there is no need to implement the reconcile, compact, and purge steps, since we do not have incremental data yet. Please correct me if I am wrong.

2 REPLIES

Re: Loading of Base table on FOUR STEP STRATEGY FOR INCREMENTAL UPDATES IN APACHE HIVE

@vamsi valiveti In the Reconcile step, an external Hive table is created that points to the location in HDFS. This creates a base_table that contains the data imported by Sqoop.
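A minimal sketch of that pattern, using the column layout and HDFS path from this thread (illustrative only, not the article's exact DDL):

-- external table over the Sqoop target directory (names assumed from this thread)
CREATE EXTERNAL TABLE incremental_table (
        id STRING,
        field1 STRING,
        modified_date DATE)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/user/hive/incremental_table';

-- populate the managed base_table from the external table on the first run
INSERT OVERWRITE TABLE base_table
SELECT * FROM incremental_table;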

On Clarification 2, you are correct. The base_table is only used to import the initial data present in the source table. The rest of the data will be loaded incrementally.
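For those later incremental runs, the reconcile step boils down to picking the latest version of each record across base_table and incremental_table. One common way to express that (a sketch only; see the linked article for its exact view definition) is:

CREATE VIEW reconcile_view AS
SELECT id, field1, modified_date
FROM (
    SELECT id, field1, modified_date,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY modified_date DESC) AS rn
    FROM (
        SELECT * FROM base_table
        UNION ALL
        SELECT * FROM incremental_table
    ) all_rows
) ranked
WHERE rn = 1;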

Also see this question for another suggested approach. I have not tried this one yet but thought I would mention it:

https://community.hortonworks.com/questions/58103/best-approach-for-incremental-sqoop-of-data-from-r...


Re: Loading of Base table on FOUR STEP STRATEGY FOR INCREMENTAL UPDATES IN APACHE HIVE

Contributor

Hi @cduby, thanks for the input. For loading the base table we are following:

sqoop import --> create external table --> load the data into base_table (ORC format) from the external table (see the sketch after the DDL below).

One small clarification on this: when I run the sqoop import, it creates many part-m-* files plus a _SUCCESS file and other files, since a MapReduce job is triggered. But the external table should only contain the data from the part-m-* files, so do I need to delete the other files (_SUCCESS, job.xml), or is there another option to make the external table skip these files?

sqoop import --connect jdbc:teradata://{host name}/Database=retail \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username dbc \
  --password dbc --table SOURCE_TBL --target-dir /user/hive/base_table -m 1


CREATE TABLE base_table (
        id STRING,
        field1 STRING,
        modified_date DATE)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS ORC;
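Assuming an external staging table over /user/hive/base_table (hypothetical name ext_base_table here, defined like the external table sketched in the first reply but pointed at this target directory), the final load step might look like:

-- one-time load of the Sqoop output into the ORC-backed base_table
-- (ext_base_table is a hypothetical staging table name, not from the article)
INSERT OVERWRITE TABLE base_table
SELECT id, field1, modified_date
FROM ext_base_table;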