Sqoop incremental.last.value not updated when incremental append to HBase

New Contributor

Hi,

On CentOS 6.5, I am running Sqoop 1.4.4-cdh5.1.0 and HBase 0.98.1-cdh5.1.0.

Here is my Sqoop job for an incremental-append import from InterSystems Caché into HBase:

sudo -u hdfs sqoop job --create JobName \
-- \
import \
--verbose \
-m 1 \
--connect jdbc:Cache://xx.xx.xx.xx:1972/USER \
--driver com.intersys.jdbc.CacheDriver \
--username username \
--password password \
--column-family CFNAME \
--hbase-table TABLENAME \
--hbase-row-key NUM \
--query "select * from TABLENAME where \$CONDITIONS" \
--incremental append \
--check-column NUM \
--last-value 9

I execute it as follows:

sudo -u hdfs sqoop job --exec JobName
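
To check what the metastore currently stores for this job, I inspect the saved definition with the standard --show option (the grep just narrows the output to the relevant line):

sudo -u hdfs sqoop job --show JobName | grep incremental.last.value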

I can see the new data rows appended to the table in HBase successfully, but Sqoop's "incremental.last.value" is not updated to the new last value. I have tried several times, and "incremental.last.value" never changes: it stays at 9, the value given when the job was created. Below is the Sqoop output:

14/11/20 15:28:33 INFO mapreduce.Job: The url to track the job: http://x.x.x.x:8088/proxy/application_1416271983912_0018/
14/11/20 15:28:33 INFO mapreduce.Job: Running job: job_1416271983912_0018
14/11/20 15:28:45 INFO mapreduce.Job: Job job_1416271983912_0018 running in uber mode : false
14/11/20 15:28:45 INFO mapreduce.Job: map 0% reduce 0%
14/11/20 15:28:55 INFO mapreduce.Job: map 100% reduce 0%
14/11/20 15:28:55 INFO mapreduce.Job: Job job_1416271983912_0018 completed successfully
14/11/20 15:28:55 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=137110
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=87
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=1
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
        Job Counters
                Launched map tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=4216
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=4216
                Total vcore-seconds taken by all map tasks=8432
                Total megabyte-seconds taken by all map tasks=2158592
        Map-Reduce Framework
                Map input records=7
                Map output records=7
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=20
                CPU time spent (ms)=2600
                Physical memory (bytes) snapshot=360058880
                Virtual memory (bytes) snapshot=1601982464
                Total committed heap usage (bytes)=792199168
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=0
14/11/20 15:28:55 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 24.6684 seconds (0 bytes/sec)
14/11/20 15:28:55 INFO mapreduce.ImportJobBase: Retrieved 7 records.
14/11/20 15:28:55 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@5ac524dd
14/11/20 15:28:55 ERROR tool.ImportTool: Imported Failed: Can not create a Path from a null string
14/11/20 15:28:55 DEBUG hsqldb.HsqldbJobStorage: Flushing current transaction
14/11/20 15:28:55 DEBUG hsqldb.HsqldbJobStorage: Closing connection

However, when I change the target to HDFS, Sqoop's "incremental.last.value" is updated without any problem. Here is the incremental Sqoop job to HDFS:

sudo -u hdfs sqoop job --create jobname \
-- \
import \
--verbose \
-m 1 \
--connect jdbc:Cache://xx.xx.xx.xx:1972/USER \
--driver com.intersys.jdbc.CacheDriver \
--username username \
--password password \
--query "select * from tablename where \$CONDITIONS" \
--target-dir /user/hdfs/tablename \
--incremental append \
--check-column NUM \
--last-value 9
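
After each HDFS run, the appended output can be checked directly in the target directory; file names like part-m-00002 correspond to the AppendUtils lines in the log below:

sudo -u hdfs hadoop fs -ls /user/hdfs/tablename
sudo -u hdfs hadoop fs -cat /user/hdfs/tablename/part-m-00002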

Below is the Sqoop output for the incremental import to HDFS:

14/11/20 16:01:27 INFO mapreduce.Job: The url to track the job: http://x.x.x.x:8088/proxy/application_1416271983912_0021/
14/11/20 16:01:27 INFO mapreduce.Job: Running job: job_1416271983912_0021
14/11/20 16:01:39 INFO mapreduce.Job: Job job_1416271983912_0021 running in uber mode : false
14/11/20 16:01:39 INFO mapreduce.Job:  map 0% reduce 0%
14/11/20 16:01:48 INFO mapreduce.Job:  map 100% reduce 0%
14/11/20 16:01:48 INFO mapreduce.Job: Job job_1416271983912_0021 completed successfully
14/11/20 16:01:48 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=111444
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=87
                HDFS: Number of bytes written=24
                HDFS: Number of read operations=4
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=3566
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=3566
                Total vcore-seconds taken by all map tasks=7132
                Total megabyte-seconds taken by all map tasks=1825792
        Map-Reduce Framework
                Map input records=2
                Map output records=2
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=27
                CPU time spent (ms)=1230
                Physical memory (bytes) snapshot=335708160
                Virtual memory (bytes) snapshot=1624330240
                Total committed heap usage (bytes)=792199168
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=24
14/11/20 16:01:48 INFO mapreduce.ImportJobBase: Transferred 24 bytes in 23.3872 seconds (1.0262 bytes/sec)
14/11/20 16:01:48 INFO mapreduce.ImportJobBase: Retrieved 2 records.
14/11/20 16:01:48 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@44f757b9
14/11/20 16:01:48 INFO util.AppendUtils: Appending to directory tablename
14/11/20 16:01:48 INFO util.AppendUtils: Using found partition 2
14/11/20 16:01:48 DEBUG util.AppendUtils: Filename: _SUCCESS ignored
14/11/20 16:01:48 DEBUG util.AppendUtils: Filename: part-m-00000 repartitioned to: part-m-00002
14/11/20 16:01:48 DEBUG util.AppendUtils: Deleting temporary folder 20160124000000386_46839_rapid01.sun.roche.com_bcd0ee9d
14/11/20 16:01:48 INFO tool.ImportTool: Saving incremental import state to the metastore
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Checking for table: SQOOP_ROOT
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Found table: SQOOP_ROOT
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Looking up property sqoop.hsqldb.job.storage.version for version null
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage:  => 0
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Looking up property sqoop.hsqldb.job.info.table for version 0
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage:  => SQOOP_SESSIONS
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Checking for table: SQOOP_SESSIONS
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Found table: SQOOP_SESSIONS
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Creating job: jobname
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Setting property sqoop.tool with class schema => import
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Getting property sqoop.tool with class schema
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage:  => import
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Setting property sqoop.property.set.id with class schema => 0
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Getting property sqoop.property.set.id with class schema
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage:  => 0
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Setting bulk properties for class SqoopOptions
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Setting property verbose with class SqoopOptions => true
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Getting property verbose with class SqoopOptions
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage:  => true
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Setting property incremental.last.value with class SqoopOptions => 12.00
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Getting property incremental.last.value with class SqoopOptions
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage:  => 9.00
...
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Saving mapreduce.client.genericoptionsparser.used => true / null
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Setting property mapreduce.client.genericoptionsparser.used with class config => true
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Getting property mapreduce.client.genericoptionsparser.used with class config
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage:  => true
14/11/20 16:01:48 INFO tool.ImportTool: Updated data for job: jobname
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Flushing current transaction
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Closing connection
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Flushing current transaction
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Closing connection

Comparing the two outputs above: in the HBase run, ImportTool fails with "Can not create a Path from a null string" and never reaches the "Saving incremental import state to the metastore" step, so "incremental.last.value" is never written back to the HSQLDB metastore. Is this a bug, or did I miss something?
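
In the meantime, my workaround idea is to delete and recreate the job with a fresh --last-value after each run, reading MAX(NUM) from the source with sqoop eval. This is only a sketch: sqoop eval prints its result as an ASCII-art table, so the awk parsing below (field 2 of the fourth line) is an assumption that may need adjusting for the actual output.

# Read the current maximum of the check column from the source (sketch;
# the NR==4 line/field positions assume sqoop eval's table layout).
LAST=$(sudo -u hdfs sqoop eval \
--connect jdbc:Cache://xx.xx.xx.xx:1972/USER \
--driver com.intersys.jdbc.CacheDriver \
--username username --password password \
--query "SELECT MAX(NUM) FROM TABLENAME" \
| awk -F'|' 'NR==4 { gsub(/ /, "", $2); print $2 }')

# Drop the stale job and recreate it with the fresh last value.
sudo -u hdfs sqoop job --delete JobName
sudo -u hdfs sqoop job --create JobName \
-- \
import \
-m 1 \
--connect jdbc:Cache://xx.xx.xx.xx:1972/USER \
--driver com.intersys.jdbc.CacheDriver \
--username username \
--password password \
--column-family CFNAME \
--hbase-table TABLENAME \
--hbase-row-key NUM \
--query "select * from TABLENAME where \$CONDITIONS" \
--incremental append \
--check-column NUM \
--last-value "$LAST"

Thanks.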
