Created 11-21-2014 05:53 PM
Hi,
On CentOS 6.5, I am using Sqoop Version 1.4.4-cdh5.1.0 and HBase Version 0.98.1-cdh5.1.0.
Here is my Sqoop incremental-append job from InterSystems Caché to HBase.
sudo -u hdfs sqoop job --create JobName \
  -- \
  import \
  --verbose \
  -m 1 \
  --connect jdbc:Cache://xx.xx.xx.xx:1972/USER \
  --driver com.intersys.jdbc.CacheDriver \
  --username username \
  --password password \
  --column-family CFNAME \
  --hbase-table TABLENAME \
  --hbase-row-key NUM \
  --query "select * from TABLENAME where \$CONDITIONS" \
  --incremental append \
  --check-column NUM \
  --last-value 9
I execute it as
sudo -u hdfs sqoop job --exec JobName
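After each run I check what the metastore has stored for the job. This is just how I verify the value; it assumes the default local HSQLDB metastore, and the grep is only to pick out the relevant line:

sudo -u hdfs sqoop job --show JobName | grep incremental.last.value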
I can see the new data rows appended successfully to the table in HBase, but Sqoop's "incremental.last.value" does not get updated to the new last value. I tried it several times and "incremental.last.value" never changed; it stays at 9, the value given when the Sqoop job was created. Below is the Sqoop output message.
14/11/20 15:28:33 INFO mapreduce.Job: The url to track the job: http://x.x.x.x:8088/proxy/application_1416271983912_0018/
14/11/20 15:28:33 INFO mapreduce.Job: Running job: job_1416271983912_0018
14/11/20 15:28:45 INFO mapreduce.Job: Job job_1416271983912_0018 running in uber mode : false
14/11/20 15:28:45 INFO mapreduce.Job: map 0% reduce 0%
14/11/20 15:28:55 INFO mapreduce.Job: map 100% reduce 0%
14/11/20 15:28:55 INFO mapreduce.Job: Job job_1416271983912_0018 completed successfully
14/11/20 15:28:55 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=137110
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=1
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=4216
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=4216
        Total vcore-seconds taken by all map tasks=8432
        Total megabyte-seconds taken by all map tasks=2158592
    Map-Reduce Framework
        Map input records=7
        Map output records=7
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=20
        CPU time spent (ms)=2600
        Physical memory (bytes) snapshot=360058880
        Virtual memory (bytes) snapshot=1601982464
        Total committed heap usage (bytes)=792199168
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
14/11/20 15:28:55 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 24.6684 seconds (0 bytes/sec)
14/11/20 15:28:55 INFO mapreduce.ImportJobBase: Retrieved 7 records.
14/11/20 15:28:55 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@5ac524dd
14/11/20 15:28:55 ERROR tool.ImportTool: Imported Failed: Can not create a Path from a null string
14/11/20 15:28:55 DEBUG hsqldb.HsqldbJobStorage: Flushing current transaction
14/11/20 15:28:55 DEBUG hsqldb.HsqldbJobStorage: Closing connection
However, when I change the target to HDFS, Sqoop's "incremental.last.value" updates without any problem. Here is Sqoop's incremental job to HDFS.
sudo -u hdfs sqoop job --create jobname \
  -- \
  import \
  --verbose \
  -m 1 \
  --connect jdbc:Cache://xx.xx.xx.xx:1972/USER \
  --driver com.intersys.jdbc.CacheDriver \
  --username username \
  --password password \
  --query "select * from tablename where \$CONDITIONS" \
  --target-dir /user/hdfs/tablename \
  --incremental append \
  --check-column NUM \
  --last-value 9
Below is the Sqoop output message for the incremental import to HDFS.
14/11/20 16:01:27 INFO mapreduce.Job: The url to track the job: http://x.x.x.x:8088/proxy/application_1416271983912_0021/
14/11/20 16:01:27 INFO mapreduce.Job: Running job: job_1416271983912_0021
14/11/20 16:01:39 INFO mapreduce.Job: Job job_1416271983912_0021 running in uber mode : false
14/11/20 16:01:39 INFO mapreduce.Job: map 0% reduce 0%
14/11/20 16:01:48 INFO mapreduce.Job: map 100% reduce 0%
14/11/20 16:01:48 INFO mapreduce.Job: Job job_1416271983912_0021 completed successfully
14/11/20 16:01:48 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=111444
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=24
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3566
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=3566
        Total vcore-seconds taken by all map tasks=7132
        Total megabyte-seconds taken by all map tasks=1825792
    Map-Reduce Framework
        Map input records=2
        Map output records=2
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=27
        CPU time spent (ms)=1230
        Physical memory (bytes) snapshot=335708160
        Virtual memory (bytes) snapshot=1624330240
        Total committed heap usage (bytes)=792199168
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=24
14/11/20 16:01:48 INFO mapreduce.ImportJobBase: Transferred 24 bytes in 23.3872 seconds (1.0262 bytes/sec)
14/11/20 16:01:48 INFO mapreduce.ImportJobBase: Retrieved 2 records.
14/11/20 16:01:48 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@44f757b9
14/11/20 16:01:48 INFO util.AppendUtils: Appending to directory tablename
14/11/20 16:01:48 INFO util.AppendUtils: Using found partition 2
14/11/20 16:01:48 DEBUG util.AppendUtils: Filename: _SUCCESS ignored
14/11/20 16:01:48 DEBUG util.AppendUtils: Filename: part-m-00000 repartitioned to: part-m-00002
14/11/20 16:01:48 DEBUG util.AppendUtils: Deleting temporary folder 20160124000000386_46839_rapid01.sun.roche.com_bcd0ee9d
14/11/20 16:01:48 INFO tool.ImportTool: Saving incremental import state to the metastore
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Checking for table: SQOOP_ROOT
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Found table: SQOOP_ROOT
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Looking up property sqoop.hsqldb.job.storage.version for version null
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: => 0
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Looking up property sqoop.hsqldb.job.info.table for version 0
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: => SQOOP_SESSIONS
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Checking for table: SQOOP_SESSIONS
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Found table: SQOOP_SESSIONS
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Creating job: jobname
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Setting property sqoop.tool with class schema => import
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Getting property sqoop.tool with class schema
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: => import
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Setting property sqoop.property.set.id with class schema => 0
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Getting property sqoop.property.set.id with class schema
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: => 0
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Setting bulk properties for class SqoopOptions
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Setting property verbose with class SqoopOptions => true
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Getting property verbose with class SqoopOptions
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: => true
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Setting property incremental.last.value with class SqoopOptions => 12.00
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Getting property incremental.last.value with class SqoopOptions
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: => 9.00
. . . .
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Saving mapreduce.client.genericoptionsparser.used => true / null
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Setting property mapreduce.client.genericoptionsparser.used with class config => true
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Getting property mapreduce.client.genericoptionsparser.used with class config
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: => true
14/11/20 16:01:48 INFO tool.ImportTool: Updated data for job: jobname
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Flushing current transaction
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Closing connection
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Flushing current transaction
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Closing connection
Apparently, comparing the two output messages above, HsqldbJobStorage is never invoked when the incremental import job targets HBase (the run ends with the "Can not create a Path from a null string" error instead), so "incremental.last.value" never gets updated. Is this a bug, or did I miss something? Thanks.
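One thing I plan to try next, purely as a guess: recreating the HBase job with an explicit --target-dir so the append step has a non-null path to work with. The directory name below is arbitrary, and I have not yet verified that this avoids the error; the job would first need to be removed with "sqoop job --delete JobName" (or created under a new name).

# Same job as the HBase one above, plus an explicit --target-dir (untested workaround; directory name is arbitrary)
sudo -u hdfs sqoop job --create JobName \
  -- \
  import \
  --verbose \
  -m 1 \
  --connect jdbc:Cache://xx.xx.xx.xx:1972/USER \
  --driver com.intersys.jdbc.CacheDriver \
  --username username \
  --password password \
  --column-family CFNAME \
  --hbase-table TABLENAME \
  --hbase-row-key NUM \
  --target-dir /tmp/sqoop_TABLENAME_append \
  --query "select * from TABLENAME where \$CONDITIONS" \
  --incremental append \
  --check-column NUM \
  --last-value 9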