11-21-2014 05:53 PM
Hi,
On CentOS 6.5, I am using Sqoop Version 1.4.4-cdh5.1.0 and HBase Version 0.98.1-cdh5.1.0.
Here is my Sqoop job for an incremental append import from InterSystems Cache to HBase.
sudo -u hdfs sqoop job --create JobName \
  -- \
  import \
  --verbose \
  -m 1 \
  --connect jdbc:Cache://xx.xx.xx.xx:1972/USER \
  --driver com.intersys.jdbc.CacheDriver \
  --username username \
  --password password \
  --column-family CFNAME \
  --hbase-table TABLENAME \
  --hbase-row-key NUM \
  --query "select * from TABLENAME where \$CONDITIONS" \
  --incremental append \
  --check-column NUM \
  --last-value 9
I execute it as
sudo -u hdfs sqoop job --exec JobName
I can see the new data rows appended successfully to the table in HBase, but Sqoop's "incremental.last.value" does not get updated to the new last value. I have tried this several times and found that "incremental.last.value" never changes; it stays at 9, the value given when the job was created (how I check the stored value is shown right after the log). Below is the Sqoop output message.
14/11/20 15:28:33 INFO mapreduce.Job: The url to track the job: http://x.x.x.x:8088/proxy/application_1416271983912_0018/
14/11/20 15:28:33 INFO mapreduce.Job: Running job: job_1416271983912_0018
14/11/20 15:28:45 INFO mapreduce.Job: Job job_1416271983912_0018 running in uber mode : false
14/11/20 15:28:45 INFO mapreduce.Job: map 0% reduce 0%
14/11/20 15:28:55 INFO mapreduce.Job: map 100% reduce 0%
14/11/20 15:28:55 INFO mapreduce.Job: Job job_1416271983912_0018 completed successfully
14/11/20 15:28:55 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=137110
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=1
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=4216
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=4216
        Total vcore-seconds taken by all map tasks=8432
        Total megabyte-seconds taken by all map tasks=2158592
    Map-Reduce Framework
        Map input records=7
        Map output records=7
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=20
        CPU time spent (ms)=2600
        Physical memory (bytes) snapshot=360058880
        Virtual memory (bytes) snapshot=1601982464
        Total committed heap usage (bytes)=792199168
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
14/11/20 15:28:55 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 24.6684 seconds (0 bytes/sec)
14/11/20 15:28:55 INFO mapreduce.ImportJobBase: Retrieved 7 records.
14/11/20 15:28:55 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@5ac524dd
14/11/20 15:28:55 ERROR tool.ImportTool: Imported Failed: Can not create a Path from a null string
14/11/20 15:28:55 DEBUG hsqldb.HsqldbJobStorage: Flushing current transaction
14/11/20 15:28:55 DEBUG hsqldb.HsqldbJobStorage: Closing connection
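For reference, the way I check the stored value is to print the saved job definition from the metastore (assuming the default local HSQLDB metastore); the grep is only there to narrow the output:

sudo -u hdfs sqoop job --show JobName | grep incremental.last.value

No matter how many times I run the HBase job, this still shows 9.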
However, when I change the target to HDFS, Sqoop's "incremental.last.value" does update without any problem. Here is Sqoop's incremental job to HDFS.
sudo -u hdfs sqoop job --create jobname \
  -- \
  import \
  --verbose \
  -m 1 \
  --connect jdbc:Cache://xx.xx.xx.xx:1972/USER \
  --driver com.intersys.jdbc.CacheDriver \
  --username username \
  --password password \
  --query "select * from tablename where \$CONDITIONS" \
  --target-dir /user/hdfs/tablename \
  --incremental append \
  --check-column NUM \
  --last-value 9
Below is the Sqoop output message for the incremental import to HDFS.
14/11/20 16:01:27 INFO mapreduce.Job: The url to track the job: http://x.x.x.x:8088/proxy/application_1416271983912_0021/
14/11/20 16:01:27 INFO mapreduce.Job: Running job: job_1416271983912_0021
14/11/20 16:01:39 INFO mapreduce.Job: Job job_1416271983912_0021 running in uber mode : false
14/11/20 16:01:39 INFO mapreduce.Job: map 0% reduce 0%
14/11/20 16:01:48 INFO mapreduce.Job: map 100% reduce 0%
14/11/20 16:01:48 INFO mapreduce.Job: Job job_1416271983912_0021 completed successfully
14/11/20 16:01:48 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=111444
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=24
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3566
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=3566
        Total vcore-seconds taken by all map tasks=7132
        Total megabyte-seconds taken by all map tasks=1825792
    Map-Reduce Framework
        Map input records=2
        Map output records=2
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=27
        CPU time spent (ms)=1230
        Physical memory (bytes) snapshot=335708160
        Virtual memory (bytes) snapshot=1624330240
        Total committed heap usage (bytes)=792199168
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=24
14/11/20 16:01:48 INFO mapreduce.ImportJobBase: Transferred 24 bytes in 23.3872 seconds (1.0262 bytes/sec)
14/11/20 16:01:48 INFO mapreduce.ImportJobBase: Retrieved 2 records.
14/11/20 16:01:48 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@44f757b9
14/11/20 16:01:48 INFO util.AppendUtils: Appending to directory tablename
14/11/20 16:01:48 INFO util.AppendUtils: Using found partition 2
14/11/20 16:01:48 DEBUG util.AppendUtils: Filename: _SUCCESS ignored
14/11/20 16:01:48 DEBUG util.AppendUtils: Filename: part-m-00000 repartitioned to: part-m-00002
14/11/20 16:01:48 DEBUG util.AppendUtils: Deleting temporary folder 20160124000000386_46839_rapid01.sun.roche.com_bcd0ee9d
14/11/20 16:01:48 INFO tool.ImportTool: Saving incremental import state to the metastore
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Checking for table: SQOOP_ROOT
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Found table: SQOOP_ROOT
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Looking up property sqoop.hsqldb.job.storage.version for version null
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: => 0
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Looking up property sqoop.hsqldb.job.info.table for version 0
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: => SQOOP_SESSIONS
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Checking for table: SQOOP_SESSIONS
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Found table: SQOOP_SESSIONS
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Creating job: jobname
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Setting property sqoop.tool with class schema => import
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Getting property sqoop.tool with class schema
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: => import
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Setting property sqoop.property.set.id with class schema => 0
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Getting property sqoop.property.set.id with class schema
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: => 0
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Setting bulk properties for class SqoopOptions
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Setting property verbose with class SqoopOptions => true
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Getting property verbose with class SqoopOptions
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: => true
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Setting property incremental.last.value with class SqoopOptions => 12.00
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Getting property incremental.last.value with class SqoopOptions
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: => 9.00
. . . .
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Saving mapreduce.client.genericoptionsparser.used => true / null
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Setting property mapreduce.client.genericoptionsparser.used with class config => true
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Job: jobname; Getting property mapreduce.client.genericoptionsparser.used with class config
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: => true
14/11/20 16:01:48 INFO tool.ImportTool: Updated data for job: jobname
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Flushing current transaction
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Closing connection
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Flushing current transaction
14/11/20 16:01:48 DEBUG hsqldb.HsqldbJobStorage: Closing connection
Apparently, comparing the two output messages above, the HsqldbJobStorage update is never invoked when the incremental import job to HBase is executed, which is why "incremental.last.value" never gets updated. Is this a bug? Or did I miss something else? Thanks.
08-15-2016 11:47 AM
Oh, there you go. Follow along:
http://www.yourtechchick.com/hadoop/hive/step-step-guide-sqoop-incremental-imports/
Hope that helps!
06-05-2017 10:53 AM
I am facing the same issue. Could you please tell me how you resolved it?
11-27-2018 09:57 AM
Hi, I am facing this very same problem here. I also replicated this condition, as suggested by dwen, and when I change the destination to an HDFS file it works!
It seems that Sqoop (tool.ImportTool) is unable to write the metadata when HBase is involved, so the incremental import state gets lost. That is my suspicion, because the only difference reported by Sqoop's MapReduce job is an error at the end:
"ERROR tool.ImportTool: Import failed: Can not create a Path from a null string"
Do I need to grant any kind of permission to anyone for this to work?
I would appreciate any hint or guess here...
11-28-2018 01:34 PM
Hi, for those who may be facing the same issue, I may have found the solution:
It seems that the parameter
--target-dir /some/dir/on/hdfs
is compulsory when trying to make these incremental jobs work with HBase. When I added
--target-dir /tmp/jobname
it worked fine (the full command is sketched below).
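In case it helps to see it in context, this is roughly the HBase job from the first post with the workaround applied (host, credentials, table and column names are the placeholders from that post, and the --target-dir value is just an example; any writable HDFS path should work):

sudo -u hdfs sqoop job --create JobName \
  -- \
  import \
  --verbose \
  -m 1 \
  --connect jdbc:Cache://xx.xx.xx.xx:1972/USER \
  --driver com.intersys.jdbc.CacheDriver \
  --username username \
  --password password \
  --column-family CFNAME \
  --hbase-table TABLENAME \
  --hbase-row-key NUM \
  --query "select * from TABLENAME where \$CONDITIONS" \
  --target-dir /tmp/jobname \
  --incremental append \
  --check-column NUM \
  --last-value 9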
I hope it helps!
11-29-2018 12:29 AM
Hi,
This is a bug in Sqoop: it doesn't update the last-value when HBase is involved. Until there is a permanent fix, you need to add --target-dir to the Sqoop command to make it run fine.
I guess you can set --target-dir to any HDFS path.
Hope this helps.
Regards
Nitish