Sqoop exercise
--------------

As user it1, create a table and load data from a local file:

mysql> create table st1(id int, name varchar(16), ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP);
mysql> load data local infile '/home/it1/in2' into table st1 FIELDS TERMINATED BY ',';
mysql> select count(*) from st1;
+----------+
| count(*) |
+----------+
|     5000 |
+----------+
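The contents of /home/it1/in2 are not shown here; given the FIELDS TERMINATED BY ',' clause and the st1 schema, it would be a plain CSV file with one record per line, along these (illustrative, not actual) lines:

1,alice
2,carol
3,dave

With only id and name supplied, MySQL fills the missing ts field, so every row starts out stamped at load time. That timestamp, combined with ON UPDATE CURRENT_TIMESTAMP, is what drives the incremental import below.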
Create and show the sqoop job:

[it1@sandbox ~]$ sqoop job --create incjob -- import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --incremental lastmodified --check-column ts --target-dir sqin -m 1 --merge-key id
Warning: /usr/hdp/2.3.2.0-2950/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/01/24 00:18:57 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.2.0-2950
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/01/24 00:18:58 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

[it1@sandbox ~]$ sqoop job --show incjob
Warning: /usr/hdp/2.3.2.0-2950/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/01/24 00:22:08 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.2.0-2950
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Enter password:
Job: incjob
Tool: import
Options:
----------------------------
verbose = false
db.connect.string = jdbc:mysql://localhost:3306/test
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
hbase.create.table = false
db.require.password = true
hdfs.append.dir = false
hive.compute.stats.table = false
db.table = st1
codegen.input.delimiters.escape = 0
import.fetch.size = null
accumulo.create.table = false
codegen.input.delimiters.enclose.required = false
db.username = it1
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
incremental.col = ts
codegen.input.delimiters.record = 0
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
codegen.output.delimiters.field = 44
export.new.update = UpdateOnly
incremental.mode = DateLastModified
hdfs.file.format = TextFile
codegen.compile.dir = /tmp/sqoop-it1/compile/ec873c170d0ae9cdc9b0d00a5aaae3b2
direct.import = false
hdfs.target.dir = sqin
hive.fail.table.exists = false
merge.key.col = id
jdbc.driver.class = com.mysql.jdbc.Driver
db.batch = false
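Note the key saved options: incremental.mode = DateLastModified, incremental.col = ts, and merge.key.col = id. Creating the job only stores the definition, it runs nothing; by default Sqoop keeps saved jobs in a private HSQLDB metastore under ~/.sqoop/. You can confirm the job was registered with sqoop job --list, which prints something like (exact format may vary by version):

[it1@sandbox ~]$ sqoop job --list
Available jobs:
  incjob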
[it1@sandbox ~]$ sqoop job --exec incjob
Warning: /usr/hdp/2.3.2.0-2950/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/01/24 00:27:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.2.0-2950
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Enter password:
16/01/24 00:28:03 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
16/01/24 00:28:03 INFO manager.SqlManager: Using default fetchSize of 1000
16/01/24 00:28:03 INFO tool.CodeGenTool: Beginning code generation
16/01/24 00:28:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM st1 AS t WHERE 1=0
16/01/24 00:28:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM st1 AS t WHERE 1=0
16/01/24 00:28:04 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.3.2.0-2950/hadoop-mapreduce
Note: /tmp/sqoop-it1/compile/fc3084889fa7b5b7491cd61d3d0f849e/st1.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/01/24 00:28:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-it1/compile/fc3084889fa7b5b7491cd61d3d0f849e/st1.jar
16/01/24 00:28:09 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM st1 AS t WHERE 1=0
16/01/24 00:28:09 INFO tool.ImportTool: Incremental import based on column ts
16/01/24 00:28:09 INFO tool.ImportTool: Upper bound value: '2016-01-24 00:28:09.0'
16/01/24 00:28:09 INFO mapreduce.ImportJobBase: Beginning import of st1
16/01/24 00:28:09 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM st1 AS t WHERE 1=0
16/01/24 00:28:10 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
16/01/24 00:28:10 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
16/01/24 00:28:12 INFO db.DBInputFormat: Using read commited transaction isolation
16/01/24 00:28:12 INFO mapreduce.JobSubmitter: number of splits:1
16/01/24 00:28:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1453391437654_0007
16/01/24 00:28:14 INFO impl.YarnClientImpl: Submitted application application_1453391437654_0007
16/01/24 00:28:14 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1453391437654_0007/
16/01/24 00:28:14 INFO mapreduce.Job: Running job: job_1453391437654_0007
16/01/24 00:28:23 INFO mapreduce.Job: Job job_1453391437654_0007 running in uber mode : false
16/01/24 00:28:23 INFO mapreduce.Job:  map 0% reduce 0%
16/01/24 00:28:30 INFO mapreduce.Job:  map 100% reduce 0%
16/01/24 00:28:31 INFO mapreduce.Job: Job job_1453391437654_0007 completed successfully
16/01/24 00:28:31 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=145750
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=153893
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=4893
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=4893
        Total vcore-seconds taken by all map tasks=4893
        Total megabyte-seconds taken by all map tasks=1223250
    Map-Reduce Framework
        Map input records=5000
        Map output records=5000
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=70
        CPU time spent (ms)=2010
        Physical memory (bytes) snapshot=150691840
        Virtual memory (bytes) snapshot=841211904
        Total committed heap usage (bytes)=133693440
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=153893
16/01/24 00:28:31 INFO mapreduce.ImportJobBase: Transferred 150.2861 KB in 22.651 seconds (6.6349 KB/sec)
16/01/24 00:28:31 INFO mapreduce.ImportJobBase: Retrieved 5000 records.
16/01/24 00:28:31 INFO tool.ImportTool: Saving incremental import state to the metastore
16/01/24 00:28:31 INFO tool.ImportTool: Updated data for job: incjob
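This first run has no saved lower bound, so it imports all 5000 rows and records the upper bound '2016-01-24 00:28:09.0' in the metastore; that value becomes the lower bound of the next run. A quick sanity check on the HDFS side (assuming the single-mapper output lands in sqin/part-m-00000, the usual name for a map-only import):

[it1@sandbox ~]$ hdfs dfs -ls sqin
[it1@sandbox ~]$ hdfs dfs -cat sqin/part-m-00000 | wc -l

The second command should report 5000, matching the record count above.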
Now change some names:

mysql> update st1 set name='mary' where id>4800;
mysql> update st1 set name='bob' where id>4900;
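Because ts is declared ON UPDATE CURRENT_TIMESTAMP, both updates silently bump the timestamp of every row they touch. Assuming ids run 1 through 5000, the first statement touches rows 4801-5000 and the second re-touches 4901-5000, so 200 distinct rows now carry a ts newer than the saved bound, which is exactly what the next run should pick up. You can spot-check this before re-running the job, for example:

mysql> select id, name, ts from st1 where id in (4801, 4901, 5000);

The ts values for these rows should now be later than '2016-01-24 00:28:09'.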
Run sqoop again:

[it1@sandbox ~]$ sqoop job --exec incjob
Warning: /usr/hdp/2.3.2.0-2950/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/01/24 00:35:48 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.2.0-2950
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Enter password:
16/01/24 00:35:54 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
16/01/24 00:35:54 INFO manager.SqlManager: Using default fetchSize of 1000
16/01/24 00:35:54 INFO tool.CodeGenTool: Beginning code generation
16/01/24 00:35:54 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM st1 AS t WHERE 1=0
16/01/24 00:35:54 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM st1 AS t WHERE 1=0
16/01/24 00:35:54 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.3.2.0-2950/hadoop-mapreduce
Note: /tmp/sqoop-it1/compile/e22a56612584e7acb2a4c398ea8d224d/st1.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/01/24 00:35:57 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-it1/compile/e22a56612584e7acb2a4c398ea8d224d/st1.jar
16/01/24 00:35:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM st1 AS t WHERE 1=0
16/01/24 00:35:59 INFO tool.ImportTool: Incremental import based on column ts
16/01/24 00:35:59 INFO tool.ImportTool: Lower bound value: '2016-01-24 00:28:09.0'
16/01/24 00:35:59 INFO tool.ImportTool: Upper bound value: '2016-01-24 00:35:59.0'
16/01/24 00:35:59 INFO mapreduce.ImportJobBase: Beginning import of st1
16/01/24 00:35:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM st1 AS t WHERE 1=0
16/01/24 00:36:00 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
16/01/24 00:36:00 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
16/01/24 00:36:02 INFO db.DBInputFormat: Using read commited transaction isolation
16/01/24 00:36:02 INFO mapreduce.JobSubmitter: number of splits:1
16/01/24 00:36:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1453391437654_0008
16/01/24 00:36:02 INFO impl.YarnClientImpl: Submitted application application_1453391437654_0008
16/01/24 00:36:02 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1453391437654_0008/
16/01/24 00:36:02 INFO mapreduce.Job: Running job: job_1453391437654_0008
16/01/24 00:36:11 INFO mapreduce.Job: Job job_1453391437654_0008 running in uber mode : false
16/01/24 00:36:11 INFO mapreduce.Job:  map 0% reduce 0%
16/01/24 00:36:20 INFO mapreduce.Job:  map 100% reduce 0%
16/01/24 00:36:20 INFO mapreduce.Job: Job job_1453391437654_0008 completed successfully
16/01/24 00:36:20 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=145787
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=6300
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=6241
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=6241
        Total vcore-seconds taken by all map tasks=6241
        Total megabyte-seconds taken by all map tasks=1560250
    Map-Reduce Framework
        Map input records=200
        Map output records=200
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=69
        CPU time spent (ms)=1670
        Physical memory (bytes) snapshot=143282176
        Virtual memory (bytes) snapshot=825057280
        Total committed heap usage (bytes)=133693440
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=6300
16/01/24 00:36:20 INFO mapreduce.ImportJobBase: Transferred 6.1523 KB in 21.349 seconds (295.0957 bytes/sec)
16/01/24 00:36:20 INFO mapreduce.ImportJobBase: Retrieved 200 records.
16/01/24 00:36:21 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
16/01/24 00:36:21 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
16/01/24 00:36:22 INFO input.FileInputFormat: Total input paths to process : 2
16/01/24 00:36:22 INFO mapreduce.JobSubmitter: number of splits:2
16/01/24 00:36:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1453391437654_0009
16/01/24 00:36:23 INFO impl.YarnClientImpl: Submitted application application_1453391437654_0009
16/01/24 00:36:23 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1453391437654_0009/
16/01/24 00:36:23 INFO mapreduce.Job: Running job: job_1453391437654_0009
16/01/24 00:36:36 INFO mapreduce.Job: Job job_1453391437654_0009 running in uber mode : false
16/01/24 00:36:36 INFO mapreduce.Job:  map 0% reduce 0%
16/01/24 00:36:47 INFO mapreduce.Job:  map 50% reduce 0%
16/01/24 00:36:49 INFO mapreduce.Job:  map 100% reduce 0%
16/01/24 00:36:56 INFO mapreduce.Job:  map 100% reduce 100%
16/01/24 00:36:57 INFO mapreduce.Job: Job job_1453391437654_0009 completed successfully
16/01/24 00:36:57 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=180999
        FILE: Number of bytes written=798731
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=160486
        HDFS: Number of bytes written=153993
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=18861
        Total time spent by all reduces in occupied slots (ms)=6644
        Total time spent by all map tasks (ms)=18861
        Total time spent by all reduce tasks (ms)=6644
        Total vcore-seconds taken by all map tasks=18861
        Total vcore-seconds taken by all reduce tasks=6644
        Total megabyte-seconds taken by all map tasks=4715250
        Total megabyte-seconds taken by all reduce tasks=1661000
    Map-Reduce Framework
        Map input records=5200
        Map output records=5200
        Map output bytes=170593
        Map output materialized bytes=181005
        Input split bytes=293
        Combine input records=0
        Combine output records=0
        Reduce input groups=5000
        Reduce shuffle bytes=181005
        Reduce input records=5200
        Reduce output records=5000
        Spilled Records=10400
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=226
        CPU time spent (ms)=7440
        Physical memory (bytes) snapshot=545120256
        Virtual memory (bytes) snapshot=2528014336
        Total committed heap usage (bytes)=398983168
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=160193
    File Output Format Counters
        Bytes Written=153993
16/01/24 00:36:57 INFO tool.ImportTool: Saving incremental import state to the metastore
16/01/24 00:36:58 INFO tool.ImportTool: Updated data for job: incjob
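The second execution runs two MapReduce jobs. The first (job ..._0008) imports only the 200 rows whose ts falls between the saved lower bound and the new upper bound. Because --merge-key id is set, Sqoop then runs a merge job (job ..._0009) over the existing 5000-row dataset plus the 200 fresh rows: 5200 reduce input records collapse into 5000 output records, the newest version of each id winning. A sanity check on the merged result (assuming the merge output lands in sqin/part-r-00000, that name is the second CSV field, and no pre-existing row was already named bob; with ids 1-5000, rows 4901-5000 should now be 'bob' and 4801-4900 'mary'):

[it1@sandbox ~]$ hdfs dfs -cat sqin/part-r-00000 | wc -l
[it1@sandbox ~]$ hdfs dfs -cat sqin/part-r-00000 | grep -c ',bob,'

These should report 5000 and 100 respectively.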