Created 09-23-2016 07:39 AM
HDP 2.5.0.0 using Ambari 2.4.0.1 (both upgraded from HDP 2.4 installations)
I had used the following command to import a SQL Server table into an HCatalog MANAGED table in ORC format:
sqoop import --null-string '\\N' --null-non-string '\\N' --hive-delims-replacement '\0D' --num-mappers 8 --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database VERA_ODP_DW_dbo --hcatalog-table DimSnapshot --create-hcatalog-table --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="ZLIB")' --validate --connect 'jdbc:sqlserver://server_name;database=ODP_DW' --username uname --password passwd --table DimSnapshot -- --schema dbo 2>&1| tee -a ODP_DW_dbo.DimSnapshot.log
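As a quick sanity check after this initial load (not part of the original command), something like the following should confirm that the HCatalog-created table really is ORC-backed; the HiveServer2 URL is only a placeholder:

beeline -u 'jdbc:hive2://localhost:10000/VERA_ODP_DW_dbo' \
  -e 'DESCRIBE FORMATTED DimSnapshot;'
# the "Storage Information" section should list
# InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat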
Now, I wish to do an incremental load IN A SINGLE STEP, but I am facing the following challenges (I am testing with just one row to start with):
sqoop import --null-string '\\N' --null-non-string '\\N' --hive-delims-replacement '\0D' --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database VERA_ODP_DW_dbo --hcatalog-table dimsnapshot --incremental append --table dimsnapshot --check-column SnapShot_Id --last-value 10456476 --connect 'jdbc:sqlserver://server_name;database=ODP_DW' --username uname --password passwd -- --schema dbo

Warning: /usr/hdp/2.5.0.0-1245/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/09/22 16:36:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.0.0-1245
16/09/22 16:36:47 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/09/22 16:36:47 WARN tool.BaseSqoopTool: Output field/record delimiter options are not useful in HCatalog jobs for most of the output types except text based formats is text. It is better to use --hive-import in those cases. For non text formats,
Append mode for imports is not compatible with HCatalog. Please remove the parameter--append-mode
-bash-4.2$ sqoop import --null-string '\\N' --null-non-string '\\N' --hive-delims-replacement '\0D' --incremental append --table dimsnapshot --check-column SnapShot_Id --last-value 10456476 --hive-import --hive-table vera_odp_dw_dbo.dimsnapshot --map-column-hive ENGINE_RUNTIME_UNIT=binary --connect 'jdbc:sqlserver://server_name;database=ODP_DW' --username uname --password passwd --table dimsnapshot -- --schema dbo

Warning: /usr/hdp/2.5.0.0-1245/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/09/23 09:11:44 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.0.0-1245
16/09/23 09:11:44 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/09/23 09:11:44 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
16/09/23 09:11:44 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
16/09/23 09:11:44 INFO manager.SqlManager: Using default fetchSize of 1000
16/09/23 09:11:44 INFO manager.SQLServerManager: We will use schema dbo
16/09/23 09:11:44 INFO tool.CodeGenTool: Beginning code generation
16/09/23 09:11:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [dbo].[dimsnapshot] AS t WHERE 1=0
16/09/23 09:11:45 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.0.0-1245/hadoop-mapreduce
Note: /tmp/sqoop-ojoqcu/compile/61154b6349bbdd9a60aba2b4a1d4a919/dimsnapshot.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/09/23 09:11:47 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-ojoqcu/compile/61154b6349bbdd9a60aba2b4a1d4a919/dimsnapshot.jar
16/09/23 09:11:48 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX([SnapShot_Id]) FROM [dbo].[dimsnapshot]
16/09/23 09:11:48 INFO tool.ImportTool: Incremental import based on column [SnapShot_Id]
16/09/23 09:11:48 INFO tool.ImportTool: Lower bound value: 10456476
16/09/23 09:11:48 INFO tool.ImportTool: Upper bound value: 10456477
16/09/23 09:11:48 INFO mapreduce.ImportJobBase: Beginning import of dimsnapshot
16/09/23 09:11:48 INFO impl.TimelineClientImpl: Timeline service address: http://l4373t.sss.com:8188/ws/v1/timeline/
16/09/23 09:11:48 INFO client.RMProxy: Connecting to ResourceManager at l4283t.sss.com/138.106.9.80:8050
16/09/23 09:11:49 INFO client.AHSProxy: Connecting to Application History server at l4373t.sss.com/138.106.5.5:10200
16/09/23 09:11:50 INFO db.DBInputFormat: Using read commited transaction isolation
16/09/23 09:11:50 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN([SnapShot_Id]), MAX([SnapShot_Id]) FROM [dbo].[dimsnapshot] WHERE ( [SnapShot_Id] > 10456476 AND [SnapShot_Id] <= 10456477 )
16/09/23 09:11:50 INFO db.IntegerSplitter: Split size: 0; Num splits: 4 from: 10456477 to: 10456477
16/09/23 09:11:50 INFO mapreduce.JobSubmitter: number of splits:1
16/09/23 09:11:50 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1474453939755_0013
16/09/23 09:11:51 INFO impl.YarnClientImpl: Submitted application application_1474453939755_0013
16/09/23 09:11:51 INFO mapreduce.Job: The url to track the job: http://l4283t.sss.com:8088/proxy/application_1474453939755_0013/
16/09/23 09:11:51 INFO mapreduce.Job: Running job: job_1474453939755_0013
16/09/23 09:11:58 INFO mapreduce.Job: Job job_1474453939755_0013 running in uber mode : false
16/09/23 09:11:58 INFO mapreduce.Job: map 0% reduce 0%
16/09/23 09:12:04 INFO mapreduce.Job: map 100% reduce 0%
16/09/23 09:12:04 INFO mapreduce.Job: Job job_1474453939755_0013 completed successfully
16/09/23 09:12:04 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=164186
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=131
		HDFS: Number of bytes written=217
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3973
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=3973
		Total vcore-milliseconds taken by all map tasks=3973
		Total megabyte-milliseconds taken by all map tasks=16273408
	Map-Reduce Framework
		Map input records=1
		Map output records=1
		Input split bytes=131
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=71
		CPU time spent (ms)=1440
		Physical memory (bytes) snapshot=234348544
		Virtual memory (bytes) snapshot=5465473024
		Total committed heap usage (bytes)=217055232
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=217
16/09/23 09:12:04 INFO mapreduce.ImportJobBase: Transferred 217 bytes in 16.2893 seconds (13.3216 bytes/sec)
16/09/23 09:12:04 INFO mapreduce.ImportJobBase: Retrieved 1 records.
16/09/23 09:12:04 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
16/09/23 09:12:04 INFO util.AppendUtils: Appending to directory dimsnapshot
16/09/23 09:12:04 INFO util.AppendUtils: Using found partition 4
16/09/23 09:12:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [dbo].[dimsnapshot] AS t WHERE 1=0
16/09/23 09:12:04 WARN hive.TableDefWriter: Column od_import_dt had to be cast to a less precise type in Hive
16/09/23 09:12:04 WARN hive.TableDefWriter: Column snapshot_export_dt had to be cast to a less precise type in Hive
16/09/23 09:12:04 WARN hive.TableDefWriter: Column INSERTION_DATE had to be cast to a less precise type in Hive
16/09/23 09:12:04 WARN hive.TableDefWriter: Column utcDate had to be cast to a less precise type in Hive
16/09/23 09:12:04 INFO hive.HiveImport: Loading uploaded data into Hive
16/09/23 09:12:04 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
16/09/23 09:12:04 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
Logging initialized using configuration in jar:file:/usr/hdp/2.5.0.0-1245/hive/lib/hive-common-1.2.1000.2.5.0.0-1245.jar!/hive-log4j.properties
OK
Time taken: 2.031 seconds
FAILED: SemanticException [Error 30019]: The file that you are trying to load does not match the file format of the destination table. Destination table is stored as ORC but the file being loaded is not a valid ORC file.
-bash-4.2$
How shall I proceed?
Created 10-06-2016 04:32 PM
Hi "Kaliyug Antagonist!!"
Try setting up the sqoop import as a sqoop job. Incremental data import is supported via a sqoop job, not directly via sqoop import.
Check out the link for more examples.
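For example, a rough sketch of such a job, reusing the connection details and last value from the commands above (the job name and target directory are only placeholders):

sqoop job --create dimsnapshot_incr \
  -- import \
  --connect 'jdbc:sqlserver://server_name;database=ODP_DW' \
  --username uname -P \
  --table dimsnapshot \
  --incremental append \
  --check-column SnapShot_Id \
  --last-value 10456476 \
  --target-dir /tmp/dimsnapshot_incr \
  -- --schema dbo

# each subsequent run picks up the stored last value automatically
sqoop job --exec dimsnapshot_incr

The saved job updates the last value in the Sqoop metastore after every successful run, which is what makes the incremental import repeatable without editing the command each time.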
Hopefully this helps out.
Created 02-06-2018 08:34 PM
Moderators: please either correct this post or delete it; it's giving wrong information. This post says a sqoop incremental import is possible into a Hive ORC table, which is incorrect and currently not supported.
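On these versions, the pattern generally used instead is a two-step load: an incremental append into a plain (text) staging table, followed by a Hive INSERT into the ORC table. A rough sketch, in which the staging table name and HiveServer2 URL are only placeholders:

sqoop import \
  --connect 'jdbc:sqlserver://server_name;database=ODP_DW' \
  --username uname -P \
  --table dimsnapshot \
  --incremental append \
  --check-column SnapShot_Id \
  --last-value 10456476 \
  --hive-import \
  --hive-table vera_odp_dw_dbo.dimsnapshot_staging \
  -- --schema dbo

# then copy the newly landed rows into the ORC table and clear the staging table
beeline -u 'jdbc:hive2://localhost:10000' \
  -e 'INSERT INTO TABLE vera_odp_dw_dbo.dimsnapshot SELECT * FROM vera_odp_dw_dbo.dimsnapshot_staging;'
beeline -u 'jdbc:hive2://localhost:10000' \
  -e 'TRUNCATE TABLE vera_odp_dw_dbo.dimsnapshot_staging;'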
Created on 10-21-2019 03:25 PM - edited 10-21-2019 03:31 PM
I could run the Map phase on the ORC file, but the Reduce phase fails with a "Can't input data ORC[]" error.
Do you have any official documentation confirming that ORC files do not work with an incremental lastmodified import?
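For context, a lastmodified import that merges into an existing directory typically looks something like the sketch below; the check column, last value and target directory here are only assumptions. The --merge-key option is what launches the merge MapReduce job whose reduce phase is failing, and that merge job operates on the plain files (e.g. delimited text) that Sqoop itself writes, not on ORC data.

sqoop import \
  --connect 'jdbc:sqlserver://server_name;database=ODP_DW' \
  --username uname -P \
  --table dimsnapshot \
  --incremental lastmodified \
  --check-column INSERTION_DATE \
  --last-value '2019-10-21 00:00:00' \
  --merge-key SnapShot_Id \
  --target-dir /data/dimsnapshot \
  -- --schema dbo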