
Sqoop with HANA Tables that were created with quoted column names

I have a table, generated by an SAP BW DSO, that I'm attempting to import into HDFS via Sqoop. The table cannot be adjusted. One thing I learned with HANA is that if a column is created with quoted identifiers, you must always quote the column name when referencing it in a query, which is the case with this table. When I run the import, even though I'm quoting the column name and escaping the embedded quote characters properly (I know this because Sqoop gets as far as launching the mappers for the split), the HANA JDBC driver errors out: Sqoop passes the generated split query with the column name unquoted. Since "0CALDAY" begins with a digit, HANA rejects the unquoted identifier as a syntax error. I'm not sure whether anything can really be done at the command level for this, but I thought I'd try to reach out to the community.

I'd like to add that if I run the job with a single mapper (-m 1), it runs just fine. However, this is not practical due to the way this table is handled on the HANA side: collecting the entire dataset in one query would consume all available memory. I'm thinking my only option is to set up a manual split, i.e. multiple "-m 1" jobs with the data partitioned accordingly.
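A minimal sketch of that manual-split fallback, assuming numeric "0CALDAY" bounds like the ones Sqoop used in the failing split clause (the partition boundaries, host, schema, and target paths below are placeholders, not my real values):

```shell
#!/bin/sh
# Dry-run sketch of the manual split: emit one single-mapper sqoop
# command per "0CALDAY" partition. Because each job is -m 1, Sqoop
# never generates a split condition, so the quoted identifier only
# ever appears in our own WHERE clause.
partition_cmd() {
  # $1 = inclusive lower bound, $2 = exclusive upper bound.
  # 0CALDAY is compared numerically here, matching the unquoted
  # numeric bounds in the failing split clause from the log.
  printf "sqoop import --connect 'jdbc:sap://<hanahost>:30015/?currentschema=<schema>' --driver com.sap.db.jdbc.Driver --username HADOOP_PROD --password xxxxxx --query 'select * from \"HANA_TABLE_NAME\" WHERE \"0CALDAY\" >= %s AND \"0CALDAY\" < %s AND \$CONDITIONS' --target-dir /dev/landing/SAPBWD/raw/HANA_TABLE_NAME/part_%s -m 1\n" "$1" "$2" "$1"
}

# Print one command per partition; pipe the output to sh to run them.
for bounds in 20160101:20160701 20160701:20170101 20170101:20170701; do
  partition_cmd "${bounds%:*}" "${bounds#*:}"
done
```

Each printed command writes to its own part_ directory, so the partitions can be merged afterwards under the landing path.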

Error Message from one of the MR jobs:

2017-10-10 22:14:05,937 INFO [main] org.apache.hadoop.security.SecurityUtil: Updating Configuration
2017-10-10 22:14:05,986 WARN [main] org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-maptask.properties,hadoop-metrics2.properties
2017-10-10 22:14:06,050 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-10-10 22:14:06,050 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2017-10-10 22:14:06,064 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2017-10-10 22:14:06,064 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1506423185536_1267, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@1040be71)
2017-10-10 22:14:06,210 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: RM_DELEGATION_TOKEN, Service: x.x.x.x:8032,x.x.x.x:8032, Ident: (owner=user, renewer=yarn, realUser=oozie/name, issueDate=1507688011606, maxDate=1508292811606, sequenceNumber=642, masterKeyId=212)
2017-10-10 22:14:06,215 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: MR_DELEGATION_TOKEN, Service: x.x.x.x:10020, Ident: (owner=user, renewer=yarn, realUser=oozie/name, issueDate=1507688011751, maxDate=1508292811751, sequenceNumber=100, masterKeyId=177)
2017-10-10 22:14:06,215 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: kms-dt, Service: 192.168.224.36:9292, Ident: (owner=user, renewer=yarn, realUser=oozie, issueDate=1507688011639, maxDate=1508292811639, sequenceNumber=1405, masterKeyId=16)
2017-10-10 22:14:06,215 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hdp01prod, Ident: (HDFS_DELEGATION_TOKEN token 32636 for user)
2017-10-10 22:14:06,217 WARN [main] org.apache.hadoop.security.token.Token: Cannot find class for token kind HIVE_DELEGATION_TOKEN
2017-10-10 22:14:06,217 WARN [main] org.apache.hadoop.security.token.Token: Cannot find class for token kind HIVE_DELEGATION_TOKEN
Kind: HIVE_DELEGATION_TOKEN, Service: hiveserver2ClientToken, Ident: 00 0c xxx
2017-10-10 22:14:06,217 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: TIMELINE_DELEGATION_TOKEN, Service: x.x.x.x:8188, Ident: (owner=name, renewer=yarn, realUser=oozie, issueDate=1507688011911, maxDate=1508292811911, sequenceNumber=25628, masterKeyId=178)
2017-10-10 22:14:06,311 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2017-10-10 22:14:06,471 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /data/local01/hadoop/yarn/local/usercache/user/appcache/application_1506423185536_1267,/data/local02/hadoop/yarn/local/usercache/user/appcache/application_1506423185536_1267
2017-10-10 22:14:06,698 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2017-10-10 22:14:07,088 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2017-10-10 22:14:07,088 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2017-10-10 22:14:07,097 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2017-10-10 22:14:07,650 INFO [main] org.apache.sqoop.mapreduce.db.DBInputFormat: Using read commited transaction isolation
2017-10-10 22:14:07,790 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: 0CALDAY >= 20160101 AND 0CALDAY < 26192607
2017-10-10 22:14:07,874 INFO [main] org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2017-10-10 22:14:07,875 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
2017-10-10 22:14:08,058 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Working on split: 0CALDAY >= 20160101 AND 0CALDAY < 26192607
2017-10-10 22:14:08,137 ERROR [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Top level exception: 
com.sap.db.jdbc.exceptions.JDBCDriverException: SAP DBTech JDBC: [257]: sql syntax error: incorrect syntax near ">=": line 1 col 137 (at pos 137)

Here's my sqoop command:

sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect "jdbc:sap://<hanahost>:30015/?currentschema=<schema>" \
--driver com.sap.db.jdbc.Driver \
--username HADOOP_PROD \
--password xxxxxx \
--query 'select * from "HANA_TABLE_NAME" WHERE "0CALDAY" > '\''20160000'\'' AND $CONDITIONS' \
--target-dir "/dev/landing/SAPBWD/raw/HANA_TABLE_NAME" \
--split-by "0CALDAY" \
--delete-target-dir \
--boundary-query "SELECT MIN(CAST(\"0CALDAY\" AS INTEGER)), MAX(CAST(\"0CALDAY\" AS INTEGER)) FROM (select * from \"HANA_BW_TABLE\" WHERE \"0CALDAY\" > '20160000')" \
-m 10
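Since the generated split clause seems to be the only place the quotes go missing, one tweak that may be worth trying (an untested assumption on my part) is to embed the double quotes in the --split-by value itself, on the theory that Sqoop interpolates that string verbatim into the `col >= lo AND col < hi` conditions it hands the driver:

```shell
# Hypothetical variant: the --split-by value carries its own double
# quotes, so the split condition Sqoop builds would read
# "0CALDAY" >= ... instead of 0CALDAY >= ... . The --boundary-query
# still supplies the min/max, so the split column is otherwise unused.
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  --connect "jdbc:sap://<hanahost>:30015/?currentschema=<schema>" \
  --driver com.sap.db.jdbc.Driver \
  --username HADOOP_PROD \
  --password xxxxxx \
  --query 'select * from "HANA_TABLE_NAME" WHERE "0CALDAY" > '\''20160000'\'' AND $CONDITIONS' \
  --target-dir "/dev/landing/SAPBWD/raw/HANA_TABLE_NAME" \
  --split-by '"0CALDAY"' \
  --delete-target-dir \
  --boundary-query "SELECT MIN(CAST(\"0CALDAY\" AS INTEGER)), MAX(CAST(\"0CALDAY\" AS INTEGER)) FROM (select * from \"HANA_BW_TABLE\" WHERE \"0CALDAY\" > '20160000')" \
  -m 10
```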