Support Questions
Find answers, ask questions, and share your expertise

DB2 Free Form Query Sqoop Import Split By Issues

Contributor
queryFile=query.hql
query=`cat $queryFile`
splitBy=itm_nbr
target_dir='/DI/hive/test/'
sqoop import -Ddb2.jcc.charsetDecoderEncoder=3 -Dmapred.job.queue.name=di --connect jdbc:db2://soemthing.something.something.com:6070/DB2DSNP --username MYUSERNAME --password MYPWD --query "$query" --target-dir $target_dir --delete-target-dir --split-by $splitBy --fields-terminated-by '\t'
SELECT TRIM(T5935.T2580_GIN_ID) AS col1,
       TRIM(T5935.T2598_GLN_ID) AS col2,
       TRIM(T5935.T616_VBU_NBR) AS col3,
       T5935.T024_SOS_ITM_NBR AS col4,
       0 AS col5,
       TRIM(T5935.SLL_PRD_IDF) AS col6,
       TRIM(T2591.T2591_BAR_CD_ID) AS col7,
       T2580.T937_LWS_BRN_CD AS col8,
       T5935.T4480_PRD_CGY_NBR AS col9,
       (CASE WHEN T2582_SSA_OPR_GIN_STS_CD = 1 THEN 1 ELSE 0 END) AS col9,
       coalesce(ORD_LED_TM_NBR, 0) as col10,
       T2580.ADD_DT AS col11,
       CAST(T2580.UPD_DM AS DATE) AS col12
    FROM OURDATABASE.T5935_SSA_DAT_STS_RPR T5935
        INNER JOIN OURDATABASE.T2580_GIN_VBU T2580 ON T5935.T616_VBU_NBR = T2580.T616_VBU_NBR
                                               AND T5935.T2580_GIN_ID = T2580.T2580_GIN_ID
                                               AND T5935.T1989_TGT_CRY_CD = T2580.T1989_TGT_CRY_CD
        INNER JOIN OURDATABASE.T2591_GIN_BAR_CD T2591 ON T5935.T2580_GIN_ID = T2591.T2580_GIN_ID
                                                  AND T5935.T616_VBU_NBR = T2591.T616_VBU_NBR
                                                  AND T5935.T1989_TGT_CRY_CD = T2591.T1989_TGT_CRY_CD
        LEFT JOIN OURDATABASE.T2900_GIN_SOS_DTL T2900 ON T5935.T2580_GIN_ID = T2900.T2580_GIN_ID
                                                  AND T5935.T616_VBU_NBR = T2900.T616_VBU_NBR
    WHERE T5935.T1989_TGT_CRY_CD = 1
        AND T5935.LWS_CSM_UNT_IDC = 'Y'
        AND T5935.T024_SOS_ITM_NBR IS NOT NULL
	AND $CONDITIONS

Trying to run a Sqoop Import of a free form query from DB2. I can only get the command to work if I make the split-by attribute equal to an integer. If I give a column name, it throws the following error, which basically means it contextually doesn't understand the column name you're requesting (i've tried every column, including calling it t1.itm_nbr, 'itm_nbr', and a.itm_nbr (wrapping the whole thing in a subquery, a):

Error: java.io.IOException: SQLException in nextKeyValue
	at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
	at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-206, SQLSTATE=42703, SQLERRMC=T1.ITM_NBR, DRIVER=4.16.53

That leads me to two theories that I can try:

  • Maybe I can split by a column number, instead of a name (possible/how?) Tried using split-by 1, but that just makes the bounding query do the min of 1 and max of 1, running with one mapper. Not of use. But at least query runs.
  • The column aliases I'm assigning are working in my query, but Sqoop doesn't appear to recognize them so maybe I should select them from a subquery, by name? <-- update: this does not work 😠

I'm open to anything those who sqoop from DB2 have to say to fix this.

1 REPLY 1

Expert Contributor
@Landon Robinson,

The reason you are not able to get the column name working with split-by is that the column name is not present in the select statement:

TRIM(T5935.T2580_GIN_ID) AS col1, TRIM(T5935.T2598_GLN_ID) AS col2, TRIM(T5935.T616_VBU_NBR) AS col3, T5935.T024_SOS_ITM_NBR AS col4, 0 AS col5, TRIM(T5935.SLL_PRD_IDF) AS col6, TRIM(T2591.T2591_BAR_CD_ID) AS col7, T2580.T937_LWS_BRN_CD AS col8, T5935.T4480_PRD_CGY_NBR AS col9, (CASE WHEN T2582_SSA_OPR_GIN_STS_CD = 1 THEN 1 ELSE 0 END) AS col9, coalesce(ORD_LED_TM_NBR, 0) as col10, T2580.ADD_DT AS col11, CAST(T2580.UPD_DM AS DATE) AS col12 FROM OURDATABASE.T5935_SSA_DAT_STS_RPR T5935

If you can bring the ITM_NBR in to this section and have it as it is and not with Alias name it should work.

Can you try this and let know

-Mahesh