Sqoop translates one column incorrectly when we import a table from SQL Server.

Contributor

Hello guys,

I'm trying to run a full import of a table from SQL Server, but Sqoop is interpreting the data type of one column incorrectly.


Column in SQL Server: ErrorParams (nvarchar(max))
Sqoop translates it to errorparams varchar with no length parameter, which makes the generated Hive DDL fail:


create table `dataengsandbox`.`task` (
`id` INT,
`recordid` int,
`state` int,
`prevstate` int,
`instanceid` int,
`parentjobid` int,
`closed` boolean,
`requeuecount` int,
`autorequeuecount` int,
`modifytime` string,
`submittime` string,
`starttime` string,
`endtime` string,
`createtime` string,
`timestamp` binary,
`pendingreason` int,
`failurereason` int,
`exitcode` int,
`failednodeid` int,
`errorcode` int,
`errorparams` varchar,
`output` string,
`executionfailureretrycount` int)
stored as parquet
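
For reference, the source column's declared type can be confirmed from Sqoop itself. A minimal sketch using sqoop eval, reusing the connection flags from the job below (for nvarchar(max), SQL Server reports CHARACTER_MAXIMUM_LENGTH as -1):

sqoop eval \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
--driver net.sourceforge.jtds.jdbc.Driver \
--connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
--username 'lrm0613' \
--query "SELECT DATA_TYPE, CHARACTER_MAXIMUM_LENGTH FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'Task' AND COLUMN_NAME = 'ErrorParams'"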


I would expect it to be translated to errorparams varchar(64), or some other explicit length, since Hive's varchar type requires one. Instead, the create-table statement fails to parse:

FAILED: ParseException line 22:22 mismatched input ',' expecting ( near 'varchar' in primitive type specification
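
The parse failure is consistent with Hive's grammar: varchar is only accepted with an explicit length, so the bare varchar in the generated DDL can never parse. A quick sanity check that the same column definition is accepted once a length is supplied (a sketch; the hive CLI and a default database are assumed, and varchar_check is just a throwaway table name):

hive -e "CREATE TABLE default.varchar_check (errorparams VARCHAR(160)) STORED AS PARQUET"
hive -e "DROP TABLE default.varchar_check"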


Is there a way to make Sqoop map this column correctly?


Sqoop job:


sqoop import \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
--connection-manager org.apache.sqoop.manager.SQLServerManager \
--driver net.sourceforge.jtds.jdbc.Driver \
--connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
--username 'lrm0613' \
--query 'select * from Task where id=0 and $CONDITIONS' \
--hcatalog-database dataengsandbox \
--hcatalog-table Task \
--compress \
--compression-codec snappy \
--map-column-hive id=INT \
-m 1 \
--create-hcatalog-table \
--hcatalog-storage-stanza 'stored as parquet'
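
In case it helps with debugging: re-running the job with Sqoop's --verbose flag should make the create-table statement Sqoop hands to HCatalog visible in the log output before the failure. A sketch, identical to the job above with only that flag added:

sqoop import \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
--verbose \
--connection-manager org.apache.sqoop.manager.SQLServerManager \
--driver net.sourceforge.jtds.jdbc.Driver \
--connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
--username 'lrm0613' \
--query 'select * from Task where id=0 and $CONDITIONS' \
--hcatalog-database dataengsandbox \
--hcatalog-table Task \
--compress \
--compression-codec snappy \
--map-column-hive id=INT \
-m 1 \
--create-hcatalog-table \
--hcatalog-storage-stanza 'stored as parquet'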

1 ACCEPTED SOLUTION

Contributor

Made it work! 🙂


I had to use --map-column-hive to override the generated column types:


sqoop import \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
--connection-manager org.apache.sqoop.manager.SQLServerManager \
--driver net.sourceforge.jtds.jdbc.Driver \
--connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
--username 'lrm0613' \
--password-alias sqlserver2.password \
--query 'select * from Job where id=0 and $CONDITIONS' \
--hcatalog-database dataengsandbox \
--hcatalog-table Job \
--compress \
--compression-codec snappy \
--map-column-hive 'excludednodes=varchar(160)','errorparams=varchar(160)' \
-m 1 \
--create-hcatalog-table \
--hcatalog-storage-stanza 'stored as parquet'
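
To confirm the override took effect, the resulting schema can be checked from the Hive side (a sketch, using the hive CLI against the table created above):

hive -e 'DESCRIBE dataengsandbox.job'

One caveat: Hive silently truncates varchar values that exceed the declared length, so if those nvarchar(max) columns can hold more than 160 characters, mapping them to string instead (e.g. --map-column-hive 'errorparams=string') avoids the length limit altogether.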
