
Sqoop translates one column incorrectly when we import a table from SQL Server.

Contributor

Hello guys,

I'm trying to run a full import of a table from SQL Server, but Sqoop is interpreting one column's data type incorrectly.

 

Column in SQL Server: ErrorParams (nvarchar(max))
Sqoop translates it to `errorparams` varchar with no length, which makes the generated create-table statement fail.
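For context, the source column presumably sits in a table like this (a sketch, not the actual SQL Server DDL; only the ErrorParams type is confirmed):

CREATE TABLE Task (
    Id int PRIMARY KEY,
    ErrorParams nvarchar(max)  -- unbounded Unicode text; Sqoop emits a bare "varchar" for it
);

The table Sqoop then generates on the Hive side: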

 

create table `dataengsandbox`.`task` (
`id` INT,
`recordid` int,
`state` int,
`prevstate` int,
`instanceid` int,
`parentjobid` int,
`closed` boolean,
`requeuecount` int,
`autorequeuecount` int,
`modifytime` string,
`submittime` string,
`starttime` string,
`endtime` string,
`createtime` string,
`timestamp` binary,
`pendingreason` int,
`failurereason` int,
`exitcode` int,
`failednodeid` int,
`errorcode` int,
`errorparams` varchar,
`output` string,
`executionfailureretrycount` int)
stored as parquet

 

I would expect it to be translated to varchar(64), or some other bounded varchar, instead.

FAILED: ParseException line 22:22 mismatched input ',' expecting ( near 'varchar' in primitive type specification
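That ParseException is Hive rejecting a varchar with no length: unlike SQL Server, Hive's varchar type always requires an explicit length (1 to 65535), so the bare varchar Sqoop emitted for the nvarchar(max) column is not valid HiveQL. For illustration, the failing line would parse fine as either of these (160 is just an example length):

`errorparams` varchar(160),
-- or, since Hive's string type is unbounded anyway:
`errorparams` string,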

 

Is there a way to make Sqoop map it correctly?

 

Sqoop job:

 

sqoop import \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
--connection-manager org.apache.sqoop.manager.SQLServerManager \
--driver net.sourceforge.jtds.jdbc.Driver \
--connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
--username 'lrm0613' \
--query 'select * from Task where id=0 and $CONDITIONS' \
--hcatalog-database dataengsandbox \
--hcatalog-table Task \
--compress \
--compression-codec snappy \
--map-column-hive id=INT \
-m 1 \
--create-hcatalog-table \
--hcatalog-storage-stanza 'stored as parquet'
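A side note on the --query form for anyone copying this: $CONDITIONS must reach Sqoop literally so it can substitute its split predicates at runtime, which is why the query is single-quoted. With double quotes it would need escaping:

--query "select * from Task where id=0 and \$CONDITIONS" \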

1 ACCEPTED SOLUTION

avatar
Contributor

Made it work! 🙂

 

I had to use --map-column-hive:

 

sqoop import \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
--connection-manager org.apache.sqoop.manager.SQLServerManager \
--driver net.sourceforge.jtds.jdbc.Driver \
--connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
--username 'lrm0613' \
--password-alias sqlserver2.password \
--query 'select * from Job where id=0 and $CONDITIONS' \
--hcatalog-database dataengsandbox \
--hcatalog-table Job \
--compress \
--compression-codec snappy \
--map-column-hive 'excludednodes=varchar(160)','errorparams=varchar(160)' \
-m 1 \
--create-hcatalog-table \
--hcatalog-storage-stanza 'stored as parquet'
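For anyone hitting the same thing: --map-column-hive overrides the Hive type Sqoop would otherwise generate for the named columns, so any valid Hive type works there. If the nvarchar(max) values can outgrow the chosen length, mapping to Hive's unbounded string type should also work (not tested here):

--map-column-hive 'excludednodes=string','errorparams=string' \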

