
Conversion failed when converting from a character string to uniqueidentifier

Explorer


 

Hi Folks,

 

I'm attempting to use Sqoop2 from Hue to import a single column from one database table in MS SQL Server. That column is defined as nvarchar(max) and contains a JSON document, which starts and ends with curly braces {...}.

 

The sqoop2 job fails with:  com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed when converting from a character string to uniqueidentifier

 

I suspect it's trying to convert the string value to a uniqueidentifier simply because it starts and ends with curly braces, but hey, I'm just guessing.

 

I see in the sqoop (version 1) docs that you can override the default conversions using:

 

Table 3. Parameters for overriding mapping

Argument                       Description
--map-column-java <mapping>    Override mapping from SQL to Java type for configured columns.
--map-column-hive <mapping>    Override mapping from SQL to Hive type for configured columns.
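For reference, this is roughly what the override looks like in Sqoop 1 (not Sqoop2). This is only a sketch: the JDBC URL, credentials, and target directory below are placeholders, and the table/column names are taken from this thread.

```shell
# Sqoop 1 syntax, shown for comparison only -- Sqoop2 does not accept
# these flags. Connection details and paths are placeholders.
sqoop import \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=MyDb" \
  --username myuser \
  --password-file /user/me/db.password \
  --table Assets \
  --columns "AssetId,Value" \
  --map-column-java AssetId=String \
  --target-dir /user/me/assets
```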

 

How can I use these overrides in sqoop2 via Hue?  Or another way?  Or am I off on the wrong path here?

 

Thanks,

 

ws

 



Expert Contributor
Hey there,

I don't think there's a good way of doing this in Sqoop2 yet (https://issues.apache.org/jira/browse/SQOOP-1074).

However, it seems the conversion is failing in the driver and not the connector.

Could you provide the schema of your table?

Explorer

Sure, here's the MS SQL create script:

 


CREATE TABLE [dbo].[Assets](
    [AssetId] [uniqueidentifier] NOT NULL,
    [Value] [nvarchar](max) NULL,
    [LastModifiedDate] [datetime] NOT NULL,
    [LastSyncronizedDate] [datetime] NULL,
    [SyncId] [nvarchar](128) NULL,
 CONSTRAINT [PK_dbo.Assets] PRIMARY KEY CLUSTERED
(
    [AssetId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

Expert Contributor
Could you try adding a column to partition by (that isn't of type uniqueidentifier)? If no partition column is provided, Sqoop will automatically split by the primary key, which is AssetId in this case. You should be able to split by text and date types worst case.
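In Sqoop2 via Hue, the partition column is a field on the job's "From" configuration. The Sqoop 1 equivalent, shown here purely as an illustration of the idea (connection details are placeholders), is the --split-by flag:

```shell
# Sqoop 1 equivalent of picking a non-uniqueidentifier partition column;
# in Sqoop2/Hue this corresponds to the job's partition-column field.
# Connection details and paths are placeholders.
sqoop import \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=MyDb" \
  --table Assets \
  --split-by LastModifiedDate \
  --target-dir /user/me/assets
```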

Explorer
Yup, that did it. Thanks Abe!

It created a bunch of files, named part-m-00001, part-m-00002, etc. What I really want is a single file, so I can easily create a table from it. Of course I can hack the thing into one file with the hadoop fs -cat command, but is there an easier way to import the data sqoop2 retrieved?

Expert Contributor
Warren,

Glad that worked. I'm not sure whether Sqoop2 has a feature to do that directly, but you may be able to reduce the number of mappers to 1 using throttling. In Hue or the Sqoop2 client, the number of extractors (extraction phase) and loaders (load phase) can be set explicitly; if a value is left blank or set to 0, the execution framework may ignore it. Throttling extractors to 1 gives you a single mapper performing the extraction, and throttling loaders to 1 means at most one loader writes to HDFS.
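A sketch of where those values live in the Sqoop2 shell client (the job id is a placeholder, and the exact prompts vary by version; in Hue the same values appear as the extractors/loaders fields when editing the job):

```shell
# Start the Sqoop2 shell, then update the job interactively.
sqoop2
# sqoop:000> update job --jid 1
# ...step through the prompts until the throttling section...
# Throttling resources
#   Extractors: 1
#   Loaders: 1
```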

If that doesn't work, your best bet is performing the concatenation yourself as you've described.
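For the manual concatenation, hadoop fs -getmerge does the part-file stitching in one step (the paths below are placeholders):

```shell
# Merge all part files from the import directory into one local file.
hadoop fs -getmerge /user/me/assets ./assets.txt

# Or stream the concatenation back into HDFS as a single file;
# "-put -" reads from stdin.
hadoop fs -cat /user/me/assets/part-m-* | hadoop fs -put - /user/me/assets_merged.txt
```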

Explorer
Thanks Abe. I assume I'd have to do this from the sqoop2 command line? I don't see any mapper or loader options in Hue.

Another minor snag: The text files sqoop2 created contain the correct data, but strings are single-quoted, and when I import the file(s) the single quote becomes part of the data. How would I get around that?

Expert Contributor
It actually should be in Hue when you're creating or editing the job. What version of Sqoop2 are you using?

The quoting is unfortunately not configurable. I'm working on something upstream (https://issues.apache.org/jira/browse/SQOOP-777), which should help improve issues of this nature.

Explorer

Yup, you're right - sorry, I was looking at the table creation pages, not the Sqoop import pages.

 

The SQOOP-777 issue seems to reference column and record delimiters. Is my single-quote problem one of these? I would think not, since a single quote is inserted at both the beginning and the end of the column value. It's behaving as if the column value is being enclosed in single quotes, rather than acting as a field or record delimiter.
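Until the quoting becomes configurable, one post-processing workaround is to strip the enclosing quotes with sed. This is a sketch: it assumes comma-delimited output where each field is wrapped in single quotes and values contain no escaped quotes adjacent to delimiters.

```shell
# Remove a single quote at the start of each line, at the end of each
# line, and around each field delimiter (',' -> ,). Embedded commas
# inside a value are left alone because they aren't quote-wrapped.
strip_quotes() {
  sed -e "s/^'//" -e "s/'\$//" -e "s/','/,/g"
}

echo "'{\"a\": 1}','2015-06-01 10:00:00'" | strip_quotes
# -> {"a": 1},2015-06-01 10:00:00
```

Run it over the part files before loading them into a table, e.g. hadoop fs -cat ... | strip_quotes.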

 

Expert Contributor
Indeed. SQOOP-777 is the beginning. We're working on it!