Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

SQOOP import of "image" data type into hive

avatar
Contributor

I am trying to import a table from MSSQL which has few columns with data type "Image"

The value in the columns are of binary format(LOBS) .

Ex

database type : MSSQL

column name : Settings

Data_type : image

value of the column:

( 0x1F8B0800000400EDBD07601C499625262FD7E0CC188CDE692EC1D69472329AB2A81CA6501320CCE74A10880604010ED9D....)

 

When I import using sqoop the COLUMNS is automatically taken as string in hive but the data is corrupted.

 

COMMAND USED :

 

sqoop import --driver 'com.microsoft.sqlserver.jdbc.SQLServerDriver' --connect 'jdbc:sqlserver://IP:PORT;database=DB;' --connection-manager 'org.apache.sqoop.manager.SQLServerManager' --username <> -password <> --as-parquetfile --delete-target-dir --target-dir '/user/test/' --query "select GUID,Name,cast(Settings AS binary) AS Settings FROM my_table_name where \$CONDITIONS; " --m 1

 

 

 

7 REPLIES 7

avatar
Contributor

@VidyaSargur 

 

Any sqoop expert who can throw some light on this issue ?

 

avatar
Community Manager

@EricL Is this something that you can help with?

Thanks,

Vidya



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
Super Guru
@sow,

Sorry, did not noticed your question here. I am wondering if you have tried using "--map-column-hive" option of Sqoop, which is mentioned in the doc here:
https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html#_controlling_type_mapping

So in your case, the command will look like below:

sqoop import --driver 'com.microsoft.sqlserver.jdbc.SQLServerDriver' --connect 'jdbc:sqlserver://IP:PORT;database=DB;' --connection-manager 'org.apache.sqoop.manager.SQLServerManager' --username <> -password <> --as-parquetfile --delete-target-dir --target-dir '/user/test/' --query "select GUID,Name,Settings FROM my_table_name where \$CONDITIONS; " --m 1 --map-column-hive Settings=binary

Note that you do not need to CAST anymore from MSSQL side. See if it helps.

Cheers
Eric

avatar
Contributor

@EricL 

 

I tried the above. the map-column-hive is not making any impact.

the column data type is taken as string in hive after import and the data is still corrupted.

 

Using hive-import and hive-table with it helped in getting the data type as binary in hive. But the data is still corrupted

cmd used :
sqoop import --driver 'com.microsoft.sqlserver.jdbc.SQLServerDriver' --connect <> --connection-manager 'org.apache.sqoop.manager.SQLServerManager' --username <> -password <> --hive-import --hive-table testc --as-parquetfile --delete-target-dir --target-dir '/user/test/' --query "select * FROM my_table where \$CONDITIONS; " --m 1 --map-column-hive Settings=binary

 

avatar
Contributor

@EricL 

 

I tried -map-column-java Settings=String and got the data type as string in hive.

however the data doesnt look corrupted but the hexadecimal representation of the value is not retained

 

cmd used : 

sqoop import --driver 'com.microsoft.sqlserver.jdbc.SQLServerDriver' --connect <> --connection-manager 'org.apache.sqoop.manager.SQLServerManager' --username <> -password <> --table 'my_table' --as-parquetfile --delete-target-dir --target-dir /user/test/ -map-column-java Settings=String --m 1 

 

Result of above query: 

column data type in source : image

column value in source : 0xFFffDeDJBF.......dDf.      --> ( A hexadecimal value )

column data type in hive: string 

column value in hive: FFffDeDJBF.......dDf

(0x is not retained)

 

 

 

 

avatar
Explorer

@sow @EricL  I am getting same result as Sow is getting . ' 0x' is not retained. 

my case is exactly same . Any solutions you were able to find ? Thanks in advance. 

avatar
Community Manager

@DaveNepal Hello! Welcome to the Welcome to the Cloudera Community!

As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks.


Regards,

Diana Torres,
Community Moderator


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community: