
SQOOP import of "image" data type into hive

Contributor

I am trying to import a table from MSSQL that has a few columns with the data type "image".

The values in those columns are in binary format (LOBs).

Example:

- Database type: MSSQL
- Column name: Settings
- Data type: image
- Value of the column: ( 0x1F8B0800000400EDBD07601C499625262FD7E0CC188CDE692EC1D69472329AB2A81CA6501320CCE74A10880604010ED9D....)

 

When I import using Sqoop, the column is automatically mapped to string in Hive, but the data is corrupted.

 

Command used:

 

sqoop import --driver 'com.microsoft.sqlserver.jdbc.SQLServerDriver' --connect 'jdbc:sqlserver://IP:PORT;database=DB;' --connection-manager 'org.apache.sqoop.manager.SQLServerManager' --username <> -password <> --as-parquetfile --delete-target-dir --target-dir '/user/test/' --query "select GUID,Name,cast(Settings AS binary) AS Settings FROM my_table_name where \$CONDITIONS; " --m 1

 

 

 


Contributor

@VidyaSargur 

 

Could any Sqoop expert throw some light on this issue?

 

Community Manager

@EricL Is this something that you can help with?

Thanks,

Vidya



Regards,

Vidya Sargur,
Community Manager



Guru
@sow,

Sorry, I did not notice your question here earlier. I wonder if you have tried the "--map-column-hive" option of Sqoop, which is mentioned in the doc here:
https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html#_controlling_type_mapping

So in your case, the command would look like this:

sqoop import --driver 'com.microsoft.sqlserver.jdbc.SQLServerDriver' --connect 'jdbc:sqlserver://IP:PORT;database=DB;' --connection-manager 'org.apache.sqoop.manager.SQLServerManager' --username <> -password <> --as-parquetfile --delete-target-dir --target-dir '/user/test/' --query "select GUID,Name,Settings FROM my_table_name where \$CONDITIONS; " --m 1 --map-column-hive Settings=binary

Note that you no longer need to CAST on the MSSQL side. See if it helps.

Cheers
Eric

Contributor

@EricL 

 

I tried the above; --map-column-hive is not making any impact.

The column data type is still taken as string in Hive after the import, and the data is still corrupted.

 

Using --hive-import and --hive-table with it helped in getting the data type as binary in Hive, but the data is still corrupted.

Command used:
sqoop import --driver 'com.microsoft.sqlserver.jdbc.SQLServerDriver' --connect <> --connection-manager 'org.apache.sqoop.manager.SQLServerManager' --username <> -password <> --hive-import --hive-table testc --as-parquetfile --delete-target-dir --target-dir '/user/test/' --query "select * FROM my_table where \$CONDITIONS; " --m 1 --map-column-hive Settings=binary
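One way to tell whether the bytes were actually altered in transit, rather than merely rendered differently by Hive, is to compare a checksum of the source value against the bytes read back from the imported Parquet column. A minimal sketch of that check (the hex values here are hypothetical stand-ins, and Python is used only for illustration):

```python
import hashlib

# Hypothetical: the column value as exported from MSSQL (hex digits, 0x stripped)
source_hex = "1F8B0800000400"
source_bytes = bytes.fromhex(source_hex)

# Hypothetical: the raw bytes read back from the imported Parquet/Hive column
imported_bytes = bytes.fromhex("1F8B0800000400")

# Matching digests mean the bytes survived the import and only the display
# differs; a mismatch means the data really was corrupted in transit.
same = hashlib.md5(source_bytes).hexdigest() == hashlib.md5(imported_bytes).hexdigest()
print(same)
```

If the digests match, the "corruption" is likely just how the client renders binary data, not a Sqoop problem.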

 

Contributor

@EricL 

 

I tried --map-column-java Settings=String and got the data type as string in Hive.

However, the data does not look corrupted; only the hexadecimal representation of the value is not fully retained:

 

Command used:

sqoop import --driver 'com.microsoft.sqlserver.jdbc.SQLServerDriver' --connect <> --connection-manager 'org.apache.sqoop.manager.SQLServerManager' --username <> -password <> --table 'my_table' --as-parquetfile --delete-target-dir --target-dir /user/test/ --map-column-java Settings=String --m 1

 

Result of the above query:

- Column data type in source: image
- Column value in source: 0xFFffDeDJBF.......dDf (a hexadecimal value)
- Column data type in Hive: string
- Column value in Hive: FFffDeDJBF.......dDf (the 0x prefix is not retained)
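If the missing 0x prefix is the only difference, the value itself appears intact: the String mapping seems to emit just the hex digits, and the prefix can be re-added (or the original bytes recovered) downstream. A quick sanity check with a hypothetical value, in Python for illustration:

```python
# Hypothetical short value; the real column holds a much larger LOB
source_bytes = bytes.fromhex("1F8B0800")

# What the String mapping appears to produce: hex digits without the 0x prefix
hive_value = source_bytes.hex().upper()

# Restoring the MSSQL-style literal is just prepending the prefix...
restored = "0x" + hive_value

# ...and the round trip back to bytes shows no information was lost
assert bytes.fromhex(restored[2:]) == source_bytes
```

So the dropped 0x is a cosmetic difference, not data loss, as long as the hex digits themselves match the source.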

 

 

 

 
