Support Questions

Find answers, ask questions, and share your expertise

SQOOP import of "image" data type into hive

avatar
Explorer

I found the old thread that is exactly similar to the issue I am having .

https://community.cloudera.com/t5/Support-Questions/SQOOP-import-of-quot-image-quot-data-type-into-h...

column data type in source [sql server] : imagecolumn value in source : 0xFFffDeDJBF.......dDf.      --> ( A hexadecimal value )

column data type in hive: string 

column value in hive: FFffDeDJBF.......dDf (0x is not retained)

Also I have tried binary data type in hive. yet the value would not match with what oracle has. 

Is there a data type hive table can handle and retain the  'Image' datatype orginal value in SQL source while sqoop importing into the hive table ? 

 

1 ACCEPTED SOLUTION

avatar
Master Collaborator

In Hive, there is no specific built-in data type that directly corresponds to the SQL Image data type for retaining the original binary image data from an SQL source. Hive primarily deals with structured data types like strings, numbers, and complex types such as arrays, maps, and structs.

To store binary data like images in Hive, you typically use the BINARY data type or store them as STRING data, especially if you want to represent them in base64-encoded format. However, neither of these data types inherently retains the original binary value as is. You would need to handle the encoding and decoding of the binary data yourself.

Here's an example of how you might store binary image data in Hive using the BINARY data type:

 

CREATE TABLE image_data (
    image_id INT,
    image_content BINARY
);

 

When you insert data into this table, you would need to encode the binary image data into a binary format suitable for storage in Hive.

If retaining the original binary image data in its original format is crucial, you may want to consider other data storage solutions that are specifically designed for binary data, such as a distributed file system or binary data storage services. Hive, being primarily designed for structured data, may not be the best choice for this use case if you need to maintain the exact original binary data without encoding or modification.



View solution in original post

5 REPLIES 5

avatar
Community Manager

@DaveNepal, Welcome to our community! To help you get the best possible answer, I have tagged our Sqoop experts @ChethanYM @mszurap @ShankerSharma  who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
Explorer

 @ChethanYM @mszurap @ShankerSharma  

Can you please help ?

avatar
Master Collaborator

In Hive, there is no specific built-in data type that directly corresponds to the SQL Image data type for retaining the original binary image data from an SQL source. Hive primarily deals with structured data types like strings, numbers, and complex types such as arrays, maps, and structs.

To store binary data like images in Hive, you typically use the BINARY data type or store them as STRING data, especially if you want to represent them in base64-encoded format. However, neither of these data types inherently retains the original binary value as is. You would need to handle the encoding and decoding of the binary data yourself.

Here's an example of how you might store binary image data in Hive using the BINARY data type:

 

CREATE TABLE image_data (
    image_id INT,
    image_content BINARY
);

 

When you insert data into this table, you would need to encode the binary image data into a binary format suitable for storage in Hive.

If retaining the original binary image data in its original format is crucial, you may want to consider other data storage solutions that are specifically designed for binary data, such as a distributed file system or binary data storage services. Hive, being primarily designed for structured data, may not be the best choice for this use case if you need to maintain the exact original binary data without encoding or modification.



avatar
Explorer

Thank you for sharing helpful information !!

avatar
Community Manager

@DaveNepal, Thank you for your participation in the Cloudera Community. I'm happy to see you resolved your issue. Could you please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future?



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community: