sqoop import of BLOB columns from oracle database

Contributor

I have to import a table from Oracle that contains both BLOB and CLOB columns.

 

--map-column-java CLOB_column=String works fine

--map-column-java BLOB_column=String throws an error

 

How do we import BLOB data type columns from Oracle using Sqoop?

 

Query used:

 

sqoop import --connect 'jdbc:oracle:thin:@server_ip:port/database' --username <> --password <> --query "SELECT * FROM table_name WHERE \$CONDITIONS" --map-column-java CMID=Integer,DATAPROP=String --as-parquetfile --delete-target-dir --target-dir /user/test/ --m 1

 

DATAPROP is the BLOB column.

 

6 REPLIES

New Contributor

@sow 

 

Actually, I guess you have to map the BLOB data type to binary instead of a String. Try the command below; I hope it works:

sqoop import --connect 'jdbc:oracle:thin:@server_ip:port/database' --username <> --password <> --query "SELECT * FROM table_name WHERE \$CONDITIONS" --map-column-java CMID=Integer,DATAPROP=binary --as-parquetfile --delete-target-dir --target-dir /user/test/ --m 1

 

Contributor

@pavan_kumar 

 

I tried --map-column-java column_name=binary and column_name=Binary.

Below are the errors I get:

20/01/05 10:04:58 ERROR tool.ImportTool: Import failed: No ResultSet method for Java type binary

 

20/01/05 10:04:07 ERROR tool.ImportTool: Import failed: No ResultSet method for Java type Binary

 

I also tried --map-column-hive column_name=binary.

The Sqoop import succeeded, but when I query the table in Impala it shows the error below:

AnalysisException: Unsupported type 'BINARY' in Table

All other columns are accessible, but the BLOB column is corrupted and cannot be queried.


Master Mentor

@sow 

 

Impala does not support binary data. What you can do is use a serialize/deserialize approach. This means you convert your binary data (an image, for example) to a String format that still contains all the information needed to transform it back. When you need to retrieve the image from HDFS, you deserialize it, i.e. convert the string back to the original format.

I found this example using Python; it would work like this:

 

import base64

def img_to_string(image_path):
    # Serialize: read the file as bytes and encode them as a Base64 string
    with open(image_path, "rb") as imageFile:
        image_string = base64.b64encode(imageFile.read()).decode("ascii")
    print(image_string)
    return image_string

def string_to_img(image_string):
    # Deserialize: decode the Base64 string and write the original bytes back out
    with open("new_image.png", "wb") as imageFile:
        imageFile.write(base64.b64decode(image_string))
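
Used roughly like this (the input file name is just a placeholder):

encoded = img_to_string("original_image.png")  # Base64 text that could be stored in a STRING column
string_to_img(encoded)                         # writes new_image.png with the original bytes restored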


Contributor

@Shelton 

 

Thank you for your inputs.

The above shows serializing/deserializing a file that contains an image.

What I am trying to do is import a column of an Oracle table into HDFS using Sqoop.
Other documentation shows mapping the column to String with --map-column-java as the solution.

But that mapping throws an error:

Caused by: java.sql.SQLException: Invalid column type: getString/getNString not implemented for class oracle.jdbc.driver.T4CBlobAccessor
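
If the BLOB values are small, one possible workaround (just a sketch, not something confirmed by the Sqoop docs) is to do the Base64 encoding on the Oracle side inside the free-form query, so the column reaches Sqoop as a VARCHAR2 and the String mapping no longer hits the BLOB accessor. Note that DBMS_LOB.SUBSTR on a BLOB returns RAW, which is limited to roughly 2000 bytes in SQL, so this only covers small BLOBs; larger values would need chunking or a different approach. The connection details and names below just mirror the ones earlier in the thread:

sqoop import --connect 'jdbc:oracle:thin:@server_ip:port/database' --username <> --password <> --query "SELECT CMID, UTL_RAW.CAST_TO_VARCHAR2(UTL_ENCODE.BASE64_ENCODE(DBMS_LOB.SUBSTR(DATAPROP, 1400, 1))) AS DATAPROP FROM table_name WHERE \$CONDITIONS" --map-column-java CMID=Integer,DATAPROP=String --as-parquetfile --delete-target-dir --target-dir /user/test/ --m 1

The stored string can later be turned back into the original bytes with a Base64 decode, as in the Python example above.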

 

New Contributor

Hello, I am facing the same situation while exporting from HDFS to Oracle BLOB columns. If you have come up with any workaround or solution, please post it. Thanks.


@sow 

 

I am also having the same issue. Did you find any resolution for it?