Created on 12-31-2019 03:34 AM - last edited on 12-31-2019 06:49 AM by VidyaSargur
I have to import a table from Oracle that contains both BLOB and CLOB columns.
--map-column-java CLOB_column=String works fine.
--map-column-java BLOB_column=String throws an error.
How do we import BLOB data type columns from Oracle using Sqoop?
Query used:
sqoop import --connect 'jdbc:oracle:thin:@server_ip:port/database' --username <> --password <> --query "SELECT * FROM table_name WHERE \$CONDITIONS" --map-column-java CMID=Integer,DATAPROP=String --as-parquetfile --delete-target-dir --target-dir /user/test/ -m 1
DATAPROP is the BLOB column.
Created 01-02-2020 05:52 AM
Actually, I guess you have to map the BLOB data type to binary instead of a String. Try the command below; hope it works:
sqoop import --connect 'jdbc:oracle:thin:@server_ip:port/database' --username <> --password <> --query "SELECT * FROM table_name WHERE \$CONDITIONS" --map-column-java CMID=Integer,DATAPROP=binary --as-parquetfile --delete-target-dir --target-dir /user/test/ -m 1
Created on 01-04-2020 11:33 PM - edited 01-05-2020 12:43 AM
I tried --map-column-java column_name=binary and column_name=Binary.
Below are the errors I get:
20/01/05 10:04:58 ERROR tool.ImportTool: Import failed: No ResultSet method for Java type binary
20/01/05 10:04:07 ERROR tool.ImportTool: Import failed: No ResultSet method for Java type Binary
I also tried --map-column-hive column_name=binary.
The Sqoop import succeeded, but when I query the table in Impala it shows the error below:
AnalysisException: Unsupported type 'BINARY' in Table
All other columns are accessible, but the BLOB column is corrupted and cannot be queried.
Created 01-05-2020 04:28 PM
Impala does not support binary data. What you can do is use a serialize/deserialize methodology. This means you convert your image to a String format that still contains all the information necessary to transform it back. When you need to retrieve an image from HDFS you deserialize it, converting the string back to the original format.
I found this example using Python; it would work like this:
import base64

def img_to_string(image_path):
    # Serialize: read the binary file and encode it as a Base64 string
    with open(image_path, "rb") as imageFile:
        image_string = base64.b64encode(imageFile.read())
    print(image_string)
    return image_string

def string_to_img(image_string):
    # Deserialize: decode the Base64 string and write the original bytes back
    with open("new_image.png", "wb") as imageFile:
        imageFile.write(base64.b64decode(image_string))
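A quick round trip with the two functions (the file name is just illustrative):

# Assumes a local file named "photo.png" exists
encoded = img_to_string("photo.png")
string_to_img(encoded)  # writes the decoded bytes to "new_image.png"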
Created 01-05-2020 09:50 PM
Thank you for your input.
The above shows serializing/deserializing a file that contains images.
What I am trying to do is import a column of an Oracle table into HDFS using Sqoop.
Other documentation shows mapping the column to String using --map-column-java as the solution.
But that throws the error below:
Caused by: java.sql.SQLException: Invalid column type: getString/getNString not implemented for class oracle.jdbc.driver.T4CBlobAccessor
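One thing that might work around this (an untested sketch, reusing the Base64 idea above): encode the BLOB inside the query itself, so the JDBC driver only ever sees a VARCHAR2 and never calls getString on a BLOB accessor. Note that DBMS_LOB.SUBSTR in plain SQL only returns up to 2000 bytes, so larger BLOBs would need a PL/SQL helper function:

sqoop import --connect 'jdbc:oracle:thin:@server_ip:port/database' --username <> --password <> --query "SELECT CMID, UTL_RAW.CAST_TO_VARCHAR2(UTL_ENCODE.BASE64_ENCODE(DBMS_LOB.SUBSTR(DATAPROP, 2000, 1))) AS DATAPROP FROM table_name WHERE \$CONDITIONS" --map-column-java CMID=Integer,DATAPROP=String --as-parquetfile --delete-target-dir --target-dir /user/test/ -m 1

The value landing in HDFS can then be decoded with base64.b64decode, as in the Python example above.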
Created 04-29-2020 09:27 AM
Hello, I am facing the same situation while exporting from HDFS to Oracle BLOB columns. If you have come up with any workaround or solution, please post it. Thanks.
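For the export direction, one possible sketch (table and column names here are hypothetical): keep the binary data hex encoded as a string in HDFS, sqoop export it into a VARCHAR2 staging column, then convert it back to a BLOB inside Oracle:

-- Staging table receives the hex string from sqoop export
CREATE TABLE stage_table (cmid NUMBER, dataprop_hex VARCHAR2(4000));

-- Convert the hex string back to a BLOB; this works while the encoded
-- value fits in VARCHAR2, larger values would need a CLOB staging
-- column and a PL/SQL conversion loop
INSERT INTO target_table (cmid, dataprop)
SELECT cmid, TO_BLOB(HEXTORAW(dataprop_hex)) FROM stage_table;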
Created 01-27-2021 01:54 AM