Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Sqoop Import BLOB from Oracle

avatar
New Contributor

Need help in extracting/parsing data from an Oracle column that is a BLOB data type.

We are using Sqoop to import all rows from an Oracle table into HDFS, then into Hive. One source table column is a BLOB data type. The column is populated by a field in an application that sits on top of Oracle. The application user can enter data into this field in the application. The field contains process instructions ( e.g. Step 1 - Open the door, etc) and an image that is attached to the field by the user. The image may be a pdf or a Rich Text Field file. The image file will be stored as a string of alphanumeric characters in the Oracle column. Some of these can be 20+MB. The problem is getting out the relevant text data and removing/ignoring the string representation of the image. I need the text data but not the image data in hdfs/hive. There can be multiple image/attachments  with textual data (user entered text) interspersed between the images, so truncating the field is not the best solution.

I can use a REGEXP_REPLACE in the Sqoop user-defined query with a pattern match on the image data, but the Oracle query executed by Sqoop hangs on the larger values. 

I tried using in-line-lob limit in Sqoop. This writes the large data values to a separate hdfs sub-directory as a file type of .lob.  The file appears to be a parquet format but I am not sure. 

I cannot find any documentation on how to make use of the .lob file. Ideally, I would like to strip out the data that is the image related string and tie the remaining text data back to the source record in hdfs. 

Any suggestions would be appreciated.

1 REPLY 1

avatar
Community Manager

@hbutler Welcome to our community! To help you get the best possible answer, I have tagged our Sqoop experts @mszurap @Saurabhatiyal  @ShankerSharma may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.

 

 



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community: