
Ingest data from JDBC connections to Hive : Handling binary columns


The following diagram depicts the simplified ingestion flow we are building to ingest data from different RDBMSs into Hive.

Step 1: Using a JDBC connection to the data source, the source data is streamed and saved to a CSV file on HDFS using the HDFS Java API. Basically, we execute a 'SELECT *' query and write each row to the CSV file until the ResultSet is exhausted.
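A minimal sketch of the Step 1 loop might look like the following. The class name, the `exportTable` method, and its parameters are illustrative, not from the original post; in the real flow the `Appendable` would be an HDFS output stream obtained through the HDFS Java API, and `jdbcUrl`/`table` would come from the ingestion configuration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class JdbcCsvExport {

    // Quote a single value for CSV: wrap in double quotes, escape embedded quotes.
    static String csvField(String value) {
        if (value == null) return "";
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Join values into one CSV row, quoting each field.
    static String csvRow(List<String> fields) {
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) row.append(',');
            row.append(csvField(fields.get(i)));
        }
        return row.toString();
    }

    // Stream a 'SELECT *' query row by row until the ResultSet is exhausted.
    // jdbcUrl and table are hypothetical; out stands in for the HDFS stream.
    static void exportTable(String jdbcUrl, String table, Appendable out) throws Exception {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM " + table)) {
            ResultSetMetaData meta = rs.getMetaData();
            int cols = meta.getColumnCount();
            while (rs.next()) {                      // one row at a time
                List<String> fields = new ArrayList<>();
                for (int c = 1; c <= cols; c++) {
                    fields.add(rs.getString(c));     // this is where BLOB/CLOB columns break
                }
                out.append(csvRow(fields)).append('\n');
            }
        }
    }
}
```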

Step 2: Using the LOAD DATA INPATH command, the Hive table is populated from the CSV file created in Step 1.
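For reference, Step 2 could look roughly like this in HiveQL (the table name, columns, and HDFS path here are hypothetical; the columns would have to match the CSV layout from Step 1):

```sql
-- Staging table matching the exported CSV layout (hypothetical schema).
CREATE TABLE staging_source (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Moves the file from its HDFS location into the table's directory.
LOAD DATA INPATH '/tmp/ingest/source.csv' INTO TABLE staging_source;
```

Note that LOAD DATA INPATH moves (not copies) the file into the table's warehouse directory.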

We use JDBC ResultSet.getString() to get column data. This works fine for non-binary data.

But for BLOB/CLOB type columns, we cannot write the column data into a text/CSV file.
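One commonly used workaround (not from the original post, and only a sketch) is to read binary columns with ResultSet.getBytes() instead of getString() and Base64-encode them, so the value becomes plain text that survives in a CSV file; Hive can then decode it, e.g. with its unbase64() function. The class and method names below are illustrative.

```java
import java.util.Base64;

public class BinaryColumnCodec {

    // Encode raw BLOB bytes into a text-safe Base64 string for the CSV file.
    static String encodeBlob(byte[] blobBytes) {
        if (blobBytes == null) return "";
        return Base64.getEncoder().encodeToString(blobBytes);
    }

    // Reverse of encodeBlob, e.g. for verifying the round trip.
    static byte[] decodeBlob(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }
}
```

In the Step 1 loop, BLOB columns would use `encodeBlob(rs.getBytes(c))` in place of `rs.getString(c)`. The trade-off is that Base64 inflates the data by roughly a third, which is one reason a binary-aware format may still be preferable.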

My question is: is it possible to use ORC or Avro format to handle binary columns? Do these formats support row-by-row writes?

(Diagram referenced above: 87408-hive-jdbc.png)
