The following diagram depicts the simplified ingestion flow we are building to ingest data from different RDBMSs into Hive.
Step 1: Using a JDBC connection to the data source, the source data is streamed and saved in a CSV file on HDFS using the HDFS Java API. Basically, we execute a 'SELECT *' query and write each row to the CSV file until the ResultSet is exhausted.
Step 2: Using the LOAD DATA INPATH command, the Hive table is populated from the CSV file created in Step 1.
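To make Step 1 concrete, here is a minimal sketch of the row-by-row export loop. The class and method names (CsvExport, escape, writeCsv) are made up for illustration and are not our actual code. It assumes the Writer wraps the output stream returned by the HDFS FileSystem.create() call, that SQL NULL is written as an empty field, and that the Hive table uses a CSV-aware SerDe (for example OpenCSVSerde) so that quoted fields are parsed correctly.

```java
import java.io.IOException;
import java.io.Writer;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

class CsvExport {

    // Quote a field only if it contains a comma, a double quote, or a line break.
    // Embedded double quotes are doubled (RFC 4180 style).
    public static String escape(String value) {
        if (value == null) {
            return ""; // assumption: SQL NULL becomes an empty field
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        String escaped = value.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Stream every row of the ResultSet to the writer, one CSV line per row.
    // In the real flow the writer would wrap the HDFS output stream (assumption).
    // Returns the number of rows written.
    public static long writeCsv(ResultSet rs, Writer out) throws SQLException, IOException {
        ResultSetMetaData meta = rs.getMetaData();
        int columns = meta.getColumnCount();
        long rows = 0;
        while (rs.next()) {
            StringBuilder line = new StringBuilder();
            for (int i = 1; i <= columns; i++) { // JDBC columns are 1-indexed
                if (i > 1) {
                    line.append(',');
                }
                line.append(escape(rs.getString(i)));
            }
            out.write(line.append('\n').toString());
            rows++;
        }
        return rows;
    }
}
```

Only one row is held in memory at a time, which is why the flow scales to large tables as long as the JDBC driver itself streams the result.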
We use JDBC ResultSet.getString() to read each column's data. This works fine for non-binary data.
But for BLOB and CLOB type columns, we cannot write the column data into a text/CSV file.
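To illustrate the problem, here is a small self-contained sketch (the class name BinaryRoundTrip and the sample bytes are made up). It shows that arbitrary bytes do not survive being forced through a String, and that such bytes can also contain the CSV delimiter and line-break characters. The Base64 round trip is included only as a contrast; it is not part of our current flow, and it would require decoding on the Hive side.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class BinaryRoundTrip {

    // Decode the bytes as UTF-8 text and encode them back,
    // which is effectively what happens when binary data is written as plain text.
    public static byte[] viaString(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // Encode the bytes as Base64 text (letters, digits, '+', '/', '=') and decode them back.
    public static byte[] viaBase64(byte[] data) {
        String text = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Sample "BLOB": a NUL byte, an invalid UTF-8 byte, a line feed, and a comma.
        byte[] blob = {0x00, (byte) 0xFF, 0x0A, 0x2C};

        // prints false: 0xFF is not valid UTF-8 and is replaced during decoding
        System.out.println(Arrays.equals(blob, viaString(blob)));

        // prints true: the Base64 text decodes back to the original bytes
        System.out.println(Arrays.equals(blob, viaBase64(blob)));
    }
}
```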
My question is: is it possible to use the ORC or Avro format to handle binary columns? Do these formats support row-by-row writes?