Created on 08-17-2016 09:16 AM - edited 09-16-2022 03:35 AM
We have a use case that requires the binary data type in a Hive table:
1. In an HDFS directory (e.g., /data/work/hive/test/), we have several blob files that we want to store in a Hive table (table1) as the binary data type.
2. We have another Hive table (table2) that stores regular CSV data; its row count is the same as the number of blob files above.
3. How can we combine these two tables into a new table (table3, with both tables' columns and rows)?
Created 08-26-2016 08:52 AM
Mr. Chen,
Nothing that ships out of the box with Hive will achieve this goal. However, you should be able to create a custom UDF that loads the data given the file path. If you do create one, it may be an interesting component for hive-contrib.
https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cm_mc_hive_udf.html
https://github.com/apache/hive/tree/master/contrib
-- LOAD_DATA is the custom UDF described above, not a Hive built-in.
CREATE TABLE table2(
  rowid INT,
  firstname STRING,
  lastname STRING,
  hdfs_path STRING);

CREATE TABLE table3(
  rowid INT,
  mydata BINARY,
  firstname STRING,
  lastname STRING);

INSERT INTO table3
SELECT rowid, LOAD_DATA(hdfs_path) AS mydata, firstname, lastname
FROM table2;
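Before the INSERT can run, the UDF has to be registered in the Hive session. A minimal sketch, assuming the UDF has already been compiled and packaged into a JAR; the JAR path and the class name below are hypothetical placeholders, not real artifacts:

```sql
-- Make the JAR containing the UDF class available to the session.
-- (Hypothetical path; substitute wherever the JAR was actually uploaded.)
ADD JAR hdfs:///path/to/load-data-udf.jar;

-- Bind the name LOAD_DATA to the UDF class for this session only.
-- (Hypothetical class name; it must match the class inside the JAR.)
CREATE TEMPORARY FUNCTION LOAD_DATA AS 'com.example.hive.udf.LoadData';
```

A temporary function disappears when the session ends, so these two statements would need to run in the same session as the INSERT.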
Created 05-01-2018 11:35 PM
Hello Chen,
Can you please share some insight into how you solved this problem?
I have a very similar requirement: we have JPEG and PDF files in HDFS and data in Hive tables. There is one image or PDF file per record, and we want to link them in some way.
Thanks for your help.