
How to use Binary Data Type in Hive

Explorer

We have a use case where we want to use the binary data type in a Hive table:
1. In an HDFS directory (e.g. /data/work/hive/test/), we have several blob files that we want to store in a Hive table (table1) as the binary data type.
2. We have another Hive table (table2) that stores regular CSV data; its row count is the same as the number of blob files above.
3. How can we combine these two tables into a new table (table3, with both tables' columns and rows)?

1 ACCEPTED SOLUTION

Expert Contributor

Mr. Chen,

Nothing that ships out of the box with Hive will achieve this goal. However, you should be able to write a custom UDF that loads the data given the file path. If you do create one, it could be an interesting addition to hive-contrib.


http://stackoverflow.com/questions/27402442/read-an-hdfs-file-from-a-hive-udf-execution-error-return...
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-CreatingCustom...
https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cm_mc_hive_udf.html
https://github.com/apache/hive/tree/master/contrib

CREATE TABLE table2(
  rowid INT,
  firstname STRING,
  lastname STRING,
  hdfs_path STRING);

CREATE TABLE table3(
  rowid INT,
  mydata BINARY,
  firstname STRING,
  lastname STRING);

-- LOAD_DATA is the custom UDF described above, not a built-in Hive function.
INSERT INTO TABLE table3
SELECT rowid, LOAD_DATA(hdfs_path) AS mydata, firstname, lastname
FROM table2;
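
For the INSERT to run, the custom UDF has to be packaged in a jar and registered in the Hive session first. A minimal sketch of that step follows. The jar path and the class name are hypothetical placeholders, since no implementation is given in this thread; only the function name LOAD_DATA comes from the example above.

```sql
-- Hypothetical jar path; point this at the jar that contains your compiled UDF.
ADD JAR /tmp/load-data-udf.jar;

-- Hypothetical class name; register it under the function name used in the INSERT.
CREATE TEMPORARY FUNCTION load_data AS 'com.example.hive.udf.LoadData';

-- Confirm that the function is visible in this session before running the INSERT.
DESCRIBE FUNCTION load_data;
```

A temporary function lasts only for the current session, so these statements would need to be repeated (or placed in an init script) for each new session.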


10 REPLIES

New Contributor

Hello Chen,

Can you please share some insight into how you solved this problem?

I have a very similar requirement: we have JPEG and PDF files in HDFS and data in Hive tables. There is one image/PDF file per record, and we want to link them in some way.

Thanks for your help.