Does hive support Photo or images datatypes?

Contributor
 
1 ACCEPTED SOLUTION

Master Mentor

@Christian Lunesa

Yes, you can store images in binary format (see below), but retrieval is another process altogether.

Create table image

beeline> ! connect  jdbc:hive2://texas.us.com:10000/default
Enter username for jdbc:hive2://texas.us.com:10000/default: hive
Enter password for jdbc:hive2://texas.us.com:10000/default: ****
Connected to: Apache Hive (version 1.2.1000.2.6.2.0-205)
Driver: Hive JDBC (version 1.2.1000.2.6.2.0-205)
Transaction isolation: TRANSACTION_REPEATABLE_READ
1: jdbc:hive2://texas.us.com:10000/default> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| geolocation    |
+----------------+--+
4 rows selected (2.397 seconds)
use geolocation;
Create table image(picture binary);
show tables;

Loading an image into the table is as simple as a LOAD DATA statement:

hive> show databases;
OK
default
geolocation
Time taken: 1.955 seconds, Fetched: 4 row(s)
hive> use geolocation;
hive> load data local inpath '/tmp/photo.jpg' into table image; 

Now check that the image was loaded:

hive> select count(*) from image;
Query ID = geolocation_20180420094947_79e8e1fb-dfb3-40c6-949e-3fb61e8bc7d1
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1524208851011_0003)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 5.87 s
--------------------------------------------------------------------------------
OK
19038
Time taken: 10.114 seconds, Fetched: 1 row(s) 

A SELECT will return garbled output, but the image is loaded.
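A side note on the count above: 19038 rows for a single photo suggests the file was split at every newline byte, which is what I would expect if the table uses the default text storage format (an assumption, since the storage format is not shown in the session). One way around that is to Base64-encode the file into a single line before loading it. Below is a minimal Python sketch of the idea; the function names and the sample bytes are made up for illustration and are not part of Hive.

```python
import base64


def encode_image_for_hive(raw: bytes) -> str:
    """Return the image as one Base64 line with no embedded newlines."""
    # base64.b64encode never inserts line breaks, so the result stays on
    # a single line and a line-oriented table sees exactly one row.
    return base64.b64encode(raw).decode("ascii")


def decode_image_from_hive(encoded: str) -> bytes:
    """Reverse encode_image_for_hive once the value has been read back."""
    return base64.b64decode(encoded)


# Made-up bytes standing in for a JPEG; note the raw newline bytes (0x0A).
sample = b"\xff\xd8\xff\xe0" + b"abc\ndef\nghi" + b"\xff\xd9"

# How many "rows" a newline-delimited format would see in each case.
rows_if_loaded_raw = len(sample.split(b"\n"))
encoded = encode_image_for_hive(sample)
rows_if_loaded_encoded = len(encoded.split("\n"))

print(rows_if_loaded_raw, rows_if_loaded_encoded)
```

The encoded line could then be loaded into a table with a string column, and Hive's built-in unbase64() should give the binary value back at query time.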

Alternatively, store the images/videos directly in HDFS:

hdfs dfs -put /src_image_file /dst_image_file 

If your intent is more than just storing the files, you might find HIPI useful. HIPI is a library for Hadoop's MapReduce framework that provides an API for performing image processing tasks in a distributed computing environment: http://hipi.cs.virginia.edu/

http://www.tothenew.com/blog/how-to-manage-and-analyze-video-data-using-hadoop/

https://content.pivotal.io/blog/using-hadoop-mapreduce-for-distributed-video-transcoding

Hope that helps


3 REPLIES

Rising Star

You can use the BINARY data type in Hive. Store the photo/image as binary in the Hive table. You can then retrieve it from the query results and display it in your frontend application.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-MiscTypes
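On the retrieval side, the frontend usually needs to know what kind of image it received before it can display it. Here is a minimal Python sketch, assuming the binary value has already been fetched as bytes by whatever client you use; the function name is hypothetical, and only a few common formats are covered.

```python
from typing import Optional

# Leading "magic" bytes of a few common image formats.
_SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


def sniff_image_type(data: bytes) -> Optional[str]:
    """Guess the MIME type from the first bytes, or None if unknown."""
    for magic, mime in _SIGNATURES.items():
        if data.startswith(magic):
            return mime
    return None
```

If this returns None for a value that was stored as a photo, that is a hint the bytes were altered or truncated somewhere between loading and retrieval.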

Contributor

@pmohan 

 

Does this help with columns that have the image data type?

Converting the image data type to binary again leads to data corruption.

Please see the issue highlighted below:

https://community.cloudera.com/t5/Support-Questions/SQOOP-import-of-quot-image-quot-data-type-into-h...
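To confirm whether the bytes really change in transit, one option is to compare a checksum of the source image with a checksum of what comes back. This is only a minimal Python sketch with made-up sample bytes and hypothetical function names. The "mangled" value simulates a lossy text conversion, which is one common way binary data gets corrupted; whether that is the cause in the linked Sqoop case is not established here.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest used as a fingerprint of the image bytes."""
    return hashlib.sha256(data).hexdigest()


def survived_round_trip(original: bytes, retrieved: bytes) -> bool:
    """True when the retrieved bytes are identical to the original."""
    return sha256_hex(original) == sha256_hex(retrieved)


# Made-up bytes resembling the start of a JPEG file.
original = b"\xff\xd8\xff\xe0\x00\x10JFIF"

# Simulated corruption: decoding binary data as UTF-8 text replaces the
# invalid bytes, so re-encoding does not give the original bytes back.
mangled = original.decode("utf-8", errors="replace").encode("utf-8")

print(survived_round_trip(original, original))
print(survived_round_trip(original, mangled))
```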
