Does hive support Photo or images datatypes?

1 ACCEPTED SOLUTION

Mentor

@Christian Lunesa

Yes, you can store images in binary format (see below), but retrieval is another process altogether.

Create the table image:

beeline> ! connect  jdbc:hive2://texas.us.com:10000/default
Enter username for jdbc:hive2://texas.us.com:10000/default: hive
Enter password for jdbc:hive2://texas.us.com:10000/default: ****
Connected to: Apache Hive (version 1.2.1000.2.6.2.0-205)
Driver: Hive JDBC (version 1.2.1000.2.6.2.0-205)
Transaction isolation: TRANSACTION_REPEATABLE_READ
1: jdbc:hive2://texas.us.com:10000/default> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| geolocation    |
+----------------+--+
4 rows selected (2.397 seconds)
use geolocation;
Create table image(picture binary);
show tables;

Loading an image into it is as simple as a load data statement:

hive> show databases;
OK
default
geolocation
Time taken: 1.955 seconds, Fetched: 4 row(s)
hive> use geolocation;
hive> load data local inpath '/tmp/photo.jpg' into table image; 

Now check that the image was loaded:

hive> select count(*) from image;
Query ID = geolocation_20180420094947_79e8e1fb-dfb3-40c6-949e-3fb61e8bc7d1
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1524208851011_0003)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 5.87 s
--------------------------------------------------------------------------------
OK
19038
Time taken: 10.114 seconds, Fetched: 1 row(s) 

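A select will return garbled output, but the image is loaded. Note that count(*) reports 19038 rows rather than 1, which suggests that the default text storage format started a new row at every newline byte in the file, so the image is spread across many rows.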
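One way to keep each image in a single row of a text-format table is to Base64-encode it onto one line before loading. The following is only a minimal sketch of that idea, not part of the original steps; the function names and the staging-file layout are assumptions made for illustration.

```python
import base64
from pathlib import Path


def encode_image_line(image_bytes: bytes) -> str:
    """Return the image as a single Base64 text line.

    base64.b64encode never inserts line breaks, so the result
    contains no newline characters.
    """
    return base64.b64encode(image_bytes).decode("ascii")


def write_staging_file(image_paths, out_path) -> int:
    """Write one Base64 line per image (hypothetical staging layout).

    Returns the number of lines written, which should match the row
    count after the file is loaded into a one-column STRING table.
    """
    count = 0
    with open(out_path, "w", encoding="ascii", newline="\n") as out:
        for path in image_paths:
            out.write(encode_image_line(Path(path).read_bytes()) + "\n")
            count += 1
    return count
```

The resulting file could then be loaded into a staging table with a single STRING column and converted with Hive's built-in unbase64() function, for example: INSERT INTO TABLE image SELECT unbase64(picture_b64) FROM image_b64; (the staging table image_b64 and its column picture_b64 are hypothetical names).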

Alternatively, store the images/videos directly in Hadoop HDFS:

hdfs dfs -put /src_image_file /dst_image_file 

And if your intent is more than just storing the files, you might find HIPI useful. HIPI is a library for Hadoop's MapReduce framework that provides an API for performing image processing tasks in a distributed computing environment: http://hipi.cs.virginia.edu/

http://www.tothenew.com/blog/how-to-manage-and-analyze-video-data-using-hadoop/

https://content.pivotal.io/blog/using-hadoop-mapreduce-for-distributed-video-transcoding

Hope that helps


3 REPLIES 3

Contributor

You can use the BINARY data type in Hive. Store the photo/image as binary in the Hive table. You can then retrieve it from the query results and display it in your frontend application.
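As a rough sketch of the retrieval side, assume each row holds one complete image and the query returns it as Base64 text, for example via Hive's built-in base64() function (SELECT base64(picture) FROM image). A frontend helper could then decode the value, check the image header, and build a data URI for display. The function names here are hypothetical and this is not a definitive implementation.

```python
import base64

# Leading "magic" bytes for a few common image formats.
_MAGIC = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


def sniff_mime(image_bytes: bytes) -> str:
    """Guess the MIME type from the header, or raise ValueError."""
    for magic, mime in _MAGIC.items():
        if image_bytes.startswith(magic):
            return mime
    raise ValueError("unrecognised image header; the data may be corrupted")


def to_data_uri(b64_value: str) -> str:
    """Turn a Base64 value from a query result into a data URI."""
    # validate=True rejects characters outside the Base64 alphabet,
    # so surrounding whitespace is stripped first.
    image_bytes = base64.b64decode(b64_value.strip(), validate=True)
    mime = sniff_mime(image_bytes)
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The header check is a cheap way to notice corruption early: if the bytes that come back do not start with a known image signature, something went wrong between storage and retrieval.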

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-MiscTypes

Contributor

@pmohan

Does this help with columns that have the image data type?

Converting the image data type to binary again leads to data corruption.

Please see the issue highlighted below:

https://community.cloudera.com/t5/Support-Questions/SQOOP-import-of-quot-image-quot-data-type-into-h...
