Created 04-20-2018 12:47 AM
Created 04-20-2018 08:02 AM
Yes, you can store images in binary format (see below), but retrieval is another process altogether.
Create table image
beeline> ! connect jdbc:hive2://texas.us.com:10000/default
Enter username for jdbc:hive2://texas.us.com:10000/default: hive
Enter password for jdbc:hive2://texas.us.com:10000/default: ****
Connected to: Apache Hive (version 1.2.1000.2.6.2.0-205)
Driver: Hive JDBC (version 1.2.1000.2.6.2.0-205)
Transaction isolation: TRANSACTION_REPEATABLE_READ
1: jdbc:hive2://texas.us.com:10000/default> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| geolocation    |
+----------------+--+
4 rows selected (2.397 seconds)
use geolocation;
Create table image(picture binary);
show tables;
Loading an image into the table is as simple as a load data statement:
hive> show databases;
OK
default
geolocation
Time taken: 1.955 seconds, Fetched: 4 row(s)
hive> use geolocation;
hive> load data local inpath '/tmp/photo.jpg' into table image;
Now check the image
hive> select count(*) from image;
Query ID = geolocation_20180420094947_79e8e1fb-dfb3-40c6-949e-3fb61e8bc7d1
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1524208851011_0003)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 5.87 s
--------------------------------------------------------------------------------
OK
19038
Time taken: 10.114 seconds, Fetched: 1 row(s)
A select will return garbled output, but the image is loaded.
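One thing worth checking in the output above: count(*) reports 19038 rows for a single photo. A likely explanation, assuming the table uses Hive's default text storage format, is that the file is being split into rows wherever a line-break byte happens to occur inside the JPEG. The Python sketch below only illustrates that effect on made-up bytes; it does not talk to Hive. The function names are hypothetical, and the idea that a base64 line can be loaded into a BINARY column of a text table is an assumption to verify on your own cluster.

```python
import base64


def count_text_rows(data: bytes) -> int:
    """Number of rows a line-oriented reader would see in these bytes.

    bytes.splitlines() breaks on \n, \r and \r\n, which approximates
    how a text-format table splits a file into records.
    """
    return len(data.splitlines())


def to_single_row(data: bytes) -> bytes:
    """Base64-encode the payload so it contains no line-break bytes,
    then terminate it with one newline so it forms exactly one row."""
    return base64.b64encode(data) + b"\n"


def from_single_row(line: bytes) -> bytes:
    """Reverse to_single_row(): strip the newline and decode."""
    return base64.b64decode(line.strip())


# Made-up bytes standing in for a JPEG (not a real image).
sample = b"\xff\xd8\xff\xe0" + b"ab\ncd\nef" + b"\xff\xd9"
encoded = to_single_row(sample)

print(count_text_rows(sample))    # raw bytes: one row per line break
print(count_text_rows(encoded))   # base64 line: a single row
print(from_single_row(encoded) == sample)
```

If the goal is one image per row, encoding each file to a single line before loading (or using a binary-safe file format) keeps the row count and the image count in step.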
Store images/videos into Hadoop HDFS
hdfs dfs -put /src_image_file /dst_image_file
If your intent is more than just storing the files, you might find HIPI useful. HIPI is a library for Hadoop's MapReduce framework that provides an API for performing image processing tasks in a distributed computing environment: http://hipi.cs.virginia.edu/
http://www.tothenew.com/blog/how-to-manage-and-analyze-video-data-using-hadoop/
https://content.pivotal.io/blog/using-hadoop-mapreduce-for-distributed-video-transcoding
Hope that helps
Created 04-20-2018 06:51 AM
You can use the BINARY data type in Hive. Store the photo/image as binary in the Hive table. You can then retrieve it from the query results and display it in your frontend application.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-MiscTypes
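As a rough illustration of the retrieval step, here is a minimal Python sketch. It assumes a DB-API style cursor (for example one obtained from a Hive client library), that the driver returns the BINARY column as a bytes-like value, and that each row holds one complete image. The function names fetch_first_blob and save_jpeg are hypothetical and not part of any Hive API.

```python
from pathlib import Path

# Every JPEG file starts with these three bytes.
JPEG_MAGIC = b"\xff\xd8\xff"


def fetch_first_blob(cursor, query: str) -> bytes:
    """Run the query and return the first column of the first row as bytes."""
    cursor.execute(query)
    row = cursor.fetchone()
    if row is None:
        raise LookupError("query returned no rows")
    value = row[0]
    # Assumption: the driver hands back a bytes-like object for BINARY.
    if not isinstance(value, (bytes, bytearray, memoryview)):
        raise TypeError(f"expected bytes-like value, got {type(value).__name__}")
    return bytes(value)


def save_jpeg(data: bytes, out_path: str) -> Path:
    """Write the bytes to disk after a basic JPEG signature check."""
    if not data.startswith(JPEG_MAGIC):
        raise ValueError("data does not start with the JPEG signature")
    path = Path(out_path)
    path.write_bytes(data)
    return path
```

With a real connection, usage would look roughly like save_jpeg(fetch_first_blob(cursor, "select picture from image"), "photo.jpg"). The signature check is a cheap way to notice early if the stored bytes were altered on the way in or out.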
Created 11-25-2019 04:08 AM
Does this help with columns that have the image data type?
Converting the image data type to binary is leading to data corruption.
Please see the issue highlighted below :