Created 05-02-2017 03:44 PM
Hi,
I would like to create an external table in Hive on different databases (MySQL, Oracle, DB2..) because I do not want to move the data, either in HDFS or in Hive directly.
How can I do that?
Created 05-02-2017 03:56 PM
Hi,
There is a Hive Storage Handler for JDBC that allows you to do this: https://github.com/qubole/Hive-JDBC-Storage-Handler
Example HQL:
DROP TABLE HiveTable;
CREATE EXTERNAL TABLE HiveTable(
id INT,
id_double DOUBLE,
names STRING,
test INT
)
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
TBLPROPERTIES (
"mapred.jdbc.driver.class"="com.mysql.jdbc.Driver",
"mapred.jdbc.url"="jdbc:mysql://localhost:3306/rstore",
"mapred.jdbc.username"="root",
"mapred.jdbc.input.table.name"="JDBCTable",
"mapred.jdbc.output.table.name"="JDBCTable",
"mapred.jdbc.password"="",
"mapred.jdbc.hive.lazy.split"= "false"
);
Created 05-02-2017 03:56 PM
Hi,
There is a Hive Storage Handler for JDBC that allows you to do this: https://github.com/qubole/Hive-JDBC-Storage-Handler
Example HQL:
DROP TABLE HiveTable;
CREATE EXTERNAL TABLE HiveTable(
id INT,
id_double DOUBLE,
names STRING,
test INT
)
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
TBLPROPERTIES (
"mapred.jdbc.driver.class"="com.mysql.jdbc.Driver",
"mapred.jdbc.url"="jdbc:mysql://localhost:3306/rstore",
"mapred.jdbc.username"="root",
"mapred.jdbc.input.table.name"="JDBCTable",
"mapred.jdbc.output.table.name"="JDBCTable",
"mapred.jdbc.password"="",
"mapred.jdbc.hive.lazy.split"= "false"
);
Created 05-02-2017 04:07 PM
Thank you @Ward Bekker
Can I use it for any DBs?
Created 05-02-2017 04:43 PM
Looks like any db with a JDBC driver, but I personally never used this handler, so can't vouch for it. I would recommend to test it for you DB's (MySQL, Oracle, db2...).
Created 05-02-2017 07:55 PM
If you are using Greenplum then there is an existing protocol which will take care of your use case. gphdfs protocol. Its simple and easy but it will support only TEXT and CSV as of now.
Check the above link for gphdfs protocol.