Support Questions

Find answers, ask questions, and share your expertise

Hive - Create external table on database (MySQL, Oracle, db2...)

Contributor

Hi,

I would like to create an external table in Hive on different databases (MySQL, Oracle, DB2..) because I do not want to move the data, either in HDFS or in Hive directly.

How can I do that?

1 ACCEPTED SOLUTION

Hi,

There is a Hive Storage Handler for JDBC that allows you to do this: https://github.com/qubole/Hive-JDBC-Storage-Handler

Example HQL:

DROP TABLE HiveTable;
CREATE EXTERNAL TABLE HiveTable(
  id INT,
  id_double DOUBLE,
  names STRING,
  test INT
)
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
TBLPROPERTIES (
  "mapred.jdbc.driver.class"="com.mysql.jdbc.Driver",
  "mapred.jdbc.url"="jdbc:mysql://localhost:3306/rstore",
  "mapred.jdbc.username"="root",
  "mapred.jdbc.input.table.name"="JDBCTable",
  "mapred.jdbc.output.table.name"="JDBCTable",
  "mapred.jdbc.password"="",
  "mapred.jdbc.hive.lazy.split"= "false"
);

View solution in original post

4 REPLIES 4

Hi,

There is a Hive Storage Handler for JDBC that allows you to do this: https://github.com/qubole/Hive-JDBC-Storage-Handler

Example HQL:

DROP TABLE HiveTable;
CREATE EXTERNAL TABLE HiveTable(
  id INT,
  id_double DOUBLE,
  names STRING,
  test INT
)
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
TBLPROPERTIES (
  "mapred.jdbc.driver.class"="com.mysql.jdbc.Driver",
  "mapred.jdbc.url"="jdbc:mysql://localhost:3306/rstore",
  "mapred.jdbc.username"="root",
  "mapred.jdbc.input.table.name"="JDBCTable",
  "mapred.jdbc.output.table.name"="JDBCTable",
  "mapred.jdbc.password"="",
  "mapred.jdbc.hive.lazy.split"= "false"
);

Contributor

Thank you @Ward Bekker

Can I use it for any DBs?

Looks like any db with a JDBC driver, but I personally never used this handler, so can't vouch for it. I would recommend to test it for you DB's (MySQL, Oracle, db2...).

Hi @Raphaël MARY

If you are using Greenplum then there is an existing protocol which will take care of your use case. gphdfs protocol. Its simple and easy but it will support only TEXT and CSV as of now.

https://discuss.pivotal.io/hc/en-us/articles/202635496-How-to-access-HDFS-data-via-GPDB-external-tab...

Check the above link for gphdfs protocol.