Support Questions

Find answers, ask questions, and share your expertise

Hive - Create external table on database (MySQL, Oracle, DB2...)

Rising Star

Hi,

I would like to create external tables in Hive over different databases (MySQL, Oracle, DB2...), because I do not want to move the data into HDFS or into Hive directly.

How can I do that?

1 ACCEPTED SOLUTION


Hi,

There is a Hive Storage Handler for JDBC that allows you to do this: https://github.com/qubole/Hive-JDBC-Storage-Handler

Example HQL:

DROP TABLE IF EXISTS HiveTable;
CREATE EXTERNAL TABLE HiveTable(
  id INT,
  id_double DOUBLE,
  names STRING,
  test INT
)
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
TBLPROPERTIES (
  "mapred.jdbc.driver.class"="com.mysql.jdbc.Driver",
  "mapred.jdbc.url"="jdbc:mysql://localhost:3306/rstore",
  "mapred.jdbc.username"="root",
  "mapred.jdbc.input.table.name"="JDBCTable",
  "mapred.jdbc.output.table.name"="JDBCTable",
  "mapred.jdbc.password"="",
  "mapred.jdbc.hive.lazy.split"= "false"
);
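Once defined, the external table is queried like any other Hive table; the handler's README states it supports filter pushdown, so a predicate such as the one below may be evaluated in the source database rather than in Hive (table and column names taken from the example above):

```sql
-- Reads go through JDBC against the MySQL table rstore.JDBCTable
SELECT names, id_double
FROM HiveTable
WHERE id > 100;
```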


5 REPLIES


Rising Star

Thank you @Ward Bekker

Can I use it with any database?


It looks like it works with any database that has a JDBC driver, but I have never used this handler myself, so I can't vouch for it. I would recommend testing it against your own databases (MySQL, Oracle, DB2...).
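For example, pointing the same handler at Oracle should in principle only require swapping the driver class and JDBC URL, assuming the Oracle JDBC driver jar is on Hive's classpath. A sketch with placeholder host, credentials, and table names (untested):

```sql
CREATE EXTERNAL TABLE OracleTable(
  id INT,
  names STRING
)
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
TBLPROPERTIES (
  "mapred.jdbc.driver.class"="oracle.jdbc.OracleDriver",
  "mapred.jdbc.url"="jdbc:oracle:thin:@//dbhost:1521/servicename",
  "mapred.jdbc.username"="user",
  "mapred.jdbc.password"="password",
  "mapred.jdbc.input.table.name"="SOURCE_TABLE",
  "mapred.jdbc.output.table.name"="SOURCE_TABLE"
);
```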


Hi @Raphaël MARY

If you are using Greenplum, there is an existing protocol, gphdfs, that covers your use case. It's simple and easy, but it currently supports only TEXT and CSV formats.

https://discuss.pivotal.io/hc/en-us/articles/202635496-How-to-access-HDFS-data-via-GPDB-external-tab...

Check the link above for details on the gphdfs protocol.
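A minimal sketch of a readable gphdfs external table in Greenplum (the NameNode host, port, path, columns, and delimiter are all placeholders):

```sql
-- Greenplum reads the pipe-delimited text files directly from HDFS
CREATE EXTERNAL TABLE ext_sales (
  id INT,
  amount NUMERIC
)
LOCATION ('gphdfs://namenodehost:8020/data/sales/*.txt')
FORMAT 'TEXT' (DELIMITER '|');
```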

New Contributor

CREATE EXTERNAL TABLE dwsimp.dim_agrupamento (
id INT,

agrupamento_nome STRING,
agrupamento_ordem INT,
dim_relatorio_id INT,
agrupamento_campo STRING
)
STORED AS ORC
TBLPROPERTIES
(org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler,

mapred.jdbc.driver.class = "oracle.jdbc.OracleDriver",
mapred.jdbc.url = "jdbc:oracle:thin:@//jdbc:oracle:thin:@//host:port/servicename",
mapred.jdbc.username = "user",
mapred.jdbc.password= "password",
mapred.jdbc.input.table.name="JDBCTable",
mapred.jdbc.output.table.name="JDBCTable",
mapred.jdbc.hive.lazy.split"= "false");


Error: Error while compiling statement: FAILED: ParseException line 10:2 cannot recognize input near 'org' '.' 'apache' in table properties list (state=42000,code=40000)
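The ParseException is caused by the handler class name sitting unquoted inside TBLPROPERTIES: the storage handler belongs in a STORED BY clause (replacing STORED AS ORC, since the data lives in Oracle, not in ORC files), every property key and value must be quoted, and the JDBC URL prefix is duplicated. A corrected sketch following the accepted answer's pattern (host, port, service name, credentials, and table names are placeholders):

```sql
DROP TABLE IF EXISTS dwsimp.dim_agrupamento;
CREATE EXTERNAL TABLE dwsimp.dim_agrupamento (
  id INT,
  agrupamento_nome STRING,
  agrupamento_ordem INT,
  dim_relatorio_id INT,
  agrupamento_campo STRING
)
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
TBLPROPERTIES (
  "mapred.jdbc.driver.class"="oracle.jdbc.OracleDriver",
  "mapred.jdbc.url"="jdbc:oracle:thin:@//host:port/servicename",
  "mapred.jdbc.username"="user",
  "mapred.jdbc.password"="password",
  "mapred.jdbc.input.table.name"="JDBCTable",
  "mapred.jdbc.output.table.name"="JDBCTable",
  "mapred.jdbc.hive.lazy.split"="false"
);
```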