
SPARK2 write to Hive external HBase table fails.

Hello, I have the following scenario:

I need to get data from a transactional Hive table (e.g. db1.tb1), process it, and then insert it into another Hive table that points to an external HBase table (e.g. db1.tb2 -> ns1:tb2).

I can successfully read from both tables, and if I manually insert data into the second table in Hive (tb2) I can see the results from both Hive and HBase. For the insert itself I have tried two approaches: one with LLAP through JDBC2 and another with Spark SQL. Neither works.
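
For reference, here is a minimal sketch of the Spark SQL path (the processing step is elided, and the app name and session options are illustrative rather than my exact configuration):

from pyspark.sql import SparkSession

# Hive-enabled session; the HBase handler jars come in via --jars at submit time.
spark = (SparkSession.builder
         .appName("tb1-to-tb2")
         .enableHiveSupport()
         .getOrCreate())

# Read from the transactional source table.
df = spark.sql("SELECT * FROM db1.tb1")

# ... processing ...

# Insert into the HBase-backed Hive table; this is the call that hangs.
df.write.insertInto("db1.tb2")
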
When I attempt the Spark SQL approach, the Spark job just hangs, with no message beyond this:

18/04/18 08:15:41 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.0.1.22:35184 (size: 99.0 KB, free: 366.2 MB)
18/04/18 08:15:41 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
18/04/18 08:15:41 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at insertInto at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0, 1))
18/04/18 08:15:41 INFO YarnScheduler: Adding task set 0.0 with 2 tasks
18/04/18 08:15:41 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ma02, executor 1, partition 0, NODE_LOCAL, 4930 bytes)
18/04/18 08:15:41 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, ma01, executor 2, partition 1, NODE_LOCAL, 4930 bytes)
18/04/18 08:15:41 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on ma01:57394 (size: 99.0 KB, free: 366.2 MB)
18/04/18 08:15:41 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on ma02:60157 (size: 99.0 KB, free: 366.2 MB)
18/04/18 08:15:42 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ma01:57394 (size: 36.1 KB, free: 366.2 MB)
18/04/18 08:15:42 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ma02:60157 (size: 36.1 KB, free: 366.2 MB)

When I try the LLAP approach instead, I receive a rather strange error message:

Py4JJavaError: An error occurred while calling o341.insertInto.
: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:LOCATION may not be specified for HBase.)
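
As far as I can tell, that message originates in Hive's HBaseStorageHandler, which rejects any CREATE TABLE that carries an explicit LOCATION, so the LLAP insert path appears to be issuing a CREATE TABLE ... LOCATION behind the scenes. A minimal way to reproduce the error outside Spark is through plain Hive JDBC (the connection details, jar path, table name, and column mapping below are placeholders, not my actual settings):

import jaydebeapi

# Placeholder connection to HiveServer2 Interactive (LLAP).
conn = jaydebeapi.connect(
    "org.apache.hive.jdbc.HiveDriver",
    "jdbc:hive2://llap-host:10500/db1",
    ["hive", ""],
    "/path/to/hive-jdbc-standalone.jar",
)
cur = conn.cursor()

# Hive rejects the LOCATION clause on an HBase-handler table with the
# same MetaException as above.
cur.execute("""
    CREATE EXTERNAL TABLE db1.tb2_repro (rowkey STRING, val STRING)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:val')
    TBLPROPERTIES ('hbase.table.name' = 'ns1:tb2_repro')
    LOCATION '/tmp/should_fail'
""")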

P.S. The script is written in Python, I have included all of the necessary jars with spark-submit (the --jars flag), and I am confident there are no version mismatches.
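
For completeness, the submit command looks roughly like this (the jar names, paths, and script name are illustrative, not my exact versions):

spark-submit \
  --master yarn \
  --deploy-mode client \
  --jars /usr/hdp/current/hive-client/lib/hive-hbase-handler.jar,\
/usr/hdp/current/hbase-client/lib/hbase-client.jar,\
/usr/hdp/current/hbase-client/lib/hbase-common.jar,\
/usr/hdp/current/hbase-client/lib/hbase-server.jar,\
/usr/hdp/current/hbase-client/lib/hbase-protocol.jar \
  my_script.py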

Best,

Mitko Benkov