Left outer join doesn't work in Hive

New Contributor

I have two tables, a dictionary table and a data table. I need to join them, and when the value in the data table is null, the dictionary column should come back as null.

I wrote this query:

select call_history_fas4sec.external_id,
       call_history_fas4sec.contract,
       call_history_fas4sec.lac,
       call_history_fas4sec.cid,
       lac_cell.address
from c1.call_history_fas4sec
left outer join pps_adm.lac_cell
  on lac_cell.cid = substr(call_history_fas4sec.cid,1,4)
 and lac_cell.lac = call_history_fas4sec.lac
where call_history_fas4sec.hday = '2016-01-01'
  and call_history_fas4sec.external_id in ('674576660')
  and lac_cell.hday = '2016-01-01';

And I get this error:

Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 3, vertexId=vertex_1497438116058_42449_1_00, diagnostics=[Vertex vertex_1497438116058_42449_1_00 [Map 3] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: lac_cell initializer failed, vertex=vertex_1497438116058_42449_1_00 [Map 3], java.lang.RuntimeException: serious problem
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1258)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1285)
    at org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat.getSplits(BucketizedHiveInputFormat.java:141)
    at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:447)
    at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:299)
    at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:122)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: delta_6286848_6287847 does not start with base_
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1235)
    ... 16 more
Caused by: java.lang.IllegalArgumentException: delta_6286848_6287847 does not start with base_
    at org.apache.hadoop.hive.ql.io.AcidUtils.parseBase(AcidUtils.java:182)
    at org.apache.hadoop.hive.ql.io.AcidUtils.parseBaseBucketFilename(AcidUtils.java:210)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.callInternal(OrcInputFormat.java:794)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.access$600(OrcInputFormat.java:738)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:763)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:760)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:760)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:738)
    ... 4 more

Can anyone explain what is wrong with my query?

1 ACCEPTED SOLUTION

New Contributor

Thanks for the answers, I resolved this problem.

I set:

set hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

SET hive.tez.container.size=10240;

SET hive.tez.java.opts=-Xmx8192m;

set tez.runtime.io.sort.mb=200;
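
A side note on the query itself, separate from the error: the condition lac_cell.hday = '2016-01-01' sits in the WHERE clause, so rows with no match in lac_cell (where lac_cell.hday comes back as null) are filtered out, and the left outer join ends up behaving like an inner join. A minimal sketch of one way to keep the unmatched rows, assuming hday is only meant to restrict which dictionary rows take part in the join, is to move that condition into the ON clause:

```sql
-- Sketch only: same tables and columns as the query in the question.
-- Assumption: lac_cell.hday should limit the dictionary rows that are
-- joined, not discard call_history rows that have no dictionary match.
select ch.external_id,
       ch.contract,
       ch.lac,
       ch.cid,
       lc.address                      -- null when no dictionary row matches
from c1.call_history_fas4sec ch
left outer join pps_adm.lac_cell lc
  on lc.cid  = substr(ch.cid, 1, 4)
 and lc.lac  = ch.lac
 and lc.hday = '2016-01-01'            -- right-table filter moved from WHERE to ON
where ch.hday = '2016-01-01'           -- left-table filters stay in WHERE
  and ch.external_id in ('674576660');
```

The aliases ch and lc are introduced here only for readability.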


3 REPLIES

Expert Contributor

Hmm, are you using bucketing by any chance? If you are, this could be a similar issue to this bug.

If you aren't, then we're barking up the wrong tree.
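
One way to check for bucketing, as a rough sketch: DESCRIBE FORMATTED prints the table's storage details. The table name below is the dictionary table from the query in the question, which is also the input named in the error.

```sql
-- Print detailed metadata for the dictionary table from the question.
-- In the "Storage Information" section, look at "Num Buckets" and
-- "Bucket Columns": an unbucketed table typically shows -1 and an
-- empty column list.
DESCRIBE FORMATTED pps_adm.lac_cell;
```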


Are you interacting with an ACID table in non-ACID mode? This error can happen in that case. You could try enabling ACID (setting hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager), restarting the Hive services, and then running the query again.
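
A minimal sketch of how that could be checked from a Hive session, assuming the table in question is pps_adm.lac_cell (the input named in the error). The delta_ directory in the stack trace is the kind of directory Hive writes for transactional tables, so the first step is to see whether the table is marked as one. Whether the session-level settings are enough, or the values have to go into hive-site.xml followed by a restart, depends on the cluster and is an assumption here.

```sql
-- Show whether the table is marked as transactional (ACID).
SHOW TBLPROPERTIES pps_adm.lac_cell("transactional");

-- Settings commonly needed on the client side to work with ACID tables.
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
```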
