Left outer join doesn't work in Hive

New Contributor

I have two tables: a dictionary table and a data table. I need to join them, and when the value in the data table is null, I need null returned from the dictionary.

I wrote this SQL:

select call_history_fas4sec.external_id,
       call_history_fas4sec.contract,
       call_history_fas4sec.lac,
       call_history_fas4sec.cid,
       lac_cell.address
from c1.call_history_fas4sec
left outer join pps_adm.lac_cell
  on lac_cell.cid = substr(call_history_fas4sec.cid,1,4)
  and lac_cell.lac = call_history_fas4sec.lac
where call_history_fas4sec.hday = '2016-01-01'
  and call_history_fas4sec.external_id in ('674576660')
  and lac_cell.hday = '2016-01-01';

And I get this error:

Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 3, vertexId=vertex_1497438116058_42449_1_00, diagnostics=[Vertex vertex_1497438116058_42449_1_00 [Map 3] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: lac_cell initializer failed, vertex=vertex_1497438116058_42449_1_00 [Map 3], java.lang.RuntimeException: serious problem
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1258)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1285)
    at org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat.getSplits(BucketizedHiveInputFormat.java:141)
    at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:447)
    at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:299)
    at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:122)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: delta_6286848_6287847 does not start with base_
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1235)
    ... 16 more
Caused by: java.lang.IllegalArgumentException: delta_6286848_6287847 does not start with base_
    at org.apache.hadoop.hive.ql.io.AcidUtils.parseBase(AcidUtils.java:182)
    at org.apache.hadoop.hive.ql.io.AcidUtils.parseBaseBucketFilename(AcidUtils.java:210)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.callInternal(OrcInputFormat.java:794)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.access$600(OrcInputFormat.java:738)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:763)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:760)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:760)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:738)
    ... 4 more

Can anyone explain what is wrong with my query?

1 ACCEPTED SOLUTION

New Contributor

Thanks for the answers; I resolved this problem.

I set:

set hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

SET hive.tez.container.size=10240;

SET hive.tez.java.opts=-Xmx8192m;

set tez.runtime.io.sort.mb=200;
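
A side note on the query itself, separate from the error: the condition lac_cell.hday = '2016-01-01' sits in the WHERE clause, so rows with no match in lac_cell (where lac_cell.hday comes back null) are filtered out and the left outer join behaves like an inner join. A minimal sketch, assuming the goal is to keep every matching call_history_fas4sec row and get a null address when nothing matches, moves that condition into the ON clause. The aliases h and c are introduced here only for readability and are not from the original post:

```sql
-- Sketch only: same tables and columns as the posted query.
-- Aliases h and c are illustrative.
SELECT h.external_id,
       h.contract,
       h.lac,
       h.cid,
       c.address                  -- NULL when no dictionary row matches
FROM c1.call_history_fas4sec h
LEFT OUTER JOIN pps_adm.lac_cell c
  ON  c.cid  = substr(h.cid, 1, 4)
  AND c.lac  = h.lac
  AND c.hday = '2016-01-01'       -- moved from WHERE so unmatched rows are kept
WHERE h.hday = '2016-01-01'
  AND h.external_id IN ('674576660');
```

Filters on the left-hand table stay in WHERE, since they restrict which rows are kept at all rather than which dictionary rows may match.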


3 REPLIES

Expert Contributor

Hmm, are you using bucketing by chance? If you are, this could be a similar issue to this bug.

If you aren't then we're barking up the wrong tree.

Master Collaborator

Are you interacting with an ACID table in non-ACID mode? That can produce this error. You could try enabling ACID (setting hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager), restarting the Hive services, and then running the query again.
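
One way to check this from a Hive session is sketched below. It assumes pps_adm.lac_cell (the table named in the failing vertex) is the one to inspect. The hive.support.concurrency setting is not mentioned in this thread and is included as an assumption, since it is commonly set together with DbTxnManager; confirm against the documentation for your Hive version.

```sql
-- Does the table declare itself transactional?
-- A value of 'true' indicates an ACID table.
SHOW TBLPROPERTIES pps_adm.lac_cell("transactional");

-- Session-level ACID settings (assumption: your cluster permits
-- setting these per session rather than only in hive-site.xml).
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
```

If the property is not 'true' but the table directory still contains delta_ directories, the files were probably written by a transactional writer, which would be consistent with the "does not start with base_" message above.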
