Left outer join doesn't work in Hive
Labels: Apache Hive
Created ‎07-24-2017 02:04 AM
I have two tables, a dictionary table and a data table. I need to join them so that when the value in the data table is null, the dictionary columns come back as null.
I wrote this SQL:
select call_history_fas4sec.external_id,
       call_history_fas4sec.contract,
       call_history_fas4sec.lac,
       call_history_fas4sec.cid,
       lac_cell.address
from c1.call_history_fas4sec
left outer join pps_adm.lac_cell
  on lac_cell.cid = substr(call_history_fas4sec.cid,1,4)
  and lac_cell.lac = call_history_fas4sec.lac
where call_history_fas4sec.hday = '2016-01-01'
  and call_history_fas4sec.external_id in ('674576660')
  and lac_cell.hday = '2016-01-01';
And I get this error:
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 3, vertexId=vertex_1497438116058_42449_1_00, diagnostics=[Vertex vertex_1497438116058_42449_1_00 [Map 3] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: lac_cell initializer failed, vertex=vertex_1497438116058_42449_1_00 [Map 3], java.lang.RuntimeException: serious problem
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1258)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1285)
    at org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat.getSplits(BucketizedHiveInputFormat.java:141)
    at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:447)
    at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:299)
    at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:122)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: delta_6286848_6287847 does not start with base_
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1235)
    ... 16 more
Caused by: java.lang.IllegalArgumentException: delta_6286848_6287847 does not start with base_
    at org.apache.hadoop.hive.ql.io.AcidUtils.parseBase(AcidUtils.java:182)
    at org.apache.hadoop.hive.ql.io.AcidUtils.parseBaseBucketFilename(AcidUtils.java:210)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.callInternal(OrcInputFormat.java:794)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.access$600(OrcInputFormat.java:738)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:763)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:760)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:760)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:738)
    ... 4 more
Can someone explain what is wrong with my code?
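A side note on the query itself, separate from the Tez error: the condition lac_cell.hday = '2016-01-01' sits in the WHERE clause. For data rows with no matching dictionary row, lac_cell.hday is NULL after the left outer join, so that condition filters them out and the join behaves like an inner join. If the goal is to keep those rows with a null address, one possible rewrite moves the dictionary filter into the ON clause. This is only a sketch: it assumes hday is a column of both tables, as the original query suggests, and the aliases h and d are introduced here purely for readability.

```sql
-- Sketch only: same tables and columns as the original query.
-- Aliases h (data) and d (dictionary) are illustrative, not from the post.
SELECT h.external_id,
       h.contract,
       h.lac,
       h.cid,
       d.address               -- NULL when no dictionary row matches
FROM c1.call_history_fas4sec h
LEFT OUTER JOIN pps_adm.lac_cell d
  ON  d.cid  = substr(h.cid, 1, 4)
  AND d.lac  = h.lac
  AND d.hday = '2016-01-01'    -- dictionary filter applied during the join
WHERE h.hday = '2016-01-01'    -- filters on the data table stay in WHERE
  AND h.external_id IN ('674576660');
```

This changes only which rows are returned. It does not address the "does not start with base_" failure in the stack trace.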
Created ‎07-28-2017 08:18 AM
Thanks for the answers, I resolved this problem.
I set the following:
set hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
SET hive.tez.container.size=10240;
SET hive.tez.java.opts=-Xmx8192m;
set tez.runtime.io.sort.mb=200;
Created ‎07-24-2017 09:39 PM
Hmm, are you using bucketing by any chance? If you are, this could be a similar issue to this bug.
If you aren't then we're barking up the wrong tree.
Created ‎07-24-2017 09:58 PM
Are you interacting with an ACID table in non-ACID mode? That can produce this error. You could try enabling ACID (setting hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager), restarting the Hive services, and then running the query again.
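To check this from a Hive session, something like the following may help. It is a rough sketch, not a verified fix for this cluster: it assumes pps_adm.lac_cell (the table named in the failing vertex) is the one with delta_ directories, and that your cluster allows these properties to be changed per session. If it does not, they would need to be set in the Hive configuration instead.

```sql
-- Is the table transactional? Look for transactional=true in the output.
SHOW TBLPROPERTIES pps_adm.lac_cell;

-- Print the current values of the ACID-related client settings.
SET hive.support.concurrency;
SET hive.txn.manager;

-- Try enabling ACID reads for this session, then re-run the query.
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
```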
