
Hive does not group splits when used with oozie


New Contributor

Hi, I am trying to run a Hive process using Oozie, and it is taking too long. The same query takes about 35 minutes when run normally, but 2 hours 45 minutes when scheduled with Oozie. I checked the logs and found that the Hive query is trying to process 75000 splits. When I run it from the Hive prompt or from a shell script, the 75000 splits are grouped into 450 splits, but this grouping doesn't happen when I run Hive through Oozie. I set the tez.job.queuename property to prod, but that doesn't resolve the problem. Can someone help me get these splits grouped?


Re: Hive does not group splits when used with oozie

Master Guru
By default, Oozie does not use all of the Hive properties that the CLI uses. Copy your hive-site.xml to HDFS and pass its location in the <job-xml> element of your workflow's Hive action, so that the action loads all the properties responsible for combining your splits.
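A minimal sketch of the <job-xml> approach, assuming a copy of the cluster's /etc/hive/conf/hive-site.xml has been placed in the workflow application directory on HDFS. The action name, script name, transition targets and the hive-site.xml location are hypothetical placeholders, and the `uri:oozie:hive-action:0.2` schema version should be matched to your Oozie installation.

```xml
<!-- Hypothetical names: action "hive-query", script "query.hql", nodes "end"/"fail". -->
<action name="hive-query">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- Copy of /etc/hive/conf/hive-site.xml uploaded to HDFS; this relative
             path (resolved against the workflow application directory) is an assumption. -->
        <job-xml>hive-site.xml</job-xml>
        <script>query.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

Properties set in the action's own <configuration> element override those loaded from <job-xml> files, so the copied hive-site.xml can serve as the baseline and individual settings can still be adjusted inline.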

Re: Hive does not group splits when used with oozie

New Contributor

Harsh, 

I dug in a little more and found that the execution engine is tez when I run the query through the CLI, but hive when I run it through Oozie. So I added the property hive.execution.engine as shown below. I also added the property oozie.hive.defaults, which points to an HDFS location (hive-config.xml), as shown below. I ran with these changes, but I am still unable to make "tez" the default engine.

 

 

<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>

 

<property>
<name>oozie.hive.defaults</name>
<value>/<hdfspath>oozieworkflow/hive-config.xml</value>
</property>


Re: Hive does not group splits when used with oozie

Master Guru
Have you also tried the <job-xml> suggestion? The execution engine isn't the only factor that influences how splits are computed and executed.

If you'd rather configure everything manually, copy all of the properties from your /etc/hive/conf/hive-site.xml into the <configuration> section of the action.
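For the fully manual route, the properties go inside the action's <configuration> element. The sketch below shows only a few that plausibly matter for this problem (the execution engine and Tez's split-grouping bounds); it is not a complete hive-site.xml. The action, script and node names are hypothetical, and the numeric values are illustrative, so copy the real values from your cluster's configuration files.

```xml
<action name="hive-query">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- Run the query on Tez, as the CLI does on this cluster. -->
            <property>
                <name>hive.execution.engine</name>
                <value>tez</value>
            </property>
            <!-- Tez split grouping: lower and upper bounds, in bytes, on the
                 amount of input combined into one grouped split.
                 Illustrative values only; use your cluster's settings. -->
            <property>
                <name>tez.grouping.min-size</name>
                <value>52428800</value>
            </property>
            <property>
                <name>tez.grouping.max-size</name>
                <value>1073741824</value>
            </property>
            <!-- ...remaining properties copied from /etc/hive/conf/hive-site.xml... -->
        </configuration>
        <script>query.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
</action>
```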

Re: Hive does not group splits when used with oozie

New Contributor
Hi,
I tried the job-xml suggestion, but I get the error below. I also tried giving the properties as <property> tags in the XML, and that gave the same error.

9255 [main] INFO org.apache.hadoop.hive.ql.exec.Utilities - Serializing MapWork via kryo
9256 [main] INFO org.apache.hadoop.hive.ql.log.PerfLogger - duration=1 from=org.apache.hadoop.hive.ql.exec.Utilities>
9256 [main] INFO org.apache.hadoop.hive.ql.exec.Utilities - Setting plan: /tmp/hive-yarn/hive_2015-12-18_15-21-58_923_8371738201877445005-1/6d2022dc-8ee9-47bf-965d-3db95345e702/map.xml
9261 [main] INFO org.apache.hadoop.hive.ql.stats.fs.FSStatsPublisher - created : hdfs://MCDHADOOPUDA/tmp/hive-yarn/hive_2015-12-18_15-21-58_923_8371738201877445005-1/-ext-10001
9261 [main] INFO org.apache.hadoop.hive.ql.log.PerfLogger - end=1450473722293 duration=8 from=org.apache.hadoop.hive.ql.exec.tez.TezTask>
9268 [main] INFO org.apache.hadoop.hive.ql.log.PerfLogger - duration=556 from=org.apache.hadoop.hive.ql.exec.tez.TezTask>
9268 [main] INFO org.apache.hadoop.hive.ql.log.PerfLogger - method=TezSubmitDag from=org.apache.hadoop.hive.ql.exec.tez.TezTask>
13093 [main] INFO org.apache.hadoop.hive.ql.exec.Task - Tez session was closed. Reopening...
13094 [main] INFO org.apache.hadoop.hive.ql.exec.tez.TezSessionState - Closing Tez Session
13101 [main] INFO org.apache.hadoop.hive.ql.exec.tez.TezSessionState - User of session id 3231e7aa-7b4a-4123-a260-ab355b111be1 is daas
13108 [main] INFO org.apache.hadoop.hive.ql.exec.tez.DagUtils - Jar dir is null/directory doesn't exist. Choosing HIVE_INSTALL_DIR - hdfs:/user/daas/.hiveJars
13302 [main] INFO org.apache.hadoop.hive.ql.exec.tez.DagUtils - Localizing resource because it does not exist: file:/data12/hadoop/yarn/local/filecache/4248/hive-exec-0.13.0.2.1.2.0-402.jar to dest: hdfs://hdp001-nn:8020/user/daas/.hiveJars/hive-exec-0.13.0.2.1.2.0-402-259a3176ba17db3143afb95dd3b3a726edb95046afe98764687404f83d5e72c6.jar
13303 [main] INFO org.apache.hadoop.hive.ql.exec.tez.DagUtils - Looks like another thread is writing the same file will wait.
13303 [main] INFO org.apache.hadoop.hive.ql.exec.tez.DagUtils - Number of wait attempts: 5. Wait interval: 5000
13306 [main] INFO org.apache.hadoop.hive.ql.exec.tez.DagUtils - Resource modification time: 1449604304433
13307 [main] INFO org.apache.hadoop.hive.ql.exec.tez.TezSessionState - Opening new Tez Session (id: 3231e7aa-7b4a-4123-a260-ab355b111be1, scratch dir: hdfs://hdp001-nn:8020/tmp/hive-yarn/_tez_session_dir/3231e7aa-7b4a-4123-a260-ab355b111be1)
13502 [main] INFO org.apache.hadoop.hive.ql.exec.Task - Session re-established.
21033 [main] ERROR org.apache.hadoop.hive.ql.exec.Task - Failed to execute tez graph.
org.apache.tez.dag.api.SessionNotRunning: Application not running, applicationId=application_1442077641322_61938, yarnApplicationState=FAILED, finalApplicationStatus=FAILED, trackingUrl=hdp001-7:8088/cluster/app/application_1442077641322_61938
at org.apache.tez.client.TezClientUtils.getSessionAMProxy(TezClientUtils.java:733)
at org.apache.tez.client.TezSession.waitForProxy(TezSession.java:417)
at org.apache.tez.client.TezSession.submitDAG(TezSession.java:224)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:336)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:172)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1089)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:912)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:359)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:456)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:466)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:749)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:316)
at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:277)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:38)
at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:225)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
21235 [main] INFO org.apache.hadoop.hive.ql.hooks.ATSHook - Created ATS Hook
21235 [main] INFO org.apache.hadoop.hive.ql.log.PerfLogger - from=org.apache.hadoop.hive.ql.Driver>
21235 [main] INFO org.apache.hadoop.hive.ql.log.PerfLogger - start=1450473734267 end=1450473734267 duration=0 from=org.apache.hadoop.hive.ql.Driver>
21235 [ATS Logger 0] INFO org.apache.hadoop.hive.ql.hooks.ATSHook - Received post-hook notification for :yarn_20151218152121_608e6df8-a877-46b6-b256-ee3260cbfe0f