Hive CLI error: Exception in thread "main" java.lang.RuntimeException: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown

Contributor

I have gone through many articles with no luck. This is an HDP 2.5 cluster that is not managed by Ambari. Please help us resolve this error, which we get when starting the Hive CLI. It appears to be a NodeManager (NM) localization error.

In the NM log:

-------------------

2024-01-12 05:50:28,569 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource hdfs://<namenode host>:8020/tmp/hive/root/_tez_session_dir/1d4c0b74-6820-4070-81b2-af53704bbc66/hive-hcatalog-core.jar transitioned from INIT to DOWNLOADING
2024-01-12 05:50:28,569 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource hdfs://<namenode host>:8020/tmp/hive/root/_tez_session_dir/1d4c0b74-6820-4070-81b2-af53704bbc66/jets3t-0.9.4.jar transitioned from INIT to DOWNLOADING
2024-01-12 05:50:28,569 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource hdfs://<namenode>:8020/tmp/hive/root/_tez_session_dir/1d4c0b74-6820-4070-81b2-af53704bbc66/.tez/application_xxxxxxxx_0001/tez.session.local-resources.pb transitioned from INIT to DOWNLOADING
2024-01-12 05:50:28,569 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource hdfs://<namenode host>:8020/tmp/hive/root/_tez_session_dir/1d4c0b74-6820-4070-81b2-af53704bbc66/.tez/application_xxxxxxxx_0001/tez-conf.pb transitioned from INIT to DOWNLOADING
2024-01-12 05:50:28,569 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:handle(712)) - Created localizer for container_e85_xxxxxxx_0001_01_000001
2024-01-12 05:50:28,570 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(1134)) - Localizer failed
java.lang.NullPointerException
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:345)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:485)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1101)
2024-01-12 05:50:28,570 ERROR nodemanager.DeletionService (DeletionService.java:afterExecute(183)) - Exception during execution of task in DeletionService
java.lang.NullPointerException
at org.apache.hadoop.fs.FileContext.fixRelativePart(FileContext.java:276)
at org.apache.hadoop.fs.FileContext.delete(FileContext.java:763)
at org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:273)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2024-01-12 05:50:28,570 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e85_1705045750269_0007_01_000001 transitioned from LOCALIZING to LOCALIZATION_FAILED
2024-01-12 05:50:28,571 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e85_1705045750269_0007_01_000001 transitioned from LOCALIZATION_FAILED to DONE

----------------RM audit log

2024-01-12 02:49:38,628 WARN resourcemanager.RMAuditLogger: USER=root OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_xxxxxxxxx_0001 failed 2 times due to AM Container for appattempt_xxxxxxxxx_00001_000002 exited with exitCode: -1000
For more detailed output, check the application tracking page: http://<hostname>:8088/cluster/app/application_xxxxxxxx_0001 Then click on links to logs of each attempt.
Diagnostics: null
Failing this attempt. Failing the application. APPID=application_xxxxxxx_0001

-------------------------

Master Collaborator

@narasimha8177 I think HADOOP-12252 fixes this issue, but that fix is not available in HDP 2.5.

Check the disk usage on all of your NodeManager hosts:

df -Th

If any directory is 100% utilized, clear some files, and make sure all the directories are readable and writable.
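The NullPointerException from LocalDirAllocator in the NM log is consistent with the NodeManager having no usable local directory at the moment the AM container was localized, so the check above is worth scripting on every NodeManager host. A minimal sketch, assuming bash and `df -P`, with the directories passed in by hand; `check_nm_dirs` and `THRESHOLD` are hypothetical names. The 90% default is an assumption meant to mirror yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage, since the NodeManager may stop using a disk well before it reaches 100%; change it if your yarn-site.xml overrides that property.

```bash
#!/usr/bin/env bash
# Sketch: flag NodeManager directories that are missing, too full, or not
# writable. Pass the directories as arguments, e.g. every entry of
# yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs on this host.
# THRESHOLD (percent) is an assumption: 90 mirrors the usual default of
# yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage.

check_nm_dirs() {
  local threshold="${THRESHOLD:-90}"
  local dir used problems rc=0
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "MISSING $dir"
      rc=1
      continue
    fi
    # df -P prints one unwrapped line per filesystem; field 5 is "Use%".
    used=$(df -P "$dir" | awk 'NR == 2 {sub(/%/, "", $5); print $5}')
    problems=""
    if [ "$used" -ge "$threshold" ]; then
      problems="FULL"
    fi
    # Checked as the invoking user: run as the yarn user to see what the NM sees.
    if [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
      problems="${problems:+$problems,}NOT_WRITABLE"
    fi
    if [ -z "$problems" ]; then
      echo "OK $dir (${used}% used)"
    else
      echo "BAD $dir (${used}% used): $problems"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_nm_dirs /hadoop/yarn/local /hadoop/yarn/log` (hypothetical paths) run as the yarn user prints one line per directory; any BAD or MISSING line is worth fixing before resubmitting the query.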

Contributor

Hi @smruti,

I have cross-checked, and there is no space issue.

The issue is not resolved.

Thanks

Narasimha 

Master Collaborator

@narasimha8177 Is this happening for all jobs? Could you check the utilization of the yarn.nodemanager.local-dirs (YARN NodeManager Local directories) directories? That path should be defined in your YARN configuration, and the localized resources are stored under it. Try deleting all the contents of the usercache directories on all data nodes, then resubmit the job. While deleting the contents of the usercache directories, make sure that no job is running; otherwise, schedule a downtime to do this.
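To find the directories in question, the property can be read straight from yarn-site.xml. A rough sketch, assuming the usual layout where each `<property>` holds a `<name>` followed by its `<value>`; the function names `get_yarn_prop` and `report_usercache` are hypothetical. It only reports usercache sizes and deletes nothing, so the cleanup stays a deliberate manual step.

```bash
#!/usr/bin/env bash
# Sketch: list yarn.nodemanager.local-dirs from a yarn-site.xml and report the
# size of each usercache directory under them. Reports only; deletes nothing.
# Assumption: each <property> has <name> followed by <value>, as Hadoop's
# *-site.xml files usually do. If the key is absent, YARN falls back to its
# default of ${hadoop.tmp.dir}/nm-local-dir, which this sketch does not resolve.

# get_yarn_prop FILE KEY: print KEY's value from FILE, or nothing if absent.
get_yarn_prop() {
  local file="$1" key
  # Escape dots so the key matches literally inside the sed regex.
  key=$(printf '%s' "$2" | sed 's/\./\\./g')
  # Join lines so a <property> block spread over several lines can be matched.
  tr '\n' ' ' < "$file" |
    sed -n "s|.*<name>[[:space:]]*${key}[[:space:]]*</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p" |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

# report_usercache FILE: show usercache usage under each configured local dir.
report_usercache() {
  local dirs dir
  dirs=$(get_yarn_prop "$1" yarn.nodemanager.local-dirs)
  if [ -z "$dirs" ]; then
    echo "yarn.nodemanager.local-dirs is not set in $1" >&2
    return 1
  fi
  # The property is a comma-separated list of local paths.
  printf '%s\n' "$dirs" | tr ',' '\n' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
    while IFS= read -r dir; do
      [ -n "$dir" ] || continue
      if [ -d "$dir/usercache" ]; then
        du -sh "$dir/usercache"
      else
        echo "no usercache under $dir"
      fi
    done
}
```

On a manual install this might be run as `report_usercache /etc/hadoop/conf/yarn-site.xml`, although that config path is only a guess for a non-Ambari layout.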

We need to understand why this localization fails: either the source file is missing, or the target location (the NodeManager local directory) is not in good shape.
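One way to test the target side is to ask the ResourceManager for each NodeManager's health report, which is where the NodeManager's disk health checker typically reports bad local or log directories. A sketch, assuming the `yarn` CLI on the host can reach the RM and that `yarn node -list -all` and `yarn node -status` print the Node-Id, Node-State and Health-Report fields the way Hadoop 2.7-era clients do; `nm_health` is a hypothetical name, and the exact health-report wording may differ between versions.

```bash
#!/usr/bin/env bash
# Sketch: print Node-Id, Node-State and Health-Report for every NodeManager the
# ResourceManager knows about. Assumes the yarn CLI on this host reaches the RM
# and prints the field layout of Hadoop 2.7-era clients.

nm_health() {
  local node
  # Data rows of `yarn node -list -all` begin with a Node-Id such as host:45454.
  yarn node -list -all 2>/dev/null | awk '$1 ~ /:[0-9]+$/ {print $1}' |
    while IFS= read -r node; do
      # Keep whatever follows the first colon on the State and Health lines.
      yarn node -status "$node" 2>/dev/null | awk -v n="$node" '
        /Node-State/    {s = $0; sub(/^[^:]*:[ ]*/, "", s); state = s}
        /Health-Report/ {r = $0; sub(/^[^:]*:[ ]*/, "", r); report = r}
        END {print n, state, (report == "" ? "(empty health report)" : report)}'
    done
}
```

A node whose report mentions bad local-dirs or log-dirs would point at the target location rather than at a missing source file.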

Contributor

Hello Smruti,

I tried clearing the usercache directory on all data nodes as suggested, but the issue is not resolved. It is basically an issue with the Tez session: it happens for all jobs that involve Tez.

Thanks,
Narasimha