
ExecuteStreamCommand Hangs when executing Hive Script

Expert Contributor

Hello,

I have a flow that fetches a file, extracts text, and runs a shell script, which in turn runs a Hive script. (I am just pulling a date from the file and passing it as a -hivevar.) My shell script looks something like this:

#!/bin/bash
endDate=$1
CONNECT="jdbc:hive2://master2:10000/default"
beeline -u "$CONNECT" -n root -hivevar endDate="$endDate" -f /pathToScript/Hive_scipt.hql

And my NiFi flow looks like this:

[Screenshot of the NiFi flow: 5632-screen-shot-2016-07-07-at-64903-pm.png]

The Hive script completes and data is inserted into my table, but the ExecuteStreamCommand stays running (the "1" stays at the top corner) indefinitely and I have to restart NiFi (is there a better way to handle this?).

I've noticed a few things:

  • If I reduce the size of the query (my Hive query is a series of UNIONs), the ExecuteStreamCommand won't hang.
  • When the job hangs, the AM on the Resource Manager stays in the RUNNING state for quite some time, ~10 minutes (it waits for the AM session timeout), sort of like when you create a Hive CLI Tez session. When I reduce the query size and the job doesn't hang, the AM goes to the FINISHED state right away. Comparing the application log of a job that hangs with one that doesn't, I see that for the job that doesn't hang, the AM is told to unregister from the RM as soon as the DAG completes:
2016-07-08 09:49:43,828 [INFO] [Dispatcher thread {Central}] |app.DAGAppMaster|: Central Dispatcher queue size after DAG completion, before cleanup: 0
2016-07-08 09:49:43,828 [INFO] [Dispatcher thread {Central}] |app.DAGAppMaster|: Waiting for next DAG to be submitted.
2016-07-08 09:49:43,829 [INFO] [Dispatcher thread {Central}] |app.DAGAppMaster|: Cleaning up DAG: name=INSERT OVERWRITE TABLE extract.r...t3.Rank=1(Stage-1), with id=dag_1467841289379_0087_1
2016-07-08 09:49:43,831 [INFO] [Dispatcher thread {Central}] |app.DAGAppMaster|: Completed cleanup for DAG: name=INSERT OVERWRITE TABLE extract.r...t3.Rank=1(Stage-1), with id=dag_1467841289379_0087_1
2016-07-08 09:49:44,151 [INFO] [IPC Server handler 0 on 55823] |client.DAGClientHandler|: Received message to shutdown AM
2016-07-08 09:49:44,151 [INFO] [IPC Server handler 0 on 55823] |rm.TaskSchedulerEventHandler|: TaskScheduler notified that it should unregister from RM
2016-07-08 09:49:44,151 [INFO] [IPC Server handler 0 on 55823] |app.DAGAppMaster|: No current running DAG, shutting down the AM
2016-07-08 09:49:44,151 [INFO] [IPC Server handler 0 on 55823] |app.DAGAppMaster|: DAGAppMasterShutdownHandler invoked
2016-07-08 09:49:44,152 [INFO] [IPC Server handler 0 on 55823] |app.DAGAppMaster|: Handling DAGAppMaster shutdown
2016-07-08 09:49:44,153 [INFO] [AMShutdownThread] |app.DAGAppMaster|: Sleeping for 5 seconds before shutting down

2016-07-08 09:49:49,153 [INFO] [AMShutdownThread] |app.DAGAppMaster|: Calling stop for all the services
2016-07-08 09:49:49,153 [INFO] [AMShutdownThread] |rm.YarnTaskSchedulerService|: Initiating stop of YarnTaskScheduler
2016-07-08 09:49:49,153 [INFO] [AMShutdownThread] |rm.YarnTaskSchedulerService|: Releasing held containers
2016-07-08 09:49:49,154 [INFO] [AMShutdownThread] |rm.YarnTaskSchedulerService|: Removing all pending taskRequests

Whereas with the job that hangs, the AM waits for the session timeout instead of unregistering right away:

2016-07-07 20:46:34,575 [INFO] [Dispatcher thread {Central}] |app.DAGAppMaster|: Central Dispatcher queue size after DAG completion, before cleanup: 0
2016-07-07 20:46:34,576 [INFO] [Dispatcher thread {Central}] |app.DAGAppMaster|: Waiting for next DAG to be submitted.
2016-07-07 20:46:34,576 [INFO] [Dispatcher thread {Central}] |app.DAGAppMaster|: Cleaning up DAG: name=INSERT OVERWRITE TABLE extract.r...t3.Rank=1(Stage-1), with id=dag_1467841289379_0085_1
2016-07-07 20:46:34,578 [INFO] [Dispatcher thread {Central}] |app.DAGAppMaster|: Completed cleanup for DAG: name=INSERT OVERWRITE TABLE extract.r...t3.Rank=1(Stage-1), with id=dag_1467841289379_0085_1
2016-07-07 20:46:38,380 [INFO] [DelayedContainerManager] |rm.YarnTaskSchedulerService|: No taskRequests. Container's idle timeout delay expired or is new. Releasing container, containerId=container_e11_1467841289379_0085_01_000008, containerExpiryTime=1467938798314, idleTimeout=10000, taskRequestsCount=0, heldContainers=10, delayedContainers=9, isNew=false
2016-07-07 20:46:38,381 [INFO] [Dispatcher thread {Central}] |history.HistoryEventHandler|: [HISTORY][DAG:dag_1467841289379_0085_1][Event:CONTAINER_STOPPED]: containerId=container_e11_1467841289379_0085_01_000008, stoppedTime=1467938798381, exitStatus=0
2016-07-07 20:46:38,381 [INFO] [ContainerLauncher #9] |launcher.ContainerLauncherImpl|: Stopping container_e11_1467841289379_0085_01_000008
2016-07-07 20:46:38,382 [INFO] [ContainerLauncher #9] |impl.ContainerManagementProtocolProxy|: Opening proxy : tf-hdpdata02.greenfieldethanol.com:45454
...
2016-07-07 20:55:43,282 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:0, vCores:0> Free: <memory:667648, vCores:1> pendingRequests: 0 delayedContainers: 0 heartbeats: 2501 lastPreemptionHeartbeat: 2500
2016-07-07 20:55:55,823 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:0, vCores:0> Free: <memory:667648, vCores:1> pendingRequests: 0 delayedContainers: 0 heartbeats: 2551 lastPreemptionHeartbeat: 2550
2016-07-07 20:56:08,362 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:0, vCores:0> Free: <memory:667648, vCores:1> pendingRequests: 0 delayedContainers: 0 heartbeats: 2601 lastPreemptionHeartbeat: 2600
2016-07-07 20:56:20,902 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:0, vCores:0> Free: <memory:667648, vCores:1> pendingRequests: 0 delayedContainers: 0 heartbeats: 2651 lastPreemptionHeartbeat: 2650
2016-07-07 20:56:33,442 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:0, vCores:0> Free: <memory:667648, vCores:1> pendingRequests: 0 delayedContainers: 0 heartbeats: 2701 lastPreemptionHeartbeat: 2700
2016-07-07 20:56:45,981 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:0, vCores:0> Free: <memory:667648, vCores:1> pendingRequests: 0 delayedContainers: 0 heartbeats: 2751 lastPreemptionHeartbeat: 2750
2016-07-07 20:56:58,522 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:0, vCores:0> Free: <memory:667648, vCores:1> pendingRequests: 0 delayedContainers: 0 heartbeats: 2801 lastPreemptionHeartbeat: 2800
2016-07-07 20:57:11,061 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:0, vCores:0> Free: <memory:667648, vCores:1> pendingRequests: 0 delayedContainers: 0 heartbeats: 2851 lastPreemptionHeartbeat: 2850
2016-07-07 20:57:15,823 [INFO] [Timer-1] |app.DAGAppMaster|: Session timed out, lastDAGCompletionTime=1467938794575 ms, sessionTimeoutInterval=600000 ms
2016-07-07 20:57:15,823 [INFO] [Timer-1] |rm.TaskSchedulerEventHandler|: TaskScheduler notified that it should unregister from RM
2016-07-07 20:57:15,823 [INFO] [Timer-1] |app.DAGAppMaster|: No current running DAG, shutting down the AM
2016-07-07 20:57:15,823 [INFO] [Timer-1] |app.DAGAppMaster|: DAGAppMasterShutdownHandler invoked
2016-07-07 20:57:15,824 [INFO] [Timer-1] |app.DAGAppMaster|: Handling DAGAppMaster shutdown

Even after the AM shuts down (10 minutes later), the NiFi processor still shows as running...

  • Running the full query or the script manually via command line works fine.
  • The behavior is not consistent. Sometimes it won't hang, sometimes it will (most of the time it does).
  • ExecuteProcess works fine running the same command.

Any ideas? I couldn't find anything in app.log.

1 ACCEPTED SOLUTION

Expert Contributor

The NiFi team has identified an issue with Hive scripts causing this processor to hang. These Hive commands run MapReduce or Tez jobs that produce a lot of standard output, which is returned to the NiFi processor. If the amount of stdout or stderr returned gets large, the processor can hang. To prevent this, we recommend adding the "-S" option to hive commands, or "--silent=true" to beeline commands, that are executed via the NiFi script processors.
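
Applied to the script from the question, that suggestion would look roughly like this (a sketch only; it reuses the connection string and script path from the original post, and hive's -S flag plays the same role for Hive CLI invocations):

#!/bin/bash
# Same script as in the question, but with beeline silenced so that
# MapReduce/Tez progress output is not streamed back to ExecuteStreamCommand.
endDate=$1
CONNECT="jdbc:hive2://master2:10000/default"
beeline -u "$CONNECT" -n root --silent=true -hivevar endDate="$endDate" -f /pathToScript/Hive_scipt.hql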


4 REPLIES


I'll hazard a guess that your query exceeds the buffer size for data piped to the processor.

As a workaround, try using PutFile to write the full Hive query to a local file, then running "hive -f myQuery.sql" instead.

Expert Contributor

@Randy Gelhausen The ExecuteStreamCommand is running a shell script that already contains the "hive -f query.sql", so I don't think the query is being piped to the processor. But maybe it's the output?

Expert Contributor

Still not sure why this is happening, but ExecuteProcess works 100% of the time, so my workaround right now is to use InvokeHTTP to get the version ID, stop the processor, update it, and run it...
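
For reference, that stop/update/start sequence is just a series of NiFi REST API calls that the InvokeHTTP processors make; a rough sketch with curl, assuming a NiFi 1.x-style API, a hypothetical processor UUID, and no authentication:

#!/bin/bash
# Rough sketch: stop (and later restart) a processor via the NiFi REST API.
# PROC_ID is a hypothetical processor UUID; adjust host, port, and auth as needed.
NIFI="http://localhost:8080/nifi-api"
PROC_ID="016a1b2c-3d4e-5f67-8901-23456789abcd"

# 1. Fetch the processor entity to get the current revision version.
curl -s "$NIFI/processors/$PROC_ID" > proc.json
VERSION=$(python -c "import json; print(json.load(open('proc.json'))['revision']['version'])")

# 2. PUT the processor back with state STOPPED; the revision version must match step 1.
curl -s -X PUT -H 'Content-Type: application/json' \
  -d "{\"revision\":{\"version\":$VERSION},\"component\":{\"id\":\"$PROC_ID\",\"state\":\"STOPPED\"}}" \
  "$NIFI/processors/$PROC_ID"

# 3. Update the processor's properties the same way (fetch the new revision first),
#    then PUT state RUNNING to start it again.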

Expert Contributor

The NiFi team has identified an issue with Hive scripts causing this processor to hang. These Hive commands run MapReduce or Tez jobs that produce a lot of standard output, which is returned to the NiFi processor. If the amount of stdout or stderr returned gets large, the processor can hang. To prevent this, we recommend adding the "-S" option to hive commands, or "--silent=true" to beeline commands, that are executed via the NiFi script processors.