Executing a Shell/Python script on a remote machine from NiFi in HDF

I have a 3-node HDF cluster (in the AWS cloud) with NiFi running across the cluster. I want to use a NiFi processor to trigger a shell/Python script on a remote, on-premise machine so that it performs the actions written in that script.

1 ACCEPTED SOLUTION

Master Guru

@Praveen Singh

NiFi offers a variety of processors that execute commands or scripts local to the NiFi installation. To run a script on a machine remote to NiFi, you would still need a local script, called by the processor, that then connects to the remote machine and performs the task.
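A minimal sketch of such a local wrapper, assuming Python 3 on the NiFi host and key-based ssh already working for the user that runs NiFi. The function names are hypothetical, not part of NiFi:

```python
import subprocess
import sys


def build_ssh_command(user, host, remote_command, key_path):
    """Return the argv list that runs remote_command on host over ssh."""
    return ["ssh", "-i", key_path, f"{user}@{host}", remote_command]


def run_remote(argv, timeout=300):
    """Run argv and return (exit_code, stdout, stderr) as text."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return result.returncode, result.stdout, result.stderr


def main(user, host, remote_command, key_path):
    """Run the remote command and relay its output and exit code."""
    code, out, err = run_remote(build_ssh_command(user, host, remote_command, key_path))
    sys.stdout.write(out)  # stdout is what the calling processor can capture
    sys.stderr.write(err)
    return code
```

A script like this would be called from the processor with the user, host, remote command, and key path supplied as arguments; all of those values are placeholders you would fill in yourself.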

Thank you,

Matt

16 REPLIES

Master Guru

@Praveen Singh

Try using the "ExecuteProcess" processor.

Property            Value
Command             ssh
Command Arguments   -i "<path to private key>" <user>@<remotehost> python <script> &

Thanks,

Matt

Hey @Matt Clarke, is there a way to connect to the remote machine with a password instead of a key? I was able to ssh (with a password) from the backend. Can I do the same from the "ExecuteProcess" processor? Thanks.

Master Guru

@Praveen Singh

You could install sshpass, which would allow you to use a password in the ssh connection, but I strongly recommend against this approach. It requires you to set your password in plaintext in your NiFi processor, which exposes it to anyone who has access to view that component.
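If you stay with keys, a rough sketch of a key-only ssh invocation is shown below. It assumes OpenSSH on the NiFi host; `BatchMode` and `PasswordAuthentication` are standard OpenSSH client options, while the function name is hypothetical. With `BatchMode=yes`, ssh fails immediately instead of waiting on a password prompt that a processor has no way to answer:

```python
def key_only_ssh_argv(user, host, key_path, remote_command):
    """Build an ssh argv that authenticates by key and never prompts."""
    return [
        "ssh",
        "-i", key_path,
        "-o", "BatchMode=yes",               # fail instead of prompting
        "-o", "PasswordAuthentication=no",   # do not attempt password login
        f"{user}@{host}",
        remote_command,
    ]
```

The first element would go into the processor's Command property and the rest into Command Arguments.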

Thanks,

Matt

@Matt I have tried the ExecuteScript and ExecuteProcess processors but was not able to ssh to the remote machine. Could you provide more detail, or help me with the processor and its configuration?

Thanks

Super Guru

Can you ssh to the remote machine from the command line?

In addition to the answers above, I've had success using the ExecuteStreamCommand processor to stream a script to a remote host and pull the results back into the dataflow. That way you can run it using parameters from the flow, without needing to deploy the script before calling it. In this example, we kinit and run a Hive query against a remote kerberized cluster node.

Processor Name         ReplaceText
Search Value           (?s:^.*$)
Replacement Value      klist -s || kinit user@DOMAIN.COM -k -t ~/user.keytab
                       hive -e "Some Hive Query"
Replacement Strategy   Always Replace

Route into:

Processor Name         ExecuteStreamCommand
Command Arguments      -o StrictHostKeyChecking=no user@hostname.fqdn
Command Path           ssh
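Outside NiFi, the same idea (feed the script text to a command's stdin and read back whatever it prints) can be sketched in Python 3 roughly as follows. This only illustrates the mechanism, it is not how NiFi implements it, and the function name is hypothetical:

```python
import subprocess
import sys


def stream_script(script_text, command):
    """Send script_text to command's stdin and return (exit_code, stdout)."""
    result = subprocess.run(command, input=script_text, capture_output=True, text=True)
    return result.returncode, result.stdout


def demo():
    """Local stand-in for ssh: the Python interpreter reads the program from stdin."""
    return stream_script("print(2 + 3)\n", [sys.executable, "-"])
```

For the flow above, `command` would be `["ssh", "-o", "StrictHostKeyChecking=no", "user@hostname.fqdn"]` and `script_text` would be the Replacement Value.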

Hey @Dan Chaffelson, thanks for sharing. Can you help me with a processor (with configuration) that can trigger a Python script on a remote machine over ssh and pull the result back to the HDF cluster? I used the ExecuteProcess processor with the configuration below (as suggested by @Matt Clarke) and was able to run the Python script on the remote machine.

Property            Value
Command             ssh
Command Arguments   -i "<path to private key>" <user>@<remotehost> python <script> &

Now I want to pull the output of the Python script (which is at /home/abc with 777 permissions) back to the HDF cluster. Is there a processor to do that?

If you use the setup above, put the shell script to be executed on the remote machine in the 'Replacement Value' of the ReplaceText processor. It will bring back everything printed to stdout. What is your Python script doing? There may be a better solution we could suggest if we knew the outcome you are trying to achieve.

@Dan Chaffelson The Python script (roughly 500 lines of code) pulls data from an MS SQL Server and dumps it in CSV format to a specified location on that machine.

Actually, the whole use case is extracting tabular data from various SQL sources and putting it into HDFS (HDP). We are using NiFi to orchestrate the process, so we want to instantiate and schedule the whole flow with NiFi only.

If you want to extract the data with HDF back into the dataflow, this is not going to be the best way to do it, but the steps I outlined above would probably work. With the requirement you describe, I would instead suggest you implement the data movement and conversion sections of your Python script using a combination of SQL and text processors, then run only the remaining bits of Python logic using the ExecuteScript processor, then use HDF again to write the result to the right location.

Remotely executing a script and pulling back the output is a more brittle process and doesn't leverage the flexibility and power of NiFi. I suspect you would spend more time troubleshooting workarounds than you would using the default processors that achieve the same things.

Hey @Dan Chaffelson

Is there a way to connect to the remote machine with a password instead of a key? I was able to ssh (with a password) from the backend. Can I do the same from the "ExecuteProcess" processor or any other processor? Thanks.

@Praveen Singh These processors just replicate whatever you could do with piping together bash commands, more or less - so if you want to use an unsecured cleartext password in the processor you could try sshpass or other shell plugins to enable it. Basically if you can solve the problem by piping together bash commands, you should be able to do it in the processor. But I wouldn't recommend using cleartext passwords, keys are far more secure.

Thanks @Matt Clarke, it worked fine. I was able to execute the Python script.

Is there a way to get back the output data generated by the Python code, which is being saved to a directory on the remote machine?

Master Guru

@Praveen Singh

Standard out from your script is written to the content of the FlowFile generated by the ExecuteProcess NiFi processor. So perhaps just tweaking your script to write to standard out rather than to a file on disk is all you need to do.
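As a rough illustration of that tweak, assuming the script already has its query results as rows in memory (the function names and sample columns are made up), the CSV can be written to standard out instead of to a file:

```python
import csv
import io
import sys


def rows_to_csv(header, rows):
    """Render a header and an iterable of rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def emit(header, rows, out=None):
    """Write the CSV to out, defaulting to standard out."""
    target = sys.stdout if out is None else out
    target.write(rows_to_csv(header, rows))
```

Calling `emit(["id", "name"], rows)` at the end of the script, in place of the file write, sends the CSV to standard out. For very large result sets, writing row by row to `sys.stdout` would avoid building the whole text in memory.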

@Matt Clarke I am also trying to do a POC using ExecuteProcess to start a .sh file on a remote server. Right now I am just trying to move a file on the remote server from one location to another, but I am getting an error (Host Key Verification Failed) on the NiFi processor. I am able to do the same via a terminal on the host machine (on which NiFi is installed and running). What could be the issue here? Help!

ssh -t user@hostname 'mv ~/folder1/test.txt ~/folder2/'  <-- I am able to do this successfully on terminal. 

ExecuteProcess Properties :

Command: ssh 
Command Arguments: -i "~/.ssh" user@hostname 'mv ~/folder1/test.txt ~/folder2/' 
Batch Duration : No value set 
Redirect Error Stream : false 
Working Directory : No value set 
Argument Delimiter : No value set