Created 03-23-2017 11:35 AM
I have a 3-node HDF cluster (in the AWS cloud) where NiFi runs across the cluster. I want to use a NiFi processor to trigger a shell/Python script on a remote on-premise machine to perform certain actions written in that script.
Created 03-23-2017 12:39 PM
NiFi offers a variety of processors that support executing commands or scripts local to the NiFi installation. To execute a script on a machine remote to NiFi, you would still need a local script, called by the processor, that then goes and performs that remote task.
Thank you,
Matt
Created 03-23-2017 01:44 PM
Try using the "ExecuteProcess" processor.
| Property | Value |
| --- | --- |
| Command | ssh |
| Command Arguments | `-i "<path to private key>" <user>@<remotehost> python <script> &` |
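Rendered as the equivalent terminal command (the bracketed values are placeholders from the configuration above, not real values):

```
# What ExecuteProcess invokes; the trailing '&' backgrounds the remote
# python process, so ssh returns without waiting for the script.
ssh -i "<path to private key>" <user>@<remotehost> python <script> &
```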
Thanks,
Matt
Created 03-30-2017 09:36 AM
Hey @Matt Clarke, is there a way we can connect to the remote machine with a password instead of a key? I was able to ssh (with a password) from the command line. Can I do it from the "ExecuteProcess" processor? Thanks.
Created 03-30-2017 12:07 PM
You could install sshpass, which would allow you to use a password in the ssh connection, but I strongly recommend against this approach. It requires you to put your password in plaintext in your NiFi processor, which exposes it to anyone who has access to view that component.
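For completeness, the insecure pattern being warned against looks like this (sshpass and the bracketed placeholders are illustrative only, not a recommendation):

```
# INSECURE sketch: the password sits in cleartext in the processor
# configuration and in the process list on the NiFi host.
sshpass -p '<password>' ssh <user>@<remotehost> python <script>
```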
Thanks,
Matt
Created 03-23-2017 01:12 PM
@Matt I have tried exploring the ExecuteScript and ExecuteProcess processors but was not able to ssh to the remote machine. Could you provide more detail on this, or help me with the processor and its configuration?
Thanks
Created 03-23-2017 03:53 PM
Can you ssh to the remote machine from the command line?
Created 03-23-2017 05:15 PM
In addition to the above answers, I've had success using the ExecuteStreamCommand processor to stream a script to a remote host and pull the results back into the dataflow. So you can run it using parameters from the flow, without needing to deploy the script before calling it. In this example, we kinit and run a Hive query against a remote Kerberized cluster node.
| Property | Value |
| --- | --- |
| Processor Name | ReplaceText |
| Search Value | `(?s:^.*$)` |
| Replacement Value | klist -s \|\| kinit user@DOMAIN.COM -k -t ~/user.keytab <br> hive -e "Some Hive Query" |
| Replacement Strategy | Always Replace |
Route into:
| Property | Value |
| --- | --- |
| Processor Name | ExecuteStreamCommand |
| Command Arguments | `-o StrictHostKeyChecking=no user@hostname.fqdn` |
| Command Path | ssh |
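The ReplaceText/ExecuteStreamCommand pair above is equivalent to piping a script into `ssh host 'bash -s'`: the FlowFile content becomes the script, and whatever it prints to stdout comes back as the new FlowFile content. A minimal local sketch of that streaming pattern, with a local `bash -s` standing in for the remote ssh session:

```shell
# The FlowFile content (a shell script) is piped to the command's
# stdin; with ssh this would be:
#   ... | ssh -o StrictHostKeyChecking=no user@hostname.fqdn
# Here 'bash -s' runs the streamed script locally instead.
printf 'echo "result from remote"\n' | bash -s
```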
Created 03-24-2017 08:03 AM
Hey @Dan Chaffelson, thanks for sharing. Can you help me with a processor (and its configuration) that can trigger a Python script on a remote machine (over ssh) and pull the result back to the HDF cluster? I have used the ExecuteProcess processor with the configuration below to trigger the Python script on the remote machine (as suggested by @Matt Clarke), and it ran successfully.
| Property | Value |
| --- | --- |
| Command | ssh |
| Command Arguments | `-i "<path to private key>" <user>@<remotehost> python <script> &` |
Now I want to pull the output of the Python command (which is at /home/abc with 777 permissions) back to the HDF cluster. Is there a processor to do that?
Created 03-24-2017 08:15 AM
If you use the setup above, put the shell script to be executed on the remote machine in the 'Replacement Value' of the ReplaceText processor. It will bring back everything printed to stdout. What is your Python script doing? There may be a better solution we could suggest if we knew the outcome you are trying to achieve.
Created 03-24-2017 08:51 AM
@Dan Chaffelson The Python script (actually a large Python program, ~500 lines) pulls data from an MS SQL Server and dumps it in CSV format to a specified location on that machine.
The whole use case is extracting tabular data from various SQL sources and putting it into HDFS (HDP), using NiFi to orchestrate the process. Hence we want to instantiate and schedule the whole flow with NiFi only.
Created 03-24-2017 09:14 AM
If you want to extract the data back into the dataflow with HDF, this is not going to be the best way to do it, though the steps I outlined above would probably work. With the requirement you describe, I would instead suggest implementing the data movement and conversion sections of your Python script using a combination of the SQL and text processors, running only the remaining bits of logic in Python via the ExecuteScript processor, and then using HDF again to write the result to the right location.
Remotely executing a script and pulling back its output is a more brittle process that doesn't leverage the flexibility and power of NiFi; I suspect you would spend more time troubleshooting workarounds than you would using the default processors, which achieve the same things.
Created 03-30-2017 09:39 AM
Hey @Dan Chaffelson
Is there a way we can connect to the remote machine with a password instead of a key? I was able to ssh (with a password) from the command line. Can I do it from the "ExecuteProcess" processor, or any other processor? Thanks.
Created 03-30-2017 10:42 AM
@Praveen Singh These processors just replicate, more or less, whatever you could do by piping together bash commands. So if you want to use an unsecured cleartext password in the processor, you could try sshpass or other shell tools to enable it. Basically, if you can solve the problem by piping together bash commands, you should be able to do it in the processor. But I wouldn't recommend using cleartext passwords; keys are far more secure.
Created 03-24-2017 07:25 AM
Thanks @Matt Clarke, it worked fine; I was able to execute the Python script.
Is there a way to get back the output data generated by the Python code, which is currently saved to a directory on the remote machine?
Created 03-24-2017 12:33 PM
Standard out from your script is written to the content of the FlowFile generated by the ExecuteProcess NiFi processor. So perhaps all you need to do is tweak your script to write to standard out rather than to a file on disk.
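As an illustration (a hypothetical miniature of the larger extraction script), writing CSV rows to stdout instead of a file on the remote disk lets ExecuteProcess capture them directly as FlowFile content:

```shell
# Hypothetical stand-in for the extraction script: print the CSV to
# stdout instead of saving it on the remote machine. ExecuteProcess
# (via ssh) then captures stdout as the FlowFile content, simulated
# here by redirecting to a file.
cat > /tmp/extract_demo.py <<'EOF'
import csv, sys

# Write rows to stdout rather than a file on disk
writer = csv.writer(sys.stdout, lineterminator="\n")
writer.writerow(["id", "name"])
writer.writerow([1, "alice"])
EOF
python3 /tmp/extract_demo.py > /tmp/flowfile_content.csv
cat /tmp/flowfile_content.csv
```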
Created 09-26-2018 09:50 PM
@Matt Clarke I am also trying to do a POC using ExecuteProcess to start a .sh file on a remote server. Right now I am just trying to move a file on the remote server from one location to another, but I am getting an error (Host Key Verification Failed) on the NiFi processor. I am able to do the same via a terminal on the host machine (on which NiFi is installed and running). What could be the issue here? Help!
ssh -t user@hostname 'mv ~/folder1/test.txt ~/folder2/' <-- I can run this successfully from the terminal.
ExecuteProcess Properties:
| Property | Value |
| --- | --- |
| Command | ssh |
| Command Arguments | `-i "~/.ssh" user@hostname 'mv ~/folder1/test.txt ~/folder2/'` |
| Batch Duration | No value set |
| Redirect Error Stream | false |
| Working Directory | No value set |
| Argument Delimiter | No value set |