09-18-2018
09:09 PM
@Raj ji Yes, you can use it. PutHBaseJson processor:- 1. Expects individual JSON messages (not an array). 2. You need to extract the values for ServerName and ServerNo from the content using the EvaluateJsonPath processor, then use Row Identifier
${ServerName},${ServerNo} (or) PutHBaseRecord processor: With this record-based processor we don't need to split the array of JSON messages, but we do need to prepare the row_id with an UpdateRecord processor, using the concat(/ServerName,',',/ServerNo) function. Refer to this link for more details on the UpdateRecord processor's concat function usage.
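For illustration, a minimal sketch of the first approach (the property names are the standard EvaluateJsonPath/PutHBaseJson ones; the sample message and its values are assumptions, not your real data):
Example flowfile content:
{"ServerName":"server1","ServerNo":"100"}
EvaluateJsonPath configs (Destination set to flowfile-attribute, one dynamic property per field):
ServerName $.ServerName
ServerNo $.ServerNo
PutHBaseJson configs:
Row Identifier ${ServerName},${ServerNo}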
08-30-2018
11:33 AM
@Raj ji Check out this solution: https://community.hortonworks.com/questions/147226/replacetextprocessor-remove-blank-lines.html If this answer is helpful, please choose ACCEPT to mark the question resolved.
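For quick reference, a minimal ReplaceText configuration for stripping blank lines (a sketch using the processor's standard properties; the exact regex in the linked answer may differ):
Search Value ^\s*\n
Replacement Value (empty string set)
Replacement Strategy Regex Replace
Evaluation Mode Entire text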
06-27-2018
03:56 AM
@Raj ji You can use the ExecuteProcess (doesn't allow any incoming connections) (or) ExecuteStreamCommand processors to trigger the shell script. ExecuteProcess configs: As your executable script is on Machine 4 and NiFi is installed on Machine 1, create a shell script on Machine 1 which ssh's into Machine 4 and triggers your Python script (a sketch follows below). Refer to this and this; those links describe how to use a username/password while doing ssh to a remote machine. As you are going to store the logs in a file, you can use the TailFile processor to tail the log file, check whether there is any ERROR/WARN by using the RouteText processor, and then trigger a mail. (or) Fetch the application id (or) application name of the process and then use the YARN REST API to get the status of the job. Please refer to how to monitor yarn applications using NiFi and Starting Spark jobs directly via YARN REST API; this link describes the YARN REST API's capabilities.
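A minimal sketch of that wrapper script plus a status check (the hostname, user, paths, and ResourceManager address are all placeholders; it assumes key-based passwordless ssh is already set up):
#!/bin/bash
# Runs on Machine 1; NiFi's ExecuteProcess triggers this script.
# nifi_user, machine4 and the remote script path are hypothetical placeholders.
ssh nifi_user@machine4 'python /path/to/remote_script.py' >> /var/log/remote_job.log 2>&1

# Later, list running applications through the YARN ResourceManager REST API
# (replace rm-host with your ResourceManager hostname):
curl 'http://rm-host:8088/ws/v1/cluster/apps?states=RUNNING'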
06-09-2018
03:24 AM
Fantastic and detailed reply. I will try this out and reply if it works. Thanks a lot @Shu
05-15-2018
08:09 PM
The error shows there are missing blocks:
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-267577882-40.133.26.59-1515787116650:blk_1076168453_2430591 file=/user/backupdev/machineID=XEUS/delta_21551841_21551940/bucket_00003
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:995)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:638)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:888)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:945)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.tools.util.ThrottledInputStream.read(ThrottledInputStream.java:77)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:285)
... 16 more
Check the Namenode UI to see whether you have missing blocks.
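You can also check from the command line; a quick sketch with the standard hdfs fsck tool (the file path is taken from the stack trace above):
# List corrupt/missing blocks across the whole filesystem
hdfs fsck / -list-corruptfileblocks
# Inspect the specific file from the error, including block locations
hdfs fsck /user/backupdev/machineID=XEUS/delta_21551841_21551940/bucket_00003 -files -blocks -locations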
05-07-2018
08:29 AM
@Raj ji Yes, symlinks are preferred to tweaking the .py code!!! How I quickly analyzed it: the first pointer was "parent directory /usr/hdp/2.6.3.0/hive/conf doesn't exist", which from experience is a symlink that points to the configuration files. So without that symlink giving access to the conf files, you can't start the WebHCat server.
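For reference, a quick way to check and, if needed, recreate such a symlink (the target path is an assumption based on the usual HDP layout; compare against a healthy node before creating it):
# See whether the conf symlink exists and where it points
ls -l /usr/hdp/2.6.3.0/hive/conf
# Recreate it if missing (target path assumed; verify on a working node)
ln -s /etc/hive/2.6.3.0/0 /usr/hdp/2.6.3.0/hive/conf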
04-25-2018
01:18 AM
@Raj ji
You can use the ExecuteProcess (or) ExecuteStreamCommand processors to pass arguments to the shell script.
ExecuteProcess processor:- This processor doesn't need any upstream connections to trigger the script, i.e. it can run on its own based on the scheduler.
Example:- I have a sample script which takes 2 command line arguments and echoes them:
bash$ cat sample_script.sh
#!/bin/bash
echo "First arg: $1"
echo "Second arg: $2"
Execution in terminal:-
bash$ ./sample_script.sh hello world
First arg: hello
Second arg: world
1. Execution in NiFi using the ExecuteProcess processor:-
Command bash
Command Arguments /tmp/sample_script.sh hello world //here we are triggering the shell script and passing the arguments separated by spaces
Batch Duration No value set
Redirect Error Stream false
Argument Delimiter space //if the Argument Delimiter were ; instead, the command arguments would need to be /tmp/sample_script.sh;hello;world
Configs:- The success relationship from ExecuteProcess will output the below as the content of the flowfile:
First arg: hello
Second arg: world
2. Execution in NiFi using the ExecuteStreamCommand processor:-
This processor needs an upstream connection to trigger the script.
Flow:- We have used the GenerateFlowFile processor as a trigger for the ExecuteStreamCommand script.
GenerateFlowFile configs:- Added two attributes, arg1 and arg2, to the flowfile.
ExecuteStreamCommand processor:-
Command Arguments ${arg1};${arg2}
Command Path /tmp/sample_script.sh
Argument Delimiter ;
Now we are using the attributes added in the GenerateFlowFile processor and passing them to the script. Use the output stream relationship from the ExecuteStreamCommand processor, and the output flowfile content will be the same:
First arg: hello
Second arg: world
By using these processors you can trigger the shell script and pass the arguments as well. - If the answer helped to resolve your issue, click on the Accept button below to accept the answer. That would be a great help to community users looking for a quick solution to these kinds of issues.
10-30-2017
07:45 PM
I'm running a Hive query and it creates a MapReduce job. The table is a partitioned, ORC-formatted table. I'm not trying to insert values into the table; I need to filter the not-null values from it. When I tried to do that I got the above error. I still couldn't figure out why.