Member since: 07-07-2016
Posts: 53
Kudos Received: 6
Solutions: 1

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 892 | 06-27-2016 10:00 PM |
01-11-2018
08:37 PM
Agreed. Thanks for the suggestion. For now I have a workaround: changing the run schedule from 0 seconds to 1 second makes the lease holder exception go away. There is a little more latency in writing to HDFS than with 0 seconds, but the error is gone. I will work on your suggestion for production. Thanks for the help! Srikaran
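For reference, a sketch of the workaround settings described above (PutHDFS Scheduling tab; the only change from the earlier configuration is the Run Schedule value):

    Scheduling Strategy = Timer driven
    Concurrent Tasks    = 1
    Run Schedule        = 1 sec   (was 0 sec)
    Yield Duration      = 1 sec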
01-11-2018
06:32 PM
@Bryan Bende I like the MergeContent option you suggested, but please clarify one thing. In production, surveys arrive in real time; as soon as a customer writes a survey, we want to see it in HDFS. So my use case is: during each 24-hour period (one day) I want to see only one file in HDFS, and as soon as surveys are posted I should see them in that file. If I use the MergeContent processor, will that still be considered real-time? I am guessing it waits until the data reaches a certain threshold, at which point the merge happens and the result is written to HDFS? During a day there will be stretches with no surveys at all, times when a bunch of surveys arrive at once, and times with one survey per second. Thanks, Srikaran.
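A sketch of a MergeContent configuration that bounds the latency being asked about here; the property names are MergeContent's own, but the values are illustrative assumptions only:

    Merge Strategy            = Bin-Packing Algorithm
    Minimum Number of Entries = 1
    Maximum Number of Entries = 1000
    Max Bin Age               = 30 sec

Max Bin Age forces a bin to merge once it reaches the given age even if the entry threshold has not been met, so quiet periods do not delay delivery indefinitely and bursts still get batched.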
01-11-2018
05:46 PM
@Bryan Bende Hi Bryan. We are testing this in DEV, which has only one NiFi node; the PutHDFS target cluster has 4 datanodes. In prod we will have 2 NiFi nodes and 5 datanodes. Thanks
01-11-2018
05:29 PM
Hi. We have a NiFi flow that sources social media surveys from an API and writes them to HDFS via the PutHDFS processor, with the conflict resolution strategy set to "append". The flow works if surveys arrive one by one with a delay of a second or two. When we test with some 20,000 surveys arriving all at once, PutHDFS fails. The error is given below:

WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.append: failed to create file XXXXXXXXXXXX for DFSClient_NONMAPREDUCE_XXXXXXXXX because current leaseholder is trying to recreate file.
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:user@XXXXXXXXX (auth:KERBEROS) cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file XXXXXXXXXXX for DFSClient_NONMAPREDUCE_XXXXXXXX for client XXXXXXXX because current leaseholder is trying to recreate file.
INFO org.apache.hadoop.ipc.Server: IPC Server handler 14 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.append from XXXXXXXX Call#XXXXX Retry#0: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file XXXXXXXXX for DFSClient_NONMAPREDUCE_XXXXXXXX because current leaseholder is trying to recreate file.

With this exception, all the records are getting blocked in the NiFi queue feeding PutHDFS and eventually nothing is written to HDFS. Is there a way to configure the NiFi PutHDFS processor to accommodate this use case? Right now it is configured with scheduling strategy "Timer Driven", Concurrent Tasks "1", Run Schedule 0 seconds, and Yield Duration 1 second. Please suggest. Thanks, Srikaran
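A minimal standalone sketch (not NiFi's own code) of the HDFS single-writer lease this error comes from: a second append() to the same path, issued before the first output stream is closed, is rejected by the NameNode with AlreadyBeingCreatedException because the same client already holds the lease. The path is a hypothetical, pre-existing file.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendLeaseDemo {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/tmp/surveys.json"); // hypothetical; assumed to exist

            FSDataOutputStream first = fs.append(file); // acquires the file's lease
            first.writeBytes("survey-1\n");
            try {
                // Second append while the first stream is still open:
                // the NameNode refuses because the lease is already held.
                fs.append(file).close();
            } catch (IOException e) {
                System.err.println("append rejected while lease held: " + e.getMessage());
            } finally {
                first.close(); // closing the stream releases the lease; appends succeed again
            }
        }
    }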
Labels:
Apache NiFi
01-04-2018
06:37 PM
@Karl Fredrickson Hi Karl, same issue after a stop and restart. I tried both 1 hour and 4 hours for the Kerberos relogin period, since I use the same relogin period for FetchHDFS/ListHDFS. This happens only for GetHDFS. I am assuming the GetHDFS processor is trying to delete, move, or write, which might need some additional permissions. The HDFS files are owned by hive:hive with 771 permissions, and with those same permissions FetchHDFS and ListHDFS work. Thanks
01-04-2018
05:38 PM
Hi. I am using the FetchHDFS NiFi processor, which runs fine for fetching an exact HDFS file. I want to get all the HDFS files under a directory, so I am using GetHDFS with the "Keep Source File" option set to "true". But I am getting a Kerberos error:

ERROR [Timer-Driven Process Thread-1] o.apache.nifi.processors.hadoop.GetHDFS GetHDFS[id=XXXXXXXXXX] Error retrieving file hdfs://XXXXXXXXXXXXXXXXXXXX.0. from HDFS due to java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt): {}
java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:332)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:205)
Caused by: org.ietf.jgss.GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator$1.run(KerberosAuthenticator.java:311)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator$1.run(KerberosAuthenticator.java:287)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:287)

I am wondering why the same Kerberos credentials work for FetchHDFS/ListHDFS but not GetHDFS. Does GetHDFS need additional setup? Please suggest. Thanks, Srikaran
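A minimal sketch of the keytab login these processors ultimately perform, shown only to illustrate that GetHDFS needs the same valid TGT as FetchHDFS/ListHDFS; the principal and keytab path are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosHdfsLogin {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);
            // placeholder principal and keytab path
            UserGroupInformation.loginUserFromKeytab(
                    "user@EXAMPLE.COM", "/etc/security/keytabs/user.keytab");
            FileSystem fs = FileSystem.get(conf);
            System.out.println("Logged in; home dir = " + fs.getHomeDirectory());
        }
    }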
Labels:
Apache NiFi
12-06-2017
06:26 PM
1 Kudo
@Timothy Spann Thanks a lot; these are very helpful. Let me test the flow and I will update accordingly. Thanks
12-06-2017
06:24 PM
1 Kudo
@anarasimham Looks like GetHDFS will remove the source HDFS file, so I am planning to use FetchHDFS and then the InvokeHTTP processor. For now I am converting the Avro file to JSON on the Hadoop side, fetching the JSON, and posting it. I will test Avro and other formats directly and will update. Thanks!
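A sketch of the flow described above, using standard NiFi processors; the Remote URL and credentials are placeholders:

    ListHDFS -> FetchHDFS -> InvokeHTTP

with InvokeHTTP configured roughly as:

    HTTP Method                   = POST
    Remote URL                    = https://... (the survey endpoint)
    Basic Authentication Username = <username>
    Basic Authentication Password = <password>
    Content-Type                  = application/json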
12-04-2017
07:17 PM
1 Kudo
Hello. I have an HDFS file whose data needs to be posted to an outside URL (https). I have the username and password for the URL, and I can post a sample JSON via Postman from my browser using them. Now I have to build this flow in NiFi. Please let me know exactly which NiFi processors I should use to get the data from HDFS and post it to the URL via another NiFi processor. Also, kindly let me know what format the HDFS data should be in for this kind of use case. Thanks, Srikaran
Labels:
Apache NiFi
09-14-2016
03:06 PM
@Predrag Minovic Great options. It looks like, of all the options above, the second ZK quorum would have to be installed manually outside Ambari, with Kafka configured accordingly? If that's the case, when I do upgrades on this cluster in the future I would have to handle the second, manually installed ZK quorum as a separate effort, right? I also like the two-cluster solution, but what if some business logic on cluster 1 depends on Kafka on cluster 2? In that case I guess the two-cluster solution will not work, right? Please confirm! Thanks, Sri.
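For the manual-quorum option, a sketch of what "configure Kafka accordingly" could look like, assuming a hand-installed second quorum on hypothetical hosts zk2-1/zk2-2/zk2-3 (in server.properties on each Kafka broker):

    zookeeper.connect=zk2-1:2181,zk2-2:2181,zk2-3:2181

Since that quorum would live outside Ambari, its upgrades and monitoring would indeed be a separate, manual effort.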