Member since: 11-30-2017
Posts: 44
Kudos Received: 6
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 737 | 04-16-2018 07:49 PM
 | 971 | 01-05-2018 02:31 PM
10-08-2018
03:12 PM
Thanks @Aditya Sirna, I think this will get me what I need. What I'm ultimately trying to get is the HDFS location so I can use it in the script I'm writing. Is performing a describe table and then grepping the output the best way to do this?
10-05-2018
03:27 PM
As part of a script I'm writing, I want to get the HDFS location of a list of Hive schemas that's passed in via a text file. The best way I can think to do this is to run a beeline command in a loop that performs a `describe schema` command and extracts the HDFS location of each schema from the output. However, this requires authenticating every time the command runs, which is inefficient. Is there a better way to programmatically get the HDFS locations of a list of Hive schemas?
Labels:
- Apache Hive
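One way to cut the repeated authentication is to batch every `DESCRIBE DATABASE` statement into a single beeline session. A minimal sketch, assuming a `schemas.txt` input file and a `$JDBC_URL` placeholder (both hypothetical names); the sample output line used to demonstrate the extraction is likewise made up:

```shell
# Stand-in for the input file of schema names, one per line.
printf 'db1\ndb2\n' > schemas.txt

# Build one SQL file containing a DESCRIBE DATABASE per schema.
while read -r schema; do
  printf 'DESCRIBE DATABASE %s;\n' "$schema"
done < schemas.txt > describe_all.sql

# Real call (authenticates once for all schemas; JDBC URL is a placeholder):
#   beeline -u "$JDBC_URL" --silent=true --outputformat=tsv2 -f describe_all.sql

# Each DESCRIBE row carries the location; pull it out of the output:
printf 'db1\t\thdfs://nn:8020/warehouse/db1.db\thive\n' \
  | grep -o 'hdfs://[^[:space:]]*'
```

This runs beeline (and therefore Kerberos authentication) once regardless of how many schemas are listed.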
09-17-2018
12:00 PM
Thanks for the explanation @Shu. Do you know if there's any accepted way or best practice for monitoring the cluster status, or is this going to be the best way?
09-14-2018
08:14 PM
I am wanting to know the best way to monitor the status of a NiFi cluster. Currently I get Ambari alerts if the actual NiFi service stops, but I do not receive any alert if a NiFi node disconnects from the NiFi cluster. I was thinking of having something run within NiFi to get the total number of nodes connected to the cluster and use this to send alerts. Is there a better or generally accepted way of doing this?
Labels:
- Apache NiFi
09-04-2018
03:36 PM
@Matt Clarke I actually have it set to `USE_DN`, but have `nifi.security.user.login.identity.provider` set to `kerberos-provider`. The LDAP provider is commented out in that file.
09-04-2018
03:13 PM
1 Kudo
Depending on the user's configuration on their computer, when trying to log in to NiFi they come in as either <username> or <username>@<realm>. If the user comes in as <username>@<realm>, for NiFi I have to add them to Ranger under that form instead of their normal username, and then grant permissions on <username>@<realm>. This makes permissions difficult to manage. In NiFi's configuration, I have `nifi.security.identity.mapping.pattern.kerb` set to `EMAILADDRESS=(.*?), CN=([^@]+)` and `nifi.security.identity.mapping.value.kerb` set to `$2`. I thought this should match up to the `@` sign so only the username would be passed to NiFi for authentication, but this doesn't seem to be the case. What's going on here?
Labels:
- Apache NiFi
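One likely explanation: that `.kerb` pattern is a DN pattern, so a bare Kerberos principal like `alice@EXAMPLE.COM` never matches it and passes through unchanged. NiFi supports additional mapping property pairs with their own suffixes, so a second mapping could strip the realm. A sketch, with the `kerb2` suffix name being an arbitrary choice of mine:

```properties
# Hypothetical addition to nifi.properties: strip the realm from bare
# Kerberos principals (user@REALM) so only the username is used for
# authorization lookups in Ranger.
nifi.security.identity.mapping.pattern.kerb2=^(.*?)@(.*?)$
nifi.security.identity.mapping.value.kerb2=$1
```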
08-29-2018
02:19 PM
1 Kudo
That worked, thanks.
08-29-2018
01:17 PM
I didn't think of querying the JMX, this works fantastically, thanks! Actually @Jonathan Sneep I am not able to curl to the JMX as I need authorization. This cluster is kerberized and using SSL. I tried searching around on how to pass credentials but couldn't find anything. Any ideas?
08-29-2018
12:49 PM
1 Kudo
I am currently deleting a large number of small files and need to monitor the number of blocks pending deletion so I do not put too much load on the cluster. I know this number can be viewed in the NameNode UI, but I would like to run a command instead, which I can then use in combination with `watch` so I can get continuous updates. Does such a command exist?
Labels:
- Apache Hadoop
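A sketch of the JMX approach mentioned in the thread: the NameNode JMX servlet exposes FSNamesystem metrics, including `PendingDeletionBlocks`. The host and port below are placeholders, and `$jmx` stands in for the real response (which is pretty-printed with spaces around the colon, hence the permissive pattern) so the extraction step can actually run:

```shell
# Real call, wrappable in watch for continuous updates:
#   watch -n 5 "curl -s 'http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' \
#     | grep -o '\"PendingDeletionBlocks\"[^0-9]*[0-9]*'"

# Stand-in for the JMX response so the pipeline below is runnable:
jmx='{"beans":[{"name":"Hadoop:service=NameNode,name=FSNamesystem","PendingDeletionBlocks":1234}]}'

# Extract just the counter value.
printf '%s\n' "$jmx" \
  | grep -o '"PendingDeletionBlocks"[^0-9]*[0-9]*' \
  | grep -o '[0-9]*$'
```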
08-28-2018
06:42 PM
@snukavarapu Thanks for the article, this worked great for me. Is this something you keep continuously updated? If so, what's your strategy for keeping the table updated?
08-23-2018
05:44 PM
This worked, thanks!
08-21-2018
07:59 PM
I am wanting to delete a process group from NiFi using a curl command against the API. I tried the following command:
`curl -k -X DELETE -H "Authorization: Bearer $token" $cluster_url/nifi-api/process-groups/$process_group_uuid`
But I get an error that the revision must be specified. When I delete a process group using the UI, I can see in Chrome dev tools that both a revision and a client ID are specified. How can I get these values so I can use curl to delete the process group?
Labels:
- Apache NiFi
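A sketch of one common pattern: GET the process group first, read the current revision version out of the response, then pass it to the DELETE as a query parameter. The GET and DELETE are shown as comments because `$token`, `$cluster_url` and `$process_group_uuid` are placeholders; `$entity` stands in for the GET response, and `python3` is assumed available for the JSON parsing:

```shell
# Real fetch of the group entity:
#   curl -sk -H "Authorization: Bearer $token" \
#     "$cluster_url/nifi-api/process-groups/$process_group_uuid"

# Stand-in for that response:
entity='{"revision":{"version":3},"id":"abc-123"}'

# Pull out the current revision version.
version=$(printf '%s' "$entity" \
  | python3 -c 'import sys, json; print(json.load(sys.stdin)["revision"]["version"])')
echo "$version"

# Real delete, passing the fetched version (clientId can be any string
# identifying your script):
#   curl -sk -X DELETE -H "Authorization: Bearer $token" \
#     "$cluster_url/nifi-api/process-groups/$process_group_uuid?version=$version&clientId=my-script"
```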
07-31-2018
01:02 PM
@Pierre Villard This works for me if I leave out the `clientID` part as NiFi will generate its own, but I have a use case where I would like to provide my own UUID. Is this possible? I tried using `uuidgen` to get a UUID in bash, but the POST then fails.
07-31-2018
12:40 PM
I am writing a script that will dynamically create a process group with a given name based on user input. I then am wanting to deploy a template inside of this process group immediately after it gets created. Since NiFi generates the UUID for the new process group after the POST to the API, I am stumped on how to get the UUID of the newly created process group. How can I accomplish this so I can then deploy a template inside of it?
I tried generating a UUID myself for the process group in bash by using `uuidgen` so I could easily track this in a variable, however the POST fails for some reason when I send this along with the name.
Here is the curl that fails when I provide a UUID I generated:
`curl -k -X POST -v -H "Authorization: Bearer $TOKEN" -H 'Content-Type: application/json' -d '{"revision":{"clientID":"'$uuid'","version":0},"component":{"name":"'$CLUSTER_NAME' '$FLOW_NAME'"}}' https://myhost.com:9091/nifi-api/process-groups/$EXISTING_PROCESS_GROUP/process-groups/`
If I remove the `clientID` part of the JSON, the POST works and NiFi generates a UUID for me, which results in my original problem.
Labels:
- Apache NiFi
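A possible way around the whole problem: the POST response body is the created process group entity, so the generated UUID can be captured from it directly without supplying one. (Note that the revision field is a client identifier for the revision, not a component id, so it would not set the group's UUID anyway.) Below, `$resp` stands in for the POST response and the UUID in it is made up; `python3` is assumed available:

```shell
# Real call, capturing the response body:
#   resp=$(curl -sk -X POST -H "Authorization: Bearer $TOKEN" \
#     -H 'Content-Type: application/json' \
#     -d '{"revision":{"version":0},"component":{"name":"'"$CLUSTER_NAME $FLOW_NAME"'"}}' \
#     "https://myhost.com:9091/nifi-api/process-groups/$EXISTING_PROCESS_GROUP/process-groups")

# Stand-in for that response:
resp='{"id":"f6ad0c3a-0000-1000-0000-000000000000","component":{"name":"demo"}}'

# The new group's UUID is the top-level id of the returned entity.
new_pg_id=$(printf '%s' "$resp" \
  | python3 -c 'import sys, json; print(json.load(sys.stdin)["id"])')
echo "$new_pg_id"
```

The captured `$new_pg_id` can then be used as the target for the template deployment call.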
07-23-2018
06:41 PM
@sbabu Getting the token this way does not work for me as there is a '%' in the password I am using, which causes curl to throw an error. How can I get around this? The script I am developing is going to be used by different people so I cannot know ahead of time to escape any special characters.
06-25-2018
05:54 PM
@Aakash Singh Did you find out how to do this on a kerberized cluster?
06-25-2018
01:09 PM
@Smart Solutions
Were you able to get an answer for this?
06-06-2018
01:00 PM
Is there currently any way to get user-level metrics for YARN jobs? I am hoping to get the total number of jobs submitted by each user and the total number of job failures. I also want the same user-level metrics for Hive: total number of queries and total failures. I was able to find information on getting user-level HDFS metrics but am struggling with these two other services.
Labels:
- Apache Hive
- Apache YARN
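For the YARN side, one option is the ResourceManager REST API, which can filter applications by user. A sketch with placeholder host, port, and username; `$apps` stands in for the response so the counting step can run, and `python3` is assumed available:

```shell
# Real call (RM web UI port is typically 8088; "alice" is a placeholder user):
#   curl -s "http://resourcemanager:8088/ws/v1/cluster/apps?user=alice"

# Stand-in for the response:
apps='{"apps":{"app":[{"finalStatus":"SUCCEEDED"},{"finalStatus":"FAILED"}]}}'

# Print total apps and number of failures for the user.
printf '%s' "$apps" | python3 -c '
import sys, json
app = json.load(sys.stdin)["apps"]["app"]
print(len(app), sum(1 for a in app if a["finalStatus"] == "FAILED"))
'
```

Since Hive-on-Tez/MR queries also run as YARN applications, filtering the same endpoint by `applicationTypes` may cover part of the Hive question too, though that is an assumption worth verifying for your execution engine.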
05-07-2018
02:10 PM
@Matt Clarke
Gotcha. One more question: since it appears MonitorActivity just sends a status, and not the flowfile, is there any way I can get the attributes of the last flowfile? I'm wanting to include the hostname and original bulletin message in the OK status email, but using the "inactive" relationship seems to make this not possible as the attributes from the original flowfile aren't carried over.
05-07-2018
01:41 PM
Thanks for the suggestion @Abdelkrim Hadjidj. I'm going to use this to send the initial alert and then use @Matt Clarke's suggestion to send an OK alert when the issue has been resolved.
05-07-2018
01:20 PM
@Matt Burgess Thanks for opening the JIRA Matt. As a workaround in the meantime, I discovered I can use ${s2s.host} to get the host name, I just then need to use an UpdateAttribute processor to add this as an actual property to the flowfile.
05-04-2018
07:32 PM
@Matt Clarke Thanks for the suggestion, this should give me what I want. One thing I thought of while implementing it, however: I currently monitor six different disks with six different DiskUsage reporting tasks, as I have different NiFi repositories split across these disks. To keep things clean, these are all picked up by one SiteToSiteBulletinReportingTask and submitted to the same process group for alerting. If, say, disk1 went over the threshold and then disk2 went over while disk1 was still over the threshold, I would never receive an email alert about disk2. The only way around this I can think of is a separate MonitorActivity for each disk, but that requires hardcoding, which I would rather avoid as it makes this monitoring difficult to deploy between environments with different disk configurations. Can you think of any way around this?
05-04-2018
03:06 PM
1 Kudo
Currently I have a flow in place that monitors disk usage by using the MonitorDiskUsageReportingTask in conjunction with the SiteToSiteBulletinReportingTask. At the end of the flow, I have the flowfile, which is essentially each bulletin, being put to email along with a message describing the alert. My issue is, since I have this reporting task set to run every minute to ensure we get the most up-to-date alerts regarding disk usage, I will receive an email every minute once the disk usage goes over the threshold I set. Is there a way I can configure it so only one email is sent once this goes over the threshold? And then maybe have an "OK" status email go out after the disk usage has decreased beneath the threshold?
Labels:
- Apache NiFi
05-04-2018
01:09 PM
@Matt Burgess This would be better as it would be easier to map the hostname but still not a perfect solution. Also, I didn't see this as a field in the bulletin's JSON. I'm currently running HDF 3.0.1 (NiFi 1.2) so I'm assuming it's not in my version?
05-03-2018
07:47 PM
I am developing a flow where I need the hostname a bulletin is coming from. Currently I get this by manually determining which hostname is associated with which bulletinNodeId, and then using a RouteOnAttribute processor to map the ID to the actual hostname. This isn't going to scale if I always have to determine the bulletinNodeId-to-hostname mapping by hand first. Is there a better way to go about getting this hostname?
Labels:
- Apache NiFi
04-16-2018
07:49 PM
HDF 3.1.1 is not compatible with HDP 2.6.1. I was told by Hortonworks that HDP 3.0 will be compatible and I must wait until its release.
04-16-2018
07:32 PM
Hi Matt, I was able to implement these steps and it did exactly what I needed. Thanks!
04-16-2018
05:06 PM
I am constantly seeing this error message in the nifi-app.log file:
2018-04-16 11:00:36,383 ERROR [nifi.async.batch_nifi.async.batch.hdfs_destWriter] o.a.r.audit.provider.BaseAuditHandler Error writing to log file.
java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.ranger.audit.destination.HDFSAuditDestination.getLogFileStream(HDFSAuditDestination.java:271)
at org.apache.ranger.audit.destination.HDFSAuditDestination.access$000(HDFSAuditDestination.java:43)
at org.apache.ranger.audit.destination.HDFSAuditDestination$1.run(HDFSAuditDestination.java:157)
at org.apache.ranger.audit.destination.HDFSAuditDestination$1.run(HDFSAuditDestination.java:154)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.ranger.audit.provider.MiscUtil.executePrivilegedAction(MiscUtil.java:524)
at org.apache.ranger.audit.destination.HDFSAuditDestination.logJSON(HDFSAuditDestination.java:154)
at org.apache.ranger.audit.queue.AuditFileSpool.sendEvent(AuditFileSpool.java:880)
at org.apache.ranger.audit.queue.AuditFileSpool.runLogAudit(AuditFileSpool.java:828)
at org.apache.ranger.audit.queue.AuditFileSpool.run(AuditFileSpool.java:758)
at java.lang.Thread.run(Thread.java:745)
Since this is a bug related to the version of NiFi I am on, I do not need to see it as it happens very frequently and is cluttering the log file. How can I exclude this so it is not logged? Is there some configuration I can do in the logback.xml file?
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
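One possible logback.xml approach, assuming the abbreviated `o.a.r.audit.provider.BaseAuditHandler` in the log line expands to the full Ranger package name: set that logger's level so its messages are dropped. Note this silences everything from that handler, not just this one error.

```xml
<!-- Hypothetical addition to logback.xml: drop all output from the Ranger
     audit handler that emits the "No FileSystem for scheme: hdfs" errors. -->
<logger name="org.apache.ranger.audit.provider.BaseAuditHandler" level="OFF" />
```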
03-28-2018
01:36 PM
I am not hitting an LDAP server directly and am instead using an LDAP proxy that gives us HA to our LDAP servers. Due to the way this software works, I must use POSIX attributes. With my current configuration, the users in the group I am syncing appear to sync properly, as I see them in Ambari when logged in as the default admin user; however, I cannot log in using my AD credentials and get the message "Unable to sign in. Invalid username/password combination." in the Ambari UI. Below is my Ambari LDAP configuration.
authentication.ldap.baseDn=DC=my,DC=company,DC=com
authentication.ldap.bindAnonymously=false
authentication.ldap.dnAttribute=DC=my,DC=company,DC=com
authentication.ldap.groupMembershipAttr=memberUid
authentication.ldap.groupNamingAttr=cn
authentication.ldap.groupObjectClass=posixgroup
authentication.ldap.managerDn=CN=usernameforbind,OU=Application Accounts,DC=my,DC=company,DC=com
authentication.ldap.managerPassword=/etc/ambari-server/conf/ldap-password.dat
authentication.ldap.pagination.enabled=false
authentication.ldap.primaryUrl=myldapproxy.com:389
authentication.ldap.referral=follow
authentication.ldap.secondaryUrl=myldapproxy.com:389
authentication.ldap.useSSL=false
authentication.ldap.userObjectClass=posixaccount
authentication.ldap.usernameAttribute=sAMAccountName
I noticed the group names are lower case in Ambari whereas they are uppercase in AD. Could this cause issues? Is there any way to see how Ambari is querying LDAP so I can debug this further?
Labels:
- Apache Ambari
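To see what the proxy actually returns for a user, Ambari's lookup can be approximated with `ldapsearch`, reusing the bind DN and base DN from the settings above. A sketch; the account name is a placeholder, and the real query is shown as a comment because it needs the live proxy (`-W` prompts for the bind password):

```shell
# Placeholder account to look up.
user=jdoe

# Filter built the way Ambari would, from userObjectClass and usernameAttribute.
filter="(&(objectClass=posixaccount)(sAMAccountName=$user))"
echo "$filter"

# Real query against the proxy (run interactively):
#   ldapsearch -x -H ldap://myldapproxy.com:389 \
#     -D "CN=usernameforbind,OU=Application Accounts,DC=my,DC=company,DC=com" -W \
#     -b "DC=my,DC=company,DC=com" "$filter" dn cn memberUid
```

Comparing the returned `dn`/`cn` casing against what Ambari stored would also confirm or rule out the uppercase/lowercase group-name concern.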
03-21-2018
07:15 PM
@Manmeet Kaur Did you find the solution? I'm facing the same issue with this user also working for Ranger login.