Member since: 08-16-2016
Posts: 18
Kudos Received: 3
Solutions: 0
05-31-2017
06:39 PM
1 Kudo
Are there any plans to add the ability to JOIN different datasets within NiFi in a future release? Because that would be swell.
05-16-2017
02:45 AM
Alright. What do you think would cause that error?
05-16-2017
02:44 AM
Hi @Dan Chaffelson, sorry for not updating my comment. I was able to troubleshoot it: it was an issue on the CDH side and not with the NAR file. It's working for me now. Thanks for sharing this article; it really helped me out! 🙂
05-10-2017
10:36 PM
Hi @Timothy Spann, I am trying to store files in Minio using the PutS3Object processor, but I get this error:
to Amazon S3 due to com.amazonaws.AmazonClientException: Unable to reset stream after calculating AWS4 signature: com.amazonaws.AmazonClientException: Unable to reset stream after calculating AWS4 signature
Is it because of the region setting? My Minio instance is hosted in the east coast lab, but I am trying to access it via NiFi from the west coast. I tried setting the region to us-west-1, us-west-2, and us-east-1, but I get the same error. Can you provide any insight?
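For reference, one way I can rule out credential and endpoint problems is to hit the Minio instance outside NiFi first. A minimal sketch, assuming boto3 is installed; the endpoint and keys below are placeholders, not my real values:
# Minimal connectivity check against a Minio endpoint (all values are placeholders).
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.example.com:9000",  # hypothetical lab endpoint
    aws_access_key_id="MINIO_ACCESS_KEY",
    aws_secret_access_key="MINIO_SECRET_KEY",
    region_name="us-east-1",  # Minio largely ignores region, but the SDK requires one
)

# If this call succeeds, the credentials and endpoint are fine, and the problem is
# more likely in the processor configuration (e.g. Endpoint Override URL) than the region.
print(s3.list_buckets())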
05-01-2017
11:58 PM
Hi @Dan Chaffelson, I had the backward-compatibility issue, so I followed your steps and pasted the nifi-hive-nar into my NiFi 1.1.2 instance. Now SelectHiveQL is able to connect and query the table, but it only gives me the headers (column names) and doesn't retrieve the data. My query was select * from table limit 100. Any idea why? The nifi-app.log wasn't updated either.
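One thing I can try is downloading the queued flowfile content and counting the Avro records outside NiFi, to confirm whether the payload really has zero records or the content viewer is just hiding them. A minimal sketch using the avro Python library; the filename is a placeholder:
# Count the records in an Avro file exported from a NiFi queue ("Download content").
from avro.datafile import DataFileReader
from avro.io import DatumReader

with open("flowfile.avro", "rb") as f:  # placeholder filename
    reader = DataFileReader(f, DatumReader())
    print("records:", sum(1 for _ in reader))
    reader.close()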
12-22-2016
12:07 AM
I have multiple GetSplunk processors running with a CRON-driven scheduling strategy. The cron expression looks like '0 30 13 * * ?'. They all successfully execute the query the first time they run, but the next day they error out with a 401 from Splunk. The error from nifi-app.log is below:
WARN [Timer-Driven Process Thread-7] o.a.n.c.t.ContinuallyRunProcessorTask Administratively Yielding GetSplunk[id=01581009-026c-114b-5e2e-401ebea6427d] due to uncaught Exception: com.splunk.HttpException: HTTP 401 -- call not properly authenticated
2016-12-21 13:30:00,300 WARN [Timer-Driven Process Thread-2] o.a.n.c.t.ContinuallyRunProcessorTask
com.splunk.HttpException: HTTP 401 -- call not properly authenticated
at com.splunk.HttpException.create(HttpException.java:84) ~[na:na]
at com.splunk.HttpService.send(HttpService.java:452) ~[na:na]
at com.splunk.Service.send(Service.java:1293) ~[na:na]
at com.splunk.HttpService.get(HttpService.java:165) ~[na:na]
at com.splunk.Service.export(Service.java:222) ~[na:na]
at com.splunk.Service.export(Service.java:237) ~[na:na]
at org.apache.nifi.processors.splunk.GetSplunk.onTrigger(GetSplunk.java:461) ~[na:na]
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) ~[nifi-api-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1064) ~[nifi-framework-core-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) [nifi-framework-core-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47) [nifi-framework-core-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at org.apache.nifi.controller.scheduling.QuartzSchedulingAgent$2.run(QuartzSchedulingAgent.java:165) [nifi-framework-core-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_101]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_101]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_101]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_101]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_101]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
After doing some research, it seems this might be an issue due to multiple threads. Have any of you bumped into this? Help appreciated.
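One hypothesis I'm testing: if the processor reuses its Splunk session across runs, the auth token may simply be expiring between the daily triggers. To check, I can run the same search outside NiFi with a fresh connection per run, using the Splunk Python SDK (splunklib); the host and credentials below are placeholders:
# Run a one-shot Splunk search with a fresh session each time (splunklib).
import splunklib.client as client

def fresh_search(query):
    # Reconnecting per run rules out stale-session 401s; values are placeholders.
    service = client.connect(host="splunk.example.com", port=8089,
                             username="USER", password="PASS")
    return service.jobs.oneshot(query)

# If this keeps working day after day, the token lifetime is the likely culprit.
results = fresh_search("search index=main | head 5")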
Labels: Apache NiFi
12-19-2016
08:56 PM
Hey @Matt, I am using a CRON-driven scheduling strategy too, and my cron expression is "0 30 13 * * ?", similar to what is discussed here (in Quartz terms, that should fire at 13:30:00 every day). But for some reason it only runs on the day I create it and doesn't repeat the next day. Do you know why? I keep NiFi running, and I don't think it's shutting down, which would prevent the cron job from executing. Any help appreciated.
11-03-2016
05:50 PM
Well, it seems there's no workable workaround for the joins. I'd have to use Spark, or push to HDFS or some other data source, and run the query there. I should also look into whether I can just trigger Drill and run it as a service. Thanks for the help though.
11-03-2016
05:26 PM
@Timothy Spann I have one common column in both of these tables and would prefer to join them into one wide row. After that, I have certain case statements I need to run on those rows, which should be possible using SplitText and UpdateAttribute's advanced rule settings. My preference is to get this done within NiFi without using any external service. Regarding your first point: do you mean to give the common columns the same aliases using the SELECT statement within ExecuteSQL, and then MergeContent would be able to identify those common names within the Avro flowfiles? Also, the two tables wouldn't necessarily have the same number of fields, and I'm not sure why that is necessary here; if you could expand on that, it would be great. Finally, I don't have much knowledge of Drill, but can Drill be triggered by NiFi to perform the join and return the result? Or would I have to send the data, run the JOIN independently in Drill, and then have it send the result back to NiFi? I'm asking because I want a continuous flow and don't want to stop in between to run another service.
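In the meantime, a crude workaround I'm considering: convert both ExecuteSQL results to JSON and do the key-based join myself in an ExecuteScript processor. A minimal sketch of just the join logic in plain Python (the NiFi session plumbing around it is omitted, and the sample rows are made up):
# Hash join: index one side on the common column, then probe with the other side.
def hash_join(left_rows, right_rows, key):
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left_rows:
        for match in index.get(row[key], []):
            merged = dict(match)
            merged.update(row)  # left side wins on column-name collisions
            joined.append(merged)
    return joined

# Hypothetical example: one row from each source, joined on "id".
teradata_rows = [{"id": 1, "name": "alpha"}]
db2_rows = [{"id": 1, "region": "east"}]
print(hash_join(teradata_rows, db2_rows, "id"))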
11-02-2016
11:17 PM
2 Kudos
Hi, I am pulling data from tables residing in different data sources. For example, one ExecuteSQL processor queries a Teradata instance and another queries a table in a DB2 instance. After asking on the NiFi forum, I learned that a SQL JOIN feature within NiFi is not available yet, but I wanted to ask whether there is any workaround: something that would let me merge the results of these tables on a common field and then perform additional actions on the merged data. I know it would be easier to have all the tables in a single data source and directly run a SELECT with a JOIN, but that can't be done in my particular use case. Thanks
Labels: Apache NiFi
08-30-2016
10:00 PM
@Jobin George Thank you for the tutorial. I am trying to set up NiFi user authentication by binding it to my company's LDAP server. I entered the details in the XML file and got the certs from TinyCert. I added the browser cert to the login keychain (Mac). But when I run NiFi and try to access it through the browser, the page doesn't load and says "the site can't be reached". It worked fine when I ran NiFi over HTTP. Is it because I am running NiFi in an Ubuntu VM and accessing it from a browser on my Mac (I don't really think that would be the issue)? Or is it because of a proxy server (again, that wasn't a problem when I ran NiFi as a non-secure instance on an HTTP port)? Any tips would be greatly appreciated. 🙂
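One check I can do from the Mac side: "site can't be reached" (rather than a certificate warning) suggests nothing is answering on the address I'm hitting, so it's worth confirming which interface the HTTPS port is bound to (nifi.web.https.host in nifi.properties) and whether the port responds at all. A minimal reachability sketch with the Python standard library; host and port are placeholders for my VM:
# Check whether the VM's HTTPS port is reachable and speaking TLS at all.
import socket, ssl

HOST, PORT = "ubuntu-vm.local", 9443  # placeholders for the VM address and NiFi HTTPS port

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE  # reachability only; certificate trust is a separate question

with socket.create_connection((HOST, PORT), timeout=5) as sock:
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        print("TLS handshake OK:", tls.version())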
08-17-2016
11:09 PM
I did as you said, but to no avail; it's the same. And there's nothing populating the bulletin board for me to debug with.
08-16-2016
07:26 PM
Hi, thank you very much for the tutorial. I am new to NiFi and trying out some use cases. I followed your steps to write a Jython script that reads an XML (RSS) file, converts it into a string, writes it to the outputStream, and routes to PutFile. The problem I am facing is that the data is getting queued in the connection to ExecuteScript and not getting into the processor. I am posting the code from the ExecuteScript Script Body below. Did I need to point it to a module directory?
import xml.etree.ElementTree as ET
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class xmlParser(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        # Read the incoming flowfile content as a UTF-8 string
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        xmlRoot = ET.fromstring(text)
        extracted_list = []
        # RSS items live under <channel>, so search the whole tree
        for elmnt in xmlRoot.findall('.//item'):
            title = elmnt.find("title").text
            description = elmnt.find("description").text
            extracted_list.append(title)
            extracted_list.append(description)
        # Write the concatenated titles/descriptions back out as the new content
        str_extract = ''.join(extracted_list)
        outputStream.write(bytearray(str_extract.encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, xmlParser())
    flowFile = session.putAttribute(flowFile, 'filename', 'rss_feed.xml')
    session.transfer(flowFile, REL_SUCCESS)
session.commit()
If you can help me with this, it would be great.