About raghav130593

raghav130593 · ‎05-31-2017

Is there any plans to have the functionality to JOIN different datasets within NiFi in future releases? because that would be swell.

raghav130593 · ‎05-16-2017

Alright. What do you think would cause that error?

raghav130593 · ‎05-16-2017

Hi @Dan Chaffelson, sorry to not update my comment. I was able to troubleshoot it. It was an issue from the CDH side and not with the NAR file. It's working for me now. Thanks for sharing this article. Really helped me out! 🙂

raghav130593 · ‎05-10-2017

Hi @Timothy Spann, I am trying to store files to Minio using PutS3Object processor but I get this error - to Amazon S3 due to com.amazonaws.AmazonClientException: Unable to reset stream after calculating AWS4 signature: com.amazonaws.AmazonClientException: Unable to reset stream after calculating AWS4 signature Is it because of the region setting? My minio instance is hosted in the east coast lab but I am trying to access it via NiFi from the west coast. I tried setting the region to us-west-1, us-west-2, us-east-1 but I get the same error. Can you provide any insight?

raghav130593 · ‎05-01-2017

Hi @Dan Chaffelson, I had the backward compatibility issue and I followed your steps and pasted the nifi-hive-nar into my NiFi 1.1.2 instance. Now , SelectHiveQL was able to connect and query the table but it only gives me the headers(column names) and doesn't retrieve the data. My query was select * from table limit 100. Any idea why? The nifi-app.log wasn't updated either

raghav130593 · ‎12-22-2016

I have multiple GetSplunk processors running using a Cron driven scheduling strategy. The Cron expression looks like '0 30 13 * * ?'. They all successfully execute the query the first time it's run. But, the next day it errors out with a 401 error from Splunk. The error from nifi-app.log is as below WARN [Timer-Driven Process Thread-7] o.a.n.c.t.ContinuallyRunProcessorTask Administratively Yielding GetSplunk[id=01581009-026c-114b-5e2e-401ebea6427d] due to uncaught Exception: com.splunk.HttpException: HTTP 401 -- call not properly authenticated 2016-12-21 13:30:00,300 WARN [Timer-Driven Process Thread-2] o.a.n.c.t.ContinuallyRunProcessorTask com.splunk.HttpException: HTTP 401 -- call not properly authenticated at com.splunk.HttpException.create(HttpException.java:84) ~[na:na] at com.splunk.HttpService.send(HttpService.java:452) ~[na:na] at com.splunk.Service.send(Service.java:1293) ~[na:na] at com.splunk.HttpService.get(HttpService.java:165) ~[na:na] at com.splunk.Service.export(Service.java:222) ~[na:na] at com.splunk.Service.export(Service.java:237) ~[na:na] at org.apache.nifi.processors.splunk.GetSplunk.onTrigger(GetSplunk.java:461) ~[na:na] at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) ~[nifi-api-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT] at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1064) ~[nifi-framework-core-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT] at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) [nifi-framework-core-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT] at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47) [nifi-framework-core-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT] at org.apache.nifi.controller.scheduling.QuartzSchedulingAgent$2.run(QuartzSchedulingAgent.java:165) [nifi-framework-core-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_101] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_101] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_101] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_101] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_101] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101] After doing some research, it seems this might be an issue due to multiple threads. Have anyone of you bumped into this? Help appreciated.

raghav130593 · ‎12-19-2016

Hey @Matt, I am using a cron job scheduling strategy too and my cron job is "0 30 13 * * ?" similar to what is discussed here. But, for some reason, it just runs for the day I create it and doesn't repeat the next day. Do you know the reason why? I keep NiFi running and don't think it's shutting down for the cron job to not execute. Any help appreciated.

raghav130593 · ‎11-03-2016

Well, seems like there's no workable workaround for the joins. Would have to use Spark or pushing to HDFS or any other datasource to query. I should also look if I can just trigger Drill and run it as a service. Thanks for the help though.

raghav130593 · ‎11-03-2016

@Timothy Spann I have 1 common column in both these tables and preferably would want to join them and have one wide row. After that, I have certain case statements that I need to run on those rows which can be possible using splitText and updateAttribute's advanced rule settings. My preference was to get this done within Nifi without making use of any external service. According to your 1st point, you mean to have the common columns have the same name aliases using the SELECT statement within executeSQL and then would mergeContent be able to identify those common names within those AVRO flowfiles? Also, both the tables wouldn't necessarily have the same number of fields and I am not sure why is that necessary here. If you could care to expand on that, it would be great. Also, I don't have much knowledge on drill but can drill be triggerred by NiFi to perform the service and return the result back? Or will I have to send data , then run that JOIN independently in Drill and then make it send to NiFi. Reason I am asking is that I want a continuous flow and don't want to stop in between to go and run another service.

raghav130593 · ‎11-02-2016

Hi, I am pulling data from tables residing in different data sources. For example, one ExecuteSQL processor queries a Teradata instance and the other ExecuteSQL processor queries a table in a DB2 instance. After asking the NiFi forum, I got to know that a SQL JOIN feature within NIFI is not available yet but I wanted to ask if there is any workaround to achieve this where I could merge the results of these tables on a common field and then perform additional actions on that merged data. I know it would be easier to have all the tables in a single data source and then directly have a select query with a JOIN but that can't be done in my particular use case. Thanks

Online	Offline
Last Visited	‎06-01-2017 12:19 AM

Member Since	‎08-16-2016 07:16 PM
Last Visited	‎06-01-2017 12:19 AM
Posts	18
Kudos received	3

Cloudera Community

Re: Joining tables within NiFi

Re: Working with S3 Compatible Data Stores via Apa...

Re: Connecting NiFi to CDH Hive

Re: Working with S3 Compatible Data Stores via Apa...

Re: Connecting NiFi to CDH Hive

Cron scheduling strategy with GetSplunk fails on s...

Re: Helping setting up cron-based nifi processor

Re: Joining tables within NiFi

Re: Joining tables within NiFi

Joining tables within NiFi