Member since: 12-13-2016
Posts: 72
Kudos Received: 7
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1459 | 12-27-2017 05:06 AM |
11-04-2020 05:49 PM

Hi, I'm trying to write data to an HDFS path from NiFi, but no content gets written to that location. The file is created at the HDFS path, but it stays at zero bytes. Telnet works fine between the NiFi server and the AWS EMR cluster, and I have opened all traffic to the NiFi server, yet I still can't write. Please help me with this issue; I have included the logs below:

2020-11-05 01:43:58,148 INFO [NiFi Web Server-204] o.a.n.c.s.StandardProcessScheduler Starting PutHDFS[id=8e0a7636-0175-1000-810b-e0cb6cb164e0]
2020-11-05 01:43:58,168 INFO [Timer-Driven Process Thread-1] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled PutHDFS[id=8e0a7636-0175-1000-810b-e0cb6cb164e0] to run with 1 threads
2020-11-05 01:43:58,195 WARN [Thread-135] org.apache.hadoop.hdfs.DataStreamer DataStreamer Exception
java.nio.channels.UnresolvedAddressException: null
    at sun.nio.ch.Net.checkAddress(Net.java:104)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:621)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:253)
    at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1725)
    at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1679)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
2020-11-05 01:43:58,571 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@77fc8f6c // Another save pending = false
2020-11-05 01:44:02,917 INFO [NiFi Web Server-195] o.a.n.c.s.StandardProcessScheduler Stopping PutHDFS[id=8e0a7636-0175-1000-810b-e0cb6cb164e0]
2020-11-05 01:44:02,917 INFO [NiFi Web Server-195] o.a.n.controller.StandardProcessorNode Stopping processor: class org.apache.nifi.processors.hadoop.PutHDFS
2020-11-05 01:44:02,922 INFO [Timer-Driven Process Thread-8] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling PutHDFS[id=8e0a7636-0175-1000-810b-e0cb6cb164e0] to run
2020-11-05 01:44:03,393 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@77fc8f6c // Another save pending = false
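An UnresolvedAddressException raised inside DataStreamer usually means the DataNode hostname that the NameNode hands back cannot be resolved from the NiFi host, which also explains the zero-byte file: the client can create the file via the NameNode but never opens the write pipeline to a DataNode. A minimal sketch to verify resolution from the NiFi box; the hostnames below are hypothetical EMR-internal names, not values from this post:

```python
# Check whether the (assumed) EMR DataNode hostnames resolve from the NiFi host.
import socket

datanodes = ["ip-10-0-1-23.ec2.internal", "ip-10-0-1-24.ec2.internal"]  # placeholders
for host in datanodes:
    try:
        print(host, "->", socket.gethostbyname(host))
    except socket.gaierror as exc:
        # A failure here is the Python-level equivalent of the
        # UnresolvedAddressException seen in the NiFi log above.
        print(host, "does not resolve:", exc)
```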
11-01-2020 06:47 AM

Nope, @Shelton. It's not Kerberized; it's a normal SASL security protocol with sasl.mechanism PLAIN only. But I get an error that looks like the following, and I have tried a newer NiFi version as well:

2020-10-31 23:36:16,762 WARN [Timer-Driven Process Thread-5] org.apache.kafka.clients.NetworkClient [Consumer clientId=consumer-derfsdfdsf-2, groupId=derfsdfdsf] Connection to node -1 (xxxx:9092) terminated during authentication. This may happen due to any of the following reasons: (1) Authentication failed due to invalid credentials with brokers older than 1.0.0, (2) Firewall blocking Kafka TLS traffic (eg it may only allow HTTPS traffic), (3) Transient network issue.

2020-10-30 05:41:18,794 WARN [Timer-Driven Process Thread-6] org.apache.kafka.clients.NetworkClient [Consumer clientId=consumer-2, groupId=devtes_grp] Connection to node -1 terminated during authentication. This may indicate that authentication failed due to invalid credentials.
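For reference, "terminated during authentication" often comes down to the client's security protocol not matching the broker listener (for example, a client using SASL_PLAINTEXT against a SASL_SSL listener) or an untrusted broker certificate. A minimal sketch of the settings being attempted, using the kafka-python client purely for illustration; the broker address, credentials, group, and topic are placeholders:

```python
# Sketch of a SASL/PLAIN over TLS consumer (kafka-python), matching the
# setup described above. All concrete values are assumptions.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "my-topic",
    bootstrap_servers="broker.example.com:9092",
    group_id="example-group",
    security_protocol="SASL_SSL",      # must match the broker listener exactly
    sasl_mechanism="PLAIN",
    sasl_plain_username="user",
    sasl_plain_password="secret",
    # ssl_cafile="/path/to/ca.pem",    # needed if the broker cert is not publicly trusted
)
for record in consumer:
    print(record.value)
```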
10-30-2020 02:01 AM

Hi, I'm trying to consume from a Kafka broker in NiFi using ConsumeKafkaRecord_2_0. Kafka is configured for SASL_SSL with the PLAIN mechanism, and the data needs to be consumed from there. On connecting we run into the stack trace below; can you please help me with this issue?

Processor configuration:

2020-10-30 08:52:38,470 WARN [Timer-Driven Process Thread-8] org.apache.kafka.clients.NetworkClient [Consumer clientId=consumer-20, groupId=devtest_grp11111] Connection to node -1 terminated during authentication. This may indicate that authentication failed due to invalid credentials.
(the same WARN repeats roughly once per second through 2020-10-30 08:52:48,536)
10-29-2020 11:27 PM

Hi, I'm also facing the same issue. Can you please help me resolve it?
09-07-2020 06:37 AM

Hi, I'm referring to this article: https://community.cloudera.com/t5/Community-Articles/Create-Dynamic-Partitions-based-on-FlowFile-Content-Convert/ta-p/248367 I'm building a pipeline in NiFi for real-time streaming data, for example from Kafka, that lands in partitioned HDFS locations. It can end up producing many small files, and at query time I am facing a performance lag because of them. Can you please suggest approaches to resolve the small-files issue within NiFi itself, using the ORC file format?
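The usual NiFi answer to small files is to batch records with MergeContent (or MergeRecord) before converting to ORC and writing to HDFS, so each file lands near the HDFS block size. A sketch of illustrative MergeContent settings, written as a plain dict for readability; these are starting-point values under assumptions about the record rate, not a definitive configuration:

```python
# Illustrative MergeContent property values for batching Avro records
# ahead of ConvertAvroToORC -> PutHDFS. Tune to your throughput.
merge_content_properties = {
    "Merge Strategy": "Bin-Packing Algorithm",
    "Merge Format": "Avro",               # keeps the Avro records valid after merging
    "Minimum Number of Entries": "10000",
    "Maximum Number of Entries": "1000000",
    "Minimum Group Size": "128 MB",       # aim near the HDFS block size
    "Max Bin Age": "5 min",               # flush partial bins so data is not held forever
}
# Flow sketch: ConsumeKafka -> MergeContent -> ConvertAvroToORC -> PutHDFS
```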
06-22-2020 01:21 AM

Hi @hegdemahendra, thanks for the reply. It depends on the number of incoming flowfiles: the first time one file may arrive, and the next time it may be two or more.
06-20-2020 02:16 AM

Hi, I'm currently using the MergeContent processor with Avro files. Data arrives in a streaming manner, and we want to merge each new file into the existing file. I have set Minimum Number of Entries to 2. Is there any way to set Minimum Number of Entries dynamically, to match a dynamic number of input flowfiles? Can you please help me out?
05-03-2020 11:51 PM

I want to split and route JSON data in NiFi. Here is what my JSON structure looks like (below). I want to split the JSON by id1 and id2 and transfer each array to its own process group, say process groups a and b. I tried EvaluateJsonPath with $.id1 and $.id2, but I didn't get an exact solution. Can you please help me with this issue?

{
  "id1": [{ "u_name": "aa" }, { "addr": "bb" }],
  "id2": [{ "u_name": "aa" }, { "addr": "bb" }]
}
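For clarity, here is the intended split done in plain Python: each top-level key becomes its own payload, which is the result that EvaluateJsonPath (with $.id1 and $.id2 routed to separate connections) or SplitJson would need to produce. This is only a sketch of the desired behavior, not a NiFi configuration:

```python
# Split the posted JSON so that id1 and id2 each become a separate payload.
import json

payload = json.loads("""
{ "id1": [{"u_name": "aa"}, {"addr": "bb"}],
  "id2": [{"u_name": "aa"}, {"addr": "bb"}] }
""")

for key, records in payload.items():
    # In NiFi terms, each iteration corresponds to one flowfile routed
    # to the process group for that key.
    print(key, "->", json.dumps(records))
```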
04-05-2020 10:41 PM

Hi, I'm currently working with ConsumeKafka, which polls between 500 and 10,000 records per interval, and the NiFi iteration happens every 30 seconds. I want ConsumeKafka to poll the next iteration only once all downstream jobs have completed, but the ConsumeKafka processor doesn't support an upstream connection. Can you help me with this, or suggest how I can resolve it? Once PutHDFS succeeds, I need the next consume iteration to start. Here is my NiFi workflow:

Workflow: consumekafka -> mergecontent -> convertRecord -> updateattribute -> puthdfs
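Outside of NiFi, the "finish everything, then poll again" behavior being asked for is a manual poll/process/commit loop; inside NiFi, the closest equivalent is backpressure on the queues between ConsumeKafka and PutHDFS. A sketch of the loop using kafka-python, with broker, group, and topic as placeholders:

```python
# Manual poll/commit loop: the next poll only happens after the previous
# batch has been fully processed and committed.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "my-topic",
    bootstrap_servers="broker.example.com:9092",
    group_id="example-group",
    enable_auto_commit=False,               # commit only after processing succeeds
    max_poll_records=10000,
)
while True:
    batch = consumer.poll(timeout_ms=30000)  # roughly the 30 s iteration in the post
    for tp, records in batch.items():
        for record in records:
            pass                             # merge/convert/write steps go here
    consumer.commit()                        # next poll starts only after this completes
```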
03-30-2020 05:17 AM

Hi, I'm trying to use PutHDFS/ListHDFS from NiFi on an AWS EC2 instance, but I am not able to put to or list the Hadoop file system. My NiFi log is below; please have a look. I have also opened ports 8020 and 50010 in the EMR security group. Kindly help me with this issue.

2020-03-30 12:09:30,082 ERROR [Timer-Driven Process Thread-8] o.apache.nifi.processors.hadoop.PutHDFS PutHDFS[id=18f688e9-084d-373b-9f6a-58cb9cdc355a] Failed to properly initialize Processor. If still scheduled to run, NiFi will attempt to initialize and run the Processor again after the 'Administrative Yield Duration' has elapsed. Failure is due to java.nio.channels.UnresolvedAddressException: java.nio.channels.UnresolvedAddressException
java.nio.channels.UnresolvedAddressException: null
    at sun.nio.ch.Net.checkAddress(Net.java:101)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:619)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.checkHdfsUriForTimeout(AbstractHadoopProcessor.java:457)
    at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.resetHDFSResources(AbstractHadoopProcessor.java:363)
    at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.abstractOnScheduled(AbstractHadoopProcessor.java:251)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:142)
    at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:130)
    at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:75)
    at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotation(ReflectionUtils.java:52)
    at org.apache.nifi.controller.StandardProcessorNode.lambda$initiateStart$4(StandardProcessorNode.java:1515)
    at org.apache.nifi.engine.FlowEngine$3.call(FlowEngine.java:123)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
03-25-2020 07:50 PM

Hi, I need a suggestion for selecting rows only until a certain value is reached. For example, here is the dataset:

id | count | name | status
---|---|---|---
1 | 10 | xxx | 0
2 | 20 | yyy | 0
3 | 30 | zzz | 1
4 | 40 | qqq | 0
5 | 50 | ppp | 0

I want to select only the rows up to and including the first status of 1. Expected result:

id | count | name | status
---|---|---|---
1 | 10 | xxx | 0
2 | 20 | yyy | 0
3 | 30 | zzz | 1

SQL: select * from tbl where status between 0 and 1

But I got every row of the table back. Please give me a suggestion; I'm stuck on this.
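A worked example of why BETWEEN returns everything here: status only ever takes the values 0 and 1, so "status BETWEEN 0 AND 1" matches every row. Selecting rows up to and including the first status = 1 needs a subquery instead, assuming id defines the row order:

```python
# Demonstrate both queries against the posted dataset using sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, count INTEGER, name TEXT, status INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?,?,?,?)", [
    (1, 10, "xxx", 0), (2, 20, "yyy", 0), (3, 30, "zzz", 1),
    (4, 40, "qqq", 0), (5, 50, "ppp", 0),
])

# The original query: matches every row, since status is always 0 or 1.
print(conn.execute("SELECT * FROM tbl WHERE status BETWEEN 0 AND 1").fetchall())

# Rows up to and including the first status = 1 (the expected result).
print(conn.execute(
    "SELECT * FROM tbl WHERE id <= (SELECT MIN(id) FROM tbl WHERE status = 1)"
).fetchall())
```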
03-15-2020 11:12 PM

Hi, I created a dataflow in NiFi that merges data from two different sources, one from HDFS and one from a Kafka consumer processor, and finally puts the result in an HDFS location. I need to consume data from two different topics, but the job has to stay the same. For example:

Topic A: once topic A starts publishing, I wait until publishing completes, then start consuming the data.
Topic B: once topic B starts publishing, I wait until publishing completes, then start consuming the data.

kafka_consumer (topics A, B) -> update attribute ->
GetHDFS -> update attribute -> merge -> puthdfs

Can you please suggest how I can consume from the two Kafka topics sequentially?
03-04-2020 08:30 PM

@MattWho, thanks for the reply. Sure, I have attached my existing MergeContent processor configuration; this is a single-node EC2 instance with 8 GB of RAM. Can you please clarify: if the flowfile size grows, even to TBs of data, should I keep processing with the same approach of multiple MergeContent processors, or is it better to go multi-node or increase the maximum memory available on the single node?

JVM configuration:
# JVM memory settings
java.arg.2=-Xms2048m
java.arg.3=-Xmx2048m
03-04-2020 01:09 AM

Hi, I'm currently using the MergeContent processor to merge Avro files from two sources, a kafka_consumer processor and FetchHDFS. Yesterday, while merging an Avro file of around 680 MB, the processor dropped the file and joined it with new files, and I couldn't recover that data either, because the content repository's retention is limited. For this use case, can you please advise how large a file the processor can handle well, and whether any settings need to be modified in nifi.properties?
02-12-2020 01:44 AM

Hi,
I'm currently trying to update records in the HDFS file system, where a partitioned HDFS table is mapped to a Hive schema and data is dynamically updated in the respective HDFS locations. We want to update our records in HDFS, but when I make the Hive table transactional, Presto cannot query that table. Is there an alternative way to do incremental updates when the external location is HDFS?
02-04-2020 04:47 AM

Hi,
We currently run a single-node NiFi server on an AWS EC2 instance and are planning to move to a multi-node cluster architecture. We are in the development and testing stage. We need an active/passive setup, so that NiFi keeps working whenever one server goes down and another picks up. Can anyone help us move forward with this scenario, or suggest a reference architecture?
- Tags:
- NiFi
12-29-2019 06:50 AM

Hi, I'm creating dynamic partitioning to push data to an HDFS location with a date-based folder structure. While pushing data to the respective date folder, I hit an issue: after the SplitJson processor, a number of records go missing before they reach the ConvertJSONToAvro processor. Please help me with this issue or give me a suggestion. The split processor splits with $.*. For dynamic partitioning with ORC, I am following this link: https://community.cloudera.com/t5/Community-Articles/Create-Dynamic-Partitions-based-on-FlowFile-Content-Convert/ta-p/248367
12-15-2019 09:26 PM

Sure, thanks @MattWho. It works!
12-12-2019 09:27 PM

Hi, we are currently using the MergeContent processor to merge Kafka messages with Minimum Number of Entries = 500; when 500 messages are reached, they are merged into a single file. That use case works fine. But at the end of the day, at 11:59, any Kafka messages still queued in the MergeContent processor need to be pushed out dynamically before the new date starts. Kindly help me with this kind of use case.

Workflow: kafka_consumer -> merge_content_processor -> puthdfs @mburgess
12-04-2019 05:24 AM

Hi,
We are using AWS EMR with the Presto and Hive services, with the data residing in AWS S3. The tables are configured so that data is incrementally pushed to S3. We benchmarked HDFS versus S3: on performance, HDFS does better than S3, but on data persistence S3 wins, and we are strictly going with S3 as the external location. Please give me suggestions on performance tuning for Presto + S3 on AWS EMR.
- Tags:
- Hive
12-03-2019 10:59 PM

Hi,
I'm currently working on AWS EMR and we are using the Hive and Presto services. Data has been growing exponentially, flowing from Kafka into a single table (without partitions). Now we are planning to implement a partitioned table, for example keyed by date. In that case, how would we append to and compress the ORC files in each partition's external storage? Please give me suggestions on this.

Existing NiFi flow:
Lists3 -> Fetch-s3 -> update_attribute -> merge_content -> put_s3 -> putHDFS
kafka_consumer -> update_attribute ->
09-22-2019 02:50 AM

Hi,
I'm currently facing issues with the Apache NiFi MergeContent processor merging two flowfiles, one from FetchS3Object and one from a Kafka consume processor. My scenario: whenever new updated records are published to Kafka, they need to be merged with the S3 object file, and the updated file finally pushed back to S3. Here is my flow:
https://imgur.com/a/3CZfsCT
Please help me resolve this issue. Whenever I get updated records, sometimes there are duplicates or the data is not appended properly, and I don't know where the issue is.
- Tags:
- apache nifi
- aws
- S3
09-04-2018 03:18 PM

Hi, we process real-time data pushed every minute, and I want to pull streaming data from Elasticsearch every second. Can you give me a suggestion on how I could collect streaming data from Elasticsearch? After collecting the data from Elasticsearch, I need to apply machine intelligence to it with Python. Can you please help?
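Since Elasticsearch has no native push-to-consumer stream, the common pattern is to poll with a range query on a timestamp field and advance a cursor. A sketch using the official Python client (the search API details vary by client version; the 7.x body-style call is shown); the index name and timestamp field are assumptions:

```python
# Poll Elasticsearch roughly once per second for documents newer than the
# last one seen, then hand them to the Python ML step.
import time
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint
last_seen = "now-1s"

while True:
    resp = es.search(index="metrics-*", body={
        "query": {"range": {"@timestamp": {"gte": last_seen}}},
        "sort": [{"@timestamp": "asc"}],
        "size": 1000,
    })
    hits = resp["hits"]["hits"]
    if hits:
        last_seen = hits[-1]["_source"]["@timestamp"]  # advance the cursor
        # pass hits to the machine-learning step here
    time.sleep(1)  # pull roughly every second
```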
08-31-2018 06:00 AM

Hi, I have planned to create a cluster environment with HDP 2.6. Our daily data rate is around 40 GB, and the data size increases day by day. My questions are:
1) If I put Elasticsearch and Hortonworks in the same cluster, is that a good solution for achieving faster query response?
2) If I plan it that way, how do I calculate the number of master and slave nodes? Please help!
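As a back-of-the-envelope worked example for question 2, storage-driven sizing looks like this. The replication factor of 3 is the HDFS default; the retention period and per-node usable capacity are illustrative assumptions, not HDP guidance:

```python
# Rough worker-node count from storage needs: 40 GB/day raw, HDFS
# replication 3, one year retention, ~20 TB usable per worker (assumed).
daily_gb = 40
replication = 3
retention_days = 365
per_node_usable_tb = 20

total_tb = daily_gb * replication * retention_days / 1024
nodes = -(-total_tb // per_node_usable_tb)  # ceiling division
print(f"~{total_tb:.1f} TB with replicas -> at least {int(nodes)} worker nodes")
# 40 * 3 * 365 / 1024 is about 42.8 TB -> 3 nodes, before compression,
# growth, and CPU/memory headroom are factored in.
```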
02-14-2018 05:07 AM

Sorry, I wasn't able to get the UpdateAttribute value onto the PutHDFS file. I have extracted the value as you mentioned here: https://community.hortonworks.com/questions/170847/how-to-extract-query-param-from-nifi.html Please advise, thanks.
02-13-2018 07:53 AM

Hi, I have spent days on this particular issue. When I extract text based on a regexp, I can see the data in the flowfile's data provenance. My aim is this: after the ExtractText processor, append the extracted value to the filename. Normally the filename property is set using UpdateAttribute. Please give me a suggestion on how to append the value to the PutHDFS filename. It would be much appreciated, thanks.
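For illustration, the usual approach is an UpdateAttribute step that rewrites the filename attribute with NiFi Expression Language, something like filename = ${filename}_${extracted.value}, where "extracted.value" is a hypothetical attribute name set by ExtractText. The same substitution, mimicked in plain Python to show what PutHDFS would then see:

```python
# What the UpdateAttribute step computes, sketched with example values.
attributes = {"filename": "part-0001.avro", "extracted.value": "bigdata"}
attributes["filename"] = f"{attributes['filename']}_{attributes['extracted.value']}"
print(attributes["filename"])  # -> part-0001.avro_bigdata
```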
02-06-2018 11:15 AM
1 Kudo

Hi, I have automated GET requests based on user query params in NiFi with the help of InvokeHTTP. Here is an example:
http://aaa.com/q="bigdata"&api_key=""
http://aaa.com/q="apple"&api_key=""
I split the text line by line and use the InvokeHTTP processor to fetch each response in JSON format. My question: before the InvokeHTTP processor, I want to extract those query param values (the query values are static), so that I know which query param produced which response. Please give me some suggestions. It would be much appreciated, thanks.
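A sketch of pulling the query-param value out of each line before InvokeHTTP, done here in plain Python with urllib.parse; note the URL is rewritten in well-formed form (with a "?") for the example, an assumption about the real requests:

```python
# Extract the q= keyword from each request line so it can be attached
# (e.g. as a flowfile attribute) before InvokeHTTP runs.
from urllib.parse import urlparse, parse_qs

line = 'http://aaa.com/?q="bigdata"&api_key=""'   # assumed well-formed variant
params = parse_qs(urlparse(line).query)
print(params["q"][0].strip('"'))                  # -> bigdata
```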
01-24-2018 04:17 PM

@Matt Clarke I don't know which method to use to fetch Google results in either JSON or XML format. What I'm asking is whether any processor is available to fetch Google results based on a user's keywords, like the Twitter processor.
01-24-2018 09:55 AM

Hi, I'm working on NiFi in HDP 2.5. There is a new requirement: a user enters a query/keyword for Google, and we fetch the results in either JSON or XML format. I need to automate that process, but I wasn't able to automate fetching Google results using NiFi. Is it possible, or is there any way to fetch the results, like web scraping, with the help of NiFi instead of custom programming? Thanks.
12-27-2017 05:06 AM

I think this is a permission-denied issue. Try granting read/write permission on the batch file, either from the command line or the GUI. I think the command below will be useful:
icacls "C:\Program Files (x86)\Program File" /grant Everyone:M
Point it at your NiFi file location in the same way. I hope it helps!