Member since: 01-27-2023
Posts: 229
Kudos Received: 74
Solutions: 45
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1760 | 02-23-2024 01:14 AM |
| | 2296 | 01-26-2024 01:31 AM |
| | 1436 | 11-22-2023 12:28 AM |
| | 3588 | 11-22-2023 12:10 AM |
| | 3671 | 11-06-2023 12:44 AM |
04-20-2023
12:02 AM
hi @Ray82, Assuming you have a way to identify where the ";" needs to go, you can easily use NiFi's Expression Language to add it. More details are available here: https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html. If you can identify a pattern, any of the replace functions will swap your whitespace for a semicolon, and a regex based on that pattern will let you add the semicolon between your characters. In terms of processors, you have ReplaceText, UpdateAttribute, UpdateRecord and so on, so you have plenty to choose from 🙂
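As a minimal sketch (assuming every run of whitespace should become a semicolon, which may not match your exact pattern), a ReplaceText processor could be configured like this:

```
Search Value:          \s+
Replacement Value:     ;
Replacement Strategy:  Regex Replace
Evaluation Mode:       Entire text
```

The same idea in Expression Language, for example in UpdateAttribute against a hypothetical attribute named myText, would be ${myText:replaceAll('\s+', ';')}.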
04-19-2023
06:58 AM
hi @VLban, What do your files look like before reaching MergeRecord, and what do they look like after they have gone through MergeRecord? Besides that, what settings did you use in your MergeRecord? For your two requirements, everything depends on how you configure MergeRecord. To generate Parquet files, you set the Parquet writer in your Record Writer property. For large files, you must define the Minimum Bin Size, the Minimum Number of Records and, optionally, the Max Bin Age. The Correlation Attribute Name would also help.
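As a rough sketch of what that could look like (the values are illustrative, not recommendations), a MergeRecord producing large Parquet files might be configured along these lines:

```
Record Reader:               (a reader matching your input format)
Record Writer:               ParquetRecordSetWriter
Merge Strategy:              Bin-Packing Algorithm
Minimum Number of Records:   100000
Minimum Bin Size:            256 MB
Max Bin Age:                 15 min
Correlation Attribute Name:  (optional; keeps related flowfiles in the same bin)
```

Tune the minimums upward until the output files reach the size you want, and use Max Bin Age as a safety valve so that partially filled bins still get flushed.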
04-18-2023
02:41 AM
hi @ushasri, Can you provide some more details about your flow? Without knowing what you are doing in your flow, I can only tell you that you can use NiFi's Expression Language to extract the current time and send it into your stream. The current time can be retrieved, for example, like this: ${now():toNumber():format('yyyy-MM-dd')} Next, you can use an UpdateRecord processor, add the attribute to your newly defined column and send it on for further processing. More about NiFi's Expression Language: https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
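As a concrete sketch (assuming a hypothetical target column named /ingest_time that already exists in your schema), UpdateRecord could stamp the current time into every record like this:

```
Replacement Value Strategy:  Literal Value
/ingest_time:                ${now():format('yyyy-MM-dd')}
```

The dynamic property name (/ingest_time) is the RecordPath of the column to populate, and its value is evaluated for each record.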
04-18-2023
02:32 AM
1 Kudo
hi @databoi, It would help if you could also provide your NiFi version, as each version has its own perks and twitches. What you have experienced so far can have plenty of root causes and it is not easy to debug 😞 I assume that this happens on a single node only, all the time, right? Something similar happened to me as well and it was not easy to fix ... or at least it was not for me. My problem was mostly related to how I configured the NiFi cluster. I have been told that there are some best practices when it comes to configuring NiFi, especially on a bare-metal machine:
- Store the repositories (content, flowfile and provenance) on separate drives with high I/O, as NiFi uses these repositories heavily to persist data. (nifi.properties)
- Assign no more than 40% of your node's RAM to your heap configs. (bootstrap.conf)
- Make sure that your open files and max user processes limits are set to a higher value than normal.
- Set the correct number of threads (2-4 times the number of cores of your server).
There were three problems on my side and the solution was as follows:
- I moved the repositories to a different drive (an SSD) with high I/O, so content could be read and written faster.
- I increased the open files and max user processes limits to 50000 and 10000, and I will increase them again in a couple of days.
- My third problem was the disk hardware: the disk was dying and started to malfunction, causing these stop-the-world delays. I replaced it and everything went back to normal.
You should also pay attention to the JVM memory of that particular node. In addition, you could activate debug mode and even generate some dumps for further analysis (./nifi.sh dump > <name of your dump file>). Another thing to check is the processes on the affected node: maybe something is causing NiFi to become a zombie process (or you have some zombie processes) that affects your overall performance. I do hope that something in this message leads you to your root cause. In any case, I strongly recommend taking into consideration other opinions as well, from other community members with far more experience than myself.
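To make the best-practice list above more concrete, here is a sketch of the relevant settings (all paths and values are examples, not recommendations; adjust them to your hardware):

```
# nifi.properties -- each repository on its own fast drive
nifi.flowfile.repository.directory=/data1/flowfile_repository
nifi.content.repository.directory.default=/data2/content_repository
nifi.provenance.repository.directory.default=/data3/provenance_repository

# bootstrap.conf -- heap well below 40% of the node's RAM
java.arg.2=-Xms16g
java.arg.3=-Xmx16g

# /etc/security/limits.conf -- raise limits for the user running NiFi
nifi  soft  nofile  50000
nifi  hard  nofile  50000
nifi  soft  nproc   10000
nifi  hard  nproc   10000
```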
04-18-2023
01:31 AM
hi @MKothari, Are you certain that your messages get transferred successfully? Based on your screenshot, I can see that no FlowFiles came IN during the past 5 minutes, meaning that your flow is not running, even though everything is green and you have no errors in your GUI. I assume that the flowfile format matches whatever you have configured in your Kafka. Have a look within PublishKafka and see whether a strange scheduling setting is causing the number to increase instead of decrease. Another thing you might try: stop both EvaluateJsonPath and PublishKafka, right-click on your matched queue, open List Queue and see whether there are any flowfiles there and, if so, on which node. Next, you can go to that node and check the log file for possible error messages.
04-18-2023
12:52 AM
hi @nisha2112, Are you certain that your schema is correct? I do not have too much experience with the ConfluentSchemaRegistry, but I think that you might have altered your schema, either when inserting it into the registry or when exporting it out of the registry. What I recommend you do is:
- Retrieve the schema (as-is) and check whether it is correct. If not, you know what to do. If it is correct, proceed to the next point.
- Within your ConvertRecord, modify both your Reader and your Writer to use the Schema Text property, where you manually define your schema. This will tell you one of two things: 1) the data coming into ConvertRecord is not in a correct format (ConvertRecord will fail), or 2) your schema gets extracted incorrectly from your ConfluentSchemaRegistry (the flow will work and you will have no error).
Once you have done the test, you will know where the error is located and you can try debugging it further. For example, you can extract your schema from ConfluentSchemaRegistry and see whether it comes out as expected. Or, if your data is incorrect, you can check whether something changed in your source and modify the data or your schema. There are plenty of possibilities and you have to start somewhere 🙂
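As an illustration (the field names are made up; use your actual schema), the Schema Text property of both the Reader and the Writer would hold a plain Avro schema such as:

```
{
  "type": "record",
  "name": "my_record",
  "fields": [
    { "name": "id",   "type": "int" },
    { "name": "name", "type": ["null", "string"], "default": null }
  ]
}
```

Remember to also switch the Schema Access Strategy of the Reader and Writer to use the schema text instead of the registry; otherwise the property is ignored.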
04-13-2023
09:01 AM
Hi @kspyropoulos, I would honestly start by asking which NiFi you are currently running (open source or Cloudera), and which version. I know the questions might sound silly, but each version has (or does not have) certain features. Next, I would ask whether the table you are inserting into has the same structure (same column types) and whether you are using the correct Avro schema. I reproduced your flow as follows:

```sql
CREATE TABLE test_voice (
  column1 INT,
  column2 varchar(255),
  column3 bytea
);
```

In NiFi (1.19.1) I set up an ExecuteSQL processor configured as follows: a simple DBCP Connection Pool pointing to my PostgreSQL database, a simple select * from test_voice as the SQL select query and, very importantly, Use Avro Logical Types = true. After executing the flow, I get a single flowfile with a single row in the success queue (because I inserted only one row for my test):

```sql
insert into test_voice(column1,column2,column3)
values(1,'hello','\x4920616d206261636b20616761696e2e2e2e');
```

(In terms of schema, see the screenshot in the original post.) So far so good. Now we update the current row in the PostgreSQL database from column1=1 to column1=2, so that we can check whether the insert took place:

```sql
update test_voice set column1='2' where column1='1';
```

Next, using PutDatabaseRecord, we insert the row into our database. For PutDatabaseRecord, I configured the following: Record Reader = Avro Reader with Inherit Record Schema, Database Type = PostgreSQL, Statement Type = INSERT, Database Connection Pooling Service = the one used in ExecuteSQL, and Catalog Name, Schema Name and Table Name taken from PostgreSQL. Everything else was left at its default. Once I executed the flow, the row was inserted into the DB. So I can tell you that PutDatabaseRecord works just fine. Unfortunately for you, it seems that your problem is located somewhere else ... my bet is on either the Avro schema or the table you are trying to insert into 🙂
04-12-2023
07:58 AM
Thank you @MattWho, it worked like a charm. You are a life saver 🙂 I did not even consider the nanoseconds and I did not really know about the EL functions for the Java DateTimeFormatter. Nevertheless, if somebody else encounters a similar issue, here is the link to the documentation --> here. One more question though, if possible. When saving the data into the PostgreSQL database using PutDatabaseRecord (JSON as Reader), the value "2023-04-10 07:43:15.794" immediately gets truncated to "2023-04-10 07:43:15" --> basically everything after the point is removed. In PostgreSQL, the column is defined as "timestamp without time zone" with a precision of 6.
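In case it helps anyone reproducing this, a quick way to check whether the milliseconds are really gone from the table or merely hidden by the client display (table and column names here are placeholders) is:

```sql
-- Returns the seconds field including the fractional part, in milliseconds
-- (e.g. 15794 for a stored value of ...07:43:15.794)
SELECT my_timestamp_column,
       EXTRACT(MILLISECONDS FROM my_timestamp_column) AS ms
FROM my_table
ORDER BY my_timestamp_column DESC
LIMIT 5;
```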
04-12-2023
06:51 AM
1 Kudo
hi @moahmedhassaan, Regarding CDC in NiFi for an Oracle instance, I highly recommend the following article, as it guides you step by step through a partial CDC covering only UPDATE and INSERT: https://murtazak.medium.com/mimic-an-upsert-in-oracle-using-nifi-bb112dc1d6ab This solution will not work for DELETE, but based on that example you can create your own. It will, however, eat lots of resources if not configured properly.
04-12-2023
02:55 AM
@MattWho, @steven-matison: I would really appreciate your input, as I am struggling with this and I do not know how to solve it or what to check and try next 😞 I tried to replace ConvertRecord with an UpdateRecord where I update my column /TimeStamp using the EL ${field.value:toDate("yyyy-MM-dd'T'HH:mm:ss.SSSXXX"):format('yyyy-MM-dd HH:mm:ss.SSS')}. Unfortunately, the same result: a new datetime is generated 😞