Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4157 | 12-03-2018 02:26 PM |
| | 3109 | 10-16-2018 01:37 PM |
| | 4238 | 10-03-2018 06:34 PM |
| | 3067 | 09-05-2018 07:44 PM |
| | 2332 | 09-05-2018 07:31 PM |
04-05-2018
04:59 PM
This is not currently supported, but there is a JIRA for this issue: https://issues.apache.org/jira/browse/NIFI-4487. Part of the problem is that this would only make sense if you were consuming 1 message per flow file, which is generally poor for performance. So what do you do when you consume 10k messages into a single flow file? For ConsumeKafkaRecord, the timestamp could potentially be put into a field in each record, assuming the schema had a timestamp field, but for the regular ConsumeKafka there would be no way to handle it.
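As a rough sketch of what the ConsumeKafkaRecord case could look like, the record schema would need to reserve a field for the timestamp; the record and field names below are hypothetical, not an existing NiFi convention:

```json
{
  "type": "record",
  "name": "kafka_message",
  "fields": [
    {"name": "payload", "type": "string"},
    {"name": "kafka_timestamp", "type": ["null", {"type": "long", "logicalType": "timestamp-millis"}]}
  ]
}
```

The writer would then have to populate kafka_timestamp for every record as it is consumed, which is the part the JIRA above would need to define.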
04-04-2018
02:37 PM
Glad to hear it! Can you please mark this answer as "accepted"?
04-04-2018
02:36 PM
1 Kudo
The error is coming from trying to read the flow.xml.gz file, which is typically in the conf directory. It seems this file may be corrupted. If you don't care about your flow, you can delete this file to start fresh. If you do care about your flow, you'll need to figure out what happened to it. If it is still a valid gzip file, you should be able to run gunzip to get it back to a regular flow.xml, and then gzip to get it back to flow.xml.gz.
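For example, a quick round-trip to check whether the archive is still valid (assuming the default conf directory and file names):

```
cd nifi/conf
gunzip flow.xml.gz   # unpacks to flow.xml; fails with an error if the archive is corrupted
gzip flow.xml        # repacks it back to flow.xml.gz
```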
04-02-2018
05:21 PM
2 Kudos
Was able to make this work with a few modifications...

1) logicalTypes need to be represented as a sub-type, like this:

```json
{
  "name": "session_number",
  "type": {
    "type": "bytes",
    "logicalType": "decimal",
    "precision": 20
  }
}
```

Instead of:

```json
{
  "name": "session_number",
  "type": "bytes",
  "logicalType": "decimal",
  "precision": 20
}
```

2) You can't use "?" for missing values, because if the missing value is a number (or anything other than a string) then the reader will try to parse "?" into the given type and fail. To work around this, you can make those fields in the schema nullable by using a union of "null" and the real type.

3) Your input data has the timestamp in a string format, but the schema specifies it as timestamp-millis, so the CSV Reader needs its timestamp format property set to yyyy-MM-dd'T'HH:mm:ss'Z'

Here is the full schema with the changes mentioned in #1 and #2:

```json
{
  "type": "record",
  "name": "page_record",
  "fields": [
    {"name": "session_number", "type": {"type": "bytes", "logicalType": "decimal", "precision": 20}},
    {"name": "tracking_uuid", "type": "string"},
    {"name": "page_location_domain", "type": "string"},
    {"name": "page_title", "type": "string"},
    {"name": "referring_page_instance_id", "type": ["null", {"type": "bytes", "logicalType": "decimal", "precision": 20}]},
    {"name": "page_instance_id", "type": {"type": "bytes", "logicalType": "decimal", "precision": 20}},
    {"name": "event_timestamp", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    {"name": "page_location", "type": "string"},
    {"name": "attribution_sequence_in_session", "type": "long"},
    {"name": "page_sequence_in_session", "type": "long"},
    {"name": "page_sequence_in_attribution", "type": "long"},
    {"name": "top_level_window_id", "type": "string"},
    {"name": "profile_uuid", "type": ["null", "string"]}
  ]
}
```

And here is the input data with the question marks removed:

```
col_1;col_2;col_3;col_4;col_5;col_6;col_7;col_8;col_9;col_10;col_11;col_12;col_13
36531253;4787ea68-4276-4b3b-b154-d70419d23113;https://www.dummyexample.com;My Dummy website, description;;365311753;2018-01-02T07:08:40Z;https://www.dummyexample.com/fr/openspace/loggin?axes4=priv;1;1;1;_15148769143240.5030172901622478_;
```
03-29-2018
09:08 PM
Can you provide the full stacktrace from nifi-app.log for the above error?
03-29-2018
08:50 PM
1 Kudo
When NiFi refactored its security model between the 0.x and 1.x lines, templates were changed to be associated with the process group where the template was uploaded. This was done so that the template is protected by the same security policies as the process group where it was uploaded. Unfortunately, the "View Templates" capability is still accessed from the global menu, but it should really be accessed from the context palette on the left, based on the process group you are in.
03-13-2018
05:20 PM
A possible work-around might be to use a SegmentContent processor on the sending side before the RPG, and then a MergeContent processor in Defragment mode on the receiving side. This would break the file up into smaller chunks on the sending side (possibly improving performance through concurrent transfers) and then reassemble it on the receiving side.
03-12-2018
01:04 PM
Natively, through the standard Hadoop Java client.
03-09-2018
03:29 PM
1 Kudo
Matt is correct, and just to elaborate further... the Parquet API only allows writing through the Hadoop FileSystem API, so the only way NiFi can write Parquet to the local filesystem is through the configuration Matt mentioned above.
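That configuration is not shown here, but for reference, a common way to point the Hadoop client at the local filesystem is a minimal core-site.xml with fs.defaultFS set to file:/// and referenced from the processor's Hadoop Configuration Resources property. This is a sketch of that approach, not necessarily the exact configuration being referenced:

```xml
<?xml version="1.0"?>
<!-- Sketch (assumption): a core-site.xml that directs the Hadoop
     FileSystem client at the local filesystem instead of HDFS -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>file:///</value>
  </property>
</configuration>
```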
03-05-2018
02:30 PM
Please format the code so that it is readable.