Member since: 01-05-2017
153 Posts
10 Kudos Received
2 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4484 | 02-20-2018 07:40 PM |
| | 3306 | 05-04-2017 06:46 PM |
05-04-2017
06:24 PM
Are you referring to the string used to separate the date in PutHDFS?

`/topics/minifitest/${allAttributes("syslog_year", "syslog_month", "syslog_day", "syslog_hour"):join("/")}`

Here is an example of our date format: `2017-05-04 17:15:14,655`. We split 2017 into `syslog_year`, 05 into `syslog_month`, 04 into `syslog_day`, 17 into `syslog_hour`, 15 into `syslog_minute`, and so on. Ultimately we use this string to generate the filename:

`${allAttributes("syslog_year", "syslog_month", "syslog_day", "syslog_hour", "syslog_minute"):join("_")}`

Everything parses into the correct directories, but files spanning three minutes end up in three correctly named folders (see screenshot) with filenames that are missing the minute portion...
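For readers unfamiliar with the Expression Language calls above, here is a minimal Python sketch (not NiFi itself) of what the two joins evaluate to for one event, assuming every `syslog_*` attribute was extracted successfully:

```python
# Attribute values as extracted from the event "2017-05-04 17:15:14,655".
attrs = {
    "syslog_year": "2017",
    "syslog_month": "05",
    "syslog_day": "04",
    "syslog_hour": "17",
    "syslog_minute": "15",
}

# Equivalent of the directory expression: join year/month/day/hour with "/".
directory = "/topics/minifitest/" + "/".join(
    attrs[k] for k in ("syslog_year", "syslog_month", "syslog_day", "syslog_hour")
)

# Equivalent of the filename expression: join all five parts with "_".
filename = "_".join(
    attrs[k]
    for k in ("syslog_year", "syslog_month", "syslog_day", "syslog_hour", "syslog_minute")
)

print(directory)  # /topics/minifitest/2017/05/04/17
print(filename)   # 2017_05_04_17_15
```

If any attribute in the list is empty, `join` simply concatenates an empty string in its place, which would produce exactly the kind of truncated filename described.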
05-04-2017
06:18 PM
Thanks. Using an external script seems worse in terms of processing time than regex would be, and while a custom Java processor seems appealing, I don't believe that's the direction we wish to go.
05-04-2017
04:55 PM
Thanks Matt, that's a better solution, but still using regex I guess. I'm guessing it's impossible to get the timestamp from the log without a regex expression; it seems your regex is better than mine. I will still need to follow up with the substring functions of the NiFi Expression Language, because I need the month, day, year, hour, etc. saved into different attributes to correctly form my files' filenames and the directory in HDFS.
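One way to avoid the follow-up substring step entirely is to capture each piece in the regex itself. A Python sketch with named groups (the group names here are my own, not from the thread):

```python
import re

# Hypothetical pattern capturing each timestamp component in one pass,
# so each piece can become its own attribute without further substringing.
pattern = re.compile(
    r"^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2}) "
    r"(?P<hour>\d{2}):(?P<minute>\d{2})"
)

m = pattern.match("2017-05-04 17:15:14,655 some log message")
parts = m.groupdict()
# parts == {"year": "2017", "month": "05", "day": "04", "hour": "17", "minute": "15"}
```

In NiFi, ExtractText similarly maps each capture group to its own flowfile attribute, so a single multi-group pattern can replace several substring expressions.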
05-04-2017
04:40 PM
So you are saying there is no way to extract a Timestamp from the content of a flow file without using Regex, correct?
05-04-2017
04:28 PM
1 Kudo
Is there a way to use the NiFi Expression Language instead of traditional regex to get the timestamp from an event of this format? `2017-05-04 14:43:17,302 foo bar foo bar` I would assume I'd need something that could find the second whitespace, take everything before it, and save it as an attribute; then I could easily substring that attribute into its parts.
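The approach described (find the second whitespace, keep the prefix, then substring by fixed offsets) can be sketched in Python like this, assuming the timestamp format is fixed-width as in the example:

```python
event = "2017-05-04 14:43:17,302 foo bar foo bar"

# Find the second whitespace and keep everything before it,
# mirroring the "take everything before the second space" idea above.
first = event.index(" ")
second = event.index(" ", first + 1)
timestamp = event[:second]  # "2017-05-04 14:43:17,302"

# Because the format is fixed-width, each part sits at a known offset.
year, month, day = timestamp[0:4], timestamp[5:7], timestamp[8:10]
hour, minute = timestamp[11:13], timestamp[14:16]
```

The fixed-offset slicing corresponds to the Expression Language's `substring` function, which takes the same start/end index arguments.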
Labels:
- Apache NiFi
05-03-2017
07:42 PM
Thanks for the idea. I corrected that but am still seeing the same behavior.
05-03-2017
04:56 PM
So here is our setup.

Server 1: TailFile -> PublishKafka

Server 2: ConsumeKafka -> ExtractText -> UpdateAttribute -> MergeContent -> UpdateAttribute (create filename) -> PutHDFS

We currently have it set up to parse the timestamp out of the files and save the pieces as attributes using the ExtractText processor, so we can create our filename and HDFS directories from those attributes in this format:

- May_03_16_39 (May 3rd at 16:39)
- May_03_16_40 (May 3rd at 16:40)
- May_03_16_41 (May 3rd at 16:41)

Our directory structure goes down to the minute: 2017/May/03/16/39

What we see is that during file rollover, a few seconds of data from the end of one file and a few seconds from the beginning of the next file end up in a file called: May_03_16_

Please see the screenshots of the file structure output, the PutHDFS config, and the UpdateAttribute (create filename) config; if anything else would help, let me know. We are using the append function of PutHDFS to put all records from the same minute into a single file.
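A small Python sketch of one way the truncated `May_03_16_` name could arise (this is my illustration, not a confirmed diagnosis): if the minute attribute is empty for some flowfiles, e.g. because the extraction pattern did not match a partial line read during rollover, the join still runs and simply drops the final chunk:

```python
# Hypothetical filename builder mirroring the join of month/day/hour/minute
# attributes in the UpdateAttribute (create filename) step.
def make_filename(month: str, day: str, hour: str, minute: str) -> str:
    return "_".join([month, day, hour, minute])

print(make_filename("May", "03", "16", "39"))  # May_03_16_39
print(make_filename("May", "03", "16", ""))    # May_03_16_  (empty minute)
```

Checking whether the minute attribute is empty on the affected flowfiles (e.g. with data provenance) would confirm or rule this out.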
Labels:
- Apache Hadoop
- Apache Kafka
- Apache NiFi
04-26-2017
03:52 PM
I restarted Kafka and now it seems to be working. Anyone know why this might be happening?
04-26-2017
03:01 PM
Hello, I am using the command line to create a Kafka topic, and when I do, I don't see the partition directory for its logs created in the kafka-logs directory. I am using this command:

`./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic dataloss30-1`

It then says that the topic was successfully created. I can see the topic when I run commands against it, such as:

`./bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic dataloss30-1 --from-beginning`

I can even see the data in the topic itself from the command line, but when I go to /kafka-logs I do not see a directory for the topic. This is an issue because when I try to use NiFi's ConsumeKafka with it, it cannot find the topic. When I let NiFi create a topic through auto-generated topics, the partition directory is created. Has anyone else experienced anything like this, or have any ideas as to why it's happening?
Labels:
- Apache Kafka
- Apache NiFi
04-24-2017
05:52 PM
Thanks, Matt. I'm having a data loss issue I cannot figure out, and this clarified that the files not being in the processor's queue isn't the culprit...