Member since
07-14-2017
99
Posts
5
Kudos Received
4
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1421 | 09-05-2018 09:58 AM |
 | 1913 | 07-31-2018 12:59 PM |
 | 1407 | 01-15-2018 12:07 PM |
 | 1312 | 11-23-2017 04:19 PM |
08-02-2017
01:50 PM
1 Kudo
Hi,

I have a stream of data coming into HDFS, and I want to store it in Hive.

---------------------------------------------------------------------------------------
Sample data (each record is a single line with multiple attributes):

sample=data1 _source="/s/o/u" destination="/d/e/s" _ip="0.0.0.0" timestamp=20170802 10:00:00 text="sometext_with$spec_char"
sample=data2 destination="/d/e/s" _ip="0.0.0.0" timestamp=20170802 10:00:00 text="sometext_with$spec_char" _source="/s/o/u" technology="r"o"b"ust"
sample=data3 _ip="0.0.0.0" timestamp=20170802 10:00:00destination="/d/e/s" text="sometext_with$spec_char" _source="/s/o/u"
---------------------------------------------------------------------------------------

Problems with the data:

1. The attributes do not follow the same order (sample_data1 has source, destination, timestamp, text; sample_data2 has destination, timestamp, text, source, etc.).
2. The attribute names do not follow the same convention: some start with "_" and some do not (_source, destination, _ip, timestamp, text, etc.).
3. The set of attributes is not fixed (sample_data1 has source, destination, timestamp, text; sample_data2 has destination, _ip, timestamp, text, source and technology).

Expected result in Hive:

sample | source | destination | ip | text | technology |
---|---|---|---|---|---|
data1 | a/b/c | /d/e/s | 0.0.0.0 | sometext_with$spec_char | NULL |
data2 | a/b/c | /d/e/s | 0.0.0.0 | sometext_with$spec_char | r"o"b"ust |
data3 | a/b/c | /d/e/s | 0.0.0.0 | sometext_with$spec_char | NULL |

Thanks for your support
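One possible way to handle the three problems listed in this post (varying order, optional "_" prefix, optional attributes) is to locate the known attribute names in each line and treat everything up to the next attribute name as the value. The sketch below is only an illustration in plain Python, not a NiFi or Hive feature. The names COLUMNS, parse_line and to_delimited are hypothetical, the column list is assumed from the sample data (timestamp is included only because it appears in the input), and a value that itself contains something like text= would be split incorrectly.

```python
import re

# Assumed target columns, taken from the sample data in the post.
COLUMNS = ["sample", "source", "destination", "ip", "timestamp", "text", "technology"]

# A key is a known column name, optionally prefixed with "_", followed by "=".
KEY_RE = re.compile(r"_?(" + "|".join(COLUMNS) + r")=")


def parse_line(line):
    """Return a dict with one entry per column; absent attributes map to None."""
    row = dict.fromkeys(COLUMNS)
    matches = list(KEY_RE.finditer(line))
    for i, m in enumerate(matches):
        # The value runs from just after "=" to the start of the next key (or end of line).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
        value = line[m.end():end].strip()
        # Drop one pair of enclosing quotes, keeping any quotes inside the value.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        row[m.group(1)] = value
    return row


def to_delimited(row, sep="\t"):
    # Missing values are written as \N, Hive's default NULL marker for text tables.
    return sep.join(r"\N" if row[c] is None else row[c] for c in COLUMNS)


if __name__ == "__main__":
    line = ('sample=data2 destination="/d/e/s" _ip="0.0.0.0" '
            'timestamp=20170802 10:00:00 text="sometext_with$spec_char" '
            '_source="/s/o/u" technology="r"o"b"ust"')
    print(to_delimited(parse_line(line)))
```

Because keys are found by name rather than by position, the order of attributes in a line does not matter, and attributes missing from a line simply stay NULL in the output row.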
Labels:
- Apache NiFi
08-02-2017
11:24 AM
@Matt Clarke Also, I need some help and would be thankful if you could guide me. I have a file in HDFS with a lot of fields that I want to put into Hive. For example:

---------------------------------------------------------------------------------
Text in HDFS:

"These are the attributes to save in hive _source="/a/b/c" _destination="/a/b/d" - - _ip="a.b.c.d" text="hive should save these attributes in different columns""

I made an external table in Hive with these columns:

| source | destination | ip | text |

I want to get the key-value pairs from the above text in HDFS and place them in the respective Hive columns.
---------------------------------------------------------------------------------

The HDFS file contains a series of such lines. The attributes are unordered and do not always appear in the same order of source, destination, etc. Any suggestions?

Thank you
08-02-2017
09:24 AM
@Matt Clarke Hi Matt, I followed your suggestion and got the expected text. As I am new to NiFi, I still need to learn more, and your suggestions helped me. Thank you.
08-02-2017
09:21 AM
@Wynner I have replaced the RouteOnContent processor but kept the parameters the same. Surprisingly, it now works pretty fast (seconds). I am not sure why the old one was not working. Thanks for your extended support.
08-01-2017
03:28 PM
@Matt Clarke I have used your suggestion, but the result is the same: it fetches the complete line instead of [hdfs....... .log"]. For clarification, these are the steps I am following:

1. GetHDFS
2. SplitText: count 1
3. ExtractText: (\[hdfs.*log"\])
4. UpdateAttribute
5. PutHDFS

I am not sure why it is pulling the complete line. Thanks
08-01-2017
09:56 AM
Hi,

I have stream data (GetHDFS will be running continuously) which contains a number of lines. For example:

<start>this is 123_@":text coming from [hdfs file="/a/b/c" and' the; '''', "", file is streamed. The location=["/location"] and log is some.log"] linedelimited.

The file will contain a stream of lines like the one above. I have to extract this text from the message:

[hdfs file="/a/b/c" and' the; '''', "", file is streamed. The location=["/location"] and log is some.log"]

I tried using an ExtractText processor with a custom property named extract: ([hdfs.*log"]). I tried the same expression in a Java regex evaluator, and it shows the correct text extracted. But when I run the flow, the output contains the complete text.

expected: [hdfs file="/a/b/c" and' the; '''', "", file is streamed. The location=["/location"] and log is some.log"]

actual: <start>this is 123_@":text coming from [hdfs file="/a/b/c" and' the; '''', "", file is streamed. The location=["/location"] and log is some.log"] linedelimited.

Please help me correct the regex to extract the right text.
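As a side note on the expression in this post: written without backslashes, [hdfs.*log"] is a character class that matches a single character, whereas \[hdfs.*log"\] matches the literal brackets and everything between them. The short Python sketch below only illustrates that difference on the sample message; Python and Java agree on these two patterns, but the helper name extract is hypothetical and nothing here reproduces NiFi itself. Separately, as far as the NiFi documentation describes it, ExtractText stores capture groups in FlowFile attributes and leaves the FlowFile content unchanged, which may be why the flow's output still shows the complete line.

```python
import re

MESSAGE = (
    '<start>this is 123_@":text coming from '
    "[hdfs file=\"/a/b/c\" and' the; '''', \"\", file is streamed. "
    'The location=["/location"] and log is some.log"] linedelimited.'
)

# Unescaped brackets: a character class matching ONE of h, d, f, s, ., *, l, o, g, "
CHAR_CLASS = re.compile(r'([hdfs.*log"])')

# Escaped brackets: a literal "[hdfs", any text, then a literal 'log"]'
ESCAPED = re.compile(r'(\[hdfs.*log"\])')


def extract(pattern, text):
    """Return the first capture group of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(repr(extract(CHAR_CLASS, MESSAGE)))  # a single character
    print(repr(extract(ESCAPED, MESSAGE)))     # the bracketed span
```

Note that .* is greedy, so if a line contained log"] more than once, the escaped pattern would run to the last occurrence.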
Labels:
- Apache NiFi
07-31-2017
04:56 AM
I have changed it to 4 concurrent tasks and a run duration of 2s. For 50k messages it took almost 3 hours, which I never expected.

For example, a message looks like this:

this_is_an_example_message <1> [some_"text_and_digits_here"_number="121212"] [some_text_here]

The other 50k messages are similar.

RouteOnContent configuration:

Scheduling:
- Concurrent tasks: 4
- Run Schedule: 2s

Properties:
- Match Requirement: content must contain match
- Character Set: UTF-8
- Content Buffer Size: 1MB
- txt: number="121212"

UpdateAttribute: filename updated here
PutHDFS: configurations and path updated here

Thanks in advance
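To make the routing rule in this post concrete: with "content must contain match", the txt property amounts to a regular-expression search inside each message. The following is a minimal Python sketch of that check on its own, assuming the property value number="121212" from the post. The function name route is hypothetical, and the code says nothing about how NiFi schedules or executes the processor.

```python
import re

# Assumed from the post: the value of the "txt" property.
PATTERN = re.compile(r'number="121212"')


def route(messages):
    """Split messages into those that contain the pattern and those that do not."""
    matched, unmatched = [], []
    for msg in messages:
        # search() succeeds if the pattern occurs anywhere in the message
        (matched if PATTERN.search(msg) else unmatched).append(msg)
    return matched, unmatched


if __name__ == "__main__":
    sample = ('this_is_an_example_message <1> '
              '[some_"text_and_digits_here"_number="121212"] [some_text_here]')
    matched, unmatched = route([sample, "another message without the number"])
    print(len(matched), len(unmatched))
```

Running a check like this outside the flow on a representative batch can help tell whether the time is spent on matching itself or elsewhere in the flow.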
07-27-2017
02:36 PM
I tried changing the concurrent tasks to 100 (for testing) and tested with 1k messages; it took 11 minutes to complete. Any suggestions, please?
07-27-2017
02:32 PM
Typically each message from the split content processor is <= 3KB, and concurrent tasks is set to 1. Also, more than 50,000 messages will be received every second, split, and sent to the RouteOnContent processor. I tested it with 50k messages: up to RouteOnContent it takes just 2-3 seconds, but after that it takes almost 3 hours! I will increase the number of concurrent tasks and see if this helps improve the performance.