Extract text using Nifi
Labels: Apache NiFi
Created 08-01-2017 09:56 AM
Hi,
I have streaming data (GetHDFS runs continuously) that contains a number of lines.
For example:
<start>this is 123_@":text coming from [hdfs file="/a/b/c" and' the; '''', "", file is streamed. The location=["/location"] and log is some.log"] linedelimited.
A file will contain a stream of lines like the one above.
I need to extract the following text from that message:
[hdfs file="/a/b/c" and' the; '''', "", file is streamed. The location=["/location"] and log is some.log"]
I tried using an ExtractText processor with a custom property:
extract: ([hdfs.*log"]).
I tried the above in a Java regex evaluator and it showed the correct text being extracted, but when I run the flow, the output contains the complete text.
expected: [hdfs file="/a/b/c" and' the; '''', "", file is streamed. The location=["/location"] and log is some.log"]
actual: <start>this is 123_@":text coming from [hdfs file="/a/b/c" and' the; '''', "", file is streamed. The location=["/location"] and log is some.log"] linedelimited.
Please help me correct the regex so that it extracts the right text.
Created 08-01-2017 03:00 PM
Your Java regular expression needs to escape "[" and "]", since they have a reserved meaning in Java regular expressions (unescaped, they delimit a character class).
Try the following Java regular expression instead:
(\[hdfs.*log"\])
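To illustrate the difference, here is a small standalone Java sketch using `java.util.regex` directly (outside NiFi). The class and method names are hypothetical. Unescaped, `[hdfs.*log"]` is a character class that matches just one character from the set h, d, f, s, ., *, l, o, g, "; escaped, the brackets are literal and the greedy `.*` runs up to the last `log"]` on the line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketRegexDemo {
    // Returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "<start>this is 123_@\":text coming from [hdfs file=\"/a/b/c\" and' the; '''', \"\", file is streamed. The location=[\"/location\"] and log is some.log\"] linedelimited.";

        // Unescaped: a one-character class; the first member found is the 's' in "<start>".
        System.out.println(firstGroup("([hdfs.*log\"])", line)); // prints: s

        // Escaped: literal '[' ... '"]', greedy .* stops at the last log"] in the line.
        System.out.println(firstGroup("(\\[hdfs.*log\"\\])", line));
    }
}
```

Note that in a Java string literal each backslash must itself be doubled; in the NiFi property field you type the pattern exactly as shown above, with single backslashes.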
Thanks,
Matt
Created 08-01-2017 03:28 PM
@Matt Clarke I have used your suggestion, but the result is the same: it fetches the complete line instead of [hdfs....... .log"]
For clarification, here are the steps I am following:
1. GetHDFS
2. SplitText: Line Split Count = 1
3. ExtractText:
- (\[hdfs.*log"\])
4. UpdateAttribute
5. PutHDFS
I am not sure why it is pulling the complete line.
Thanks
Created on 08-01-2017 04:31 PM - edited 08-17-2019 10:17 PM
The ExtractText processor extracts the text that matches your regex and assigns it to a FlowFile attribute named after the property. The content of the FlowFile remains unchanged. You then update a FlowFile attribute and finally use PutHDFS to write the content (which at this point you have not changed at all) to HDFS.
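A rough way to picture this behavior is the plain-Java model below. It is not the NiFi API: the `FlowFile` record and `extractText` method are hypothetical stand-ins that only mimic the idea that a match is copied into an attribute while the content passes through untouched (requires Java 16+ for records).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractTextModel {
    // Hypothetical stand-in for a FlowFile: content plus attributes.
    record FlowFile(String content, Map<String, String> attributes) {}

    // Models ExtractText: on a match, capture group 1 goes into an attribute
    // named after the property. Assumes the regex has at least one group.
    // The content is passed through unchanged.
    static FlowFile extractText(FlowFile in, String propertyName, String regex) {
        Map<String, String> attrs = new HashMap<>(in.attributes());
        Matcher m = Pattern.compile(regex).matcher(in.content());
        if (m.find()) {
            attrs.put(propertyName, m.group(1));
        }
        return new FlowFile(in.content(), attrs);
    }

    public static void main(String[] args) {
        String line = "<start>this is 123_@\":text coming from [hdfs file=\"/a/b/c\" and' the; '''', \"\", file is streamed. The location=[\"/location\"] and log is some.log\"] linedelimited.";
        FlowFile out = extractText(new FlowFile(line, Map.of()), "extract", "(\\[hdfs.*log\"\\])");
        // The content is still the full line; only the attribute holds the match.
        System.out.println("content:   " + out.content());
        System.out.println("attribute: " + out.attributes().get("extract"));
    }
}
```

This is why PutHDFS, which writes content rather than attributes, still writes the complete line.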
If your intent is to write the modified string to HDFS, you need to update the actual content of the FlowFile, not just create and modify attributes. For that use case, use the ReplaceText processor instead.
You would configure ReplaceText similar to the following:
The above will result in the actual content of the FlowFile being changed to:
[hdfs file="/a/b/c" and' the; '''', "", file is streamed. The location=["/location"] and log is some.log"]
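The configuration screenshot from this post did not survive. As a hedged illustration only, the Java sketch below shows one regex replacement that produces the result above, assuming a ReplaceText setup with the "Regex Replace" replacement strategy, a search value such as `.*(\[hdfs.*log"\]).*`, and `$1` as the replacement value. These settings are an assumption, not necessarily what was shown; the class and constant names are hypothetical.

```java
public class ReplaceTextSketch {
    // Assumed search value: match the whole line, capturing only the wanted
    // bracketed text in group 1. The outer .* consume the surrounding text.
    static final String SEARCH = ".*(\\[hdfs.*log\"\\]).*";
    // Assumed replacement value: keep only capture group 1.
    static final String REPLACEMENT = "$1";

    static String replaceContent(String content) {
        return content.replaceAll(SEARCH, REPLACEMENT);
    }

    public static void main(String[] args) {
        String line = "<start>this is 123_@\":text coming from [hdfs file=\"/a/b/c\" and' the; '''', \"\", file is streamed. The location=[\"/location\"] and log is some.log\"] linedelimited.";
        System.out.println(replaceContent(line));
    }
}
```

Because `.` does not match line terminators by default, this pattern works line by line, which fits the flow above where SplitText has already produced one line per FlowFile.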
Thanks,
Matt
Created 08-02-2017 09:24 AM
@Matt Clarke Hi Matt,
I followed your suggestion and got the expected text.
As I am new to NiFi, I still have a lot to learn, and your suggestions helped me. Thank you.
Created 08-02-2017 11:24 AM
Also, I need some help; I would be thankful if you could guide me.
I have a file in HDFS containing many fields that I want to put into Hive.
For example:
---------------------------------------------------------------------------------
Text in HDFS:
"These are the attributes to save in hive _source="/a/b/c" _destination="/a/b/d" - - _ip="a.b.c.d" text="hive should save these attributes in different columns"".
I created an external table in Hive with the columns:
| source | destination | ip | text |
I want to extract the key-value pairs from the above text in HDFS and put each value into its respective Hive column.
---------------------------------------------------------------------------------
The HDFS file contains a series of such lines, and the fields do not always appear in the same order (source, destination, etc.).
Any suggestions?
Thank you
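One possible starting point (not taken from this thread): because the fields appear in varying order, a parser can collect every `key="value"` pair by name instead of relying on position. The plain-Java sketch below assumes values never contain embedded double quotes and that a leading underscore on a key should be dropped to get the column name; the class and method names are hypothetical, and loading the result into Hive is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueSketch {
    // Assumed pair format: word key, '=', then a double-quoted value with
    // no embedded quotes, e.g. _ip="a.b.c.d".
    private static final Pattern PAIR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    // Collects all pairs by key, so field order within the line does not matter.
    static Map<String, String> parsePairs(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            // Strip a leading underscore so "_source" maps to column "source".
            String key = m.group(1).replaceFirst("^_", "");
            fields.put(key, m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"These are the attributes to save in hive _source=\"/a/b/c\" _destination=\"/a/b/d\" - - _ip=\"a.b.c.d\" text=\"hive should save these attributes in different columns\"\".";
        Map<String, String> f = parsePairs(line);
        // Print values in the Hive table's column order.
        for (String col : new String[] {"source", "destination", "ip", "text"}) {
            System.out.println(col + " = " + f.get(col));
        }
    }
}
```

A missing field simply comes back as `null` from the map, which could be written as an empty or NULL column value.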
Created 08-02-2017 01:15 PM
Please start a new question rather than asking multiple unrelated questions in a single post. This makes it easier for community users to find similar issues.
It also helps other members identify unanswered questions so they can address them. This question would likely go unnoticed otherwise.
I would need to do some investigation to come up with a good solution, but other community members may have already handled this exact scenario. By starting a new question, all members following the "data-processing", "nifi-processor", or "nifi-streaming" tags will be notified of your question.
Thanks,
Matt
Created 08-02-2017 01:24 PM