
Extract text using NiFi

Expert Contributor

Hi,

I have streaming data (GetHDFS will be running continuously) that contains a number of lines.

e.g:

<start>this is 123_@":text coming from [hdfs file="/a/b/c" and' the; '''', "", file is streamed. The location=["/location"] and log is some.log"] linedelimited.

A stream of such lines will be in the file.

I have to extract the following text from the above message:

[hdfs file="/a/b/c" and' the; '''', "", file is streamed. The location=["/location"] and log is some.log"]

I tried using an ExtractText processor with the custom property:

extract: ([hdfs.*log"])

I tried the above in a Java regex evaluator and it shows the correct text extracted, but when I run the flow, the output gets the complete line.

expected: [hdfs file="/a/b/c" and' the; '''', "", file is streamed. The location=["/location"] and log is some.log"]

actual : <start>this is 123_@":text coming from [hdfs file="/a/b/c" and' the; '''', "", file is streamed. The location=["/location"] and log is some.log"] linedelimited.

Please help me correct the regex so it extracts the right text.

1 ACCEPTED SOLUTION

Super Mentor

@Hadoop User

Your Java regular expression needs to escape the "[" and "]" characters since they have reserved meaning in Java regular expressions.

Try using the following java regular expression instead:

(\[hdfs.*log"\])

Thanks,

Matt
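To illustrate why the escaping matters, here is a minimal sketch using Python's re module, which treats "[" and "]" the same way Java regex does (the sample line is taken from the question above):

```python
import re

# Sample line from the question above.
line = ('<start>this is 123_@":text coming from [hdfs file="/a/b/c" and\' '
        'the; \'\'\'\', "", file is streamed. The location=["/location"] '
        'and log is some.log"] linedelimited.')

# Without escaping, [hdfs.*log"] is a character class that matches a
# single character from the set {h, d, f, s, ., *, l, o, g, "}.
unescaped = re.search(r'([hdfs.*log"])', line)

# With \[ and \] the brackets are literal, so the whole bracketed
# substring is captured.
escaped = re.search(r'(\[hdfs.*log"\])', line)

print(len(unescaped.group(1)))  # matches just one character
print(escaped.group(1))         # the full [hdfs ... log"] substring
```

The greedy `.*` runs to the last `log"]` in the line, which is exactly the span the question asks for.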


7 REPLIES


Expert Contributor

@Matt Clarke I have used your suggestion, but the result is the same: it fetches the complete line instead of [hdfs....... .log"].

For clarification, these are the steps I am following:

1. GetHDFS

2. SplitText (Line Split Count = 1)

3. ExtractText with custom property: (\[hdfs.*log"\])

4. UpdateAttribute

5. PutHDFS

Not sure why it is pulling the complete line?

Thanks

Super Mentor

@Hadoop User

The ExtractText processor extracts the text that matches your regex and assigns it to a FlowFile attribute matching the property name. The content of the FlowFile remains unchanged. You then update a FlowFile attribute and finally use PutHDFS to write the content (which at this point you have not changed at all) to HDFS.

If your intent is to write the modified string to HDFS, you need to update the actual content of the FlowFile and not just create and modify attributes. For that use case, you would want to use the ReplaceText processor instead.

You would configure ReplaceText similar to the following:

[Screenshot: ReplaceText processor configuration (23384-screen-shot-2017-08-01-at-122929-pm.png)]

The above will result in the actual content of the FlowFile being changed to:

[hdfs file="/a/b/c" and' the; '''', "", file is streamed. The location=["/location"] and log is some.log"]
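The screenshot referenced above may no longer be available. A ReplaceText configuration along the following lines should produce that result (property names are from the standard NiFi ReplaceText processor; the exact values shown in the original screenshot are an assumption):

```
Search Value:          .*(\[hdfs.*log"\]).*
Replacement Value:     $1
Replacement Strategy:  Regex Replace
Evaluation Mode:       Entire text
```

With "Entire text" evaluation, the whole FlowFile content is replaced by the captured group, discarding the surrounding text.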

Thanks,

Matt

Expert Contributor

@Matt Clarke Hi Matt,

I have followed your suggestion and got the expected text.

As I am new to NiFi, I still have a lot to learn. Your suggestions helped me. Thank you.

Expert Contributor

@Matt Clarke

Also, I need some help; I would be thankful if you could guide me.

I have a file in HDFS which has a lot of fields that I want to put into Hive.

e.g:

---------------------------------------------------------------------------------

text in hdfs

"These are the attributes to save in hive _source="/a/b/c" _destination="/a/b/d" - - _ip="a.b.c.d" text="hive should save these attributes in different columns"".

I made an external table in Hive with columns

|source | destination | ip | text |

I want to get the key-value pairs from the above text in HDFS and place them into the respective Hive columns.

---------------------------------------------------------------------------------

The HDFS file contains a series of such lines; they are unordered, and the fields are not always in the same order (source, destination, etc.).

Any suggestions?

Thank you

Super Mentor

@Hadoop User

Please start a new question rather than asking multiple unrelated questions in a single post. This makes it easier for community users to find similar issues.


It also helps other members identify unanswered questions so they may address them. This question would likely go unnoticed otherwise.

I would need to do some investigation to come up with a good solution, but other community members may have already handled this exact scenario. By starting a new question, all members following the "data-processing", "nifi-processor", or "nifi-streaming" tags will be notified of your question.

Thanks,

Matt

Expert Contributor

@Matt Clarke

I will start a new question.

Thanks