- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Extract text and Replace text processors regex
- Labels:
-
Apache NiFi
Created ‎08-03-2017 10:58 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have below data in hdfs
a="alphabet_123_a" b="alphabetb" c="alphabet"c" is third one"
b="newb" d="alphabet@/d" a="new a"
a="changed a", b="changed b" c="changed c" e="alphabet e"
My idea is:
1. Make a table in hive as orc, with columns a, b, c,d,e.
2. extract the attributes from the above data.
3. Mapping attributes according to column names in hive and storing them in hive.
4. in first line a,b,c; second line b,d,a; third line a,b,c,e
5. now after extracting all the lines and storing in hive, the values which are not present in lines (e.g. first line dont have "d" and "e"; second line dont have "c" and "e"; third line dont have "d") should be NULL, by the time they store in hive.
Approach
1. Table "details" is created with columns a,b,c,d,e
2. Extract text processor is configured with custom properties as
(a=)(.*?(?=\s\w+=|$)) --- [This will extract "alphabet_123a" in line 1 along with quotes(") at begening and ending of the values
(b=)(.*?(?=\s\w+=|$)) --- [This will extract "aphabetb" in line 1 along with quotes...)
3. I am confused in the replace text processor, as
1. how to remove double quotes?
2. insert NULL values if the corresponding column name is missing in the line?
3. how to generalize the replace text for search value?
Also let me know, how can I change the regex in extract text processor(if necessary)?
Please help me
Thanks
Created ‎11-23-2017 04:19 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have sorted it. closing the question.
Created ‎11-23-2017 04:19 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have sorted it. closing the question.
