Extract text and Replace text processors regex

mark_hadoop — Fri, 04 Aug 2017 05:58:12 GMT

I have below data in hdfs

a="alphabet_123_a" b="alphabetb" c="alphabet"c" is third one"

b="newb" d="alphabet@/d" a="new a"

a="changed a", b="changed b" c="changed c" e="alphabet e"

My idea is:

1. Make a table in hive as orc, with columns a, b, c,d,e.

2. extract the attributes from the above data.

3. Mapping attributes according to column names in hive and storing them in hive.

4. in first line a,b,c; second line b,d,a; third line a,b,c,e

5. now after extracting all the lines and storing in hive, the values which are not present in lines (e.g. first line dont have "d" and "e"; second line dont have "c" and "e"; third line dont have "d") should be NULL, by the time they store in hive.

Approach

1. Table "details" is created with columns a,b,c,d,e

2. Extract text processor is configured with custom properties as

(a=)(.*?(?=\s\w+=|$)) --- [This will extract "alphabet_123a" in line 1 along with quotes(") at begening and ending of the values

(b=)(.*?(?=\s\w+=|$)) --- [This will extract "aphabetb" in line 1 along with quotes...)

3. I am confused in the replace text processor, as

1. how to remove double quotes?

2. insert NULL values if the corresponding column name is missing in the line?

3. how to generalize the replace text for search value?

Also let me know, how can I change the regex in extract text processor(if necessary)?

Please help me

Thanks

Re: Extract text and Replace text processors regex

mark_hadoop — Fri, 24 Nov 2017 00:19:47 GMT

I have sorted it. closing the question.

question Re: Extract text and Replace text processors regex in Archives of Support Questions (Read Only)

Extract text and Replace text processors regex

Re: Extract text and Replace text processors regex