Not able to understand the regexp_extract syntax

regexp_extract(col_value,'^(?:([^,]*),?){1}',1)

Re: Not able to understand the regexp_extract syntax

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

regexp_extract(string subject, string pattern, int index)

Returns the string extracted using the pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar'. Note that some care is necessary when using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. The 'index' parameter is the Java regex Matcher group() method index. See docs/api/java/util/regex/Matcher.html for more information on the 'index' or Java regex group() method.
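
As a rough illustration of how the index argument selects a group, here is a minimal sketch using Python's `re` module rather than Hive itself. It assumes that Python numbers groups the same way as Java's `Matcher.group()` for this pattern; the variable names are only for illustration.

```python
import re

# Same subject and pattern as the documentation example.
match = re.search(r'foo(.*?)(bar)', 'foothebar')

whole = match.group(0)   # index 0: the entire match
first = match.group(1)   # index 1: first capturing group, (.*?)
second = match.group(2)  # index 2: second capturing group, (bar)

print(whole, first, second)  # foothebar the bar
```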

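In your case it returns everything from the start of the string up to, but not including, the first comma. For example, if your text is "abc,def,geh", it returns "abc". The comma is matched by ,? but it sits outside the capturing group ([^,]*) that index 1 selects; index 0 (the whole match) would give "abc,".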
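
A quick way to check what the pattern from the question extracts is to run it outside Hive. The sketch below uses Python's `re` module as a stand-in; the `regexp_extract` helper is hypothetical (it is not Hive's implementation), and returning an empty string when nothing matches is an assumption.

```python
import re

def regexp_extract(subject, pattern, index):
    """Hypothetical stand-in for Hive's regexp_extract, built on Python's re."""
    match = re.search(pattern, subject)
    # Assumption: return an empty string when nothing matches.
    return match.group(index) if match else ''

pattern = r'^(?:([^,]*),?){1}'

first_field = regexp_extract('abc,def,geh', pattern, 1)  # capturing group only
whole_match = regexp_extract('abc,def,geh', pattern, 0)  # group plus the comma

# With {2} the group repeats twice and keeps its last capture.
second_field = regexp_extract('abc,def,geh', r'^(?:([^,]*),?){2}', 1)

print(first_field)   # abc
print(whole_match)   # abc,
print(second_field)  # def
```

Changing the number in braces is therefore one way to pick the n-th comma-separated field, since a repeated group keeps only its last capture.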

Hope this helps.

Re: Not able to understand the regexp_extract syntax

Hi Pierre,

Thanks for looking into my query. Yes, it is very clear to me now, except for one doubt.

I am not clear about ?: in my query and (.*?) in your example.

Sorry for asking about very basic things, but if you could give me a brief explanation, it would be helpful in writing other functions.

Regards

Sachin Mittal

Re: Not able to understand the regexp_extract syntax

I'd recommend having a look at this site: http://regexr.com/

You can enter your regular expression and then click on "Explain" (at the bottom) to get a complete explanation of the regular expression you entered. It also lets you test your regular expression against any text you want.
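
For the two constructs asked about, a minimal sketch may also help. It uses Python's `re` module as a stand-in, on the assumption that these constructs behave the same way in Java regexes for these inputs: `(?:...)` groups without capturing, and `.*?` matches as little as possible. The sample strings are made up for illustration.

```python
import re

text = 'abc,def'

# (...) captures, so the outer group takes index 1 and the inner one index 2.
capturing = re.search(r'^(([^,]*),?)', text)
# (?:...) only groups, so the inner group stays at index 1.
non_capturing = re.search(r'^(?:([^,]*),?)', text)

print(capturing.group(1), capturing.group(2))  # abc, abc
print(non_capturing.group(1))                  # abc

# .* is greedy (longest possible match); .*? is lazy (shortest possible match).
greedy = re.search(r'foo(.*)bar', 'foothebarbar').group(1)
lazy = re.search(r'foo(.*?)bar', 'foothebarbar').group(1)

print(greedy)  # thebar
print(lazy)    # the
```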

Hope this helps.

Re: Not able to understand the regexp_extract syntax

Hi Pierre,

Very nice of you.

Thanks a lot. I visited the site, and it cleared up most of my doubts.

Regards

Sachin Mittal
