Not able to understand the regexp_extract syntax
- Labels: Hortonworks Data Platform (HDP)
Created ‎07-26-2016 07:06 AM
regexp_extract(col_value,'^(?:([^,]*),?){1}',1)
Created 07-26-2016 07:55 AM
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
regexp_extract(string subject, string pattern, int index)
Returns the string extracted using the pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar'. Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. The 'index' parameter is the Java regex Matcher group() method index. See docs/api/java/util/regex/Matcher.html for more information on the 'index' or Java regex group() method.
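In your case the whole pattern matches everything from the start up to and including the first comma, but index 1 selects only the inner capture group ([^,]*), so the comma is not returned. For example, if your text is "abc,def,geh", it will return "abc" (index 0, the whole match, would return "abc,").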
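To check what each index returns, here is a minimal sketch in Python. It assumes Python's `re` module behaves like the Java regex engine for these particular patterns, and the `regexp_extract` helper below is a hypothetical stand-in written for this illustration, not Hive's implementation.

```python
import re

def regexp_extract(subject, pattern, index):
    """Hypothetical stand-in for Hive's regexp_extract (illustration only)."""
    match = re.search(pattern, subject)
    if match is None:
        # Assumption of this sketch: signal "no match" with None.
        return None
    # index 0 is the whole match, index n is the n-th capture group.
    return match.group(index)

# The example from the Hive manual.
doc_example = regexp_extract('foothebar', 'foo(.*?)(bar)', 2)

# The pattern from the question, applied to a sample CSV-like string.
text = 'abc,def,geh'
whole_match = regexp_extract(text, r'^(?:([^,]*),?){1}', 0)
first_field = regexp_extract(text, r'^(?:([^,]*),?){1}', 1)

# Raising the repeat count moves the capture to a later field,
# because a repeated group keeps the text of its last iteration.
second_field = regexp_extract(text, r'^(?:([^,]*),?){2}', 1)

print(doc_example)   # bar
print(whole_match)   # abc,
print(first_field)   # abc
print(second_field)  # def
```

The `{1}` in the question's pattern is therefore the field selector: changing it to `{2}` or `{3}` would pick the second or third comma-separated value.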
Hope this helps.
Created ‎07-26-2016 08:10 AM
Hi Pierre,
Thanks for looking into my query. Yes, it is very clear to me now, except for one doubt.
I am not clear about ?: in my query and (.*?) in your example.
Sorry for asking very basic things, but a brief explanation would help me when writing other functions.
Regards
Sachin Mittal
Created ‎07-26-2016 08:17 AM
I'd recommend having a look at this site: http://regexr.com/
You can enter your regular expression and then click on "Explain" (at the bottom) to get a complete explanation of the regular expression you entered. It also lets you test your regular expression against any text you want.
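As a quick illustration of the two constructs asked about, here is a small sketch, again in Python on the assumption that its `re` module treats these patterns the same way the Java engine does. `(?:...)` groups without capturing, so it does not take a group number, and `.*?` is the lazy form of `.*`, matching as little as possible. The sample strings are made up for the example.

```python
import re

text = 'abc,def,geh'

# Plain parentheses: the outer group is number 1, the inner one number 2.
capturing = re.search(r'^(([^,]*),?)', text)

# (?:...) groups without capturing, so the inner group becomes number 1.
non_capturing = re.search(r'^(?:([^,]*),?)', text)

print(capturing.groups())      # ('abc,', 'abc')
print(non_capturing.groups())  # ('abc',)

# Lazy vs greedy: .*? stops at the first place the rest of the pattern fits,
# .* runs as far as it can and backs off only as much as needed.
sample = 'foothebarbar'
lazy = re.search(r'foo(.*?)(bar)', sample)
greedy = re.search(r'foo(.*)(bar)', sample)

print(lazy.group(1))    # the
print(greedy.group(1))  # thebar
```

This is why index 1 in the question's query refers to `([^,]*)` and not to the outer `(?:...)` group.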
Hope this helps.
Created ‎07-26-2016 09:52 AM
Hi Pierre,
Very nice of you.
Thanks a lot. I visited the site and it cleared up most of my doubts.
Regards
Sachin Mittal
