Regex pattern for Hive RegexSerDe
Labels: Apache Hadoop, Apache Hive
Created 08-01-2016 05:35 PM
Can someone please help me create a regex pattern for a Hive table that uses RegexSerDe?
I want the table to use ^^^^^^^^^^ (ten caret characters) as the delimiter, but I am not sure what the table's regex pattern should be.
Please help.
Thanks a lot in advance.
Created 08-01-2016 05:43 PM
I would try \\^ for each character, so basically:
\\^\\^\\^\\^\\^\\^\\^\\^\\^\\^
Can you try this?
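The escaping can be checked in plain Java before touching Hive, assuming (as far as I know) that RegexSerDe compiles input.regex with java.util.regex. This sketch compares three spellings of "exactly ten literal carets": the spelled-out form from the reply, a {10} quantifier, and Pattern.quote. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Three equivalent ways to write a regex for exactly ten literal '^' characters.
// In a Java or HiveQL string literal, "\\^" becomes the regex \^,
// which means "a literal caret" instead of "start of input".
class CaretDelimiter {
    // Ten escaped carets spelled out (written as two halves of five for readability).
    static final String SPELLED_OUT = "\\^\\^\\^\\^\\^" + "\\^\\^\\^\\^\\^";
    // One escaped caret with a {10} quantifier.
    static final String QUANTIFIED = "\\^{10}";
    // Pattern.quote wraps the text in \Q...\E, so nothing inside is treated as special.
    static final String QUOTED = Pattern.quote("^".repeat(10));

    // True only if the regex matches the entire input, not just part of it.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String ten = "^".repeat(10);
        String nine = "^".repeat(9);
        for (String regex : new String[] {SPELLED_OUT, QUANTIFIED, QUOTED}) {
            System.out.println(regex + " ten=" + matchesWhole(regex, ten)
                + " nine=" + matchesWhole(regex, nine));
        }
    }
}
```

All three should accept exactly ten carets and reject nine. In a HiveQL string literal the backslashes are doubled the same way, so "\\^{10}" is a shorter alternative to the spelled-out version.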
Created 08-01-2016 06:52 PM
Thanks for the response, @mqureshi.
I tried the two patterns below, but neither worked:
- "input.regex" = "\\^\^\\^\\^\\^\\^\\^\\^\\^\\^"
- "input.regex" = "(\\^\^\\^\\^\\^\\^\\^\\^\\^\\^)"
Any other ideas I can try?
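Setting the single-backslash typo aside, both attempts describe only the delimiter. A minimal Java sketch, assuming RegexSerDe-like behavior (the regex must match the whole line, and each capture group fills one column), shows what that means. Here \\^{10} stands in for the ten escaped carets, and LINE is a hypothetical two-field input.

```java
import java.util.regex.Pattern;

// Sketch: why a delimiter-only regex yields no column values.
// Assumption: like RegexSerDe, the regex must match the WHOLE line
// and each capture group supplies one column.
class DelimiterOnlyRegex {
    // A hypothetical two-field input line: "foo", ten carets, "bar".
    static final String LINE = "foo" + "^".repeat(10) + "bar";

    // Number of capture groups, i.e. how many columns the regex could fill.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // matches() must consume the entire line.
    static boolean matchesWholeLine(String regex) {
        return Pattern.compile(regex).matcher(LINE).matches();
    }

    // find() only looks for the pattern somewhere inside the line.
    static boolean occursSomewhere(String regex) {
        return Pattern.compile(regex).matcher(LINE).find();
    }

    public static void main(String[] args) {
        for (String regex : new String[] {"\\^{10}", "(\\^{10})"}) {
            System.out.printf("%s groups=%d whole=%b find=%b%n",
                regex, groupCount(regex), matchesWholeLine(regex), occursSomewhere(regex));
        }
    }
}
```

The delimiter is found inside the line, but neither pattern matches the line as a whole, and they offer zero and one capture groups respectively for a two-column table.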
Created 08-01-2016 07:12 PM
Hi @Raja A,
In both patterns the second caret has only a single backslash in front of it. Can you try it with two backslashes? I think the first pattern should work once you add the missing backslash. Also, why not get it working with just "^^" first? Once ^^ works, you can simply extend it to ^^^^^^^^^^.
Created 08-01-2016 09:05 PM
Thanks for your response and time, @mqureshi.
I tried it with ^^ and it is still not working. Also, sorry, my comment above had a typo.
Below is what I am trying:
create external table <table_name> (col1 string, col2 string)
ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.RegexSerDe"
WITH SERDEPROPERTIES ("input.regex" = "^^")
STORED AS TEXTFILE
LOCATION "<hdfs_path>";
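The "^^" in this DDL is the likely culprit. Unescaped, ^ is a start-of-input anchor in Java regex, so the pattern consumes no characters at all and has no capture groups for col1 and col2. A small sketch (hypothetical class name, plain java.util.regex) makes that visible.

```java
import java.util.regex.Pattern;

// Sketch: in a regex, an unescaped ^ is a start-of-input anchor, so "^^"
// is two zero-width anchors rather than two caret characters. It also has
// no capture groups, so there is nothing to put into col1 and col2.
class UnescapedCarets {
    // True only if the regex matches the entire input.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Number of capture groups the regex defines.
    static int groups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println("^^ vs empty line: " + wholeMatch("^^", ""));
        System.out.println("^^ vs \"^^\": " + wholeMatch("^^", "^^"));
        System.out.println("\\^\\^ vs \"^^\": " + wholeMatch("\\^\\^", "^^"));
        System.out.println("groups in ^^: " + groups("^^"));
    }
}
```

Only an empty line satisfies "^^", while the escaped form matches two literal carets. Either way, a two-column table still needs two capture groups.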
Created 08-01-2016 09:10 PM
I will try this on my machine when I get a chance, hopefully later today, but I think it should be:
"input.regex"="^\\^\\^"
The first ^ marks the beginning of the string, and the other two (escaped) carets are the pattern you are matching.
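One detail about this suggestion, sketched in Java under the same whole-line-matching assumption: the leading ^ does no harm but adds nothing, because the match already has to start at the beginning of the line. The pattern also still has no capture groups. Wrapping the data on each side of the delimiter in groups is what yields column values. The class name and sample line are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: with whole-line matching (Matcher.matches), a leading ^ anchor
// is redundant, and without capture groups no field values come out.
// Adding groups around the data is what produces column values.
class AnchorVsGroups {
    // True only if the regex matches the entire input.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Returns the captured groups, or null if the whole line does not match.
    static String[] capture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) return null;
        String[] out = new String[m.groupCount()];
        for (int i = 0; i < out.length; i++) out[i] = m.group(i + 1);
        return out;
    }

    public static void main(String[] args) {
        String line = "left^^right"; // hypothetical line using ^^ as the delimiter
        System.out.println(wholeMatch("^\\^\\^", "^^"));  // anchored
        System.out.println(wholeMatch("\\^\\^", "^^"));   // same result without the anchor
        System.out.println(wholeMatch("^\\^\\^", line));  // data around the delimiter
        String[] cols = capture("^(.*)\\^\\^(.*)", line);
        System.out.println(cols == null ? "no match" : String.join(" | ", cols));
    }
}
```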
Created 08-05-2016 01:36 AM
Hi @Raja A, can you try this?
"input.regex"="(.*)\\^\\^\\^\\^\\^\\^\\^\\^\\^\\^(.*)"
You need capture groups, one per table column, to map the input to your table fields.
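Why this works can be sketched as a simplified Java model of regex-based row parsing: the whole line must match, group 1 goes to col1 and group 2 to col2. This is not Hive's source code. The group-count check and the all-null row for non-matching lines are assumptions modeled on how RegexSerDe is commonly described, so verify them on your Hive version. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified, hypothetical sketch of regex-based row parsing in the spirit of
// RegexSerDe: the whole line must match, and capture group i becomes column i.
// Not Hive's source; the null-row behavior is an assumption to verify.
class RegexRowParser {
    // The suggested input.regex as Java sees it once the string literal is
    // unescaped: (.*), then ten literal carets, then (.*).
    static final Pattern INPUT_REGEX =
        Pattern.compile("(.*)" + "\\^\\^\\^\\^\\^" + "\\^\\^\\^\\^\\^" + "(.*)");

    static List<String> parse(String line, int numColumns) {
        Matcher m = INPUT_REGEX.matcher(line);
        // One capture group per table column is required.
        if (m.groupCount() != numColumns) {
            throw new IllegalArgumentException("regex has " + m.groupCount()
                + " groups but the table has " + numColumns + " columns");
        }
        List<String> row = new ArrayList<>();
        if (!m.matches()) {
            // Non-matching line: every column becomes null (assumed behavior).
            for (int i = 0; i < numColumns; i++) row.add(null);
            return row;
        }
        for (int i = 1; i <= numColumns; i++) row.add(m.group(i));
        return row;
    }

    public static void main(String[] args) {
        System.out.println(parse("hello" + "^".repeat(10) + "world", 2));
        System.out.println(parse("no delimiter here", 2));
    }
}
```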
Created 09-29-2016 09:50 AM
Thanks a lot for the response, this works!
However, it only works when the input has 2 columns; it does not work when there are more columns.
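A likely reason, shown with a hypothetical three-field line: if the table keeps two columns, the greedy first (.*) backtracks only as far as the last delimiter, so the line still matches but the first two fields are merged into one value. If instead a third column is added to the table without a third group, the group count no longer matches the column count. The sketch uses plain java.util.regex and made-up names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: the two-group regex applied to a three-field line. The greedy
// first (.*) grabs as much as it can and backtracks only to the LAST
// delimiter, so the line still matches but two fields end up merged.
class GreedyTwoGroups {
    static final String DELIM = "^".repeat(10);
    static final Pattern TWO_GROUPS = Pattern.compile("(.*)\\^{10}(.*)");

    // Returns {group1, group2}, or null if the whole line does not match.
    static String[] split(String line) {
        Matcher m = TWO_GROUPS.matcher(line);
        if (!m.matches()) return null;
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String line = "a" + DELIM + "b" + DELIM + "c"; // hypothetical three-field line
        String[] g = split(line);
        System.out.println("group 1: " + g[0]);
        System.out.println("group 2: " + g[1]);
    }
}
```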
Created 09-29-2016 10:21 AM
Well, you said you had only 2 columns 🙂 For more columns you can either change the regex (one capture group per column), or try MultiDelimitSerDe if you are on Hive 0.14 or newer. By the way, your question inspired me to write an article about RegexSerDe.
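For the "change the regex" route, one option is to generate the pattern from the column count. The sketch below uses a hypothetical helper, buildRegex, that joins one non-greedy (.*?) group per column with \^{10} and lets the last group take the rest of the line. It also doubles the backslashes for pasting into a HiveQL string literal, following the escaping used earlier in the thread. MultiDelimitSerDe, the other option mentioned, is not modeled here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: builds an input.regex for any number of columns
// separated by ten carets. Non-greedy (.*?) groups stop at the first
// delimiter; the last group takes the rest of the line.
class MultiColumnRegex {
    static final String DELIM_REGEX = "\\^{10}";

    static String buildRegex(int columns) {
        if (columns < 1) throw new IllegalArgumentException("need at least one column");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns; i++) {
            if (i > 0) sb.append(DELIM_REGEX);
            sb.append(i < columns - 1 ? "(.*?)" : "(.*)");
        }
        return sb.toString();
    }

    // Returns one value per column, or null if the whole line does not match.
    static String[] parse(String line, int columns) {
        Matcher m = Pattern.compile(buildRegex(columns)).matcher(line);
        if (!m.matches()) return null;
        String[] values = new String[columns];
        for (int i = 0; i < columns; i++) values[i] = m.group(i + 1);
        return values;
    }

    // In a HiveQL string literal each backslash must be doubled again.
    static String asHiveLiteral(String regex) {
        return "\"" + regex.replace("\\", "\\\\") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(asHiveLiteral(buildRegex(3)));
        String line = "a" + "^".repeat(10) + "b" + "^".repeat(10) + "c";
        System.out.println(String.join(",", parse(line, 3)));
    }
}
```

Because each non-greedy group stops at the first run of ten carets, this assumes no field value itself contains ten consecutive carets.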
