- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
how to split name based on special character in hive
- Labels:
-
Apache Hadoop
-
Apache Hive
Created ‎01-11-2019 02:36 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have a table with column name "fullname" which I would like to split into four columns (LAST_NAME, TITLE, FIRST_NAME, MIDDLE_NAME) while loading into another table.
If it is a person's name, the convention is LAST_NAME TITLE! FIRST_NAME MIDDLE_NAME:
for example I have "abc xxx! def ghi" in my table. This should be split and loaded in to 4 different columns.
the name should be split into LAST, FIRST, MID. Word Preceding the exclamation is TITLE (like xxx here).
If it is an organisation name ("Names of organisations are ended by an exclamation mark) : I should move the entire string to FIRST_NAME.
for example: abc systems! should be loaded to FIRST_NAME.
current state:
fullname |
abc xxx! def ghi |
abc systems! |
expected result:
Last | Title | First | Middle |
abc | xxx | def | ghi |
abc systems |
Can someone help how to write a query for the above requirement? Thanks in Advance!
Created ‎01-11-2019 03:55 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You can use below sql to convert the content as required
0: jdbc:hive2://localhost> select * from temptable; +---------------------+--+ | temptable.fullname | +---------------------+--+ | abc xxx! def ghi | | abc systems! | +---------------------+--+ 0: jdbc:hive2://localhost> select a[0] as LAST, substr(a[1], 0, length(a[1])-1) as TITLE, a[2] as FIRST, a[3] as MID from ( 0: jdbc:hive2://localhost> select case 0: jdbc:hive2://localhost> when instr(fullname, "!" )=length(fullname) then array('', '!', substr(fullname, 0, length(fullname)-1), '') 0: jdbc:hive2://localhost> else split(fullname, " ") end as a from temptable) t; +-------+--------+--------------+------+--+ | last | title | first | mid | +-------+--------+--------------+------+--+ | abc | xxx | def | ghi | | | | abc systems | | +-------+--------+--------------+------+--+
If my answer helped to resolve the issue, please accept. It might help others in the community.
Created ‎01-13-2019 11:59 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Tahnks Naresh for your response.
there is one more scenario which I received today along with above requirements.
If the individual name has only three words (I mean without title) then the fullname should be like below:
current state:
fullname |
jkl! mno pqr |
expected
Last | Title | First | Middle |
jkl | mno | pqr |
Created ‎01-14-2019 03:57 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If the input is text data, its better to write a java utility which reads line by line, applies all the rules & re-writes it as four column as per conversion required. Then load that new file into hive to query it.
It will be cumbursome to do it in sql, as it needs another case statement inside case statement to validate the length of the split.
