Created 01-11-2019 02:36 PM
I have a table with column name "fullname" which I would like to split into four columns (LAST_NAME, TITLE, FIRST_NAME, MIDDLE_NAME) while loading into another table.
If it is a person's name, the convention is LAST_NAME TITLE! FIRST_NAME MIDDLE_NAME:
for example I have "abc xxx! def ghi" in my table. This should be split and loaded in to 4 different columns.
the name should be split into LAST, FIRST, MID. Word Preceding the exclamation is TITLE (like xxx here).
If it is an organisation name ("Names of organisations are ended by an exclamation mark) : I should move the entire string to FIRST_NAME.
for example: abc systems! should be loaded to FIRST_NAME.
current state:
fullname |
abc xxx! def ghi |
abc systems! |
expected result:
Last | Title | First | Middle |
abc | xxx | def | ghi |
abc systems |
Can someone help how to write a query for the above requirement? Thanks in Advance!
Created 01-11-2019 03:55 PM
You can use below sql to convert the content as required
0: jdbc:hive2://localhost> select * from temptable; +---------------------+--+ | temptable.fullname | +---------------------+--+ | abc xxx! def ghi | | abc systems! | +---------------------+--+ 0: jdbc:hive2://localhost> select a[0] as LAST, substr(a[1], 0, length(a[1])-1) as TITLE, a[2] as FIRST, a[3] as MID from ( 0: jdbc:hive2://localhost> select case 0: jdbc:hive2://localhost> when instr(fullname, "!" )=length(fullname) then array('', '!', substr(fullname, 0, length(fullname)-1), '') 0: jdbc:hive2://localhost> else split(fullname, " ") end as a from temptable) t; +-------+--------+--------------+------+--+ | last | title | first | mid | +-------+--------+--------------+------+--+ | abc | xxx | def | ghi | | | | abc systems | | +-------+--------+--------------+------+--+
If my answer helped to resolve the issue, please accept. It might help others in the community.
Created 01-13-2019 11:59 PM
Tahnks Naresh for your response.
there is one more scenario which I received today along with above requirements.
If the individual name has only three words (I mean without title) then the fullname should be like below:
current state:
fullname |
jkl! mno pqr |
expected
Last | Title | First | Middle |
jkl | mno | pqr |
Created 01-14-2019 03:57 AM
If the input is text data, its better to write a java utility which reads line by line, applies all the rules & re-writes it as four column as per conversion required. Then load that new file into hive to query it.
It will be cumbursome to do it in sql, as it needs another case statement inside case statement to validate the length of the split.