Support Questions

Find answers, ask questions, and share your expertise

Find a word or a group of words from a column in Hortonworks and creating it as a separate column

avatar
Explorer

Hi,

I am trying to create a view in Hortonworks that I have in postgres greenplum where I am trying to find a word/words that exists in a column that has lot of words. If the word exists, the new column should show the word and if there are multiple words, the multiple words should be reflected with a space in between. Below is a running code in postgres Greenplum. Can anyone help me with the code in Hive?

CREATE OR REPLACE VIEW XYZ_V AS( Select col1, col2, col3, col4,trim(both ' ' from

case when substring( lower(col4) from 'abcd' ) is null then '' else substring( lower(col4) from 'abcd') end ||' '||

case when substring( lower(col4) from 'pqrs' ) is null then '' else substring( lower(col4) from 'pqrs') end ||' '||

case when substring( lower(col4) from 'mnop' ) is null then '' else substring( lower(col4) from 'mnop') end) as col5

FROM XYZ);

Thanks a lot in advance

Regards

7 REPLIES 7

avatar

Hey @Souveek Ray!

You can change your CREATE OR REPLACE VIEW to CREATE VIEW, in hive they have the same effect (doesn't need the word REPLACE) 🙂
TRIM Function not needed to use 'both' for Hive, the function TRIM in hive applies in both sides automatically.

LOWER function just put the value from COL4

CASE WHEN are the same in this case.

And your substring function, I'm not sure about the logic in your query, but, the sintax in hive are SUBSTR(value, init_pos, len). E.g. SUBSTR('Vinicius', 1,3) = Vin.

Below some good docs about Hive SQL 🙂

https://hortonworks.com/blog/hive-cheat-sheet-for-sql-users/
https://cwiki.apache.org/confluence/display/Hive/LanguageManual

Hope this helps 🙂

avatar
Explorer

Thank you for replying but my main concern is to replicate the part using substring in Hive. I can't figure out a way to get the exact output

avatar

Hmmm I see, so could you share some results from Postgresql? Maybe we can figure out another way do to this 🙂

avatar
Explorer

Let me give you an example of the code that is working in postgres based on the code I gave.

CREATE OR REPLACE VIEW XYZ_V AS( Select col1, col2, col3, col4,trim(both ' ' from

case when substring( lower(col4) from 'abcd' ) is null then '' else substring( lower(col4) from 'abcd') end ||' '||

case when substring( lower(col4) from 'pqrs' ) is null then '' else substring( lower(col4) from 'pqrs') end ||' '||

case when substring( lower(col4) from 'mnop' ) is null then '' else substring( lower(col4) from 'mnop') end) as col5

FROM XYZ);

Example:

COL1\t COL2\t COL3\t COL4\t COL5

1\t pdf\t page1\t efgh opsdw mnop\t mnop

2\t txt\t page4\t abcd qwerty pqrs\t abcd pqrs

3\t txt\t page3\t abcd pqrs xyzsd mnop\t abcd pqrs mnop

(\t denotes the break in columns)

Regards,

Souveek

avatar

Hey @Souveek Ray
Maybe this code can help you (it's just a scratch)

select 
concat_ws(' ',case when regexp_extract('abcd pqrs xyzsd mnop', '(abcd)', 0) is null then '' else 'abcd' end, 
case when regexp_extract('abcd pqrs xyzsd mnop', '(mnop)', 0) is null then '' else 'mnop' end,
case when regexp_extract('abcd pqrs xyzsd mnop', '(pqrs)', 0) is null then '' else 'pqrs' end)

avatar
Explorer

Hi, @Vinicius Higa Murakami

Thank you for taking out your time to help me. The regexp_extract didnt work correctly but I got the idea and modified the code to get the desired results.

Here is what I did:

trim(concat_ws(' ', case when instr( lower(col4) , 'abcd' ) >0 then 'abcd' else '' end,

case when instr( lower(col4) , 'pqrs' ) >0 then 'pqrs' else '' end))


This worked perfectly. Thank you so much


Regards

Souveek

avatar

Awesome! 🙂 Good to know @Souveek Ray