Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Need to extract words from a column containing multiple words without the first word

Need to extract words from a column containing multiple words without the first word

New Contributor

I am doing a couple of operations of which I am extracting the first word by using

split(item_desc, ' ')[0]

I also need to extract all words excluding the first word. Can anyone suggest a way please?

Regards,

Souveek

3 REPLIES 3

Re: Need to extract words from a column containing multiple words without the first word

New Contributor

Re: Need to extract words from a column containing multiple words without the first word

Super Guru
@Souveek Ray

Use regexp_extract function as you are doing split on space, write an matching regex to extract until first space as first word and next capture group excludes the first word.

Example:

I'm having firstword secondword thirdword as a value

hive> select regexp_extract("firstword secondword thirdword","^(.*?)\\s(.*)",1)first_word,
regexp_extract("firstword secondword thirdword","^(.*?)\\s(.*)",2) not_first_word; +-------------+-----------------------+--+ | first_word | not_first_word | +-------------+-----------------------+--+ | firstword | secondword thirdword | +-------------+-----------------------+--+

So in the above example we have extracted the first word by using regexp_extract and 1 capture group and in the next we have used regexp_extract and 2 capture group(excluding firstword).

Highlighted

Re: Need to extract words from a column containing multiple words without the first word

New Contributor

Hi,

Thank you, I figured this out using substring

substr(item_desc,instr(item_desc,' ')+1,length(item_desc))

This is giving me the desired results

Regards

Souveek

Don't have an account?
Coming from Hortonworks? Activate your account here