Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Find a keyword in a column in Hive table

avatar
New Member

I have a Hive table with a number of columns where column X contains a large string of text with many spaces between each word, all delimiters have been removed and all that remains are 0-9 a-z and A-Z characters. I would like to query Column X for a keyword Y and Y^ (eg Java and Javascript) and count the number of unique users from column Z that have mentioned these two words. Either of these words need to be mentioned once in column X for it to be counted. I would then like to count the total number of users that mentioned either of these two words.

1 ACCEPTED SOLUTION

avatar
Master Guru

Hi @John Garrigan, your specs are a little imprecise, but can you try this:

select count(distinct Z) from t1 where locate(Y, X)>0;   -- for example:
select count(distinct Z) from t1 where locate("Java", X)>0;

Note that, as stated, your condition on "Y^" (usually expressed as "Y*") is superfluous, since if Y is there then any superstring of Y will be there too.

View solution in original post

4 REPLIES 4

avatar
Master Guru

Hi @John Garrigan, your specs are a little imprecise, but can you try this:

select count(distinct Z) from t1 where locate(Y, X)>0;   -- for example:
select count(distinct Z) from t1 where locate("Java", X)>0;

Note that, as stated, your condition on "Y^" (usually expressed as "Y*") is superfluous, since if Y is there then any superstring of Y will be there too.

avatar
New Member

Hi Predrag, many thanks for your help this worked perfectly.

avatar
New Member

hi @Predrag Minovic

how can i locate some word but in a table, no in a column?

select count(distinct Z)from t1 where locate("Java", X)>0;

thank you.

avatar
Contributor

hi,

 

Were you able to find a solution for this ?