Created 03-27-2016 12:25 AM
I have a Hive table with a number of columns where column X contains a large string of text with many spaces between each word, all delimiters have been removed and all that remains are 0-9 a-z and A-Z characters. I would like to query Column X for a keyword Y and Y^ (eg Java and Javascript) and count the number of unique users from column Z that have mentioned these two words. Either of these words need to be mentioned once in column X for it to be counted. I would then like to count the total number of users that mentioned either of these two words.
Created 03-27-2016 04:56 AM
Hi @John Garrigan, your specs are a little imprecise, but can you try this:
select count(distinct Z) from t1 where locate(Y, X)>0; -- for example: select count(distinct Z) from t1 where locate("Java", X)>0;
Note that, as stated, your condition on "Y^" (usually expressed as "Y*") is superfluous, since if Y is there then any superstring of Y will be there too.
Created 03-27-2016 04:56 AM
Hi @John Garrigan, your specs are a little imprecise, but can you try this:
select count(distinct Z) from t1 where locate(Y, X)>0; -- for example: select count(distinct Z) from t1 where locate("Java", X)>0;
Note that, as stated, your condition on "Y^" (usually expressed as "Y*") is superfluous, since if Y is there then any superstring of Y will be there too.
Created 03-27-2016 11:15 AM
Hi Predrag, many thanks for your help this worked perfectly.
Created 04-19-2018 02:36 PM
how can i locate some word but in a table, no in a column?
select count(distinct Z)from t1 where locate("Java", X)>0;
thank you.
Created 05-27-2020 05:12 AM
hi,
Were you able to find a solution for this ?