Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Need output as summarized form of total.

Highlighted

Need output as summarized form of total.

Explorer

Dear Members,

I have a table "Processor types" with column name processor.

Query

SELECT processor, SUM(processor) AS Total from processor-types where (processor like '%i210%') and ddate like '20161009' group by processor

output

Processor Total
i210at 4
i210-at 6
i210 2016 8
intel+ethernet+connection+i210 1
wgi210it+s+ljxt 4
wgi210is_sljxx 3
i210-t1 4
ethernet i210-t1 5
intelâ® ethernet controller i210-at 5
wgi210is 2

But I need output in summarized form of the total and generic name in processor column as i210. It mean whatever the string in column it has to be replaced by 'i210'.

Eg: (ethernet i210-t1) string to replaced by i210

Processor Total
i210 42

will you all please suggest me how to do. If so send me sample query.

4 REPLIES 4
Highlighted

Re: Need output as summarized form of total.

Rising Star

is processor name always 4 letters word? If yes, you can use substr to get only first letters from processor name and group by that. If processor name length varies then look for set of character that comes after processor name and eliminate anything after those characters. Here you can use mix of instr and substr hive function.

Highlighted

Re: Need output as summarized form of total.

Explorer

@Kashif,

shall I have any sample query how to use instr and substr

Highlighted

Re: Need output as summarized form of total.

Rising Star

I guess the easiest would be to have a lookup table with all processor name records, then you can use something like below.

SELECT b.processor_name, count(*) Cnt from process_types_with_count_table A, processor_lookup_table BWHERE instr(b.processor_name, a.processor_name) > 0 Group by b.processor_name;

This is not very efficient as you are joining other table here, but will do the work. Other option is if your processor name has any pattern then you can use regex in hive query to get only processor_name that you want.

Re: Need output as summarized form of total.

Hi @VENKATESH M

I totally agree with @Kashif Khan's answer. Having a second table that you use for data cleansing and categorizing data is very valuable. You would have to keep that table up-to-date, but it is a nice way to do data-driven categorization.

That categorization table could look something like this:

Processor                       Processor_Category
------------                    ---------------------
i210at                          i210 
i210-at                         i210
i210 2016                       i210 
intel+ethernet+connection+i210  i210
wgi210it+s+ljxt                 i210
...

Anyways... If you need a quick and dirty way to aggregate the exact query you provided, you can do it this way:

SELECT 'i210' AS Processor, 
       SUM(processor) AS Total 
from processor-types 
where (processor like '%i210%') 
and ddate like '20161009' 
group by 'i210';

Processor    Total
---------    ------
i210         42

I hope this helps.

Don't have an account?
Coming from Hortonworks? Activate your account here