Created 01-26-2017 02:05 PM
Dear Members,
I have a table "Processor types" with column name processor.
SELECT processor, SUM(processor) AS Total from processor-types where (processor like '%i210%') and ddate like '20161009' group by processor
Processor | Total |
i210at | 4 |
i210-at | 6 |
i210 2016 | 8 |
intel+ethernet+connection+i210 | 1 |
wgi210it+s+ljxt | 4 |
wgi210is_sljxx | 3 |
i210-t1 | 4 |
ethernet i210-t1 | 5 |
intelâ® ethernet controller i210-at | 5 |
wgi210is | 2 |
But I need output in summarized form of the total and generic name in processor column as i210. It mean whatever the string in column it has to be replaced by 'i210'.
Eg: (ethernet i210-t1) string to replaced by i210
Processor | Total |
i210 | 42 |
will you all please suggest me how to do. If so send me sample query.
Created 01-26-2017 02:18 PM
is processor name always 4 letters word? If yes, you can use substr to get only first letters from processor name and group by that. If processor name length varies then look for set of character that comes after processor name and eliminate anything after those characters. Here you can use mix of instr and substr hive function.
Created 01-26-2017 02:28 PM
@Kashif,
shall I have any sample query how to use instr and substr
Created 01-26-2017 04:05 PM
I guess the easiest would be to have a lookup table with all processor name records, then you can use something like below.
SELECT b.processor_name, count(*) Cnt from process_types_with_count_table A, processor_lookup_table BWHERE instr(b.processor_name, a.processor_name) > 0 Group by b.processor_name;
This is not very efficient as you are joining other table here, but will do the work. Other option is if your processor name has any pattern then you can use regex in hive query to get only processor_name that you want.
Created 01-26-2017 08:18 PM
Hi @VENKATESH M
I totally agree with @Kashif Khan's answer. Having a second table that you use for data cleansing and categorizing data is very valuable. You would have to keep that table up-to-date, but it is a nice way to do data-driven categorization.
That categorization table could look something like this:
Processor Processor_Category ------------ --------------------- i210at i210 i210-at i210 i210 2016 i210 intel+ethernet+connection+i210 i210 wgi210it+s+ljxt i210 ...
Anyways... If you need a quick and dirty way to aggregate the exact query you provided, you can do it this way:
SELECT 'i210' AS Processor, SUM(processor) AS Total from processor-types where (processor like '%i210%') and ddate like '20161009' group by 'i210'; Processor Total --------- ------ i210 42
I hope this helps.