I have set up a POC with Cloudera Enterprise and GoGrid (1 manager and 3 data nodes).
I have created a database table, file format Parquet, with 575M rows and 26 columns.
Most queries run very fast, but there is one query that performs very slowly.
select distinct(msisdn) from cdrmainparquet where msisdn=2255763xxxx: 6 seconds : OK
select msisdn from cdrmainparquet where msisdn=225576xxxx: 2.77 seconds: OK
select msisdn,imsi from cdrmainparquet where msisdn=2255763xxxx: 4.58 seconds: ?
select msisdn,imsi,eventtype,eventduration from cdrmainparquet where msisdn=2255763xxxx: 175 seconds : ??
select * (26 columns) takes 5 minutes!
Why does this happen? Can I do something to make this single query run faster?
Would you be able to post the runtime profiles of the slow queries? The runtime profiles contain detailed performance counters that may help us identify the problem.
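In case it helps, here is a minimal sketch of how to capture a profile. It assumes the queries are run through impala-shell, which the post does not actually state. In that shell, issuing `profile;` right after a query prints the runtime profile of the most recent query in the session, and `summary;` prints a shorter per-operator overview. The msisdn literal is left masked as in the original post, so substitute the real value before running.

```sql
-- Assumption: this is typed into an impala-shell session.
-- Run the slow query first (msisdn literal is masked, as in the post above).
select msisdn, imsi, eventtype, eventduration from cdrmainparquet where msisdn=2255763xxxx;

-- Short per-operator timing overview of the query that just finished.
summary;

-- Full runtime profile with detailed counters for the same query.
profile;
```

Posting the output for both the fast two-column query and the slow four-column query would make it easier to compare where the time goes.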