Reply
Contributor
Posts: 34
Registered: ‎09-25-2016

wide table with millions of rows

We have an requirement to select 100 rows from a table from a perticular range of partitions. It's a wide table with 800 columns. One of the columns in the table is precise timestamp of the record ( a column of type TIMESTAMP). Also there is a partition predicate day ( yyyymmdd INT ). User often select range of days and try to find the top 100 ( order by exact timestamp ).

 

If I run query like

select * from table where yyyymmdd between now() and now()-3 months

order by time

top 100

 

This query runs extremely slow.

Reducing the number helps.

I also tried adding partition yyyymmdd to the query's order by ; thinking query planner might use it to only find results by partition and won't wait for results from other partitions if it gets 100 from first one. But I didn't see it working.

 

Any tricks/tips to make this query faster ? Wide table with 100s of millions of rows.

Cloudera Employee
Posts: 821
Registered: ‎03-23-2015

Re: wide table with millions of rows

I see you used SELECT *, do you have to select all columns out? Are you using Parquet table?

If not, I suggest to give it parquet table a try. Parquet is a column storage engine, which stores data by column. So if you have 800 columns and you only need a few columns in your query, it can really help to speed up your query, as it does not need to retrieve the whole row from HDFS.
Contributor
Posts: 34
Registered: ‎09-25-2016

Re: wide table with millions of rows

[ Edited ]

It is parquet table. But it also has lot of rows. If my query has to scan last 6 months of data ( 2.5 Billion rows) ; I'm using order by time clause ( where time is a column of type timestamp), it takes 2 minutes ( after reduced my columns to retrieve only 5 columns ). With all colums it takes like 5 + minutes.

 

I also tried reducing the columns to 200 - 300 but stil the performance of the query is still slow.

Cloudera Employee
Posts: 821
Registered: ‎03-23-2015

Re: wide table with millions of rows

In that case, please provide the Impala PROFILE of the query for review, so that we can see which stage was the bottleneck.

Have you tried to remove the sorting to see if it makes much different?
Contributor
Posts: 34
Registered: ‎09-25-2016

Re: wide table with millions of rows

I was able to finally figure this ...

 

- Upgrading to CDH 5.12 helped. Earlier we used 5.7 which appeared to have some bug with wide table scan.

- Sorting is an issue.But it has it's own limitations ..

   If sorting is removed and limit clause is removed. It takes long time to download the data for obvious reasons. Large data takes time to download. It starts streamign data instantly though.

  Removing sorting but with limit obviously makes the result unpredictable and inconsistent.

- When sorting is applies and used with limit and offset clause. It appeared from profile that each node sorts the data that it has scanned and sends top ( 100 , i.e. limit 100 ) to coordinator. Coordinator than collects all the top results and sends top 100 to client.

- Event though I added partition key to sort statement. The query appearts to not use that i.e. it still tries to scan all records from all partitions before sending it to coordinator. in other words adding partition key to sorting or removing it didn't make any difference. I am not sure but Impala could have done something smarter here to make query run faster.

 

 

Cloudera Employee
Posts: 437
Registered: ‎07-29-2015

Re: wide table with millions of rows

Hi Sunil,

  Glad that 5.12 is working better for you. I'm not sure exactly which issue might have caused that but we have made a lot of improvements since 5.7.

 

I think the optimisation you're describing is interesting. I guess the precondition is that the a prefix of the ordering key can be evaluated from the partition key. Your idea is to process the partitions on each daemon strictly in sorted order and then only sort each partition at a time? I think there are some obstacles currently to doing this but it's interesting to think about.


We don't do it currently because the partitions are processed in an arbitrary order and in parallel, so the table scan may interleave rows from different partitions.