Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Phoenix: Ordering on a column that has the same values within a limit and offset will give incorrect ordering

Phoenix: Ordering on a column that has the same values within a limit and offset will give incorrect ordering

New Contributor

I am using phoenix as a data store for a spring boot application. I recently noticed when trying to do ordering on paged result sets, that the returned results were not ordered corectly. The example table is as such:

create table 
ORDER_TEST(ID varchar not null, 
DAY Timestamp not null, 
name varchar(20), 
CONSTRAINT pk PRIMARY KEY(ID, DAY));

I populated this table with test data from a different table using a csv file. there is over 118000 rows in the table. when i make a query without ordering and just using a limit, this is the result:

select * from ORDER_TEST limit 20;
+-------------+--------------------------+-------+
|     ID      |           DAY            | NAME  |
+-------------+--------------------------+-------+
| 1111113535  | 2018-05-20 11:00:00.000  | bill  |
| 1111126512  | 2018-05-20 11:00:00.000  | bill  |
| 1111196581  | 2018-05-20 11:00:00.000  | bill  |
| 1111265745  | 2018-05-20 11:00:00.000  | bill  |
| 1111313845  | 2018-05-20 11:00:00.000  | bill  |
| 1111426110  | 2018-05-20 11:00:00.000  | bill  |
| 1111555188  | 2018-05-20 11:00:00.000  | bill  |
| 1111718893  | 2018-05-20 11:00:00.000  | bill  |
| 1111750510  | 2018-05-20 11:00:00.000  | bill  |
| 1111803389  | 2018-05-20 11:00:00.000  | bill  |
| 1111807594  | 2018-05-20 11:00:00.000  | bill  |
| 1111816611  | 2018-05-20 11:00:00.000  | bill  |
| 1111819491  | 2018-05-20 11:00:00.000  | bill  |
| 1111925373  | 2018-05-20 11:00:00.000  | bill  |
| 1111992155  | 2018-05-20 11:00:00.000  | bill  |
| 1112096271  | 2018-05-20 11:00:00.000  | bill  |
| 1112214458  | 2018-05-20 11:00:00.000  | bill  |
| 1112218906  | 2018-05-20 11:00:00.000  | bill  |
| 1112297916  | 2018-05-20 11:00:00.000  | bill  |
| 1112334322  | 2018-05-20 11:00:00.000  | bill  |
+-------------+--------------------------+-------+

This output looks correct as the order of the ID values are numerically ordered correctly. And as you can see when i use an offset of 10, it still ordered correctly i.e. the first result is the eleventh result of the first result set.

select * from ORDER_TEST limit 20 offset 10;
+-------------+--------------------------+-------+
|     ID      |           DAY            | NAME  |
+-------------+--------------------------+-------+
| 1111807594  | 2018-05-20 11:00:00.000  | bill  |
| 1111816611  | 2018-05-20 11:00:00.000  | bill  |
| 1111819491  | 2018-05-20 11:00:00.000  | bill  |
| 1111925373  | 2018-05-20 11:00:00.000  | bill  |
| 1111992155  | 2018-05-20 11:00:00.000  | bill  |
| 1112096271  | 2018-05-20 11:00:00.000  | bill  |
| 1112214458  | 2018-05-20 11:00:00.000  | bill  |
| 1112218906  | 2018-05-20 11:00:00.000  | bill  |
| 1112297916  | 2018-05-20 11:00:00.000  | bill  |
| 1112334322  | 2018-05-20 11:00:00.000  | bill  |
| 1112565559  | 2018-05-20 11:00:00.000  | bill  |
| 1112741906  | 2018-05-20 11:00:00.000  | bill  |
| 1112751810  | 2018-05-20 11:00:00.000  | bill  |
| 1112771032  | 2018-05-20 11:00:00.000  | bill  |
| 1112790397  | 2018-05-20 11:00:00.000  | bill  |
| 1112837648  | 2018-05-20 11:00:00.000  | bill  |
| 1113268591  | 2018-05-20 11:00:00.000  | bill  |
| 1113289754  | 2018-05-20 11:00:00.000  | bill  |
| 1113347282  | 2018-05-20 11:00:00.000  | bill  |
| 1113396368  | 2018-05-20 11:00:00.000  | bill  |
+-------------+--------------------------+-------+

So this leads me to believe that the limit and offset functions are working correctly. And when using the order by function when ordering on id, the result is the same as the previous output:

select * from ORDER_TEST order by id limit 20 offset 10;
+-------------+--------------------------+-------+
|     ID      |           DAY            | NAME  |
+-------------+--------------------------+-------+
| 1111807594  | 2018-05-20 11:00:00.000  | bill  |
| 1111816611  | 2018-05-20 11:00:00.000  | bill  |
| 1111819491  | 2018-05-20 11:00:00.000  | bill  |
| 1111925373  | 2018-05-20 11:00:00.000  | bill  |
| 1111992155  | 2018-05-20 11:00:00.000  | bill  |
| 1112096271  | 2018-05-20 11:00:00.000  | bill  |
| 1112214458  | 2018-05-20 11:00:00.000  | bill  |
| 1112218906  | 2018-05-20 11:00:00.000  | bill  |
| 1112297916  | 2018-05-20 11:00:00.000  | bill  |
| 1112334322  | 2018-05-20 11:00:00.000  | bill  |
| 1112565559  | 2018-05-20 11:00:00.000  | bill  |
| 1112741906  | 2018-05-20 11:00:00.000  | bill  |
| 1112751810  | 2018-05-20 11:00:00.000  | bill  |
| 1112771032  | 2018-05-20 11:00:00.000  | bill  |
| 1112790397  | 2018-05-20 11:00:00.000  | bill  |
| 1112837648  | 2018-05-20 11:00:00.000  | bill  |
| 1113268591  | 2018-05-20 11:00:00.000  | bill  |
| 1113289754  | 2018-05-20 11:00:00.000  | bill  |
| 1113347282  | 2018-05-20 11:00:00.000  | bill  |
| 1113396368  | 2018-05-20 11:00:00.000  | bill  |
+-------------+--------------------------+-------+

But when using the order by function on the day column with no offset, the output looks ok:

select * from ORDER_TEST order by day limit 20;
+-------------+--------------------------+----------+
|     ID      |           DAY            |   NAME   |
+-------------+--------------------------+----------+
| 7190093860  | 2018-02-20 11:00:00.000  | bill     |
| 7190093861  | 2018-02-20 11:00:00.000  | bill     |
| 7190093862  | 2018-02-21 12:45:00.000  | Testing  |
| 1111265745  | 2018-05-20 11:00:00.000  | bill     |
| 1112297916  | 2018-05-20 11:00:00.000  | bill     |
| 1112218906  | 2018-05-20 11:00:00.000  | bill     |
| 1112214458  | 2018-05-20 11:00:00.000  | bill     |
| 1111992155  | 2018-05-20 11:00:00.000  | bill     |
| 1111925373  | 2018-05-20 11:00:00.000  | bill     |
| 1111819491  | 2018-05-20 11:00:00.000  | bill     |
| 1111816611  | 2018-05-20 11:00:00.000  | bill     |
| 1111807594  | 2018-05-20 11:00:00.000  | bill     |
| 7189905749  | 2018-05-20 11:00:00.000  | bill     |
| 1111750510  | 2018-05-20 11:00:00.000  | bill     |
| 9999892377  | 2018-05-20 11:00:00.000  | bill     |
| 1111555188  | 2018-05-20 11:00:00.000  | bill     |
| 1111426110  | 2018-05-20 11:00:00.000  | bill     |
| 1112334322  | 2018-05-20 11:00:00.000  | bill     |
| 1111196581  | 2018-05-20 11:00:00.000  | bill     |
| 9999715395  | 2018-05-20 11:00:00.000  | bill     |
+-------------+--------------------------+----------+

But when the same query is made with an offset, the output dosent look correct:

select * from ORDER_TEST order by day limit 20 offset 10;
+-------------+--------------------------+-------+
|     ID      |           DAY            | NAME  |
+-------------+--------------------------+-------+
| 1112751810  | 2018-05-20 11:00:00.000  | bill  |
| 1112741906  | 2018-05-20 11:00:00.000  | bill  |
| 1112565559  | 2018-05-20 11:00:00.000  | bill  |
| 1112334322  | 2018-05-20 11:00:00.000  | bill  |
| 1112297916  | 2018-05-20 11:00:00.000  | bill  |
| 1112218906  | 2018-05-20 11:00:00.000  | bill  |
| 1112214458  | 2018-05-20 11:00:00.000  | bill  |
| 1111816611  | 2018-05-20 11:00:00.000  | bill  |
| 1111925373  | 2018-05-20 11:00:00.000  | bill  |
| 1111819491  | 2018-05-20 11:00:00.000  | bill  |
| 1111196581  | 2018-05-20 11:00:00.000  | bill  |
| 1111807594  | 2018-05-20 11:00:00.000  | bill  |
| 1111803389  | 2018-05-20 11:00:00.000  | bill  |
| 1111750510  | 2018-05-20 11:00:00.000  | bill  |
| 9999892377  | 2018-05-20 11:00:00.000  | bill  |
| 1113289754  | 2018-05-20 11:00:00.000  | bill  |
| 1111426110  | 2018-05-20 11:00:00.000  | bill  |
| 1111313845  | 2018-05-20 11:00:00.000  | bill  |
| 1111992155  | 2018-05-20 11:00:00.000  | bill  |
| 9999715395  | 2018-05-20 11:00:00.000  | bill  |
+-------------+--------------------------+-------+

First thing of note, as this is an offset of ten, the first result should be the eleventh result of the set with no offset, but its not, it dosent even show up in this result set. Second of note, is that the 5th, 6th, 7th, 9th, 10th, 12th, 14th, 15th, 17th and the 20th result is the same as that of the result in the query with no offset. These results are very speratic and not remotley accurate.

Is there something that im missing here or is there a legitimate bug with this. Any input would be much appreciated.

Don't have an account?
Coming from Hortonworks? Activate your account here