I have following table and index in phoenix
CREATE TABLE TEST_TABLE ( C1 VARCHAR NOT NULL, C2 INTEGER NOT NULL, C3 VARCHAR, C4 VARCHAR, C5 VARCHAR, C6 VARCHAR, C7 VARCHAR, C8 VARCHAR, C9 VARCHAR, C10 VARCHAR, C11 DATE, C12 VARCHAR, C13 VARCHAR, C14 DECIMAL(10,2), C15 DECIMAL(10,2), C16 VARCHAR, C17 BOOLEAN, C18 DATE NOT NULL, C19 DATE CONSTRAINT CTKEY PRIMARY KEY (C1, C2, C18 DESC ROW_TIMESTAMP)) DATA_BLOCK_ENCODING='FAST_DIFF', COMPRESSION = 'SNAPPY';
CREATE INDEX TEST_INDEX on TEST_TABLE("C4","C17") INCLUDE("C1","C2","C18") DATA_BLOCK_ENCODING='FAST_DIFF', COMPRESSION = 'SNAPPY';
The above table and index goes out of sync with each other after some time. When I check total number of records in both index and table after some time, it shows less records in index. I checked the HBase logs and don't see any error related to index.
It really depends on the version of Phoenix and the way how you are using it. For 4.7 there is a problem that MR bulk load works incorrectly with ROW_TIMESTAMP fields (it overrides user timestamp with server one). For 4.4 parallel writes to the same row from different clients may cause index getting out of sync. Another possible reason is outdated statistic when SYSTEM.STATS has incorrect boundaries for some region, so part of the data is not covered by scans. This can be fixed by updating statistics for the table. The first thing I would suggest is check whether the row count in physical table is equal to the value that select count produce.
We are not using any bulk load tool to populate this table. A simple upsert statement is used via JDBC connection using phoenix thin client driver. I tried updating statistics but it still went out of sync. The phoenix version is 4.7
Have you checked the number of record in physical hbase table? Is it the same as select count(1) returns? From Phoenix perspective we consider writes to data table and index tables as an atomic operation, so if the write to index table fail, the upsert to the data table would fail as well. There are a few cases when they may get out of sync, but they all are related to the cases when there are a concurrent updates for the same row. Could you please share your scenario?
Is there any way to increase the consistency guarantee between existing index and table so that they never goes out of sync.
Does we need to make index as transnational or table? Can we do it on existing one or need to create a new one?