Created 03-27-2018 10:52 PM
Hello,
We're seeing a lot of duplicate rows in a Phoenix table that are not present in the corresponding HBase table.
Result: the total row count in the Phoenix table is three times that of the corresponding HBase table (for example, 30 million in Phoenix versus only 10 million in HBase). We also checked specific row keys: they are duplicated in Phoenix but not in HBase.
More details: this table uses Phoenix's SALT_BUCKETS property and has a global index on one of its columns. The Phoenix version is 4.7.
We're consuming data from Kafka and storing it in Phoenix via the Storm JDBC connector.
Also, this reproduces only under a high volume of concurrent requests. So far we have been unable to reproduce it in our normal (dev) environment.
Please let us know if we're missing some configuration, or share any other pointers.
@Dhiraj
Created 03-30-2018 06:44 AM
Can you try deleting the STATS for the table and then check the count again?
delete from SYSTEM.STATS where physical_name='<TABLE_NAME>'
-Shubham
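A minimal sketch of that check, assuming a hypothetical table named MY_TABLE (substitute the real physical table name; Phoenix stores unquoted names in upper case). It counts the rows, removes only that table's rows from SYSTEM.STATS, and counts again. The NO_INDEX hint is an addition not mentioned in the thread; it is there because a table with a global index may otherwise have COUNT(*) answered from the index table.

```sql
-- MY_TABLE is a placeholder, not the poster's actual table.

-- 1) Row count as Phoenix currently reports it, read from the data table.
SELECT /*+ NO_INDEX */ COUNT(*) FROM MY_TABLE;

-- 2) Remove the collected statistics for this table only.
DELETE FROM SYSTEM.STATS WHERE PHYSICAL_NAME = 'MY_TABLE';

-- 3) Count again and compare the result with the HBase row count.
SELECT /*+ NO_INDEX */ COUNT(*) FROM MY_TABLE;
```

If autocommit is off in your client, commit after the DELETE before running the second count.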
Created 04-04-2018 03:00 PM
Thanks Shubham!
We no longer see the duplicates, even though we haven't deleted the STATS. Maybe some metadata sync-up cleared the issue (and we may see it again after some time). I have a few questions:
1) Is it safe to delete these tables in production? I mean, does Phoenix automatically recreate them?
2) I tried to find web references that explain how Phoenix updates and refers to these system tables, and whether any configuration affects these scenarios. Please share any reference that covers this specific part of Phoenix.
Created 04-04-2018 06:49 PM
1) Is it safe to delete these tables in production? I mean, does Phoenix automatically recreate them?
Yes, you can safely remove the entries from the STATS table. STATS are updated automatically every 15 minutes or during compaction.
How to manually generate STATS - https://phoenix.apache.org/update_statistics.html
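As a rough sketch of the manual route described on that page, again assuming a hypothetical table named MY_TABLE. The scope keywords are as I understand them from the Phoenix grammar reference, so check them against the documentation for your version:

```sql
-- MY_TABLE is a placeholder table name.

-- Recollect statistics for the data table and all of its index tables
-- (ALL is the default scope when no keyword is given).
UPDATE STATISTICS MY_TABLE ALL;

-- Recollect statistics for the data table only.
UPDATE STATISTICS MY_TABLE COLUMNS;

-- Recollect statistics for the index tables only.
UPDATE STATISTICS MY_TABLE INDEX;
```

Running one of these after deleting the rows from SYSTEM.STATS regenerates the statistics immediately instead of waiting for the next automatic update.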
2) I tried to find web references that explain how Phoenix updates and refers to these system tables, and whether any configuration affects these scenarios. Please share any reference that covers this specific part of Phoenix.
Phoenix uses the SYSTEM.STATS table, which contains statistics such as guideposts. These are used to determine how many scans a query is split into.
Some important parameters:
phoenix.stats.guidepost.width - Server-side parameter that specifies the number of bytes between guideposts. A smaller amount increases parallelization, but also increases the number of chunks which must be merged on the client side. The default value is 100 MB.
phoenix.stats.enabled - Whether STATS collection is enabled. It is enabled by default.
Details of all parameters: https://phoenix.apache.org/tuning.html
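To see the effect of the statistics on a given table, something like the following may help. It is only a sketch: MY_TABLE is a hypothetical name, and the exact wording of the plan output varies between Phoenix versions.

```sql
-- MY_TABLE is a placeholder table name.

-- How many statistics rows are currently stored for the table.
SELECT COUNT(*) FROM SYSTEM.STATS WHERE PHYSICAL_NAME = 'MY_TABLE';

-- The query plan reports how many chunks the scan is split into,
-- which reflects the guideposts Phoenix currently knows about.
EXPLAIN SELECT COUNT(*) FROM MY_TABLE;
```

Comparing the plan before and after deleting or regenerating the statistics shows whether the number of parallel chunks changes.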
Let me know if you need more information.
Created 04-19-2018 12:43 PM
@Dhiraj Sardana If this information helped you, could you please accept the answer?