phoenix creating duplicates

Contributor

Hello,

We're seeing a lot of duplicates in a Phoenix table, but not in the corresponding HBase table.

Result: the total count in the Phoenix table is 3 times that of the corresponding HBase table (say, 30 million rows in Phoenix versus only 10 million in HBase). We also checked specific row keys: there are duplicates in Phoenix, but not in HBase.

More details: for this table we're using Phoenix's SALT_BUCKETS property and a global index on one of the columns. The Phoenix version is 4.7.

We're consuming data from Kafka and storing it in Phoenix via the Storm-JDBC connector.

Also, this reproduces only when there are many concurrent requests. So far we've been unable to replicate it in a normal (dev) environment.

Please guide us if we're missing some config, or share any other pointers.

@Dhiraj

1 ACCEPTED SOLUTION

Expert Contributor

Hi @Dhiraj Sardana

Can you try deleting the STATS for the table and then check the count again?

delete from SYSTEM.STATS where physical_name='<TABLE_NAME>'
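
A minimal sketch of that check, assuming a table named MY_TABLE with a primary key column ID (both names are placeholders, not taken from the original post). The NO_INDEX hint asks Phoenix to count from the data table rather than the global index, so the before and after numbers should be comparable:

```sql
-- MY_TABLE and ID are hypothetical placeholders; substitute your own names.

-- 1. Record the count before touching the stats.
SELECT /*+ NO_INDEX */ COUNT(*) FROM MY_TABLE;

-- 2. Look for keys that are returned more than once.
SELECT ID, COUNT(*) AS OCCURRENCES
FROM MY_TABLE
GROUP BY ID
HAVING COUNT(*) > 1
LIMIT 10;

-- 3. Remove the collected stats for the table.
DELETE FROM SYSTEM.STATS WHERE PHYSICAL_NAME = 'MY_TABLE';

-- 4. Count again and compare with step 1 and with the HBase row count.
SELECT /*+ NO_INDEX */ COUNT(*) FROM MY_TABLE;
```

If autocommit is turned off in your client, the DELETE needs an explicit commit before step 4.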

-Shubham

4 REPLIES

Contributor

Thanks Shubham!

We no longer see duplicates, although we haven't deleted the STATS. Maybe some metadata sync-up cleared the issue (and we may see it again after some time). I have a few questions:

1) Is it safe to delete these tables in production? I mean, does Phoenix automatically recreate them?

2) I tried to find web references on how Phoenix updates and refers to these system tables, and whether any configs affect these scenarios. Please share any reference that covers this specific part of Phoenix.

Expert Contributor

1) Is it safe to delete these tables in production? I mean, does Phoenix automatically recreate them?

Yes, we can safely remove the entries from the STATS table. STATS get updated automatically every 15 minutes or during compaction.

How to manually generate STATS: https://phoenix.apache.org/update_statistics.html
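
As a rough illustration of the manual route described on that page, assuming a placeholder table name MY_TABLE (not from the original post). The guidepost width value in the last statement is illustrative only, and whether the SET clause is accepted may depend on your Phoenix version:

```sql
-- MY_TABLE is a hypothetical placeholder; substitute your own table name.

-- Recollect stats for the table and all of its indexes.
UPDATE STATISTICS MY_TABLE ALL;

-- Recollect stats for the indexes only.
UPDATE STATISTICS MY_TABLE INDEX;

-- Recollect stats with a different guidepost width for this run (in bytes).
UPDATE STATISTICS MY_TABLE SET "phoenix.stats.guidepost.width" = 50000000;
```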

2) I tried to find web references on how Phoenix updates and refers to these system tables, and whether any configs affect these scenarios. Please share any reference that covers this specific part of Phoenix.

Phoenix uses the SYSTEM.STATS table, which contains stats such as guideposts; these are used to determine the number of scans.
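
To get a feel for how much has been collected for one table, a query along these lines may help. This is only a sketch: MY_TABLE is a placeholder, and how stats rows map to individual guideposts differs between Phoenix versions, so treat the number as a rough indicator rather than an exact guidepost count:

```sql
-- MY_TABLE is a hypothetical placeholder; substitute your own table name.
-- Counts the stats rows currently stored for the table.
SELECT COUNT(*) AS STATS_ROWS
FROM SYSTEM.STATS
WHERE PHYSICAL_NAME = 'MY_TABLE';
```

A result of 0 right after the DELETE suggested earlier in this thread would indicate that the stats were removed and have not yet been regenerated.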

Some important parameters:

phoenix.stats.guidepost.width - Server-side parameter that specifies the number of bytes between guideposts. A smaller value increases parallelization, but also increases the number of chunks that must be merged on the client side. The default value is 100 MB.

phoenix.stats.enabled - Whether STATS collection is enabled. It is enabled by default.

Details of all parameters: https://phoenix.apache.org/tuning.html

Let me know if you need more information.

Expert Contributor

@Dhiraj Sardana If this information helped you, could you please accept the answer?