Member since 03-22-2016
27 Posts
9 Kudos Received
3 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3355 | 09-02-2016 05:00 PM |
| | 2392 | 08-16-2016 06:58 AM |
| | 4343 | 06-08-2016 12:19 PM |
10-23-2018 04:44 PM
It's better not to change statistics-related properties such as hive.compute.query.using.stats. They control how statistics are used for query optimization and have a large influence on execution plans, and the statistics stored also depend on the file format. Changing a statistics property is therefore definitely not the solution. The real reason count returns 0 is that the statistics in Hive have not been updated. When a table is first created, its statistics are written with zero data rows. After that, whenever data is appended or changed, Hive needs to update those statistics in the metadata. Depending on the circumstances, Hive might not do this in real time. Running the ANALYZE command recomputes the statistics so that count works correctly.
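To make the ANALYZE step concrete, here is a minimal Python sketch that only builds the HiveQL statements as strings; it does not connect to Hive. The helper name `build_analyze_statements`, the table name `sales_db.orders`, and the partition column `dt` are hypothetical and used purely for illustration. You would run the printed statements yourself in beeline or the Hive CLI.

```python
def build_analyze_statements(table, partition_spec=None):
    """Return HiveQL statements that recompute table and column statistics.

    `table` and `partition_spec` are illustrative inputs (hypothetical names);
    nothing is executed against Hive here.
    """
    partition_clause = ""
    if partition_spec:
        # e.g. {"dt": "2018-10-23"} becomes PARTITION(dt='2018-10-23')
        pairs = ", ".join(f"{key}='{value}'" for key, value in partition_spec.items())
        partition_clause = f" PARTITION({pairs})"
    base = f"ANALYZE TABLE {table}{partition_clause} COMPUTE STATISTICS"
    # First statement refreshes basic stats (row count etc.),
    # second one refreshes column-level stats.
    return [base, base + " FOR COLUMNS"]


if __name__ == "__main__":
    # Hypothetical table name, for illustration only.
    for statement in build_analyze_statements("sales_db.orders"):
        print(statement + ";")
```

After the statistics are recomputed, the same count query should reflect the actual row count without touching any statistics property.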
06-27-2017 06:27 PM
1 Kudo
Well, there are many disadvantages to using replication factor 1, and we strongly do not recommend it, for the reasons below:
1. Data loss: the failure of a single DataNode or disk will result in data loss.
2. Performance issues: a replication factor greater than 1 allows more parallelization.
3. Handling failure: with a replication factor greater than 1, the failure of one or more DataNodes doesn't result in job failure.
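The data-loss point can be illustrated with some rough arithmetic. The sketch below assumes a deliberately simplified model that is not from the original post: every replica of a block sits on a different DataNode, and each node fails independently with the same probability. The function names and the 1% failure probability are hypothetical.

```python
def block_loss_probability(node_failure_prob, replication_factor):
    """Probability that every replica of one block is lost.

    Simplified assumption: replicas are on distinct nodes that fail
    independently, each with probability `node_failure_prob`.
    """
    return node_failure_prob ** replication_factor


def tolerated_failures(replication_factor):
    """Number of replica-holding nodes that can fail before the block is lost."""
    return replication_factor - 1


if __name__ == "__main__":
    p = 0.01  # hypothetical per-node failure probability, for illustration
    for rf in (1, 2, 3):
        print(rf, tolerated_failures(rf), block_loss_probability(p, rf))
```

Under this model, replication factor 1 tolerates zero failures, so any single node or disk failure loses the block, while each extra replica multiplies the loss probability by the per-node failure probability.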