Regarding enabling Hive ACID transactions on the cluster
We currently have a 300-node Hadoop cluster running Hortonworks Data Platform (HDP) 2.6.4.
We have been analyzing the possible issues of enabling this feature across the entire cluster. Our legacy Hive tables (about 1,700) do not use ACID, and we'd like to know whether enabling it could cause any issues or damage to these existing tables.
In HDP 2.6, ACID needs to be enabled on a per-table basis, in addition to enabling ACID transactions globally in Ambari. Consider your workload before enabling ACID on a table: for tables with a large volume of updates/deletes, this can cause performance issues.
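As a rough illustration of those two levels (global settings plus a per-table switch), here is a minimal Python sketch. The property names and values are the minimum ACID configuration listed in the Apache Hive "Hive Transactions" wiki; that the Ambari "ACID Transactions" setting applies equivalent values is an assumption. The per-table checks (ORC format, bucketed, managed rather than external) are the Hive 1.x/2.x ACID requirements from the same wiki. `TableInfo`, `acid_blockers` and `enable_acid_ddl` are hypothetical helpers, not part of Hive or Ambari, and the table metadata would have to be collected separately (for example from `DESCRIBE FORMATTED`).

```python
from dataclasses import dataclass
from typing import List, Optional

# Minimum ACID settings from the Apache Hive "Hive Transactions" wiki.
# Assumption: the Ambari "ACID Transactions" setting applies equivalent values.
ACID_CLIENT_SETTINGS = {
    "hive.support.concurrency": "true",
    "hive.enforce.bucketing": "true",  # no longer required as of Hive 2.0
    "hive.exec.dynamic.partition.mode": "nonstrict",
    "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
}
ACID_METASTORE_SETTINGS = {
    "hive.compactor.initiator.on": "true",
    # Any positive number on at least one metastore instance; 1 is just an example.
    "hive.compactor.worker.threads": "1",
}


@dataclass
class TableInfo:
    """Hypothetical summary of a legacy table's metadata."""
    name: str
    file_format: str  # e.g. "ORC", "TEXTFILE", "AVRO"
    bucketed: bool    # True if declared with CLUSTERED BY ... INTO n BUCKETS
    external: bool    # True for EXTERNAL tables


def acid_blockers(t: TableInfo) -> List[str]:
    """Return the reasons a table cannot be made transactional in Hive 1.x/2.x."""
    reasons = []
    if t.file_format.upper() != "ORC":
        reasons.append("not stored as ORC")
    if not t.bucketed:
        reasons.append("not bucketed")
    if t.external:
        reasons.append("external table")
    return reasons


def enable_acid_ddl(t: TableInfo) -> Optional[str]:
    """Return the per-table DDL for an eligible table, or None if it is blocked.

    Setting 'transactional'='true' is one-way: Hive does not allow
    switching the table back to non-ACID afterwards.
    """
    if acid_blockers(t):
        return None
    return f"ALTER TABLE {t.name} SET TBLPROPERTIES ('transactional'='true');"


if __name__ == "__main__":
    # Hypothetical legacy tables, for illustration only.
    legacy = [
        TableInfo("sales_orc", "ORC", bucketed=True, external=False),
        TableInfo("web_logs", "TEXTFILE", bucketed=False, external=True),
    ]
    for t in legacy:
        ddl = enable_acid_ddl(t)
        print(t.name, "->", ddl if ddl else "blocked: " + ", ".join(acid_blockers(t)))
```

The point of the sketch is that the global settings alone do not alter any table definition; a table only becomes transactional when its own `transactional` property is set, and under the Hive 1.x/2.x rules text or Avro tables, unbucketed tables and external tables are not eligible at all.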
HDP 3.0 features ACID v2 for Hive, with significant improvements. When upgrading to 3.0, ACID v2 is enabled globally, since these performance impacts have been reduced to negligible levels.
Thank you @rtheron. However, we have been studying the possible issues of enabling this feature globally in our cluster. Our legacy Hive tables (about 1,700) do not use ACID, and we'd like to know whether enabling it could cause any issues or damage to these existing Hive tables.
In HDP 3.0 and above there should be no significant impact on tables that have ACID v2 applied; in fact, the standard upgrade process to HDP 3.0 enables it globally by default. Keep in mind that tables with a significant volume of updates/deletes may see some performance degradation in various scenarios; please review the HDP 3.0 documentation for more detail. The ability to update/delete on a given table is controlled by Ranger, and users need the appropriate Ranger permissions to use these capabilities.
None of us are using HDP 3.0. What about HDP 2.6?
ACID has been "off" for the last half year, and 100 Hive tables (text, Avro, ORC) have been created in that time.
If we turn ACID on by default, what effects other than performance do we need to consider?
Will the old tables be converted to ACID tables?
Will the old tables still work as expected?