Member since 05-20-2016
14 Posts · 2 Kudos Received · 1 Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 1618 | 02-16-2017 10:54 PM |
09-10-2020 08:28 PM
It's best to run ambari-server setup-security and choose Option 1 to update the HTTPS certificates. It will prompt for the .crt and .key files and automatically update the relevant configuration files behind the scenes. The solution mentioned above didn't work for me. After the security setup, restart the Ambari server: ambari-server restart
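Before running setup-security, it can save a failed setup to confirm that the certificate and key actually belong together. A minimal sketch, using a throwaway self-signed pair for demonstration (the file names and CN are placeholders, not Ambari defaults):

```shell
# Generate a throwaway self-signed cert/key pair just for this demonstration.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=ambari.example.com" -keyout ambari.key -out ambari.crt

# The two modulus digests must match; if they differ, the .crt and .key do
# not belong together and Ambari's HTTPS setup will fail.
openssl x509 -noout -modulus -in ambari.crt | openssl md5
openssl rsa  -noout -modulus -in ambari.key | openssl md5
```

Run the same two modulus checks against your real .crt and .key files before feeding them to Option 1.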
07-05-2016 09:24 AM
1 Kudo
Below are our findings. As shown in the DDL above, bucketing is used on the problematic tables. The bucket number is decided by a hashing algorithm: out of the 10 buckets, each insert writes its actual data to one bucket file, while the other 9 buckets get files with the same name but zero size. A race condition occurs during this hash calculation when multiple threads/processes insert new rows into the bucketed table concurrently, so two or more of them try to create the same bucket file.

In addition, as discussed here, the current architecture is not really recommended: over time there would be millions of files on HDFS, which creates extra overhead on the NameNode. A select * statement would also take a long time, since it has to merge all the files from the buckets.

Solutions that resolved both issues:
- Removed buckets from the two problematic tables, hence race conditions become much less likely.
- Set hive.support.concurrency=true before the insert statements.
- Added a weekly Oozie workflow that runs the Hive concatenate command on both tables to mitigate the small-file problem.

FYI @Ravi Mutyala
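The race described above follows from how rows are assigned to buckets. A minimal sketch of the principle (illustrative only: Hive uses its own Java-side hash on the bucketing column, not cksum; the key value here is hypothetical):

```shell
# Principle: bucket = hash(bucketing-column value) mod number-of-buckets.
num_buckets=10
key="order-1001"   # hypothetical value of the bucketing column

# cksum stands in for Hive's real hash function purely for illustration.
bucket=$(( $(printf '%s' "$key" | cksum | cut -d' ' -f1) % num_buckets ))
echo "row with key '$key' lands in bucket $bucket"

# Because the assignment is deterministic, two concurrent writers whose rows
# hash to the same bucket both try to create the same bucket file on HDFS --
# that collision is the race condition described above.
```

Removing bucketing removes this shared target file, which is why dropping the buckets made the race far less likely.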