Member since: 04-12-2016
Posts: 30
Kudos Received: 12
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1687 | 05-02-2016 06:42 AM
03-08-2017 02:27 PM
FYI: Following the steps described above, we were able to upgrade to version 2.5.3 without any Kafka cluster downtime. The only issues we had were with a Kafka client written in Go.
07-05-2016 09:24 AM
1 Kudo
Below are our findings: As shown in the DDL above, bucketing is used on the problematic tables. The bucket number is decided by a hashing algorithm: out of 10 buckets, for each insert one bucket gets the actual data file and the other 9 buckets get a file with the same name but zero size. A race condition occurs during this hash calculation when multiple threads/processes insert new rows into the bucketed table concurrently, because two or more threads/processes try to create the same bucket file. In addition, as discussed here, the current architecture is not really recommended: over time there would be millions of files on HDFS, which creates extra overhead on the NameNode. A `select *` statement would also take a long time, since it has to merge all the files from the buckets.

Solutions which solved both issues:
- Removed buckets from the two problematic tables, hence the probability of race conditions is now very low
- Added hive.support.concurrency=true before the insert statements
- Set up a weekly Oozie workflow that uses the Hive concatenate command on both tables to mitigate the small-file problem

FYI @Ravi Mutyala
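To illustrate the failure mode: hash-based bucketing deterministically maps a row's key to one of N bucket files, so two concurrent writers whose rows hash to the same bucket both try to create the same file. This is a minimal sketch of the idea only; the hash function, file-naming scheme, and key values below are illustrative stand-ins, not Hive's actual implementation.

```python
# Illustrative sketch of hash-based bucket assignment (NOT Hive's real code):
# a row lands in bucket hash(key) % num_buckets, so concurrent writers whose
# keys collide on a bucket race to create the same bucket file.

NUM_BUCKETS = 10  # matches the 10 buckets described above


def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Assign a key to a bucket with a simple deterministic hash.

    Hive uses its own hash function; a sum of character codes is a stand-in.
    """
    return sum(ord(c) for c in key) % num_buckets


def bucket_file(table_dir: str, key: str) -> str:
    """File path a writer would create for this key's bucket."""
    return f"{table_dir}/00000{bucket_for(key)}_0"


# Two different (hypothetical) keys that hash to the same bucket:
print(bucket_for("row-a"), bucket_for("row-k"))          # same bucket number
print(bucket_file("/warehouse/t", "row-a"))              # same target file
```

Because both writers compute the same target path independently, nothing in the naming scheme itself prevents them from creating it at the same time, which is why dropping the buckets (or serializing the writers) removes the race.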
04-19-2016 02:55 PM
See the latest 2.3 release notes: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4.7/bk_HDP_RelNotes/content/errata_flume_kafka_sink.html You do not need Flume 1.6 on HDP to have it working.
04-20-2016 03:16 PM
I found it easier to download the JSON to a file, edit it on disk, and then use `-d @filename.json` in the final step to avoid lengthy command lines. The only caveat: the "href" line needs to be deleted from the JSON obtained in step two.
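The edit-on-disk step can be sketched as follows. This is a minimal sketch: the file name, the example URL, and the structure of the downloaded document are assumptions; only the removal of the top-level "href" field reflects the caveat above.

```python
import json


def strip_href(doc: dict) -> dict:
    """Remove the top-level "href" field that the GET response includes;
    it should not be sent back in the request body."""
    cleaned = dict(doc)
    cleaned.pop("href", None)
    return cleaned


# Hypothetical example standing in for the JSON saved in step two.
downloaded = {
    "href": "http://ambari-host:8080/api/v1/...",  # must be removed
    "Clusters": {"desired_config": {"type": "hdfs-site"}},
}

with open("filename.json", "w") as f:
    json.dump(strip_href(downloaded), f, indent=2)

# The cleaned file can then be referenced on the command line:
#   curl ... -d @filename.json ...
```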
01-11-2017 01:39 PM
@vperiasamy It worked for me as well. Thanks.
11-19-2018 02:32 AM
Our Ambari version is 2.5.1, but this problem still occurs: after adding a new data disk on a DataNode, the Ambari UI reports 1 corrupt block, yet `hdfs fsck /` (even with the `-includeSnapshot` option) shows no corrupt blocks. We restarted the Ambari server, the Ambari agent, and Ambari Metrics, with no effect. Please help, thank you very much.
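One possible reason the two numbers disagree: the UI alert typically reflects the NameNode's `CorruptBlocks` gauge from its JMX metrics, while fsck scans the namespace directly. A hedged sketch of reading that gauge yourself (the NameNode host and port 50070 are assumptions for a default HDP layout; the JMX bean name and attribute are standard Hadoop ones):

```python
import json
from urllib.request import urlopen

# Standard NameNode JMX query for the FSNamesystem bean.
JMX_QUERY = "/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem"


def parse_corrupt_blocks(jmx_json: str) -> int:
    """Extract the CorruptBlocks gauge from a NameNode JMX response."""
    beans = json.loads(jmx_json)["beans"]
    return int(beans[0]["CorruptBlocks"])


def fetch_corrupt_blocks(namenode_url: str) -> int:
    """Query a live NameNode (host/port are deployment-specific)."""
    with urlopen(namenode_url + JMX_QUERY) as resp:
        return parse_corrupt_blocks(resp.read().decode())


# Abridged sample of the JSON shape the JMX endpoint returns:
sample = '{"beans": [{"CorruptBlocks": 1}]}'
print(parse_corrupt_blocks(sample))  # prints 1
```

Comparing this gauge against the fsck output narrows down whether the alert is tracking a stale metric or fsck is missing a block outside its scan scope.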