Member since 01-09-2019
Posts: 401
Kudos Received: 163
Solutions: 80
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2082 | 06-21-2017 03:53 PM |
| | 3160 | 03-14-2017 01:24 PM |
| | 1991 | 01-25-2017 03:36 PM |
| | 3167 | 12-20-2016 06:19 PM |
| | 1591 | 12-14-2016 05:24 PM |
05-05-2016
06:06 PM
If it is a small cluster, you can skip passwordless ssh and do a manual ambari-agent install. Steps for that are here. While this is not a solution to your passwordless ssh issue, it works very well on smaller clusters (I have done manual registration on larger clusters too, since policies there disallowed passwordless ssh for the superuser).
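As a rough sketch of the manual route (assumes a yum-based OS; the Ambari server hostname is a placeholder, not from the original post):

```shell
# On each host: install and configure the agent by hand instead of
# letting Ambari bootstrap it over SSH.
sudo yum install -y ambari-agent

# Point the agent at the Ambari server (hostname is a placeholder)
# by rewriting the hostname line in ambari-agent.ini.
sudo sed -i 's/^hostname=.*/hostname=ambari-server.example.com/' \
  /etc/ambari-agent/conf/ambari-agent.ini

sudo ambari-agent start
```

The host can then be added through the Add Hosts wizard with "Perform manual registration" selected, so no SSH key is ever needed.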
05-04-2016
02:53 AM
Thank you @bsaini it worked great.
03-27-2019
12:46 AM
The link is invalid. JOAO's link is valid now.
07-05-2016
09:24 AM
1 Kudo
Below are our findings:

As shown in the DDL above, bucketing is used in the problematic tables. The bucket number is decided by a hashing algorithm: out of the 10 buckets, each insert writes the actual data file into one bucket, while the other 9 buckets get a file with the same name and zero size. A race condition occurs during this hash calculation when multiple threads/processes insert new rows into the bucketed table at the same time, so two or more threads/processes end up trying to create the same bucket file.

In addition, as discussed here, the current architecture is not really recommended: over time there would be millions of files on HDFS, which creates extra overhead on the NameNode. A select * statement would also take a long time, since it has to merge all the files from the buckets.

Solutions which solved both issues:

- Removed buckets from the two problematic tables, hence the probability of race conditions is much lower
- Added hive.support.concurrency=true before the insert statements
- Added a weekly Oozie workflow that uses the Hive concatenate command on both tables to mitigate the small-file problem

FYI @Ravi Mutyala
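A minimal sketch of those two mitigations from the Hive CLI (the table name events_tbl is a placeholder, and CONCATENATE assumes an ORC/RCFile table):

```shell
# Take Hive locks so concurrent inserts serialize instead of racing
# (set before the insert statements, as described above).
hive -e "SET hive.support.concurrency=true;
         INSERT INTO TABLE events_tbl VALUES (1, 'example');"

# Periodic compaction step (e.g. from a weekly Oozie action) to merge
# the small files the inserts leave behind.
hive -e "ALTER TABLE events_tbl CONCATENATE;"
```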
10-08-2017
10:34 PM
It is easy to integrate NiFi -> Kafka -> Spark, Storm, Flink, or Apex. Also NiFi -> S2S (site-to-site) -> Spark / Flink / ...
04-30-2016
12:54 AM
Try ulimit -n 8096, then restart the DataNode and NameNode and see if that helps. I haven't seen your DataNode logs, but it looks like you are running into a max-open-files issue.
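To check whether the limit really is the culprit before changing anything (the values and the hdfs user below are illustrative, not from the original post):

```shell
# Inspect the current per-process open-file limits.
ulimit -n      # soft limit for this shell
ulimit -Hn     # hard limit

# Raise the soft limit for this session (affects processes started from it):
# ulimit -n 8096

# To persist across reboots, add lines like these to
# /etc/security/limits.conf for the HDFS service user:
#   hdfs  soft  nofile  8096
#   hdfs  hard  nofile  8096
```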
04-29-2016
01:08 PM
As @nyadav pointed out, you need to use the URL jdbc:sqlserver://xx.xx.x.xxx:1433;databaseName=sample instead of the form you are entering for SQL Server. Listing databases worked because you hadn't specified a database in the JDBC URL there.
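For illustration, assuming the tool in question is Sqoop (a guess based on the "list databases" mention) and using placeholder host and credentials, a connect string that names the database would look like:

```shell
# Hypothetical Sqoop call; host, port, database, and username are placeholders.
# Note the ;databaseName=... property in the SQL Server JDBC URL.
sqoop list-tables \
  --connect "jdbc:sqlserver://10.0.0.5:1433;databaseName=sample" \
  --username sa -P
```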
07-24-2017
02:42 PM
For a comparison between compression formats take a look at this link: http://comphadoop.weebly.com/
05-13-2016
06:52 PM
@alain TSAFACK Please accept the answer that actually resolved your question. Avoid accepting your own answer unless you did further research after asking and found the solution yourself.
06-01-2016
07:31 AM
Thanks Ravi. This solved my problem also.