Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1542 | 07-09-2019 12:53 AM |
| | 9286 | 06-23-2019 08:37 PM |
| | 8049 | 06-18-2019 11:28 PM |
| | 8675 | 05-23-2019 08:46 PM |
| | 3473 | 05-20-2019 01:14 AM |
10-28-2019
10:40 AM
Since Hadoop 2.8, it is possible to mark a directory as protected so that its files cannot be deleted, using the fs.protected.directories property. From the documentation: "A comma-separated list of directories which cannot be deleted even by the superuser unless they are empty. This setting can be used to guard important system directories against accidental deletion due to administrator error." It does not exactly answer the question, but it is an option.
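As a sketch, the property goes into hdfs-site.xml; the directory paths below are placeholders, not values from this thread:

```xml
<!-- hdfs-site.xml: directories listed here cannot be deleted, even by the
     superuser, unless they are empty. Paths are illustrative placeholders. -->
<property>
  <name>fs.protected.directories</name>
  <value>/apps/critical,/user/important</value>
</property>
```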
10-24-2019
10:43 PM
The application might expect the log folder to exist before it can write logs into it. Your problem can likely be solved by creating the folder on the driver node: /some/path/to/edgeNode/ Note that you have specified the log4j file only for the driver program. For the executors to generate logs as well, you may need to pass the following option to spark-submit: "spark.executor.extraJavaOptions=-Dlog4j.configuration=driver_log4j.properties"
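A minimal sketch of both steps, assuming a placeholder log directory (substitute the path your log4j config actually writes to) and an illustrative application name:

```shell
# Create the log folder the application expects before submitting.
# LOG_DIR is a placeholder -- use the path from your log4j properties.
LOG_DIR="/tmp/edgeNode/logs"
mkdir -p "$LOG_DIR"

# Ship the log4j file and point both the driver and the executors at it
# (shown as a comment since it needs a live cluster; your_app.py is hypothetical):
# spark-submit \
#   --files driver_log4j.properties \
#   --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=driver_log4j.properties" \
#   --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=driver_log4j.properties" \
#   your_app.py
ls -d "$LOG_DIR"
```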
10-16-2019
05:58 AM
Hello @amirmam, did you manage to solve this? I have the same problem with the current version of CDH 6.1.1. Thanks!
10-07-2019
10:35 AM
Hi Dwill, did the Sqoop import work for you with the SSL-enabled Oracle DB? I have the same requirement to use Sqoop import with an SSL-enabled DB. I am trying to connect through an Oracle wallet but am getting network adapter issues. Could you please share the steps if it is working for you? Thank you.
10-05-2019
12:43 AM
The original issue described here does not apply to your version. In your case it could simply be a misconfiguration that causes Oozie not to load the right Hive configuration required to talk to the Hive service. If you cannot find an error in the Oozie server log, try enabling debug logging on the server. Also try to locate files or jars in your workflow that may be supplying an invalid Hive client XML.
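One way to hunt for a stray Hive client config is to scan the workflow's lib listing. The listing below is simulated for illustration; in practice you would pipe in the output of `hadoop fs -ls -R` over your workflow application path:

```shell
# Sketch: find files in a workflow bundle that could override the Hive client
# config. The listing is hard-coded here; replace it with:
#   hadoop fs -ls -R <your workflow application path>
LISTING="workflow.xml
lib/myudf.jar
lib/hive-site.xml"
echo "$LISTING" | grep -n 'hive-site.xml'
```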
09-25-2019
03:03 AM
Hi Harsha, thanks for the explanation. In extension to the topic, I need a small clarification. We recently implemented Sentry on Impala. Based on the KB article below [1], we cannot execute "Invalidate all metadata and rebuild index" or "Perform incremental metadata update", since we don't have access to all the DBs, which is fair. Now my questions are: 1. I am not able to see a new DB in Hue Impala, though I can see it from beeline or impala-shell. How do I fix this? 2. I can execute INVALIDATE METADATA on a table from impala-shell, but I have 50+ DBs and tens of tables in each DB. Is there any option to run INVALIDATE METADATA at the DB level instead of on individual tables? [1] https://my.cloudera.com/knowledge/INVALIDATE-METADATA--Sentry-Enabled--ERROR?id=71141 Thanks, Krishna
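One workaround for the per-table limitation is to generate the statements for every table in a database and feed them back to impala-shell. This is a sketch: the database name and table list are hard-coded for illustration, and the impala-shell invocations are shown as comments since they need a live cluster:

```shell
# Sketch: emit one INVALIDATE METADATA statement per table in a database.
DB="mydb"                 # placeholder database name
TABLES="t1 t2"            # in practice: impala-shell -B -q "SHOW TABLES IN $DB"
for T in $TABLES; do
  echo "INVALIDATE METADATA $DB.$T;"
done
# Then pipe the generated statements into impala-shell, e.g.:
# ... | impala-shell -i <impalad_host> -f -
```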
08-23-2019
02:55 AM
Removing the hidden files worked for me. Thanks a lot!
08-20-2019
01:09 AM
Hi @lsouvleros, as you already pointed out: this depends on a number of factors and is widely influenced by your use case and existing organizational context. Compared to HDFS in a classic compute/storage-coupled Hadoop cluster, some of the discussion here also applies: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_sdx_vpc.html. This is because Isilon is network-attached storage, and, similar to using Cloudera virtual clusters, this has implications for performance, especially for workloads with high performance requirements. I have also seen environments where using Isilon instead of HDFS affected Impala performance.

In terms of reliability and stability, you can argue either way, depending on your architecture. However, a multi-datacenter deployment is likely to be easier to realize with Isilon, due to its enterprise-grade replication and failover capabilities. In terms of using storage space efficiently, Isilon has advantages; however, its higher cost compared to JBOD-based HDFS might make this point irrelevant.

For scalability, I guess it again depends on your organizational setup. You can easily scale up Isilon by buying more boxes from EMC, and there are certainly very large Isilon deployments out there. On the other hand, scaling HDFS is also not hard and can support huge deployments. In the end it is a tradeoff: higher cost but easier management with Isilon, versus lower cost but higher effort with HDFS. This is my personal opinion, and both EMC and Cloudera may have stronger arguments for their respective storage (e.g. [EMC link]). You can also look for the latest announcements on their blogs. Regards, Benjamin
08-08-2019
03:36 PM
Hey Harsh, thanks for responding. Multiple clients request data from HBase, and at some point users sometimes don't get data, or EOF exceptions or connection interruptions occur. We are not able to keep a record of the requested data, or of the size of the input and output data sent to the end user. Regards, Vinay K
08-04-2019
09:20 AM
ZooKeeper works on a quorum, and a quorum requires a majority of the servers to be up. If you have 3 servers and one is down, a majority of the servers is still working, so the ensemble stays available. You can read further about ZooKeeper quorum.
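As a quick illustration of the majority rule, a sketch of the arithmetic: for an ensemble of N servers, the quorum is floor(N/2) + 1, so the ensemble tolerates N minus that many failures.

```shell
# Quorum arithmetic for a ZooKeeper ensemble of N servers.
N=3
MAJORITY=$(( N / 2 + 1 ))               # floor(N/2) + 1 = 2 for N=3
TOLERATED=$(( N - MAJORITY ))           # failures the ensemble survives
echo "Ensemble of $N needs $MAJORITY up; tolerates $TOLERATED failure(s)"
```

This is also why ensembles use odd sizes: 4 servers tolerate only 1 failure, the same as 3.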