Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
05-09-2019
01:33 AM
Are all of your processes connecting to the same Impala Daemon, or are you using a load balancer / varying connection options? Each Impala Daemon can only accept a finite number of active client connections, which is likely the limit you are running into.

Typically, for concurrent access to a DB, it is better to use a connection pooling pattern, with a finite set of connections shared between the threads of a single application. This avoids overloading the target server. While I haven't used it myself, pyodbc may support connection pooling and reuse, which you could utilise via threads in Python instead of spawning separate processes.

Alternatively, spread the connections around, either by introducing a load balancer or by varying the target options for each spawned process. See https://www.cloudera.com/documentation/enterprise/latest/topics/impala_dedicated_coordinator.html and http://www.cloudera.com/documentation/other/reference-architecture/PDF/Impala-HA-with-F5-BIG-IP.pdf for further guidance and examples on this.
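A minimal sketch of the pooling pattern described above: a fixed number of connections shared by many worker threads via a queue. Here `make_connection` is a stand-in (an assumption) for a real call such as `pyodbc.connect(...)`; the point is that 20 "queries" run while the server only ever sees 4 connections.

```python
import queue
import threading

def make_connection():
    # Placeholder for a real connection call, e.g. pyodbc.connect(dsn).
    # Returns a unique token so this sketch stays self-contained.
    return object()

class ConnectionPool:
    """A fixed-size pool: N connections shared by many worker threads."""
    def __init__(self, size):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(make_connection())

    def acquire(self):
        return self._pool.get()   # blocks until a connection is free

    def release(self, conn):
        self._pool.put(conn)

results = []
pool = ConnectionPool(size=4)     # at most 4 concurrent server connections

def worker(query_id):
    conn = pool.acquire()
    try:
        # Run the actual query with `conn` here; we just record which
        # connection served which query.
        results.append((query_id, id(conn)))
    finally:
        pool.release(conn)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because `acquire()` blocks when the pool is empty, the daemon never sees more than `size` simultaneous connections no matter how many worker threads you spawn.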
05-08-2019
07:33 PM
1 Kudo
Are you looking for a sequentially growing ID, or just a universally unique ID? For the former, you can use Curator over ZooKeeper with this recipe: https://curator.apache.org/curator-recipes/distributed-atomic-long.html For the latter, a UUID generator may suffice. For a more 'distributed' solution, check out Twitter's Snowflake: https://github.com/twitter-archive/snowflake/tree/snowflake-2010
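To illustrate the Snowflake idea, here is a hedged, simplified sketch (not Twitter's actual implementation): a 64-bit ID packing a millisecond timestamp, a worker ID, and a per-millisecond sequence, using the same 41/10/12 bit layout as the original design. Class and parameter names are illustrative.

```python
import time
import threading

class SnowflakeLike:
    """Roughly time-ordered IDs: 41-bit ms timestamp | 10-bit worker | 12-bit sequence."""
    def __init__(self, worker_id, epoch_ms=1288834974657):  # custom epoch, for illustration
        assert 0 <= worker_id < 1024                        # must fit in 10 bits
        self.worker_id = worker_id
        self.epoch_ms = epoch_ms
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF    # up to 4096 IDs per ms per worker
                if self.seq == 0:                    # sequence exhausted: spin to next ms
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - self.epoch_ms) << 22) | (self.worker_id << 12) | self.seq

gen = SnowflakeLike(worker_id=7)
ids = [gen.next_id() for _ in range(1000)]
```

IDs from a single generator are strictly increasing, and generators with distinct worker IDs never collide, which is what makes the scheme 'distributed' without any coordination service.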
05-08-2019
06:57 PM
1 Kudo
@jbowles Yes, it is advisable to clean up the NM local directories when changing the LCE setting; please see https://www.cloudera.com/documentation/enterprise/5-10-x/topics/cdh_sg_other_hadoop_security.html#topic_18_3

Important: Configuration changes to the Linux container executor could result in local NodeManager directories (such as usercache) being left with incorrect permissions. To avoid this, when making changes using either Cloudera Manager or the command line, first manually remove the existing NodeManager local directories from all configured local directories (yarn.nodemanager.local-dirs), and let the NodeManager recreate the directory structure.
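As a rough sketch of that cleanup step, the snippet below clears the contents of each configured local dir so the NodeManager can recreate them with correct permissions. The directory list is an assumption standing in for your actual yarn.nodemanager.local-dirs value, and the demo runs against throwaway temp directories rather than a real NM host; stop the NodeManager before doing this for real.

```python
import os
import shutil
import tempfile

def clear_nm_local_dirs(local_dirs):
    """Remove the contents (usercache, filecache, etc.) of each NodeManager
    local dir; `local_dirs` mirrors the comma-separated
    yarn.nodemanager.local-dirs value split into a list."""
    removed = []
    for d in local_dirs:
        for entry in os.listdir(d):
            path = os.path.join(d, entry)
            shutil.rmtree(path, ignore_errors=True)
            removed.append(path)
    return removed

# Demonstration against a throwaway directory tree (a stand-in for a real
# local dir such as /yarn/nm):
demo = tempfile.mkdtemp()
os.makedirs(os.path.join(demo, "usercache", "alice", "appcache"))
os.makedirs(os.path.join(demo, "filecache"))
removed = clear_nm_local_dirs([demo])
```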
05-08-2019
06:42 PM
1 Kudo
Running over a public IP may not be a good idea if it is open to the internet; consider using a VPC instead. That said, you can point the HBase Master and RegionServer to use the address from a specific interface name (eth0, eth1, etc.) and/or a specific DNS resolver (an IP or name that can answer a dns:// resolution call) via these advanced config properties:

hbase.master.dns.interface
hbase.master.dns.nameserver
hbase.regionserver.dns.interface
hbase.regionserver.dns.nameserver

By default, the services will use the host's default name and resolving address (getent hosts $(hostname -f)) and publish this to clients.
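For example, a RegionServer could be pinned to an internal interface in hbase-site.xml; note that eth1 and 10.0.0.53 below are placeholders for your own interface name and DNS server, not values from any real deployment.

```xml
<!-- hbase-site.xml: illustrative values only -->
<property>
  <name>hbase.regionserver.dns.interface</name>
  <value>eth1</value>
</property>
<property>
  <name>hbase.regionserver.dns.nameserver</name>
  <value>10.0.0.53</value>
</property>
```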
05-07-2019
09:58 PM
Depends on what you mean by 'storage locations'. If you mean "can other apps use HDFS?", the answer is yes: HDFS is an independent system unrelated to YARN and has its own access and control mechanisms not governed by a YARN scheduler. If you mean "can other apps use the scratch space on NM nodes?", the answer is no, as only local containers get to use that. If you're looking to strictly split both storage and compute, as opposed to just compute, then it may be better to divide up the cluster entirely.
05-07-2019
05:48 PM
HDFS only stores two time points in its INode data structures/persistence: the modification time and the access time [1]. For files, the mtime is effectively the time when the file was last closed (such as when it was originally written and closed, or when it was reopened for append and closed). In general use this does not change much for most files you'll place on HDFS, so it can serve as a "good enough" creation time. Is there a specific use case you have in mind that requires preserving the original create time? [1] https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeAttributes.java#L61-L65
05-07-2019
05:24 PM
The simplest way is through Cloudera Hue. See http://gethue.com/new-apache-oozie-workflow-coordinator-bundle-editors/ That said, if you've attempted something and have run into issues, please add more details so the community can help you on specific topics.
05-07-2019
05:21 PM
It would help if you added some description of what you have found or attempted, instead of just a broad question. Which load balancer are you choosing to use? We have some sample HAProxy configs at https://www.cloudera.com/documentation/enterprise/latest/topics/impala_proxy.html#tut_proxy for Impala that can be repurposed for other components. Hue also offers its own pre-optimized load balancer as roles in Cloudera Manager that you can add and have set up automatically: https://www.cloudera.com/documentation/enterprise/latest/topics/hue_perf_tuning.html
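To give a flavour of what those sample configs look like, here is a hedged sketch of an HAProxy section fronting impala-shell traffic. The bind port, host names, and balance choice are placeholders/assumptions; adapt the real values from the Cloudera sample configs linked above.

```text
# Sketch only: ports and hosts are illustrative, not from a real deployment.
listen impala-shell
    bind :21001
    mode tcp
    balance leastconn
    server impalad1 impalad1.example.com:21000 check
    server impalad2 impalad2.example.com:21000 check
```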
05-05-2019
08:58 PM
> So if I want to fetch all defined mapreduce properties, can I use this API or does it have any pre-requisites?

Yes, you can. The default role group almost always exists even if role instances do not, but if it doesn't (such as in a heavily API-driven install) you can create one before you fetch.

> Also, does it require any privileges to access this API?

A read-only user should also be able to fetch configs as a GET call over the API. However, for configs marked as secured (such as configs that carry passwords, etc.), retrieving the value will require admin privileges; they will otherwise appear redacted.
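As a sketch of the GET call shape, the helper below builds a role-config-group config endpoint URL of the kind the CM API exposes. The host, cluster, service, and group names are illustrative placeholders, and the API version may differ on your CM release; check your own /api/version.

```python
from urllib.parse import quote

def role_config_group_url(host, cluster, service, group, api_version="v19"):
    """Build a CM API URL for fetching a role config group's configs.
    All name arguments are URL-encoded; values here are placeholders."""
    parts = [quote(p, safe="") for p in (cluster, service, group)]
    return (f"http://{host}:7180/api/{api_version}/clusters/{parts[0]}"
            f"/services/{parts[1]}/roleConfigGroups/{parts[2]}/config?view=full")

url = role_config_group_url("cm.example.com", "Cluster 1", "yarn",
                            "yarn-GATEWAY-BASE")
# A read-only user can then fetch it with basic auth, e.g.:
#   curl -u readonly:password "<url>"
```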
04-30-2019
06:44 AM
In some cases, when a daemon has trouble with the AD connection protocol, it becomes impossible to retrieve user-group mapping information from that server. If your job happens to be launched from that server, you get an error, but if it is launched from another server without that problem, the launch appears to work fine. It's unusual, but a possibility...