Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
05-09-2019
01:33 AM
Are all of your processes connecting to the same Impala Daemon, or are you using a load balancer / varying connection options? Each Impala Daemon can only accept a finite number of active client connections, which is likely the limit you are running into.

Typically, for concurrent access to a DB, it is better to use a connection pooling pattern, with a finite set of connections shared between the threads of a single application. This avoids overloading the target server. While I haven't used it myself, pyodbc may support connection pooling and reuse, which you could utilise via threads in Python instead of spawning separate processes.

Alternatively, spread the connections around, either by introducing a load balancer or by varying the target options for each spawned process. See https://www.cloudera.com/documentation/enterprise/latest/topics/impala_dedicated_coordinator.html and http://www.cloudera.com/documentation/other/reference-architecture/PDF/Impala-HA-with-F5-BIG-IP.pdf for further guidance and examples on this.
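A minimal sketch of the pooling pattern described above: a fixed number of connections shared by many worker threads via a queue. Here `make_connection` is a stand-in (an assumption) for a real call such as `pyodbc.connect(...)`; the point is that 20 "queries" run while the server only ever sees 4 connections.

```python
import queue
import threading

def make_connection():
    # Placeholder for a real connection call, e.g. pyodbc.connect(dsn).
    # Returns a unique token so this sketch stays self-contained.
    return object()

class ConnectionPool:
    """A fixed-size pool: N connections shared by many worker threads."""
    def __init__(self, size):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(make_connection())

    def acquire(self):
        return self._pool.get()   # blocks until a connection is free

    def release(self, conn):
        self._pool.put(conn)

results = []
pool = ConnectionPool(size=4)     # at most 4 concurrent server connections

def worker(query_id):
    conn = pool.acquire()
    try:
        # Run the actual query with `conn` here; we just record which
        # connection served which query.
        results.append((query_id, id(conn)))
    finally:
        pool.release(conn)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because `acquire()` blocks when the pool is empty, the daemon never sees more than `size` simultaneous connections no matter how many worker threads you spawn.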
05-08-2019
07:33 PM
1 Kudo
Are you looking for a sequentially growing ID, or just a universally unique ID? For the former, you can use Curator over ZooKeeper with this recipe: https://curator.apache.org/curator-recipes/distributed-atomic-long.html For the latter, a UUID generator may suffice. For a more 'distributed' solution, check out Twitter's Snowflake: https://github.com/twitter-archive/snowflake/tree/snowflake-2010
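To illustrate the Snowflake idea, here is a hedged, simplified sketch (not Twitter's actual implementation): a 64-bit ID packing a millisecond timestamp, a worker ID, and a per-millisecond sequence, using the same 41/10/12 bit layout as the original design. Class and parameter names are illustrative.

```python
import time
import threading

class SnowflakeLike:
    """Roughly time-ordered IDs: 41-bit ms timestamp | 10-bit worker | 12-bit sequence."""
    def __init__(self, worker_id, epoch_ms=1288834974657):  # custom epoch, for illustration
        assert 0 <= worker_id < 1024                        # must fit in 10 bits
        self.worker_id = worker_id
        self.epoch_ms = epoch_ms
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF    # up to 4096 IDs per ms per worker
                if self.seq == 0:                    # sequence exhausted: spin to next ms
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - self.epoch_ms) << 22) | (self.worker_id << 12) | self.seq

gen = SnowflakeLike(worker_id=7)
ids = [gen.next_id() for _ in range(1000)]
```

IDs from a single generator are strictly increasing, and generators with distinct worker IDs never collide, which is what makes the scheme 'distributed' without any coordination service.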
05-08-2019
06:57 PM
1 Kudo
@jbowles Yes, it is advisable to clean up the NM local directories when changing the LCE setting; please see https://www.cloudera.com/documentation/enterprise/5-10-x/topics/cdh_sg_other_hadoop_security.html#topic_18_3

Important: Configuration changes to the Linux container executor could result in local NodeManager directories (such as usercache) being left with incorrect permissions. To avoid this, when making changes using either Cloudera Manager or the command line, first manually remove the existing NodeManager local directories from all configured local directories (yarn.nodemanager.local-dirs), and let the NodeManager recreate the directory structure.
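As a rough sketch of that cleanup step, the snippet below clears the contents of each configured local dir so the NodeManager can recreate them with correct permissions. The directory list is an assumption standing in for your actual yarn.nodemanager.local-dirs value, and the demo runs against throwaway temp directories rather than a real NM host; stop the NodeManager before doing this for real.

```python
import os
import shutil
import tempfile

def clear_nm_local_dirs(local_dirs):
    """Remove the contents (usercache, filecache, etc.) of each NodeManager
    local dir; `local_dirs` mirrors the comma-separated
    yarn.nodemanager.local-dirs value split into a list."""
    removed = []
    for d in local_dirs:
        for entry in os.listdir(d):
            path = os.path.join(d, entry)
            shutil.rmtree(path, ignore_errors=True)
            removed.append(path)
    return removed

# Demonstration against a throwaway directory tree (a stand-in for a real
# local dir such as /yarn/nm):
demo = tempfile.mkdtemp()
os.makedirs(os.path.join(demo, "usercache", "alice", "appcache"))
os.makedirs(os.path.join(demo, "filecache"))
removed = clear_nm_local_dirs([demo])
```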
05-08-2019
06:42 PM
1 Kudo
Running over a public IP may not be a good idea if it is open to the internet; consider using a VPC instead. That said, you can point the HBase Master and RegionServer to use the address from a specific interface name (eth0, eth1, etc.) and/or a specific DNS resolver (an IP or name that can answer a dns:// resolution call) via these advanced config properties:

hbase.master.dns.interface
hbase.master.dns.nameserver
hbase.regionserver.dns.interface
hbase.regionserver.dns.nameserver

By default, the services will use the host's default name and resolving address (getent hosts $(hostname -f)) and publish this to clients.
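For example, a RegionServer could be pinned to an internal interface in hbase-site.xml; note that eth1 and 10.0.0.53 below are placeholders for your own interface name and DNS server, not values from any real deployment.

```xml
<!-- hbase-site.xml: illustrative values only -->
<property>
  <name>hbase.regionserver.dns.interface</name>
  <value>eth1</value>
</property>
<property>
  <name>hbase.regionserver.dns.nameserver</name>
  <value>10.0.0.53</value>
</property>
```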
05-07-2019
09:58 PM
Depends on what you mean by 'storage locations'. If you mean "can other apps use HDFS?", the answer is yes: HDFS is an independent system unrelated to YARN and has its own access and control mechanisms not governed by a YARN scheduler. If you mean "can other apps use the scratch space on NM nodes?", the answer is no, as only local containers get to use that. If you're looking to strictly split both storage and compute, as opposed to just compute, then it may be better to divide up the cluster entirely.
05-07-2019
05:48 PM
HDFS only stores two time points in its INode data structures/persistence: the modification time and the access time [1]. For files, the mtime is effectively the time when the file was last closed (such as when it was originally written and closed, or when it was reopened for append and closed). In general use this does not change much for most files you'll place on HDFS, so it can serve as a "good enough" creation time. Is there a specific use case you have in mind that requires preserving the original create time? [1] https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeAttributes.java#L61-L65
05-07-2019
05:24 PM
The simplest way is through Cloudera Hue. See http://gethue.com/new-apache-oozie-workflow-coordinator-bundle-editors/ That said, if you've attempted something and have run into issues, please add more details so the community can help you on specific topics.
05-07-2019
05:21 PM
It would help if you added some description of what you have found or attempted, instead of just a broad question. Which load balancer are you choosing to use? We have some sample HAProxy configs at https://www.cloudera.com/documentation/enterprise/latest/topics/impala_proxy.html#tut_proxy for Impala that can be repurposed for other components. Hue also offers its own pre-optimized load balancer as roles in Cloudera Manager that you can add and have set up automatically: https://www.cloudera.com/documentation/enterprise/latest/topics/hue_perf_tuning.html
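To give a flavour of what those sample configs look like, here is a hedged sketch of an HAProxy section fronting impala-shell traffic. The bind port, host names, and balance choice are placeholders/assumptions; adapt the real values from the Cloudera sample configs linked above.

```text
# Sketch only: ports and hosts are illustrative, not from a real deployment.
listen impala-shell
    bind :21001
    mode tcp
    balance leastconn
    server impalad1 impalad1.example.com:21000 check
    server impalad2 impalad2.example.com:21000 check
```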
05-05-2019
08:58 PM
> So if I want to fetch all defined mapreduce properties, can I use this API or does it have any pre-requisites?

Yes, you can. The default role group almost always exists even if role instances do not, but if it doesn't (such as in a heavily API-driven install) you can create one before you fetch.

> Also, does it require any privileges to access this API?

A read-only user should also be able to fetch configs as a GET call over the API. However, for configs marked as secured (such as configs that carry passwords, etc.), retrieving the value will require admin privileges; they will otherwise appear redacted.
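As a sketch of the GET call shape, the helper below builds a role-config-group config endpoint URL of the kind the CM API exposes. The host, cluster, service, and group names are illustrative placeholders, and the API version may differ on your CM release; check your own /api/version.

```python
from urllib.parse import quote

def role_config_group_url(host, cluster, service, group, api_version="v19"):
    """Build a CM API URL for fetching a role config group's configs.
    All name arguments are URL-encoded; values here are placeholders."""
    parts = [quote(p, safe="") for p in (cluster, service, group)]
    return (f"http://{host}:7180/api/{api_version}/clusters/{parts[0]}"
            f"/services/{parts[1]}/roleConfigGroups/{parts[2]}/config?view=full")

url = role_config_group_url("cm.example.com", "Cluster 1", "yarn",
                            "yarn-GATEWAY-BASE")
# A read-only user can then fetch it with basic auth, e.g.:
#   curl -u readonly:password "<url>"
```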
04-30-2019
06:44 AM
In some cases, when a daemon has trouble with the AD connection protocol, it becomes impossible to retrieve user-group mapping information from that server. If your job happens to be launched from that server, you get an error, but if it is launched from another server without that problem, the launch appears to work fine. It's unusual, but a possibility...