Member since: 08-16-2016
Posts: 642
Kudos Received: 131
Solutions: 68

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 3978 | 10-13-2017 09:42 PM |
|  | 7475 | 09-14-2017 11:15 AM |
|  | 3799 | 09-13-2017 10:35 PM |
|  | 6035 | 09-13-2017 10:25 PM |
|  | 6602 | 09-13-2017 10:05 PM |
06-06-2017
10:50 AM
1 Kudo
Yes, this is by design. The master roles for Impala perform functions for all Impala daemons, such as caching metadata from HMS and HDFS block locations, maintaining the list of available Impala daemons, etc., but they do not manage connections. Each individual Impala daemon manages the connections made to it and acts as the coordinator for those connections. For production, I recommend putting a load balancer in front of your Impala daemons to spread the connection load across all of them. Otherwise, having all users connect to a single daemon will exhaust that daemon's memory quickly. Another option I have seen is assigning blocks of Impala daemons to specific user groups.
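A minimal sketch of what that looks like from the client side, assuming a hypothetical load balancer VIP named impala-lb.example.com sitting in front of the daemons:

```
# Point clients at the load balancer VIP instead of an individual daemon so
# each new session lands on a different coordinator (hostname is a placeholder).
impala-shell -i impala-lb.example.com:21000
# JDBC/ODBC clients would use the same VIP on the HiveServer2-compatible port
# (21050 by default) in their connection string.
```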
06-06-2017
10:33 AM
What do you have for your placement rules? I forget the CDH defaults, but I know a fresh cluster will have that setting set to true, and it will create user queues automatically. The Hadoop docs do have this note regarding the setting, which is why I asked about your placement rules: "If a queue placement policy is given in the allocations file, this property is ignored." https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
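For reference, a hedged sketch of how to check for a placement policy that would override that property; the file path is an assumption for a typical CDH layout, and the rule names come from the FairScheduler docs:

```
# Inspect the allocations file on the ResourceManager host (path may differ).
cat /etc/hadoop/conf/fair-scheduler.xml
# A queuePlacementPolicy block in it takes precedence over the
# create-user-queues property, e.g.:
#   <queuePlacementPolicy>
#     <rule name="specified" create="false"/>
#     <rule name="user" create="true"/>   <!-- per-user queues, auto-created -->
#     <rule name="default"/>
#   </queuePlacementPolicy>
```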
06-06-2017
10:22 AM
SASL is an over-the-wire encryption method. I don't think it is used by Impala; it is used by Thrift clients like the Hive CLI and Beeline. Impala has the option to enable SSL for encryption, or to have nothing. Do you have SSL enabled for Impala? Port 1433 is not the default port for Impala. Did you change it?
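One quick way to see what a port is actually speaking, assuming a hypothetical host name and that you are targeting Impala's HiveServer2-compatible port (21050 by default):

```
# If SSL is enabled on the port, this prints a certificate chain; an immediate
# handshake failure usually means the port is plaintext (or not Impala at all).
openssl s_client -connect impala-host.example.com:21050 </dev/null
```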
06-01-2017
10:49 PM
You want AuthMech=1 and all of the settings that go with it (principal name, realm, FQDN, etc.). AuthMech=3 is for LDAP authentication. Also, make sure the user has a valid ticket prior to trying the driver. On *nix and macOS, klist should be available; on Windows, you can install the MIT Kerberos Ticket Manager to view and retrieve a Kerberos ticket.
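A quick sanity check before pointing the driver at the cluster; the principal and realm below are placeholders:

```
# Obtain and verify a Kerberos ticket for the user that will run the driver.
kinit user@EXAMPLE.COM
klist
# klist should show an unexpired krbtgt/EXAMPLE.COM@EXAMPLE.COM entry;
# without a valid ticket, a Kerberos (AuthMech=1) connection will fail.
```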
06-01-2017
10:51 AM
Which nodes is it not activated on? Check the CM server logs and the CM agent logs on the problematic hosts. You can also try restarting the CM service to see if that stops the process. I'd be interested to know whether it has actually finished the process on all nodes. You can check the /opt/cloudera/parcels directory to see if the symlink to the version being activated is in place.
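You can check that from a shell on one of the problematic hosts (the parcel name and version in the comment are only an example):

```
# An activated parcel shows up as a symlink next to the versioned directory,
# e.g. CDH -> CDH-5.12.0-1.cdh5.12.0.p0.29 (example version only).
ls -l /opt/cloudera/parcels
```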
05-31-2017
09:51 PM
1 Kudo
Does either of those ids work with only one listed? That will rule out bad entity ids. Based on one of the other calls, you could try the URL below. http://vm-cloudera-59:7187/api/v10/lineage3/?entityIds=4500&entityIds=5267
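The same call from the command line, assuming placeholder Navigator credentials:

```
# Repeat entityIds once per id; quote the URL so the shell keeps the &.
curl -u admin:admin \
  'http://vm-cloudera-59:7187/api/v10/lineage3/?entityIds=4500&entityIds=5267'
```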
05-31-2017
09:10 PM
I don't know, as I have only ever installed CM with an external DB. If that message is displayed in the CM UI, I feel like it will go away once you start using an external DB, especially since it mentions the embedded DB.
05-31-2017
10:29 AM
3 Kudos
The setting mapreduce.map.memory.mb sets the physical memory size of the container running the mapper (mapreduce.reduce.memory.mb does the same for the reducer container). Be sure that you adjust the heap value as well. In newer versions of YARN/MRv2, the setting mapreduce.job.heap.memory-mb.ratio can be used to have it auto-adjust. The default is 0.8, so 80% of whatever the container size is will be allocated as the heap. Otherwise, adjust it manually using the mapreduce.map.java.opts.max.heap and mapreduce.reduce.java.opts.max.heap settings. BTW, I believe that 1 GB is the default, and it is quite low. I recommend reading the link below. It provides a good understanding of YARN and MR memory settings, how they relate, and how to set some baseline values based on the cluster node size (disk, memory, and cores). https://www.cloudera.com/documentation/enterprise/5-4-x/topics/cdh_ig_yarn_tuning.html
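To make the ratio concrete, a hedged example with illustrative (not default) sizes; the jar and driver class names are placeholders, and the -D overrides only take effect if the driver uses ToolRunner/GenericOptionsParser:

```
# With a 4096 MB map container and the 0.8 heap ratio, the map heap works out
# to roughly 4096 * 0.8 = ~3276 MB. Setting the values explicitly instead:
hadoop jar my-job.jar com.example.MyDriver \
  -D mapreduce.map.memory.mb=4096 \
  -D mapreduce.map.java.opts=-Xmx3276m \
  -D mapreduce.reduce.memory.mb=8192 \
  -D mapreduce.reduce.java.opts=-Xmx6553m \
  /input/path /output/path
```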
05-31-2017
09:36 AM
1 Kudo
The difference between the paths is that Path A, with the installer, installs the JDK and the embedded Postgres DB. You can install the JDK version you need, following Cloudera's recommendations, update alternatives, and remove the old one. On the DB side, install your DB of choice, configure it based on Cloudera's recommendations, create the users, databases, and tables, migrate data if you need to retain the metrics, and update db.properties for CM. You will also need to create users, databases, and tables for the CM services (rmon and amon) and update their configuration.
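A hedged sketch of the external-DB step for the CM server itself; the database type, names, and password are placeholders, the script path is the usual CM 5.x location but may differ, and it assumes the database and user already exist:

```
# Verifies connectivity, prepares the CM schema, and rewrites
# /etc/cloudera-scm-server/db.properties to point at the external database.
/usr/share/cmf/schema/scm_prepare_database.sh mysql scm scm scm_password
```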
05-31-2017
09:25 AM
What process is using up more CPU on that one node? Is it the DN, NM, or YARN containers? I think it is likely an imbalance of data or blocks that is causing it. That would cause more containers to run on that node compared to the rest. Running hdfs dfsadmin -report will give you the info to determine whether it is balanced or not on the data front. The best way to check the block distribution is escaping me right now (I think CM has a graph somewhere).
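A quick way to check the data side, and to correct it if it turns out to be skewed (the threshold value is just an example):

```
# Compare DFS Used / DFS Used% across DataNodes in the per-node sections.
hdfs dfsadmin -report
# If one node holds noticeably more data, the balancer can even it out;
# the threshold is the allowed % deviation from the average utilization.
hdfs balancer -threshold 10
```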