Member since: 08-16-2016
Posts: 642
Kudos Received: 131
Solutions: 68

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 3978 | 10-13-2017 09:42 PM |
|  | 7475 | 09-14-2017 11:15 AM |
|  | 3799 | 09-13-2017 10:35 PM |
|  | 6035 | 09-13-2017 10:25 PM |
|  | 6602 | 09-13-2017 10:05 PM |
06-06-2017
10:50 AM
1 Kudo
Yes, this is by design. The master roles for Impala perform functions for all Impala daemons, such as caching metadata from HMS and HDFS block locations, maintaining the list of available Impala daemons, etc., but they do not manage connections. Each individual Impala daemon manages the connections made to it and acts as the coordinator for those connections. For production, I recommend putting a load balancer in front of your Impala daemons to spread the connection load across all of them. Otherwise, having all users connect to a single daemon will exhaust that daemon's memory quickly. Another option I have seen is assigning blocks of Impala daemons to specific user groups.
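A minimal sketch of what that looks like from the client side, assuming a hypothetical load balancer VIP named impala-lb.example.com sitting in front of the daemons:

```
# Point clients at the load balancer VIP instead of an individual daemon so
# each new session lands on a different coordinator (hostname is a placeholder).
impala-shell -i impala-lb.example.com:21000
# JDBC/ODBC clients would use the same VIP on the HiveServer2-compatible port
# (21050 by default) in their connection string.
```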
06-06-2017
10:33 AM
What do you have for your placement rules? I forget the CDH defaults, but I know a fresh cluster will have that setting set to true, and it will create user queues automatically. The Hadoop docs do have this note regarding the setting, which is why I asked about your placement rules: "If a queue placement policy is given in the allocations file, this property is ignored." https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
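For reference, a hedged sketch of how to check for a placement policy that would override that property; the file path is an assumption for a typical CDH layout, and the rule names come from the FairScheduler docs:

```
# Inspect the allocations file on the ResourceManager host (path may differ).
cat /etc/hadoop/conf/fair-scheduler.xml
# A queuePlacementPolicy block in it takes precedence over the
# create-user-queues property, e.g.:
#   <queuePlacementPolicy>
#     <rule name="specified" create="false"/>
#     <rule name="user" create="true"/>   <!-- per-user queues, auto-created -->
#     <rule name="default"/>
#   </queuePlacementPolicy>
```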
06-06-2017
10:22 AM
SASL is an over-the-wire encryption method. I don't think it is used by Impala; it is used by Thrift clients like the Hive CLI and Beeline. Impala has the option to enable SSL for encryption, or to have nothing. Do you have SSL enabled for Impala? Port 1433 is not the default port for Impala. Did you change it?
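One quick way to see what a port is actually speaking, assuming a hypothetical host name and that you are targeting Impala's HiveServer2-compatible port (21050 by default):

```
# If SSL is enabled on the port, this prints a certificate chain; an immediate
# handshake failure usually means the port is plaintext (or not Impala at all).
openssl s_client -connect impala-host.example.com:21050 </dev/null
```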
06-01-2017
10:49 PM
You want AuthMech=1 and all of the settings that go with it (principal name, realm, FQDN, etc.). AuthMech=3 is for LDAP authentication. Also, make sure the user has a valid ticket prior to trying the driver. On *nix and macOS, klist should be available; on Windows, you can install the MIT Kerberos Ticket Manager to view and retrieve a Kerberos ticket.
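A quick sanity check before pointing the driver at the cluster; the principal and realm below are placeholders:

```
# Obtain and verify a Kerberos ticket for the user that will run the driver.
kinit user@EXAMPLE.COM
klist
# klist should show an unexpired krbtgt/EXAMPLE.COM@EXAMPLE.COM entry;
# without a valid ticket, a Kerberos (AuthMech=1) connection will fail.
```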
06-01-2017
10:51 AM
Which nodes is it not activated on? Check the CM server logs and the CM agent logs on the problematic hosts. You can also try restarting the CM service to see if that stops the process. I'd be interested to know whether it has actually finished the process on all nodes. You can check the /opt/cloudera/parcels directory to see if the symlink to the version being activated is in place.
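You can check that from a shell on one of the problematic hosts (the parcel name and version in the comment are only an example):

```
# An activated parcel shows up as a symlink next to the versioned directory,
# e.g. CDH -> CDH-5.12.0-1.cdh5.12.0.p0.29 (example version only).
ls -l /opt/cloudera/parcels
```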
05-31-2017
09:51 PM
1 Kudo
Does either of those ids work with only one listed? That will rule out bad entity ids. Based on one of the other calls, you could try the URL below. http://vm-cloudera-59:7187/api/v10/lineage3/?entityIds=4500&entityIds=5267
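The same call from the command line, assuming placeholder Navigator credentials:

```
# Repeat entityIds once per id; quote the URL so the shell keeps the &.
curl -u admin:admin \
  'http://vm-cloudera-59:7187/api/v10/lineage3/?entityIds=4500&entityIds=5267'
```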
05-31-2017
09:10 PM
I don't know, as I have only ever installed CM with an external DB. If that message is displayed in the CM UI, I feel like it will go away once you start using an external DB, especially since it mentions the embedded DB.
05-31-2017
10:29 AM
3 Kudos
The setting mapreduce.map.memory.mb sets the physical memory size of the container running the mapper (mapreduce.reduce.memory.mb does the same for the reducer container). Be sure that you adjust the heap value as well. In newer versions of YARN/MRv2, the setting mapreduce.job.heap.memory-mb.ratio can be used to have it auto-adjust. The default is 0.8, so 80% of whatever the container size is will be allocated as the heap. Otherwise, adjust it manually using the mapreduce.map.java.opts.max.heap and mapreduce.reduce.java.opts.max.heap settings. BTW, I believe that 1 GB is the default, and it is quite low. I recommend reading the link below. It provides a good understanding of YARN and MR memory settings, how they relate, and how to set some baseline values based on the cluster node size (disk, memory, and cores). https://www.cloudera.com/documentation/enterprise/5-4-x/topics/cdh_ig_yarn_tuning.html
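To make the ratio concrete, a hedged example with illustrative (not default) sizes; the jar and driver class names are placeholders, and the -D overrides only take effect if the driver uses ToolRunner/GenericOptionsParser:

```
# With a 4096 MB map container and the 0.8 heap ratio, the map heap works out
# to roughly 4096 * 0.8 = ~3276 MB. Setting the values explicitly instead:
hadoop jar my-job.jar com.example.MyDriver \
  -D mapreduce.map.memory.mb=4096 \
  -D mapreduce.map.java.opts=-Xmx3276m \
  -D mapreduce.reduce.memory.mb=8192 \
  -D mapreduce.reduce.java.opts=-Xmx6553m \
  /input/path /output/path
```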
05-31-2017
09:36 AM
1 Kudo
The difference between the paths is that Path A, with the installer, installs the JDK and the embedded Postgres DB. You can install the JDK version you need, following Cloudera's recommendations, update alternatives, and remove the old one. On the DB side, install your DB of choice, configure it based on Cloudera's recommendations, create the users, databases, and tables, migrate data if you need to retain the metrics, and update db.properties for CM. You will also need to create users, databases, and tables for the CM services (rmon and amon) and update their configuration.
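A hedged sketch of the external-DB step for the CM server itself; the database type, names, and password are placeholders, the script path is the usual CM 5.x location but may differ, and it assumes the database and user already exist:

```
# Verifies connectivity, prepares the CM schema, and rewrites
# /etc/cloudera-scm-server/db.properties to point at the external database.
/usr/share/cmf/schema/scm_prepare_database.sh mysql scm scm scm_password
```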
05-31-2017
09:25 AM
What process is using up more CPU on that one node? Is it the DN, NM, or YARN containers? I think it is likely an imbalance of data or blocks that is causing it. That would cause more containers to run on that node compared to the rest. Running hdfs dfsadmin -report will give you the info to determine whether it is balanced or not on the data front. The best way to check the block distribution is escaping me right now (I think CM has a graph somewhere).
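A quick way to check the data side, and to correct it if it turns out to be skewed (the threshold value is just an example):

```
# Compare DFS Used / DFS Used% across DataNodes in the per-node sections.
hdfs dfsadmin -report
# If one node holds noticeably more data, the balancer can even it out;
# the threshold is the allowed % deviation from the average utilization.
hdfs balancer -threshold 10
```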