Member since 08-16-2016

642 Posts · 131 Kudos Received · 68 Solutions

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 3976 | 10-13-2017 09:42 PM |
|  | 7470 | 09-14-2017 11:15 AM |
|  | 3796 | 09-13-2017 10:35 PM |
|  | 6031 | 09-13-2017 10:25 PM |
|  | 6598 | 09-13-2017 10:05 PM |

06-26-2017 01:58 AM · 1 Kudo

Rack awareness serves three purposes: data locality, data redundancy, and reducing network bandwidth requirements. The replication factor sets your data redundancy level. It does not seem wise to arbitrarily change either just because the cluster is growing; simply buy more nodes and expand.

To address the original question:

1. Lowering the replication factor will mark the third replica of each block as excess and remove it. Due to the HDFS write workflow, the remaining two replicas will be split between at least two racks.

2. Adjusting the rack topology will not move any existing data. It will affect MR job performance, as blocks may no longer be local under the new rack topology. Newly written data will be split between the two racks.

No matter the order, if you do both, you risk both replicas of a block ending up within the same rack. You can run the balancer immediately afterward, and that should help since the balancer abides by the new rack topology, but it will not touch or move every block.
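
For reference, a minimal sketch of the commands involved; the path and target factor here are hypothetical:

```sh
# Lower the replication factor on existing data; -w waits until the
# excess replicas have actually been removed.
hdfs dfs -setrep -w 2 /data/warehouse

# After updating the rack topology, run the balancer so placement
# follows the new topology (it will not necessarily move every block).
hdfs balancer
```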
						
					
06-24-2017 05:44 AM

I am not positive on this, but I think this is a HiveServer2 (HS2) setting, since HS2 is where the decision is made whether to run a query locally or to launch an MR job. Try applying the change to HS2 and restarting it.
						
					
06-23-2017 06:58 AM

Ah, what you are looking for is the setting Fetch Task Query Conversion (`hive.fetch.task.conversion`). Setting this to `none` will force all queries to run as MR jobs.
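
A minimal illustration from inside a Hive session (beeline or the Hive CLI); the table name is hypothetical:

```sql
-- Disable fetch-task conversion so even simple queries are compiled
-- into MapReduce jobs instead of being served as a local fetch.
-- Valid values for this property are none, minimal, and more.
SET hive.fetch.task.conversion=none;

SELECT * FROM web_logs LIMIT 10;  -- now launches an MR job
```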
						
					
06-23-2017 06:30 AM

This has to do with the YARN memory settings. The amount of memory allocated to YARN is only 8 GB. I don't know what the minimum container size is, but it is probably around 1.3 GB. The combination of the two determines the number of containers that can be launched, which for your cluster works out to 6 containers (8 GB / 1.3 GB ≈ 6). Anything beyond that will have to wait for resources to be freed up.

https://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/
https://www.cloudera.com/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html
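
As a sketch, these are the two properties in play; the values below are assumptions chosen to match the cluster described above:

```xml
<!-- yarn-site.xml (values are illustrative, not recommendations) -->
<property>
  <!-- Total memory YARN may hand out on each NodeManager -->
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<property>
  <!-- Smallest container the scheduler will allocate;
       8192 MB / 1331 MB gives roughly 6 concurrent containers -->
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1331</value>
</property>
```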
						
					
06-23-2017 06:14 AM

No, you cannot. That file is used to store impala-shell configuration settings (e.g., -k for Kerberos), not Impala session variables.

https://www.cloudera.com/documentation/enterprise/5-3-x/topics/impala_shell_options.html
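
For reference, a sketch of what that file holds, assuming the usual $HOME/.impalarc location; the option names are assumed to mirror impala-shell's long-form flags (check the linked docs for the exact spellings):

```
# ~/.impalarc -- impala-shell startup options, not session variables
[impala]
impalad=impalad-host.example.com:21000
kerberos=true
```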
						
					
06-23-2017 05:58 AM

That setting, mapreduce.framework.name, can be found in mapred-site.xml. Check for it under /etc/hadoop/conf/ and verify its value. If it is there with yarn as the value, then it is likely that HS2 is not running with the correct Hadoop environment variables, such as HADOOP_CONF_DIR. If it isn't there, or the value is incorrect, try installing the YARN gateway role on the HS2 host.
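
A quick way to check, assuming the default client configuration path mentioned above:

```sh
# Print the property and the line after it (the <value> element);
# on a YARN cluster you would expect to see <value>yarn</value>.
grep -A1 'mapreduce.framework.name' /etc/hadoop/conf/*-site.xml

# Confirm which configuration directory the HS2 process is actually using.
echo "$HADOOP_CONF_DIR"
```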
						
					
06-22-2017 06:46 PM

The database it is trying to access is the backend of the Hive Metastore. Are you able to access and view databases and tables in Hive?
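
A minimal sanity check from a Hive session (beeline or the Hive CLI):

```sql
-- If the metastore backend is reachable, both should return promptly.
SHOW DATABASES;
SHOW TABLES IN default;
```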
						
					
06-19-2017 11:06 AM

@andrzej_jedrzej what specifically led you to this? I know at times it can be difficult to troubleshoot issues in NTP, and the various commands get confusing (i.e., ntpdate, ntpq, etc.). Chrony and NTP look very similar in installation and configuration. What exactly is so different between them?
						
					
06-19-2017 09:20 AM · 1 Kudo

The table definition declares the virtual/partition column, and in HDFS each partition is created as a directory (or subdirectory) under the table directory. So Hive checks the table definition, looks for directories under the table directory whose names match the partition column, and then prunes by the value.
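
A small sketch of the layout and the pruning it enables; the table name and paths are hypothetical:

```sql
-- Partitioned table; dt is a virtual column, not stored in the data files.
CREATE TABLE sales (id INT, amount DOUBLE)
PARTITIONED BY (dt STRING);

-- Each partition becomes a directory under the table's HDFS location:
--   .../warehouse/sales/dt=2017-06-18/
--   .../warehouse/sales/dt=2017-06-19/

-- The predicate on dt lets Hive read only the matching directory.
SELECT SUM(amount) FROM sales WHERE dt = '2017-06-19';
```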
						
					
06-19-2017 09:18 AM

Ah yes, sorry: addprinc is for adding the principal to the Kerberos database. add_entry is for adding an entry to be written to a keytab file using ktutil. Yes, do add_entry for cloudera/admin@IM, and then wkt.
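
A minimal ktutil session as a sketch, assuming MIT Kerberos, kvno 1, and the aes256-cts encryption type (adjust to match the principal's actual kvno and enctypes):

```sh
ktutil
# Add a password-derived key for the principal to the in-memory keylist.
ktutil:  add_entry -password -p cloudera/admin@IM -k 1 -e aes256-cts
# Write the keylist out to a keytab file.
ktutil:  wkt cloudera-admin.keytab
ktutil:  quit
```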
						
					