About mbigelow

mbigelow · ‎06-29-2017

This usually means that the test is picking up another adaptor. For me it was the loopback, which doesn't have a speed or mode, so the health test fails. Use ethtool to examine your adaptors and find the one that doesn't report a speed. Then add a regex that excludes it to Network Interface Collection Exclusion Regex on the Host configuration screen. My regex for the loopback adaptor (lo) is ^lo$
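
To sanity-check an exclusion pattern before saving it, here is a minimal sketch in Python. The interface names are made up for illustration, and Python's re module only stands in for whatever regex engine Cloudera Manager uses, so treat the matching behaviour as an approximation.

```python
import re

# Pattern from the post: matches only the interface named exactly "lo".
EXCLUSION_REGEX = r"^lo$"


def monitored_interfaces(interfaces, exclusion_regex=EXCLUSION_REGEX):
    """Return the interface names that do NOT match the exclusion regex."""
    pattern = re.compile(exclusion_regex)
    return [name for name in interfaces if not pattern.search(name)]


if __name__ == "__main__":
    # Hypothetical interface names, for illustration only.
    print(monitored_interfaces(["lo", "eth0", "bond0", "lo1"]))
```

Because the pattern is anchored at both ends, only the interface named exactly lo is dropped; a name that merely starts with lo is still monitored.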

mbigelow · ‎06-29-2017

You would need to add the copied directory in as a DFS directory. Even then, I don't know if the NN will pick them up as the same blocks, since a different DN will now include them in its block report. Typically, if a DN reports a block that doesn't match what the NN expects, the NN tells it to delete the block. The safe approach is to recommission the old node, change the replication factor, and then decommission it again.

mbigelow · ‎06-28-2017

@csguna No, the YARN gateway, HDFS gateway, Hive gateway, etc. Each of these installs the binaries and libraries, sets the env vars, and deploys the client configuration files for its service.

mbigelow · ‎06-27-2017

They should be under /etc/<service>/conf, where service can be hadoop, hive, yarn, etc.
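
As a quick way to see which of those directories exist on a given host, here is a small Python sketch. The function names and the list of services are illustrative assumptions; only the /etc/<service>/conf layout comes from the post.

```python
from pathlib import Path, PurePosixPath


def client_conf_dir(service):
    """Build the expected client config path, e.g. /etc/hive/conf."""
    return PurePosixPath("/etc") / service / "conf"


def existing_conf_dirs(services):
    """Return the expected config paths that are present on this host."""
    return [
        str(client_conf_dir(s))
        for s in services
        if Path(str(client_conf_dir(s))).is_dir()
    ]


if __name__ == "__main__":
    # Service names taken from the post; extend the list as needed.
    print(existing_conf_dirs(["hadoop", "hive", "yarn"]))
```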

mbigelow · ‎06-27-2017

I don't know of any hard limits. There are practical limitations, though: a table with 10k+ partitions will likely fail on operations that touch all partitions, such as 'drop table'. That is generally the soft cap on partitions per table. For the full cluster, the backend RDBMS hosting the metastore will dictate this somewhat. Again, there is no hard limit. I have seen some clusters with nearly 10 million partitions across all tables. Granted, HMS, HS2, and CatalogD were not stable due to the large partition count; a single large query, a set of them, or a full table scan would bring them down each time. Your HMS heap will also need to be large. Hive now has settings to prevent full partition grabs or to limit the partition count per query. The Hive community is moving HMS to be backed by HBase to address the scalability of partitions, tables, and databases.

mbigelow · ‎06-26-2017

Did you include all existing nodes in the new racks? That is the only thing I can think of. If you missed one, it would be considered decommissioned, and its blocks would be reported as missing or under-replicated until they are replicated to other nodes. You are correct, a block would report as mis-replicated after the topology change if both of its replicas were in the same rack. I have seen the replication issue pop up before. I don't know what the resolution ended up being, but it is critical to remember that the replication factor is a client-side setting, so if a client is still using 3 as the repl factor then that data will have 3 replicas for each block.
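
To make the mis-replication case concrete, here is a rough Python sketch of the check. It is a simplified approximation of the default rack-aware placement rule (replicas should span at least two racks when the cluster has more than one), not the NameNode's actual code; the function name and rack labels are hypothetical.

```python
def spans_enough_racks(replica_racks, cluster_rack_count):
    """Return True if a block's replicas are spread across enough racks.

    Simplified assumption: a block needs min(2, racks in the cluster,
    number of replicas) distinct racks to count as properly placed.
    """
    if not replica_racks:
        return True  # nothing to check for a block with no replicas listed
    required = min(2, cluster_rack_count, len(replica_racks))
    return len(set(replica_racks)) >= required


if __name__ == "__main__":
    # Hypothetical rack labels for a block with two replicas.
    print(spans_enough_racks(["/rack1", "/rack1"], cluster_rack_count=2))
    print(spans_enough_racks(["/rack1", "/rack2"], cluster_rack_count=2))
```

Under this rule, a block whose replicas all sat in one default rack was fine before the change, but is flagged once the same nodes are assigned to one of several new racks.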

mbigelow · ‎06-26-2017

I just copied it over from a cluster node.

mbigelow · ‎06-26-2017

Yes you do.

mbigelow · ‎06-26-2017

Please share the HW and SW specs and the results. I am quite interested. As pointed out, both could sway the results, as even Impala's defaults are anemic. Also, I want to point out that Kudu is a storage engine, Impala is an in-memory query engine, and Parquet is a file format. So what you are really comparing is Impala+Kudu v Impala+HDFS. You should be using the same file format for both to make it a direct comparison. Also, I don't view Kudu as the inherently faster option. Yes, it is written in C++, which can be faster than Java, and it is, I believe, less of an abstraction. Anyway, my point is that Kudu is great for some things and HDFS is great for others. It isn't a this-or-that choice based on performance, at least in my opinion.

mbigelow · ‎06-26-2017

For what it is worth, I just did this and it worked.
1. set up the cdh 5 repo
2. installed hadoop-client with my package manager
3. updated the configs manually (scp or cm api)
4. ???
5. profit

Last Visited	‎03-25-2019 05:55 PM

Member Since	‎08-16-2016 08:51 PM
Posts	642
Kudos received	129

Cloudera Community

Re: Configuring the HDFS superuser in Kerberos

Re: Hive process crash

Re: Upgrade from CDH 5.11 Express to Enterprise

Re: Adding user to Cloudera Manager using REST AP...

Re: Running in non-interactive mode, and data appe...

Re: The health test result for HOST_SCM_HEALTH has...

Re: Tell the NameNode where to find a "MISSING" bl...

Re: Configure hadoop-client tools to access hdfs f...

Re: Configure hadoop-client tools to access hdfs f...

Re: Hive Partitioning - maximum for cluster

Re: Changing rack awareness in a running Hadoop cl...

Re: Configure hadoop-client tools to access hdfs f...

Re: Changing rack awareness in a running Hadoop cl...

Re: kudu is slower than parquet?

Re: Configure hadoop-client tools to access hdfs f...