About mbigelow

sanjeev20 · 02-05-2017

Yes, the source tables are in Parquet. Could you please provide a sample solution for this issue?

mbigelow · 02-03-2017

https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cm_ig_feature_differences.html

The main difference is that you get a lot of features that make management easier, specifically around configuration versioning, encryption, security, etc. There are no technical limitations on the services between the versions. Since you were told to get it from Apache, it is worth mentioning that CDH is a package distribution that Cloudera integrates and tests. This means that you won't have to do that integration yourself, but it also means that you will have to go at Cloudera's pace when adopting new projects or new versions. (Technically you can add your own as well, but my view is that if you are going to be doing that anyway, why not do it for all of them.)

mbigelow · 02-03-2017

saranvisa is correct that you should set a minimum, and the maximum should not exceed a single node's memory limit, since a single container cannot span nodes. There is still the mismatch between what is in the configs and what YARN is using and reporting. On the RM machine, get the process ID for the ResourceManager with sudo su yarn -c "jps", and then get the process info for that ID with ps -ef | grep <id>. Does that show that it is using the configs from the path that you changed? That path should be listed in the -classpath argument.
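A minimal Python sketch of the two checks described above, assuming you have the yarn-site.xml text and one line of `ps -ef` output for the RM process. The helper names (`read_hadoop_conf`, `check_container_limits`, `classpath_entries`) are hypothetical; the three property names are standard YARN settings, and the check simply encodes the rule that the maximum allocation should fit on a single node.

```python
"""Sketch: sanity-check YARN container limits and list RM classpath entries.

Assumptions: the yarn-site.xml content is passed in as a string, all three
memory properties are set explicitly, and the RM command line is one line of
`ps -ef` output. Function names are illustrative, not part of any Hadoop API.
"""
import xml.etree.ElementTree as ET


def read_hadoop_conf(xml_text):
    """Parse a Hadoop-style *-site.xml into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            conf[name.strip()] = value.strip()
    return conf


def check_container_limits(conf):
    """Return a list of problems with the min/max allocation vs. node memory."""
    problems = []
    minimum = int(conf["yarn.scheduler.minimum-allocation-mb"])
    maximum = int(conf["yarn.scheduler.maximum-allocation-mb"])
    node_mem = int(conf["yarn.nodemanager.resource.memory-mb"])
    if minimum <= 0:
        problems.append("minimum allocation should be positive")
    if minimum > maximum:
        problems.append("minimum allocation exceeds maximum allocation")
    # A single container cannot span nodes, so the max must fit on one node.
    if maximum > node_mem:
        problems.append("maximum allocation exceeds a single node's memory")
    return problems


def classpath_entries(ps_line):
    """Return the entries of the -classpath/-cp argument in a `ps -ef` line."""
    tokens = ps_line.split()
    for i, token in enumerate(tokens):
        if token in ("-classpath", "-cp") and i + 1 < len(tokens):
            return tokens[i + 1].split(":")
    return []
```

For example, pass the `ps -ef | grep <id>` line for the ResourceManager to `classpath_entries` and look for the configuration directory you edited; if it is missing, the RM is reading its configs from somewhere else.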

mbigelow · 01-29-2017

It will work. It will reduce network throughput, though, and could impact cluster performance if the typical workload is network I/O bound. In my experience, with predominantly 10 GbE networks, I have not been bound by the network when running at the default MTU of 1500.
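To put rough numbers on the MTU trade-off, here is a back-of-the-envelope Python sketch of how much of each Ethernet frame carries TCP payload. It assumes IPv4 and TCP headers without options and the standard 38 bytes of per-frame Ethernet overhead; it ignores CPU, NIC offload, and switch behavior, so treat it as an upper bound rather than a benchmark. The function names are illustrative.

```python
"""Sketch: per-frame TCP payload efficiency over Ethernet at a given MTU.

Assumptions: IPv4 + TCP headers without options (20 + 20 bytes) and Ethernet
framing overhead of 38 bytes per frame (14 header + 4 FCS + 8 preamble/SFD +
12 inter-frame gap). Real throughput also depends on NICs, switches, and CPU.
"""

IP_TCP_HEADERS = 20 + 20
ETHERNET_OVERHEAD = 14 + 4 + 8 + 12


def payload_efficiency(mtu):
    """Fraction of on-the-wire bytes that are TCP payload for a full-size frame."""
    if mtu <= IP_TCP_HEADERS:
        raise ValueError("MTU too small to carry any payload")
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


def max_goodput_gbps(link_gbps, mtu):
    """Upper bound on the TCP payload rate for a link of the given speed."""
    return link_gbps * payload_efficiency(mtu)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {payload_efficiency(mtu):.2%} payload, "
              f"~{max_goodput_gbps(10, mtu):.2f} Gbit/s max on 10 GbE")
```

Under these assumptions a full 1500-byte frame is about 94.9% payload versus about 99.1% for a 9000-byte jumbo frame, a difference of roughly four percentage points of line rate.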

alex.behm · 01-20-2017

Thanks!

ganeshkumarj · 01-19-2017

We solved this problem by simply copying the parcels folder from another node (fortunately we had not deleted it there). Since we use Isilon for storage, which communicates with the data node cluster over the network, the node we deleted it from did not hold any storage metadata.

terry19850289 · 01-17-2017

Add two background threads: one to delete empty directories and another to run Hive CONCATENATE. But it's a really ugly approach.
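A rough Python sketch of that workaround, assuming the directories are on a locally mounted filesystem (on HDFS you would swap in an HDFS client) and that `concatenate` is a hypothetical callable you provide, for example one that issues `ALTER TABLE ... CONCATENATE` through Hive. All function names here are illustrative.

```python
"""Sketch of the two-background-thread workaround.

Assumptions: directories are on a locally mounted filesystem, and `concatenate`
is a hypothetical callable supplied by the caller (e.g. one that runs
ALTER TABLE ... CONCATENATE through Hive). Names are illustrative only.
"""
import os
import threading


def remove_empty_dirs(root):
    """Delete empty subdirectories under root, bottom-up; return removed paths."""
    removed = []
    # topdown=False visits children before parents, so nested empty dirs
    # are removed first and their parents can then be removed too.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue  # never delete the root itself
        try:
            if not os.listdir(dirpath):
                os.rmdir(dirpath)
                removed.append(dirpath)
        except OSError:
            pass  # directory changed underneath us; retry on the next cycle
    return removed


def run_periodically(task, interval_s, stop_event):
    """Run task() every interval_s seconds until stop_event is set."""
    while not stop_event.is_set():
        task()
        stop_event.wait(interval_s)


def start_background_workers(root, concatenate, interval_s=300.0):
    """Start a cleanup thread and a concatenate thread; return (stop, threads)."""
    stop = threading.Event()
    workers = [
        threading.Thread(target=run_periodically,
                         args=(lambda: remove_empty_dirs(root), interval_s, stop),
                         daemon=True),
        threading.Thread(target=run_periodically,
                         args=(concatenate, interval_s, stop),
                         daemon=True),
    ]
    for worker in workers:
        worker.start()
    return stop, workers
```

Using `stop_event.wait()` rather than `time.sleep()` lets both threads exit promptly once the stop event is set.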

mbigelow · 01-17-2017

On the setting changes: stats, as stated, help with counts because that information is precalculated and stored in the metastore. The CBO and stats also help a lot with joins. It is possible that the OS cache accounts for more of the improvement if this was a subsequent run with little other activity. You could look at Hive on Spark for better, more consistent performance: SET hive.execution.engine=spark;

On the times: the biggest factor between job submission and job start is the scheduler. That is a deep topic. It is best to read up on the schedulers, review your settings, and ask any specific questions that come up, preferably in a new topic. The other factor, not captured in the job stats, is the time it takes to return the results to the client. This varies by client, and there isn't much you can do about it. In general, small result sets can be handled by the Hive CLI, and you can increase the client heap if needed. Otherwise, use HS2 connections such as Beeline or Hue.
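The session-level suggestions can be scripted over an HS2 connection. This Python sketch assumes `cursor` is any PEP 249 (DB-API) cursor connected to HiveServer2, for example one from PyHive; the helper names and the table name are placeholders. The properties shown are standard Hive settings, but check the defaults for your CDH version, and note that partitioned tables may need a `PARTITION` clause on `ANALYZE`.

```python
"""Sketch: apply the stats/CBO/Hive-on-Spark suggestions over HiveServer2.

Assumptions: `cursor` is a PEP 249 cursor connected to HS2 (e.g. via PyHive),
and the table name is a placeholder. Helper names are illustrative.
SET statements affect only the current session.
"""


def session_settings(engine="spark"):
    """Session-level settings: execution engine, CBO, and use of stats."""
    return [
        f"SET hive.execution.engine={engine}",
        "SET hive.cbo.enable=true",
        # Answer simple aggregates such as count(*) from stored stats.
        "SET hive.compute.query.using.stats=true",
        "SET hive.stats.fetch.column.stats=true",
    ]


def stats_statements(table):
    """Statements that precompute table- and column-level statistics."""
    return [
        f"ANALYZE TABLE {table} COMPUTE STATISTICS",
        f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS",
    ]


def prepare(cursor, table, engine="spark"):
    """Compute stats for table, then configure the session; return what ran."""
    statements = stats_statements(table) + session_settings(engine)
    for stmt in statements:
        cursor.execute(stmt)  # no trailing semicolon over DB-API
    return statements
```

Statistics are stored in the metastore, so the ANALYZE statements only need rerunning after significant data changes, whereas the SET statements apply to the current session only and must be reissued on each new connection.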

MasterOfPuppets · 01-17-2017

Thanks for your feedback.

mbigelow · 01-16-2017

Yes. Cloudera does not support Tez on any CDH version, so they do not ship the Tez jar or put it on the classpath. It will take quite a bit of work to build Tez and maintain it with each CDH release. Here is a link if you are up to it; otherwise, be satisfied with Hive on Spark or Impala. https://gist.github.com/epiphani/dd37e87acfb2f8c4cbb0

Last Visited	03-25-2019 05:55 PM

Member Since	08-16-2016 08:51 PM
Posts	642
Kudos received	129

Cloudera Community

Re: Configuring the HDFS superuser in Kerberos

Re: Hive process crash

Re: Upgrade from CDH 5.11 Express to Enterprise

Re: Adding user to Cloudera Manager using REST AP...

Re: Running in non-interactive mode, and data appe...

Re: Parquet is not a parquet file (too small)

Re: Need clarity on Cloudera's Hadoop Free Editio...

Re: Yarn-site.xml changes not reflecting

Re: Any specified MTU required for cloudera cluste...

Re: CREATE TABLE AS SELECT returns error 'Failed t...

Re: Extrated CDH5.4.8 folders deleted from /opt/cl...

Re: CDH 5.4.7 spark-streaming(1.3.0) kafka messag...

Re: Hive Queries run slowly

Re: Hive hive.exec.parallel property

Re: Tez Engine not working over CDH 5.8.2