02-05-2017
10:23 PM
Yes, the source tables are in Parquet. Could you please provide a sample solution for this issue?
02-03-2017
01:02 PM
1 Kudo
https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cm_ig_feature_differences.html The main difference is that you get many features that make management easier, specifically around configuration versioning, encryption, security, etc. There is no technical limitation on the services between the versions. Since you were told to get it from Apache, it is worth mentioning that CDH is a packaged distribution that Cloudera integrates and tests. This means that you won't have to do that work yourself, but it also means that you will have to go at Cloudera's pace when adopting new projects or new versions (technically you can add your own as well, but my view is that if you are going to be doing that anyway, why not do it for all of them).
02-03-2017
12:30 PM
saranvisa is correct that you should set a minimum, and that the maximum should not exceed a single node's memory limit, because a single container cannot run across nodes. There is still a mismatch between what is in the configs and what YARN is using and reporting. On the RM machine, get the process id for the RM with sudo su yarn -c "jps", and then get the process info for that id with ps -ef | grep <id>. Does that show that it is using the configs from the path that you changed? The path should be listed in -classpath.
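The check described above can be sketched as a pair of small shell functions. This is only a sketch, assuming a Linux host with procps `ps`, a `yarn` user that can run `jps`, and classpath entries without spaces; the function names `extract_classpath` and `rm_classpath` are made up for illustration.

```shell
# Print the value that follows -classpath (or -cp) in a Java command line.
# Assumption: no classpath entry contains a space.
extract_classpath() {
  printf '%s\n' "$1" | tr -s ' ' '\n' |
    awk 'found { print; exit } $0 == "-classpath" || $0 == "-cp" { found = 1 }'
}

# Print the ResourceManager classpath, one entry per line.
# Run this on the RM host; it needs sudo rights to become the yarn user.
rm_classpath() {
  pid=$(sudo su yarn -c "jps" | awk '$2 == "ResourceManager" { print $1 }')
  [ -n "$pid" ] || { echo "ResourceManager process not found" >&2; return 1; }
  extract_classpath "$(ps -o args= -p "$pid")" | tr ':' '\n'
}
```

Running `rm_classpath | grep conf` on the RM host should then list the configuration directories the process was started with, which you can compare against the path you edited.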
01-29-2017
09:31 PM
It will work. This will diminish the network throughput and could impact cluster performance if the typical workload is network I/O bound. In my experience, with predominantly 10 GbE networks, I have not been bound by the network when running at the default MTU of 1500.
01-19-2017
10:27 PM
We solved this problem by simply copying the parcels folder from another node (fortunately we had not deleted it there). We are using Isilon for storage, which communicates with the data nodes over the network, so the deleted node did not contain any metadata storage information.
01-17-2017
07:45 PM
Add two background threads: one to delete empty directories, and another to run Hive CONCATENATE. But it's really an ugly way to do it.
01-17-2017
11:50 AM
On the setting changes: stats, as stated, will help with counts, because that information is precalculated and stored in the metadata. The CBO and stats also help a lot with joins. It is possible that the OS cache had more to do with the improvement if this was a subsequent run with little other activity. You could look at Hive on Spark for better, more consistent performance: set hive.execution.engine = spark; On the times, the big impact between job submission and start is the scheduler. That is a deep topic. It is best if you read up on the schedulers, review your settings, and ask any specific questions that come up, preferably in a new topic. The other factor, not captured in the job stats, is the time it takes to return the results to the client. This will vary depending on the client and there isn't much to do about it. In general, small result sets can be handled by the Hive CLI. You can increase the client heap if needed. Otherwise use HS2 connections like beeline or HUE.
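As a rough illustration of running a query through HS2 with the Spark engine, here is a minimal shell sketch. The JDBC URL and the table name are hypothetical placeholders, and it assumes Hive on Spark is already configured on the cluster and that HS2 allows the engine to be changed per session.

```shell
# Hypothetical HiveServer2 address; replace with your own host, port and database.
HS2_URL="jdbc:hive2://hs2.example.com:10000/default"

# Run one SQL statement through beeline with the Spark execution engine.
# $1: the SQL statement to execute
run_on_spark() {
  beeline -u "$HS2_URL" --hiveconf hive.execution.engine=spark -e "$1"
}
```

For example, `run_on_spark "SELECT COUNT(*) FROM my_table"` would submit the count through HS2 rather than the Hive CLI, where `my_table` stands in for one of your own tables.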
01-16-2017
10:25 PM
1 Kudo
Yes. Cloudera does not support Tez on any CDH version, so they do not ship the Tez jar or put it on the classpath. It will take quite a bit of work to build Tez and maintain it with each CDH release. Here is a link if you are up to it. Otherwise, be satisfied with Hive on Spark or Impala. https://gist.github.com/epiphani/dd37e87acfb2f8c4cbb0