Member since: 07-20-2016
Posts: 40
Kudos Received: 3
Solutions: 0
07-25-2021
01:34 PM
@USMAN_HAIDER Did you perform the step below? Kerberos must be specified as the security mechanism for the Hadoop infrastructure, starting with the HDFS service. Enable Cloudera Manager Server security for the cluster on an HDFS service; after you do so, Cloudera Manager Server automatically enables Hadoop security on the MapReduce and YARN services associated with that HDFS service. In the Cloudera Manager Admin Console:
1. Select Clusters > HDFS-n.
2. Click the Configuration tab.
3. Select HDFS-n for the Scope filter.
4. Select Security for the Category filter.
5. Scroll (or search) to find the Hadoop Secure Authentication property.
6. Click the kerberos button to select Kerberos.
Please revert.
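You can also double-check the resulting setting over the Cloudera Manager REST API. A minimal sketch follows; the CM host, API version, cluster name, service name and credentials are placeholders, and the config key name is my assumption for the "Hadoop Secure Authentication" property:

import requests

# Placeholder CM address and admin credentials; adjust to your deployment.
CM = "http://cm-host.example.com:7180"
resp = requests.get(
    f"{CM}/api/v19/clusters/cluster/services/hdfs/config",
    params={"view": "full"},
    auth=("admin", "admin"),
)
resp.raise_for_status()
for item in resp.json()["items"]:
    if item["name"] == "hadoop_security_authentication":   # assumed key name
        print(item.get("value") or item.get("default"))    # expect "kerberos"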
08-21-2019
10:32 AM
As a temporary fix you can use the Linux command swapoff -a && swapon -a to move the swapped-out pages back into RAM. Just make sure you have enough free memory to hold them first (check with the top command).
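A quick pre-flight check along those lines, as a sketch assuming the psutil package is available on the node:

import psutil

# Make sure there is more free RAM than swap currently in use before
# running `swapoff -a && swapon -a`, otherwise swapoff can fail or push
# the host into memory pressure.
swap_used = psutil.swap_memory().used
mem_free = psutil.virtual_memory().available
print(f"swap in use  : {swap_used / 2**30:.2f} GiB")
print(f"RAM available: {mem_free / 2**30:.2f} GiB")
print("OK to move swap" if mem_free > swap_used else "NOT enough free memory")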
07-17-2019
09:09 AM
Any "manually" killed application not showing up in the history server. In resource manager I am not able to browse the tasks. We care using Cloudera 5.14.X Any application killed by yarn does show up in history server and able to browse tasks in resource manager.
11-01-2018
07:46 AM
@Harsh J A couple more questions: Filtering by queue in the Resource Manager REST interface (http://<rm http address:port>/ws/v1/cluster/apps?queue=root.queue1) shows only 'running' applications. Is there any way to show finished applications as well? Even when specifying 'states=running,finished' it still shows only running applications. We are using Cloudera 5.14. Also, is it possible to upgrade from CDH 5.14 to CDH 6.0 without a complete reinstall? We are using parcels.
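For reference, a minimal sketch of that kind of call with the states filter spelled out. The ResourceManager address is a placeholder, and the state names follow the YarnApplicationState values (RUNNING, FINISHED, KILLED, ...):

import requests

# Hypothetical ResourceManager web address; substitute your own host:port.
RM = "http://resourcemanager.example.com:8088"

# Ask for running and finished apps in one call; the states filter is a
# comma-separated list in the YARN ResourceManager REST API.
resp = requests.get(
    f"{RM}/ws/v1/cluster/apps",
    params={"queue": "root.queue1", "states": "RUNNING,FINISHED"},
)
resp.raise_for_status()
apps = (resp.json().get("apps") or {}).get("app", [])
for app in apps:
    print(app["id"], app["state"], app["finalStatus"])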
09-11-2018
07:45 AM
Thanks for the help, a proper implementation of readFields solved the problem.
09-06-2018
07:57 PM
1 Kudo
There are a few cons to raising your block size:

- Increased cost of recovery during write failures: When a client is writing a new block into the DataNode pipeline and one of the DataNodes fails, an enabled-by-default recovery feature will attempt to refill the gap in the replicated pipeline by transferring the partially written block from one of the remaining good DataNodes to a new DataNode. While this happens, the client is blocked (the outstream.write(…) caller is blocked in the API code). With an increased block size, the time spent waiting also increases greatly, depending on how much of the partial block was written before the failure occurred. A worst-case example would involve network-copying 1.99 GiB of a 2 GiB block, because an involved DN may have failed at exactly that point (see the rough calculation after this post).

- Cost of replication caused by DataNode loss or decommission: When a DataNode is lost or is being decommissioned, the system has to react by re-filling the gaps in replica counts that this creates. With smaller block sizes this activity is easy to spread randomly across the cluster, as many different nodes can take part in the re-replication process. With larger blocks, only a few DNs can participate, and another consequence can be more lopsided space usage across DNs.

That said, use of 1-2 GiB is not unheard of, and I've seen a few large clusters apply that as their default block size. It's just worth being aware of the cons, looking out for such impact, and tuning accordingly as you go. HDFS certainly functions at its best with large files, and your usage seems in accordance with that.
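A rough back-of-the-envelope for that worst-case stall, assuming a hypothetical 1 Gbit/s effective transfer rate between DataNodes:

# Worst case from the example above: ~1.99 GiB of a 2 GiB block already
# written before the failure, so the recovery copy moves almost the whole
# block while the client's write is blocked.
written_before_failure_gib = 1.99
link_gbit_per_s = 1.0                      # assumed effective DN-to-DN rate

bits_to_copy = written_before_failure_gib * 1024**3 * 8
seconds_blocked = bits_to_copy / (link_gbit_per_s * 1e9)
print(f"client blocked for roughly {seconds_blocked:.0f} s")   # ~17 s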
08-28-2018
07:24 AM
@bgooley Got it! I'm relatively new to Python development and was trying to figure out what an acceptable form of multiple criteria for a single POST request for the same key looks like. <== Maybe that sentence will help someone find this later on. I used tcpdump as well, but could not see the POST data because it was encrypted with SSL. I was not able to turn off SSL to see plain-text content, as I don't have access to the server. I was exploring using a cert to decrypt but have never done that with tcpdump; I would love to know how, as I have run into trying to debug traffic and not being able to see the content before. Anyway, I was having trouble creating a dict for the form data, as it would only accept a single key. I didn't realize that an array (list) under a single dict key would work as well. This line was the problem, in that Python would not let me add multiple key entries like "user_ids='4', user_ids='5'". I didn't realize all it needed was a list:

form_data = dict(csrfmiddlewaretoken=session.cookies['csrftoken'],
                 next='/hue/useradmin/users/',
                 user_ids=['4', '5'])

I tried every combination I could think of, but made the mistake of surrounding the arguments in quotes, making them a single text argument instead of a list. What you did for me was verify a few things and put me on the right trail. Thanks to everyone for their contribution!
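To make the list behaviour concrete, here is a small standalone sketch (standard library only) of how a list value turns into a repeated key in the urlencoded body; requests does the same expansion when you pass such a dict as data=:

from urllib.parse import urlencode

form_data = {"next": "/hue/useradmin/users/", "user_ids": ["4", "5"]}

# doseq=True expands the list into one key=value pair per element,
# which is what a Django-style endpoint expects for a multi-valued field.
print(urlencode(form_data, doseq=True))
# next=%2Fhue%2Fuseradmin%2Fusers%2F&user_ids=4&user_ids=5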
08-22-2018
10:48 PM
1 Kudo
@sbpothineni, You can find more information regarding the Cluster Utilization Report here: https://www.cloudera.com/documentation/enterprise/latest/topics/admin_cluster_util_report.html and about Reports Manager reports here: https://www.cloudera.com/documentation/enterprise/5-15-x/topics/cm_dg_reports.html The Cluster Utilization report shows metrics regarding YARN and Impala jobs/queries, while the reports from Reports Manager show information about HDFS.
02-13-2017
08:14 PM
If you have long keys (compared to the values) or many columns, use a prefix encoder. FAST_DIFF is recommended
Sorry, this post is a few months old, but does the above sentence mean it is recommended to use FAST_DIFF over PREFIX (not PREFIX_TREE)?