Member since
12-28-2015
47
Posts
2
Kudos Received
4
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4203 | 05-24-2017 02:14 PM
 | 1941 | 05-01-2017 06:53 AM
 | 3530 | 05-02-2016 01:11 PM
 | 4103 | 02-09-2016 01:40 PM
03-08-2019
06:29 AM
We are getting the errors below when 15 or 16 Spark jobs are running in parallel. We have a 21-node cluster and run Spark on YARN. Regardless of the number of nodes, does one cluster get to use only 17 ports, or is it 17 ports per node in a cluster? How can we avoid this when we run 50 or 100 Spark jobs in parallel?

WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
:::::
WARN util.Utils: Service 'SparkUI' could not bind on port 4055. Attempting port 4056.
Address already in use: Service 'SparkUI' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'SparkUI
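From the message, the 16 retries seem to map to spark.port.maxRetries (whose default is 16). As a sketch, would raising it per job along these lines be the right approach? (The values here are my guesses, not something we have tested.)

spark-submit \
  --conf spark.ui.port=4040 \
  --conf spark.port.maxRetries=100 \
  ...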
06-22-2017
07:16 PM
Thanks for the solution, Harsha. There is always a backup for this large directory, so a replication factor of 1 is acceptable here, but I will definitely reconsider your suggestion.
06-22-2017
02:08 PM
I have a pre-existing HDFS directory that is 10 TB in size, and I want to change its replication factor from 3 to 1. What is the best possible way to achieve this?
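Is it just a matter of something like the command below (the path is a placeholder), possibly with -w to wait for the change to complete, or is there a better way for a directory this large?

hdfs dfs -setrep 1 /data/large_dir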
06-22-2017
02:00 PM
This seems to be an overhead, and I was checking whether HDFS could handle this request. Thanks for all the replies.
06-13-2017
11:56 AM
Is it possible to use different trash intervals for different HDFS directories? E.g., a 7-day trash interval for the directory "Market" and a 14-day interval for the directory "Search"?
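For reference, the only knob I am aware of is the cluster-wide one in core-site.xml, which takes a value in minutes (the value below is just the 7-day example):

<property>
  <name>fs.trash.interval</name>
  <value>10080</value>
</property>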
06-13-2017
11:25 AM
It works when I manually run mail from the worker and master nodes. When I run the shell script from Oozie, it does not work.
05-30-2017
08:43 PM
Falcon supports HDFS mirroring to replicate data from a source to a destination cluster. Technically it uses distributed copy (distcp) to replicate files between the clusters, just like Cloudera BDR (Backup and Disaster Recovery). Falcon is much like BDR in doing distcp with and without snapshots.
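Under the hood that amounts to something along these lines (the namenode hosts and paths here are placeholders, not from this thread):

hadoop distcp hdfs://source-nn:8020/data/market hdfs://dest-nn:8020/data/market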
05-24-2017
02:31 PM
After recommission you can just add the datanode back, and the namenode will identify all the blocks that were previously present on this datanode. Once the namenode identifies this information, it will wipe out the third replica that it created during the datanode decommission. You may have to run hdfs balancer if you format the disks and then recommission the node to the cluster, which is not a best practice.
05-24-2017
02:25 PM
1 Kudo
1. Go to the namenode UI and check whether the newly added datanodes have any blocks. By default HDFS will not write or copy data to the newly added nodes; you need to run the HDFS balancer to distribute the data among all datanodes (see the sketch below this list).
2. Go to HDFS > Instances and click on the Hosts column or the service column to sort out the datanodes and the nodes that run a DN.
3. Go to Impala > Instances and sort it as per the point above.
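A minimal balancer invocation (the 10% utilization threshold is my assumption, not from this thread; adjust as needed):

hdfs balancer -threshold 10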
05-24-2017
02:17 PM
Thanks for your reply. Any links or docs on storage pools would be helpful.
05-24-2017
02:14 PM
1 Kudo
Hello Niranjan,

drwxr-xr-x - striim1 striim1

The above permissions will not let Joy write a file inside the HDFS directory unless Joy is an HDFS superuser. Try looking at HDFS ACLs to solve your problem here. Apart from striim1, if Joy is the only user who creates files in /user/striim1, then try running the command below.

hdfs dfs -setfacl -m user:joy:rwx /user/striim1

HDFS ACLs: https://www.cloudera.com/documentation/enterprise/5-5-x/topics/cdh_sg_hdfs_ext_acls.html
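To confirm the ACL took effect, something like:

hdfs dfs -getfacl /user/striim1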
05-24-2017
02:03 PM
"A NN of 5 GB should handle upwards of 5 million blocks, which is actually 15 million total. A 10-node cluster should set the DN block threshold to 1.5 million." -- Does this hold good for a heterogeneous cluster where a few datanodes have 40 TB of space and others have 80 TB? I am sure having a datanode block threshold of 500,000 is not a good practice. This will cause the smaller datanodes to fill up faster than the larger datanodes and send alerts at an early stage.
05-24-2017
01:54 PM
One of the common reasons for datanodes going unbalanced is ingestion/data load. The first copy of the data is always stored on the same datanode from which you are loading data into HDFS. The second and third copies will be stored on the rest of the datanodes in a round-robin fashion. You can make the namenode choose available space on datanodes instead of round-robin by setting the "DataNode Volume Choosing Policy" appropriately (a sketch of the underlying property is below). How many nodes do you have in this cluster? How do you push data into HDFS?
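If I remember the mapping right, that CM setting corresponds to the following in hdfs-site.xml (treat the exact property and class names as my recollection, not gospel):

<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>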
05-24-2017
01:42 PM
Guna, I guess you are talking about the Impala Daemon HTTP Server Port, 25000. I am looking for the Impala "Catalog Server Web Server Username" and "Catalog Server Web Server User Password", which are two parameters in Impala > Configuration. These have default values, and I want to know if there is any problem if I remove the values of these two parameters.
05-16-2017
01:32 PM
What is the use of the Impala "Catalog Server Web Server Username" and "Catalog Server Web Server User Password"? I see the fields are admin and ******. 1) Are these default values? What is the default password? 2) On Google, I could not find a description for this parameter. Is there any place I can find a detailed description of each parameter that I see in Cloudera Manager?
05-07-2017
09:33 PM
I am running a shell script through Oozie. A piece of my shell script has the code below. I receive an email when I run this shell script from the Unix command line, but when I run it through an Oozie job, the job succeeds and I don't get any mail. How can I resolve this? Or is there an alternative way I can get an email with an attachment once a string matches in the Unix filesystem?

if [ "$Var" = "Error" ]
then
  echo "Data error" | mail -v -s "Data Error" -a error.csv -S smtp=smtp://mail-gateway -S from=localhost@gmail.com Kevin@gmail.com
  exit
fi
- Tags:
- Oozie
05-01-2017
06:53 AM
I had to check the grant on hr_role instead of emp_role. That is the solution to this question.
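In other words, the verification should have been something like:

SHOW GRANT ROLE hr_role;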
04-28-2017
01:15 PM
I have an employee_database, and under employee_database I have the tables salary_table and bonus_table. Right now emp_role has full access on employee_database. I would also like to give select access to hr_role on bonus_table. How can I achieve this in Sentry?

SHOW GRANT ROLE emp_role;
1 hdfs://localns/emp emp_role ROLE * false
2 employee_database emp_role ROLE * false

GRANT SELECT ON TABLE emp_database.bonus_table TO ROLE hr_role;

SHOW GRANT ROLE emp_role;
1 hdfs://localns/emp emp_role ROLE * false
2 employee_database emp_role ROLE * false

I don't get an error when I run the above grant, but I don't see the grant in the list.
03-23-2017
07:24 AM
Hi Eric, in the Impala daemon web UI I see that the query completed 16 hours ago and its state is Finished, but the query is still in the in-flight list. This is the only query running on this daemon, and it is occupying 3.5 GB of memory on this daemon. If I cancel this query, the memory on this daemon goes to zero. Basically the query is complete but it is holding on to memory, causing a memory leak.

Session ID: 74480cc476dd5fde:64c866411ae5f0b5
Session Type: HIVESERVER2
HiveServer2 Protocol Version: V6
Start Time: 2017-03-22 17:44:29.924339000
End Time:
Query Type: QUERY
Query State: CREATED
Query Status: OK
03-20-2017
10:25 AM
I am trying to cancel an Impala query through the CM API; the command doesn't return any output, but it is not canceling the query.

[root@userhost]# curl -k -u admin https://userhost:7180/api/v12/clusters/userclu/services/impala/impalaQueries/624dfb95ddbb92a5:6647034a254bc1b6/cancel
Enter host password for user 'admin':
[root@userhost ~]#

In fact, I am looking to kill the query, as it completed 50 minutes ago. Any help is appreciated.
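One guess I have not been able to verify: curl defaults to GET, and the cancel endpoint may need an explicit POST, e.g.:

curl -k -X POST -u admin https://userhost:7180/api/v12/clusters/userclu/services/impala/impalaQueries/624dfb95ddbb92a5:6647034a254bc1b6/cancel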
03-09-2017
06:39 AM
Hi, I am having an issue with an Oozie job during DST. I scheduled a job at 6 AM; due to DST the job started running at 5 AM. Why is Oozie not picking up the timing from the host? Is there any resolution for this? Also, my job will start running at 6 AM from March 12, whereas it is now running at 5 AM. How can I avoid this?
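For context, my understanding is that Oozie evaluates coordinator start/end times in UTC and only applies DST shifts when the coordinator declares a timezone, roughly like this sketch (name, frequency, dates, and zone are placeholders):

<coordinator-app name="daily-job" frequency="${coord:days(1)}"
    start="2017-03-01T11:00Z" end="2017-12-31T11:00Z"
    timezone="America/New_York" xmlns="uri:oozie:coordinator:0.4">
  ...
</coordinator-app>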
02-13-2017
08:03 AM
Hi Ben, this is Naveen, and I was wondering how the patch numbers are assigned. For example: cdh 5.7.0-1.cdh5.7.0.p1722.1683. 1) What do 1722 and 1683 actually mean here? 2) Is either of them a sequence?
02-13-2017
07:20 AM
How do I check which Impala table is most frequently accessed, so that I can gather my hot data for HDFS caching?
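For the caching step itself, once the hot tables are identified, I assume it is along these lines (the pool and path names here are made up):

hdfs cacheadmin -addPool impala-hot
hdfs cacheadmin -addDirective -path /user/hive/warehouse/sales.db/hot_table -pool impala-hot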
10-26-2016
09:51 PM
Hello experts, can someone provide me details (a count) of how many custom patches Cloudera has built for CDH 5.7.0?
10-06-2016
09:13 AM
Ben, you gave me the same answer in my previous WebEx with you 🙂 Thank you!
10-06-2016
07:45 AM
Sometimes these are false alerts. Can you check your system load at the time you received these alerts (CLOCK_OFFSET, DNS_HOST_RESOLUTION, WEB_METRIC, etc.)?

sar -q -f /var/log/sa/sa10

Use the command above and replace sa10 with the date you received the alerts. Track down the load and check what unusual thing happened on that host during that window. If you see a bump in the load, check your system I/O disk utilization to see if the spindles are reaching 100%. If any of the spindles are reaching 100%, then the system load is the culprit here.

You may need to increase your thresholds on that particular host in CM > Hosts > All Hosts > select the host name > Configuration, and look for Host Clock Offset Thresholds or Host DNS Resolution Duration Thresholds. With the present thresholds, when the system experiences high load it will pause for a while or send a delayed response (the response includes the health check reports of the host when the CM scm-agent is running) to the scm-server. When the scm-server fails to receive these health check reports within the expected duration because the host is busy, the alerts flood your inbox.
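For the disk-utilization check, per-device history should be in the same sar archive; the %util column is what I would look at:

sar -d -f /var/log/sa/sa10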
06-08-2016
12:28 PM
I am upgrading from CDH 5.4 to 5.7. Can someone post the improvements I can expect in BDR (Backup and Disaster Recovery) from CDH 5.4 to CDH 5.7? Even though I set 40 mappers for a BDR job, it runs on 40 for a while and then reduces the mappers to, say, 20 or 15, and runs for several hours. Because of this we are not able to keep up with the replication. Is this fixed in CDH 5.7? I need to know what improvements have been made to BDR in 5.7.
05-02-2016
01:11 PM
I found the answer myself. Using the command below I can achieve ls -lt-style output in HDFS.

hdfs dfs -ls /test | sort -k6,7
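One caveat worth noting: ls -lt lists newest first, so a reverse sort may be the closer equivalent:

hdfs dfs -ls /test | sort -r -k6,7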
- Tags:
- HDFS
05-02-2016
09:06 AM
How do I list HDFS files according to timestamp, just like ls -lt in Unix?