Member since
12-28-2015
47
Posts
2
Kudos Received
4
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6974 | 05-24-2017 02:14 PM |
| | 2940 | 05-01-2017 06:53 AM |
| | 5777 | 05-02-2016 01:11 PM |
| | 6314 | 02-09-2016 01:40 PM |
03-08-2019
06:29 AM
We are getting the errors below when 15 or 16 Spark jobs are running in parallel. We have a 21-node cluster and run Spark on YARN. Regardless of the number of nodes in the cluster, does one cluster get to use only 17 ports, or is it 17 ports per node? How can we avoid this when we run 50 or 100 Spark jobs in parallel?

WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
...
WARN util.Utils: Service 'SparkUI' could not bind on port 4055. Attempting port 4056.
Address already in use: Service 'SparkUI' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'SparkUI'.
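As I understand it, the SparkUI binds on the host where each driver runs, so the 16-retry limit applies per driver host rather than per cluster. Two ways to run more concurrent drivers are raising the retry count or disabling the UI for jobs that do not need it. A minimal sketch (the application class and jar names are hypothetical):

```shell
# Raise the bind-retry limit (default 16) so more concurrent drivers
# on one host can each find a free UI port above 4040:
spark-submit --conf spark.port.maxRetries=100 \
  --class com.example.MyApp myapp.jar

# Or disable the UI entirely for batch jobs that do not need it:
spark-submit --conf spark.ui.enabled=false \
  --class com.example.MyApp myapp.jar
```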
Labels:
- Apache Spark
- Apache YARN
05-24-2017
02:31 PM
After recommissioning you can just add the DataNode back, and the NameNode will identify all the blocks that were previously present on that DataNode. Once the NameNode identifies this information, it will wipe out the third replica it created during the decommission. You may have to run hdfs balancer if you format the disks and then recommission the node to the cluster, which is not a best practice.
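Once the node is back, the balancer can be kicked off from the command line; the threshold value here is only an example:

```shell
# Spread blocks back across the cluster after recommissioning.
# -threshold is the allowed deviation (in percent) of each
# DataNode's utilization from the cluster-wide average.
hdfs balancer -threshold 10
```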
05-24-2017
02:17 PM
Thanks for your reply. Any links or docs on storage pools would be helpful.
05-24-2017
02:14 PM
1 Kudo
Hello Niranjan,

drwxr-xr-x - striim1 striim1

The permissions above will not let joy write a file inside the HDFS directory unless joy is an HDFS superuser. Look at HDFS ACLs to solve your problem here. If, apart from striim1, joy is the only user who creates files in /user/striim1, then try running the command below:

hdfs dfs -setfacl -m user:joy:rwx /user/striim1

HDFS ACLs: https://www.cloudera.com/documentation/enterprise/5-5-x/topics/cdh_sg_hdfs_ext_acls.html
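A quick way to sanity-check the ACL after setting it, plus an optional default ACL so entries created later under the directory inherit joy's access (the default-ACL line is my suggestion, not part of the original answer):

```shell
# Grant joy rwx on the directory, then verify what was applied:
hdfs dfs -setfacl -m user:joy:rwx /user/striim1
hdfs dfs -getfacl /user/striim1

# Optional: a default ACL so files and subdirectories created later
# under /user/striim1 inherit joy's access automatically:
hdfs dfs -setfacl -m default:user:joy:rwx /user/striim1
```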
05-24-2017
02:03 PM
"A NN heap of 5 GB should handle upwards of 5 million blocks, which is actually 15 million replicas in total. A 10-node cluster should set the DN block threshold to 1.5 million." -- Does this hold good for a heterogeneous cluster where some DataNodes have 40 TB of space and others have 80 TB? I am sure a flat DataNode block threshold of 500,000 is not good practice: it will cause the smaller DataNodes to fill up faster than the larger ones and fire alerts at an early stage.
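One way to handle a heterogeneous cluster is to scale each DataNode's block threshold by its share of total raw capacity instead of using one flat value. A sketch with hypothetical numbers (five 40 TB and five 80 TB nodes sharing the 15-million-block budget from the quote above):

```shell
total_blocks=15000000              # cluster-wide block budget
total_tb=$((5 * 40 + 5 * 80))      # 600 TB across ten nodes
# Threshold proportional to each node's share of raw capacity:
threshold_40tb=$((total_blocks * 40 / total_tb))
threshold_80tb=$((total_blocks * 80 / total_tb))
echo "$threshold_40tb $threshold_80tb"   # 1000000 2000000
```

This keeps the 40 TB and 80 TB nodes hitting their thresholds at roughly the same fill percentage, instead of the smaller nodes alerting first.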
05-01-2017
06:53 AM
I had to check the grant on hr_role instead of emp_role. That was the solution to this question.
04-28-2017
01:15 PM
I have an employee_database, and under it I have the tables salary_table and bonus_table. Right now emp_role has full access on employee_database. I would also like to give hr_role SELECT access on bonus_table. How can I achieve this in Sentry?

SHOW GRANT ROLE emp_role;
1 hdfs://localns/emp emp_role ROLE * false
2 employee_database emp_role ROLE * false

GRANT SELECT ON TABLE emp_database.bonus_table to role hr_role;

SHOW GRANT ROLE emp_role;
1 hdfs://localns/emp emp_role ROLE * false
2 employee_database emp_role ROLE * false

I don't get an error when I run the grant above, but I don't see the grant in the list.
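A sketch of applying the grant and then verifying it against hr_role rather than emp_role, which is the role the SHOW GRANT statements above are inspecting (the HiveServer2 connection URL is hypothetical):

```shell
# Apply the grant through beeline:
beeline -u "jdbc:hive2://hiveserver:10000" \
  -e "GRANT SELECT ON TABLE employee_database.bonus_table TO ROLE hr_role;"

# The new privilege shows up under hr_role, not emp_role:
beeline -u "jdbc:hive2://hiveserver:10000" \
  -e "SHOW GRANT ROLE hr_role;"
```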
Labels:
- Apache Hive
- Apache Impala
- Apache Sentry
- HDFS
10-06-2016
09:13 AM
Ben, you gave me the same answer in my previous WebEx with you 🙂 Thank you!
10-06-2016
07:45 AM
Sometimes these are false alerts. Check your system load around the time you received them (CLOCK_OFFSET, DNS_HOST_RESOLUTION, WEB_METRIC, etc.):

sar -q -f /var/log/sa/sa10

Use the command above, replacing sa10 with the date you received the alerts. Track down the load and check what unusual thing happened on that host during that window. If you see a bump in the load, check your system's disk I/O utilization to see whether any of the spindles are reaching 100%. If any are, the system load is the culprit here.

You may need to increase the thresholds on that particular host: in CM > Hosts > All Hosts > select the host name > Configuration, look for Host Clock Offset Thresholds or Host DNS Resolution Duration Thresholds. With the present thresholds, when the host experiences high load it will pause for a while or send a delayed response (the response includes the host's health-check reports collected while the CM scm-agent is running) to the scm-server. When the scm-server fails to receive these health-check reports within the expected duration because the host is busy, the alerts flood your inbox.
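To narrow a long sar report down to just the intervals that matter, you can filter on the 5-minute load-average column. The column positions and the threshold of 8 are assumptions, and the printf lines below are synthetic sample data standing in for the output of `sar -q -f /var/log/sa/sa10`:

```shell
# Synthetic `sar -q` lines: time runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
printf '%s\n' \
  '09:00:01 2 350 1.20 1.10 1.00' \
  '09:10:01 9 420 12.50 10.80 6.40' \
| awk '$5+0 > 8 {print $1, $5}'
# Prints only the overloaded interval: 09:10:01 10.80
```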
05-02-2016
01:11 PM
I found the answer myself. Using the command below I can get ls -lt-style output in HDFS:

hdfs dfs -ls /test | sort -k6,7
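For reference, this works because fields 6 and 7 of the hdfs listing are the modification date and time, so `sort -k6,7` orders entries oldest first (add `-r` for newest-first, like `ls -lt`). A self-contained check with synthetic listing lines:

```shell
# Two fake `hdfs dfs -ls` lines, deliberately out of order:
printf '%s\n' \
  '-rw-r--r-- 3 hdfs hdfs 1024 2016-05-02 13:11 /test/b.txt' \
  '-rw-r--r-- 3 hdfs hdfs 2048 2016-04-30 09:05 /test/a.txt' \
| sort -k6,7
# /test/a.txt (2016-04-30) now sorts before /test/b.txt (2016-05-02)
```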