Member since
12-28-2015
47
Posts
2
Kudos Received
4
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4203 | 05-24-2017 02:14 PM
 | 1941 | 05-01-2017 06:53 AM
 | 3530 | 05-02-2016 01:11 PM
 | 4103 | 02-09-2016 01:40 PM
03-08-2019
06:29 AM
We are getting the errors below when 15 or 16 Spark jobs are running in parallel. We have a 21-node cluster and run Spark on YARN. Regardless of the number of nodes, does one cluster get to use only 17 ports, or is it 17 ports per node in a cluster? How can we avoid this when we run 50 or 100 Spark jobs in parallel?

WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
:::::
WARN util.Utils: Service 'SparkUI' could not bind on port 4055. Attempting port 4056.
Address already in use: Service 'SparkUI' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'SparkUI
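From the message, the 16 retries seem to map to spark.port.maxRetries (whose default is 16). As a sketch, would raising it per job along these lines be the right approach? (The values here are my guesses, not something we have tested.)

spark-submit \
  --conf spark.ui.port=4040 \
  --conf spark.port.maxRetries=100 \
  ...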
06-22-2017
07:16 PM
Thanks for the solution, Harsha. There is always a backup for this large directory, so a replication factor of 1 is acceptable here, but I will definitely reconsider your suggestion.
06-22-2017
02:08 PM
I have a pre-existing HDFS directory that is 10 TB in size, and I want to change its replication factor from 3 to 1. What is the best possible way to achieve this?
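Is it just a matter of something like the command below (the path is a placeholder), possibly with -w to wait for the change to complete, or is there a better way for a directory this large?

hdfs dfs -setrep 1 /data/large_dir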
06-22-2017
02:00 PM
This seems to be an overhead, and I was checking whether HDFS could handle this request. Thanks for all the replies.
06-13-2017
11:56 AM
Is it possible to use different trash intervals for different HDFS directories? E.g., a 7-day trash interval for the directory "Market" and a 14-day interval for the directory "Search"?
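For reference, the only knob I am aware of is the cluster-wide one in core-site.xml, which takes a value in minutes (the value below is just the 7-day example):

<property>
  <name>fs.trash.interval</name>
  <value>10080</value>
</property>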
06-13-2017
11:25 AM
It works when I manually run mail from the worker and master nodes. When I run the shell script from Oozie, it does not work.
05-30-2017
08:43 PM
Falcon supports HDFS mirroring to replicate data from a source to a destination cluster. Technically it uses distributed copy (distcp) to replicate files between the clusters, just like Cloudera BDR (Backup and Disaster Recovery). Falcon is much like BDR in doing distcp with and without snapshots.
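Under the hood that amounts to something along these lines (the namenode hosts and paths here are placeholders, not from this thread):

hadoop distcp hdfs://source-nn:8020/data/market hdfs://dest-nn:8020/data/market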
05-24-2017
02:31 PM
After recommission you can just add the datanode back, and the namenode will identify all the blocks that were previously present on this datanode. Once the namenode identifies this information, it will wipe out the third replica that it created during the datanode decommission. You may have to run hdfs balancer if you format the disks and then recommission the node to the cluster, which is not a best practice.
05-24-2017
02:25 PM
1 Kudo
1. Go to the namenode UI and check whether the newly added datanodes have any blocks. By default HDFS will not write or copy data to the newly added nodes; you need to run the HDFS balancer to distribute the data among all datanodes (see the sketch below this list).
2. Go to HDFS > Instances and click on the Hosts column or the service column to sort out the datanodes and the nodes that run a DN.
3. Go to Impala > Instances and sort it as per the point above.
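A minimal balancer invocation (the 10% utilization threshold is my assumption, not from this thread; adjust as needed):

hdfs balancer -threshold 10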
05-24-2017
02:17 PM
Thanks for your reply. Any links or docs on storage pools would be helpful.
05-24-2017
02:14 PM
1 Kudo
Hello Niranjan,

drwxr-xr-x - striim1 striim1

The above permissions will not let Joy write a file inside the HDFS directory unless Joy is an HDFS superuser. Try looking at HDFS ACLs to solve your problem here. Apart from striim1, if Joy is the only user who creates files in /user/striim1, then try running the command below.

hdfs dfs -setfacl -m user:joy:rwx /user/striim1

HDFS ACLs: https://www.cloudera.com/documentation/enterprise/5-5-x/topics/cdh_sg_hdfs_ext_acls.html
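To confirm the ACL took effect, something like:

hdfs dfs -getfacl /user/striim1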
05-24-2017
02:03 PM
"A NN of 5 GB should handle upwards of 5 million blocks, which is actually 15 million total. A 10-node cluster should set the DN block threshold to 1.5 million." -- Does this hold good for a heterogeneous cluster where a few datanodes have 40 TB of space and others have 80 TB? I am sure having a datanode block threshold of 500,000 is not a good practice. This will cause the smaller datanodes to fill up faster than the larger datanodes and send alerts at an early stage.
05-24-2017
01:54 PM
One of the common reasons for datanodes going unbalanced is ingestion/data load. The first copy of the data is always stored on the same datanode from which you are loading data into HDFS. The second and third copies will be stored on the rest of the datanodes in a round-robin fashion. You can make the namenode choose available space on datanodes instead of round-robin by setting the "DataNode Volume Choosing Policy" appropriately (a sketch of the underlying property is below). How many nodes do you have in this cluster? How do you push data into HDFS?
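If I remember the mapping right, that CM setting corresponds to the following in hdfs-site.xml (treat the exact property and class names as my recollection, not gospel):

<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>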
05-24-2017
01:42 PM
Guna, I guess you are talking about the Impala Daemon HTTP Server Port, 25000. I am looking for the Impala "Catalog Server Web Server Username" and "Catalog Server Web Server User Password", which are two parameters in Impala > Configuration. These have default values, and I want to know if there is any problem if I remove the values of these two parameters.
05-16-2017
01:32 PM
What is the use of the Impala "Catalog Server Web Server Username" and "Catalog Server Web Server User Password"? I see the fields are admin and ******. 1) Are these default values? What is the default password? 2) On Google, I could not find a description for this parameter. Is there any place I can find a detailed description of each parameter that I see in Cloudera Manager?
05-07-2017
09:33 PM
I am running a shell script through Oozie. A piece of my shell script has the code below. I receive an email when I run this shell script from the Unix command line, but when I run it through an Oozie job, the job succeeds and I don't get any mail. How can I resolve this? Or is there an alternative way I can get an email with an attachment once a string matches in the Unix filesystem?

if [ "$Var" = "Error" ]
then
  echo "Data error" | mail -v -s "Data Error" -a error.csv -S smtp=smtp://mail-gateway -S from=localhost@gmail.com Kevin@gmail.com
  exit
fi
- Tags:
- Oozie
05-01-2017
06:53 AM
I had to check the grant on hr_role instead of emp_role. That is the solution to this question.
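In other words, the verification should have been something like:

SHOW GRANT ROLE hr_role;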
04-28-2017
01:15 PM
I have an employee_database, and under employee_database I have the tables salary_table and bonus_table. Right now emp_role has full access on employee_database. I would also like to give select access to hr_role on bonus_table. How can I achieve this in Sentry?

SHOW GRANT ROLE emp_role;
1 hdfs://localns/emp emp_role ROLE * false
2 employee_database emp_role ROLE * false

GRANT SELECT ON TABLE emp_database.bonus_table TO ROLE hr_role;

SHOW GRANT ROLE emp_role;
1 hdfs://localns/emp emp_role ROLE * false
2 employee_database emp_role ROLE * false

I don't get an error when I run the above grant, but I don't see the grant in the list.
03-23-2017
07:24 AM
Hi Eric, in the Impala daemon web UI I see that the query completed 16 hours ago and its state is Finished, but the query is still in the in-flight list. This is the only query running on this daemon, and it is occupying 3.5 GB of memory on this daemon. If I cancel this query, the memory on this daemon goes to zero. Basically the query is complete but it is holding on to memory, causing a memory leak.

Session ID: 74480cc476dd5fde:64c866411ae5f0b5
Session Type: HIVESERVER2
HiveServer2 Protocol Version: V6
Start Time: 2017-03-22 17:44:29.924339000
End Time:
Query Type: QUERY
Query State: CREATED
Query Status: OK
03-20-2017
10:25 AM
I am trying to cancel an Impala query through the CM API; the command doesn't return any output, but it is not canceling the query.

[root@userhost]# curl -k -u admin https://userhost:7180/api/v12/clusters/userclu/services/impala/impalaQueries/624dfb95ddbb92a5:6647034a254bc1b6/cancel
Enter host password for user 'admin':
[root@userhost ~]#

In fact, I am looking to kill the query, as it completed 50 minutes ago. Any help is appreciated.
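One guess I have not been able to verify: curl defaults to GET, and the cancel endpoint may need an explicit POST, e.g.:

curl -k -X POST -u admin https://userhost:7180/api/v12/clusters/userclu/services/impala/impalaQueries/624dfb95ddbb92a5:6647034a254bc1b6/cancel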
03-09-2017
06:39 AM
Hi, I am having an issue with an Oozie job during DST. I scheduled a job at 6 AM; due to DST the job started running at 5 AM. Why is Oozie not picking up the timing from the host? Is there any resolution for this? Also, my job will start running at 6 AM from March 12, whereas it is now running at 5 AM. How can I avoid this?
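For context, my understanding is that Oozie evaluates coordinator start/end times in UTC and only applies DST shifts when the coordinator declares a timezone, roughly like this sketch (name, frequency, dates, and zone are placeholders):

<coordinator-app name="daily-job" frequency="${coord:days(1)}"
    start="2017-03-01T11:00Z" end="2017-12-31T11:00Z"
    timezone="America/New_York" xmlns="uri:oozie:coordinator:0.4">
  ...
</coordinator-app>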
02-13-2017
08:03 AM
Hi Ben, this is Naveen, and I was wondering how the patch numbers are assigned. For example: cdh 5.7.0-1.cdh5.7.0.p1722.1683. 1) What do 1722 and 1683 actually mean here? 2) Is either of them a sequence?
02-13-2017
07:20 AM
How do I check which Impala table is most frequently accessed, so that I can gather my hot data for HDFS caching?
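For the caching step itself, once the hot tables are identified, I assume it is along these lines (the pool and path names here are made up):

hdfs cacheadmin -addPool impala-hot
hdfs cacheadmin -addDirective -path /user/hive/warehouse/sales.db/hot_table -pool impala-hot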
10-26-2016
09:51 PM
Hello experts, can someone provide me details (a count) of how many custom patches Cloudera has built for CDH 5.7.0?
10-06-2016
09:13 AM
Ben, you gave me the same answer in my previous WebEx with you 🙂 Thank you!
10-06-2016
07:45 AM
Sometimes these are false alerts. Can you check your system load at the time you received these alerts (CLOCK_OFFSET, DNS_HOST_RESOLUTION, WEB_METRIC, etc.)?

sar -q -f /var/log/sa/sa10

Use the command above and replace sa10 with the date you received the alerts. Track down the load and check what unusual thing happened on that host during that window. If you see a bump in the load, check your system I/O disk utilization to see if the spindles are reaching 100%. If any of the spindles are reaching 100%, then the system load is the culprit here.

You may need to increase your thresholds on that particular host in CM > Hosts > All Hosts > select the host name > Configuration, and look for Host Clock Offset Thresholds or Host DNS Resolution Duration Thresholds. With the present thresholds, when the system experiences high load it will pause for a while or send a delayed response (the response includes the health check reports of the host when the CM scm-agent is running) to the scm-server. When the scm-server fails to receive these health check reports within the expected duration because the host is busy, the alerts flood your inbox.
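For the disk-utilization check, per-device history should be in the same sar archive; the %util column is what I would look at:

sar -d -f /var/log/sa/sa10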
06-08-2016
12:28 PM
I am upgrading from CDH 5.4 to 5.7. Can someone post the improvements I can expect in BDR (Backup and Disaster Recovery) from CDH 5.4 to CDH 5.7? Even though I set 40 mappers for a BDR job, it runs on 40 for a while and then reduces the mappers to, say, 20 or 15, and runs for several hours. Because of this we are not able to keep up with the replication. Is this fixed in CDH 5.7? I need to know what improvements have been made to BDR in 5.7.
05-02-2016
01:11 PM
I found the answer myself. Using the command below I can achieve ls -lt-style output in HDFS.

hdfs dfs -ls /test | sort -k6,7
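One caveat worth noting: ls -lt lists newest first, so a reverse sort may be the closer equivalent:

hdfs dfs -ls /test | sort -r -k6,7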
- Tags:
- HDFS
05-02-2016
09:06 AM
How do I list HDFS files according to timestamp, just like ls -lt in Unix?