Member since: 12-28-2015
Posts: 47
Kudos Received: 2
Solutions: 4

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4259 | 05-24-2017 02:14 PM |
| | 1968 | 05-01-2017 06:53 AM |
| | 3591 | 05-02-2016 01:11 PM |
| | 4137 | 02-09-2016 01:40 PM |
07-01-2019
07:12 PM
Hi, did you fix this problem? I have the same issue too.
03-12-2019
09:26 AM
Hi Naveen, if you have a limited number of ports available, you can assign a port to each application: --conf "spark.driver.port=4050" --conf "spark.executor.port=51001" --conf "spark.ui.port=4005" Hope it helps. Thanks, Jerry
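A minimal sketch of how these options fit into a full spark-submit invocation (the application class and JAR names below are placeholders):

```bash
# Pin Spark's driver, executor, and UI ports so they fall inside an
# allowed range. The application class and JAR are hypothetical.
spark-submit \
  --class com.example.MyApp \
  --conf "spark.driver.port=4050" \
  --conf "spark.executor.port=51001" \
  --conf "spark.ui.port=4005" \
  my-app.jar
```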
06-22-2017
07:16 PM
Thanks for the solution, Harsha. There is always a backup for this large directory, so a replication factor of 1 is acceptable here, but I will definitely reconsider your suggestion.
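For reference, a replication factor can be lowered on an existing directory like this (the path is a placeholder):

```bash
# Set replication factor 1 on everything under the directory.
# -w waits until the target replication is actually reached.
hdfs dfs -setrep -w 1 /data/large-backed-up-dir
```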
06-14-2017
01:38 AM
I would suggest using a Decision node and an Email action instead of sending the mail from a shell script. If you prefer the shell, make sure you set every environment variable that the mail command needs to work.
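If you do stay with the shell route, a rough sketch of a wrapper script, assuming a mail/mailx client is installed on the node (the recipient, subject, and paths are all placeholders):

```bash
#!/bin/bash
# Oozie shell actions run with a stripped-down environment, so export
# whatever the mail command needs explicitly. All values are examples.
export PATH=/usr/bin:/bin
export HOME=/tmp
echo "Workflow step finished" | mail -s "Oozie job status" ops@example.com
```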
05-30-2017
08:43 PM
Falcon supports HDFS mirroring to replicate data from a source to a destination cluster. Technically it uses distributed copy (distcp) to replicate files between the clusters, just like Cloudera BDR (Backup and Disaster Recovery). Falcon is very similar to BDR in doing distcp with and without snapshots.
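Under the hood this boils down to a distcp job; a minimal sketch of the equivalent manual command (the NameNode hostnames and paths are placeholders):

```bash
# Copy /data from the source cluster to the destination cluster.
# -update copies only files that are missing or differ at the target.
hadoop distcp -update \
  hdfs://source-nn:8020/data \
  hdfs://dest-nn:8020/data
```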
05-30-2017
01:41 AM
Hi Naveen, I realized my understanding was wrong: the system sees "joy" as the user who is trying to write, so permissions are enforced against "joy". I set up an ACL for joy and my program worked fine. Now I understand ACL usage with respect to impersonation. Thanks for the pointer. Regards, Niranjan
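For anyone landing here later, a sketch of the kind of ACL involved (the path is a placeholder; "joy" is the user from this thread):

```bash
# Grant user "joy" rwx on the target directory, and set a default ACL
# so newly created children inherit the same entry.
hdfs dfs -setfacl -m user:joy:rwx /data/app-dir
hdfs dfs -setfacl -m default:user:joy:rwx /data/app-dir
# Verify the resulting ACL entries.
hdfs dfs -getfacl /data/app-dir
```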
05-24-2017
02:37 PM
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html I recommend opening a new topic if you have any other questions on storage pools. That way this discussion can stay on topic.
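From the linked ArchivalStorage docs, storage policies are applied per path; a minimal sketch (the path and policy choice are examples):

```bash
# Assign the COLD storage policy to a path, then confirm it took effect.
hdfs storagepolicies -setStoragePolicy -path /data/archive -policy COLD
hdfs storagepolicies -getStoragePolicy -path /data/archive
```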
05-24-2017
02:31 PM
After recommissioning, you can just add the datanode back and the NameNode will identify all the blocks that were previously present on this datanode. Once the NameNode identifies this information, it will wipe out the third replica that it created during the datanode decommission. You may have to run the HDFS balancer if you format the disks and then recommission the node to the cluster, which is not a best practice.
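A sketch of running the balancer after such a recommission (the threshold is an example value):

```bash
# Rebalance blocks across datanodes. -threshold 10 allows each node's
# utilization to differ from the cluster average by up to 10 percent.
hdfs balancer -threshold 10
```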
05-24-2017
02:25 PM
1 Kudo
1. Go to the NameNode UI and check whether the newly added datanodes hold any blocks. By default HDFS will not write or copy existing data to newly added nodes; you need to run the HDFS balancer to distribute the data among all datanodes (see the command sketch below).
2. Go to HDFS > INSTANCES, then click the HOSTS or SERVICE column to sort out which nodes run a DataNode.
3. Go to IMPALA > INSTANCES and sort it the same way as in point 2.
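As a quick command-line alternative to the UI for point 1:

```bash
# Print cluster-wide and per-datanode capacity, usage, and block info;
# newly added nodes that hold no data yet show near-zero DFS Used.
hdfs dfsadmin -report
```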
05-24-2017
01:54 PM
One of the common reasons for datanodes becoming unbalanced is ingestion/data load. The first copy of the data is always stored on the same datanode from which you are loading data into HDFS. The second and third copies are stored on the rest of the datanodes in a round-robin fashion. You can have HDFS choose by available space instead of round robin by setting the "DataNode Volume Choosing Policy" appropriately. How many nodes do you have in this cluster? How do you push data into HDFS?
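For reference, on a plain HDFS install the Cloudera Manager setting maps to the upstream property dfs.datanode.fsdataset.volume.choosing.policy; a sketch of checking the effective value on a datanode host:

```bash
# Show the configured volume-choosing policy. AvailableSpaceVolumeChoosingPolicy
# prefers volumes with more free space; if unset, round robin is the default.
hdfs getconf -confKey dfs.datanode.fsdataset.volume.choosing.policy
```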
05-24-2017
01:42 PM
Guna, I guess you are talking about the Impala Daemon HTTP Server Port, 25000. I am looking for the Impala Catalog Server web server username and Catalog Server web server user password, which are two parameters in Impala > Configuration. These have default values, and I want to know if there is any problem if I remove the values of these two parameters.
05-01-2017
06:53 AM
I had to check the grant in hr_role instead of emp_role. That was the solution for this question.
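For anyone hitting the same issue, a sketch of inspecting each role's grants over beeline (the HiveServer2 connection string is a placeholder):

```bash
# List the privileges attached to each role in a Sentry-managed Hive.
beeline -u "jdbc:hive2://hs2-host:10000" -e "SHOW GRANT ROLE hr_role;"
beeline -u "jdbc:hive2://hs2-host:10000" -e "SHOW GRANT ROLE emp_role;"
```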
03-23-2017
07:24 AM
Hi Eric, in the Impala daemon web UI I see that the query completed 16 hours ago and its state is Finished, but the query is still in the in-flight list. This is the only query running on this daemon and it is occupying 3.5 GB of memory there. If I cancel this query, the memory on this daemon goes to zero. Basically the query is complete but it is holding on to the memory, causing a memory leak.
Session ID: 74480cc476dd5fde:64c866411ae5f0b5
Session Type: HIVESERVER2
HiveServer2 Protocol Version: V6
Start Time: 2017-03-22 17:44:29.924339000
End Time:
Query Type: QUERY
Query State: CREATED
Query Status: OK
03-09-2017
06:39 AM
Hi, I am having an issue with an Oozie job during DST. I scheduled a job at 6 AM, but due to DST the job started running at 5 AM. Why is Oozie not picking up the time from the host? Is there any resolution for this? Also, my job, which is now running at 5 AM, will start running at 6 AM from March 12. How can I avoid this?
02-13-2017
11:22 AM
Set up auditing for Impala and use a tool to analyze it (or do it yourself). I have fed these audit logs to Cloudera Optimizer and Splunk (this requires Splunk and SPL knowledge). Both will give you answers to this question and quite a bit more. Honestly, to get just this answer you should be able to read the audit files and use basic Unix tools like grep, awk, cut, sort, and uniq to get table names and their frequency.
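A rough sketch of that Unix-tools approach, assuming JSON-formatted audit files; the log path and the "name" field pattern are assumptions, so adjust them to your actual audit format:

```bash
# Frequency count of object names seen in Impala audit logs.
grep -ho '"name":"[^"]*"' /var/log/impalad/audit/* \
  | cut -d'"' -f4 \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -20
```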
02-13-2017
08:03 AM
Hi Ben, this is Naveen, and I was wondering how the patch numbers are derived. For example, in cdh 5.7.0-1.cdh5.7.0.p1722.1683: 1) What do 1722 and 1683 actually mean here? 2) Is either of them a sequence?
06-10-2016
02:13 AM
The number of mappers in BDR is determined entirely by the configuration that is passed in. Could you share JSONs or screenshots of your configured job(s) illustrating the problem?
03-30-2016
12:09 PM
Hi, encryption at rest protects your data from an unauthorized user who has no read permission in HDFS, or no access to the cluster at all, and tries to read it from the disk directly. In your example the directory /tmp/user1zone1 has read access for all cluster users, and hence user2 is allowed to read from it: drwxr-xr-x - user1 supergroup 0 2016-02-10 02:42 /tmp/user1zone1
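If the intent is that only user1 should read inside the zone, the fix is ordinary HDFS permissions, for example:

```bash
# Remove group/other access so only the owner (user1) can read
# inside the encryption zone.
hdfs dfs -chmod -R 700 /tmp/user1zone1
```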
02-16-2016
08:15 AM
Hi Naveen, if you are running the Sentry service on one of the NameNode hosts, there is a chance of the Sentry service going down when that node goes down, and then I guess you will not be able to access the files and databases. For better availability you can install the Sentry service on hosts other than the NN HA nodes and enable HA in Sentry. I hope this link will help you: https://cwiki.apache.org/confluence/display/SENTRY/Sentry+HA+Server+Client+Configuration
02-04-2016
01:01 AM
Please read the link in the previous post; there is also an oiv_legacy command, which does what you want.
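A minimal sketch of invoking it (the input and output file names are placeholders):

```bash
# Convert an fsimage written in the old (pre-Hadoop-2.4) layout into
# readable text, the way the original oiv processor used to.
hdfs oiv_legacy -i fsimage_0000000000000000042 -o fsimage.txt
```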
01-28-2016
05:39 PM
Thank you all for your time; the logical workaround sounds good to me.
01-28-2016
07:07 AM
Any chance to fix my problem? Thanks.
01-27-2016
07:32 AM
There are 9 drives mounted; only 3 of them show tps 30, and the rest all show tps 7. On a node, how can I find out which drive the JournalNode writes to, and how can I move it to another drive?
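A sketch of tracking that down from the command line (the config key is the standard HDFS one; interpreting the output depends on your mount layout):

```bash
# Find where the JournalNode stores its edits.
hdfs getconf -confKey dfs.journalnode.edits.dir
# Map that directory to its underlying mount/device.
df -h "$(hdfs getconf -confKey dfs.journalnode.edits.dir)"
# Compare per-device activity against the tps numbers you observed.
iostat -x 5 3
```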
01-18-2016
02:10 PM
You won't save HDFS filesystem space by "archiving" or "combining" small files, but in many scenarios you will get a performance boost from combining, and you will also reduce the metadata overhead on the namenode.
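For reference, a sketch of combining a directory of small files into a Hadoop archive (the paths and archive name are placeholders):

```bash
# Pack everything under /data/small-files into one HAR archive. This
# cuts namenode metadata (fewer objects) but does not save raw space.
hadoop archive -archiveName logs.har -p /data/small-files /data/archived
# The files stay readable through the har:// scheme.
hdfs dfs -ls har:///data/archived/logs.har
```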
01-03-2016
09:17 PM
FSCK prints the full identifier of a block, which is useful in some contexts depending on what you're troubleshooting or investigating. Here's a breakdown:

BP-929597290-192.0.0.2-1439573305237: This is a BlockPool (BP) ID. It marks a NameNode's ownership of the block in question. You might recall that HDFS now supports federated namespaces, wherein multiple NameNodes may be served by a single DataNode; this ID is how each NameNode is uniquely identified as the owner of a held block ID. Even though you do not explicitly use federation, the block-pool concept is now built into the identifier design of HDFS by default. See http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-hdfs/Federation.html#Multiple_NamenodesNamespaces

blk_1074084574_344316: This is the block ID (blk_X_Y). Each block under every file is uniquely identified by a number X and a sub-number Y (the generation stamp). More on block IDs and HDFS architecture can be read in the AOS book: http://aosabook.org/en/hdfs.html

DS-730a75d3-046c-4254-990a-4eee9520424f,DISK: This is a storage identifier ID. It tells you which disk (hashed identifier) on the specified DN IP:PORT actually holds the data, and what the type of that disk is (DISK). HDFS now supports tiered storage, where this comes in useful, among other things: http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
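A sketch of the fsck invocation that prints identifiers like the ones above (the path is a placeholder):

```bash
# Show each file's blocks, their full identifiers, and the datanodes
# (with storage IDs) holding every replica.
hdfs fsck /data/some-file -files -blocks -locations
```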