Member since 02-18-2016 | 141 Posts | 19 Kudos Received | 18 Solutions
11-14-2019
08:41 PM
@BaoHo CCU stands for Concurrent User, so the figure is per concurrent user. I think the best way to get the details is to reach out to a Cloudera sales representative. They will brief you on this topic.
11-14-2019
08:17 PM
@deekshant To debug the NameNode issue, check the following:
1. The active NameNode [NN] logs, for the time when it rebooted.
2. The active NN zkfc logs for the same time, in case they show any issue.
3. The standby NN logs for errors at the same time.
4. The standby NN zkfc logs for errors at the same timestamp.
5. The active NN .out file for any warnings/errors.
6. The system log "/var/log/messages" for any issue at that moment.
You will find the error in one of the files above, and you can then proceed with the RCA accordingly. Do revert if you need further help.
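If it helps, the "same timestamp" part of the checklist can be scripted. This is only a rough sketch, not a tested tool: it assumes the Hadoop log lines begin with a `YYYY-MM-DD HH:MM:SS` timestamp (the usual log4j layout), and the function name `lines_near` and the keyword list are my own. It will not match syslog-style lines from the system log, so that file still needs a manual look.

```python
from datetime import datetime, timedelta

# Assumption: each log entry starts with "YYYY-MM-DD HH:MM:SS".
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
KEYWORDS = ("ERROR", "FATAL", "WARN")


def lines_near(lines, event_time, minutes=5, keywords=KEYWORDS):
    """Return lines stamped within +/- `minutes` of event_time that contain a keyword."""
    window = timedelta(minutes=minutes)
    hits = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:19], TS_FORMAT)
        except ValueError:
            # Continuation line (e.g. a stack trace) or a different log format.
            continue
        if abs(stamp - event_time) <= window and any(k in line for k in keywords):
            hits.append(line.rstrip("\n"))
    return hits
```

You would open each NN and zkfc log in turn, pass the file object as `lines`, and use the reboot time as `event_time`; the log paths depend on your installation.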
11-14-2019
12:37 AM
2 Kudos
@TheBroMeister I will try to comment inline:

1.) How different would the setup and configuration be for physical servers as opposed to VMs? Yes, setting up the VMs would be faster compared to the physical ones, but are there any additional configurations or settings that we would need to look into?
-- For general configuration, the points below need to be taken into account, since they all count toward performance: a. Disks b. Network c. Memory/CPU d. SLA

2.) We've read that one possible issue with setting up the cluster on VMs is data locality and redundancy: no two replicas should be on the same physical node, but since one physical node may house several VMs, would there be a way around this issue?
-- VMs with external storage [like SAN] will impact data locality. You can go with dedicated disks for the VMs, which is a good hybrid approach. And yes, for data locality, virtualization vendors [like VMware] provide add-on components such as BDE [Big Data Extensions], and for networking there is NSX technology, which helps avoid performance impacts. But you need to take licensing cost into account.

3.) Since the specs of the VMs would be restricted to the specs of the physical node, and its resources would be split depending on how many VMs it is housing, wouldn't it be better to have separate servers, each housing one node of the cluster, to get better performance? And would having several VMs on one physical node affect the parallelism of the jobs that will run on the cluster?
-- It is difficult to make this decision up front, without actual experience. It depends purely on your SLA. At the start, while running Hadoop applications, you might not know how long your application takes to process or whether it meets the SLA. This should be a POC-based approach: test and run benchmarks before you go for the actual dev/UAT/prod implementations. The benchmarking results will give you a fair idea of performance and computational stats, which makes the decision easier.

Please check the links below, which might be useful:
https://community.cloudera.com/t5/Support-Questions/Virtual-Machines-in-Hadoop-cluster/td-p/119675
https://www.kdnuggets.com/2015/12/myths-virtualizing-hadoop-vsphere-explained.html
https://pubs.vmware.com/bde-2/index.jsp
11-13-2019
11:06 PM
@TheBroMeister Every technology has its pros and cons. The above comment is very broad, and the discussion could go on forever. Do you have any specific question or issue regarding implementation or architecture? I will try to comment accordingly.
11-12-2019
08:39 PM
@fgarcia Can you try the REST call below and check whether you get the same info? (Add credentials if your cluster requires them.)
curl -X GET "http://<active_namenode>:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo"
From the NameNode screenshot it seems that 0 DataNodes/blocks are reported to the NN. Are all connections between the DNs and the NN healthy? Can you check and share the full log stack trace?
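If you want to read the same JMX output programmatically, here is a minimal sketch. The function names are mine, and I am assuming the `NameNodeInfo` bean exposes `LiveNodes` and `DeadNodes` as JSON-encoded strings plus a `TotalBlocks` counter, which is what I have seen on the versions I worked with; please verify against your own response. It also assumes an unsecured HTTP endpoint on port 50070.

```python
import json
from urllib.request import urlopen


def fetch_namenode_info(host, port=50070):
    """Fetch the NameNodeInfo bean over HTTP (assumes no authentication is needed)."""
    url = f"http://{host}:{port}/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_namenode_info(payload):
    """Pull out the DataNode lists and block count from a parsed JMX response."""
    beans = payload.get("beans", [])
    if not beans:
        raise ValueError("no NameNodeInfo bean in response")
    info = beans[0]
    # Assumption: LiveNodes/DeadNodes are JSON strings keyed by DataNode name.
    live = json.loads(info.get("LiveNodes", "{}"))
    dead = json.loads(info.get("DeadNodes", "{}"))
    return {
        "live": sorted(live),
        "dead": sorted(dead),
        "total_blocks": info.get("TotalBlocks"),
    }
```

An empty `live` list here would match what the screenshot suggests, namely that no DataNodes have registered with the NN.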
11-12-2019
08:37 PM
@VamshiDevraj If you are still facing the issue, can you share details about the error or a screenshot of it?
11-12-2019
08:09 PM
1. Did the job fail due to the above reason? If not, is this error displayed in the logs for all Spark jobs or just for this job?
11-12-2019
02:21 AM
Can you also check the heap size utilization of the Ambari server? You might need to revisit the Ambari server heap configuration. Check this link for details - https://docs.cloudera.com/HDPDocuments/Ambari-2.7.4.0/administering-ambari/content/amb_adjust_ambari_server_heap_size.html
11-12-2019
02:19 AM
If you know the file name:
hdfs fsck /myfile.txt -files -blocks -locations
Otherwise, scan from the root and filter on the block ID:
hdfs fsck / -files -blocks -locations | grep <blkxxx>
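Note that a plain grep only shows the block line, not the file it belongs to. Below is a rough sketch for mapping a block back to its file from saved fsck output. It assumes the `-files -blocks -locations` layout where each file's line starts with its path and the block lines follow it; the exact format differs between Hadoop versions, and the function name is my own.

```python
def find_block_owner(fsck_output, block_id):
    """Return (path, block_line) for the first block line containing block_id, else None.

    Assumptions: file header lines start with "/" and the path has no spaces;
    block_id is matched as a plain substring, so pass the full ID.
    """
    current_path = None
    for line in fsck_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("/"):
            # File header line: the path is the first whitespace-separated token.
            current_path = stripped.split()[0]
        elif block_id in stripped:
            return current_path, stripped
    return None
```

You could save the fsck output to a file first and pass its contents in, which avoids re-running fsck on a large namespace for every lookup.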
11-12-2019
01:53 AM
1. Did the job fail due to the above reason? If not, does the same error appear in the logs for other BP XXX as well? 2. Can you check with fsck which nodes hold copies of the BP specified above?