Member since: 02-18-2016
Posts: 135
Kudos Received: 19
Solutions: 18
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1807 | 12-18-2019 07:44 PM
 | 1837 | 12-15-2019 07:40 PM
 | 735 | 12-03-2019 06:29 AM
 | 754 | 12-02-2019 06:47 AM
 | 1703 | 11-28-2019 02:06 AM
11-18-2019
11:24 PM
@divya_thaore
1. If you are starting the service via the Ambari / Cloudera Manager UI, check the operation logs displayed in the UI while the service starts. Click the service operation logs and check for any errors.
2. If you do not see any operation logs, or no operation is triggered when you start/restart the service, then kindly restart the agent service once [ambari-agent / cloudera-scm-agent] (see the example commands below).
3. Else, please check the logs from the CLI as suggested by @Shelton.
If you still need help, please revert.
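For reference, a minimal sketch of the agent restart, assuming shell access to the affected host (use the command that matches your manager):
# ambari-agent restart                      [Ambari-managed host]
# service cloudera-scm-agent restart        [Cloudera Manager-managed host]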
11-18-2019
11:11 PM
Hi @Manoj690 Did the earlier community link help resolve the issue - https://community.cloudera.com/t5/Support-Questions/Ambari-metircs-not-started/m-p/283228#M210525 ? Please confirm!
11-18-2019
11:09 PM
@mike_bronson7 The latest command you posted again has a typo: the "R" is missing at the end of the component name in the command below (the corrected command is shown below) -
>>curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE "http://node02:8080/api/v1/clusters/HDP/hosts/node01/host_components/SPARK2_THRIFTSERVE"
Please try it and pass the new error, if any.
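For reference, the corrected call with the trailing "R" restored (cluster, host and credentials taken from the command above) would be:
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE "http://node02:8080/api/v1/clusters/HDP/hosts/node01/host_components/SPARK2_THRIFTSERVER"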
11-18-2019
06:38 PM
@getschwifty As far as I know, quotas are managed by a set of commands available only to the administrator [like dfsadmin]. Quotas are not exposed via a REST API call, but you can get the same information another way via "hdfs dfs -count -q" (see the example below). Please check the links below for details -
https://issues.apache.org/jira/browse/HDFS-2711
https://issues.apache.org/jira/browse/HDFS-253
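A quick sketch, assuming the quota was set on /user/someuser (the path is just an example):
$ hdfs dfs -count -q -h /user/someuser
The output columns are QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME, so both name and space quotas are visible without admin privileges.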
11-18-2019
03:48 AM
Yes, you can remove the Pig client using the Ambari API (see the sketch below).
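A minimal sketch, assuming an Ambari admin login; AMBARI_HOST, CLUSTER_NAME and HOST_FQDN are placeholders for your environment. A client component does not need to be stopped or decommissioned first, just deleted from the host:
# curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE "http://AMBARI_HOST:8080/api/v1/clusters/CLUSTER_NAME/hosts/HOST_FQDN/host_components/PIG"
To drop the whole Pig service from the cluster instead of a single client, the same DELETE can be issued against .../clusters/CLUSTER_NAME/services/PIG.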
11-18-2019
02:32 AM
@shyamshaw This was already highlighted in the community here by @LesterMartin - https://community.cloudera.com/t5/Support-Questions/Reason-for-Hive-dependency-on-PIg-during-installation-of/td-p/239407
>>>> Probably for using HCatalog, which can be extremely useful for Pig programmers even if they don't want to use Hive and just leverage this for schema management instead of defining AS clauses in their LOAD commands? Just as likely this is something hard-coded into Ambari? If you really don't want Hive, I bet you can just delete it after installation. For giggles, I stood up an HDFS-only HDP 3.1.0 cluster for https://community.hortonworks.com/questions/245432/is-it-possible-to-install-only-hdfs-on-linux-mach... and just added Pig (required YARN, MR, Tez & ZK, but that makes sense!) and did NOT require Hive to be added as seen below.
Please check the link for full details. Likewise, you can remove Pig after installation and it will not impact your Hive.
11-18-2019
02:24 AM
Hi @Manoj690 , It seems your AMS HBase master is not able to start. Please try the steps below -
1. In the Ambari Dashboard, go to the 'Ambari Metrics' section and, under the 'Service Actions' dropdown, click 'Stop'. Check and confirm from the backend that the AMS process is stopped. If the process is still running, use the command below to stop it -
# ambari-metrics-collector stop
2. Delete all AMS HBase data. In the Ambari Dashboard, under the Ambari Metrics section, search for the configuration value "hbase.rootdir" and remove all files under that directory. Eg.
# hdfs dfs -cp /user/ams/hbase/* /tmp/
# hdfs dfs -rm -r -skipTrash /user/ams/hbase/*
3. In the Ambari Dashboard, under the Ambari Metrics section, search for the configuration value "hbase.tmp.dir". Back up that directory and remove the data. Eg.
# cp -r /var/lib/ambari-metrics-collector/hbase-tmp/* /tmp/
# rm -rf /var/lib/ambari-metrics-collector/hbase-tmp/*
4. Remove the znode for HBase in the ZooKeeper CLI. Log in to the Ambari UI -> Ambari Metrics -> Configs -> Advanced ams-hbase-site and search for the property "zookeeper.znode.parent", then remove that znode. Eg.
# /usr/hdp/current/zookeeper-client/bin/zkCli.sh
(at the zkCli prompt) rmr /ams-hbase-secure
5. Start AMS.
Let me know if you still have the issue.
11-17-2019
11:10 PM
1 Kudo
Hi @Manoj690
1. First check which directories are configured for DFS data storage. Log in to Ambari UI -> Services -> HDFS -> Configs -> [search for dfs.datanode.data.dir] and capture the list of directories defined there. Eg. I have the list below -
/data01/hadoop/hdfs/data,/data02/hadoop/hdfs/data
2. Log in to the datanodes and go to the mount. In my case -
$ cd /data01/hadoop/hdfs/
3. Check if there is any other directory/data inside "/data01/hadoop/hdfs/" or "/data01/".
4. Any data other than "**data" is considered non-DFS, and that is what shows up in dfsadmin -report.
5. You need to get rid of that data, which will lower your non-DFS used (see the example below).
You can share your output if you have any confusion or need help.
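A quick way to see what is consuming space outside the DFS block directory, using the example mount above (paths are illustrative):
$ du -sh /data01/* | sort -h
$ hdfs dfsadmin -report | grep "Non DFS Used"
Anything large that is not under /data01/hadoop/hdfs/data is being counted as non-DFS used.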
11-17-2019
10:29 PM
The chronyd commands are broadly similar to NTP's - you can refer to this for debugging - https://www.thegeekdiary.com/centos-rhel-7-tips-on-troubleshooting-ntp-chrony-issues/
11-17-2019
10:27 PM
Please check this once -
1. Try running "ntpdate ipsap01.ecb.de" on all hosts and check if any issue is reported while running this command.
2. Make sure chronyd/ntp.conf is the same on all nodes.
3. Run:
# hwclock --systohc
# systemctl restart cloudera-scm-agent
If the above does not help, then you need to debug on the NTP server side. Execute the commands below -
# ntpq -c pe
The output is good as long as the refid column does not show ".INIT.", which can suggest a communication issue.
# ntpq -c as
The output is good as long as the reach column does not show "no", which suggests the client cannot reach its peer hosts.
You probably also need to check the stratum of your NTP servers - the "assID" from ntpq -c as can be used with ntpq -c "rv assID" to determine the stratum. The lower the stratum the better; the upper limit is 15, and stratum 16 indicates that a device is unsynchronized.
# ntpq -c "rv <association_id_from_above_command_output>"
11-17-2019
09:17 PM
@stunningsuraj Also, if the existing cluster is not an HDP-based installation, it is not possible to add it to the new Ambari instance.
11-17-2019
09:09 PM
1 Kudo
@stunningsuraj >>The remote cluster on which hadoop is running - do you also have an Ambari server managing that cluster, or is it just Hadoop installed without Ambari?
1. If the remote cluster is managed via Ambari, then you can take an Ambari DB backup and import that backup DB into the newly set up Ambari server node. Check this once - https://docs.cloudera.com/HDPDocuments/Ambari-2.7.4.0/administering-ambari/content/amb_register_a_remote_cluster.html
2. If you are not managing the remote cluster using Ambari, then it is not possible at your end to simply add the remote Hadoop cluster to the new Ambari. You need to follow "Ambari Takeover", a process that can be used to add a remote cluster to a new Ambari instance. It completely depends on which components you have installed, and it is usually a tough process to set up Ambari over an existing HDP cluster, but it can be achieved. You can find similar threads here:
https://community.cloudera.com/t5/Support-Questions/Can-ambari-manage-an-exists-hadoop-cluster-an-update-it-to/m-p/155950
https://community.cloudera.com/t5/Support-Questions/Monitoring-Non-Ambari-Cluster-health-and-status/m-p/199501
You can also reach out to Cloudera Professional Services, who can help you migrate the remote cluster to the new Ambari, but you need to check the possibility of data guarantees before proceeding with this activity.
11-15-2019
06:10 AM
@mike_bronson7 You just need to back up /hadoop/hdfs/namenode/current from the active namenode. Also, if you take the backup a week before the activity and your first cluster keeps serving client requests, then you will lose the data written after the backup. So the best approach is to run saveNamespace and take the backup at the time of the activity, and freeze clients from accessing the cluster.
11-15-2019
04:53 AM
By backup I mean copy the namenode current directory only. First turn safemode on and then save the namespace; once both commands are executed, take a backup of the namenode current directory from the active node. You can copy it to the destination/new cluster using any command (like scp) or tool - scp will be the simplest option. A minimal sketch of the commands is below.
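A minimal sketch, assuming the metadata directory is /hadoop/hdfs/namenode/current as mentioned earlier and new-nn-host is a placeholder for the target node. Run these on the active namenode (as the hdfs user, with a valid kinit if the cluster is kerberized):
# hdfs dfsadmin -safemode enter
# hdfs dfsadmin -saveNamespace
# scp -r /hadoop/hdfs/namenode/current new-nn-host:/hadoop/hdfs/namenode/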
11-15-2019
03:13 AM
1 Kudo
1. If you can back up the metadata from the original cluster (where the datanodes existed at first) and copy that metadata to the new cluster, then that is the best option.
2. If you are not able to go with point 1, then you can probably try the "hadoop namenode -recover" option. The links below might be useful -
https://blog.cloudera.com/understanding-hdfs-recovery-processes-part-1/
https://clouderatemp.wpengine.com/blog/2015/03/understanding-hdfs-recovery-processes-part-2/
11-15-2019
02:06 AM
@mike_bronson7 What I got from your scenario is that in the second, from-scratch installation your master nodes [i.e. active/standby namenode] are freshly installed and you are only adding datanodes that carry pre-existing data [from the other cluster], right?
-- In that case it is not possible to bring the cluster up with the data on the restored HDDs, since the namenode will have no information about the blocks lying in block storage on the datanode disks. If you have a support contract with Cloudera, you can approach them for a DR scenario where they can help you add the existing data from the datanodes back into the cluster [not sure if it can be recovered/added back 100%]. The same applies to Kafka.
11-14-2019
10:52 PM
@BaoHo You can reach them on the numbers below - or submit your details here https://www.cloudera.com/contact-sales.html and a sales person will reach out to you.
US: (888) 789-1488
International: +1 (650) 362-0488
11-14-2019
08:53 PM
@bdelpizzo Can you see any error in the Kafka logs/MirrorMaker logs? It is possible that MirrorMaker is not able to process messages because of message size: if the size of any message is higher than the configured/default value, it might get stuck in the queue. Check the message.max.bytes property (see the sketch below).
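A quick sketch for checking the relevant limits, assuming an HDP-style install (paths, the ZooKeeper host and the topic name are placeholders):
# grep message.max.bytes /etc/kafka/conf/server.properties
# /usr/hdp/current/kafka-broker/bin/kafka-configs.sh --zookeeper <zk_host>:2181 --describe --entity-type topics --entity-name <mirrored_topic>
The first shows the broker-wide default; the second shows any per-topic max.message.bytes override. MirrorMaker's producer/consumer configs may also need matching size settings.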
11-14-2019
08:41 PM
@BaoHo CCU means Concurrent User. I think the best way to get the details on this is to reach out to a Cloudera Sales Representative; they will brief you on this topic.
11-14-2019
08:17 PM
@deekshant To debug the Namenode issue you need to check the below -
1. Check the active namenode [NN] logs [for the time when it got rebooted]
2. Check the active NN ZKFC logs [same time - if you see any issue]
3. Check the standby NN logs at the same time for any error
4. Check the standby NN ZKFC logs for any error at the same timestamp
5. Check the active NN .out file for any warnings/errors
6. Check the system log "/var/log/messages" for any issue at that particular moment in time.
You will find the error in one of the above files, and accordingly you can proceed with the RCA (see the grep sketch below). Do revert if you need further help.
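A minimal grep sketch, assuming the default HDP log locations (adjust the paths to your installation):
# grep -iE "error|fatal|shutdown" /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log
# grep -iE "error|fatal" /var/log/hadoop/hdfs/hadoop-hdfs-zkfc-*.log
# grep -i error /var/log/messages
Narrow each search to the reboot timestamp once you know it.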
11-14-2019
02:04 AM
@feblik Previously we used to access the sandbox archives using the link - http://hortonworks.com/products/hortonworks-sandbox/#archive - but after the migration to the new portal only the latest versions are available there. Still, if you want HDP 2.3, here you go - VMWare HDP 2.3.0 Sandbox, VirtualBox HDP 2.3.0 Sandbox
11-14-2019
12:37 AM
2 Kudos
@TheBroMeister I will try to comment my views inline -
1.) How different would the setup and configuration be for physical servers versus VMs? Yes, setting up the VMs would be faster compared to physical ones, but are there any additional configurations or settings we would need to look into?
-- For general configuration, the points below should be taken into account, as they bear on performance:
a. Disks
b. Network
c. Memory/CPU
d. SLA
2.) We've read that one possible issue with setting up the cluster on VMs is data locality and redundancy: no two replicas should sit on the same physical node, but since one physical node may house several VMs, is there a way around this issue?
-- VMs with external storage [like SAN] will impact data locality. You can go with dedicated disks for the VMs, which is a good hybrid approach. Yes, there are also add-on components from virtualization vendors [like VMware] that help with data locality, such as BDE [Big Data Extensions]; on the network side, NSX technology can help speed up the systems and avoid performance impacts. But you need to take the licensing cost into account.
3.) Since the specs of the VMs would be restricted to the specs of the physical node, and its resources split depending on how many VMs it is housing, wouldn't it be better to have separate servers each housing one node of the cluster to get better performance? And would having several VMs in one physical node affect the parallelism of the jobs that will run on the cluster?
-- It is difficult to make that decision up front without actual experience; it depends purely on your SLAs. At the start, while running Hadoop applications, you might not know how much time your application needs to process data or whether it meets the SLA. Take a POC-based approach: test, and run benchmarking before you go for the actual dev/uat/prod implementations (see the TeraGen/TeraSort sketch below). Benchmarking results will give you a fair idea about performance and computational stats, which makes the decision much easier.
Please check the links below, which might be useful -
https://community.cloudera.com/t5/Support-Questions/Virtual-Machines-in-Hadoop-cluster/td-p/119675
https://www.kdnuggets.com/2015/12/myths-virtualizing-hadoop-vsphere-explained.html
https://pubs.vmware.com/bde-2/index.jsp
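A minimal benchmarking sketch, assuming an HDP install (the examples jar path and output directories are placeholders; adjust to your environment):
# hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teragen 100000000 /benchmarks/teragen
# hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar terasort /benchmarks/teragen /benchmarks/terasort
TeraGen writes 100 million 100-byte rows (about 10 GB); comparing the TeraSort wall-clock time on the VM layout versus bare metal gives a rough, repeatable baseline.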
11-13-2019
11:06 PM
@TheBroMeister Every technology has its pros and cons; the above comment is very broad and the discussion could go on forever. Do you have any specific question/issue regarding implementations/architecture? I will try to comment accordingly.
11-13-2019
11:02 PM
Hi @TheBroMeister What command did you try to balance HDFS previously? Can you try running the HDFS balancer as below -
$ sudo -u hdfs hdfs balancer -threshold 1
While it runs you can check the HDFS logs, which show the percentage of data being moved. Do revert if it doesn't work.
11-13-2019
02:28 AM
@webtransactor You might also need to check the process limits for the user that starts/runs the service.
/etc/security/limits.conf -- is the default file for setting the limits (a sample is below).
You can also check the current values by doing -
su - <username>
ulimit -a
You can increase the ulimit accordingly.
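A minimal sketch of raising the limits in /etc/security/limits.conf, assuming the service runs as a user named svcuser (the user name and values are placeholders):
svcuser soft nofile 32768
svcuser hard nofile 65536
svcuser soft nproc 16384
svcuser hard nproc 32768
Log the user out and back in (or restart the service) and confirm with: su - svcuser -c 'ulimit -a'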
11-12-2019
08:39 PM
@fgarcia Can you try to hit the REST call and check if you get the same info?
curl -X GET "http://<active_namenode>:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo"
From the Namenode screenshot it seems that 0 datanodes/blocks are reported to the NN. Do you see that all connections between the DNs and the NN are good? Can you check/pass the full stack trace from the log?
11-12-2019
08:37 PM
@VamshiDevraj If you are still facing the issue, can you share details about the error or a screenshot of it?
11-12-2019
08:09 PM
1. Did the job fail due to the above reason? If "NO", then is the error displayed in the logs for all Spark jobs or just for this job?
11-12-2019
02:21 AM
Can you also check the heap size utilization for the Ambari server? You might need to revisit the Ambari server heap config if needed (a sketch is below). Check this link for details - https://docs.cloudera.com/HDPDocuments/Ambari-2.7.4.0/administering-ambari/content/amb_adjust_ambari_server_heap_size.html
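A minimal sketch of bumping the Ambari server heap, assuming the default install path (the 4 GB value is just an example; size it to your cluster):
# vi /var/lib/ambari-server/ambari-env.sh
  (edit the -Xmx value inside AMBARI_JVM_ARGS, e.g. -Xmx4096m)
# ambari-server restart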
11-12-2019
02:19 AM
If you know the file name, then -
hdfs fsck /myfile.txt -files -blocks -locations
Else, scan from the root and grep for the block -
hdfs fsck / -files -blocks -locations | grep <blkxxx>