Member since: 01-19-2017
Posts: 3598
Kudos Received: 593
Solutions: 359
My Accepted Solutions
Views | Posted
---|---
119 | 10-26-2022 12:35 PM
273 | 09-27-2022 12:49 PM
341 | 05-27-2022 12:02 AM
276 | 05-26-2022 12:07 AM
467 | 01-16-2022 09:53 AM
05-05-2018
04:17 PM
@Vishal G I did sit for the HDPCA and I can confirm you will have a link to the official HDP 2.3 documentation. However, the exam instances can be quite unreliable, so start with the easiest questions and save most of your time for the more difficult ones. If you have tried the practice exam on AWS, the look and feel is exactly the same. Good luck!
05-05-2018
12:43 PM
@Sim kaur I would suggest installing NodeManagers at least on the servers where DataNodes are running, so that containers can read data locally. DataNodes are part of HDFS and NodeManagers are part of YARN: DataNodes store HDFS data, while NodeManagers launch containers for YARN. There is no strict rule that DataNodes and NodeManagers have to be on the same host; if NodeManagers run on hosts without DataNodes, containers on those hosts will still run the application by copying data over the network from the DataNodes. That remote transfer could be the cause of the timeouts you are experiencing. You can verify the current layout with the commands below.
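A quick, hedged way to compare which hosts currently run DataNodes versus NodeManagers (standard HDFS/YARN CLI calls, run from any client node; nothing cluster-specific assumed):

```bash
# hosts reporting as live DataNodes
hdfs dfsadmin -report | grep -E '^Name|^Hostname'

# hosts registered as NodeManagers with the ResourceManager
yarn node -list -all
```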
05-05-2018
11:51 AM
1 Kudo
@Raj ji I think your symlink is broken; please recreate it and re-test. It should look like this:
lrwxrwxrwx 1 root root 23 Oct 19 2017 /usr/hdp/2.6.3.0/hive/conf -> /etc/hive/2.6.3.0/0
To create a new symlink (fails if the symlink already exists): ln -s /path/to/file /path/to/symlink
To create or update a symlink: ln -sf /path/to/file /path/to/symlink
Hope that helps.
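Applied to the paths from this thread, a minimal sketch (the HDP 2.6.3.0 paths are taken from the listing above; adjust for your version):

```bash
# inspect the current link and its target
ls -l /usr/hdp/2.6.3.0/hive/conf

# recreate or repair the link in place (-n avoids descending into an existing directory link)
ln -sfn /etc/hive/2.6.3.0/0 /usr/hdp/2.6.3.0/hive/conf
```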
05-05-2018
11:37 AM
@Sim kaur Of course everything worked when you had 3 DataNodes! The default replication factor is 3, so if you remove 2 of the 3 DataNodes you are literally left with only one copy of each block. With 6 nodes you could use a layout like this:
- 2 master nodes
- 3 DataNodes (each DataNode should also run a NodeManager by default)
- 1 edge node (a lower-end machine is fine)
You should have at least 3 ZooKeeper servers running and a ZooKeeper client on each node. When you are not running NameNode HA you will see the NameNode (NN) and Secondary NameNode (SNN) on the same node; the SNN daemon is only an NN helper that merges the edits and fsimage, offloading that merge work from the NN. If you plan for High Availability, configure real NameNode HA: the active and standby NameNodes MUST run on two different nodes. There is no better reference than the HWX multi-homed cluster documentation, but from your setup you are running CDH, and I don't think there is a big difference. Please add the custom hdfs-site property below (check the HDFS configuration parameters in CDH):
dfs.client.use.datanode.hostname=true
In your previous post above, can you explain points 3 and 4? Please revert. A quick health check is sketched below.
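A hedged health check after scaling DataNodes down or back up (standard HDFS commands, no cluster-specific assumptions):

```bash
# live/dead DataNodes and remaining capacity
hdfs dfsadmin -report | head -n 25

# any blocks left under-replicated, missing, or corrupt after the change
hdfs fsck / | grep -E 'Under-replicated|Missing|Corrupt'
```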
05-05-2018
08:57 AM
@Subramanian Govindasamy Can you check the /etc/hosts entries on all the nodes? Then, on the affected node, move /var/lib/ambari-agent/data/structured-out-status.json to /tmp and restart the ambari-agent:
# ambari-agent restart
Do you see any errors or exceptions in /var/log/ambari-server/ambari-server.log? The steps are sketched below.
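The same steps as shell commands (paths are the Ambari defaults referenced in the post):

```bash
# park the stale status file out of the way and bounce the agent on the affected node
mv /var/lib/ambari-agent/data/structured-out-status.json /tmp/
ambari-agent restart

# then watch the server log for errors while the agent re-registers
tail -f /var/log/ambari-server/ambari-server.log
```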
05-05-2018
08:17 AM
@Sim kaur
Problem: the application log shows:
74865 millis timeout while waiting for channel to be ready for connecting: java.nio.channels.SocketChannel[connected local=/172.31.4.192:42632 remote=/172.31.4.192:50010]. 74865 millis timeout left.
All nodes are connected to each other via an internal switch on the 172.31.4.x subnet, which is not open to public access.
Cause: each node in the Hadoop cluster has an internal IP address (on the internal switch) and an external IP address used to communicate with clients and external applications, but the nodes talk to each other within the Hadoop cluster using the internal IP addresses. Based on your description, this is the classic multi-homed cluster problem.
Solution: hdfs-site.xml has a property, dfs.client.use.datanode.hostname, that forces a client to retrieve a hostname instead of an IP address and perform its own lookup of that hostname to get a routable path to the host. To solve this, add the following line to the custom hdfs-site properties:
dfs.client.use.datanode.hostname=true
Hope that helps; please revert.
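A hedged way to confirm the client-side setting is actually picked up (hdfs getconf reads the effective configuration on the node where you run it):

```bash
# should print "true" once the custom hdfs-site property is in place
hdfs getconf -confKey dfs.client.use.datanode.hostname
```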
05-04-2018
08:59 PM
@Subramanian Govindasamy That means HDFS is down. Can you start it from the Ambari UI or the CLI?
05-04-2018
12:54 PM
@Subramanian Govindasamy It seems you have a problem with your auth-to-local rules; please validate them. The error
"message": "Invalid value for webhdfs parameter"
means the username used in the query is checked against a regular expression and, if it does not match, the above exception is returned. The default regular expression is:
^[A-Za-z_][A-Za-z0-9._-]*[$]?$
Can you start the NameNode manually?
su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start namenode"
Please revert.
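A hedged way to see which auth-to-local rules are in effect and how a principal maps to a short name (the class invocation is Hadoop's standard KerberosName tester; the principal shown is a placeholder, substitute one of your own):

```bash
# print the configured mapping rules
hdfs getconf -confKey hadoop.security.auth_to_local

# test how a given principal is translated to a local username
hadoop org.apache.hadoop.security.HadoopKerberosName someuser@EXAMPLE.COM
```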
05-04-2018
08:00 AM
@harsha vardhan bandaru Have a look at this comparison; it could help answer some of the doubts you have: Comparison site
05-04-2018
07:36 AM
@harsha vardhan bandaru Apache Kafka is an open-source stream-processing platform developed by the Apache Software Foundation and written in Scala and Java. Confluent Platform includes Apache Kafka and adds a few components on top that make Kafka easier to use.
05-03-2018
07:10 PM
@Raj ji I am afraid there are only two options:
- Tweak the Python code (NOT recommended!)
- Ask the SysOps team to rename the mount and make them aware of the caveat of using /home as a Hadoop mount point.
Hope that helps.
05-03-2018
06:32 PM
@Raj ji If I were you I would change the mount point name to something like /shome rather than tweaking the Python code, because subsequent upgrades could become frustrating. As reiterated, tweaking the code is neither good practice nor recommended.
05-03-2018
06:17 PM
@Raj ji Check this link; it answers your query: https://community.hortonworks.com/answers/148611/view.html Hope that helps.
05-03-2018
12:25 PM
@abu ameen The command is wrong; note the slash:
[zk: localhost:2181(CONNECTED) 6] ls /
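For reference, a hedged way to reproduce this from an HDP client node (the zkCli.sh path is the usual HDP client location, adjust if yours differs); the second step is typed at the [zk: ...] prompt shown above:

```bash
# open the ZooKeeper CLI against the local ensemble member
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server localhost:2181

# then, at the [zk: ...] prompt, list the root znode (the leading slash is required):
#   ls /
```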
05-02-2018
09:59 PM
@Prakash Punj You can apply HDFS ACLs. I have a post on HCC about this that I can't locate right now, but here is one. With ACLs you can lock out everybody and only allow specific users or groups. If you can share a use case I can help with it. A small sketch follows below.
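A minimal, hedged sketch of the lock-out-everybody-then-allow pattern (assumes dfs.namenode.acls.enabled=true; the /data/secure path and the analytics group are purely illustrative):

```bash
# strip access for "other", then grant read/execute only to one group
hdfs dfs -chmod 750 /data/secure
hdfs dfs -setfacl -m group:analytics:r-x /data/secure

# verify the resulting ACL entries
hdfs dfs -getfacl /data/secure
```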
05-01-2018
07:39 PM
2 Kudos
@Michael Bronson You can safely delete them
05-01-2018
07:37 PM
@Mustafa Ali Qizilbash Great to know you got the help you expected. I hope you become a regular participant and share your experience and knowledge; that's open source 🙂 If you feel my response helped you, please accept the answer and close the thread so others in a similar situation can use it.
05-01-2018
06:29 PM
@Snoops First, ensure you have a ZooKeeper ensemble (3 nodes) running, then set up NameNode HA and ResourceManager (YARN) HA. Once all of that is fine, and if you are using VMware, take a snapshot and proceed with MySQL HA and Oozie HA. For Oozie you will need a load balancer, virtual IP, or round-robin DNS; the load balancer should be configured for round-robin between the Oozie servers to distribute the requests.
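A hedged sanity check once the Oozie servers are behind the load balancer (the LB hostname and default port 11000 are assumptions, adjust to your setup):

```bash
# hit Oozie through the load balancer; status should report NORMAL
oozie admin -oozie http://oozie-lb.example.com:11000/oozie -status

# with Oozie HA, this lists the individual servers behind the VIP
oozie admin -oozie http://oozie-lb.example.com:11000/oozie -servers
```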
05-01-2018
11:07 AM
@Michael Bronson Yes, that should delete the corrupt blocks; notice the space between the / and -delete. Alternatively, simply use the -rm option, see below:
hdfs dfs -rm /path/to/file/with/permanently/missing/blocks
To delete the first missing file from your output above:
hdfs dfs -rm /localF/STRZONEZone/intercept_by_country/2018/4/10/16/2018_4_10_16_45.parquet/part-00003-8600d0e2-c6b6-49b7-89cd-ef2a2bc1dc5e.snappy.parquet
The data will be rebalanced over time, or you can run the balancer manually. Hope that clarifies.
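If you would rather trigger rebalancing immediately than wait, a hedged sketch (the threshold is the allowed disk-usage deviation in percent; 10 is the default):

```bash
# run the HDFS balancer manually; it can be stopped and re-run safely
hdfs balancer -threshold 10
```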
05-01-2018
07:59 AM
@Dinesh Chitlangia Livy requires that this service principal is configured through a couple of different parameters, namely livy.server.launch.kerberos.[principal|keytab] and livy.server.auth.kerberos.[principal|keytab]. In addition, livy.server.auth.type needs to be set to kerberos:

livy.impersonation.enabled = true
livy.server.auth.type = kerberos
livy.server.launch.kerberos.principal = livy/node2.{fqdn}@TEX.COM
livy.server.launch.kerberos.keytab = /etc/security/keytabs/livy.service.keytab
livy.server.auth.kerberos.principal = HTTP/node2.{fqdn}@TEX.COM
livy.server.auth.kerberos.keytab = /etc/security/keytabs/spnego.service.keytab

The livy.server.auth.type setting also enables authentication for the Livy server itself. To configure Zeppelin with authentication for Livy, set the following in the interpreter settings:

"zeppelin.livy.principal": "zeppelin/node2.{fqdn}@TEX.COM",
"zeppelin.livy.keytab": "/etc/security/keytabs/zeppelin.service.keytab"

The launch parameters are used during startup:

export SPARK_HOME=/usr/hdp/current/spark-client
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export PATH=/usr/lib/jvm/java-1.8.0-openjdk/bin:$PATH
export HADOOP_CONF_DIR=/etc/hadoop/conf
export LIVY_SERVER_JAVA_OPTS="-Xmx2g"

A kinit is not required with Livy 0.3, which is the version used here. With Livy 0.2 you have to kinit the livy user before starting the web service:

$ su livy
$ kinit -kt /etc/security/keytabs/livy.service.keytab livy/node2.{fqdn}@TEX.COM
$ bin/livy-server start

Authorization: with authentication enabled, setting up authorization will likely be required as well. For this, Livy provides access control settings to control which users have access to its resources:

livy.server.access_control.enabled = true
livy.server.access_control.users = livy,zeppelin

Further, for services like Zeppelin, impersonation settings are required. For the zeppelin user to be able to impersonate other users, it must be a Livy superuser:

livy.superusers = zeppelin

Hope that helps.
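A hedged end-to-end check once the above is in place (the hostname placeholder matches the configs above; 8999 is Livy's default port and an assumption for your setup):

```bash
# authenticate as the zeppelin service principal, then call the Livy REST API via SPNEGO
kinit -kt /etc/security/keytabs/zeppelin.service.keytab zeppelin/node2.{fqdn}@TEX.COM
curl --negotiate -u : http://node2.{fqdn}:8999/sessions
```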
04-30-2018
08:54 PM
@Michael Bronson It's important to first determine how important the file is: can it simply be removed and copied back into place, or does it contain data that needs to be regenerated? If it's easy enough just to replace the file, that's the route I would take.

HDFS will attempt to recover the situation automatically. By default there are three replicas of every block in the cluster, so if HDFS detects that one replica has become corrupt or damaged, it creates a new replica from a known-good replica and marks the damaged one for deletion. The known-good state is determined by checksums recorded alongside the block by each DataNode.

This lists the corrupt HDFS blocks:
hdfs fsck / -list-corruptfileblocks
This deletes the files with corrupted HDFS blocks:
hdfs fsck / -delete
Once you have found a corrupt file, run:
hdfs fsck /path/to/corrupt/file -locations -blocks -files

Use that output to determine where the blocks might live. If the file is larger than your block size, it may have multiple blocks. You can use the reported block IDs to search the DataNode and NameNode logs for the machine or machines on which the blocks lived, and look for filesystem errors on those machines: missing mount points, a DataNode not running, a filesystem reformatted or reprovisioned. If you can find a problem that way and bring the block back online, the file will be healthy again. Lather, rinse, and repeat until all files are healthy or you exhaust all alternatives looking for the blocks.

Once you have determined what happened and cannot recover any more blocks, use the command below to get your HDFS filesystem back to a healthy state so you can start tracking new errors as they occur:
hdfs dfs -rm /path/to/file/with/permanently/missing/blocks
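Putting it together, a hedged triage sequence (standard fsck/dfs calls; -skipTrash is optional and only worth adding if you need the space back immediately):

```bash
hdfs fsck / -list-corruptfileblocks                        # which files are affected
hdfs fsck /path/to/corrupt/file -locations -blocks -files  # where the replicas were
hdfs dfs -rm -skipTrash /path/to/file/with/permanently/missing/blocks
hdfs fsck /                                                # confirm the filesystem reports HEALTHY
```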
04-30-2018
11:58 AM
@Mustafa Ali Qizilbash Yes, you should have separate nodes to host PROD and DR. You can take the redundancy concept further with network redundancy, i.e. not letting PROD and DR share the same network segment. A PROD cluster should run in HA, which serves a different purpose than a DR site: High Availability (HA) avoids downtime, for example during upgrades, while Disaster Recovery covers the total loss of the primary site for reasons such as an earthquake or explosion. The most common setup is HA for components like the NameNode, ResourceManager and Hive Metastore, plus the classic 3 ZooKeeper instances. Setting up NameNode and ResourceManager HA is well documented, but here is an HCC doc for the MySQL Metastore HA setup.
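For the DR side, a common (hedged) pattern is a scheduled distcp from the PROD cluster to the DR cluster; the nameservice URIs below are placeholders for your own HA nameservices:

```bash
# mirror a dataset to the DR cluster, keeping the target in sync with the source
hadoop distcp -update -delete hdfs://prod-ns/data/warehouse hdfs://dr-ns/data/warehouse
```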
04-30-2018
10:39 AM
@Michael Bronson Here is how I force a filesystem check every 3 months, using the command below:
$ sudo tune2fs -i 3m /dev/sda1
Now verify that the newly added filesystem check conditions are set properly:
$ sudo tune2fs -l /dev/sda1
The relevant output should look like this:
Last mount time: n/a
Last write time: Sat Mar 10 22:29:24 2018
Mount count: 20
Maximum mount count: 30
Last checked: Fri Mar 2 20:55:08 2018
Check interval: 7776000 (3 months)
Next check after: Sat Jun 2 21:55:08 2018
Hope that answers your question.
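If you also want a mount-count trigger in addition to the time interval, a hedged variant (the -c value of 30 mirrors the Maximum mount count shown above; /dev/sda1 is the device from the post):

```bash
# check every 3 months or every 30 mounts, whichever comes first
sudo tune2fs -i 3m -c 30 /dev/sda1

# show only the check-related fields when verifying
sudo tune2fs -l /dev/sda1 | grep -iE 'mount count|check'
```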
04-30-2018
09:17 AM
@Mustafa Ali Qizilbash Sorry, I just didn't see that; my focus was on the problem. See the link above ... included in the compatibility matrix.
04-30-2018
08:37 AM
@Simran Kaur Let's first dive into the core roles of the NameNode (NN) and Secondary NameNode (SNN).
NameNode: it holds the metadata for your entire cluster: the directory tree structure, block locations on the DataNodes, and the fsimage and edit logs.
SecondaryNameNode: it periodically collects the fsimage and edit logs from the NN, merges them into a new fsimage file, and pushes that back to the NN to keep the NN metadata small. So if the NN fails, the SNN stops receiving updates and your entire cluster goes down. With the help of the SNN you can bring up another node as NN, but the SNN does not do the NN's work; it only collects the fsimage and edit logs from the NameNode.
Having said that, a production cluster should run with NameNode HA, where the Active and Standby NameNodes run on different hosts and racks, ideally with network redundancy. This ensures:
- Automated failover: HDP proactively detects NameNode host and process failures and automatically switches to the standby NameNode to maintain availability of the HDFS service.
- Hot standby: both Active and Standby NameNodes have up-to-date HDFS metadata, ensuring seamless failover even for large clusters, which means no downtime for your HDP cluster.
- Full-stack resiliency: the entire HDP stack can handle a NameNode failure scenario without losing data or job progress. This is vital so that long-running jobs that must complete on schedule are not adversely affected by a NameNode failure.
Here is a Hortonworks documentation link and a YouTube video. Please let me know if that helped. A quick way to check the HA state is sketched below.
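Once NameNode HA is configured, a hedged way to confirm which NameNode is active (nn1/nn2 are the logical IDs defined under dfs.ha.namenodes.<nameservice>; yours may be named differently):

```bash
hdfs haadmin -getServiceState nn1   # expect "active" or "standby"
hdfs haadmin -getServiceState nn2
```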
04-30-2018
08:16 AM
@Mustafa Ali Qizilbash That explains why you should stick strictly to the Hortonworks documentation. If you had mentioned the version, I would have told you outright that it's not included in the compatibility matrix. Hortonworks does rigorous testing against third-party tools before certifying them against its products. I have done hundreds of installations but NEVER came across such an issue ... lesson learned: stick to the technical documentation 🙂 Update the thread and close it after a thorough test, but I am positive it will work like a charm.
04-29-2018
09:41 PM
@Michael Bronson Any updates so as to close the thread?
04-29-2018
08:43 PM
@Mustafa Kemal MAYUK I have seen some variations that aren't correct; I would like you to compare with this HCC doc. One thing already: in the [urls] section, /** = authc should be the last entry, and there are a couple of other issues. Please revert!
04-29-2018
01:33 PM
@Mustafa Kemal MAYUK Can you share your shiro.ini after scrambling sensitive info?