Member since: 01-19-2017
Posts: 3598
Kudos Received: 593
Solutions: 359
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 119 | 10-26-2022 12:35 PM
 | 274 | 09-27-2022 12:49 PM
 | 344 | 05-27-2022 12:02 AM
 | 276 | 05-26-2022 12:07 AM
 | 469 | 01-16-2022 09:53 AM
01-16-2022
09:53 AM
@Koffi This is typical of a rogue process that hasn't released the port, hence the Caused by: java.net.BindException: Address already in use You will need to run # kill -9 5356 Then restart the NN; that should resolve the issue.
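If you first want to confirm which process is holding the NameNode port before killing it, here is a minimal sketch; the port 8020 is an assumption (adjust to your configured RPC port) and the PID is the one from this thread:
# Find the process listening on the NameNode RPC port (8020 is an assumption)
lsof -i :8020 -P -n | grep LISTEN
# Kill the rogue PID reported above (5356 in this thread), then restart the NameNode from Ambari
kill -9 5356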
12-19-2021
08:51 AM
@Koffi Any updates on the commands?
12-19-2021
04:24 AM
@Koffi Yes, you obviously cannot run safe mode when the NameNodes are down. I can see the JNs and ZKFC are all up. Can you run the below command on the last known good NameNode nn01, hoping you are running it as root: su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start namenode" If nn01 starts without any issue, then run the same command on nn02; else share the logs from nn01.
12-17-2021
08:56 PM
@Koffi This issue seems linked to your previous posting. Your last healthy NameNode was nn01, right? The assumption here is that you are logged in as root. Instructions to fix that one JournalNode:
1) Put both nn01 and nn02 in safe mode (NN HA)
$ sudo su - hdfs
[hdfs@host ~]$ hdfs dfsadmin -safemode enter
Safe mode is ON in nn01/<nn01_IP>:8020
Safe mode is ON in nn02/<nn02_IP>:8020
2) Save the namespace
[hdfs@host ~]$ hdfs dfsadmin -saveNamespace
Save namespace successful for nn01/<nn01_IP>:8020
Save namespace successful for nn02/<nn02_IP>:8020
3) Back up (zip/tar) the journal dir from the working JN node (nn01 side) and copy it to the non-working JN node (nn02 side), to something like /hadoop/hdfs/journal/<Cluster_name>/current (see the sketch after these steps)
4) Leave safe mode
[hdfs@host ~]$ hdfs dfsadmin -safemode leave
Safe mode is OFF in nn01/<nn01_IP>:8020
Safe mode is OFF in nn02/<nn02_IP>:8020
5) Restart HDFS
From Ambari you can now start nn01 first; when it comes up, then start nn02. Please let me know.
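A minimal sketch of step 3, assuming the journal path /hadoop/hdfs/journal/<Cluster_name> shown above (substitute your real cluster name), that the broken JN host is reachable as nn02, and that hdfs:hadoop is the owning user/group on your cluster:
# On the healthy JournalNode host: archive the current edits directory
tar -czf /tmp/journal_backup.tar.gz -C /hadoop/hdfs/journal/<Cluster_name> current
# Copy the archive to the broken JournalNode host and unpack it in place
scp /tmp/journal_backup.tar.gz nn02:/tmp/
ssh nn02 "tar -xzf /tmp/journal_backup.tar.gz -C /hadoop/hdfs/journal/<Cluster_name>/"
# Make sure the restored files are owned by the hdfs user (ownership is an assumption)
ssh nn02 "chown -R hdfs:hadoop /hadoop/hdfs/journal/<Cluster_name>/current"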
12-17-2021
08:45 PM
@Koffi From the Ambari UI, are you seeing any HDFS alerts? ZKFailoverController or JournalNodes? If so, please share the logs.
10-27-2021
12:08 PM
@Rish How much memory does your quickstart VM have? Can you open the RM UI and, using the application_id, check the logs? They should give you an idea of what's happening. Geoffrey
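A minimal sketch for pulling the logs by application ID from the command line, assuming YARN log aggregation is enabled; the application ID shown is a placeholder:
# Fetch the aggregated container logs for the application (ID is a placeholder)
yarn logs -applicationId application_1635300000000_0001 | less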
10-27-2021
11:16 AM
@Koffi There are a couple of things here. You first need to resolve the "too many open files" issue by checking the ulimit:
$ ulimit -n
To increase it for the current session, depending on the above output:
$ ulimit -n 102400
Edit /etc/security/limits.conf to make the change permanent (see the sketch below). Then restart the KDC and kadmin, using systemctl or the init scripts depending on your Linux version:
# /etc/rc.d/init.d/krb5kdc start
# /etc/rc.d/init.d/kadmin start
Then restart Atlas from the Ambari UI. Please revert after these actions. Geoffrey
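A minimal sketch of the permanent change and the systemd variant, assuming the 102400 limit from above and the common MIT Kerberos service names (verify the names on your distro):
# Append to /etc/security/limits.conf to make the limit permanent (values are an assumption)
echo '*  soft  nofile  102400' >> /etc/security/limits.conf
echo '*  hard  nofile  102400' >> /etc/security/limits.conf
# On systemd-based systems, the equivalent service restarts are:
systemctl restart krb5kdc
systemctl restart kadmin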
10-02-2021
11:11 AM
@Phanikondeti Can you please share how you installed your NiFi, the version, and the install documents you followed? The error logs would be good to share too.
09-21-2021
06:28 AM
@vciampa The solution is the Replication Manager, which enables you to replicate data across data centers for disaster recovery scenarios. Replication Manager replicates HDFS, Hive, and Impala data, and supports Sentry to Ranger replication from CDH (version 5.10 and higher) clusters to CDP Private Cloud Base (version 7.0.3 and higher) clusters. https://docs.cloudera.com/cdp/latest/data-migration/topics/cdp-data-migration-replication-manager-to-cdp-data-center.html It's easy to use 🙂 Happy Hadooping
09-20-2021
10:53 PM
@vciampa Please look at this document that walks through the steps for upgrading from HDP to CDP Private Cloud Base. Happy hadooping
09-09-2021
07:19 AM
@rachida_el-hamm Here is a very good resource; sit back and sip your coffee or tea. It should help you resolve your MySQL issue. Happy hadooping
09-06-2021
12:19 PM
@Anup123 I responded to a similar question, see SSL Sqoop. If you already have an SSL cert file, you can generate your own JKS file and import your cert into it.
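A minimal sketch of the import, assuming the certificate file name, keystore name, alias, and password shown are placeholders for your own values:
# Import an existing certificate into a JKS keystore (creates the keystore if it does not exist)
keytool -importcert -file mycert.pem -alias sqoop-ssl -keystore keystore.jks -storepass changeit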
08-29-2021
07:19 AM
@npr20202 I am sorry, but you will have to continue being a magician 🙂 If you don't want to, then you have to teach your users the secret sauce or magic wand. We too face the same problems with multiple users on Spark/Impala/PySpark; I have made them add the INVALIDATE METADATA and REFRESH (Spark) commands at the start of their queries and that works perfectly. Otherwise, automatic invalidate/refresh of metadata is available in CDP 7.2.10. As long as Impala depends on the HMS, that issue will exist 🙂 Happy hadooping
08-29-2021
12:42 AM
@npr20202 That makes sense; the problem only crops up after the maintenance "reboot" of the Metastore host. Once the server is rebooted the metadata is purged from memory, which explains the slowness of queries after a cluster restart.
Automatic Invalidation/Refresh of Metadata
Now an available option in CDP 7.2.10. When automatic invalidate/refresh of metadata is enabled, the Catalog Server polls Hive Metastore (HMS) notification events at a configurable interval and automatically applies the changes to the Impala catalog (see the sketch after this post). The Impala Catalog Server polls and processes the following changes:
- Invalidates the tables when it receives the ALTER TABLE event.
- Refreshes the partition when it receives the ALTER, ADD, or DROP partition events.
- Adds the tables or databases when it receives the CREATE TABLE or CREATE DATABASE events.
- Removes the tables from catalogd when it receives the DROP TABLE or DROP DATABASE events.
The HMS stores the metadata for Hive tables (schema, permissions, location, and partitions) in a relational database and provides clients access to this information through the metastore service API. The Hive Metastore is the component in Hive that stores the system catalog containing the metadata about Hive columns, table creation, and partitions. Impala uses the Hive metastore to read data created in Hive, so you can read and query the same data using Impala; all you need is to refresh the table or trigger INVALIDATE METADATA in Impala to read the data.
Hive and Impala are two different query engines. Impala can interoperate with data stored in Hive and uses the same infrastructure as Hive for tracking metadata about schema objects such as tables and columns; sharing the metastore brings virtualization, discoverability, schema evolution, and performance benefits.
Hive utilizes execution engines (like Tez, Hive on Spark, and LLAP) to improve query performance without low-level tuning approaches. Leveraging parallel execution whenever sequential operations are not needed is also wise; the amount of parallelism your system can achieve depends on the resources available and the overall data structure. Proper Hive tuning allows you to manipulate as little data as possible; one way to do this is through partitioning, where you assign "keys" to subdirectories where your data is segregated.
Impala uses the Hive metastore and can query Hive tables directly. Unlike Hive, Impala does not translate the queries into MapReduce jobs but executes them natively, using its daemons running on the data nodes to directly access the files on HDFS. Created metadata is stored in the Hive Metastore, which is contained in an RDBMS such as MySQL, Oracle, MSSQL, or MariaDB. Hive and Impala work with the same data: tables in HDFS, metadata in the Metastore. Metadata information for tables created in Hive is stored in the Hive metastore database.
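A minimal sketch of how the automatic invalidation is typically switched on, assuming a CDP deployment; the flag below is the Impala catalogd startup option, the 2-second interval is only an example value, and the host/port for the check are assumptions (catalogd's debug web UI commonly listens on 25020), so verify against your Impala version's documentation:
# The feature is controlled by the catalogd startup flag (value in seconds; 0 disables it):
#   --hms_event_polling_interval_s=2
# To check what a running catalogd was started with, query its debug web UI:
curl -s http://catalogd-host:25020/varz | grep hms_event_polling_interval_s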
08-28-2021
01:30 PM
@hadoclc Can you share more details? What kind of job is it, Spark or Hive? Can you share some information about your environment and the code submitted that fails? What are the permissions on /user/yarn? Who executed the job, and how, in 7.1.4? Is the same user running the job in 7.1.6? Please share the logs.
08-27-2021
11:47 AM
1 Kudo
@vciampa The log clearly shows that the address is already in use:
Caused by: java.net.BindException: Port in use: 0.0.0.0:8042
Caused by: java.net.BindException: Address already in use
Can you proceed by locating the PID:
# lsof -i -P -n | grep LISTEN | grep 8042
Example
# lsof -i:8042
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 9322 yarn 475u IPv4 294790 0t0 TCP *:fs-agent (LISTEN)
Kill using the PID:
$ kill -9 9322
Restart the service. Please revert
08-26-2021
12:52 PM
@npr20202 To return accurate query results, Impala needs to keep the metadata current for the databases and tables queried. Therefore, if some other entity modifies information used by Impala in the metastore, the information cached by Impala must be updated via INVALIDATE METADATA or REFRESH.
Difference between INVALIDATE METADATA and REFRESH
INVALIDATE METADATA is an asynchronous operation that simply discards the loaded metadata from the catalog and coordinator caches. After that operation, the catalog and all the Impala coordinators only know about the existence of databases and tables and nothing more; metadata loading for tables is triggered by any subsequent queries.
REFRESH just reloads the metadata synchronously. REFRESH is more lightweight than doing a full metadata load after a table has been invalidated, but it cannot detect changes in block locations triggered by operations like the HDFS balancer, hence causing remote reads during query execution with negative performance implications.
Syntax
INVALIDATE METADATA [[db_name.]table_name]
You can run it in Hue or impala-shell, e.g. INVALIDATE METADATA product.customer
By default, the cached metadata for all tables is flushed. If you specify a table name, only the metadata for that one table is flushed. Even for a single table, INVALIDATE METADATA is more expensive than REFRESH, so prefer REFRESH in the common case where you add new data files for an existing table.
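A minimal command-line sketch, reusing the product.customer table from the example above; the coordinator hostname is a placeholder:
# Refresh one table after new data files land (cheaper than a full invalidate)
impala-shell -i impala-coordinator-host -q "REFRESH product.customer"
# Discard and reload the cached metadata for that table only (e.g. after changes made outside Impala)
impala-shell -i impala-coordinator-host -q "INVALIDATE METADATA product.customer"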
08-26-2021
12:36 PM
@dv_conan I think your issue should be resolved with this posting: Hive on Tez issue. Please let me know whether that resolves your problem.
08-24-2021
02:44 PM
@Nitin0858 Did you enable Kerberos manually? The parameters you are referring to are set automatically when enabling it through Ambari. If you did it manually, as I suspect, you need to perform the steps mentioned in Set Up Kerberos for Ambari Server. Please revert
08-24-2021
01:05 PM
@lyash There are a couple of things members need in order to help with your case:
- CDH or CDP version?
- OS?
- Your Postgres version and the document followed for setup, or the steps executed?
- Memory allocated to Hive?
- Can you connect to Postgres locally? i.e. sudo --login --user=postgres
- Can you change hive.metastore.schema.verification to false in hive-site.xml?
Please revert
08-21-2021
04:11 AM
1 Kudo
@mike_bronson7 Can you share your capacity scheduler, total memory, and vcores configs?
08-21-2021
03:56 AM
@Nitin0858 Can you share the 2 contents so we can help with the analysis?
08-06-2021
01:31 PM
@NIFI_123 Maybe try this crontab generator; it has more possibilities. Hope that helps
08-01-2021
12:24 PM
@Vinay1991 I mentioned the logs below. You will definitely need the ZK, ZKFailoverController, and NameNode logs.
08-01-2021
03:41 AM
@iPanda Adding Ambari for management purposes on top of an existing cluster is called an "Ambari takeover". I did a write-up on that some years back: Ambari take-over. Here is another source of good info to try out, referenced from Adaltas. Please read carefully to fully understand the steps required.
07-30-2021
02:48 PM
@Buithuy96 First and foremost, re-running these steps won't do any damage to your cluster, I assure you. What you have is purely a permissions issue:
java.sql.SQLException: Access denied for user 'ambari'@'mtnode.hdp.vn' (using password: YES)
Revalidate the MySQL connector:
# yum install -y mysql-connector-java
# ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
Re-run the Ambari user setup:
CREATE USER 'ambari'@'%' IDENTIFIED BY 'aCtct@123';
CREATE USER 'ambari'@'localhost' IDENTIFIED BY 'Ctct@123';
GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'%';
GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'localhost';
GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'mtnode.hdp.vn';
GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'mtnode.hdp.vn' IDENTIFIED BY 'Ctct@123';
FLUSH PRIVILEGES;
Try restarting Ambari while tailing the ambari-server.log and share the contents. First reset the log before starting Ambari, to ensure you have a minimal log to delve through:
# truncate --size 0 /var/log/ambari-server/ambari-server.log
Restart your Ambari server and tail the logs:
# tail -f /var/log/ambari-server/ambari-server.log
Share the status
07-29-2021
01:04 PM
@Vinay1991 The ZKs look okay; please go through the list I shared about connectivity. Please validate the items one by one.
07-28-2021
10:30 AM
1 Kudo
@Vinay1991 From the logs, I see connectivity loss and that's precisely what's causing the NN switch. Remember from my earlier posting the importance of the ZK quorum! Your NNs are losing their connection to ZK, so the NN that loses its active connection causes ZK to elect a new leader, and that's happening in a loop:
Caused by: java.net.SocketTimeoutException: 5000 millis timeout while waiting for channel to be ready for read
I would start by checking the firewall; I see you are on Ubuntu, so ensure the firewall is disabled across the cluster.
Identifying and Fixing Socket Timeouts
The root cause of a socket timeout is a connectivity failure between the machines, so try the usual process (a quick scripted version follows this list):
- Check the settings: is this the machine you really wanted to talk to?
- From the machine that is raising the exception, can you resolve the hostname? Is that resolved hostname the correct one?
- Can you ping the remote host?
- Is the target machine running the relevant Hadoop processes?
- Can you telnet to the target host and port? Can you telnet to it from any other machine?
- On the target machine, can you telnet to the port using localhost as the hostname? If this works but external network connections time out, it's usually a firewall issue.
- If it is a remote object store: is the address correct? Does it go away when you repeat the operation? Does it only happen on bulk operations? If the latter, it's probably due to throttling at the far end.
Check your hostname resolution: DNS or /etc/hosts should be in sync, and another important thing is that all your hosts' clocks should be in sync. Can you share the value of the core-site.xml parameter ha.zookeeper.quorum?
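A minimal sketch of those checks from the command line, assuming zk1, zk2, zk3 are placeholders for your ZooKeeper hosts and 2181 is the client port:
# Resolve, ping, and reach each ZooKeeper host (hostnames are placeholders)
for h in zk1 zk2 zk3; do
  getent hosts "$h"    # does the name resolve, and to the right address?
  ping -c 1 "$h"       # basic reachability
  nc -zv "$h" 2181     # can we open the ZK client port?
done
# Confirm the quorum HDFS is actually configured to use
hdfs getconf -confKey ha.zookeeper.quorum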
07-27-2021
12:35 PM
2 Kudos
@Vinay1991 Unfortunately, you haven't described your cluster setup, but my assumption is that you have 3 ZKs in your HA implementation. There are two components deployed to HDFS for implementing automatic failover:
- ZKFailoverController process (ZKFC)
- ZooKeeper quorum (3 ZKs)
1. ZKFailoverController (ZKFC)
The ZKFC is the ZooKeeper client that is also responsible for managing and monitoring the NameNode state. ZKFC runs on every node in the Hadoop cluster that is running a NameNode. It is responsible for:
Health monitoring: ZKFC heartbeats the NameNode with health-check commands periodically. As long as the NameNode responds with a healthy status in time, it considers the NameNode healthy. If the NameNode crashes, freezes, or enters an unhealthy state, it marks the NameNode as unhealthy.
ZooKeeper session management: ZKFC keeps a session open in ZooKeeper while the local NameNode is healthy. Also, if the local NameNode is the active NameNode, it holds a special lock "znode" in that session. This lock uses ZooKeeper's support for "ephemeral" nodes, so if the session expires, the lock node is deleted automatically.
ZooKeeper-based election: when the local NameNode is healthy and ZKFC finds that no other NameNode holds the lock "znode", it tries to acquire the lock itself. If it succeeds, ZKFC has won the election and is now responsible for running the failover to make its local NameNode active. The failover process run by the ZKFC is similar to the manual failover described in the NameNode High Availability article.
2. ZooKeeper quorum
A ZK quorum is a highly available service for maintaining small amounts of coordination data. It notifies clients about changes in that data and monitors clients for failures. The HDFS implementation of automatic failover depends on ZooKeeper for the following:
How it detects NN failure: each NameNode machine in the Hadoop cluster maintains a persistent session in ZooKeeper. If one of the machines crashes, its ZooKeeper session expires; ZooKeeper then signals the other NameNodes to start the failover process. To exclusively select the active NameNode, ZooKeeper provides a simple mechanism: in the case of an active NameNode failure, another standby NameNode may take the special exclusive lock in ZooKeeper, stating that it should become the next active NameNode.
After the HealthMonitor is initialized, internal threads are started to call the HAServiceProtocol RPC interface of the NameNode periodically to detect its health status. If the HealthMonitor detects a change in the health status of the NameNode, it calls back the corresponding method registered by the ZKFailoverController for processing. If the ZKFailoverController decides that an active/standby switch is needed, it first uses the ActiveStandbyElector to conduct an automatic election. The ActiveStandbyElector interacts with ZooKeeper to complete the election and then calls back the corresponding ZKFailoverController method to notify the current NameNode to become the active or the standby NameNode. Finally, the ZKFailoverController calls the HAServiceProtocol RPC interface of the NameNode to transition it to the Active or Standby state.
Taking all the above into account, the first component logs to check are ZK and NN:
/var/log/hadoop/hdfs
/var/log/zookeeper
My suspicion is you have issues with the NameNode heartbeat, which makes ZooKeeper fail to get the pingback in time, mark the NN as dead, and elect a new leader, and that keeps happening in a loop. So check those ZK logs, and ensure time is set correctly and is in sync (a quick check is sketched below)! Please revert
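A minimal sketch of the checks, assuming zk1 is a placeholder host, the ZooKeeper four-letter-word commands are allowed on your servers, and chrony is the time service (use ntpq -p on ntpd systems):
# Quick ZooKeeper health check per server (host is a placeholder; repeat for each ZK node)
echo ruok | nc zk1 2181    # a healthy server answers "imok"
echo stat | nc zk1 2181    # shows the mode (leader/follower) and client connections
# Verify the host clock is in sync
chronyc tracking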
07-26-2021
10:51 PM
@sipocootap2 Unfortunately, you cannot disallow snapshots in a snapshottable directory that already has snapshots! Yes, you will have to list and delete the snapshots; even if a snapshot contains subdirectories, you only pass the root snapshot name to the hdfs dfs -deleteSnapshot command. For example, if you had:
$ hdfs dfs -ls /app/tomtest/.snapshot
Found 2 items
drwxr-xr-x - tom developer 0 2021-07-26 23:14 /app/tomtest/.snapshot/sipo/work/john
drwxr-xr-x - tom developer 0 2021-07-26 23:14 /app/tomtest/.snapshot/tap2/work/peter
You would simply delete the snapshots like:
$ hdfs dfs -deleteSnapshot /app/tomtest/ sipo
$ hdfs dfs -deleteSnapshot /app/tomtest/ tap2
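Once every snapshot is gone, a minimal sketch of finishing the original goal, assuming /app/tomtest is the directory from the example and you run it as the hdfs superuser:
# Verify no snapshots remain, then disallow snapshots on the directory
hdfs dfs -ls /app/tomtest/.snapshot
hdfs dfsadmin -disallowSnapshot /app/tomtest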