Member since: 07-30-2020
Posts: 219
Kudos Received: 45
Solutions: 60
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 426 | 11-20-2024 11:11 PM
 | 485 | 09-26-2024 05:30 AM
 | 1080 | 10-26-2023 08:08 AM
 | 1851 | 09-13-2023 06:56 AM
 | 2125 | 08-25-2023 06:04 AM
11-21-2024
12:02 AM
1 Kudo
Thank you @rki_! That is exactly what happened. I had a node whose /tmp/ folder still contained old JournalNode data. After cleaning it up and running initializeSharedEdits, I managed to start the cluster.

Note: I had this exact exception on two slave nodes:

WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException: There appears to be a gap in the edit log. We expected txid 121994, but got txid 121998.

I ran hdfs namenode -recover on both slave nodes and was then able to start both NameNodes properly. The data is replicated across all 3 nodes. Thank you so much for the help!
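For reference, a minimal sketch of the recovery sequence described above; the JournalNode edits path is illustrative (mine happened to be under /tmp/), and it assumes the affected services are stopped first:

# Remove the stale JournalNode edits directory on the bad node (path is hypothetical)
rm -rf /tmp/hadoop/dfs/journalnode/*

# Re-initialize the shared edits directory from the active NameNode
hdfs namenode -initializeSharedEdits

# On each standby NameNode that reported the "gap in the edit log" exception
hdfs namenode -recover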
09-29-2024
09:55 PM
Hi rki_, It seems that both the hbase:meta and hbase:namespace tables are not online. I am attaching the master log for your review; if you know a way to fix this, could you please take a look?

2024-09-30 10:11:28,981 WARN [master/dc1-apache-hbase:16000:becomeActiveMaster] master.HMaster (HMaster.java:isRegionOnline(1373)) - hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1727422999057, server=dc1-apache-hbase.mobitel.lk,16020,1727159057270}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
2024-09-30 10:12:28,982 WARN [master/dc1-apache-hbase:16000:becomeActiveMaster] master.HMaster (HMaster.java:isRegionOnline(1373)) - hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1727422999057, server=dc1-apache-hbase.mobitel.lk,16020,1727159057270}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
2024-09-30 10:13:19,391 ERROR [ActiveMasterInitializationMonitor-1727422999267] master.MasterInitializationMonitor (MasterInitializationMonitor.java:run(67)) - Master failed to complete initialization after 900000ms. Please consider submitting a bug report including a thread dump of this process.
2024-09-30 10:13:28,982 WARN [master/dc1-apache-hbase:16000:becomeActiveMaster] master.HMaster (HMaster.java:isRegionOnline(1373)) - hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1727422999057, server=dc1-apache-hbase.mobitel.lk,16020,1727159057270}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
2024-09-30 10:13:36,668 INFO [master:store-WAL-Roller] monitor.StreamSlowMonitor (StreamSlowMonitor.java:<init>(122)) - New stream slow monitor dc1-apache-hbase.mobitel.lk%2C16000%2C1727422992087.1727671416667
2024-09-30 10:13:36,684 INFO [master:store-WAL-Roller] wal.AbstractFSWAL (AbstractFSWAL.java:logRollAndSetupWalProps(834)) - Rolled WAL /hbase/MasterData/WALs/dc1-apache-hbase.mobitel.lk,16000,1727422992087/dc1-apache-hbase.mobitel.lk%2C16000%2C1727422992087.1727670516635 with entries=0, filesize=85 B; new WAL /hbase/MasterData/WALs/dc1-apache-hbase.mobitel.lk,16000,1727422992087/dc1-apache-hbase.mobitel.lk%2C16000%2C1727422992087.1727671416667
2024-09-30 10:13:37,089 INFO [WAL-Archive-0] wal.AbstractFSWAL (AbstractFSWAL.java:archiveLogFile(815)) - Archiving hdfs://192.168.6.205:9000/hbase/MasterData/WALs/dc1-apache-hbase.mobitel.lk,16000,1727422992087/dc1-apache-hbase.mobitel.lk%2C16000%2C1727422992087.1727670516635 to hdfs://192.168.6.205:9000/hbase/MasterData/oldWALs/dc1-apache-hbase.mobitel.lk%2C16000%2C1727422992087.1727670516635
2024-09-30 10:13:37,092 INFO [WAL-Archive-0] region.MasterRegionUtils (MasterRegionUtils.java:moveFilesUnderDir(50)) - Moved hdfs://192.168.6.205:9000/hbase/MasterData/oldWALs/dc1-apache-hbase.mobitel.lk%2C16000%2C1727422992087.1727670516635 to hdfs://192.168.6.205:9000/hbase/oldWALs/dc1-apache-hbase.mobitel.lk%2C16000%2C1727422992087.1727670516635$masterlocalwal$
2024-09-30 10:14:28,982 WARN [master/dc1-apache-hbase:16000:becomeActiveMaster] master.HMaster (HMaster.java:isRegionOnline(1373)) - hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1727422999057, server=dc1-apache-hbase.mobitel.lk,16020,1727159057270}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
2024-09-30 10:15:28,983 WARN [master/dc1-apache-hbase:16000:becomeActiveMaster] master.HMaster (HMaster.java:isRegionOnline(1373)) - hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1727422999057, server=dc1-apache-hbase.mobitel.lk,16020,1727159057270}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
2024-09-30 10:16:28,983 WARN [master/dc1-apache-hbase:16000:becomeActiveMaster] master.HMaster (HMaster.java:isRegionOnline(1373)) - hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1727422999057, server=dc1-apache-hbase.mobitel.lk,16020,1727159057270}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
2024-09-30 10:16:41,861 INFO [RS-EventLoopGroup-1-1] hbase.Server (ServerRpcConnection.java:processConnectionHeader(550)) - Connection from 192.168.6.205:57364, version=2.5.10, sasl=false, ugi=super (auth:SIMPLE), service=MasterService
2024-09-30 10:17:28,984 WARN [master/dc1-apache-hbase:16000:becomeActiveMaster] master.HMaster (HMaster.java:isRegionOnline(1373)) - hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1727422999057, server=dc1-apache-hbase.mobitel.lk,16020,1727159057270}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
2024-09-30 10:18:28,985 WARN [master/dc1-apache-hbase:16000:becomeActiveMaster] master.HMaster (HMaster.java:isRegionOnline(1373)) - hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1727422999057, server=dc1-apache-hbase.mobitel.lk,16020,1727159057270}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
2024-09-30 10:19:28,985 WARN [master/dc1-apache-hbase:16000:becomeActiveMaster] master.HMaster (HMaster.java:isRegionOnline(1373)) - hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1727422999057, server=dc1-apache-hbase.mobitel.lk,16020,1727159057270}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
2024-09-30 10:20:28,985 WARN [master/dc1-apache-hbase:16000:becomeActiveMaster] master.HMaster (HMaster.java:isRegionOnline(1373)) - hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1727422999057, server=dc1-apache-hbase.mobitel.lk,16020,1727159057270}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.

Thank you!
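As a hedged aside (an assumption on my part, not necessarily the fix for this case): when hbase:meta is recorded as OPEN on a server that is no longer alive, one commonly suggested step is to re-assign the meta region with the HBCK2 tool, using the region id shown in the log above:

# Requires the hbase-hbck2 jar; the jar path and version are illustrative
hbase hbck -j /path/to/hbase-hbck2.jar assigns 1588230740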
09-13-2024
04:59 AM
2 Kudos
See if you can raise a support ticket with Cloudera. The application log needs a detailed review to determine what is causing the container to fail.
08-28-2024
04:30 AM
I have the same issue, but I am unable to locate the /hbase-secure znode; I only have the /hbase znode. Which one should I delete?
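A minimal sketch for confirming which parent znode the cluster actually uses (the value of zookeeper.znode.parent; /hbase is the default, while secured clusters are often configured with /hbase-secure) before deleting anything:

# List the znodes at the root and under /hbase with the HBase ZooKeeper CLI
hbase zkcli ls /
hbase zkcli ls /hbase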
02-25-2024
05:38 PM
1 Kudo
Hi, it seems the table and partition can't be created, and the files on each datanode can't be located by the namenode.

1. Is there a way to re-point those files (the non-DFS used data) to the actual directory?

Configured Capacity: 1056759873536 (984.18 GB)
DFS Used: 475136 (464 KB)
Non DFS Used: 433030918144 (403.29 GB)
DFS Remaining: 623711703040 (580.88 GB)
DFS Used%: 0.00%
DFS Remaining%: 59.02%

Datanode directory:

bash-4.2$ cd /hadoop/dfs/data
bash-4.2$ ls -l
total 10485776
drwxrwsr-x. 4 hadoop root 4096 Feb 23 11:15 current
-rw-r--r--. 1 hadoop root 58 Feb 26 09:34 in_use.lock
-rw-rw-r--. 1 hadoop root 10737418240 Aug 28 05:26 tempfile
drwxrwsr-x. 2 hadoop root 4096 Feb 23 13:05 test

2. Next, how can we proceed with creating the tables and partitions?

Logs of the namenode:

2024-02-26 06:52:26,604 DEBUG security.UserGroupInformation: PrivilegedAction as:presto (auth:SIMPLE) from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
2024-02-26 06:52:26,604 DEBUG hdfs.StateChange: *DIR* NameNode.rename: /tmp/presto-reporting-operator/576b4b93-ae3b-41ff-b401-be50023f776f/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237 to /operator_metering/storage/metering_health_check/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237
2024-02-26 06:52:26,604 DEBUG security.UserGroupInformation: Failed to get groups for user presto by java.io.IOException: No groups found for user presto
2024-02-26 06:52:26,604 DEBUG hdfs.StateChange: DIR* NameSystem.renameTo: /tmp/presto-reporting-operator/576b4b93-ae3b-41ff-b401-be50023f776f/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237 to /operator_metering/storage/metering_health_check/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237
2024-02-26 06:52:26,604 DEBUG hdfs.StateChange: DIR* FSDirectory.renameTo: /tmp/presto-reporting-operator/576b4b93-ae3b-41ff-b401-be50023f776f/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237 to /operator_metering/storage/metering_health_check/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237
2024-02-26 06:52:26,604 WARN hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /tmp/presto-reporting-operator/576b4b93-ae3b-41ff-b401-be50023f776f/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237 to /operator_metering/storage/metering_health_check/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237 because destination's parent does not exist
2024-02-26 06:52:26,604 DEBUG ipc.Server: Served: rename, queueTime= 0 procesingTime= 0
2024-02-26 06:52:26,604 DEBUG ipc.Server: IPC Server handler 5 on 9820: responding to Call#55244 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.rename from 10.128.38.29:59164
2024-02-26 06:52:26,604 DEBUG ipc.Server: IPC Server handler 5 on 9820: responding to Call#55244 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.rename from 10.128.38.29:59164 Wrote 36 bytes.
2024-02-26 06:52:26,607 DEBUG ipc.Server: got #55245
2024-02-26 06:52:26,607 DEBUG ipc.Server: IPC Server handler 6 on 9820: Call#55245 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 10.128.38.29:59164 for RpcKind RPC_PROTOCOL_BUFFER
2024-02-26 06:52:26,607 DEBUG security.UserGroupInformation: PrivilegedAction as:presto (auth:SIMPLE) from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
2024-02-26 06:52:26,607 DEBUG security.UserGroupInformation: Failed to get groups for user presto by java.io.IOException: No groups found for user presto
2024-02-26 06:52:26,607 DEBUG metrics.TopMetrics: a metric is reported: cmd: getfileinfo user: presto (auth:SIMPLE)
2024-02-26 06:52:26,607 DEBUG top.TopAuditLogger: ------------------- logged event for top service: allowed=true ugi=presto (auth:SIMPLE) ip=/10.128.38.29 cmd=getfileinfo src=/operator_metering/storage/metering_health_check dst=null perm=null
2024-02-26 06:52:26,607 DEBUG ipc.Server: Served: getFileInfo, queueTime= 0 procesingTime= 0
2024-02-26 06:52:26,607 DEBUG ipc.Server: IPC Server handler 6 on 9820: responding to Call#55245 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 10.128.38.29:59164
2024-02-26 06:52:26,607 DEBUG ipc.Server: IPC Server handler 6 on 9820: responding to Call#55245 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 10.128.38.29:59164 Wrote 34 bytes.
2024-02-26 06:52:26,608 DEBUG ipc.Server: got #55246
2024-02-26 06:52:26,608 DEBUG ipc.Server: IPC Server handler 4 on 9820: Call#55246 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 10.128.38.29:59164 for RpcKind RPC_PROTOCOL_BUFFER

Logs of the reporting-operator:

time="2024-02-23T14:14:21Z" level=error msg="cannot insert into Presto table operator_health_check" app=metering component=testWriteToPresto error="presto: query failed (200 OK): \"com.facebook.presto.spi.PrestoException: Failed to create directory: hdfs://hdfs-namenode-proxy:9820/tmp/presto-reporting-operator/1d20c5c5-11e0-47b4-9bce-eaa724db21eb\""

Whenever we try to query, this is the error:

Error running query: Partition location does not exist: hdfs://hdfs-namenode-0.hdfs-namenode:9820/user/hive/warehouse/datasource_mlp_gpu_request_slots/dt=2024-02-08

Thank you!
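A hedged sketch based on the NameNode warning above ("failed to rename ... because destination's parent does not exist"): recreate the missing parent directory and re-check the partition path. The paths are taken from the logs; the owner is an assumption:

# Recreate the rename target's parent and hand it to the presto user
hdfs dfs -mkdir -p /operator_metering/storage/metering_health_check
hdfs dfs -chown presto /operator_metering/storage/metering_health_check

# Confirm whether the Hive partition location actually exists
hdfs dfs -ls /user/hive/warehouse/datasource_mlp_gpu_request_slots/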
01-05-2024
12:34 AM
I think your intention is to retrieve this data for your own monitoring or reporting tasks. If so, you can try querying the JMX endpoint to obtain the relevant data, for example via http://namenode:port/jmx.
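A minimal sketch of querying that endpoint; the host, port, and bean name are illustrative (the NameNode web UI port is commonly 9870 on Hadoop 3.x):

# Fetch all NameNode JMX beans as JSON
curl -s 'http://namenode:9870/jmx'

# Or filter to a single bean with the qry parameter
curl -s 'http://namenode:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState'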
10-30-2023
11:05 PM
Hi @rki_, I tried that but it is still failing with the same error.

sudo -u hive beeline -u "jdbc:hive2://machine1.dev.domain.com:2181/default;password=hive;principal=hive/_HOST@DEV.domain.COM;serviceDiscoveryMode=zooKeeper;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks;trustStorePassword=****;user=hive;zooKeeperNamespace=hiveserver2" --hiveconf dfs.replication=1 -n hive --showHeader=false --outputformat=tsv2 -e "use testdb; export table newt1 to '/staging/exporttable/testdb/newt1';"

Error:

23/10/31 02:01:14 [main]: ERROR jdbc.Utils: Unable to read HiveServer2 configs from ZooKeeper
Error: Could not open client transport for any of the Server URI's in ZooKeeper: Failed to open new session: java.lang.IllegalArgumentException: Cannot modify dfs.replication at runtime. It is not in list of params that are allowed to be modified at runtime (state=08S01,code=0)
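A hedged sketch of two possible workarounds for the "Cannot modify dfs.replication at runtime" error (assumptions on my part, not the confirmed fix for this thread): either allow the parameter at runtime by appending it to HiveServer2's whitelist and restarting HiveServer2, or drop --hiveconf and set the replication factor on the exported directory afterwards:

# Option 1: add to hive-site.xml (or the CM HiveServer2 safety valve), then restart HiveServer2
#   <property>
#     <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
#     <value>dfs\.replication</value>
#   </property>

# Option 2: run the export without --hiveconf, then lower replication on the target path
hdfs dfs -setrep -w 1 /staging/exporttable/testdb/newt1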
10-26-2023
06:41 PM
Hi rki_, I understood. Thank you for the information.
10-24-2023
01:34 AM
To enable Ranger authorization for HDFS on the same cluster, you should not select the Ranger service dependency; instead, select the 'Enable Ranger Authorization' checkbox under HDFS. In the base cluster, even if you check the box for "Ranger_service", Cloudera Manager appears to save the configuration successfully, but the box will never remain checked, and a warning is logged in the CM server logs: "CyclicDependencyConfigUpdateListener - Unsetting dependency from service hdfs to service ranger to prevent cyclic dependency". Refer to the article below, which covers the equivalent Solr-Ranger dependency: https://my.cloudera.com/knowledge/WARN-quotUnsetting-dependency-from-servicequot-when-Ranger?id=329275