Member since: 10-11-2022
Posts: 133
Kudos Received: 20
Solutions: 11

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 439 | 08-19-2025 01:50 AM |
| | 1893 | 11-07-2024 10:00 PM |
| | 2496 | 05-23-2024 11:44 PM |
| | 2197 | 05-19-2024 11:32 PM |
| | 9632 | 05-18-2024 11:26 PM |
05-05-2026
03:48 AM
1. Cloudera Operational Database backup guide:
   - https://docs.cloudera.com/operational-database/cloud/managing-database/topics/cod-backing-up-table.html
   - https://docs.cloudera.com/cdp-public-cloud/cloud/requirements-azure/topics/mc-az-minimal-setup-for-cloud-storage.html
   - HBase snapshot export is also covered in the Runtime HBase backup docs.
2. Default backup location: mainly stores cluster-level backups (Data Lake, FreeIPA, logs/telemetry), not HBase table data.
3. Different container for manual backups: use --snapshot-location abfss://othercontainer@account.dfs.core.windows.net/path in the snapshot/export command; no direct Hadoop config edit is needed in CDP Public Cloud (see the export sketch below).
4. Granting HBase rights:
   - Assign the Storage Blob Delegator role to the HBase Managed Identity (at the Storage Account level).
   - Grant Storage Blob Data Owner/Contributor on the target container.
   - Set POSIX ACLs (Execute + Read/Write) on the root and target path via Storage Explorer.
   - Verify the IDBroker mapping for the HBase user.
   - Test access with hdfs dfs -ls on the target path.
   - Enable "Allow trusted Microsoft services" if the option is available.

@MintberryCrunch
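For reference, a minimal sketch of the manual snapshot-and-export flow using the standard HBase ExportSnapshot tool; the table name is a placeholder, the storage account and container reuse the example path above, and the exact invocation can differ by Runtime/COD version:

```bash
# Take a snapshot of the table (non-interactive hbase shell).
echo "snapshot 'my_table', 'my_table_snap'" | hbase shell -n

# Export the snapshot to a different ADLS Gen2 container
# (hypothetical container/account names).
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot my_table_snap \
  -copy-to abfss://othercontainer@account.dfs.core.windows.net/hbase-backups \
  -mappers 4

# Verify the HBase user can actually reach the target path.
hdfs dfs -ls abfss://othercontainer@account.dfs.core.windows.net/hbase-backups
```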
05-05-2026
03:16 AM
Enabling a CMK at the environment level applies to new encryption use in that environment, not to how already running services are encrypted. It should not disrupt existing CDE, CDF, or CML services that are already deployed and running; those services generally continue using the encryption setup they already have. The CMK choice is typically applied to new resources or new clusters created after the CMK is configured, so in practice the main impact is on future deployments rather than on currently installed services. The CMK setting is usually a one-time configuration for that environment. @Lorenzo_F
05-05-2026
02:33 AM
Use a pre-created Customer Managed KMS Key (CMK) for secret encryption; with restricted IAM, Liftie cannot create the key automatically.
1. In AWS KMS, create or select a symmetric CMK in the same region as the CDP environment.
2. Edit the Compute Restricted IAM policy and, in the statement RestrictedKMSPermissionsUsingCustomerProvidedKey, replace the placeholder with the exact CMK ARN. Make sure that statement includes KMS actions such as kms:CreateGrant, kms:DescribeKey, kms:Encrypt, kms:Decrypt, kms:ReEncrypt*, and kms:GenerateDataKey*.
3. On the CMK itself, edit the KMS key policy to allow the required service roles (for example AWSServiceRoleForAutoScaling and the EKS/EC2 roles used by CDP) to use the key with the same KMS actions (an example key-policy update is sketched below).
4. Re-run the compute cluster activation; since skip-validation is not supported here, it will only succeed once the CMK and all related permissions are correctly configured.
If the error persists after these changes, capture the environment name, the CMK ARN, and the full key policy, and open a case with Cloudera Support. @Lorenzo_F
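As a rough illustration of step 3, this is what the key-policy update could look like via the AWS CLI. The account ID, key ARN, and role list are placeholders only; merge the second statement into your existing key policy rather than replacing it wholesale, and keep an admin statement so the key stays manageable:

```bash
# Example only: adjust the principals to the EKS/EC2 roles your environment uses.
cat > key-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EnableKeyAdministration",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "AllowCDPComputeRolesToUseKey",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::111122223333:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
        ]
      },
      "Action": [
        "kms:CreateGrant",
        "kms:DescribeKey",
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*"
      ],
      "Resource": "*"
    }
  ]
}
EOF

# Apply the policy to the CMK (replace the key ARN with your own).
aws kms put-key-policy \
  --key-id arn:aws:kms:eu-west-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab \
  --policy-name default \
  --policy file://key-policy.json
```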
02-24-2026
01:48 AM
@APentyala This error is not caused by the SCD Type-2 logic or the MERGE syntax itself. The message “Ambiguous column reference deal_yearnumber in tgt” usually indicates that the column exists more than once in the target table metadata. Please check whether deal_yearnumber (or the partition column) is defined both as a regular column and in the PARTITIONED BY section. Run DESCRIBE FORMATTED and SHOW CREATE TABLE to verify the schema. If the column appears twice or was altered previously, Hive may treat it as ambiguous during MERGE compilation. Recreating the table with a clean schema (ensuring the partition column is defined only once) typically resolves the issue.
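To make the check concrete, here is a sketch of the schema inspection and the clean recreate, with hypothetical table and column names (only the partition column name matches your error message):

```bash
# Inspect the current schema: the partition column should appear only under
# "# Partition Information", not also in the regular column list.
beeline -u "jdbc:hive2://<hs2-host>:10000/default" -e "
DESCRIBE FORMATTED deal_fact;
SHOW CREATE TABLE deal_fact;
"

# Recreate with a clean schema, declaring the partition column exactly once.
beeline -u "jdbc:hive2://<hs2-host>:10000/default" -e "
CREATE TABLE deal_fact_clean (
  deal_id        BIGINT,
  deal_amount    DECIMAL(18,2),
  valid_from     DATE,
  is_current     BOOLEAN
)
PARTITIONED BY (deal_yearnumber INT)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
"
```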
02-08-2026
12:10 AM
@MarlinGomez For that CCA175 streaming scenario (inconsistent formats, cleansing/transforming to HDFS), Spark Structured Streaming with schema evolution is the most exam-realistic pick. It handles real-time ingestion efficiently via micro-batches, infers/evolves schemas on the fly (especially with JSON/Avro), and lets you apply transformations like filter/map before writing Parquet to HDFS. Separate ETL pipelines per format add too much complexity and overhead under exam constraints, and pure schema-on-read skips the proactive cleansing. Quick start: a Kafka source with schema merging enabled, e.g. .option("mergeSchema", "true") plus .writeStream to HDFS (a fuller sketch is below). This nails the "perform ETL on data using Spark API" objective. Good luck on your prep.
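A rough, self-contained sketch of that approach as a small PySpark job submitted from the shell; the broker, topic, HDFS paths, schema, and the Kafka package version are all placeholders you would swap for your own:

```bash
cat > /tmp/stream_etl.py <<'EOF'
# Hypothetical streaming ETL: read JSON events from Kafka, cleanse, write Parquet to HDFS.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("stream-etl-sketch").getOrCreate()

# Example schema; derive or merge schemas as your feed requires.
schema = StructType([
    StructField("id", StringType()),
    StructField("amount", DoubleType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "events")
       .load())

cleaned = (raw.selectExpr("CAST(value AS STRING) AS json")
           .select(from_json(col("json"), schema).alias("e"))
           .select("e.*")
           .filter(col("amount").isNotNull()))

query = (cleaned.writeStream
         .format("parquet")
         .option("path", "hdfs:///user/exam/output")
         .option("checkpointLocation", "hdfs:///user/exam/checkpoints")
         .start())
query.awaitTermination()
EOF

# Package coordinates depend on your Spark/Scala versions; this one is only an example.
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1 /tmp/stream_etl.py
```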
09-05-2025
04:17 AM
In NiFi 2.4 and above, the built-in Jython (Python 2 interpreter) for ExecuteScript has been removed, so the traditional approach using inline Python is no longer supported. There is, however, a modern and robust alternative using NiFi's first-class Python processor support for attribute manipulation, or using Groovy/Clojure in ExecuteScript, or simply leveraging UpdateAttribute for simple logic. https://nifi.apache.org/nifi-docs/python-developer-guide.html https://nifi.apache.org/components/org.apache.nifi.processors.script.ExecuteScript/
08-19-2025
01:56 AM
Hi, @Hadoop16 The error Zookeeper connection string cannot be null means the Router process is expecting ZooKeeper configs for token management but isn’t finding them. Even if you already have a ZooKeeper quorum set in core-site.xml, the Router Federation requires its own configs in hdfs-rbf-site.xml. Specifically, you need to set hadoop.kms.authentication.zk-dt-secret-manager.zkConnectionString (or in some versions hadoop.security.token.service.use_ip + hadoop.zk.address) depending on your setup. Please double-check that hdfs-rbf-site.xml contains the federation and router related properties, including ZooKeeper connection string and Kerberos settings. Ensure the file is deployed to all Router nodes and included in the classpath. Also verify that the Router service user has Kerberos credentials and permissions to connect to ZooKeeper. Once the ZK connection string is set properly, the Router daemon should start without this error.
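As a hedged illustration of the kind of entries involved (exact property names vary by Hadoop release, so verify them against the hdfs-rbf-default.xml shipped with your version before applying anything):

```bash
# Properties to add inside the <configuration> element of hdfs-rbf-site.xml on
# every Router node (example values only -- hostnames and keys must match your setup):
#
#   <property>
#     <name>hadoop.zk.address</name>
#     <value>zk1:2181,zk2:2181,zk3:2181</value>
#   </property>
#   <property>
#     <name>dfs.federation.router.store.driver.class</name>
#     <value>org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreZooKeeperImpl</value>
#   </property>

# Confirm the Router user has Kerberos credentials (keytab path is hypothetical),
# then restart the Router so it rereads the configuration.
kinit -kt /etc/security/keytabs/hdfs.keytab hdfs/$(hostname -f)
hdfs --daemon stop dfsrouter
hdfs --daemon start dfsrouter
```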
08-19-2025
01:54 AM
Hi, @linssab This error (java.lang.ArithmeticException: / by zero in HiveSplitGenerator) usually comes from Hive when the query compiles into an empty or invalid input split. With PutHive3QL, DELETE/UPDATE operations on ACID tables often trigger a full table scan, and if stats are missing or corrupted, Tez can fail this way. First, try running the same SQL directly in Hive CLI/Beeline to confirm it’s not NiFi-specific. Then, run ANALYZE TABLE <table> COMPUTE STATISTICS and ANALYZE TABLE <table> COMPUTE STATISTICS FOR COLUMNS to refresh stats. Also check that the table is bucketed/transactional as required for ACID.
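For reference, the same checks run from Beeline; the table name and the DELETE predicate are placeholders standing in for whatever statement your flow sends through PutHive3QL:

```bash
# Run the statement outside NiFi first to rule the processor out, then refresh stats.
beeline -u "jdbc:hive2://<hs2-host>:10000/default" -e "
DELETE FROM my_acid_table WHERE id = 42;
ANALYZE TABLE my_acid_table COMPUTE STATISTICS;
ANALYZE TABLE my_acid_table COMPUTE STATISTICS FOR COLUMNS;
DESCRIBE FORMATTED my_acid_table;
"
```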
08-19-2025
01:51 AM
Hi, @Hz In HDFS 2.7.3, setting a storage policy on a directory does not immediately place new blocks directly into the target storage (e.g., ARCHIVE). New writes still go to default storage (usually DISK), and the Mover process is required to relocate both existing and newly written blocks to comply with the policy. The storage policy only marks the desired storage type, but actual enforcement happens through the Mover. This is expected behavior and you did not miss any configuration. There’s no way in 2.7.3 to bypass the Mover and force blocks to land directly in cold storage on write. Later Hadoop versions introduced improvements, but for your version, running the Mover is required.
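A short sketch of the expected workflow on 2.7.3, using a hypothetical /data/cold directory:

```bash
# Mark the directory for cold storage; this only records the desired storage type.
hdfs storagepolicies -setStoragePolicy -path /data/cold -policy COLD

# Confirm the policy is attached to the directory.
hdfs storagepolicies -getStoragePolicy -path /data/cold

# Relocate existing and newly written blocks onto ARCHIVE volumes to satisfy the policy.
hdfs mover -p /data/cold
```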
08-19-2025
01:50 AM
Hi, @quangbilly79 Yes, you can continue to use HDFS normally while the Balancer is running. The Balancer only moves replicated block copies between DataNodes to even out disk usage; it does not modify the actual data files. Reads and writes are fully supported in parallel with balancing, and HDFS ensures data integrity through replication and checksums. The process may add some extra network and disk load, so you might see reduced performance during heavy balancing. There is no risk of data corruption caused by the Balancer. You don’t need to wait — it’s safe to continue your normal operations.
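If you want to bound the extra load while it runs, the Balancer's bandwidth and threshold are tunable; the values below are just examples:

```bash
# Cap the bandwidth each DataNode may spend on balancing (bytes per second; 100 MB/s here).
hdfs dfsadmin -setBalancerBandwidth 104857600

# Run the Balancer; -threshold is the allowed deviation from average utilization, in percent.
hdfs balancer -threshold 10
```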