Member since
01-19-2017
3679
Posts
632
Kudos Received
372
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| 701 | 06-04-2025 11:36 PM | |
| 1273 | 03-23-2025 05:23 AM | |
| 631 | 03-17-2025 10:18 AM | |
| 2313 | 03-05-2025 01:34 PM | |
| 1501 | 03-03-2025 01:09 PM |
12-25-2024
03:57 PM
1 Kudo
@Roomka Looks an old post all the same I will try to answer and I hope its still relevant to others too The challenges you're facing with Apache NiFi's development life cycle stem from its design, which does not fully separate code/logic and environment-specific configurations. To address this and create a robust process for porting flows from dev to QA to prod, consider the following solutions: 1. Use Parameter Contexts for Configuration Management NiFi supports Parameter Contexts, which can be used to externalize environment-specific configurations. This allows you to separate logic from environment-specific details. Steps: Define Parameter Contexts for each environment (e.g., DevContext, QAContext, ProdContext). Externalize configurations like: Number of threads Cron schedules Database connection strings API endpoints When deploying to a new environment: Import the flow without the environment-specific Parameter Contexts. Assign the appropriate Parameter Context for that environment. 2. Use NiFi Registry for Flow Versioning and Promotion The NiFi Registry provides a way to version control your flows and manage deployments across environments. Steps: Set up NiFi Registry and connect your NiFi instances to it. Use the Registry to version your flows. Promote flows from dev to QA to prod by exporting/importing them through the Registry. In each environment, override parameters using the appropriate Parameter Context. 3. Handle Env.-Specific Differences with External Configuration Management If Parameter Contexts are insufficient, consider externalizing configurations entirely using tools like Consul, AWS Parameter Store, or environment variables. Steps: Store all environment-specific configurations in an external tool. Use a custom script or a NiFi processor to fetch configurations dynamically at runtime. This ensures the flow logic remains the same across environments, and only the external configurations vary. 4. Adopt Best Practices for Flow Design To reduce the impact of embedded environment-specific details, follow these design principles: Avoid hardcoding resource-intensive configurations like thread counts or cron schedules into the flow. Use NiFi Variables or Parameters wherever possible to make configurations flexible. Split flows into smaller, reusable components to reduce complexity and improve maintainability. 5. Use Deployment Automation Automate the deployment process to ensure consistency and reduce manual errors. Use tools like Ansible, Terraform, or Jenkins to automate flow deployment. Include steps to set up Parameter Contexts or fetch external configurations as part of the deployment pipeline. 6. Mitigating the 12-Factor Principles Concern While NiFi isn't designed to fully adhere to 12-factor principles, you can adapt your processes to bridge this gap: Codebase: Manage flow versions centrally using NiFi Registry. Config: Externalize environment-specific configurations using Parameter Contexts or external configuration management tools. Build, Release, Run: Standardize your flows and deployment pipeline across environments. Disposability: Test flows rigorously in QA to ensure they can handle unexpected failures gracefully. Hope these points give you a better picture and possibly an answer Happy Haddoping
... View more
12-22-2024
12:39 PM
1 Kudo
Hi Shelton, Thanks very much for your help. I think this may be working. I'm going to continue testing it with the few of us who require access, but for now it's working from different browsers from my laptop. I had the certificates set up correctly from the start, and this setting seems to have done the trick. Thanks again for your reply and assistance!
... View more
12-20-2024
02:39 AM
1 Kudo
Thank you @SAMSAL (JOLT expert), The spec you posted worked properly and I'm able to see the flat JSON as expected.
... View more
12-20-2024
01:32 AM
1 Kudo
The TLS/SSL are configured correctly. Also, the krb5.conf is the same for pre-prod which is working fine. I think the problem partly has to do with our Python3.8 installation. We did the installation via Anaconda. Cloudera recommended will use yum to install the rh-python38 on our RHEL/OL7. Documentation is here: Installing Python 3.8 standard package on RHEL 7 | CDP Private Cloud Also, regarding the Impala, this Cloudera documentation was quite helpful: Configuring Impala Web UI | CDP Public Cloud The issue is resolved now by following the instructions in the above documentation.
... View more
12-19-2024
09:23 PM
1 Kudo
Hi @SAMSAL , This works really fine. Thank you so much for your solution. I got your idea of splitting the JSON and perform the transformation in SQL table. I will work on that. Thank you again.
... View more
12-19-2024
06:44 PM
@Shelton Thank you for your reply. This information is very helpful.
... View more
12-19-2024
05:02 PM
Please Help me @Shelton , what is the maximum datanode failure percentage? I tried to install 11 JN, and 11 ZK, but it didn't work, out of 16 nodes only 7 datanodes can fail active or dead, I need 8 datanodes dead but HA still running
... View more
12-18-2024
08:50 AM
@drewski7 I have just picked your ticket I hope I can help you resolve this issue if its still unresolved. There are are couple of configurations changes and implementations that have to done. 1. Overview OAuth allows Kafka clients to obtain access tokens from an external authentication provider like OAuth providers to authenticate with the Kafka broker. This process involves configuring the Kafka broker, OAuth provider, and Kafka clients. 2. Prerequisites Kafka cluster with SASL/OAUTHBEARER enabled. An OAuth provider set up to issue access tokens. Kafka clients that support SASL/OAUTHBEARER. Required libraries for OAuth integration (e.g. kafka-clients, oauth2-client, or keycloak adapters). 3. Procedure Step 1: Configure the OAuth Provider Set up an OAuth provider (e.g., Keycloak, Okta, etc.) to act as the identity provider (IdP). Register a new client application for Kafka in the OAuth provider: Set up client ID and client secret for Kafka clients. Configure scopes, roles, or claims required for authorization. Enable grant types like Client Credentials or Password (depending on your use case). Note down the following details: Authorization Server URL (e.g.https://authlogin.northwind.com/token). Client ID and Client Secret. Step 2: Configure the Kafka Broker Enable SASL/OAUTHBEARER Authentication: Edit the Kafka broker configuration (/config/server.properties) sasl.enabled.mechanisms=OAUTHBEARER listener.name.<listener-name>.oauthbearer.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \ oauth.token.endpoint.uri="https://auth.example.com/token" \ oauth.client.id="kafka-broker-client-id" \ oauth.client.secret="kafka-broker-client-secret" \ oauth.scope="kafka-scope"; Replace <listener-name> with (SASL_PLAINTEXT, SASL_SSL) as appropriate. Configure ACLs (Optional): If using authorization, configure ACLs to grant specific permissions to authenticated users. Restart the Kafka Broker: Restart the Kafka broker to apply the changes sudo systemctl restart kafka Step 3: Configure the Kafka Client Add required dependencies to your Kafka client application: For Java applications, add the Kafka and OAuth dependencies to your pom.xml or build.gradle. pom.xml example <dependency> <groupId>org.apache.kafka</groupId> <artifactId>kafka-clients</artifactId> <version>3.0.0</version> </dependency> <dependency> <groupId>com.nimbusds</groupId> <artifactId>oauth2-oidc-sdk</artifactId> <version>9.4</version> </dependency> 2. Configure OAuth in the Kafka Client: Specify the SASL mechanism and the OAuth token endpoint in the client configuration Properties props = new Properties(); props.put("bootstrap.servers", "broker1:9092,broker2:9092"); props.put("security.protocol", "SASL_SSL"); props.put("sasl.mechanism", "OAUTHBEARER"); props.put("sasl.jaas.config", "org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required " + "oauth.token.endpoint.uri=\"https://auth.example.com/token\" " + "oauth.client.id=\"kafka-client-id\" " + "oauth.client.secret=\"kafka-client-secret\";"); 3. Implement Token Retrieval (Optional): Use an external tool or library to retrieve and manage tokens if you need a custom implementation. curl -X POST -d "grant_type=client_credentials&client_id=kafka-client-id&client_secret=kafka-client-secret" \ https://auth.example.com/token 4. Create the Kafka Producer/Consumer: Use the above configuration to initialize a Kafka producer or consumer KafkaProducer<String, String> producer = new KafkaProducer<>(props); Step 4: Test the Authentication Produce and consume messages to verify OAuth-based authentication: kafka-console-producer.sh --broker-list <broker-address> --topic <topic-name> --producer.config <client-config> kafka-console-consumer.sh --bootstrap-server <broker-address> --topic <topic-name> --consumer.config <client-config> Ensure logs indicate successful authentication using SASL/OAUTHBEARER. Step 5: Monitor and Debug Check Kafka broker logs for errors related to OAuth authentication. Verify token expiration and renewal mechanisms. Ensure the OAuth provider is reachable from the Kafka brokers and clients. Happy Hadooping I hope the above steps helps in the diagnosis and resolution of you Kafka OAuth issue
... View more
12-17-2024
04:48 PM
1 Kudo
@Shelton Thank you for your advice. As I use the latest version of NiFi and it requires Java 21, I added the following line in bootstrap.conf and confirmed the warning messages disappeared. java.arg.EnableNativeAccess=--enable-native-access=ALL-UNNAMED I appreciate your help. Thank you,
... View more
12-17-2024
12:41 PM
1 Kudo
@JSSSS The error is this "java.io.IOException: File /user/JS/input/DIC.txt._COPYING_ could only be written to 0 of the 1 minReplication nodes. There are 3 datanode(s) running and 3 node(s) are excluded in this operation." All the 3 datanode according to the log are excludeNodes=[192.168.1.81:9866, 192.168.1.125:9866, 192.168.1.8> with replication factor of 3 , writes should succeed to all the 3 datanodes else the write fails. The cluster may have under-replicated or unavailable blocks due to excluded nodes HDFS cannot use these nodes, possibly due to: Disk space issues. Write errors or disk failures. Network connectivity problems between the NameNode and DataNodes. 1. Verify if the DataNodes are live and connected to the NameNode hdfs dfsadmin -report Look for the "Live nodes" and "Dead nodes" section If all 3 DataNodes are excluded, they might show up as dead or decommissioned. Ensure the DataNodes have sufficient disk space for the write operation df -h Look at the HDFS data directories (/hadoop/hdfs/data) If disk space is full, clear unnecessary files or increase disk capacity hdfs dfs -rm -r /path/to/old/unused/files View the list of excluded nodes cat $HADOOP_HOME/etc/hadoop/datanodes.exclude If nodes are wrongly excluded: Remove their entries from datanodes.exclude. Refresh the NameNode to apply changes hdfs dfsadmin -refreshNodes Block Placement Policy: If the cluster has DataNodes with specific restrictions (e.g., rack awareness), verify the block placement policy grep dfs.block.replicator.classname $HADOOP_HOME/etc/hadoop/hdfs-site.xml Default: org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault Happy hadooping
... View more