Member since: 01-19-2017
Posts: 3665
Kudos Received: 626
Solutions: 368
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 236 | 03-05-2025 01:34 PM |
 | 152 | 03-03-2025 01:09 PM |
 | 114 | 03-02-2025 07:19 AM |
 | 539 | 12-22-2024 07:33 AM |
 | 339 | 12-18-2024 12:21 PM |
12-22-2024
12:39 PM
1 Kudo
Hi Shelton, Thanks very much for your help. I think this may be working. I'm going to continue testing it with the few of us who require access, but for now it's working from different browsers on my laptop. I had the certificates set up correctly from the start, and this setting seems to have done the trick. Thanks again for your reply and assistance!
12-20-2024
07:52 AM
2 Kudos
@nifier This posting is 3 weeks old, but I still hope this helps resolve your reporting task. NiFi has a built-in Data Provenance feature that tracks the lineage of data as it moves through the flow. To capture file transfer details:

1. Enable Provenance Reporting

Provenance Events: NiFi records events such as SEND, RECEIVE, DROP, and ROUTE. For SFTP file transfers, look for SEND events.

Steps to enable provenance reporting:
- Log in to the NiFi UI.
- Go to the Provenance tab (accessible from the top menu).
- Configure the Provenance Repository in nifi.properties to retain sufficient event history:
  nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
  nifi.provenance.repository.max.storage.time=30 days
  nifi.provenance.repository.max.storage.size=1 GB
- Ensure max.storage.time or max.storage.size is configured to retain events for the desired reporting period.

2. Query Provenance Data

You can query and filter provenance events to generate the report:
- Go to the Provenance tab in the NiFi UI.
- Filter the events using criteria:
  - Component Name: filter for the SFTP processor (PutSFTP).
  - Event Type: select SEND.
  - Date Range: specify the desired time frame.
- Download the filtered results as a CSV file (see the sketch after this post for turning that export into a report).

3. Automate Reporting with NiFi Reporting Tasks

To generate periodic reports automatically, use the SiteToSiteProvenanceReportingTask:
- In the NiFi UI, navigate to Controller Settings.
- Add a new Reporting Task and select SiteToSiteProvenanceReportingTask.
- Configure the Reporting Task to specify the target location for the report and to filter for SEND events related to your file transfer processors.
- Schedule the Reporting Task to run periodically (e.g., daily or weekly).

4. Include File Name and Transfer Status

NiFi provenance events include metadata such as the file name, size, and transfer status:
- File Name: captured in the filename attribute.
- Transfer Status:
  - Success: the event is logged as SEND.
  - Failure: look for errors or failed processor logs (use a LogMessage processor to capture failure events in the flow).

5. Alternative: Push Logs to External Tools

You can push the provenance data to an external system for detailed analysis and reporting:
- Elasticsearch/Kibana: use the PutElasticsearch processor to send provenance events to Elasticsearch and visualize them in Kibana.
- Custom Script: use the ExecuteScript processor with a Python or Groovy script to extract, filter, and format the provenance data into a report.

6. Sample Workflow for Reporting

- Fetch provenance events for the desired period (via the Provenance query in the UI or a provenance reporting task).
- Filter for SEND events from the SFTP processor.
- Route successful and failed events to different processors (e.g., PutFile for saving logs).
- Format the report (CSV/JSON) using processors like UpdateAttribute and ConvertRecord.

By combining these steps, you can efficiently generate a report of all file transfers in a given period, including file names and transfer statuses.
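For step 2, here is a minimal post-processing sketch that turns a provenance CSV export into a per-file count of SEND events. This is only a sketch under assumptions: the file name provenance-export.csv, and the column names eventType and filename, are placeholders for whatever your actual export contains, and the naive comma split assumes no quoted commas in the data.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical post-processor for a provenance CSV export downloaded from the
// NiFi Provenance UI. Column names and layout are assumptions; adjust them to
// match the header row of your actual export.
public class ProvenanceCsvReport {
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of("provenance-export.csv"));
        String[] header = lines.get(0).split(",", -1);

        int eventTypeCol = indexOf(header, "eventType"); // assumed column name
        int filenameCol = indexOf(header, "filename");   // assumed column name

        // Count SEND events per file name, preserving first-seen order.
        Map<String, Integer> sendsPerFile = new LinkedHashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] cols = line.split(",", -1); // naive split; use a CSV library for quoted fields
            if (cols.length <= Math.max(eventTypeCol, filenameCol)) continue;
            if ("SEND".equalsIgnoreCase(cols[eventTypeCol].trim())) {
                sendsPerFile.merge(cols[filenameCol].trim(), 1, Integer::sum);
            }
        }

        sendsPerFile.forEach((file, count) -> System.out.printf("%s,%d%n", file, count));
    }

    private static int indexOf(String[] header, String name) {
        for (int i = 0; i < header.length; i++) {
            if (header[i].trim().equalsIgnoreCase(name)) return i;
        }
        throw new IllegalArgumentException("Column not found: " + name);
    }
}
```

For a production report, a proper CSV library would be the safer choice, since the split above breaks on quoted commas.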
12-20-2024
02:39 AM
1 Kudo
Thank you @SAMSAL (JOLT expert). The spec you posted worked properly, and I'm able to see the flat JSON as expected.
12-20-2024
01:32 AM
1 Kudo
TLS/SSL is configured correctly. Also, the krb5.conf is the same as in pre-prod, which is working fine. I think the problem partly has to do with our Python 3.8 installation: we did the installation via Anaconda, whereas Cloudera recommends using yum to install rh-python38 on our RHEL/OL 7. Documentation is here: Installing Python 3.8 standard package on RHEL 7 | CDP Private Cloud. Also, regarding Impala, this Cloudera documentation was quite helpful: Configuring Impala Web UI | CDP Public Cloud. The issue is now resolved by following the instructions in the above documentation.
12-19-2024
09:23 PM
1 Kudo
Hi @SAMSAL, This works really well. Thank you so much for your solution. I got your idea of splitting the JSON and performing the transformation in a SQL table. I will work on that. Thank you again.
12-19-2024
06:44 PM
@Shelton Thank you for your reply. This information is very helpful.
12-19-2024
05:02 PM
Please help me @Shelton: what is the maximum DataNode failure percentage? I tried installing 11 JournalNodes (JN) and 11 ZooKeeper (ZK) servers, but it didn't work. Out of 16 nodes, only 7 DataNodes can fail (active or dead); I need HA to keep running even with 8 DataNodes dead.
12-18-2024
08:50 AM
@drewski7 I have just picked up your ticket; I hope I can help you resolve this issue if it's still unresolved. There are a couple of configuration changes and implementation steps that have to be done.

1. Overview

OAuth allows Kafka clients to obtain access tokens from an external authentication provider (an OAuth provider) to authenticate with the Kafka broker. This process involves configuring the Kafka broker, the OAuth provider, and the Kafka clients.

2. Prerequisites

- Kafka cluster with SASL/OAUTHBEARER enabled.
- An OAuth provider set up to issue access tokens.
- Kafka clients that support SASL/OAUTHBEARER.
- Required libraries for OAuth integration (e.g. kafka-clients, oauth2-client, or Keycloak adapters).

3. Procedure

Step 1: Configure the OAuth Provider
- Set up an OAuth provider (e.g., Keycloak, Okta, etc.) to act as the identity provider (IdP).
- Register a new client application for Kafka in the OAuth provider:
  - Set up a client ID and client secret for Kafka clients.
  - Configure the scopes, roles, or claims required for authorization.
  - Enable grant types like Client Credentials or Password (depending on your use case).
- Note down the following details:
  - Authorization Server URL (e.g. https://authlogin.northwind.com/token).
  - Client ID and Client Secret.

Step 2: Configure the Kafka Broker
- Enable SASL/OAUTHBEARER authentication by editing the Kafka broker configuration (/config/server.properties):
  sasl.enabled.mechanisms=OAUTHBEARER
  listener.name.<listener-name>.oauthbearer.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \
    oauth.token.endpoint.uri="https://auth.example.com/token" \
    oauth.client.id="kafka-broker-client-id" \
    oauth.client.secret="kafka-broker-client-secret" \
    oauth.scope="kafka-scope";
  Replace <listener-name> with your listener name (SASL_PLAINTEXT, SASL_SSL) as appropriate.
- Configure ACLs (optional): if using authorization, configure ACLs to grant specific permissions to authenticated users.
- Restart the Kafka broker to apply the changes:
  sudo systemctl restart kafka

Step 3: Configure the Kafka Client
1. Add the required dependencies to your Kafka client application. For Java applications, add the Kafka and OAuth dependencies to your pom.xml or build.gradle. pom.xml example:
  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>3.0.0</version>
  </dependency>
  <dependency>
    <groupId>com.nimbusds</groupId>
    <artifactId>oauth2-oidc-sdk</artifactId>
    <version>9.4</version>
  </dependency>
2. Configure OAuth in the Kafka client by specifying the SASL mechanism and the OAuth token endpoint in the client configuration:
  Properties props = new Properties();
  props.put("bootstrap.servers", "broker1:9092,broker2:9092");
  props.put("security.protocol", "SASL_SSL");
  props.put("sasl.mechanism", "OAUTHBEARER");
  props.put("sasl.jaas.config",
      "org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required " +
      "oauth.token.endpoint.uri=\"https://auth.example.com/token\" " +
      "oauth.client.id=\"kafka-client-id\" " +
      "oauth.client.secret=\"kafka-client-secret\";");
3. Implement token retrieval (optional): use an external tool or library to retrieve and manage tokens if you need a custom implementation, e.g.
  curl -X POST -d "grant_type=client_credentials&client_id=kafka-client-id&client_secret=kafka-client-secret" \
    https://auth.example.com/token
4. Create the Kafka producer/consumer using the above configuration (see the end-to-end sketch after this post):
  KafkaProducer<String, String> producer = new KafkaProducer<>(props);

Step 4: Test the Authentication
- Produce and consume messages to verify OAuth-based authentication:
  kafka-console-producer.sh --broker-list <broker-address> --topic <topic-name> --producer.config <client-config>
  kafka-console-consumer.sh --bootstrap-server <broker-address> --topic <topic-name> --consumer.config <client-config>
- Ensure the logs indicate successful authentication using SASL/OAUTHBEARER.

Step 5: Monitor and Debug
- Check the Kafka broker logs for errors related to OAuth authentication.
- Verify token expiration and renewal mechanisms.
- Ensure the OAuth provider is reachable from the Kafka brokers and clients.

Happy Hadooping. I hope the above steps help in the diagnosis and resolution of your Kafka OAuth issue.
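To put Step 3 together, here is a minimal end-to-end producer sketch using the configuration from this post. It is a sketch under assumptions: the broker addresses, topic name, token endpoint, and client credentials are placeholders, and the oauth.* JAAS options are interpreted by whatever OAuth login library you bundle with the client (they are not part of vanilla kafka-clients), so an additional library-specific login callback handler setting may be required.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

// Minimal OAuth-authenticated producer sketch. All endpoint URLs, broker
// addresses, and credentials below are placeholders from the example above.
public class OAuthProducerExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "OAUTHBEARER");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required "
                        + "oauth.token.endpoint.uri=\"https://auth.example.com/token\" "
                        + "oauth.client.id=\"kafka-client-id\" "
                        + "oauth.client.secret=\"kafka-client-secret\";");
        // Depending on the OAuth library used, you may also need to set
        // sasl.login.callback.handler.class to its login callback handler;
        // that value is library-specific and omitted here.
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("test-topic", "key-1", "hello via OAUTHBEARER");
            // Block on the send so authentication failures surface immediately.
            RecordMetadata md = producer.send(record).get();
            System.out.printf("Wrote to %s-%d@%d%n", md.topic(), md.partition(), md.offset());
        }
    }
}
```

If the broker-side settings from Step 2 are wrong, the blocking send typically fails with an authentication-related exception, which is usually the quickest way to tell client and broker misconfiguration apart.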
12-17-2024
04:48 PM
1 Kudo
@Shelton Thank you for your advice. Since I use the latest version of NiFi, which requires Java 21, I added the following line to bootstrap.conf and confirmed that the warning messages disappeared:
java.arg.EnableNativeAccess=--enable-native-access=ALL-UNNAMED
I appreciate your help. Thank you,
12-17-2024
12:41 PM
1 Kudo
@JSSSS The error is this: "java.io.IOException: File /user/JS/input/DIC.txt._COPYING_ could only be written to 0 of the 1 minReplication nodes. There are 3 datanode(s) running and 3 node(s) are excluded in this operation."

According to the log, all 3 DataNodes are excluded: excludeNodes=[192.168.1.81:9866, 192.168.1.125:9866, 192.168.1.8>. With a replication factor of 3, the write should succeed to all 3 DataNodes, else the write fails. The cluster may have under-replicated or unavailable blocks because HDFS cannot use these excluded nodes, possibly due to:
- Disk space issues.
- Write errors or disk failures.
- Network connectivity problems between the NameNode and DataNodes.

1. Verify that the DataNodes are live and connected to the NameNode:
  hdfs dfsadmin -report
Look at the "Live nodes" and "Dead nodes" sections. If all 3 DataNodes are excluded, they might show up as dead or decommissioned.

2. Ensure the DataNodes have sufficient disk space for the write operation:
  df -h
Look at the HDFS data directories (e.g. /hadoop/hdfs/data). If disk space is full, clear unnecessary files or increase disk capacity:
  hdfs dfs -rm -r /path/to/old/unused/files

3. View the list of excluded nodes:
  cat $HADOOP_HOME/etc/hadoop/datanodes.exclude
If nodes are wrongly excluded, remove their entries from datanodes.exclude and refresh the NameNode to apply the changes:
  hdfs dfsadmin -refreshNodes

4. Block placement policy: if the cluster has DataNodes with specific restrictions (e.g., rack awareness), verify the block placement policy:
  grep dfs.block.replicator.classname $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Default: org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault

Once the nodes are back, a quick client-side write test with an explicit replication factor can confirm the fix (see the sketch after this post).

Happy Hadooping
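As a sanity check after clearing the excluded nodes, here is a minimal HDFS client sketch that attempts a small write with an explicit replication factor of 3 using the standard Hadoop FileSystem API. The NameNode URI and target path are placeholders; adjust them to your cluster.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Small write test: if this fails with the same "could only be written to 0 of
// the 1 minReplication nodes" message, the DataNodes are still being excluded.
public class HdfsWriteTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode URI; adjust host and port to your cluster.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode-host:8020"), conf);

        Path target = new Path("/tmp/replication-write-test.txt");
        short replication = 3;  // matches the cluster's replication factor of 3

        try (FSDataOutputStream out = fs.create(
                target, true, 4096, replication, fs.getDefaultBlockSize(target))) {
            out.write("replication test".getBytes(StandardCharsets.UTF_8));
        }

        // Report the replication factor recorded for the new file.
        System.out.println("Replication reported by NameNode: "
                + fs.getFileStatus(target).getReplication());
        fs.close();
    }
}
```

Running it with HADOOP_CONF_DIR on the classpath, so the real core-site.xml and hdfs-site.xml are picked up, is usually simpler than hard-coding the NameNode URI.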