Member since: 01-19-2017
Posts: 3647
Kudos Received: 621
Solutions: 364

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 150 | 12-22-2024 07:33 AM |
| | 98 | 12-18-2024 12:21 PM |
| | 384 | 12-17-2024 07:48 AM |
| | 290 | 08-02-2024 08:15 AM |
| | 3566 | 04-06-2023 12:49 PM |
01-02-2025
10:59 AM
@tuyen123 If you have installed other applications or dependencies for Spark, Hive, etc. that use a different version of protobuf, the conflict can cause issues with the block report.

1. Locate conflicting protobuf JARs:

    find $HADOOP_HOME -name "protobuf*.jar"

   Check whether multiple versions are present in $HADOOP_HOME/lib or other dependency paths.

2. Remove conflicting JARs. Keep only the protobuf JAR version that matches your Hadoop distribution (e.g. protobuf-java-2.5.0.jar). Alternatively, explicitly set the protobuf version in your CLASSPATH. If third-party libraries are included in your Hadoop environment, they might override the correct protobuf version. Open $HADOOP_HOME/etc/hadoop/hadoop-env.sh and prepend the correct protobuf library:

    export HADOOP_CLASSPATH=/path/to/protobuf-java-2.5.0.jar:$HADOOP_CLASSPATH

3. Verify the classpath:

    hadoop classpath | grep protobuf

   Ensure it includes the correct protobuf JAR.

A small shell sketch pulling these checks together follows below. Please try that and revert. Happy hadooping
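If it helps, here is a minimal shell sketch that combines the checks above. It assumes HADOOP_HOME is set, the hadoop CLI is on the PATH, and that the JARs follow the usual protobuf-java-<version>.jar naming; adjust the pattern if your distribution names them differently.

```bash
#!/usr/bin/env bash
# Sketch: list every protobuf JAR visible to Hadoop and the distinct versions found,
# so duplicates stand out at a glance.

echo "protobuf JARs under \$HADOOP_HOME:"
find "$HADOOP_HOME" -name 'protobuf-java-*.jar' 2>/dev/null | sort -u

echo
echo "protobuf entries on the effective Hadoop classpath:"
hadoop classpath | tr ':' '\n' | grep -i 'protobuf' | sort -u

echo
echo "Distinct protobuf versions detected:"
find "$HADOOP_HOME" -name 'protobuf-java-*.jar' 2>/dev/null \
  | sed 's/.*protobuf-java-\(.*\)\.jar/\1/' | sort -u
```

If more than one version shows up in the last list, remove or shadow the extras as described above.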
12-31-2024
09:47 AM
1 Kudo
@MrNicen This is a very common problem where the table gets stuck in a DISABLING state. Please try this series of diagnostic and repair steps.

1. Verify the current state:

    echo "scan 'hbase:meta'" | hbase shell

2. Try to force the table state change using HBCK2:

    # Download HBCK2 if not already present
    wget https://repository.apache.org/content/repositories/releases/org/apache/hbase/hbase-hbck2/2.0.2/hbase-hbck2-2.0.2.jar

    # Set table to ENABLED state
    hbase hbck -j ./hbase-hbck2-2.0.2.jar setTableState <table_name> ENABLED

3. If that doesn't work, try cleaning the znode:

    # Connect to ZooKeeper
    ./zkCli.sh -server localhost:2181

    # Check the table znode
    ls /hbase/table/<table_name>

    # Delete the table znode if present
    rmr /hbase/table/<table_name>

4. If the issue persists, try manually updating the meta table:

    hbase shell

    # Disable table
    disable '<table_name>'

    # Wait a few seconds, then enable
    enable '<table_name>'

    # If that fails, try force disable
    disable_all '<table_name>'

5. If still stuck, try these repair commands:

    # Clear the META table state
    echo "put 'hbase:meta', '<table_name>', 'table:state', '\x08\x00'" | hbase shell

    # Recreate the region assignments
    hbase hbck -j ./hbase-hbck2-2.0.2.jar assigns <table_name>

6. As a last resort, try a full cleanup (this wipes HBase metadata, so only do it on a cluster you can afford to rebuild):

    # Stop HBase
    ./bin/stop-hbase.sh

    # Clear ZooKeeper data
    ./zkCli.sh -server localhost:2181
    rmr /hbase

    # Remove the META directory
    rm -rf /hbase/data/hbase/meta

    # Start HBase
    ./bin/start-hbase.sh

    # Recreate the table structure
    hbase shell
    create '<table_name>', {NAME => 'cf'}   # Adjust column families as needed

If none of these steps work, we can try a more aggressive approach:

1. Back up your data:

    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot <snapshot_name> -copy-to hdfs://backup-cluster/hbase

2. Try a clean META rebuild:

    # Stop HBase
    ./bin/stop-hbase.sh

    # Clear META
    rm -rf /hbase/data/default/hbase/meta

    # Run the offline meta repair tool
    env HBASE_OPTS="-XX:+UseParNewGC -XX:+UseConcMarkSweepGC" ./bin/hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair

    # Start HBase normally
    ./bin/start-hbase.sh

Additional troubleshooting tips:

- Check HBase logs for specific errors:

    tail -f /var/log/hbase/hbase-master.log

- Verify cluster health:

    hbase hbck -details

- Monitor region transitions:

    echo "scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}" | hbase shell

There is a small wrapper sketch for step 2 just below. If you encounter any specific errors during these steps, please share them and I can provide more targeted solutions.
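As a convenience, here is a rough wrapper around the HBCK2 step that shows the table state recorded in hbase:meta before and after the change. It assumes the hbase CLI is on the PATH, HBase 2.x (so `hbase shell -n` is available), and that hbase-hbck2-2.0.2.jar sits in the current directory; the table name is passed as an argument.

```bash
#!/usr/bin/env bash
# Sketch: show a table's state in hbase:meta, force it to ENABLED with HBCK2, show it again.
# TABLE and HBCK2_JAR are placeholders - adjust for your cluster.

TABLE="${1:?usage: $0 <table_name>}"
HBCK2_JAR="./hbase-hbck2-2.0.2.jar"

echo "Current state recorded in hbase:meta for ${TABLE}:"
echo "scan 'hbase:meta', {FILTER => \"PrefixFilter('${TABLE}')\", COLUMNS => 'table:state'}" | hbase shell -n

echo "Forcing table state to ENABLED via HBCK2..."
hbase hbck -j "${HBCK2_JAR}" setTableState "${TABLE}" ENABLED

echo "State after the change:"
echo "scan 'hbase:meta', {FILTER => \"PrefixFilter('${TABLE}')\", COLUMNS => 'table:state'}" | hbase shell -n
```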
12-25-2024
03:57 PM
1 Kudo
@Roomka This looks like an old post, but I'll try to answer all the same and hope it's still relevant to others too. The challenges you're facing with Apache NiFi's development life cycle stem from its design, which does not fully separate code/logic from environment-specific configuration. To address this and create a robust process for porting flows from dev to QA to prod, consider the following solutions; a small NiFi Toolkit CLI sketch follows after the list.

1. Use Parameter Contexts for Configuration Management
NiFi supports Parameter Contexts, which can be used to externalize environment-specific configurations. This allows you to separate logic from environment-specific details.
Steps:
- Define Parameter Contexts for each environment (e.g., DevContext, QAContext, ProdContext).
- Externalize configurations such as number of threads, cron schedules, database connection strings, and API endpoints.
- When deploying to a new environment, import the flow without the environment-specific Parameter Contexts, then assign the appropriate Parameter Context for that environment.

2. Use NiFi Registry for Flow Versioning and Promotion
The NiFi Registry provides a way to version control your flows and manage deployments across environments.
Steps:
- Set up NiFi Registry and connect your NiFi instances to it.
- Use the Registry to version your flows.
- Promote flows from dev to QA to prod by exporting/importing them through the Registry.
- In each environment, override parameters using the appropriate Parameter Context.

3. Handle Environment-Specific Differences with External Configuration Management
If Parameter Contexts are insufficient, consider externalizing configurations entirely using tools like Consul, AWS Parameter Store, or environment variables.
Steps:
- Store all environment-specific configurations in an external tool.
- Use a custom script or a NiFi processor to fetch configurations dynamically at runtime.
This ensures the flow logic remains the same across environments, and only the external configurations vary.

4. Adopt Best Practices for Flow Design
To reduce the impact of embedded environment-specific details, follow these design principles:
- Avoid hardcoding resource-intensive configurations like thread counts or cron schedules into the flow.
- Use NiFi Variables or Parameters wherever possible to make configurations flexible.
- Split flows into smaller, reusable components to reduce complexity and improve maintainability.

5. Use Deployment Automation
Automate the deployment process to ensure consistency and reduce manual errors:
- Use tools like Ansible, Terraform, or Jenkins to automate flow deployment.
- Include steps to set up Parameter Contexts or fetch external configurations as part of the deployment pipeline.

6. Mitigating the 12-Factor Principles Concern
While NiFi isn't designed to fully adhere to 12-factor principles, you can adapt your processes to bridge the gap:
- Codebase: manage flow versions centrally using NiFi Registry.
- Config: externalize environment-specific configurations using Parameter Contexts or external configuration management tools.
- Build, Release, Run: standardize your flows and deployment pipeline across environments.
- Disposability: test flows rigorously in QA to ensure they can handle unexpected failures gracefully.

Hope these points give you a better picture and possibly an answer. Happy hadooping
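To make the promotion step concrete, here is a rough sketch of what it can look like with the NiFi Toolkit CLI (cli.sh). All identifiers, the QA URL, and the toolkit path are placeholders, and command/flag names vary slightly between toolkit versions, so verify them with `./cli.sh nifi -h` before relying on this.

```bash
#!/usr/bin/env bash
# Sketch of one promotion step: import a versioned flow from NiFi Registry into the QA
# instance and attach the QA parameter context. Check flags against your toolkit version.

CLI=/opt/nifi-toolkit/bin/cli.sh               # assumption: toolkit install location
NIFI_URL=https://qa-nifi.example.com:8443      # assumption: target (QA) NiFi base URL
BUCKET_ID=<registry-bucket-id>
FLOW_ID=<registry-flow-id>
FLOW_VERSION=<version-to-deploy>
PG_ID=<process-group-id-after-import>

# 1. Import the versioned flow from NiFi Registry into the QA instance
"$CLI" nifi pg-import -u "$NIFI_URL" -b "$BUCKET_ID" -f "$FLOW_ID" -fv "$FLOW_VERSION"

# 2. List parameter contexts, then attach the QA context to the imported process group
"$CLI" nifi list-param-contexts -u "$NIFI_URL"
"$CLI" nifi pg-set-param-context -u "$NIFI_URL" -pgid "$PG_ID" -pcid <qa-context-id>
```

The same two steps, pointed at the prod URL and prod context ID, become your prod deployment, which is what keeps the flow logic identical across environments.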
12-22-2024
09:45 AM
1 Kudo
@rsurti The issue described suggests a mismatch or misconfiguration in the SAML integration between NiFi and NGINX. The following analysis and potential solutions should address your findings.

SAML payload issues:
- Empty Recipient value: the Recipient in the SAML assertion should match the ACS (Assertion Consumer Service) URL configured in NiFi. If it is empty, this indicates a misconfiguration in the SAML IdP (OneLogin).
- Cookie and InResponseTo mismatch: the InResponseTo attribute in the SAML response should correspond to the SAML request identifier issued by NiFi. If the cookie storing the SAML request ID is missing or mismatched, authentication fails.

NiFi error "SAML Authentication Request Identifier Cookie not found":
This suggests that the browser is not sending back the SAML request ID cookie, or NiFi cannot recognize it. This could happen if:
- The cookie is not set, or is overwritten by NGINX.
- The cookie is being blocked or dropped due to cross-domain or SameSite restrictions.
- NGINX is misconfigured to handle or forward SAML cookies.

Probable causes:
- NiFi configuration: misconfigured nifi.security.user.saml properties in nifi.properties; ACS URL mismatch between NiFi and OneLogin.
- NGINX configuration: improper handling of cookies, particularly the SAML request identifier cookie; incorrect forwarding of headers or paths for SAML requests and responses.
- OneLogin configuration: the SAML application in OneLogin is not configured to provide a valid Recipient or ACS URL; mismatched SAML settings such as entity ID, ACS URL, or signature settings.

Steps to resolve

1. Verify and Update the NiFi Configuration
Ensure the nifi.properties file has the correct SAML configuration:

    nifi.security.user.saml.idp.metadata.url=<OneLogin SAML Metadata URL>
    nifi.security.user.saml.sp.entity.id=<NiFi Entity ID>
    nifi.security.user.saml.sp.base.url=https://<nifi-url>   # Same as what users access
    nifi.security.user.saml.authentication.expiration=12 hours
    nifi.security.user.saml.request.identifier.name=nifi-request-id

The nifi.security.user.saml.sp.base.url must match the Recipient value in the SAML response.

2. Check the OneLogin SAML Connector Configuration
Ensure the Recipient value in OneLogin matches the NiFi ACS URL:
- ACS URL: https://<nifi-url>/nifi-api/access/saml/login/consumer
Verify that the SAML settings in OneLogin include:
- Audience (Entity ID): matches nifi.security.user.saml.sp.entity.id.
- ACS URL: matches nifi.security.user.saml.sp.base.url.

3. Debug and Adjust the NGINX Configuration
Ensure NGINX is not interfering with SAML cookies:

    proxy_pass https://<nifi-host>:9444;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-Proto https;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_cookie_path / "/; SameSite=None; Secure";

Add debug logging to check whether cookies are being forwarded correctly.

4. Troubleshoot Cookie Handling
- Check the browser developer tools (under Application > Cookies) to verify that the SAML request identifier cookie is being set and returned.
- Ensure the SameSite=None and Secure flags are set for the cookies.

5. Check the SAML Logs for Errors
In the nifi-user.log file, look for details on the failed SAML authentication, including missing cookies and InResponseTo mismatches.

6. Test the Flow
After making the adjustments:
- Clear browser cookies.
- Initiate the SAML login process from the NiFi GUI.
- Check whether the Recipient and InResponseTo values align in the SAML assertion and request.

Before that, enable detailed logging in NiFi for SAML authentication by modifying logback.xml:

    <logger name="org.apache.nifi.web.security.saml" level="DEBUG" />

Then use a SAML debugging tool like SAML-tracer (browser extension) to inspect the SAML request/response flows.

Let me know if you need further assistance! Happy hadooping
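A quick command-line sanity check can narrow this down before you dig into SAML-tracer. The sketch below is only an assumption-laden starting point: the OneLogin metadata URL, the NiFi base URL, and the metadata path are placeholders you must replace, and the SP metadata endpoint path should be confirmed against your NiFi version's documentation.

```bash
#!/usr/bin/env bash
# Sketch: compare what the IdP advertises with what NiFi exposes through the proxy,
# and check which cookies come back on the first request.

METADATA_URL="https://app.onelogin.com/saml/metadata/<app-id>"   # placeholder
NIFI_URL="https://<nifi-url>"                                    # placeholder

# 1. What the IdP advertises (entity ID and endpoint locations)
curl -sk "$METADATA_URL" | grep -oE 'entityID="[^"]+"|Location="[^"]+"' | sort -u

# 2. Does NiFi's SAML SP metadata answer through the NGINX proxy?
curl -skI "$NIFI_URL/nifi-api/access/saml/metadata" | head -n 5

# 3. Which cookies does the proxy set on the initial request (look for the request-identifier cookie)
curl -skI "$NIFI_URL/nifi/" | grep -i 'set-cookie'
```

If step 3 shows the cookie without Secure/SameSite=None, or not at all, the NGINX cookie handling in step 3 of the answer above is the place to fix first.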
12-22-2024
07:33 AM
1 Kudo
@Emery I think this should resolve your problem. Change nifi.web.https.host as shown below so it binds to all network interfaces, allowing access from other machines in your intranet:

    nifi.web.https.host=ourMacMini20  -->  nifi.web.https.host=0.0.0.0

Browser Trust for Self-Signed Certificates
- Problem: if you're using a self-signed certificate, browsers on other machines may block access or show warnings.
- Solution: install the certificate from the NiFi server in the client machines' trusted certificate store. Alternatively, use a certificate from a trusted Certificate Authority (CA).

Please let me know if that helped. Happy hadooping
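For the certificate-trust part, here is a small sketch of how a client machine can fetch the NiFi server's self-signed certificate and trust it for JVM-based clients. The host name, port, and file names are assumptions; browsers use the operating system trust store instead, so import the same .crt file there (Keychain Access on macOS, certmgr on Windows).

```bash
#!/usr/bin/env bash
# Sketch: grab NiFi's self-signed certificate and import it into a client-side Java truststore.

NIFI_HOST=ourMacMini20   # the NiFi server as seen from the client
NIFI_PORT=8443           # assumption - use your nifi.web.https.port value

# 1. Pull the certificate the server presents
openssl s_client -connect "${NIFI_HOST}:${NIFI_PORT}" -servername "${NIFI_HOST}" </dev/null 2>/dev/null \
  | openssl x509 -outform PEM > nifi-server.crt

# 2. Import it into a Java truststore for JVM-based clients
keytool -importcert -noprompt -trustcacerts \
  -alias nifi-server \
  -file nifi-server.crt \
  -keystore client-truststore.jks \
  -storepass changeit
```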
12-22-2024
05:33 AM
1 Kudo
@polingsky202 To configure HAProxy in front of three Kafka brokers with Kerberos authentication and resolve the "Authentication failed due to invalid credentials with SASL mechanism GSSAPI" error, follow these steps:

Step 1: Review the Current Configuration
The provided configuration shows:
- HAProxy is configured for load balancing using roundrobin.
- Kafka brokers are set up with advertised.listeners and listeners for internal and LB connections, and SASL GSSAPI configured with Kerberos.
Likely causes of the issue:
- Kerberos principal or keytab file mismatch.
- Improper mapping of advertised listener names.
- Client-side misconfiguration for Kerberos authentication.

Step 2: Correct and Optimize the HAProxy Configuration
Update the HAProxy configuration to pass Kerberos authentication through to the Kafka brokers.
Updated haproxy.cfg:

    listen kafka
        bind *:6677
        mode tcp
        balance roundrobin
        option tcp-check
        server kafka1 kafka-1.kafka.net:6668 check
        server kafka2 kafka-2.kafka.net:6669 check
        server kafka3 kafka-3.kafka.net:6666 check

Key updates in the HAProxy config file:
- mode tcp: ensures TCP passthrough for Kerberos authentication.
- option tcp-check: validates backend server availability.

Step 3: Verify the Kafka Broker Configuration
Ensure the Kerberos configuration for each broker is consistent and properly aligned.
Key points:
- advertised.listeners: ensure the LB listener matches the address clients will connect to via HAProxy (e.g. gateway.kafka.net).
- Kerberos JAAS configuration: validate the listener.name.LB.gssapi.sasl.jaas.config entry for all brokers, and ensure the keytab file exists with correct permissions:

    ls -l /etc/security/keytabs/kafka.service.keytab

Example updated kafka1 broker configuration:

    advertised.listeners=INTERNAL://:6667,LB://gateway.kafka.net:6668
    listeners=INTERNAL://:6667,LB://:6668
    listener.security.protocol.map=INTERNAL:SASL_PLAINTEXT,LB:SASL_PLAINTEXT
    inter.broker.listener.name=INTERNAL
    listener.name.LB.gssapi.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
        doNotPrompt=true useKeyTab=true storeKey=true \
        keyTab="/etc/security/keytabs/kafka.service.keytab" \
        principal="kafka/gateway.kafka.net@KAFKA.NET";

Repeat similar updates for kafka2 and kafka3 with their respective listener ports.

Step 4: Update the Kerberos Configuration
Ensure the Kerberos configuration is consistent across all systems.
1. Validate krb5.conf and ensure the file includes the correct realm and KDC information:

    [libdefaults]
      default_realm = KAFKA.NET

    [realms]
      KAFKA.NET = {
        kdc = your-kdc-host
        admin_server = your-kdc-admin-host
      }

2. Test the Kerberos principal and verify it works with the keytab:

    kinit -kt /etc/security/keytabs/kafka.service.keytab kafka/gateway.kafka.net@KAFKA.NET

Step 5: Verify the Client Configuration
The client is attempting to authenticate with Kerberos, so ensure the producer properties are configured correctly. Updated producer command:

    /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh \
      --topic my-topic \
      --broker-list gateway.kafka.net:6677 \
      --producer-property security.protocol=SASL_PLAINTEXT \
      --producer-property sasl.kerberos.service.name=kafka

Key properties:
- security.protocol=SASL_PLAINTEXT: enables SASL (Kerberos) authentication over plaintext transport.
- sasl.kerberos.service.name=kafka: matches the Kerberos principal's service name.

Step 6: Test and Troubleshoot
1. Enable debug logging: add -Dsun.security.krb5.debug=true to the JVM options for the client to debug Kerberos issues:

    export KAFKA_OPTS="-Dsun.security.krb5.debug=true"

2. Check logs:
- On the client side, check for detailed Kerberos errors in the output.
- On the Kafka brokers, inspect the logs for authentication errors:

    less /var/log/kafka/server.log

3. Verify connectivity: use telnet or nc to confirm connectivity to HAProxy and the brokers:

    telnet gateway.kafka.net 6677
    telnet kafka-1.kafka.net 6668

Final checklist:
- Ensure all brokers have consistent Kerberos configurations.
- Verify the client-side security.protocol and sasl.kerberos.service.name settings.
- Ensure HAProxy uses TCP passthrough (mode tcp) for Kerberos.

With these adjustments, the Kerberos authentication error should be resolved; a small end-to-end client check follows below. Let me know if further clarification is needed! Happy hadooping
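To tie steps 4-6 together, here is a rough client-side check that gets a ticket, confirms the load balancer is reachable, and pushes one test record through HAProxy. The keytab path, principal, topic, and HDP script location are taken from the post and are assumptions for your environment; the client properties file uses the ticket cache populated by kinit.

```bash
#!/usr/bin/env bash
# Sketch: end-to-end client check - Kerberos ticket, TCP connectivity, one test produce.

# 1. Obtain a ticket for the user that will run the producer
kinit -kt /etc/security/keytabs/kafka.service.keytab kafka/gateway.kafka.net@KAFKA.NET
klist

# 2. Confirm the HAProxy front end is reachable
nc -vz gateway.kafka.net 6677

# 3. Client properties using the ticket cache obtained above
cat > /tmp/client-gssapi.properties <<'EOF'
security.protocol=SASL_PLAINTEXT
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka
sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required useTicketCache=true;
EOF

# 4. Send a single test record through the load balancer
echo "haproxy-kerberos-test-$(date +%s)" | \
  /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh \
    --topic my-topic \
    --broker-list gateway.kafka.net:6677 \
    --producer.config /tmp/client-gssapi.properties
```

If this succeeds but your application still fails, the difference is almost always the JAAS/keytab configuration on the application side.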
12-20-2024
07:52 AM
2 Kudos
@nifier This is a three-week-old posting, but I still hope it helps resolve your reporting task. NiFi has a built-in Data Provenance feature that tracks the lineage of data as it moves through the flow. To capture file transfer details:

1. Enable Provenance Reporting in NiFi
Provenance events: NiFi records events such as SEND, RECEIVE, DROP, and ROUTE. For SFTP file transfers, look for SEND events.
Steps to enable provenance reporting:
- Log in to the NiFi UI.
- Go to the Provenance tab (accessible from the top menu).
- Configure the provenance repository in nifi.properties to retain sufficient history:

    nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
    nifi.provenance.repository.max.storage.time=30 days
    nifi.provenance.repository.max.storage.size=1 GB

Ensure max.storage.time or max.storage.size is configured to retain events for the desired reporting period.

2. Query Provenance Data
You can query and filter provenance events to generate the report:
- Go to the Provenance tab in the NiFi UI.
- Filter the events using criteria:
  - Component Name: filter for the SFTP processor (PutSFTP).
  - Event Type: select SEND.
  - Date Range: specify the desired time frame.
- Download the filtered results as a CSV file.

3. Automate Reporting with NiFi Reporting Tasks
To generate periodic reports automatically, use the SiteToSiteProvenanceReportingTask:
- In the NiFi canvas, navigate to Controller Settings.
- Add a new Reporting Task and select SiteToSiteProvenanceReportingTask.
- Configure the Reporting Task to specify the target location for the report and filter for SEND events related to your file transfer processors.
- Schedule the Reporting Task to run periodically (e.g., daily or weekly).

4. Include File Name and Transfer Status
NiFi provenance events include metadata such as the file name, size, and transfer status:
- File name: captured in the filename attribute.
- Transfer status:
  - Success: the event is logged as SEND.
  - Failure: look for errors or failed-processor logs (use a LogMessage processor to capture failure events in the flow).

5. Alternative: Push Logs to External Tools
You can push the provenance data to an external system for detailed analysis and reporting:
- Elasticsearch/Kibana: use the PutElasticsearch processor to send provenance events to Elasticsearch and visualize them in Kibana.
- Custom script: use the ExecuteScript processor to write a Python or Groovy script that extracts, filters, and formats the provenance data into a report.

6. Sample Workflow for Reporting
- Fetch provenance events for the desired period (via the Provenance UI export or the SiteToSiteProvenanceReportingTask above).
- Filter for SEND events from the SFTP processor.
- Route successful and failed events to different processors (PutFile for saving logs).
- Format the report (CSV/JSON) using processors like UpdateAttribute and ConvertRecord.

By combining these steps, you can efficiently generate a report of all file transfers in the given period, including file names and transfer statuses.
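If you go the CSV-export route from step 2, a small script can turn the export into a per-file transfer summary. This is only a sketch: the column names ("Event Type", "Filename") are assumptions based on typical provenance exports, so check the header row of your own file and note that a naive comma split breaks if field values contain embedded commas.

```bash
#!/usr/bin/env bash
# Sketch: summarize SEND events per file from a provenance CSV exported from the NiFi UI.

CSV="${1:?usage: $0 <provenance-export.csv>}"

# Locate the columns by header name so the script survives column reordering
type_col=$(head -n1 "$CSV" | tr ',' '\n' | grep -n -i 'event type' | head -n1 | cut -d: -f1)
file_col=$(head -n1 "$CSV" | tr ',' '\n' | grep -n -i 'filename'   | head -n1 | cut -d: -f1)

echo "SEND events (successful transfers) per file:"
awk -F',' -v t="$type_col" -v f="$file_col" \
  'NR > 1 && $t ~ /SEND/ { gsub(/"/, "", $f); print $f }' "$CSV" \
  | sort | uniq -c | sort -rn
```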
12-19-2024
11:07 PM
1 Kudo
@enam I see you have used the correct processor: InvokeHTTP is the one to use for sending API requests and handling responses, including error codes such as 400 and 500.

Configuration for InvokeHTTP:
- HTTP Method: POST
- Remote URL: http://192.168.200.162:2031/nostroliquidity........
- Return codes: by default only 2xx responses are routed to the Response relationship.
- To capture error responses as well, set "Always Output Response" to true. The processor then outputs the response regardless of status code, and the response metadata is available in attributes such as invokehttp.status.code and invokehttp.status.message.

Route responses based on status code
Use the RouteOnAttribute processor to differentiate between success and error responses, and add two conditions:
- Success route: invokehttp.status.code >= 200 AND invokehttp.status.code < 300.
- Error route: invokehttp.status.code >= 400.
Connect the InvokeHTTP processor's Response relationship to the RouteOnAttribute processor.

Write responses to the appropriate locations
Use a PutFile or PutHDFS processor for both the success and error routes:
- Success route: write successful responses to a specific directory.
- Error route: write error responses (400, 500) to a separate directory, including response details for debugging.

    GenerateFlowFile --> InvokeHTTP --> RouteOnAttribute --> [Success] PutFile
                                                         --> [Error]   PutFile

Important configuration notes:
- Failure handling in InvokeHTTP: connect the InvokeHTTP processor's Original relationship to a LogMessage processor or another flow to avoid losing the original flowfile.
- Customize filenames or attributes: use UpdateAttribute to set filenames or directories dynamically based on attributes like invokehttp.status.code or invokehttp.status.message.
- Capture full API responses: ensure the response body from the API is written as the content of the output file.

Can you run the above flow and revert? Happy hadooping
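Before wiring up the flow, it can help to probe the API directly so you know which status codes the routing must handle. The sketch below is a hand-rolled check, not part of the flow: the URL is the truncated one from the post (adjust it), and sample-request.json is a hypothetical request body.

```bash
#!/usr/bin/env bash
# Sketch: POST to the target API once and report which route the flow would take.

URL="http://192.168.200.162:2031/nostroliquidity"   # truncated in the post - adjust
PAYLOAD="sample-request.json"                        # hypothetical request body

# Print only the HTTP status code for the same POST that InvokeHTTP will make
code=$(curl -s -o /tmp/api-response.body -w '%{http_code}' \
  -H 'Content-Type: application/json' \
  -X POST --data @"$PAYLOAD" "$URL")

echo "API returned HTTP $code"
if [ "$code" -ge 200 ] && [ "$code" -lt 300 ]; then
  echo "-> would follow the Success route"
else
  echo "-> would follow the Error route (body saved to /tmp/api-response.body)"
fi
```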
12-19-2024
09:08 AM
@Velankanni If you are still having the problem, can you try this JOLT spec? (Remember that the spoiler tag distorts the JSON.)

JOLT Spec:

    [
      {
        "operation": "shift",
        "spec": {
          "data": {
            "getItemListing": {
              "edges": {
                "*": {
                  "node": {
                    "identifier": "[&1].identifier",
                    "parentItems": {
                      "*": {
                        "parentIdentifier": "[&3].[&1].parentIdentifier"
                      }
                    },
                    "treatmentClusterIds": {
                      "*": {
                        "metadata": {
                          "*": {
                            "treatmentClusterIDs": "[&4].[&2].[&1].TreatmentId"
                          }
                        },
                        "element": {
                          "name": "[&4].[&2].[&1].Element",
                          "labelingClusters": {
                            "*": {
                              "labelingCluster": "[&5].[&3].[&2].LabelingCluster"
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      },
      {
        "operation": "shift",
        "spec": {
          "*": {
            "*": {
              "*": {
                "*": {
                  "*": {
                    "*": {
                      "$": "[#6].identifier",
                      "parentIdentifier": "[#6].parentIdentifier",
                      "Element": "[#6].Element",
                      "TreatmentId": "[#6].TreatmentId",
                      "LabelingCluster": "[#6].LabelingCluster"
                    }
                  }
                }
              }
            }
          }
        }
      }
    ]

Place the input JSON in a file input.json, use a JOLT processor, apply the spec above, and verify the output matches the expected output JSON format. Happy hadooping
12-19-2024
08:44 AM
@Abhijith_Nayak To achieve the desired behavior where Impala queries automatically run in the respective resource pools based on the user, you can configure Impala Admission Control to handle this routing seamlessly. Here's how you can implement it.

1. Enable Resource Pools in Cloudera Manager
Ensure that Dynamic Resource Pools are enabled in Cloudera Manager:
- Go to Cloudera Manager > Impala > Configuration > Admission Control.
- Enable Admission Control if it isn't already.

2. Create Resource Pools
You already have two resource pools (resource_pool_1 and resource_pool_2). Ensure these pools are properly configured:
- resource_pool_1: for users A, B, and C.
- resource_pool_2: for users D and E.
Verify the following settings for each pool:
- Memory and CPU resources are allocated appropriately for the expected workloads.
- Query concurrency limits are set based on your cluster's capacity.

3. Configure Submission Access Control
Map the users to their respective resource pools:
- Navigate to Cloudera Manager > Impala > Configuration > Admission Control > Resource Pools.
- For resource_pool_1, under Submission Access Control, add: A,B,C
- For resource_pool_2, under Submission Access Control, add: D,E
This ensures that only the specified users can submit queries to their designated pools.

4. Configure Default Resource Pool Selection
Use the pool mapping configuration to automatically route queries based on the user. This eliminates the need for users to specify the pool explicitly when submitting queries.
Steps:
- Navigate to Cloudera Manager > Impala > Configuration > Admission Control > Pool Mapping Rules.
- Add rules to map users to their pools:

    user:A pool:resource_pool_1
    user:B pool:resource_pool_1
    user:C pool:resource_pool_1
    user:D pool:resource_pool_2
    user:E pool:resource_pool_2

5. Validate the Configuration
- Restart the Impala services to apply the changes.
- Run test queries as each user to confirm the routing works as expected: log in as user A, B, C, D, and E and execute queries without specifying the resource pool.
- Use the following command in the Impala shell to check the assigned resource pool for a query:

    PROFILE;

Look for the admission result section of the profile to verify the query ran in the correct pool. A small shell sketch for this check follows below.

Expected outcome:
- Users A, B, and C will have their queries automatically routed to resource_pool_1.
- Users D and E will have their queries automatically routed to resource_pool_2.
- No manual pool specification will be required during query submission.

This configuration ensures proper workload isolation and efficient resource utilization in your cluster. Let me know if further clarification is needed. Happy hadooping!!!
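Here is a rough validation sketch for step 5 that can be run once per test user. The coordinator host is a placeholder, the cluster is assumed to be Kerberized (hence -k), and the "Request Pool"/"Admission result" labels are what recent Impala profiles typically contain, so compare against your version's PROFILE output.

```bash
#!/usr/bin/env bash
# Sketch: run a trivial query as the currently authenticated user, print the profile,
# and grep for the pool the query was admitted to.

IMPALAD=impalad-host.example.com   # assumption - any Impala coordinator in your cluster

# -p / --show_profiles prints the query profile after each statement
impala-shell -k -i "$IMPALAD" -p -q "SELECT 1" 2>/dev/null \
  | grep -Ei 'request pool|admission result'
```

Run this after `kinit` as user A and again as user D; the grep output should name resource_pool_1 and resource_pool_2 respectively if the mapping rules are working.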