Member since: 01-19-2017
Posts: 3647
Kudos Received: 621
Solutions: 364
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 150 | 12-22-2024 07:33 AM |
| | 98 | 12-18-2024 12:21 PM |
| | 384 | 12-17-2024 07:48 AM |
| | 290 | 08-02-2024 08:15 AM |
| | 3566 | 04-06-2023 12:49 PM |
01-02-2025
10:59 AM
@tuyen123 If you have installed other applications or dependencies for Spark, Hive, etc. that use a different version of protobuf, the conflict can cause issues with the block report.

1. Locate conflicting protobuf JARs:
find $HADOOP_HOME -name "protobuf*.jar"
Check whether multiple versions are present in $HADOOP_HOME/lib or other dependency paths.

2. Remove conflicting JARs. Keep only the protobuf JAR version that matches your Hadoop distribution (e.g. protobuf-java-2.5.0.jar). Alternatively, explicitly set the protobuf version in your CLASSPATH. If third-party libraries are included in your Hadoop environment, they might override the correct protobuf version. Open $HADOOP_HOME/etc/hadoop/hadoop-env.sh and prepend the correct protobuf library:
export HADOOP_CLASSPATH=/path/to/protobuf-java-2.5.0.jar:$HADOOP_CLASSPATH

3. Verify the classpath:
hadoop classpath | grep protobuf
Ensure it includes the correct protobuf JAR.

Please try that and revert. Happy hadooping
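A minimal shell sketch of the checks above, assuming a standard $HADOOP_HOME layout and GNU find; the protobuf-java-2.5.0.jar path in step 2 remains an example, not a required version:

```bash
#!/usr/bin/env bash
# Sketch: list every protobuf JAR bundled with Hadoop and flag duplicate versions.
# Assumes HADOOP_HOME is set and GNU find is available.
set -euo pipefail

echo "protobuf JARs under $HADOOP_HOME:"
find "$HADOOP_HOME" -name "protobuf*.jar" -print

versions=$(find "$HADOOP_HOME" -name "protobuf-java-*.jar" -printf "%f\n" | sort -u)
count=$(echo "$versions" | grep -c . || true)

if [ "$count" -gt 1 ]; then
  echo "WARNING: multiple protobuf versions found:"
  echo "$versions"
else
  echo "Single protobuf version found: $versions"
fi

# Confirm what the Hadoop runtime classpath actually resolves to:
hadoop classpath | tr ':' '\n' | grep -i protobuf || echo "no protobuf entry on the classpath"
```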
12-31-2024
09:47 AM
1 Kudo
@MrNicen This is a very common problem where the table gets stuck in a DISABLING state. Please work through these diagnostic and repair steps in order.

1. Verify the current state:
echo "scan 'hbase:meta'" | hbase shell

2. Force the table state change using HBCK2:
# Download HBCK2 if not already present
wget https://repository.apache.org/content/repositories/releases/org/apache/hbase/hbase-hbck2/2.0.2/hbase-hbck2-2.0.2.jar
# Set table to ENABLED state
hbase hbck -j ./hbase-hbck2-2.0.2.jar setTableState <table_name> ENABLED

3. If that doesn't work, try cleaning the znode:
# Connect to ZooKeeper
./zkCli.sh -server localhost:2181
# Check the table znode
ls /hbase/table/<table_name>
# Delete the table znode if present
rmr /hbase/table/<table_name>

4. If the issue persists, try manually updating the table state from the HBase shell:
hbase shell
# Disable table
disable '<table_name>'
# Wait a few seconds, then enable
enable '<table_name>'
# If that fails, disable_all disables every table matching the pattern
disable_all '<table_name>'

5. If still stuck, try these repair commands:
# Clear the META table state
echo "put 'hbase:meta', '<table_name>', 'table:state', '\x08\x00'" | hbase shell
# Reassign the regions
hbase hbck -j ./hbase-hbck2-2.0.2.jar assigns <table_name>

6. As a last resort, try a full cleanup:
# Stop HBase
./bin/stop-hbase.sh
# Clear ZooKeeper data
./zkCli.sh -server localhost:2181
rmr /hbase
# Remove the META directory
rm -rf /hbase/data/hbase/meta
# Start HBase
./bin/start-hbase.sh
# Recreate the table structure (adjust column families as needed)
hbase shell
create '<table_name>', {NAME => 'cf'}

If none of these steps work, we can try a more aggressive approach.

Back up your data first:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot <snapshot_name> -copy-to hdfs://backup-cluster/hbase

Then try a clean META rebuild:
# Stop HBase
./bin/stop-hbase.sh
# Clear META
rm -rf /hbase/data/default/hbase/meta
# Start HBase in repair mode
env HBASE_OPTS="-XX:+UseParNewGC -XX:+UseConcMarkSweepGC" ./bin/hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
# Start HBase normally
./bin/start-hbase.sh

Additional troubleshooting tips:
Check HBase logs for specific errors: tail -f /var/log/hbase/hbase-master.log
Verify cluster health: hbase hbck -details
Monitor region transitions: echo "scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}" | hbase shell

If you encounter any specific errors during these steps, please share them and I can provide more targeted solutions.
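A small wrapper sketch for steps 1 and 2, assuming the HBCK2 jar has already been downloaded to the working directory; <table_name> stays a placeholder, and the shell's -n (non-interactive) flag and ROWPREFIXFILTER option may vary with your HBase version:

```bash
#!/usr/bin/env bash
# Sketch: show the table's current state in hbase:meta, then force it to ENABLED with HBCK2.
# Assumes hbase is on the PATH and hbase-hbck2-2.0.2.jar is in the current directory.
set -euo pipefail

TABLE="${1:?usage: $0 <table_name>}"
HBCK2_JAR="./hbase-hbck2-2.0.2.jar"

echo "Current state of ${TABLE} in hbase:meta:"
echo "scan 'hbase:meta', {ROWPREFIXFILTER => '${TABLE}'}" | hbase shell -n

echo "Forcing table state to ENABLED via HBCK2..."
hbase hbck -j "${HBCK2_JAR}" setTableState "${TABLE}" ENABLED

echo "State after HBCK2:"
echo "scan 'hbase:meta', {ROWPREFIXFILTER => '${TABLE}'}" | hbase shell -n
```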
12-25-2024
09:50 PM
1 Kudo
Hi @Shelton, sorry for the late reply. Here is my JSON response 400: Here is the InvokeHTTP processor: I have used the following properties in InvokeHTTP: Always Output Response: set to true. Output Response Attributes: 400 and 500. Here is RouteOnAttribute: Is it like that?
12-25-2024
03:57 PM
1 Kudo
@Roomka This looks like an old post, but I will try to answer all the same and I hope it is still relevant to others too. The challenges you're facing with Apache NiFi's development life cycle stem from its design, which does not fully separate code/logic from environment-specific configuration. To address this and create a robust process for porting flows from dev to QA to prod, consider the following solutions:

1. Use Parameter Contexts for Configuration Management
NiFi supports Parameter Contexts, which can be used to externalize environment-specific configurations. This allows you to separate logic from environment-specific details.
Steps:
- Define Parameter Contexts for each environment (e.g., DevContext, QAContext, ProdContext).
- Externalize configurations such as the number of threads, cron schedules, database connection strings, and API endpoints.
- When deploying to a new environment, import the flow without the environment-specific Parameter Contexts, then assign the appropriate Parameter Context for that environment.

2. Use NiFi Registry for Flow Versioning and Promotion
The NiFi Registry provides a way to version control your flows and manage deployments across environments (see the sketch after this post).
Steps:
- Set up NiFi Registry and connect your NiFi instances to it.
- Use the Registry to version your flows.
- Promote flows from dev to QA to prod by exporting/importing them through the Registry.
- In each environment, override parameters using the appropriate Parameter Context.

3. Handle Environment-Specific Differences with External Configuration Management
If Parameter Contexts are insufficient, consider externalizing configurations entirely using tools like Consul, AWS Parameter Store, or environment variables.
Steps:
- Store all environment-specific configurations in an external tool.
- Use a custom script or a NiFi processor to fetch configurations dynamically at runtime.
This ensures the flow logic remains the same across environments, and only the external configurations vary.

4. Adopt Best Practices for Flow Design
To reduce the impact of embedded environment-specific details, follow these design principles:
- Avoid hardcoding resource-intensive configurations like thread counts or cron schedules into the flow.
- Use NiFi Variables or Parameters wherever possible to make configurations flexible.
- Split flows into smaller, reusable components to reduce complexity and improve maintainability.

5. Use Deployment Automation
Automate the deployment process to ensure consistency and reduce manual errors.
- Use tools like Ansible, Terraform, or Jenkins to automate flow deployment.
- Include steps to set up Parameter Contexts or fetch external configurations as part of the deployment pipeline.

6. Mitigating the 12-Factor Principles Concern
While NiFi isn't designed to fully adhere to 12-factor principles, you can adapt your processes to bridge this gap:
- Codebase: manage flow versions centrally using NiFi Registry.
- Config: externalize environment-specific configurations using Parameter Contexts or external configuration management tools.
- Build, Release, Run: standardize your flows and deployment pipeline across environments.
- Disposability: test flows rigorously in QA to ensure they can handle unexpected failures gracefully.

Hope these points give you a better picture and possibly an answer. Happy hadooping
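As a rough illustration of point 2, here is a promotion sketch using the NiFi Toolkit CLI (cli.sh). The URL, bucket/flow IDs, and version numbers are placeholders, and option names can differ between toolkit versions, so verify with "./bin/cli.sh nifi pg-import help" before relying on it:

```bash
#!/usr/bin/env bash
# Sketch: promote a Registry-versioned flow from dev into a QA NiFi instance.
# All identifiers below are placeholders, not values from the original post.
set -euo pipefail

QA_NIFI_URL="https://qa-nifi.example.com:8443"
BUCKET_ID="<registry-bucket-id>"
FLOW_ID="<registry-flow-id>"
FLOW_VERSION="3"

# 1. Import the versioned flow from NiFi Registry into the QA instance.
#    pg-import prints the ID of the newly created process group.
PG_ID=$(./bin/cli.sh nifi pg-import \
  -u "$QA_NIFI_URL" \
  -b "$BUCKET_ID" -f "$FLOW_ID" -fv "$FLOW_VERSION")
echo "Imported process group: $PG_ID"

# 2. For later upgrades, move the existing process group to a newer flow version.
./bin/cli.sh nifi pg-change-version -u "$QA_NIFI_URL" -pgid "$PG_ID" -fv 4

# 3. Environment-specific values stay in the QA Parameter Context, which is
#    assigned to the process group separately, so the flow definition itself
#    never changes between dev, QA, and prod.
```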
12-23-2024
02:43 AM
1 Kudo
1. Enable Resource Pools in Cloudera Manager
2. Create Resource Pools
3. Configure Submission Access Control
We have done the above configuration successfully, but we are not able to complete the "Configure Default Resource Pool Selection" section. We do not have the Pool Mapping Rules option at the path Cloudera Manager > Impala > Configuration > Admission Control > Pool Mapping Rules. We are using CDP 7.1.9 and CM 7.11.3. Do we need to change anything in the Placement Rules section of the Impala Admission Control configuration page?
12-22-2024
12:39 PM
1 Kudo
Hi Shelton, Thanks very much for your help. I think this may be working. I'm going to continue testing it with the few of us who require access, but for now it's working from different browsers on my laptop. I had the certificates set up correctly from the start, and this setting seems to have done the trick. Thanks again for your reply and assistance!
12-22-2024
09:45 AM
1 Kudo
@rsurti The issue described suggests a mismatch or misconfiguration in the SAML integration between NiFi and NGINX. The following analysis and potential solutions should address your findings.

SAML Payload Issues:
- Empty Recipient value: the Recipient in the SAML assertion should match the ACS (Assertion Consumer Service) URL configured in NiFi. If it is empty, this indicates a misconfiguration in the SAML IdP (OneLogin).
- Cookie and InResponseTo mismatch: the InResponseTo attribute in the SAML response should correspond to the SAML request identifier issued by NiFi. If the cookie storing the SAML request ID is missing or mismatched, authentication fails.
- NiFi error "SAML Authentication Request Identifier Cookie not found": this suggests that the browser is not sending back the SAML request ID cookie, or NiFi cannot recognize it. This could happen if the cookie is not set or is overwritten by NGINX, the cookie is being blocked or dropped due to cross-domain or SameSite restrictions, or NGINX is misconfigured to handle or forward SAML cookies.

Probable Causes
- NiFi configuration: misconfigured nifi.security.user.saml properties in nifi.properties; ACS URL mismatch between NiFi and OneLogin.
- NGINX configuration: improper handling of cookies, particularly the SAML request identifier cookie; incorrect forwarding of headers or paths for SAML requests and responses.
- OneLogin configuration: the SAML application in OneLogin is not configured to provide a valid Recipient or ACS URL; mismatched SAML settings such as entity ID, ACS URL, or signature settings.

Steps to Resolve

1. Verify and Update the NiFi Configuration
Ensure the nifi.properties file has the correct SAML configurations:
nifi.security.user.saml.idp.metadata.url=<OneLogin SAML Metadata URL>
nifi.security.user.saml.sp.entity.id=<NiFi Entity ID>
nifi.security.user.saml.sp.base.url=https://<nifi-url>   # Same as what users access
nifi.security.user.saml.authentication.expiration=12 hours
nifi.security.user.saml.request.identifier.name=nifi-request-id
The nifi.security.user.saml.sp.base.url must match the Recipient value in the SAML response.

2. Check the OneLogin SAML Connector Configuration
Ensure the Recipient value in OneLogin matches the NiFi ACS URL:
ACS URL: https://<nifi-url>/nifi-api/access/saml/login/consumer
Verify that the SAML settings in OneLogin include:
Audience (Entity ID): matches nifi.security.user.saml.sp.entity.id.
ACS URL: matches nifi.security.user.saml.sp.base.url.

3. Debug and Adjust the NGINX Configuration
Ensure NGINX is not interfering with SAML cookies (a quick cookie check sketch follows this post):
proxy_pass https://<nifi-host>:9444;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-Proto https;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_cookie_path / "/; SameSite=None; Secure";
Add debug logging to check whether cookies are being forwarded correctly.

4. Troubleshoot Cookie Handling
Check the browser developer tools (under Application > Cookies) to verify that the SAML request identifier cookie is being set and returned.
Ensure the SameSite=None and Secure flags are set for the cookies.

5. Check SAML Logs for Errors
In the nifi-user.log file, look for logs that provide details on the failed SAML authentication, including missing cookies and InResponseTo mismatches.

6. Test the Flow
After making the adjustments:
- Clear browser cookies.
- Initiate the SAML login process from the NiFi GUI.
- Check whether the Recipient and InResponseTo values align in the SAML assertion and request.
- Use a SAML debugging tool such as SAML-tracer (browser extension) to inspect the SAML request/response flows. Before that, enable detailed logging in NiFi for SAML authentication by modifying logback.xml:
<logger name="org.apache.nifi.web.security.saml" level="DEBUG" />

Let me know if you need further assistance! Happy hadooping
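For step 3, a quick sketch to confirm whether cookies set during the SAML login redirect survive the NGINX proxy. The NiFi URL is a placeholder, and the /nifi-api/access/saml/login/request path is an assumption based on the consumer path above, so verify the exact endpoint against your NiFi version:

```bash
#!/usr/bin/env bash
# Sketch: capture any cookies set when hitting the SAML login endpoint through NGINX.
# NIFI_URL is a placeholder; the login/request path is an assumption, not confirmed here.
set -euo pipefail

NIFI_URL="https://nifi.example.com"   # the address users reach via the NGINX proxy
JAR="$(mktemp)"                       # temporary cookie jar for this test

# Follow the SAML login redirect chain, saving any cookies the server sets.
curl -sk -L -c "$JAR" -o /dev/null "${NIFI_URL}/nifi-api/access/saml/login/request"

echo "Cookies received through the proxy:"
cat "$JAR"

# If no SAML request identifier cookie appears here, NGINX is most likely dropping
# or rewriting it (re-check proxy_cookie_path and the SameSite/Secure settings).
```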
12-22-2024
05:33 AM
1 Kudo
@polingsky202 To configure HAProxy in front of three Kafka brokers with Kerberos authentication and resolve the "Authentication failed due to invalid credentials with SASL mechanism GSSAPI" error, follow these steps:

Step 1: Review the Current Configuration
The provided configuration shows:
- HAProxy is configured for load balancing using roundrobin.
- Kafka brokers are set up with advertised.listeners and listeners for internal and LB connections, and SASL GSSAPI configured with Kerberos.
Likely causes of the issue:
- Kerberos principal or keytab file mismatch.
- Improper mapping of advertised listener names.
- Client-side misconfiguration for Kerberos authentication.

Step 2: Correct and Optimize the HAProxy Configuration
Update the HAProxy configuration to pass Kerberos authentication through to the Kafka brokers.
Updated haproxy.cfg:
listen kafka
  bind *:6677
  mode tcp
  balance roundrobin
  option tcp-check
  server kafka1 kafka-1.kafka.net:6668 check
  server kafka2 kafka-2.kafka.net:6669 check
  server kafka3 kafka-3.kafka.net:6666 check
Key updates in the HAProxy config file:
- mode tcp: ensures TCP passthrough for Kerberos authentication.
- option tcp-check: validates backend server availability.

Step 3: Verify the Kafka Broker Configuration
Ensure the Kerberos configuration for each broker is consistent and properly aligned.
Key points:
- advertised.listeners: ensure the LB listener matches the address clients will connect to via HAProxy (e.g. gateway.kafka.net).
- Kerberos JAAS configuration: validate the listener.name.LB.gssapi.sasl.jaas.config entry for all brokers, and ensure the keyTab file exists with correct permissions:
ls -l /etc/security/keytabs/kafka.service.keytab
Example updated kafka1 broker configuration:
advertised.listeners=INTERNAL://:6667,LB://gateway.kafka.net:6668
listeners=INTERNAL://:6667,LB://:6668
listener.security.protocol.map=INTERNAL:SASL_PLAINTEXT,LB:SASL_PLAINTEXT
inter.broker.listener.name=INTERNAL
listener.name.LB.gssapi.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
  doNotPrompt=true useKeyTab=true storeKey=true \
  keyTab="/etc/security/keytabs/kafka.service.keytab" \
  principal="kafka/gateway.kafka.net@KAFKA.NET";
Repeat similar updates for kafka2 and kafka3 with their respective listener ports.

Step 4: Update the Kerberos Configuration
Ensure the Kerberos configuration is consistent across all systems.
1. Validate krb5.conf and ensure the file includes the correct realm and KDC information:
[libdefaults]
  default_realm = KAFKA.NET
[realms]
  KAFKA.NET = {
    kdc = your-kdc-host
    admin_server = your-kdc-admin-host
  }
2. Test the Kerberos principal and verify it works with the keytab:
kinit -kt /etc/security/keytabs/kafka.service.keytab kafka/gateway.kafka.net@KAFKA.NET

Step 5: Verify the Client Configuration
The client is attempting to authenticate with Kerberos, so ensure the producer properties are configured correctly. Updated producer command (a client properties sketch follows this post):
/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh \
  --topic my-topic \
  --broker-list gateway.kafka.net:6677 \
  --producer-property security.protocol=SASL_PLAINTEXT \
  --producer-property sasl.kerberos.service.name=kafka
Key properties:
- security.protocol=SASL_PLAINTEXT: uses SASL (Kerberos) authentication over a plaintext connection.
- sasl.kerberos.service.name=kafka: matches the Kerberos principal's service name.

Step 6: Test and Troubleshoot
1. Enable debug logging: add -Dsun.security.krb5.debug=true to the JVM options for the client to debug Kerberos issues:
export KAFKA_OPTS="-Dsun.security.krb5.debug=true"
2. Check logs: on the client side, check the output for detailed Kerberos errors; on the Kafka brokers, inspect the logs for authentication errors:
less /var/log/kafka/server.log
3. Verify connectivity: use telnet or nc to confirm connectivity to HAProxy and the brokers:
telnet gateway.kafka.net 6677
telnet kafka-1.kafka.net 6668

Final Checklist
- Ensure all brokers have consistent Kerberos configurations.
- Verify the client-side security.protocol and sasl.kerberos.service.name settings.
- Ensure HAProxy uses TCP passthrough (mode tcp) for Kerberos.

With these adjustments, the Kerberos authentication error should be resolved. Let me know if further clarification is needed! Happy hadooping
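The client properties sketch referenced in Step 5, assuming a ticket is obtained from the same example keytab and principal shown above (the /tmp path and useTicketCache choice are illustrative, not from the original post):

```bash
#!/usr/bin/env bash
# Sketch: run the console producer through HAProxy using a reusable client
# properties file instead of individual --producer-property flags.
set -euo pipefail

# Client-side SASL/Kerberos settings.
cat > /tmp/kafka-client.properties <<'EOF'
security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka
sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
  useTicketCache=true;
EOF

# Obtain a ticket first (or rely on an existing ticket cache).
kinit -kt /etc/security/keytabs/kafka.service.keytab kafka/gateway.kafka.net@KAFKA.NET

# Turn on Kerberos debugging for the client JVM while testing.
export KAFKA_OPTS="-Dsun.security.krb5.debug=true"

/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh \
  --topic my-topic \
  --broker-list gateway.kafka.net:6677 \
  --producer.config /tmp/kafka-client.properties
```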
12-20-2024
07:52 AM
2 Kudos
@nifier This is a three-week-old posting, but I still hope it helps resolve your reporting task. NiFi has a built-in Data Provenance feature that tracks the lineage of data as it moves through the flow. To capture file transfer details:

1. Enable Provenance Reporting in NiFi
Provenance events: NiFi records events such as SEND, RECEIVE, DROP, and ROUTE. For SFTP file transfers, look for SEND events.
Steps to enable provenance reporting:
- Log in to the NiFi UI.
- Go to the Provenance tab (accessible from the top menu).
- Configure the provenance repository in nifi.properties to retain sufficient history:
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
nifi.provenance.repository.max.storage.time=30 days
nifi.provenance.repository.max.storage.size=1 GB
- Ensure max.storage.time or max.storage.size is configured to retain events for the desired reporting period.

2. Query Provenance Data
You can query and filter provenance events to generate the report:
- Go to the Provenance tab in the NiFi UI.
- Filter the events using criteria: Component Name (the SFTP processor, e.g. PutSFTP), Event Type (SEND), and Date Range (the desired time frame).
- Download the filtered results as a CSV file (see the reporting sketch after this post).

3. Automate Reporting with NiFi Reporting Tasks
To generate periodic reports automatically, use the SiteToSiteProvenanceReportingTask:
- In the NiFi canvas, navigate to Controller Settings.
- Add a new Reporting Task and select SiteToSiteProvenanceReportingTask.
- Configure the Reporting Task to specify the target location for the report and filter for SEND events related to your file transfer processors.
- Schedule the Reporting Task to run periodically (e.g., daily or weekly).

4. Include File Name and Transfer Status
NiFi provenance events include metadata such as the file name, size, and transfer status:
- File Name: captured in the filename attribute.
- Transfer Status: success is logged with a SEND event; for failures, look for errors or failed processor logs (use a LogMessage processor to capture failure events in the flow).

5. Alternative: Push Logs to External Tools
You can push the provenance data to an external system for detailed analysis and reporting:
- Elasticsearch/Kibana: use the PutElasticsearch processor to send provenance events to Elasticsearch and visualize them in Kibana.
- Custom Script: use the ExecuteScript processor to write a Python or Groovy script to extract, filter, and format the provenance data into a report.

6. Sample Workflow for Reporting
- Use a QueryProvenance processor to fetch provenance events for the desired period.
- Filter for SEND events from the SFTP processor.
- Route successful and failed events to different processors (PutFile for saving logs).
- Format the report (CSV/JSON) using processors like UpdateAttribute and ConvertRecord.

By combining these steps, you can efficiently generate a report of all file transfers in the given period, including file names and transfer statuses.
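The reporting sketch referenced in step 2: a rough post-processing script for the CSV exported from the Provenance tab. The column names "Event Type", "Filename", and "Date" are assumptions about the export header, and the script assumes simple unquoted CSV fields; adjust both to match your actual export:

```bash
#!/usr/bin/env bash
# Sketch: turn a provenance CSV export into a minimal file-transfer report
# (filename + timestamp for every SEND event).
set -euo pipefail

CSV="${1:?usage: $0 provenance-export.csv}"

awk -F',' '
NR == 1 {
    # Locate the columns we care about by header name (assumed names).
    for (i = 1; i <= NF; i++) {
        if ($i == "Event Type") type_col = i
        if ($i == "Filename")   file_col = i
        if ($i == "Date")       date_col = i
    }
    print "filename,transferred_at"
    next
}
$type_col == "SEND" {
    # Each SEND event corresponds to one file handed off by the SFTP processor.
    print $file_col "," $date_col
}' "$CSV"
```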
12-20-2024
02:39 AM
1 Kudo
Thank you @SAMSAL (JOLT expert), The spec you posted worked properly and I'm able to see the flat JSON as expected.