About Shelton

RobertWilliams · ‎03-24-2025

Thank you so much for helping me out.

Shelton · ‎03-23-2025

@shiva239 You've set up GitLabFlowRegistry in NiFi 2.0's controller settings and can import flows from GitLab and commit changes back through the UI your existing CI/CD pipeline uses NiFi toolkit commands. As of NiFi 2.0, there aren't direct toolkit commands specifically designed for the new GitLabFlowRegistry feature someone should correct me here. However, you have a few options to achieve what you need. Option 1: Use the NiFi REST API The GitLabFlowRegistry operations that you perform through the UI are backed by REST API endpoints. You can integrate these REST API calls into your CI/CD pipeline # Example REST API call to import/sync from Git repository curl -X POST "https://your-nifi-host:port/nifi-api/versions/process-groups/{processGroupId}/flow-registry-sync" \ -H "Content-Type: application/json" \ -d '{"registryId":"your-gitlab-registry-id","bucketId":"your-bucket-id","flowId":"your-flow-id","version":1}' \ --key your-key.key --cert your-cert.crt --cacert your-ca.crt Option 2: Hybrid Approach Use a combination of existing toolkit commands and REST API calls: Use toolkit commands for basic operations (deployment to a target NiFi instance) Use REST API calls for GitLabFlowRegistry-specific operations Option 3: Create Custom Scripts You could create wrapper scripts that combine the toolkit functionality with the necessary REST API calls for GitLabFlowRegistry operations: #!/bin/bash # Custom script to sync from GitLabFlowRegistry and deploy # First pull from Git registry via REST API curl -X POST "https://your-nifi-host:port/nifi-api/versions/process-groups/{processGroupId}/flow-registry-sync" \ -H "Content-Type: application/json" \ -d '{"registryId":"your-gitlab-registry-id","bucketId":"your-bucket-id","flowId":"your-flow-id","version":1}' \ --key your-key.key --cert your-cert.crt --cacert your-ca.crt # Then use toolkit commands for additional operations as needed ./bin/cli.sh nifi pg-import ... I recommend adopting Option 3 (custom scripts) for your CI/CD pipeline. This approach: Takes advantage of the new GitLabFlowRegistry feature Maintains compatibility with your existing toolkit-based CI/CD pipeline Provides flexibility to customize the process based on your specific requirements For implementing the REST API calls, consult the NiFi API documentation for the full set of endpoints related to the new GitLabFlowRegistry functionality. The NiFi Admin Guide for version 2.0 should also have details on these new REST endpoints. Happy hadooping

Shelton · ‎03-23-2025

@spserd Looking at your issue with Spark on Kubernetes, I see a clear difference between the client and cluster deployment modes that's causing the "system" authentication problem. the issue is when running in client mode with spark-shell, you're encountering an authorization issue where Spark is trying to create executor pods as "system" instead of using your service account "spark-sa", despite providing the token. Possible Solution For client mode, you need to add a specific configuration to tell Spark to use the token for executor pod creation --conf spark.kubernetes.authenticate.executor.serviceAccountName=spark-sa So your updated command should look like this ./bin/spark-shell \ --master k8s://https://my-k8s-cluster:6443 \ --deploy-mode client \ --name spark-shell-poc \ --conf spark.executor.instances=1 \ --conf spark.kubernetes.container.image=my-docker-hub/spark_poc:v1.4 \ --conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \ --conf spark.kubernetes.namespace=dynx-center-resources \ --conf spark.driver.pod.name=dynx-spark-driver \ --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \ --conf spark.kubernetes.authenticate.executor.serviceAccountName=spark-sa \ --conf spark.kubernetes.authenticate.submission.oauthToken=$K8S_TOKEN The key is that in client mode, you need to explicitly configure the executor authentication because the driver is running outside the cluster and needs to delegate this permission. If this still doesn't work, ensure your service account has appropriate ClusterRole bindings that allow it to create and manage pods in the specified namespace. Happy hadooping

Shelton · ‎03-21-2025

@PriyankaMondal Looking at your error message log, I can see you're experiencing authentication timeouts with ConsumeIMAP and ConsumePOP3 processors when connecting to Microsoft Office 365 services. Possible Blockers Timeout Issue The primary error is "Read timed out" during authentication, which suggests the connection to Office 365 is being established but then timing out during the OAUTH2 handshake. Microsoft 365 Specific Considerations Microsoft has specific requirements for modern authentication with mail services and has been deprecating basic authentication methods. Processor Configuration Using OAUTH2 authentication mode, is correct for Office 365, but there may be issues with the token acquisition or timeout settings. Possible solutions 1. Check timeout settings # Add these properties to your processor configuration mail.imap.connectiontimeout=60000 mail.imap.timeout=60000 mail.pop3.connectiontimeout=60000 .pop3.timeout=60000 2. Verify Modern Authentication settings Ensure the account has Modern Authentication enabled in Microsoft 365 Admin Center Verify the application registration in Azure AD has the correct permissions IMAP.AccessAsUser.All for IMAP POP.AccessAsUser.All for POP3 offline_access scope for refresh tokens 3. Update NiFi Mail Libraries NiFi's default JavaMail implementation might have compatibility issues with Office 365. Try: Updating to the latest version of NiFi (if possible) Or add Microsoft's MSAL (Microsoft Authentication Library) JAR to NiFi's lib directory 4. Use a Custom SSL Context Service Microsoft servers might require specific TLS settings # Create a Standard SSL Context Service with: Protocol: TLS # Add to Advanced Client settings for the processor 5. Alternative Approach: Use Microsoft Graph API Since Microsoft is moving away from direct IMAP/POP3 access, consider: Using InvokeHTTP processor to authenticate against Microsoft Graph API Use the Graph API endpoints to retrieve email content 6. Check Proxy Settings If your environment uses proxies # Add these properties mail.imap.proxy.host=your-proxy-host mail.imap.proxy.port=your-proxy-port mail.pop3.proxy.host=your-proxy-host mail.pop3.proxy.port=your-proxy-port 7. Implementation Steps Update the processor configuration with extended timeout values Verify OAuth2 settings in the processor match exactly with your Azure application registration Check Microsoft 365 account settings to ensure IMAP/POP3 is enabled with Modern Authentication Consider implementing a token debugging flow using InvokeHTTP to validate token acquisition separately Happy hadooping

Shelton · ‎03-20-2025

@RaoNEY The error message suggests that there's a JWT token algorithm mismatch: "An error occurred while attempting to decode the Jwt: Signed JWT rejected: Another algorithm exp" This typically happens when: The token you're receiving from Azure in Step 1 uses a signing algorithm that doesn't match what NiFi is expecting NiFi is configured to use RS256 algorithm (as shown in your nifi.properties), but the Azure token might be using a different algorithm Verify token algorithm First, check what algorithm your Azure token is using. You can decode your JWT token using tools like jwt.io to see the header which contains the algorithm (look for the "alg" field). Modify your Azure token request Azure AD OAuth tokens typically use RS256, but you may need to specify this explicitly in your Azure app registration settings. Ensure correct token type For NiFi OAuth/OIDC authentication, you need an ID token, not an access token. In your Step 1, you're requesting a client credentials grant which returns an access token. Instead, you need to: # Modified Step 1 - Use authorization code flow to get ID token curl -X POST https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token \ -H "Content-Type: application/x-www-form-urlencoded" \ -d "grant_type=authorization_code" \ -d "client_id={ClientID}" \ -d "client_secret={ClientSecret}" \ -d "code={AuthorizationCode}" \ -d "redirect_uri={RedirectURI}" \ -d "scope=openid email profile Update NiFi properties: Ensure these settings match your Azure configuration: # Make sure these settings are correct nifi.security.user.oidc.jwt.algorithm=RS256 nifi.security.user.oidc.preferred.jwsalgorithm=RS256 Check Azure app registration In your Azure portal, verify: Redirect URI is properly set to your NiFi callback URL The app has appropriate API permissions Token configuration includes ID tokens Complete Authentication Flow For NiFi OAuth with Azure AD, the proper flow should be: 1. Initiate login via NiFi UI or using GET https://NIFIDnsName:9444/nifi-api/access/oidc/request 2. This redirects to Microsoft login page, where user authenticates 3. After successful authentication, Azure redirects back to NiFi with an authorization code 4. NiFi exchanges this code for tokens automatically 5. If you're doing this programmatically, use the authorization code flow, not client credentials The direct token exchange you're attempting in Step 2 might not be supported or requires specific configuration. NiFi typically handles the OIDC token exchange internally after receiving the authorization code. The direct token exchange you're attempting in Step 2 might not be supported or requires specific configuration. NiFi typically handles the OIDC token exchange internally after receiving the authorization code. happy hadooping

TimL · ‎03-18-2025

The issue may be due to NiFi's PutDatabaseProcessor applying a local time zone conversion during data ingestion, causing the 5-hour shift. To fix this, ensure NiFi is explicitly set to handle UTC time zones for both reading and writing. Additionally, using ODBC for PostgreSQL could help by providing better control over time zone handling during the data transfer, ensuring consistency between MSSQL and PostgreSQL.

Shelton · ‎03-17-2025

@AllIsWell Somehow I feel you have some stale data. Before deleting the process group, fetch its current state using the API to confirm the correct version number curl -k -X GET "https://localhost:28443/nifi-api/process-groups/836e216e-0195-1000-d3b8-771b257f1fe6" \ -H "Authorization: Bearer Token" Look for the revision object in the response. The version field should match what you include in your DELETE request. Update the DELETE Request If the version in the response is not 0, update your DELETE request with the correct version. For example, if the current version is 5, your request should look like this: curl -k -X DELETE "https://localhost:28443/nifi-api/process-groups/836e216e-0195-1000-d3b8-771b257f1fe6" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer Token" \ --data '{ "revision": { "version": <Value from above> }, "disconnectedNodeAcknowledged": false }' Validate the JSON: Ensure that the JSON payload is valid. You can use tools like JSONLint to validate the structure. Check for Trailing Slashes: Ensure there are no trailing slashes in the URL. For example, use https://localhost:28443/nifi-api/process-groups/836e216e-0195-1000-d3b8-771b257f1fe6 instead of https://localhost:28443/nifi-api/process-groups/836e216e-0195-1000-d3b8-771b257f1fe6/. Disconnected Node Acknowledgment: If your NiFi cluster has disconnected nodes, you may need to set disconnectedNodeAcknowledged to true Final Notes: If the issue persists, double-check the API documentation for any changes or additional requirements. Ensure that the Authorization token is valid and has the necessary permissions to delete the process group. If you are using a NiFi version older than 1.12.0, the API behavior might differ slightly, so consult the documentation for your specific version. Happy hadooping

Shelton · ‎03-09-2025

@zeeshanmcs It seem you're having an issue with unavailable tablets in your Kudu table, which is preventing Spark from inserting data. The output from kudu cluster ksck clearly shows the problem: the leader replicas for all tablets in the impala::mrs.NumberofSubs table are on a tablet server that's unavailable. The key issue is that the tablet server with ID 24483fcd36ce45d78d80beb04b3b0cf4 is not running, and it happens to be the leader for all 7 tablets in your table. Here's a solution to resolve this issue: 1. First, check the status of all Kudu tablet servers sudo systemctl status kudu-tserver 2. Look specifically for the tablet server with ID 24483fcd36ce45d78d80beb04b3b0cf4 sudo -u kudu kudu tserver list tstewputil1 3. If the tablet server is down, start it. sudo systemctl start kudu-tserver 4. If the tablet server is running but not responding, restart it sudo systemctl restart kudu-tserver 5. After restarting the tablet server, wait a few minutes for it to rejoin the cluster and for leadership transitions to occur, then check the status again sudo -u kudu kudu cluster ksck tstewputil1 If the tablet server is permanently lost or damaged, you'll need to recover the tablets a. Check if you have enough replicas (you should have at least 3 for production) sudo -u kudu kudu table describe impala::mrs.NumberofSubs tstewputil1 b. If you have other healthy replicas, you can delete the failed server from the cluster and Kudu will automatically recover sudo -u kudu kudu tserver delete tstewputil1 <tablet_server_uuid> c. If this is the only replica and you don't have backups, you may need to: Create a new table with the same schema Load data from your source systems Or restore from a backup if available If, after restarting, you still have issues, the problem might be: Disk space issues on the tablet server Configuration problems Network connectivity problems between servers Check the Kudu tablet server logs for more details less /var/log/kudu/kudu-tserver.log Once the tablet server is back online and healthy, your Spark job should be able to insert data into the table successfully Happy hadooping

Shelton · ‎03-07-2025

@Maulz Connecting Python to Cloudera, Hive, and Hue involves using libraries and drivers that interface with HiveServer2 the service that allows remote clients to execute Hive queries.There are several methods to connect Python to Cloudera's ecosystem, particularly to access Hive tables through Hue. I'll detail the most common approaches. Prerequisites Cloudera/Hadoop Cluster: Ensure HiveServer2 is running on your cluster. Default HiveServer2 port: 10000 (verify via Cloudera Manager). Python Environment: Python 3.6+ installed. Authentication: Know your authentication method: Username/password (non-secure). Kerberos (common in enterprise clusters). LDAP. Below is a detailed, step-by-step guide: 2. Install Required Python Libraries Use pip to install: pip install pyhive # Python interface for Hive pip install thrift # Thrift protocol support pip install sasl # SASL authentication (for Kerberos) pip install thrift-sasl # SASL wrapper for Thrift pip install pykerberos # Kerberos support (if needed) For JDBC-based connections (alternative method): pip install JayDeBeApi # JDBC bridge 3. Configure Cloudera/Hive Via Cloudera Manager: Enable HiveServer2 and ensure it’s running. Check HiveServer2 Port (default: 10000). If using Kerberos: Ensure Kerberos is configured in Cloudera. Export your Kerberos keytab kinit -kt <keytab_file> <principal> Connecting Python to Cloudera/Hue/Hive 1.Using PyHive it's a Python library specifically designed to work with Hive from pyhive import hive # Connect to Hive server conn = hive.Connection( host='cloudera_host_name', port=10000, # Default HiveServer2 port username='your_username', password='your_password', database='default', # Your database name auth='LDAP' # Or 'NONE', 'KERBEROS', 'CUSTOM' depending on your authentication setup ) # Create a cursor cursor = conn.cursor() # Execute a query cursor.execute('SELECT * FROM your_table LIMIT 10') # Fetch results results = cursor.fetchall() print(results) # Close connections cursor.close() conn.close() 2. Using the Impala Connection If your Cloudera cluster uses Impala: from impala.dbapi import connect conn = connect( host='cloudera_host_name', port=21050, # Default Impala port user='your_username', password='your_password', database='default' # Your database name ) cursor = conn.cursor() cursor.execute('SELECT * FROM your_table LIMIT 10') results = cursor.fetchall() print(results) cursor.close() conn.close() 3. Integration with Hue Hue is a web UI for Hadoop, but you can programmatically interact with Hive via its APIs (limited). For direct Python-Hue integration: Use Hue’s REST API to execute queries: import requests # Hue API endpoint (replace with your Hue server URL) url = "http://<hue_server>:8888/hue/notebook/api/execute/hive" headers = {"Content-Type": "application/json"} data = { "script": "SELECT * FROM my_table", "dialect": "hive" } response = requests.post( url, auth=('<hue_username>', '<hue_password>'), headers=headers, json=data ) print(response.json()) Troubleshooting Common Issues: Connection Refused: Verify HiveServer2 is running (netstat -tuln | grep 10000). Check firewall rules. Authentication Failures: For Kerberos: Ensure kinit succeeded. For LDAP: Validate credentials. Thrift Version Mismatch: Use Thrift v0.13.0 with Hive 3.x. Logs: Check HiveServer2 logs in Cloudera Manager (/var/log/hive). 4. Best Practices Use connection pooling for high-frequency queries. For Kerberos, automate ticket renewal with kinit cron jobs. Secure credentials using environment variables or Vault. Happy hadooping

pavanshettyg5 · ‎03-07-2025

Hi @Shelton Thanks so much for detailed information @MattWho thanks much for the reply and apologies for short info. based on above information was able to create SSL certificates and generate Keystore and trustore in jks format . initially i was not configured CA file into truststore so faced some issue 2. then i did not added nifi nodes entries as intial identity in autherizers.xml file so above issue occured . i followed cloudera blogs where you had informed https://community.cloudera.com/t5/Support-Questions/insufficient-permissions-untrusted-proxy/m-p/366443#M239582 based on these i was able to resolve and 3 node cluster with external zookeeper was able to up. i appreciate your kind help and your time here . much thanks to both 🙂

Online	Offline
Last Visited	‎03-23-2025 01:13 AM

Member Since	‎01-19-2017 04:35 AM
Last Visited	‎03-23-2025 01:13 AM
Posts	3,670
Kudos received	621

Cloudera Community

Re: Securing Nifi with SSL and using OIDC provider...

Re: External zookeeper and nifi cluster connection...

Re: Apache Ranger Admin UI Login Question

Re: NiFi ERR_CONNECTION_REFUSED from other compute...

Re: What is the difference between spark_shuffle &...

Re: Integrating Git to Nifi Registry using SSH Aut...

Re: Nifi toolkit command for GitLabFlowRegistry

Re: Unable to run spark-shell command with k8s as ...

Re: Apache Nifi IMAP/POP3 authentocation Error

Re: NiFi API OAuth authentication issue

Re: Issue with Time Shift During Data Migration fr...

Re: Not able to delete the NiFi existing flow usin...

Re: Kudo table not upserting data

Re: How to connect hive using Python for pytest da...

Re: Securing Nifi with SSL and using OIDC provider...