Member since: 01-19-2017
Posts: 3674
Kudos Received: 631
Solutions: 370

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 257 | 03-23-2025 05:23 AM
 | 627 | 03-05-2025 01:34 PM
 | 476 | 03-03-2025 01:09 PM
 | 521 | 03-02-2025 07:19 AM
 | 803 | 12-22-2024 07:33 AM
04-28-2025
05:43 AM
1 Kudo
@nifier That's good to hear. Now the onus is on you to share the provenance setup that helped you resolve your problem. Sharing such information is invaluable for growing our documentation base. If you do a good, detailed write-up, the moderators could help integrate it into the official Cloudera knowledge base. Happy hadooping
04-02-2025
07:09 AM
1 Kudo
@shubham_sharma My bad, it's hbase.mob.file.expired.period and not hbase.mob.file.expired. Happy hadooping
04-01-2025
05:17 AM
@shubham_sharma The hbase.mob.file.expired property in HBase is set in the hbase-site.xml configuration file. It belongs to HBase's MOB (Medium-sized Objects) feature, which is designed to efficiently store values that are larger than typical HBase cells but small enough that writing each one as a separate HDFS file would be inefficient.

```xml
<property>
  <name>hbase.mob.file.expired</name>
  <value>30</value>
  <description>The number of days to keep a mob file before deleting it. Default value is 30 days.</description>
</property>
```

Happy hadooping
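To make the TTL semantics concrete, here is a minimal Python sketch of the "delete after N days" rule the property expresses. This is an illustration only, not HBase's actual cleaner code:

```python
from datetime import datetime, timedelta

def is_mob_file_expired(file_mtime: datetime, ttl_days: int, now: datetime) -> bool:
    """A MOB file becomes eligible for cleanup once its age exceeds the TTL."""
    return now - file_mtime > timedelta(days=ttl_days)

now = datetime(2025, 4, 1)
print(is_mob_file_expired(datetime(2025, 2, 1), 30, now))   # → True (59 days old)
print(is_mob_file_expired(datetime(2025, 3, 20), 30, now))  # → False (12 days old)
```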
03-28-2025
04:23 AM
1 Kudo
@allen_chu As you correctly suspected, the error indicates that HBase is trying to access a MOB (Medium Object) file in HDFS that no longer exists at the expected location:

hdfs://ha:8020/apps/hbase/data/archive/data/default/FPCA_ITEMS_TY_NEW/bf92b15900f190730a5482b53d350df0/cf/ab741ac0919480a47353778bda55d142202502239bf346dbbfc6475c8967734c2edfaaf4

Potential Root Causes:
- The MOB file was manually deleted from HDFS
- HBase's MOB cleanup process didn't properly update the metadata
- The HDFS path was changed without the corresponding HBase configuration updates
- Corrupted HBase metadata
- Incomplete data migration
- Filesystem inconsistency
- Incomplete compaction or archiving process

1. Verify HDFS Integrity

```shell
# Check HDFS file system health
hdfs fsck /apps/hbase/data/archive/data/default/FPCA_ITEMS_TY_NEW
```

2. HBase Data Consistency Check

```shell
# Start HBase shell
hbase shell
```

```
# Major compact the table to rebuild metadata
major_compact 'FPCA_ITEMS_TY_NEW'

# Verify table status
status 'detailed'
```

3. Immediate Recovery Options

Option A: Repair the table

```shell
hbase hbck -j <path_to_hbase_classpath> -repair FPCA_ITEMS_TY_NEW
```

Option B: Disable and re-enable the table

```
disable 'FPCA_ITEMS_TY_NEW'
enable 'FPCA_ITEMS_TY_NEW'
```

Option C: Run MOB compaction (replace 'cf' with your actual column family name)

```shell
hbase org.apache.hadoop.hbase.mob.mapreduce.Sweeper FPCA_ITEMS_TY_NEW cf
```

Option D: If the data is not critical, reset the MOB references

```
alter 'FPCA_ITEMS_TY_NEW', {NAME => 'cf', MOB_COMPACT_PARTITION_POLICY => 'daily'}
major_compact 'FPCA_ITEMS_TY_NEW'
```

4. Advanced Recovery Options

```
# If the previous methods fail, consider:
# a) Snapshot the table
snapshot 'FPCA_ITEMS_TY_NEW', 'FPCA_ITEMS_TY_NEW_SNAPSHOT'

# b) Clone the snapshot
clone_snapshot 'FPCA_ITEMS_TY_NEW_SNAPSHOT', 'FPCA_ITEMS_TY_NEW_RECOVERED'
```

5. Preventive Measures

```xml
<property>
  <name>hbase.mob.file.expired.period</name>
  <value>86400</value> <!-- 1 day in seconds -->
</property>
```

Additional Checks
1. Verify HDFS permissions for the HBase user
2. Check HDFS health (NameNode logs, DataNode availability)
3. Review the HBase MOB configuration for the table:

```
describe 'FPCA_ITEMS_TY_NEW'
```

With the above steps you should be able to resolve your HBase issue. Happy hadooping
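If you see many of these FileNotFound errors, it can help to script the triage. Here is a Python sketch (a hypothetical helper, assuming the standard .../archive/data/&lt;namespace&gt;/&lt;table&gt;/&lt;region&gt;/&lt;cf&gt;/&lt;hfile&gt; layout visible in the path above) that pulls the table, region, and column family out of a missing-file path:

```python
import re

# Standard HBase archive layout:
#   .../archive/data/<namespace>/<table>/<region>/<cf>/<hfile>
ARCHIVE_RE = re.compile(
    r"/archive/data/(?P<namespace>[^/]+)/(?P<table>[^/]+)/"
    r"(?P<region>[0-9a-f]+)/(?P<cf>[^/]+)/(?P<hfile>[0-9a-f]+)$"
)

def parse_archive_path(path: str) -> dict:
    """Extract namespace, table, region, and column family from a missing-HFile path."""
    match = ARCHIVE_RE.search(path)
    if not match:
        raise ValueError(f"not an archive HFile path: {path}")
    return match.groupdict()

info = parse_archive_path(
    "hdfs://ha:8020/apps/hbase/data/archive/data/default/FPCA_ITEMS_TY_NEW/"
    "bf92b15900f190730a5482b53d350df0/cf/"
    "ab741ac0919480a47353778bda55d142202502239bf346dbbfc6475c8967734c2edfaaf4"
)
print(info["table"], info["region"], info["cf"])
```

Feeding each path from the region server logs through this gives you the list of tables and column families to target with fsck and compaction.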
03-23-2025
05:23 AM
2 Kudos
@shiva239 You've set up GitLabFlowRegistry in NiFi 2.0's controller settings and can import flows from GitLab and commit changes back through the UI, while your existing CI/CD pipeline uses NiFi toolkit commands. As of NiFi 2.0, there aren't direct toolkit commands specifically designed for the new GitLabFlowRegistry feature (someone should correct me here if that has changed). However, you have a few options to achieve what you need.

Option 1: Use the NiFi REST API
The GitLabFlowRegistry operations that you perform through the UI are backed by REST API endpoints. You can integrate these REST API calls into your CI/CD pipeline:

```shell
# Example REST API call to import/sync from the Git repository
curl -X POST "https://your-nifi-host:port/nifi-api/versions/process-groups/{processGroupId}/flow-registry-sync" \
  -H "Content-Type: application/json" \
  -d '{"registryId":"your-gitlab-registry-id","bucketId":"your-bucket-id","flowId":"your-flow-id","version":1}' \
  --key your-key.key --cert your-cert.crt --cacert your-ca.crt
```

Option 2: Hybrid Approach
Use a combination of existing toolkit commands and REST API calls:
- Use toolkit commands for basic operations (deployment to a target NiFi instance)
- Use REST API calls for GitLabFlowRegistry-specific operations

Option 3: Create Custom Scripts
You could create wrapper scripts that combine the toolkit functionality with the necessary REST API calls for GitLabFlowRegistry operations:

```shell
#!/bin/bash
# Custom script to sync from GitLabFlowRegistry and deploy

# First pull from the Git registry via the REST API
curl -X POST "https://your-nifi-host:port/nifi-api/versions/process-groups/{processGroupId}/flow-registry-sync" \
  -H "Content-Type: application/json" \
  -d '{"registryId":"your-gitlab-registry-id","bucketId":"your-bucket-id","flowId":"your-flow-id","version":1}' \
  --key your-key.key --cert your-cert.crt --cacert your-ca.crt

# Then use toolkit commands for additional operations as needed
./bin/cli.sh nifi pg-import ...
```

I recommend adopting Option 3 (custom scripts) for your CI/CD pipeline. This approach:
- Takes advantage of the new GitLabFlowRegistry feature
- Maintains compatibility with your existing toolkit-based CI/CD pipeline
- Provides flexibility to customize the process based on your specific requirements

For implementing the REST API calls, consult the NiFi API documentation for the full set of endpoints related to the new GitLabFlowRegistry functionality. The NiFi Admin Guide for version 2.0 should also have details on these new REST endpoints. Happy hadooping
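If you wrap the REST call in a script, it can be cleaner to build the JSON body programmatically instead of hand-editing a quoted string. A Python sketch (the endpoint and field names come from the speculative curl example above, so verify them against your NiFi 2.x REST API documentation before relying on them):

```python
import json

def registry_sync_payload(registry_id: str, bucket_id: str, flow_id: str, version: int) -> str:
    """Build the JSON body for the flow-registry sync call sketched above."""
    return json.dumps({
        "registryId": registry_id,
        "bucketId": bucket_id,
        "flowId": flow_id,
        "version": version,
    })

# Placeholder IDs for illustration only
body = registry_sync_payload("your-gitlab-registry-id", "your-bucket-id", "your-flow-id", 1)
print(body)
```

You could then pass the generated body to curl's `-d @-` via stdin, or use a Python HTTP client directly.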
03-23-2025
04:42 AM
@spserd Looking at your issue with Spark on Kubernetes, I see a clear difference between the client and cluster deployment modes that's causing the "system" authentication problem. The issue is that when running in client mode with spark-shell, you're hitting an authorization error where Spark tries to create executor pods as "system" instead of using your service account "spark-sa", despite the token you provided.

Possible Solution
For client mode, you need to add a specific configuration to tell Spark to use the service account for executor pod creation:

```
--conf spark.kubernetes.authenticate.executor.serviceAccountName=spark-sa
```

So your updated command should look like this:

```shell
./bin/spark-shell \
  --master k8s://https://my-k8s-cluster:6443 \
  --deploy-mode client \
  --name spark-shell-poc \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.container.image=my-docker-hub/spark_poc:v1.4 \
  --conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
  --conf spark.kubernetes.namespace=dynx-center-resources \
  --conf spark.driver.pod.name=dynx-spark-driver \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
  --conf spark.kubernetes.authenticate.executor.serviceAccountName=spark-sa \
  --conf spark.kubernetes.authenticate.submission.oauthToken=$K8S_TOKEN
```

The key is that in client mode you need to explicitly configure the executor authentication, because the driver runs outside the cluster and needs to delegate this permission. If this still doesn't work, ensure your service account has appropriate ClusterRole bindings that allow it to create and manage pods in the specified namespace. Happy hadooping
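If you generate spark-shell or spark-submit commands from a script, a small helper keeps the three authentication confs (driver, executor, submission) pointing at the same identity so one is never forgotten. A Python sketch (hypothetical helper; the property keys are the standard spark.kubernetes.* ones used above):

```python
def k8s_auth_confs(service_account: str, token_env: str = "K8S_TOKEN") -> list:
    """Return the three spark.kubernetes.authenticate.* confs that should
    all reference the same service account / token in client mode."""
    return [
        f"spark.kubernetes.authenticate.driver.serviceAccountName={service_account}",
        f"spark.kubernetes.authenticate.executor.serviceAccountName={service_account}",
        f"spark.kubernetes.authenticate.submission.oauthToken=${{{token_env}}}",
    ]

confs = k8s_auth_confs("spark-sa")
for conf in confs:
    print(f"--conf {conf}")
```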
03-21-2025
12:16 PM
@PriyankaMondal Looking at your error log, I can see you're experiencing authentication timeouts with the ConsumeIMAP and ConsumePOP3 processors when connecting to Microsoft Office 365 services.

Possible Blockers
- Timeout Issue: The primary error is "Read timed out" during authentication, which suggests the connection to Office 365 is established but then times out during the OAUTH2 handshake.
- Microsoft 365 Specific Considerations: Microsoft has specific requirements for modern authentication with mail services and has been deprecating basic authentication methods.
- Processor Configuration: Using the OAUTH2 authentication mode is correct for Office 365, but there may be issues with token acquisition or timeout settings.

Possible Solutions

1. Check timeout settings

```
# Add these properties to your processor configuration
mail.imap.connectiontimeout=60000
mail.imap.timeout=60000
mail.pop3.connectiontimeout=60000
mail.pop3.timeout=60000
```

2. Verify Modern Authentication settings
- Ensure the account has Modern Authentication enabled in the Microsoft 365 Admin Center
- Verify the application registration in Azure AD has the correct permissions:
  - IMAP.AccessAsUser.All for IMAP
  - POP.AccessAsUser.All for POP3
  - offline_access scope for refresh tokens

3. Update NiFi Mail Libraries
NiFi's default JavaMail implementation might have compatibility issues with Office 365. Try:
- Updating to the latest version of NiFi (if possible)
- Or adding Microsoft's MSAL (Microsoft Authentication Library) JAR to NiFi's lib directory

4. Use a Custom SSL Context Service
Microsoft servers might require specific TLS settings:
- Create a Standard SSL Context Service with Protocol: TLS
- Add it to the processor's advanced client settings

5. Alternative Approach: Use the Microsoft Graph API
Since Microsoft is moving away from direct IMAP/POP3 access, consider:
- Using an InvokeHTTP processor to authenticate against the Microsoft Graph API
- Using the Graph API endpoints to retrieve email content

6. Check Proxy Settings
If your environment uses proxies:

```
# Add these properties
mail.imap.proxy.host=your-proxy-host
mail.imap.proxy.port=your-proxy-port
mail.pop3.proxy.host=your-proxy-host
mail.pop3.proxy.port=your-proxy-port
```

7. Implementation Steps
1. Update the processor configuration with extended timeout values
2. Verify the OAuth2 settings in the processor match your Azure application registration exactly
3. Check Microsoft 365 account settings to ensure IMAP/POP3 is enabled with Modern Authentication
4. Consider implementing a token-debugging flow using InvokeHTTP to validate token acquisition separately

Happy hadooping
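If you template these JavaMail properties across several processors, a tiny generator avoids typos such as a dropped `mail.` prefix. A Python sketch (illustrative only; it just emits the property names used above):

```python
def mail_timeout_props(protocols=("imap", "pop3"), timeout_ms=60000):
    """Generate matching JavaMail connect/read timeout properties per protocol."""
    props = {}
    for proto in protocols:
        props[f"mail.{proto}.connectiontimeout"] = str(timeout_ms)
        props[f"mail.{proto}.timeout"] = str(timeout_ms)
    return props

props = mail_timeout_props()
for key, value in sorted(props.items()):
    print(f"{key}={value}")
```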
03-20-2025
01:47 PM
@RaoNEY The error message suggests a JWT token algorithm mismatch:

"An error occurred while attempting to decode the Jwt: Signed JWT rejected: Another algorithm exp"

This typically happens when:
- The token you're receiving from Azure in Step 1 uses a signing algorithm that doesn't match what NiFi expects
- NiFi is configured to use the RS256 algorithm (as shown in your nifi.properties), but the Azure token might be using a different algorithm

Verify the token algorithm
First, check what algorithm your Azure token is using. You can decode your JWT token with a tool like jwt.io and inspect the header, which contains the algorithm (look for the "alg" field).

Modify your Azure token request
Azure AD OAuth tokens typically use RS256, but you may need to specify this explicitly in your Azure app registration settings.

Ensure the correct token type
For NiFi OAuth/OIDC authentication, you need an ID token, not an access token. In your Step 1, you're requesting a client credentials grant, which returns an access token. Instead, you need to:

```shell
# Modified Step 1 - Use the authorization code flow to get an ID token
curl -X POST https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=authorization_code" \
  -d "client_id={ClientID}" \
  -d "client_secret={ClientSecret}" \
  -d "code={AuthorizationCode}" \
  -d "redirect_uri={RedirectURI}" \
  -d "scope=openid email profile"
```

Update NiFi properties
Ensure these settings match your Azure configuration:

```
# Make sure these settings are correct
nifi.security.user.oidc.jwt.algorithm=RS256
nifi.security.user.oidc.preferred.jwsalgorithm=RS256
```

Check the Azure app registration
In your Azure portal, verify:
- The redirect URI is properly set to your NiFi callback URL
- The app has appropriate API permissions
- The token configuration includes ID tokens

Complete Authentication Flow
For NiFi OAuth with Azure AD, the proper flow is:
1. Initiate login via the NiFi UI or using GET https://NIFIDnsName:9444/nifi-api/access/oidc/request
2. This redirects to the Microsoft login page, where the user authenticates
3. After successful authentication, Azure redirects back to NiFi with an authorization code
4. NiFi exchanges this code for tokens automatically
5. If you're doing this programmatically, use the authorization code flow, not client credentials

The direct token exchange you're attempting in Step 2 might not be supported or requires specific configuration. NiFi typically handles the OIDC token exchange internally after receiving the authorization code. Happy hadooping
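To check the "alg" field without pasting a production token into a website, you can decode the JWT header locally. A short Python sketch (header decoding only, no signature verification; the demo token is crafted locally for illustration):

```python
import base64
import json

def jwt_header(token: str) -> dict:
    """Decode the unverified header of a JWT to inspect fields like 'alg'."""
    header_b64 = token.split(".")[0]
    padded = header_b64 + "=" * (-len(header_b64) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(padded))

# Locally crafted demo token: only the header segment matters here
demo_header = base64.urlsafe_b64encode(
    json.dumps({"alg": "RS256", "typ": "JWT"}).encode()
).decode().rstrip("=")
token = f"{demo_header}.payload.signature"
print(jwt_header(token)["alg"])  # → RS256
```

Run the same decode against the token Azure returns and compare the result with the algorithm in your nifi.properties.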
03-17-2025
10:18 AM
@AllIsWell Somehow I feel you have some stale data. Before deleting the process group, fetch its current state using the API to confirm the correct version number:

```shell
curl -k -X GET "https://localhost:28443/nifi-api/process-groups/836e216e-0195-1000-d3b8-771b257f1fe6" \
  -H "Authorization: Bearer Token"
```

Look for the revision object in the response. The version field should match what you include in your DELETE request.

Update the DELETE Request
If the version in the response is not 0, update your DELETE request with the correct version. For example, if the current version is 5, your request should look like this:

```shell
curl -k -X DELETE "https://localhost:28443/nifi-api/process-groups/836e216e-0195-1000-d3b8-771b257f1fe6" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer Token" \
  --data '{
    "revision": {
      "version": <Value from above>
    },
    "disconnectedNodeAcknowledged": false
  }'
```

- Validate the JSON: Ensure that the JSON payload is valid. You can use tools like JSONLint to validate the structure.
- Check for Trailing Slashes: Ensure there are no trailing slashes in the URL. For example, use https://localhost:28443/nifi-api/process-groups/836e216e-0195-1000-d3b8-771b257f1fe6 instead of https://localhost:28443/nifi-api/process-groups/836e216e-0195-1000-d3b8-771b257f1fe6/.
- Disconnected Node Acknowledgment: If your NiFi cluster has disconnected nodes, you may need to set disconnectedNodeAcknowledged to true.

Final Notes:
- If the issue persists, double-check the API documentation for any changes or additional requirements.
- Ensure that the Authorization token is valid and has the necessary permissions to delete the process group.
- If you are using a NiFi version older than 1.12.0, the API behavior might differ slightly, so consult the documentation for your specific version.

Happy hadooping
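If you script the fetch-then-delete handshake, the key step is carrying the fetched revision version into the DELETE body. A Python sketch (hypothetical helper; it assumes the GET response contains the revision object as described above):

```python
import json

def delete_payload(get_response_body: str, ack_disconnected: bool = False) -> str:
    """Carry the fetched revision version into the DELETE body
    (NiFi uses the revision for optimistic locking)."""
    current = json.loads(get_response_body)
    return json.dumps({
        "revision": {"version": current["revision"]["version"]},
        "disconnectedNodeAcknowledged": ack_disconnected,
    })

# Simulated GET response for the process group
fetched = json.dumps({
    "id": "836e216e-0195-1000-d3b8-771b257f1fe6",
    "revision": {"version": 5},
})
print(delete_payload(fetched))
```

Because the version is read from the live GET response, the script never sends a stale hard-coded 0.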
03-11-2025
04:16 PM
@Ytch Can you try this solution that has been accepted in this forum: MSSQL to PostgreSQL. Happy hadooping