Member since: 11-17-2021
Posts: 1116
Kudos Received: 253
Solutions: 28
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 224 | 10-16-2025 02:45 PM |
| | 473 | 10-06-2025 01:01 PM |
| | 441 | 09-24-2025 01:51 PM |
| | 399 | 08-04-2025 04:17 PM |
| | 480 | 06-03-2025 11:02 AM |
03-18-2025
11:22 AM
@Scorpy257 As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks.
03-17-2025
11:47 AM
@Rafiy Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
03-17-2025
11:45 AM
@NaveenSagar Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
03-17-2025
06:24 AM
@BuffaloDaddy What method of user authentication is configured in your NiFi? Single-user, ldap-provider, kerberos-provider, etc.? Thank you, Matt
03-12-2025
09:12 AM
@rosejo Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
03-11-2025
07:59 AM
Hi, I have a similar issue. When I use "Advanced" to check what output NiFi would provide, I get the output there, but when I check the queue after JoltTransformJSON has processed the FlowFile, it says null. I changed "Jolt Transform DSL" from "chain" to "default", as that eliminated the space that was there between the square brackets and the array elements. This is my Jolt specification:

```json
[
  {
    "operation": "shift",
    "spec": {
      "uid": "uid",
      "location_name": "name",
      "location_city": "city",
      "age_min": "age_min",
      "age_max": "age_max",
      "firstdate_begin": "begin",
      "lastdate_end": "end",
      "location_coordinates": {
        "lon": "location[0]",
        "lat": "location[1]"
      }
    }
  }
]
```

and this is the expected output:

```json
{
  "uid": "54570223",
  "name": "Théâtre Plaza",
  "city": "Montréal",
  "age_min": null,
  "age_max": null,
  "begin": "2024-03-23T23:30:00+00:00",
  "end": "2024-03-24T01:00:00+00:00",
  "location": [-73.603196, 45.536315]
}
```

EDIT: I edited my Jolt specification, and it now looks like this:

```json
[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "uid": "uid",
        "location_name": "location_name",
        "location_city": "location_city",
        "age_min": "age_min",
        "age_max": "age_max",
        "firstdate_begin": "firstdate_begin",
        "lastdate_end": "lastdate_end",
        "location_coordinates": {
          "lon": "location[0]",
          "lat": "location[1]"
        }
      }
    }
  }
]
```

and I am getting the desired output:

```json
{
  "uid": "70029814",
  "location_name": "Sanctuaire du Saint-Sacrement",
  "location_city": "Montreal",
  "age_min": null,
  "age_max": null,
  "firstdate_begin": "2019-04-13T18:00:00+00:00",
  "lastdate_end": "2019-04-14T10:00:00+00:00",
  "location": [-73.581721, 45.525243]
}
```

But the issue now is that after processing, the output looks something like this:

```json
{
  "uid": ["54570223"],
  "location_name": ["Théâtre Plaza"],
  "location_city": ["Montréal"],
  "age_min": [16],
  "age_max": [99],
  "firstdate_begin": ["2024-03-23T23:30:00+00:00"],
  "lastdate_end": ["2024-03-24T01:00:00+00:00"],
  "location": [[-73.603196], [45.536315]]
}
```

Working on keeping only the location as arrays. 😅😅😅
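For reference, the reshaping this post is after can be sketched in plain Python, using the field names from the sample data above. This is only a cross-check of the intended output shape, not a replacement for the Jolt spec:

```python
# Sketch of the target reshape: flatten location_coordinates into a
# [lon, lat] array while leaving every other field as a scalar.
# Field names are taken from the sample event in the post.
def reshape(event):
    coords = event.get("location_coordinates", {})
    return {
        "uid": event.get("uid"),
        "location_name": event.get("location_name"),
        "location_city": event.get("location_city"),
        "age_min": event.get("age_min"),
        "age_max": event.get("age_max"),
        "firstdate_begin": event.get("firstdate_begin"),
        "lastdate_end": event.get("lastdate_end"),
        # keep ONLY the coordinates as an array, in [lon, lat] order
        "location": [coords.get("lon"), coords.get("lat")],
    }

event = {
    "uid": "70029814",
    "location_name": "Sanctuaire du Saint-Sacrement",
    "location_city": "Montreal",
    "age_min": None,
    "age_max": None,
    "firstdate_begin": "2019-04-13T18:00:00+00:00",
    "lastdate_end": "2019-04-14T10:00:00+00:00",
    "location_coordinates": {"lon": -73.581721, "lat": 45.525243},
}
print(reshape(event))
```

Comparing this against the array-wrapped Jolt output makes it easy to spot which fields still need to be unwrapped.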
03-10-2025
05:27 AM
Try the query below and let us know if it solves the problem, along with any other related issues you are facing:

```sql
SELECT id, st.value AS street, ph.value AS phone
FROM table, table.result st, table.result ph
WHERE st.key = 'street' AND st.value = 'abc'
  AND ph.key = 'phone' AND ph.value = '123'
```
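The idea behind the query is a key/value self-join: the same table is referenced twice so each attribute key becomes its own column. A minimal sketch of that pattern, using SQLite in memory and illustrative table/column names (`result`, `id`, `key`, `value`):

```python
import sqlite3

# Build a tiny key/value table: one row per (id, attribute).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (id INTEGER, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?)",
    [(1, "street", "abc"), (1, "phone", "123"),
     (2, "street", "xyz"), (2, "phone", "999")],
)

# Self-join: alias the table once per attribute so 'street' and 'phone'
# come back as separate columns on the same row.
rows = conn.execute("""
    SELECT st.id, st.value AS street, ph.value AS phone
    FROM result st
    JOIN result ph ON ph.id = st.id
    WHERE st.key = 'street' AND st.value = 'abc'
      AND ph.key = 'phone' AND ph.value = '123'
""").fetchall()
print(rows)  # → [(1, 'abc', '123')]
```

Only the id whose street is 'abc' and whose phone is '123' survives both filters, which is the behavior the original query relies on.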
03-07-2025
03:24 PM
@Maulz Connecting Python to Cloudera, Hive, and Hue involves using libraries and drivers that interface with HiveServer2, the service that allows remote clients to execute Hive queries. There are several methods to connect Python to Cloudera's ecosystem, particularly to access Hive tables through Hue. I'll detail the most common approaches below.

1. Prerequisites
- Cloudera/Hadoop cluster: ensure HiveServer2 is running on your cluster. Default HiveServer2 port: 10000 (verify via Cloudera Manager).
- Python environment: Python 3.6+ installed.
- Authentication: know your authentication method: username/password (non-secure), Kerberos (common in enterprise clusters), or LDAP.

2. Install Required Python Libraries

Use pip to install:

```shell
pip install pyhive       # Python interface for Hive
pip install thrift       # Thrift protocol support
pip install sasl         # SASL authentication (for Kerberos)
pip install thrift-sasl  # SASL wrapper for Thrift
pip install pykerberos   # Kerberos support (if needed)
```

For JDBC-based connections (alternative method):

```shell
pip install JayDeBeApi   # JDBC bridge
```

3. Configure Cloudera/Hive

Via Cloudera Manager: enable HiveServer2 and ensure it's running, and check the HiveServer2 port (default: 10000). If using Kerberos, ensure Kerberos is configured in Cloudera and export your Kerberos keytab:

```shell
kinit -kt <keytab_file> <principal>
```

4. Connecting Python to Cloudera/Hue/Hive

4.1 Using PyHive, a Python library specifically designed to work with Hive:

```python
from pyhive import hive

# Connect to Hive server
conn = hive.Connection(
    host='cloudera_host_name',
    port=10000,               # Default HiveServer2 port
    username='your_username',
    password='your_password',
    database='default',       # Your database name
    auth='LDAP'               # Or 'NONE', 'KERBEROS', 'CUSTOM' depending on your setup
)

# Create a cursor and execute a query
cursor = conn.cursor()
cursor.execute('SELECT * FROM your_table LIMIT 10')

# Fetch results
results = cursor.fetchall()
print(results)

# Close connections
cursor.close()
conn.close()
```

4.2 Using the Impala connection, if your Cloudera cluster uses Impala:

```python
from impala.dbapi import connect

conn = connect(
    host='cloudera_host_name',
    port=21050,               # Default Impala port
    user='your_username',
    password='your_password',
    database='default'        # Your database name
)

cursor = conn.cursor()
cursor.execute('SELECT * FROM your_table LIMIT 10')
results = cursor.fetchall()
print(results)
cursor.close()
conn.close()
```

4.3 Integration with Hue. Hue is a web UI for Hadoop, but you can programmatically interact with Hive via its APIs (limited). For direct Python-Hue integration, use Hue's REST API to execute queries:

```python
import requests

# Hue API endpoint (replace with your Hue server URL)
url = "http://<hue_server>:8888/hue/notebook/api/execute/hive"
headers = {"Content-Type": "application/json"}
data = {
    "script": "SELECT * FROM my_table",
    "dialect": "hive"
}

response = requests.post(
    url,
    auth=('<hue_username>', '<hue_password>'),
    headers=headers,
    json=data
)
print(response.json())
```

5. Troubleshooting Common Issues
- Connection refused: verify HiveServer2 is running (`netstat -tuln | grep 10000`) and check firewall rules.
- Authentication failures: for Kerberos, ensure `kinit` succeeded; for LDAP, validate credentials.
- Thrift version mismatch: use Thrift v0.13.0 with Hive 3.x.
- Logs: check HiveServer2 logs in Cloudera Manager (/var/log/hive).

6. Best Practices
- Use connection pooling for high-frequency queries.
- For Kerberos, automate ticket renewal with `kinit` cron jobs.
- Secure credentials using environment variables or Vault.

Happy hadooping
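On the best-practices point about securing credentials, a minimal sketch of reading connection parameters from environment variables instead of hard-coding them. The variable names (`HIVE_HOST`, `HIVE_USER`, etc.) are illustrative assumptions, not a standard:

```python
import os

# Assemble connection parameters from the environment so passwords never
# appear in source code. Variable names here are illustrative only.
def hive_conn_params():
    return {
        "host": os.environ.get("HIVE_HOST", "localhost"),
        "port": int(os.environ.get("HIVE_PORT", "10000")),
        "username": os.environ["HIVE_USER"],      # raises KeyError if unset: fail fast
        "password": os.environ["HIVE_PASSWORD"],
    }

# Demo values; in practice these are set in the shell or a secrets manager.
os.environ["HIVE_USER"] = "alice"
os.environ["HIVE_PASSWORD"] = "secret"
print(hive_conn_params())
```

The returned dict can then be splatted into a connection call, e.g. `hive.Connection(**hive_conn_params())` with PyHive.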
03-04-2025
02:34 AM
Thank you @abdulpasithali @DianaTorres for your response. I really appreciate it! I'll take some time to review the details, and hopefully, I won’t have more questions. 😉 Thanks again for your help!
03-03-2025
03:07 PM
Here are some highlights from the month of January.
Coming on March 13th: The Power of Metadata: Build an interoperable data estate you can control and trust. Register Now!
Check out the FY25 Cloudera Meetup Events Calendar for upcoming & past event details!
713 new support questions
8 new community articles
731 new members
| Community Article | Author | Components/Labels |
|---|---|---|
| Fully Private Agents with Cloudera's AI Inference Service and CrewAI | @shresh | Cloudera Data Science and Engineering, Cloudera Machine Learning (CML) |
| Iceberg WAP – Failsafe ETL with Iceberg and CDE | @wdyson | Apache Iceberg, Cloudera Data Engineering (CDE), Cloudera Data Warehouse (CDW) |
| Getting Started with Spark GraphFrames in Cloudera AI Workbench | @pauldefusco | Apache Spark, Cloudera Data Platform (CDP), Cloudera Machine Learning (CML) |
| JupyterLab and Spark Connect Quickstart in Cloudera Data Engineering | | Apache Spark, Cloudera Data Engineering (CDE) |
| PyCharm and Spark Connect Quickstart in Cloudera Data Engineering | | Apache Spark, Cloudera Data Engineering (CDE) |
We would like to recognize the community members and employees below for their efforts over the last month to provide community solutions.
See all our top participants at Top Solution Authors leaderboard and all the other leaderboards on our Leaderboards and Badges page.
@MattWho @venkatsambath @vaishaakb @vikas @james_jones @Seaport @raph @amd0629 @wkwi @tuyen123
Share your expertise and answer some of the open questions below. Also, be sure to bookmark the unanswered questions page to find additional open questions.
| Unanswered Community Post | Components/Labels |
|---|---|
| External zookeeper and nifi cluster connection issue | Apache NiFi, Apache Zookeeper |
| Parsing nested JSON in HIVE | Apache Hive |
| Unable to run spark-shell command with k8s as master | Apache Spark |
| Apache NiFi IMAP/POP3 authentication Error | Apache NiFi |
| GenerateFlowFile "Failed to properly initialize Processor. If still scheduled to run" | Apache NiFi |