About DianaTorres

DianaTorres · ‎03-12-2025

@rosejo Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.

kavyashree · ‎03-11-2025

Hi, I have a similar issue. When I use "Advanced" to check what output NiFi would provide, I get the output there, but when I check the queue after JoltTransformJSON has processed the output, it says null. I changed "Jolt Transform DSL" from "chain" to "default", as it eliminated the space that was there between the square brackets and the array elements in an array.

This is my Jolt specification:

[ { "operation": "shift", "spec": { "uid": "uid", "location_name": "name", "location_city": "city", "age_min": "age_min", "age_max": "age_max", "firstdate_begin": "begin", "lastdate_end": "end", "location_coordinates": { "lon": "location[0]", "lat": "location[1]" } } } ]

and this is the expected output:

{ "uid": "54570223", "name": "Théâtre Plaza", "city": "Montréal", "age_min": null, "age_max": null, "begin": "2024-03-23T23:30:00+00:00", "end": "2024-03-24T01:00:00+00:00", "location": [ -73.603196, 45.536315 ] }

EDIT: I edited my Jolt specification, and it looks like this now:

[ { "operation": "shift", "spec": { "*": { "uid": "uid", "location_name": "location_name", "location_city": "location_city", "age_min": "age_min", "age_max": "age_max", "firstdate_begin": "firstdate_begin", "lastdate_end": "lastdate_end", "location_coordinates": { "lon": "location[0]", "lat": "location[1]" } } } } ]

I am getting the desired output:

{ "uid": "70029814", "location_name": "Sanctuaire du Saint-Sacrement", "location_city": "Montreal", "age_min": null, "age_max": null, "firstdate_begin": "2019-04-13T18:00:00+00:00", "lastdate_end": "2019-04-14T10:00:00+00:00", "location": [-73.581721, 45.525243] }

But the issue now is that after processing, the output looks something like this:

{ "uid" : [ "54570223" ], "location_name" : [ "Théâtre Plaza" ], "location_city" : [ "Montréal" ], "age_min" : [ 16 ], "age_max" : [ 99 ], "firstdate_begin" : [ "2024-03-23T23:30:00+00:00" ], "lastdate_end" : [ "2024-03-24T01:00:00+00:00" ], "location" : [ [ -73.603196 ], [ 45.536315 ] ] }

Working on keeping only the location as arrays. 😅😅😅
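The goal stated at the end of this post (every field scalar, only location kept as an array) can be illustrated outside of Jolt. The following is a minimal Python sketch of that target shape, not a Jolt specification and not a NiFi configuration; the function name `unwrap_singletons` and its `keep_as_array` parameter are hypothetical, and the sample record is copied from the processed output quoted in the post.

```python
import json


def unwrap_singletons(record, keep_as_array=("location",)):
    """Unwrap one-element lists, except for keys that should stay arrays."""
    result = {}
    for key, value in record.items():
        if key in keep_as_array and isinstance(value, list):
            # Flatten [[lon], [lat]] into [lon, lat] but keep the outer list.
            result[key] = [
                item[0] if isinstance(item, list) and len(item) == 1 else item
                for item in value
            ]
        elif isinstance(value, list) and len(value) == 1:
            # A single wrapped value becomes a plain scalar.
            result[key] = value[0]
        else:
            result[key] = value
    return result


# Processed output as quoted in the post.
processed = {
    "uid": ["54570223"],
    "location_name": ["Théâtre Plaza"],
    "location_city": ["Montréal"],
    "age_min": [16],
    "age_max": [99],
    "firstdate_begin": ["2024-03-23T23:30:00+00:00"],
    "lastdate_end": ["2024-03-24T01:00:00+00:00"],
    "location": [[-73.603196], [45.536315]],
}

cleaned = unwrap_singletons(processed)
print(json.dumps(cleaned, ensure_ascii=False, indent=2))
```

This only shows the shape being aimed for; achieving the same result inside JoltTransformJSON would need a change to the specification itself.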

Meenambigai · ‎03-10-2025

Try the query below and let us know whether it solves the problem, along with any other related issues you are facing:

select id, st.value street, ph.value phone from table, table.result st, table.result ph where st.key='street' and st.value='abc' and ph.key='phone' and ph.value='123'
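The query joins the same result column twice, once under the alias st and once under ph, so that both key/value conditions can hold for the same row. A rough Python emulation of that logic is sketched below, assuming result behaves like a key/value map; the sample rows and the function name `match_street_and_phone` are made up for illustration.

```python
# Hypothetical sample rows; "result" stands in for the map column in the query.
rows = [
    {"id": 1, "result": {"street": "abc", "phone": "123"}},
    {"id": 2, "result": {"street": "abc", "phone": "999"}},
    {"id": 3, "result": {"street": "xyz", "phone": "123"}},
]


def match_street_and_phone(rows, street, phone):
    """Return (id, street, phone) for rows whose map satisfies both conditions."""
    matches = []
    for row in rows:
        # First pass over the map entries corresponds to the alias "st".
        for st_key, st_value in row["result"].items():
            # Second pass over the same entries corresponds to the alias "ph".
            for ph_key, ph_value in row["result"].items():
                if (st_key == "street" and st_value == street
                        and ph_key == "phone" and ph_value == phone):
                    matches.append((row["id"], st_value, ph_value))
    return matches


matches = match_street_and_phone(rows, "abc", "123")
print(matches)  # [(1, 'abc', '123')]
```

Only the row where both the street and the phone entries match is returned, which is the behavior the two aliases are meant to produce.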

Shelton · ‎03-07-2025

@Maulz Connecting Python to Cloudera, Hive, and Hue involves using libraries and drivers that interface with HiveServer2, the service that allows remote clients to execute Hive queries. There are several methods to connect Python to Cloudera's ecosystem, particularly to access Hive tables through Hue. I'll detail the most common approaches.

Prerequisites

Cloudera/Hadoop Cluster: Ensure HiveServer2 is running on your cluster. Default HiveServer2 port: 10000 (verify via Cloudera Manager).
Python Environment: Python 3.6+ installed.
Authentication: Know your authentication method:
- Username/password (non-secure).
- Kerberos (common in enterprise clusters).
- LDAP.

Below is a detailed, step-by-step guide:

2. Install Required Python Libraries

Use pip to install:

    pip install pyhive        # Python interface for Hive
    pip install thrift        # Thrift protocol support
    pip install sasl          # SASL authentication (for Kerberos)
    pip install thrift-sasl   # SASL wrapper for Thrift
    pip install pykerberos    # Kerberos support (if needed)

For JDBC-based connections (alternative method):

    pip install JayDeBeApi    # JDBC bridge

3. Configure Cloudera/Hive

Via Cloudera Manager: Enable HiveServer2 and ensure it's running.
Check the HiveServer2 port (default: 10000).
If using Kerberos: Ensure Kerberos is configured in Cloudera, then obtain a ticket from your Kerberos keytab:

    kinit -kt <keytab_file> <principal>

Connecting Python to Cloudera/Hue/Hive

1. Using PyHive

PyHive is a Python library specifically designed to work with Hive.

    from pyhive import hive

    # Connect to Hive server
    conn = hive.Connection(
        host='cloudera_host_name',
        port=10000,                # Default HiveServer2 port
        username='your_username',
        password='your_password',  # Only for 'LDAP' or 'CUSTOM'; omit for 'NONE' or 'KERBEROS'
        database='default',        # Your database name
        auth='LDAP'                # Or 'NONE', 'KERBEROS', 'CUSTOM' depending on your authentication setup
    )

    # Create a cursor
    cursor = conn.cursor()

    # Execute a query
    cursor.execute('SELECT * FROM your_table LIMIT 10')

    # Fetch results
    results = cursor.fetchall()
    print(results)

    # Close connections
    cursor.close()
    conn.close()

2. Using the Impala Connection

If your Cloudera cluster uses Impala:

    # Requires the impyla package (pip install impyla)
    from impala.dbapi import connect

    conn = connect(
        host='cloudera_host_name',
        port=21050,                # Default Impala port
        user='your_username',
        password='your_password',
        database='default'         # Your database name
    )

    cursor = conn.cursor()
    cursor.execute('SELECT * FROM your_table LIMIT 10')
    results = cursor.fetchall()
    print(results)

    cursor.close()
    conn.close()

3. Integration with Hue

Hue is a web UI for Hadoop, but you can programmatically interact with Hive via its APIs (limited). For direct Python-Hue integration, use Hue's REST API to execute queries:

    import requests

    # Hue API endpoint (replace with your Hue server URL)
    url = "http://<hue_server>:8888/hue/notebook/api/execute/hive"
    headers = {"Content-Type": "application/json"}
    data = {
        "script": "SELECT * FROM my_table",
        "dialect": "hive"
    }

    response = requests.post(
        url,
        auth=('<hue_username>', '<hue_password>'),
        headers=headers,
        json=data
    )
    print(response.json())

Troubleshooting Common Issues

Connection Refused: Verify HiveServer2 is running (netstat -tuln | grep 10000). Check firewall rules.
Authentication Failures: For Kerberos, ensure kinit succeeded. For LDAP, validate credentials.
Thrift Version Mismatch: Use Thrift v0.13.0 with Hive 3.x.
Logs: Check HiveServer2 logs in Cloudera Manager (/var/log/hive).

4. Best Practices

Use connection pooling for high-frequency queries.
For Kerberos, automate ticket renewal with kinit cron jobs.
Secure credentials using environment variables or Vault.

Happy hadooping
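
The advice to keep credentials in environment variables could look roughly like the sketch below. It only builds the keyword arguments for PyHive's hive.Connection and does not open a connection, so it runs without a cluster. The variable names (HIVE_HOST, HIVE_PORT, HIVE_USER, HIVE_PASSWORD, HIVE_DATABASE, HIVE_AUTH, HIVE_KERBEROS_SERVICE) and the helper `hive_connection_kwargs` are assumptions made for this example, not names defined by PyHive or Cloudera.

```python
import os


def hive_connection_kwargs(env=os.environ):
    """Build keyword arguments for pyhive.hive.Connection from environment values."""
    auth = env.get("HIVE_AUTH", "LDAP").upper()
    kwargs = {
        "host": env["HIVE_HOST"],
        "port": int(env.get("HIVE_PORT", "10000")),  # default HiveServer2 port
        "username": env.get("HIVE_USER"),
        "database": env.get("HIVE_DATABASE", "default"),
        "auth": auth,
    }
    if auth in ("LDAP", "CUSTOM"):
        # A password is only passed for password-based modes.
        kwargs["password"] = env["HIVE_PASSWORD"]
    elif auth == "KERBEROS":
        # Kerberos relies on an existing ticket (kinit) instead of a password.
        kwargs["kerberos_service_name"] = env.get("HIVE_KERBEROS_SERVICE", "hive")
    return kwargs


# Illustrative values only; in practice these come from the real environment.
example_env = {
    "HIVE_HOST": "cloudera_host_name",
    "HIVE_USER": "your_username",
    "HIVE_PASSWORD": "secret",
    "HIVE_AUTH": "LDAP",
}
kwargs = hive_connection_kwargs(example_env)

# Mask the password before printing so it never reaches logs.
print({k: ("***" if k == "password" else v) for k, v in kwargs.items()})

# With PyHive installed and a reachable HiveServer2:
#   from pyhive import hive
#   conn = hive.Connection(**kwargs)
```

Keeping the lookup in one helper means scripts and tests never hard-code a password, and switching between LDAP and Kerberos becomes a configuration change rather than a code change.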

Sushant_spk · ‎03-04-2025

Thank you @abdulpasithali @DianaTorres for your response. I really appreciate it! I'll take some time to review the details, and hopefully, I won’t have more questions. 😉 Thanks again for your help!

DianaTorres · ‎03-03-2025

Here are some highlights from the month of January

Coming on March 13th: The Power of Metadata: Build an interoperable data estate you can control and trust. Register Now!

Check out the FY25 Cloudera Meetup Events Calendar for upcoming & past event details!

713 new support questions
8 new community articles
731 new members

Community Article | Author | Components/Labels
Fully Private Agents with Cloudera's AI Inference Service and CrewAI | @shresh | Cloudera Data Science and Engineering, Cloudera Machine Learning (CML)
Iceberg WAP – Failsafe ETL with Iceberg and CDE | @wdyson | Apache Iceberg, Cloudera Data Engineering (CDE), Cloudera Data Warehouse (CDW)
Getting Started with Spark GraphFrames in Cloudera AI Workbench | @pauldefusco | Apache Spark, Cloudera Data Platform (CDP), Cloudera Machine Learning (CML)
JupyterLab and Spark Connect Quickstart in Cloudera Data Engineering | | Apache Spark, Cloudera Data Engineering (CDE)
PyCharm and Spark Connect Quickstart in Cloudera Data Engineering | | Apache Spark, Cloudera Data Engineering (CDE)

We would like to recognize the below community members and employees for their efforts over the last month to provide community solutions. See all our top participants at Top Solution Authors leaderboard and all the other leaderboards on our Leaderboards and Badges page.

@MattWho @venkatsambath @vaishaakb @vikas @james_jones @Seaport @raph @amd0629 @wkwi @tuyen123

Share your expertise and answer some of the below open questions. Also, be sure to bookmark the unanswered question page to find additional open questions.

Unanswered Community Post | Components/Labels
External zookeeper and nifi cluster connection issue | Apache NiFi, Apache Zookeeper
Parsing nested JSOn in HIVE | Apache Hive
Unable to run spark-shell command with k8s as master | Apache Spark
Apache Nifi IMAP/POP3 authentication Error | Apache NiFi
GenerateFlowFile "Failed to properly initialize Processor. If still scheduled to run" | Apache NiFi

euklidas · ‎03-03-2025

Writing the output back into the database results in the same error. However, I checked the physical and logical plans of these operations and noticed that Spark performs a "relation" operation on the second table, which is over 900 GB, reading all of its columns instead of only the subset used in the query. I therefore translated the whole code into SQL, and it returned the table in DataFrame format perfectly... Perhaps you have an idea why Spark doesn't push down the filtering and column pruning operations?

DianaTorres · ‎02-28-2025

@ajay32 Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.

DianaTorres · ‎02-27-2025

@JohnCarter Welcome to the Cloudera Community! To help you get the best possible solution, I have tagged our NiFi experts @SAMSAL @MattWho who may be able to assist you further. Please keep us updated on your post, and we hope you find a satisfactory solution to your query.

DianaTorres · ‎02-25-2025

@pablobhz Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.

Last Visited	‎01-29-2026 09:38 AM

Member Since	‎11-17-2021 08:08 AM
Posts	1,127
Kudos received	236

Cloudera Community

Re: Error connecting to NiFi Registry from NiFi UI...

Re: How to change my Account Email Address?

Re: Cannot erase old /opt/cloudera/parcels

Re: Issues Install and Unistall Apache Minifi - Wi...

Re: How to change my company name

Re: Exception IllegalStateException: nested jar UR...

Re: getting null from JoltTransformJson in nifi wh...

Re: Impala multiple Key values in where clause wit...

Re: How to connect hive using Python for pytest da...

Re: Clarification on Cloudera Navigator Availabili...

February 2025 Community Highlights

Re: Keep getting "ConnectionRefused" or "OOM" erro...

Re: How to Track and Display Integration Execution...

Re: CI/CD-style automation solution for NiFi flow ...

Re: [28000][unixODBC][Cloudera][ThriftExtension] (...