Member since: 02-08-2015
Posts: 460
Kudos Received: 11
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2567 | 07-20-2023 02:34 PM |
08-04-2023 08:11 AM
Hi @archer2012 , that error output doesn't give us a lot of information about what went wrong, but it looks like the connection wasn't successful. I recommend using Beeline in verbose mode from a command line on a suitable node to troubleshoot the connection independently. Once you have the connection to Impala working separately, then you can come back to Airflow and use the working connection settings.
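For example, a verbose Beeline invocation might look like the following sketch (the host, port, user, and JDBC URL are placeholders; your cluster's SSL and authentication settings will change the connection string, so check your own cluster's configuration for the real values):

```shell
# Run from an edge/gateway node that has Beeline installed.
# --verbose=true prints the full connection handshake, which usually
# reveals whether the failure is DNS, TLS, authentication, or the
# service itself.
beeline --verbose=true \
  -u "jdbc:hive2://impala-host.example.com:21050/default" \
  -n myuser -p
```

Once this succeeds from the command line, the same URL and credentials can be carried over into the Airflow connection.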
07-20-2023 02:34 PM
1 Kudo
Hi @KienKim ! Impala has a lot of different configuration options, and increasing concurrency is a broad topic to tackle. If you haven't already, I recommend consulting the documentation for the version of CDP you're using. If you get stuck on a particular configuration property, then providing those specifics here would be a good place to start.
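As one concrete starting point, the impalad admission-control settings below are among those that commonly bound concurrency. This is an illustrative sketch, not a tuning recommendation; in CDP you would normally manage these through Cloudera Manager's dynamic resource pools rather than raw daemon flags:

```
default_pool_max_requests   # max queries running concurrently in the default pool
default_pool_max_queued     # max queries allowed to wait in the admission queue
fe_service_threads          # max concurrent client connections per impalad
```

Which of these (if any) is your bottleneck depends on your workload, so the documentation for your CDP version remains the authoritative guide.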
03-27-2023 01:40 PM
Closing the connection while the query is still executing is generally not good practice. Think of it as taking ownership of the query you've executed, somewhat like turning the lights off before you leave a room: if you're going to close the session (leave the room), you first need to cancel the query that's consuming resources (turn off the lights you turned on). The normal expectation is that an executing query has an associated active session.
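The cancel-before-close pattern can be sketched in Python with a hypothetical session wrapper (`QuerySession` is an illustrative stand-in, not a real client API; with a real driver such as impyla you would make the analogous cancel call on the cursor before closing it):

```python
# Hypothetical illustration of cancel-before-close: the session wrapper
# cancels any still-running query before it closes, so no orphaned work
# is left consuming cluster resources.

class QuerySession:
    def __init__(self):
        self._running = False

    def execute_async(self, sql):
        # Pretend the query is now running server-side.
        self._running = True

    def is_executing(self):
        return self._running

    def cancel(self):
        self._running = False

    def close(self):
        # Cancel first: "turn off the lights before leaving the room".
        if self.is_executing():
            self.cancel()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
        return False


with QuerySession() as session:
    session.execute_async("SELECT count(*) FROM big_table")
# On exit, close() cancelled the running query before the session ended.
```

Using a context manager makes the ordering automatic: the query cannot outlive the session because `close()` always cancels it first.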
03-21-2023 09:54 AM
@hqbhoho , if the query is executing, it probably makes sense to cancel it before you try to call close.
06-28-2022 09:34 AM
Hi @data_diver. To start with, in CDP Public Cloud you write all your data to the cloud storage service for your platform (such as S3 or ADLS), and once it's there you can read it from a Data Hub cluster.

Regarding your question about writing a DataFrame from Python, I want to clarify a couple of points first. You want to write a DataFrame, which is a Spark object, from Python, but without using PySpark, which is the framework that lets Python interact with Spark objects such as DataFrames. Is that correct?

Perhaps you can start by giving us a bit of context. Why do you want to write a DataFrame without using PySpark? How would the DataFrame object exist in your Python program without PySpark in the first place? Any context you can provide for your use case would be helpful.
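For reference, the usual PySpark pattern for writing a DataFrame to cloud storage looks like the sketch below. The bucket path, app name, and data are placeholders, and this assumes a cluster where Spark is already configured with credentials for your cloud storage; it is not runnable outside such an environment:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("write-example").getOrCreate()

# A small illustrative DataFrame.
df = spark.createDataFrame(
    [("alice", 1), ("bob", 2)],
    ["name", "value"],
)

# Write to cloud storage. "s3a://my-bucket/landing/example/" is a
# placeholder path; on Azure you would use an abfs:// URI for ADLS.
df.write.mode("overwrite").parquet("s3a://my-bucket/landing/example/")
```

Anything that sidesteps PySpark would also sidestep the DataFrame abstraction itself, which is why the use-case context matters here.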