Member since: 02-08-2015
Posts: 386
Kudos Received: 11
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1008 | 07-20-2023 02:34 PM |
| | 9784 | 02-08-2017 01:38 PM |
| | 5091 | 06-17-2016 02:56 PM |
08-04-2023 08:11 AM
Hi @archer2012 , that error output doesn't give us a lot of information about what went wrong, but it looks like the connection wasn't successful. I recommend using Beeline in verbose mode from a command line on a suitable node to troubleshoot the connection independently. Once you have the connection to Impala working separately, then you can come back to Airflow and use the working connection settings.
07-20-2023 02:34 PM
1 Kudo
Hi @KienKim ! Impala has a lot of different configuration options, and increasing concurrency is a broad topic to tackle. If you haven't already, I recommend consulting the documentation for the version of CDP you're using. If you get stuck on a particular configuration property, then providing those specifics here would be a good place to start.
03-27-2023 01:40 PM
Closing the connection while the query is still executing is generally not good practice. Think of it as taking ownership of the query you've executed, sort of like turning the lights off before you leave a room: if you're going to close the session (leave the room), you first need to cancel the query that's consuming resources (turn off the lights you turned on). The normal expectation is that an executing query has an associated active session.
03-21-2023 09:54 AM
@hqbhoho , if the query is executing, it probably makes sense to cancel it before you try to call close.
10-31-2022 12:41 PM
Hi @Emiller ! One thing I notice right away about the format string you're trying to use is that "month" works with a full month name rather than an abbreviation. That is, you have "JAN" in your source data, but the "month" format string would work with "January". I suspect you'll have more luck with something like "DDmonRR". I recommend consulting the documentation on date casts if you need additional help.
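For instance, here's a minimal sketch, assuming the CDP Impala CAST ... FORMAT syntax and a made-up input value (your column or literal will differ):

-- "mon" matches an abbreviated month name such as JAN;
-- "month" would expect the full name, such as January.
SELECT CAST('31OCT22' AS DATE FORMAT 'DDmonRR');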
06-28-2022 09:34 AM
Hi @data_diver! To start with, in CDP Public Cloud you write all your data to the cloud storage service for your platform (such as S3 or ADLS), and after doing that you can read it from a data hub cluster.

Regarding your question about writing a DataFrame from Python, I want to start by clarifying a couple of points. You want to write a DataFrame, which is a Spark object, from Python, but without using PySpark, which is the framework that allows Python to interact with Spark objects such as DataFrames. Is all that correct?

Perhaps you can start by giving us a bit of context. Why do you want to write a DataFrame without using PySpark? How will the DataFrame object exist in your Python program without PySpark in the first place? Any context you can provide for your use case would be helpful.
02-16-2017 01:12 PM
Thanks for asking about this. The max message size for the Hive Metastore should be set to 10% of the Metastore server heap size, up to a maximum of 2,147,483,647 bytes (the 32-bit signed integer limit). For example, with a 12 GiB Metastore heap, that works out to roughly 1.2 GiB, well under the cap. Unfortunately, the values used or displayed by that configuration validator may be incorrect in some cases. Until that's fixed, I recommend checking the actual HMS heap size and configuring the max message size accordingly.
02-08-2017 01:38 PM
1 Kudo
Here's what the most recent version of the CDH Hive documentation says about this:
http://www.cloudera.com/documentation/enterprise/latest/topics/hive.html#hive_transaction_support
"Transaction (ACID) Support in Hive
The CDH distribution of Hive does not support transactions (HIVE-5317). Currently, transaction support in Hive is an experimental feature that only works with the ORC file format. Cloudera recommends using the Parquet file format, which works across many tools. Merge updates in Hive tables using existing functionality, including statements such as INSERT, INSERT OVERWRITE, and CREATE TABLE AS SELECT."
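As a rough sketch of that merge-with-existing-functionality approach (the table and column names here are hypothetical), you could materialize the merged result with CREATE TABLE AS SELECT:

-- Hypothetical tables: base holds current rows, updates holds changed or new rows.
-- The FULL OUTER JOIN keeps unchanged rows, applies updates, and picks up new rows.
CREATE TABLE base_merged AS
SELECT COALESCE(u.id, b.id) AS id,
       COALESCE(u.val, b.val) AS val
FROM base b
FULL OUTER JOIN updates u ON b.id = u.id;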
06-20-2016 06:59 AM
That bit was an attempt at formatting the post; it wasn't supposed to be part of the query, sorry about that. Try adding "as" after "create table tester" and before the nested select query.
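A minimal sketch of the corrected statement, with hypothetical columns and source table:

-- The key fix is the AS between the table name and the nested SELECT.
CREATE TABLE tester AS
SELECT col1, col2
FROM source_table;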
06-17-2016 02:56 PM
2 Kudos
Hi Lucille, for the example you provided, you could get the file names with a query like this:

SELECT hive_magnum.col1,
       hive_magnum.col2,
       hive_magnum.col3,
       hive_magnum.INPUT__FILE__NAME
FROM hive_magnum;

It will actually provide the full HDFS location, which includes the file name. I hope this is helpful.
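If you only want the file name itself rather than the full path, a variant like this should work (a sketch, assuming Hive's regexp_extract function is available in your version):

-- '[^/]+$' matches everything after the last slash, i.e. the bare file name.
SELECT regexp_extract(hive_magnum.INPUT__FILE__NAME, '[^/]+$', 0) AS file_name
FROM hive_magnum;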