Created 07-09-2025 02:35 AM
id | data_dt | len |
1 | 2021 | 4 |
2 | 2022 | 4 |
3 | __HIVE_DEFAULT_PARTITION__ | 26 |
--4.wired results with length(partition )
select * from aaa where length(data_dt)=26 ; -- return no data WHY ?????😂
select * from (select *, length(data_dt) len from aaa )t where len=26 ; -- return no data WHY ?????🙀
select * from aaa where data_dt ='__HIVE_DEFAULT_PARTITION__' ; --return 1 record
Created 07-09-2025 08:29 PM
@lingloong Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our Hive experts @cravani @james_jones @ggangadharan who may be able to assist you further
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres,Created 07-10-2025 09:21 AM
In Hive, each partition corresponds to a physical directory on the file system. Because an empty string ('') or a NULL value cannot be used as a directory name, Hive substitutes it with the default value from its configuration.
The setting hive.exec.default.partition.name provides this value, which is __HIVE_DEFAULT_PARTITION__.
When you query with a WHERE clause on a partitioned column, Hive performs partition pruning by filtering the directory names before it ever reads the data files. Therefore, to find data in the default partition, you must filter on the literal string name that Hive assigned to the directory.
This will work:
SELECT * FROM aaa WHERE data_dt = '__HIVE_DEFAULT_PARTITION__';
This won't work:
SELECT * FROM aaa WHERE data_dt IS NULL;
— This fails because the column's value is the string '__HIVE_DEFAULT_PARTITION__', not a true NULL.
Created 07-10-2025 09:31 AM
Also, if you use Iceberg table format instead of default Hive format it will act more intuitively as a NULL without the placeholder partition name "__HIVE_DEFAULT_PARTITION__".
So with Iceberg you can just do this:
SELECT * FROM aaa where data_dt IS NULL;
Depending on the environment/version you're running in, you can use iceberg this in any or all of: Impala, Spark3 and Hive.
Created 07-15-2025 12:19 AM
In Hive , __HIVE_DEFAULT_PARTITION__ is a special value used internally to represent:
NULL or EMPTY STRINGS in partition column values .
Since it's just a string literal in the metadata , you can access it just like any other string in a query.
Example :
SELECT * FROM your_table WHERE data_dt = '__HIVE_DEFAULT_PARTITION__';
We cannot do WHERE data_dt IS NULL — because NULLs are replaced with __HIVE_DEFAULT_PARTITION__ before being written
If you do WHERE length(data_dt) = 26 — it won’t match anything, because partition columns behave differently.
Created on 07-24-2025 05:01 AM - edited 07-24-2025 05:04 AM
SELECT * FROM your_table WHERE data_dt = '__HIVE_DEFAULT_PARTITION__';
ok thank's for your reply ,i understand this example
But my question is, in my Third SQL statement, using length() function in the select statement is effective and the result is 26, while in the first and second SQL , why can't this record be filtered by length in the where statement when I use the length function on the partition key?
Thank You
Created 07-15-2025 07:13 PM
@lingloong Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
Regards,
Diana Torres,