Member since 07-11-2016
14 Posts
3 Kudos Received
3 Solutions
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 2126 | 08-24-2018 10:53 PM |
| | 546 | 01-22-2018 11:59 PM |
| | 627 | 08-11-2017 08:02 PM |
11-19-2018 06:35 PM
Are you using a remote metastore? If so, you are probably hitting https://issues.apache.org/jira/browse/HIVE-19316.
09-10-2018 06:52 PM
What version of Hive are you using? Can you share the query along with the DDL/DML statements? I would like to see whether I can reproduce this.
09-06-2018 10:22 PM
I don't think there is any such configuration. You can use a CASE expression to achieve this, e.g. SELECT CASE WHEN col IS NULL THEN '' ELSE col END FROM table;
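To illustrate the NULL handling, here is a minimal sketch that runs the same CASE expression through Python's built-in sqlite3 module with an in-memory database. This is an assumption-laden stand-in for Hive: only the CASE semantics are shown, and the table name t and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a Hive table (assumption: only the
# NULL-handling semantics of CASE are illustrated, not Hive itself).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT)")  # hypothetical table
conn.executemany("INSERT INTO t (col) VALUES (?)", [("a",), (None,), ("b",)])

# Replace NULLs with an empty string; non-NULL values pass through unchanged.
rows = conn.execute(
    "SELECT CASE WHEN col IS NULL THEN '' ELSE col END FROM t ORDER BY rowid"
).fetchall()
result = [r[0] for r in rows]
print(result)  # ['a', '', 'b']
```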
08-28-2018 06:40 PM
You can use something like the following query to find the maximum number of characters used in the column. It returns the maximum length of the column's values, counted in UTF-8 characters rather than bytes:
select char_length(column_in_question) as len from tvar order by len desc limit 1;
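The difference between characters and bytes matters for multi-byte UTF-8 data. A minimal Python sketch with hypothetical column values, mirroring the query above (NULLs skipped, lengths sorted in descending order, first one taken):

```python
# Hypothetical column values; 'é' and '日本' are multi-byte in UTF-8.
values = ["abc", "héllo", "日本", None]

# char_length(NULL) is NULL, so NULLs are skipped before ranking.
char_lengths = sorted((len(v) for v in values if v is not None), reverse=True)
max_chars = char_lengths[0]  # like ORDER BY len DESC LIMIT 1

# For comparison: the longest value measured in UTF-8 bytes instead.
max_bytes = max(len(v.encode("utf-8")) for v in values if v is not None)

print(max_chars, max_bytes)  # 5 6
```

Here "héllo" is the longest value by characters (5), while "日本" ties it on bytes (6 each), so a byte-based length would overstate the character width.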
08-24-2018 10:53 PM
This is a bug (I confirmed that it also exists on upstream Apache master). Please go ahead and open a bug against Apache Hive, or let me know and I can do that.
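02-22-2018 06:52 PM
1 Kudo
With hive.compute.query.using.stats enabled, Hive can answer count(*) from stored statistics instead of scanning the data. Try set hive.compute.query.using.stats=false and check whether select count(*) then returns the correct count. If it does, your statistics are stale, and you need to re-run ANALYZE TABLE ... COMPUTE STATISTICS on all partitions.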
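A toy model of why turning the setting off can change the result. This is plain Python, not Hive code: the class ToyTable and its fields are hypothetical, with num_rows_stat standing in for the stored row-count statistic and analyze() standing in for re-running ANALYZE TABLE.

```python
class ToyTable:
    """Hypothetical table whose row count is cached as a statistic."""

    def __init__(self):
        self.rows = []
        self.num_rows_stat = 0  # stands in for the stored row-count statistic

    def analyze(self):
        # Stands in for ANALYZE TABLE ... COMPUTE STATISTICS: refresh the stat.
        self.num_rows_stat = len(self.rows)

    def count(self, use_stats):
        # use_stats=True answers from metadata; False scans the actual rows.
        return self.num_rows_stat if use_stats else len(self.rows)


t = ToyTable()
t.rows.extend(range(10))
t.analyze()
t.rows.extend(range(5))            # data changes without refreshing stats
stale = t.count(use_stats=True)    # answered from the outdated statistic
actual = t.count(use_stats=False)  # answered by scanning the data
t.analyze()                        # re-run "ANALYZE" to fix the statistic
fresh = t.count(use_stats=True)
print(stale, actual, fresh)  # 10 15 15
```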
01-22-2018 11:59 PM
This looks like a bug (possibly a regression). I can observe and reproduce the same behavior on the latest Hive master, though I haven't confirmed that it worked in a previous version. Feel free to open a JIRA to report this.
EDIT: I dug further into the code and found a workaround: set hive.mv.files.thread=0. This disables the parallel load of directories, so LOAD should be able to load all directories by renaming them. This is definitely a bug that needs to be fixed. Please go ahead with the JIRA report if you can; otherwise, let me know and I'll file one.
01-17-2018 08:19 PM
You can also run the DESC <TABLE_NAME> command from the Hive CLI or Beeline.
01-09-2018 07:42 PM
I don't think there is a way to do it for all partitions at once. The best you can do is run CONCATENATE once per partition, e.g. ALTER TABLE tableName PARTITION(dt=20180109) CONCATENATE; followed by ALTER TABLE tableName PARTITION(dt=20180110) CONCATENATE; and so on. Please note that there are known issues with ALTER TABLE CONCATENATE in versions earlier than HDP 2.6, so running CONCATENATE on those versions is not recommended.
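When each statement covers a single partition, a small script can generate the statements for a list of partitions. Below is a minimal Python sketch. The function name concatenate_statements and the partition values are hypothetical, and values are left unquoted to match the example above. String partition values may need quoting.

```python
def concatenate_statements(table, column, values):
    """Build one ALTER TABLE ... CONCATENATE statement per partition value."""
    return [
        f"ALTER TABLE {table} PARTITION({column}={v}) CONCATENATE;"
        for v in values
    ]


# Hypothetical table and partition values.
stmts = concatenate_statements("tableName", "dt", ["20180109", "20180110"])
print("\n".join(stmts))
```

The printed statements can be saved to a file and run with beeline -f (or hive -f).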
12-15-2017 05:57 AM
Gunther is right: the Hive planner rewrites DISTINCT as a GROUP BY, so from a performance point of view it doesn't matter which one you use.
08-11-2017 08:02 PM
1 Kudo
These numbers (Num rows, Data size) are estimates produced by the Hive optimizer and do not represent the actual values. You can run EXPLAIN ANALYZE <query> (available in Hive 2.2 and later) to see both the estimated and the actual numbers.
05-11-2017 11:37 PM
1 Kudo
If the table statistics are up to date, you can run DESC FORMATTED <table_name>. This will display the row count (numRows).
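03-22-2017 06:43 PM
You can also achieve this with the following query (count(*) counts every row, while count(cN) skips NULLs, so count(*)-count(cN) is the number of NULLs in cN):
SELECT date_column,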
count(*)-count(c1) as null_c1,
count(*)-count(c2) as null_c2,
count(*)-count(c3) as null_c3,
count(*)-count(c4) as null_c4
FROM t1
GROUP BY date_column;
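The count(*)-count(c) trick can be checked quickly with Python's built-in sqlite3 module and an in-memory database. SQLite is used here only as a stand-in for Hive, and the rows are hypothetical. Only two of the columns are included to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (date_column TEXT, c1 INTEGER, c2 INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        ("2017-03-01", 1, None),     # hypothetical rows
        ("2017-03-01", None, None),
        ("2017-03-02", 3, 4),
    ],
)

# count(*) counts every row; count(c) skips NULLs, so the difference is the
# number of NULLs in c within each group.
rows = conn.execute(
    """
    SELECT date_column,
           count(*) - count(c1) AS null_c1,
           count(*) - count(c2) AS null_c2
    FROM t1
    GROUP BY date_column
    ORDER BY date_column
    """
).fetchall()
print(rows)  # [('2017-03-01', 1, 2), ('2017-03-02', 0, 0)]
```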
03-15-2017 11:55 PM
(NULL = NULL) is neither true nor false; it evaluates to NULL (UNKNOWN). (Reference: https://www.simple-talk.com/sql/learn-sql-server/sql-and-the-snare-of-three-valued-logic/) As @Murali Ramasami pointed out above, you should use the <=> operator if you want a comparison of two NULLs to evaluate to true.
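A minimal Python model of the two comparisons, with None standing in for SQL NULL. The functions sql_eq and null_safe_eq are hypothetical helpers written for illustration: sql_eq follows three-valued logic (UNKNOWN is returned as None), and null_safe_eq follows the documented behavior of Hive's <=> (true when both sides are NULL, false when only one is). A WHERE clause keeps a row only when the predicate is TRUE, so UNKNOWN rows are filtered out.

```python
from typing import Optional


def sql_eq(a, b) -> Optional[bool]:
    """Model of SQL '=': UNKNOWN (None) if either operand is NULL (None)."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a, b) -> bool:
    """Model of '<=>': NULL <=> NULL is true, NULL <=> value is false."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


pairs = [(1, 1), (1, None), (None, None)]
# Mimic WHERE filtering: only rows whose predicate is exactly TRUE survive.
kept_eq = [p for p in pairs if sql_eq(*p) is True]
kept_null_safe = [p for p in pairs if null_safe_eq(*p)]
print(kept_eq)         # [(1, 1)]
print(kept_null_safe)  # [(1, 1), (None, None)]
```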