About vgarg

vgarg · ‎09-06-2018

I don't think there is any such configuration. You can use a CASE expression to achieve this, e.g. SELECT CASE WHEN col IS NULL THEN '' ELSE col END FROM table;
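To try the CASE approach without a Hive cluster, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Hive. The table name `t` and its sample rows are made up for illustration; the CASE expression itself is the one from the reply.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT)")  # hypothetical table
conn.executemany("INSERT INTO t VALUES (?)", [("a",), (None,), ("b",)])

# Replace NULL with an empty string, leave other values untouched.
rows = conn.execute(
    "SELECT CASE WHEN col IS NULL THEN '' ELSE col END FROM t ORDER BY rowid"
).fetchall()
values = [r[0] for r in rows]
print(values)
```

The NULL in the second row comes back as an empty string while the other values are unchanged.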

vgarg · ‎08-24-2018

This is a bug (I confirmed that it exists on upstream Apache master as well). Please go ahead and open a bug against Apache Hive, or let me know and I can do that.

vgarg · ‎01-22-2018

This looks like a bug (regression). I am able to reproduce the same behavior on the latest Hive master, though I haven't confirmed that it worked in a previous version. Feel free to open a JIRA to report this. EDIT: I dug further into the code and found a workaround: set hive.mv.files.thread=0. This disables parallel loading of directories, and LOAD should then be able to load all directories by renaming them. This is definitely a bug that needs to be fixed. Please go ahead with the JIRA report if you can; otherwise let me know and I'll file one.

vgarg · ‎01-17-2018

You can also run the DESC <TABLE_NAME> command from the Hive CLI or Beeline.

vgarg · ‎01-09-2018

I don't think there is a way to do it at once for all partitions; the best you can do is to specify multiple partitions, like ALTER TABLE tableName PARTITION(dt=20180109, dt=20180110..) CONCATENATE. Please note that there are known issues with ALTER TABLE CONCATENATE in versions earlier than HDP 2.6, and it is not recommended to run CONCATENATE.

vgarg · ‎12-15-2017

Gunther is right: the Hive planner rewrites DISTINCT using GROUP BY, so it doesn't matter which one you use from a performance point of view.

vgarg · ‎08-11-2017

These numbers (Num rows, Data size) are estimates made by the Hive optimizer and do not represent actual numbers. You can run EXPLAIN ANALYZE to see both the estimated and the actual numbers.

vgarg · ‎05-11-2017

If the table statistics are up to date, you can run DESC FORMATTED <table_name>. This will display the row count.

vgarg · ‎03-22-2017

You can also achieve this with the following query: SELECT date_column, count(*)-count(c1) AS null_c1, count(*)-count(c2) AS null_c2, count(*)-count(c3) AS null_c3, count(*)-count(c4) AS null_c4 FROM t1 GROUP BY date_column;
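The query works because count(*) counts every row in the group while count(col) skips NULLs, so the difference is the number of NULLs. A small sketch of that arithmetic, run against SQLite through Python's sqlite3 module rather than Hive (an assumption made so the example is runnable anywhere); the sample rows and the reduced two-column table are invented for illustration:

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (date_column TEXT, c1 INTEGER, c2 INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        ("2017-03-21", 1, None),     # c2 is NULL
        ("2017-03-21", None, None),  # c1 and c2 are NULL
        ("2017-03-22", 5, 6),        # no NULLs
    ],
)

# count(*) counts all rows; count(col) counts only non-NULL values.
null_counts = conn.execute(
    """
    SELECT date_column,
           count(*) - count(c1) AS null_c1,
           count(*) - count(c2) AS null_c2
    FROM t1
    GROUP BY date_column
    ORDER BY date_column
    """
).fetchall()
print(null_counts)
```

For the first date the sketch reports one NULL in c1 and two in c2; for the second date both counts are zero.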

vgarg · ‎03-15-2017

(NULL=NULL) is neither true nor false; it evaluates to NULL/UNKNOWN (reference: https://www.simple-talk.com/sql/learn-sql-server/sql-and-the-snare-of-three-valued-logic/). As pointed out by @Murali Ramasami above, you should use the <=> operator if you would like NULL=NULL to evaluate to true.
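The three-valued behavior can be observed with a short sketch. It uses Python's sqlite3 module instead of Hive, which is an assumption made for runnability. SQLite has no <=> operator, so its IS operator is used here as the closest null-safe comparison; in Hive you would write a <=> b. The table `ts` and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ts (a TEXT, b TEXT)")  # hypothetical table
conn.execute("INSERT INTO ts VALUES (NULL, NULL)")

# Plain equality on NULLs yields NULL (Python None), not true or false.
plain_eq = conn.execute("SELECT a = b FROM ts").fetchone()[0]

# A WHERE clause keeps only rows where the predicate is true, so the row is dropped.
matched_plain = conn.execute("SELECT count(*) FROM ts WHERE a = b").fetchone()[0]

# Null-safe comparison: SQLite's IS here, Hive's <=> in the original thread.
matched_null_safe = conn.execute("SELECT count(*) FROM ts WHERE a IS b").fetchone()[0]

print(plain_eq, matched_plain, matched_null_safe)
```

The plain comparison matches no rows, while the null-safe comparison matches the row where both sides are NULL.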

Member Since	‎07-11-2016 08:07 PM
Last Visited	‎02-26-2020 10:55 PM
Posts	14
Kudos received	3

Cloudera Community

Re: Hive set Null value to Empty string

Re: HDP 3 Hive: create view if not exists fails

Re: "Load data into table" behavior is different b...

Re: Best way for checking hive table metadata at o...

Re: orc small files Concatenate in Hive

Re: distinct vs group by

Re: Hive Explain Plan Predicate Question

Re: table row count from hive metastore

Re: HIVE : counting null values based on group by

Re: Hive Null Timestamp comparison not works prope...