New Contributor
Posts: 1
Registered: ‎03-13-2018

stddev_samp when dealing with combination of NULLS and values giving NaN's

Hi All,

i have a query which does several aggregations on different columns in a 24 rollup job, but when it does stddev_samp on decimal columns which contain both nulls and values it is giving out NaN values. Is there anyway we could ignore the NULL's and do a std deviation on only the numbers? 


for eg: 

column1 has 24 values(from 24 different partitions, rollup). Of the 24 values if 20 has decimal values and 4 have NULL's, is there a way to consider only those 20 values and so a standard deviation on them instead of 24?


i mean if i need to calculate mean for this 20 values, it would be (sum of 20 values)/20


how could i achieve this in hive?


Thanks in advance.