Support Questions

Find answers, ask questions, and share your expertise

drop the columns with more than 50% missing values

avatar
New Contributor

I have a dataset of columns maker, model, mileage, manufacture_year, engine_displacement, engine_power, body_type, color_slug, skt_year, transmission, door_count, seat_count, fuel_type, date_created, date_last_seen, price_eur

And I need to drop the columns with more than 50% missing values so currently my code is

SELECT

maker, model, mileage, manufacture_year, engine_displacement, engine_power, body_type, color_slug, skt_year, transmission, door_count, seat_count, fuel_type, date_created, date_last_seen, price_eur from used_cars where manufacture_year >=2000

and manufacture_year <= 2017 and price_eur >=3000

and price_eur <=2000000 and maker!='' and model!=''

and price_eur !=1295.34

 

How can I do this within my code?

1 REPLY 1

avatar
Master Collaborator

@Shawn Here is a small example on how to find out what's the percentage of NOT NULL value of column maker:

 

select ((tot_cnt - a.null_maker)/tot_cnt)*100 as pcnt_not_null_maker
from
(select count(maker) as null_maker from used_cars where maker is NULL) a
Left outer join
(select count(*) as tot_cnt from used_cars) b

 

You may try this for all individual columns.