Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Select * from and & Select count (*) different result

avatar
Rising Star

Hi Team,

Iam getting different result while executing below command

select count (*) --output 1028

&

Select * from --output 500 ( which is correct)

What could be the issue

Regards

suresh bk

11 REPLIES 11

avatar
Expert Contributor
Run analyze table <table name> compute statistics;

avatar
Rising Star

Error: Failed semanticexception, table is partitioned and partition specific is needed

avatar
Expert Contributor

Here is the link that has syntax to compute statisitcs on hive partitioned table.

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_dataintegration/content/cost-based-opt.h...

ANALYZE TABLE employees PARTITION (dt) COMPUTE STATISTICS

avatar
Expert Contributor

If this fixed your issue, can you accept this as Answer. It would help others in community.

avatar
Rising Star

Hi Kashif,

I tried with partition details. But i did not get any output . I just got like 0 row affected. Is their anything i need to do . please help

avatar
Explorer

Suresh,

do something like below.

1) select partition_column, count(*) from table; -- look for count of records by partition.

2) select * from table ; - look if this query is reading data from all partitions or only from a few.

This should atleast point you to find what is actually happening.

avatar

@suresh krish

Here issue with cache in partitions, Clear your partitions and MSCK to repair the table.

dropping partitions:

alter table dbschemaname.tableanme DROP partition (source <> '');

Run MSCK for table repair,

MSCK REPAIR TABLE <tablename>;

run count command.

Basically when you run select script, your request directly goes to hdfs file to get the requested information, here request wouldn't run mr job.

But when you run count(*) it will trigger mr job to pull requested info.

avatar
Explorer

Hello Divakar,

I don't fully understand why we need to drop partitions ? as you suggested, MSCK command would be good enough to update the hive metastore with latest partitions info. Can you help me understand the need for dropping the partitions ?

avatar

yes, MSCK will create partitions again.

it's good practice dropping partitions as well along with files before going to re-load the data.

I believe that few older partitions are causing an issue.

let me know if it's fix your issue while following above mentioned steps