Created 08-11-2016 10:02 AM
The value differs when I check count(*) with a Hive query and when I check the number of lines in the corresponding file. SELECT count(*) FROM table returns 100, whereas wc -l on the corresponding file returns 30. My table is in ORC format. Is this how ORC behaves?
Created 08-11-2016 10:06 AM
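Yes, the difference in count is expected. ORC is a binary, columnar format: it groups rows into stripes (the default stripe size is 250 MB), encodes and compresses the data column by column, and keeps auxiliary information, including the row count, in a file footer. Because an ORC file has no one-line-per-row layout, wc -l (which only counts newline bytes) returns a number that has no relation to the actual number of rows in the table.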
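As a rough illustration of why wc -l is meaningless on a binary file, here is a minimal Python sketch. It is a simplification, not ORC itself: plain zlib compression of text rows stands in for ORC's columnar encoding and compression, and the 100 rows are made-up sample data. The function wc_l is a hypothetical helper that mimics wc -l by counting newline bytes. To read the real row count of an actual ORC file, Hive's hive --orcfiledump <path> prints the file metadata, including the number of rows.

```python
import zlib


def wc_l(data: bytes) -> int:
    """Mimic `wc -l`: count newline (0x0A) bytes, nothing more."""
    return data.count(b"\n")


# 100 made-up rows, one per line, the way a text file would store them.
rows = [f"{i},name_{i}" for i in range(100)]
text_bytes = "".join(row + "\n" for row in rows).encode("utf-8")

# Simplified stand-in for an ORC file body: the same data, zlib-compressed.
# Real ORC also splits columns into separate streams and adds stripe and
# footer metadata, which makes the bytes even less line-like.
compressed = zlib.compress(text_bytes)

print("rows:", len(rows))
print("wc -l on text bytes:", wc_l(text_bytes))
# Any newline bytes here are incidental byte values, not row boundaries.
print("wc -l on compressed bytes:", wc_l(compressed))
```

The text version reports one line per row. The compressed version reports whatever number of 0x0A bytes the compressor happened to emit, which is exactly the kind of unrelated number wc -l gives on an ORC file.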
Created 08-11-2016 10:15 AM
Thanks Sindhu. What happens when we store it as TEXTFILE? The count should then be the same, right?
Created 08-11-2016 10:44 AM
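Yes, it would be the same, as long as no column value contains an embedded newline and the file has no header line, since TEXTFILE stores one row per line.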
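To see why the counts line up for TEXTFILE, here is a minimal sketch assuming Hive's default text delimiters: fields separated by \x01 (Ctrl-A) and rows terminated by \n. The helpers to_textfile and wc_l and the sample rows are hypothetical, for illustration only. The second data set shows the main caveat: a value containing a newline adds an extra line, so wc -l then overcounts.

```python
# Assumed Hive TEXTFILE defaults: Ctrl-A between fields, newline after each row.
FIELD_DELIM = "\x01"
LINE_DELIM = "\n"


def to_textfile(rows) -> bytes:
    """Serialize rows roughly the way a default Hive TEXTFILE table would."""
    return "".join(
        FIELD_DELIM.join(str(field) for field in row) + LINE_DELIM for row in rows
    ).encode("utf-8")


def wc_l(data: bytes) -> int:
    """Mimic `wc -l`: count newline bytes."""
    return data.count(b"\n")


clean = [(1, "alice"), (2, "bob"), (3, "carol")]
# The second row has a newline inside a value, which breaks one-row-per-line.
messy = [(1, "alice"), (2, "bob\nsmith"), (3, "carol")]

print(wc_l(to_textfile(clean)))  # 3, matches count(*)
print(wc_l(to_textfile(messy)))  # 4, one more than count(*)
```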