
For the HDPCD exam, is it necessary to remove the header record when analyzing a CSV file?


Is it necessary to remove the header record when analyzing a CSV file?

When I checked the solution for the HDPCD practice exam, I noticed that the header record of the CSV file had not been removed before the data was analyzed.

Should we remove the header record or not? Leaving it in may affect the final output and the record count.

How will this kind of solution be graded in the real exam?

Accepted Solutions

Re: For the HDPCD exam, is it necessary to remove the header record when analyzing a CSV file?

Hi @Anand Pawar

Of course you should not treat the header as data!

You should exclude the header during analysis; only then will you get correct output. As you mentioned, leaving it in leads to misinterpreted results and sometimes errors. This is easy to handle in whichever Hadoop tool you choose. If you are storing the data in a Hive table, set tblproperties("skip.header.line.count"="1") on the table to skip the header.
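To make the effect on record counts concrete, here is a minimal sketch in plain Python (not Hive or Pig). The sample data, column names, and the helper names `read_rows` and `total_amount` are all made up for illustration; the `skip_header_lines` parameter only loosely mirrors the idea behind Hive's skip.header.line.count property.

```python
import csv
import io

# Hypothetical sample data; the columns and values are invented for illustration.
SAMPLE_CSV = "id,name,amount\n1,alice,10\n2,bob,20\n3,carol,30\n"


def read_rows(text, skip_header_lines=0):
    """Parse CSV text and drop the first `skip_header_lines` lines."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[skip_header_lines:]


def total_amount(rows):
    """Sum the third column; raises ValueError if a row holds non-numeric text."""
    return sum(int(row[2]) for row in rows)


with_header = read_rows(SAMPLE_CSV)
without_header = read_rows(SAMPLE_CSV, skip_header_lines=1)

print(len(with_header))              # the header is counted as a record
print(len(without_header))           # data records only
print(total_amount(without_header))  # aggregate over data records

try:
    total_amount(with_header)
except ValueError as exc:
    # The header value "amount" cannot be parsed as a number.
    print("header breaks numeric analysis:", exc)
```

The stored text is the same in both cases; only the read step differs, which is why the file can keep its header as long as the processing step skips it.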

In Pig, you can skip the first line while processing the file. You should definitely not include the header when you analyze the data, but you can still store the file with its header. I hope this answers your question.

Replies

Re: For the HDPCD exam, is it necessary to remove the header record when analyzing a CSV file?

Thank you for your answer, @Bala Vignesh N V.

So this means the final output should have a header before we store it in HDFS?

Please correct me if I am wrong.

Re: For the HDPCD exam, is it necessary to remove the header record when analyzing a CSV file?

@Anand Pawar It's a bit tricky here. You can keep the header when storing the file in HDFS. When processing the data for analysis, remember that the file contains a header and that it must be skipped, or else it will cause errors. As mentioned above, if you set the skip-header property, Hive skips the header automatically when the table is queried. However, the underlying data file behind the Hive table still contains the header, which can be used for any further processing. In short: you can keep the header when storing the data, but you should exclude it when processing the data. If this answers your question, please accept the answer.

Re: For the HDPCD exam, is it necessary to remove the header record when analyzing a CSV file?

@rich Rich and Team
