Which types of files can we load into the HDP data platform?

New Contributor

I noticed that the tutorials use files ending in '.csv', so I was wondering whether other data file formats are accepted, such as 'html', 'xml', and 'xls'. What other formats are accepted?

1 ACCEPTED SOLUTION

Master Mentor

@Abdi Ismail

Hadoop is a schema-on-read, generic, multi-purpose framework. You can ingest any type of file; you then give the tools that access your files, such as Hive, Pig, MapReduce, and Spark, instructions on how to interpret them. Out of the box, you can read CSV, JSON, Avro, and XML. To clarify with an example: in Hive you can provide a "SerDe", which stands for serializer/deserializer. Think of it as a translator for your file type that lets Hive read your files. For HTML, you can use a library like jsoup to read those files and then parse them with the tools I mentioned above.
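
To make the schema-on-read idea a bit more concrete, here is a minimal sketch in plain Python. It is not Hive or any Hadoop API; the names `deserialize_csv`, `deserialize_json_lines`, `DESERIALIZERS`, and `read_records` are hypothetical and only illustrate the concept. The raw text is stored as-is, and a format-specific "translator" is chosen only when the data is read, which is roughly the role a SerDe plays.

```python
import csv
import io
import json

# Raw "files" as they might sit in storage: no schema is attached at write time.
RAW_CSV = "id,name\n1,alice\n2,bob\n"
RAW_JSON_LINES = '{"id": 1, "name": "alice"}\n{"id": 2, "name": "bob"}\n'


def deserialize_csv(raw):
    """Hypothetical deserializer: interpret raw text as CSV with a header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(raw))]


def deserialize_json_lines(raw):
    """Hypothetical deserializer: interpret raw text as one JSON object per line."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


# Registry mapping a format name to its deserializer,
# loosely analogous to picking a SerDe for a table (an assumption for illustration).
DESERIALIZERS = {"csv": deserialize_csv, "jsonl": deserialize_json_lines}


def read_records(raw, fmt):
    """Apply the chosen deserializer at read time (schema on read)."""
    return DESERIALIZERS[fmt](raw)


csv_records = read_records(RAW_CSV, "csv")
json_records = read_records(RAW_JSON_LINES, "jsonl")
print(csv_records)
print(json_records)
```

Note that the CSV reader returns every field as a string, while the JSON reader keeps numbers as numbers. The interpretation lives in the reader, not in the stored file, so supporting a new format means adding another deserializer rather than changing how the files are ingested.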
