Created 03-02-2016 07:43 PM
I noticed that the tutorials use files ending in '.csv', so I was wondering whether other data file formats are accepted, such as 'html', 'xml', or 'xls', and what other formats are supported?
Created 03-02-2016 07:45 PM
Hadoop is a schema-on-read, generic, multi-purpose framework. You can ingest any type of file; you then provide instructions to the tools accessing your files, such as Hive, Pig, MapReduce, and Spark. Out of the box you can read CSV, JSON, Avro, and XML. To clarify: with Hive, for example, you can provide a "SerDe" (serializer/deserializer), which you can think of as a translator for your file type, and Hive can then read your files. For HTML, you can use a library like jsoup to read those files and then parse the extracted content with the tools mentioned above.
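As a sketch of how a SerDe is wired in (the table name, columns, and HDFS path below are hypothetical, not from the thread), here is a Hive external table over JSON files using the `JsonSerDe` that ships with Hive's HCatalog:

```sql
-- Hypothetical example: a Hive external table over JSON files in HDFS.
-- Table name, columns, and LOCATION are illustrative assumptions.
CREATE EXTERNAL TABLE events_json (
  id   BIGINT,
  name STRING,
  ts   STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/data/events_json';

-- Once the SerDe translates each JSON record, the table is queryable as usual:
SELECT name, COUNT(*) AS cnt
FROM events_json
GROUP BY name;
```

The same pattern applies to other formats: swap in a different SerDe class (e.g. a CSV or XML SerDe) and Hive handles the translation at read time, which is exactly what "schema on read" means here.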
Created 03-02-2016 07:49 PM
@Abdi Ismail some examples here https://community.hortonworks.com/questions/15422/hive-and-avro-schema-defined-in-tblproperties-vs-s...
https://community.hortonworks.com/content/kbentry/972/hive-and-xml-pasring.html
https://community.hortonworks.com/questions/4345/querying-json-data-using-hive.html
https://community.hortonworks.com/questions/18792/pig-and-json.html