Created 03-02-2016 07:43 PM
I noticed that the tutorials use files ending in '.csv', so I was wondering whether other data file formats are accepted, such as 'html', 'xml', or 'xls', and what other formats are supported?
Created 03-02-2016 07:45 PM
Hadoop is a schema-on-read, generic, multi-purpose framework. You can ingest any type of file; you then provide instructions to the tools accessing your files, such as Hive, Pig, MapReduce, and Spark. Out of the box you can read CSV, JSON, Avro, and XML. To clarify: with Hive, for example, you can provide a "SerDe" (serializer/deserializer), which you can think of as a translator for your file type, and Hive can then read your files. For HTML, you can use a library like jsoup to read those files and then parse the extracted content with the tools mentioned above.
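As a sketch of how a SerDe is wired in (the table name, columns, and HDFS path below are hypothetical, not from the thread), here is a Hive external table over JSON files using the `JsonSerDe` that ships with Hive's HCatalog:

```sql
-- Hypothetical example: a Hive external table over JSON files in HDFS.
-- Table name, columns, and LOCATION are illustrative assumptions.
CREATE EXTERNAL TABLE events_json (
  id   BIGINT,
  name STRING,
  ts   STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/data/events_json';

-- Once the SerDe translates each JSON record, the table is queryable as usual:
SELECT name, COUNT(*) AS cnt
FROM events_json
GROUP BY name;
```

The same pattern applies to other formats: swap in a different SerDe class (e.g. a CSV or XML SerDe) and Hive handles the translation at read time, which is exactly what "schema on read" means here.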
Created 03-02-2016 07:49 PM
@Abdi Ismail some examples here https://community.hortonworks.com/questions/15422/hive-and-avro-schema-defined-in-tblproperties-vs-s...
https://community.hortonworks.com/content/kbentry/972/hive-and-xml-pasring.html
https://community.hortonworks.com/questions/4345/querying-json-data-using-hive.html
https://community.hortonworks.com/questions/18792/pig-and-json.html