Hi all,

Problem statement: I have a source system that takes website data and sends out a large number of columns, 500+. At this point we are only interested in about 100 of them, and we are trying to create a mechanism that makes it easy to modify the Hive table schema as the need to read more columns arises. I have previously used Avro schema files with JSON data, and that worked quite well: new columns could be added to the schema file easily without altering the data files. Currently, however, the source system can only send flat CSV. I am trying to create a Hive table over this file as below:
CREATE EXTERNAL TABLE players
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs://xxxxxcluster1/apps/hive/warehouse/test.db/players'
TBLPROPERTIES (
  'avro.schema.url'='hdfs://xxxxxcluster1/tmp/prototype/players.avsc',
  'skip.header.line.count'='1'
);
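For reference, the players.avsc referenced above would just be a plain Avro record schema; roughly like this (the field names here are only illustrative, not from my actual data):

```json
{
  "type": "record",
  "name": "players",
  "namespace": "test",
  "fields": [
    {"name": "player_id",   "type": ["null", "string"], "default": null},
    {"name": "player_name", "type": ["null", "string"], "default": null}
  ]
}
```

The appeal of this approach is that exposing another column is just a matter of appending one more entry to the "fields" array, with no DDL change.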
Error: (I have changed all file permissions to 777)
FAILED: SemanticException [Error 10043]: Either list of columns or a custom serializer should be specified
I have not been able to find any reference to reading a TEXTFILE with an Avro schema. Is this possible to do? Or is there a better way to make schema updates modular with TEXTFILE data, rather than having to drop and recreate the table every time?
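For completeness, the manual alternative I am aware of is altering the table each time a new trailing column is needed, which is what I am hoping to avoid (column name here is hypothetical):

```sql
-- Append a new trailing column to the delimited text table;
-- existing rows return NULL for it until the files carry the value.
ALTER TABLE players ADD COLUMNS (new_metric STRING);
```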
Hi @Priya Parmar, from the (admittedly not deep) research I did at the time, I could not find a reference to what I wanted to achieve, so I abandoned the approach and did not use an Avro schema, as I had limited time at hand. Ref: https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#AvroSerDe-Overview–WorkingwithAvrofromHiv... Do let me know if you got around this pattern or if a way forward has been developed.
--Create an Avro table using your Avro schema file, then create a CSV table referencing the Avro table. Lastly, drop the Avro table (if desired).

DROP TABLE IF EXISTS avro_test;
CREATE EXTERNAL TABLE avro_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
TBLPROPERTIES ('avro.schema.url'='/user/BLAH/avro_schema.avsc');

---do we have a table?
SELECT * FROM avro_test;

---Now, create the external table on top of the csv data.
DROP TABLE IF EXISTS csv_test;
CREATE EXTERNAL TABLE csv_test LIKE avro_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '/user/BLAH/somedata';

SELECT * FROM csv_test;
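One caveat worth verifying on your Hive version: OpenCSVSerde exposes every column as STRING regardless of the types declared in the Avro schema, so non-string columns need an explicit cast when queried (the column name below is hypothetical):

```sql
-- OpenCSVSerde yields STRING columns; cast where numbers are needed.
SELECT player_name, CAST(score AS INT) AS score
FROM csv_test;
```

The upside of the LIKE pattern above is that a schema change only means editing the .avsc and re-running the two DDL statements; since both tables are external, no data is moved or rewritten.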