Created on 04-20-2017 06:54 AM - edited 09-16-2022 04:29 AM
Is there a way to change column names in a hive table using first row in the CSV? I know I can skip the first row through skip.header property but I would rather like to set column names using this so as to be able to have right names in the table if the structure of CSV ever changes. Is there a workaround or a proper way to do that?
Created 04-20-2017 10:26 AM
There is no way that first column can be considered as column name. But if the structure changes the its better to load the data as AVRO or Parquet file in hive. Even if the structure changes there is no need for you to change the old data and new data can be inserted into the same hive table.
Points to be noted: 1.External table has to be used 2.You might need a stage table before loading into External hive table which should be in avro/parquet format
Steps: 1. Create external table with columns which you have as avro/parquet. 2. Load the csv into stage table and then load the stage data into external table hive table. 3. If the columns changes then drop the external table and re-create with additional fields. 4. Insert the new file by following steps 1-2.
By this way there will not be any manually work needed to modify the existing data as avro by default will show 'null' for columns which are available in the table but not in the file. The only manually work is to drop and re-create the table ddl. Let me know if you needed any details. And if you feel it answers your question then please accept the answer
Created 04-20-2017 05:56 PM
Hi @Simran Kaur,
There is no way that first column can be considered as column name. But if the structure changes the its better to load the data as AVRO or Parquet file in hive. Even if the structure changes there is no need for you to change the old data and new data can be inserted into the same hive table.
Points to be noted:
1.External table has to be used
2.You might need a stage table before loading into External hive table which should be in avro/parquet format
Steps:
1. Create external table with columns which you have as avro/parquet.
2. Load the csv into stage table and then load the stage data into external table hive table.
3. If the columns changes then drop the external table and re-create with additional fields.
4. Insert the new file by following steps 1-2
By this way there will not be any manually work needed to modify the existing data as avro by default will show 'null' for columns which are available in the table but not in the file. The only manually work is to drop and re-create the table ddl. Let me know if you needed any details. And if you feel it answers your question then please accept the answer