- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
column names in a hive table
- Labels:
-
Apache Hive
Created on ‎04-20-2017 06:54 AM - edited ‎09-16-2022 04:29 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Is there a way to change column names in a hive table using first row in the CSV? I know I can skip the first row through skip.header property but I would rather like to set column names using this so as to be able to have right names in the table if the structure of CSV ever changes. Is there a workaround or a proper way to do that?
Created ‎04-20-2017 10:26 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
There is no way that first column can be considered as column name. But if the structure changes the its better to load the data as AVRO or Parquet file in hive. Even if the structure changes there is no need for you to change the old data and new data can be inserted into the same hive table.
Points to be noted: 1.External table has to be used 2.You might need a stage table before loading into External hive table which should be in avro/parquet format
Steps: 1. Create external table with columns which you have as avro/parquet. 2. Load the csv into stage table and then load the stage data into external table hive table. 3. If the columns changes then drop the external table and re-create with additional fields. 4. Insert the new file by following steps 1-2.
By this way there will not be any manually work needed to modify the existing data as avro by default will show 'null' for columns which are available in the table but not in the file. The only manually work is to drop and re-create the table ddl. Let me know if you needed any details. And if you feel it answers your question then please accept the answer
Created ‎04-20-2017 05:56 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Simran Kaur,
There is no way that first column can be considered as column name. But if the structure changes the its better to load the data as AVRO or Parquet file in hive. Even if the structure changes there is no need for you to change the old data and new data can be inserted into the same hive table.
Points to be noted:
1.External table has to be used
2.You might need a stage table before loading into External hive table which should be in avro/parquet format
Steps:
1. Create external table with columns which you have as avro/parquet.
2. Load the csv into stage table and then load the stage data into external table hive table.
3. If the columns changes then drop the external table and re-create with additional fields.
4. Insert the new file by following steps 1-2
By this way there will not be any manually work needed to modify the existing data as avro by default will show 'null' for columns which are available in the table but not in the file. The only manually work is to drop and re-create the table ddl. Let me know if you needed any details. And if you feel it answers your question then please accept the answer
