Created on 04-17-2018 09:38 AM - edited 09-16-2022 06:06 AM
I have a requirement to create a Hive table that uses the same data multiple times.
To do that, I'm using multiple ReplaceText steps followed by a MergeContent step.
The file content is:
abc123 active true sometext
I want to read the file into the Hive table as:
abc123;active;true;sometext;123;active;c1;true;
To achieve this, I'm using ReplaceText with a regex replace. The first step produces:
abc123;active;true;sometext;
Then
123;active;
And finally:
c1;true;
But I'm not able to merge the content horizontally (side by side on the same line). Is there a way to do that, or maybe an easier way to achieve the same result? Any help will be appreciated. Thanks.
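Splitting the line into three pieces and merging them back may not be necessary: a single regex replace with capture groups can emit all eight fields at once, because each group can be referenced as many times as needed. The sketch below shows the idea in Python. It assumes, since the post does not spell it out, that `123` is the trailing digit run of the first word and `c1` is the last letter before those digits plus the first digit. `LINE_PATTERN`, `REPLACEMENT` and `to_hive_row` are illustrative names. In NiFi's ReplaceText (Regex Replace strategy) the back-references would be written `$1`, `$5`, ... instead of Python's `\1`, `\5`.

```python
import re

# Assumed layout of each input line: "<letters><digits> <word2> <word3> <word4>".
# Groups: 1 = whole first word, 2 = last letter before the digits,
# 3 = the digit run, 4 = first digit, 5-7 = the remaining three words.
LINE_PATTERN = re.compile(
    r"^([A-Za-z]*([A-Za-z])((\d)\d*))\s+(\S+)\s+(\S+)\s+(\S+)$"
)

# Every group may be reused, so one replacement builds the whole 8-field row.
REPLACEMENT = r"\1;\5;\6;\7;\3;\5;\2\4;\6;"


def to_hive_row(line: str) -> str:
    """Rewrite one space-separated input line into the ';'-delimited target row."""
    row, count = LINE_PATTERN.subn(REPLACEMENT, line.strip())
    if count == 0:
        raise ValueError(f"line does not match the assumed layout: {line!r}")
    return row


if __name__ == "__main__":
    print(to_hive_row("abc123 active true sometext"))
    # prints abc123;active;true;sometext;123;active;c1;true;
```

If the real rule for `c1` or `123` differs, only the pattern's first group needs to change; the replacement string stays a plain list of group references.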
Created 04-17-2018 10:44 AM
Maybe I'm misunderstanding your question, but you want to convert one line of the file into one row of the Hive table, right? Your target table has 8 columns, while the text file only has 4 columns (words)?
To me it looks as if you aren't really doing a text replace at all; please correct me if I'm wrong:
Col1: Word1 of the file, complete
Col2: Word2 of the file, complete
Col3: Word3 of the file, complete
Col4: Word4 of the file, complete
Col5: second part of Word1 (not sure whether it is the digits, just the last 3 characters, or the second half of the characters?)
Col6: Word2 of the file, complete
Col7: middle part of Word1 (just the middle two characters, or the two characters around the split point used for Col5?)
Col8: Word3 of the file, complete
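The column mapping listed above can be written down as a small function, which makes the open questions about Col5 and Col7 explicit. This is only a sketch: it picks one of the interpretations asked about (Col5 is the digit part of Word1, Col7 is the two characters around the letter/digit split), and `map_columns` is a hypothetical name.

```python
def map_columns(line: str) -> list[str]:
    """Map the 4 words of one input line to the 8 target columns (assumed rules)."""
    word1, word2, word3, word4 = line.split()  # expects exactly 4 words
    # Position of the first digit in Word1; 0 (or no digit) means the
    # assumed "letters then digits" shape does not hold.
    split_at = next((i for i, ch in enumerate(word1) if ch.isdigit()), 0)
    if split_at == 0:
        raise ValueError(f"Word1 must be letters followed by digits: {word1!r}")
    col5 = word1[split_at:]                    # assumption: digit part of Word1
    col7 = word1[split_at - 1 : split_at + 1]  # assumption: 2 chars around the split
    return [word1, word2, word3, word4, col5, word2, col7, word3]


print(";".join(map_columns("abc123 active true sometext")) + ";")
# prints abc123;active;true;sometext;123;active;c1;true;
```

If Col5 is instead "the last 3 characters", only the `col5` slice changes (for example `word1[-3:]`), which is why the exact rule matters before choosing a tool.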
So it looks like you are populating three columns from Word1 of the file, while Word2 and Word3 each populate two columns? Does the result need to be a Hive table or a file? And is the input a file stored on HDFS, or a stream that you receive line by line?
If it is a file, you could try Hive's SerDe feature (for example, RegexSerDe, which maps regex capture groups to table columns).
Created 04-17-2018 11:11 AM
Thanks for the answer, Herald; your understanding of my question is correct. All I'm trying to do is get the format right for my Hive table's columns: I just put the formatted file on HDFS and read it using an external Hive table.