Merge content of the same file

Contributor

I have a requirement to create a Hive table that uses the same data multiple times.

To do that, I have multiple replace text steps followed by a merge content step.

The file content is as follows:

abc123 active true sometext

I want to read the file into the Hive table as follows:

abc123;active;true;sometext;123;active;c1;true;

To achieve this, I'm using replace text (regex replace), where I first produce:

abc123;active;true;sometext;

Then:

123;active;

And finally:

c1;true;

But I'm not able to merge the content horizontally. Is there a way to do it, or maybe an easier way to achieve the same result? Any help will be appreciated. Thanks.

1 ACCEPTED SOLUTION

Super Collaborator

Maybe I have misunderstood your question, but you want to convert one line of the file into one line of the Hive table, right? Your target table has 8 columns, while the text file has only 4 columns/words?

To me it looks as if you don't do a text replace at all; please correct me if I am wrong.

Col1: Word1 of file complete
Col2: Word2 of file complete
Col3: Word3 of file complete
Col4: Word4 of file complete
Col5: second part of Word1 (not sure whether that means the digits part, the last 3 chars, or half of the chars?)
Col6: Word2 of file complete
Col7: middle part of Word1 (just the middle two chars, or the two chars around the split of Col5?)
Col8: Word3 of file complete
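
The open questions about Col5 and Col7 can be made concrete with a small Python sketch. The function names and the second sample word `abcd12` are hypothetical and only serve the illustration; the thread itself gives just the one sample, `abc123`.

```python
import re


def col5_candidates(word: str) -> dict:
    """Three possible readings of 'second part of Word1'."""
    trailing = re.search(r"\d+$", word)
    return {
        "trailing_digits": trailing.group() if trailing else "",
        "last_three_chars": word[-3:],
        "second_half": word[len(word) // 2:],
    }


def col7_candidates(word: str) -> dict:
    """Two possible readings of 'middle part of Word1'."""
    mid = len(word) // 2
    # First letter that is immediately followed by a digit.
    boundary = re.search(r"[A-Za-z]\d", word)
    return {
        "middle_two_chars": word[mid - 1:mid + 1],
        "around_letter_digit_split": boundary.group() if boundary else "",
    }


# "abcd12" is a made-up second sample, not data from the thread.
for sample in ("abc123", "abcd12"):
    print(sample, col5_candidates(sample), col7_candidates(sample))
```

For `abc123`, every reading of Col5 yields `123` and both readings of Col7 yield `c1`, which is why the single sample line cannot settle the rule. A word with a different letter/digit balance, such as `abcd12`, makes the readings diverge.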

So it looks like you are populating three columns from Word1 of the file, while Word2 and Word3 each populate two columns? Does the result need to be a Hive table or a file? And is the input a file stored on HDFS, or a stream that you receive line by line?

If it is a file, you may try the SerDe feature of Hive.

Contributor

Thanks for the answer, Herald; your understanding of my question is correct. All I'm trying to do is get the format correct for my Hive table columns. I just put the formatted file on HDFS and read it using an external Hive table.
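
Given the column mapping confirmed in this thread, the formatted line could in principle be produced in one pass, with no horizontal merge. The Python sketch below shows one regex replacement with capture groups that emits all eight fields. It assumes Col5 is the trailing digits of Word1 and Col7 is the last letter plus the first digit, a reading the thread never settles. The pattern, group names, and function name are illustrative, not taken from the original posts.

```python
import re

# Assumes Word1 is letters followed by digits (e.g. "abc123") and that
# the four words are separated by whitespace.
LINE_PATTERN = re.compile(
    r"^(?P<w1>[A-Za-z]*(?P<last_letter>[A-Za-z])"
    r"(?P<digits>(?P<first_digit>\d)\d*))\s+"
    r"(?P<w2>\S+)\s+(?P<w3>\S+)\s+(?P<w4>\S+)\s*$"
)

# Col1..Col8 in the order described in the accepted answer.
REPLACEMENT = (
    r"\g<w1>;\g<w2>;\g<w3>;\g<w4>;"
    r"\g<digits>;\g<w2>;\g<last_letter>\g<first_digit>;\g<w3>;"
)


def widen_line(line: str) -> str:
    """Return the 8-field line; lines that do not match are returned unchanged."""
    return LINE_PATTERN.sub(REPLACEMENT, line)


print(widen_line("abc123 active true sometext"))
# prints: abc123;active;true;sometext;123;active;c1;true;
```

The same idea may carry over to any regex replace step that supports back-references, which would remove the need to merge three separate outputs; that has not been tested here.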