Support Questions

contactvivekjai · 04-17-2018

I have a requirement to create a Hive table that uses the same data multiple times.

To do that, I have multiple replace text steps followed by a merge content step.

The file content is as below:

abc123 active true sometext

I want to read the file into the Hive table as below:

abc123;active;true;sometext;123;active;c1;true;

To achieve this I'm using replace text (regex replace), where I first read the line as below:

abc123;active;true;sometext;

Then

123;active;

And finally as

c1;true;

But I'm not able to merge the content horizontally. Is there a way to do it? Or maybe there is an easier way to achieve the same result. Any help will be appreciated. Thanks.
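One possibility is to skip the horizontal merge entirely and emit the whole output line from a single regex replace, reusing capture groups as often as needed. The Python sketch below only illustrates that idea. It assumes the first word is letters followed by digits, and the pattern, the replacement string, and the name `merge_line` are illustrative assumptions rather than anything taken from the original flow.

```python
import re

# Assumption: the first word is letters followed by digits (e.g. "abc123"),
# and the remaining three words are whitespace-separated tokens.
# Groups: 1 = leading letters, 2 = last letter, 3 = first digit,
#         4 = remaining digits, 5 = word2, 6 = word3, 7 = word4
LINE_PATTERN = re.compile(
    r"^([A-Za-z]*)([A-Za-z])(\d)(\d*)\s+(\S+)\s+(\S+)\s+(\S+)\s*$"
)

# Each group can be referenced more than once, so the repeated columns are
# written in one pass instead of being merged afterwards.
REPLACEMENT = r"\1\2\3\4;\5;\6;\7;\3\4;\5;\2\3;\6;"


def merge_line(line):
    """Return the semicolon-delimited line; non-matching input is returned unchanged."""
    return LINE_PATTERN.sub(REPLACEMENT, line)


print(merge_line("abc123 active true sometext"))
```

For the sample line this prints `abc123;active;true;sometext;123;active;c1;true;`. A replace step that supports backreferences could apply the same pattern and replacement, though the exact regex dialect may differ.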

arald · 04-17-2018

Maybe I have misunderstood your question, but you want to convert one line of the file into one line of the Hive table, right? Your target table has 8 columns, while the text file has only 4 columns/words?

To me it looks as if you aren't doing a text replace at all; please correct me if I am wrong.

Col1: Word1 of file complete
Col2: Word2 of file complete
Col3: Word3 of file complete
Col4: Word4 of file complete
Col5: second part of Word1 (not sure whether that is the digits part, the last 3 chars, or half of the chars?)
Col6: Word2 of file complete
Col7: middle part of Word1 (just the middle two chars, or the two chars around the split of Col5?)
Col8: Word3 of file complete
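The mapping above can be sketched in Python as follows. Two points are open questions in the thread, so they are assumptions here: Col5 is taken to be the trailing digits of Word1, and Col7 the two characters around the letter/digit boundary. The function name `build_columns` is hypothetical.

```python
import re


def build_columns(line):
    """Expand a 4-word line into the 8 columns listed above."""
    word1, word2, word3, word4 = line.split()

    # Assumption: Col5 is the trailing run of digits in Word1.
    match = re.search(r"\d+$", word1)
    if match is None:
        raise ValueError("Word1 does not end in digits: " + word1)
    digits = match.group()

    # Assumption: Col7 is the two characters around the letter/digit split.
    split_at = len(word1) - len(digits)
    middle = word1[max(split_at - 1, 0):split_at + 1]

    return [word1, word2, word3, word4, digits, word2, middle, word3]


columns = build_columns("abc123 active true sometext")
# Join with ';' and keep the trailing delimiter used in the question.
print(";".join(columns) + ";")
```

If Col5 or Col7 is meant to be a fixed-width slice instead, only the two lines marked as assumptions would need to change.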

So it looks like you are populating three columns with Word1 of the file, while Word2 and Word3 are populated into two columns each? Does the result need to be a Hive table or a file? And is the input a file stored on HDFS or a stream that you receive line by line?

If it is a file, you could try the SerDe feature of Hive.

contactvivekjai · 04-17-2018

Thanks for the answer Herald, your understanding of my question is correct. All I'm trying to do is get the format correct for my Hive table columns. I mean, I just put the formatted file on HDFS and read it using an external Hive table.

Cloudera Community

Merge content of a same file