How to Process Data with Apache Hive - Reason for using regex

New Contributor

In the tutorial How to Process Data with Apache Hive, I would like to know why regex was used to load data into the drivers table.


Rising Star
@Girish Jaiswal

Here, regex is used as one of the options for processing CSV data, since in the first step the complete record is loaded as a single string into a staging table.
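
As a rough sketch of that staging approach: the table names, column names, and file path below are assumptions made for illustration and are not quoted from the tutorial. The pattern only captures the text between commas; it does not search for any particular character.

```sql
-- Staging table: each CSV line lands in one STRING column.
CREATE TABLE temp_drivers (col_value STRING);

-- Hypothetical HDFS path; adjust to where the file actually lives.
LOAD DATA INPATH '/tmp/drivers.csv' OVERWRITE INTO TABLE temp_drivers;

-- Target table with assumed columns.
CREATE TABLE drivers (driverId INT, name STRING);

-- regexp_extract(subject, pattern, index) returns capture group `index`.
-- '^(?:([^,]*),?){N}' repeats "field plus optional comma" N times, so
-- group 1 ends up holding the N-th comma-separated field.
INSERT OVERWRITE TABLE drivers
SELECT
  CAST(regexp_extract(col_value, '^(?:([^,]*),?){1}', 1) AS INT) AS driverId,
  regexp_extract(col_value, '^(?:([^,]*),?){2}', 1) AS name
FROM temp_drivers;
```

For a line such as `10,George`, the first expression would yield `10` and the second `George`.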

You can also load the CSV file directly into the target table (without any staging table) by declaring a comma delimiter.

New Contributor

OK. But even with a staging table, why is regex needed to load the data? Are we trying to find certain characters, for example an @, to identify a portion of the string as an email ID?

@Girish Jaiswal

The regexp function is used there just to illustrate its usage. The pattern in the example is straightforward and does not look for any special characters. As @rtrivedi mentioned, you can create a table directly according to the schema of the file by adding the following properties after the table's schema definition.

create table <your table name>
<your schema>
row format delimited
fields terminated by ',';
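
For instance, the template could be filled in as follows. This is a minimal sketch: the table name, columns, path, and the presence of a header row are all assumptions, so match them to the actual file.

```sql
-- Assumed columns for illustration; match them to the CSV layout.
-- The table property skips the first line of each file, on the
-- assumption that the CSV starts with a header row.
CREATE TABLE drivers_direct (
  driverId INT,
  name     STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ('skip.header.line.count'='1');

-- Hypothetical path; LOAD DATA INPATH moves the file into the table's directory.
LOAD DATA INPATH '/tmp/drivers.csv' INTO TABLE drivers_direct;
```

With this layout, Hive splits each line on commas at read time, so no regexp step is needed.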

However, using regexp in the tutorial may simply be a way to broaden the range of topics it covers.

Hope that helps!