Question about the regex used to load the drivers data in the tutorial How to Process Data with Apache Hive
In the tutorial How to Process Data with Apache Hive, I would like to know why regex was used to insert data into the drivers table.
Here, regex is used as one of the options for processing CSV data, since in the first step the complete record is loaded as a single string into a staging table.
You can also load the CSV file directly into the target table (without any staging table) by using a comma delimiter.
OK, but even with a staging table, why is regex needed to load the data? Are we trying to find certain characters, for example an @, to mark a portion of the string as an email ID?
The regexp function is used just to illustrate its usage. The pattern in the example is straightforward and does not look for any special characters. As @rtrivedi mentioned, you can create a table directly according to the schema of the file by adding the following properties after defining the table's schema.
create table <your table name> ( <your schema> ) row format delimited fields terminated by ',';
That said, using regexp in the example broadens the list of topics the tutorial covers.
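For comparison, here is a rough sketch of what the staging-plus-regex approach can look like. The table names (temp_drivers, drivers_demo), the column name col_value, the file path, and the exact pattern are assumptions for illustration and may differ from what the tutorial uses. The point is that the pattern only picks out the Nth comma-separated field by position; it does not search for special characters such as @.

```sql
-- Hypothetical names and path, for illustration only.
-- Step 1: staging table that holds each CSV record as one string.
CREATE TABLE temp_drivers (col_value STRING);

LOAD DATA INPATH '/tmp/drivers.csv' OVERWRITE INTO TABLE temp_drivers;

-- Step 2: target table with typed columns.
CREATE TABLE drivers_demo (driverId INT, name STRING);

-- Step 3: {N} repeats the "field plus optional comma" group N times,
-- so capture group 1 ends up holding the Nth field of the record.
INSERT OVERWRITE TABLE drivers_demo
SELECT
  CAST(regexp_extract(col_value, '^(?:([^,]*),?){1}', 1) AS INT) AS driverId,
  regexp_extract(col_value, '^(?:([^,]*),?){2}', 1) AS name
FROM temp_drivers;
```

With a plain comma-separated file, this should give the same columns as the delimited table definition above, just with an extra staging step.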
Hope that helps!