Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hive Table Default Format

avatar
Contributor

Hello Friends,

I have couple of questions related to Hive and I am confused about the correct answer.

Q1: What is the correct way to define default Hive table ?

Q2: Also, what is the default delimiter for Hive table ?

In other words, requirement says create hive table with default format.

Now I am checking "set hive.default.fileformat;" on hive cli. It is showing "TextFile".

So I am creating the Hive table like : create table mydb.user (uid int,name string) ;

But it is creating table with Row Format "LazySimpleSerDe" and without any delimiter.

Is this correct way to define default hive table ?

Or shall I define it as : create table mydb.user (uid int,name string) row format delimited fields terminated by '\t' ;

Because, in this case it is showing Row Format "DELIMITED" and Fields Terminated By '\t'.

Thanking you

Santanu

1 ACCEPTED SOLUTION

avatar
Master Guru

@Santanu Ghosh
Q1: What is the correct way to define default Hive table ?

It depends on how you datafile is delimited

if your datafile is delimited with ^A(ctrl-a) then you don’t need to specify any delimiter,because hive default delimiter is ^A(ctrl-a) in ASCII. in this case if we do desc formatted <table-name>; hive doesn't shows any delimiter because table is created with default delimiter i.e ^A

If your datafile is comma delimited then we need to create table with row format delimited fields terminated by ','.

when we created table with create table mydb.user (uid int,name string); this ddl statement without any format and delimiters then hive creates user table with default serde (serialize,deserializer ). This serde instructs hive on how to process a record (Row) and serde library is inbuilt to Hadoop API.

Default serde -->  LazySimpleSerDe<br>Default InputFormat --> TextInputFormat<br>Default OutputFormat --> HiveIgnoreKeyTextOutputFormat<br>Default Delimiter --> ^A(ctrl-a) //although the delimiter is not showing in desc formatted <table-name>;

Q2: Also, what is the default delimiter for Hive table ?

By default, tables are assumed to be of text input format and the delimiters are assumed to be ^A(ctrl-a).

Please refer to this link for more details regarding hive ddl operations.

Q3: create table mydb.user (uid int,name string) row format delimited fields terminated by '\t' ;

If we define table with specified delimiter so that’s the reason why you are able to see delimiter as \t in desc formatted table command.

View solution in original post

4 REPLIES 4

avatar
Master Guru

@Santanu Ghosh
Q1: What is the correct way to define default Hive table ?

It depends on how you datafile is delimited

if your datafile is delimited with ^A(ctrl-a) then you don’t need to specify any delimiter,because hive default delimiter is ^A(ctrl-a) in ASCII. in this case if we do desc formatted <table-name>; hive doesn't shows any delimiter because table is created with default delimiter i.e ^A

If your datafile is comma delimited then we need to create table with row format delimited fields terminated by ','.

when we created table with create table mydb.user (uid int,name string); this ddl statement without any format and delimiters then hive creates user table with default serde (serialize,deserializer ). This serde instructs hive on how to process a record (Row) and serde library is inbuilt to Hadoop API.

Default serde -->  LazySimpleSerDe<br>Default InputFormat --> TextInputFormat<br>Default OutputFormat --> HiveIgnoreKeyTextOutputFormat<br>Default Delimiter --> ^A(ctrl-a) //although the delimiter is not showing in desc formatted <table-name>;

Q2: Also, what is the default delimiter for Hive table ?

By default, tables are assumed to be of text input format and the delimiters are assumed to be ^A(ctrl-a).

Please refer to this link for more details regarding hive ddl operations.

Q3: create table mydb.user (uid int,name string) row format delimited fields terminated by '\t' ;

If we define table with specified delimiter so that’s the reason why you are able to see delimiter as \t in desc formatted table command.

avatar
Contributor

Thanks @Shu.

So if input file is tab delimited and it says create hive table with default format, then just :

"create table mydb.user (uid int,name string) row format delimited fields terminated by '\t' ; "

should be sufficient. This will create table with tab delimiter, and it will take file storage from default format which is TextFile. (hive.default.fileformat)

Thanking you

Santanu

avatar
Master Guru

@Santanu Ghosh

That's correct.Just create table with \t delimited then the hive can read the file with the delimiter specified.

Example:-

Creating Tab delimited table in Hive:-

I'm having tab delimited file in hdfs

bash$ hadoop fs -cat /apps/hive/warehouse/t1/t.txt<br>1\tfoo

I have created a table in hive with tab delimited as

hive> create table default.t1 (id string,name string) row format delimited fields terminated by '\t';

Do desc formatted default.t1; to see the delimiter for the table and all the Serde,Input,Output Formats are set to be default(as mentioned in the above answer).

# Storage Information         | NULL                                                        | NULL 
SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          | NULL 
InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat                    | NULL 
OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat  | NULL 
Compressed:                   | No                                                          | NULL 
Storage Desc Params:          | NULL                                                        | NULL 
                              | field.delim                                                 | \t   

Now check the data in t1 table

hive> select * from default.t1;
+--------+----------+--+
| t1.id  | t1.name  |
+--------+----------+--+
| 1      | foo      |
+--------+----------+--+

Hive read the input tab delimited t.txt file and shown the results.

Let me know if you are having any additional questions..!!

.

If the Answer Addressed your question, Click on Accept button below to accept the answer, That would be great help to Community users to find solution quickly for these kind of issues.

avatar
Contributor

Thanks @Shu for the explanation. I really appreciate it.