Created 04-19-2018 08:59 AM
Hello Friends,
I have couple of questions related to Hive and I am confused about the correct answer.
Q1: What is the correct way to define default Hive table ?
Q2: Also, what is the default delimiter for Hive table ?
In other words, requirement says create hive table with default format.
Now I am checking "set hive.default.fileformat;" on hive cli. It is showing "TextFile".
So I am creating the Hive table like : create table mydb.user (uid int,name string) ;
But it is creating table with Row Format "LazySimpleSerDe" and without any delimiter.
Is this correct way to define default hive table ?
Or shall I define it as : create table mydb.user (uid int,name string) row format delimited fields terminated by '\t' ;
Because, in this case it is showing Row Format "DELIMITED" and Fields Terminated By '\t'.
Thanking you
Santanu
Created 04-19-2018 03:12 PM
@Santanu Ghosh
Q1:
What is the correct way to define default Hive table ?
It depends on how you datafile is delimited
if your datafile is delimited with ^A(ctrl-a) then you don’t need to specify any delimiter,because hive default delimiter is ^A(ctrl-a) in ASCII. in this case if we do desc formatted <table-name>; hive doesn't shows any delimiter because table is created with default delimiter i.e ^A
If your datafile is comma delimited then we need to create table with row format delimited fields terminated by ','.
when we created table with create table mydb.user (uid int,name string); this ddl statement without any format and delimiters then hive creates user table with default serde (serialize,deserializer ). This serde instructs hive on how to process a record (Row) and serde library is inbuilt to Hadoop API.
Default serde --> LazySimpleSerDe<br>Default InputFormat --> TextInputFormat<br>Default OutputFormat --> HiveIgnoreKeyTextOutputFormat<br>Default Delimiter --> ^A(ctrl-a) //although the delimiter is not showing in desc formatted <table-name>;
Q2: Also, what is the default delimiter for Hive table ?
By default, tables are assumed to be of text input format and the delimiters are assumed to be ^A(ctrl-a).
Please refer to this link for more details regarding hive ddl operations.
Q3: create table mydb.user (uid int,name string) row format delimited fields terminated by '\t' ;
If we define table with specified delimiter so that’s the reason why you are able to see delimiter as \t in desc formatted table command.
Created 04-19-2018 03:12 PM
@Santanu Ghosh
Q1:
What is the correct way to define default Hive table ?
It depends on how you datafile is delimited
if your datafile is delimited with ^A(ctrl-a) then you don’t need to specify any delimiter,because hive default delimiter is ^A(ctrl-a) in ASCII. in this case if we do desc formatted <table-name>; hive doesn't shows any delimiter because table is created with default delimiter i.e ^A
If your datafile is comma delimited then we need to create table with row format delimited fields terminated by ','.
when we created table with create table mydb.user (uid int,name string); this ddl statement without any format and delimiters then hive creates user table with default serde (serialize,deserializer ). This serde instructs hive on how to process a record (Row) and serde library is inbuilt to Hadoop API.
Default serde --> LazySimpleSerDe<br>Default InputFormat --> TextInputFormat<br>Default OutputFormat --> HiveIgnoreKeyTextOutputFormat<br>Default Delimiter --> ^A(ctrl-a) //although the delimiter is not showing in desc formatted <table-name>;
Q2: Also, what is the default delimiter for Hive table ?
By default, tables are assumed to be of text input format and the delimiters are assumed to be ^A(ctrl-a).
Please refer to this link for more details regarding hive ddl operations.
Q3: create table mydb.user (uid int,name string) row format delimited fields terminated by '\t' ;
If we define table with specified delimiter so that’s the reason why you are able to see delimiter as \t in desc formatted table command.
Created 04-19-2018 05:05 PM
Thanks @Shu.
So if input file is tab delimited and it says create hive table with default format, then just :
"create table mydb.user (uid int,name string) row format delimited fields terminated by '\t' ; "
should be sufficient. This will create table with tab delimiter, and it will take file storage from default format which is TextFile. (hive.default.fileformat)
Thanking you
Santanu
Created 04-19-2018 05:32 PM
That's correct.Just create table with \t delimited then the hive can read the file with the delimiter specified.
Example:-
Creating Tab delimited table in Hive:-
I'm having tab delimited file in hdfs
bash$ hadoop fs -cat /apps/hive/warehouse/t1/t.txt<br>1\tfoo
I have created a table in hive with tab delimited as
hive> create table default.t1 (id string,name string) row format delimited fields terminated by '\t';
Do desc formatted default.t1; to see the delimiter for the table and all the Serde,Input,Output Formats are set to be default(as mentioned in the above answer).
# Storage Information | NULL | NULL SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL Compressed: | No | NULL Storage Desc Params: | NULL | NULL | field.delim | \t
Now check the data in t1 table
hive> select * from default.t1; +--------+----------+--+ | t1.id | t1.name | +--------+----------+--+ | 1 | foo | +--------+----------+--+
Hive read the input tab delimited t.txt file and shown the results.
Let me know if you are having any additional questions..!!
.
If the Answer Addressed your question, Click on Accept button below to accept the answer, That would be great help to Community users to find solution quickly for these kind of issues.
Created 04-20-2018 04:27 AM
Thanks @Shu for the explanation. I really appreciate it.