
How to dump the output to a file from Beeline?

Expert Contributor

Hi,

I am trying to dump the output of a Beeline query (below) to a file, but it prints all the log messages along with the query output in the file.

beeline -u jdbc:hive2://somehost_ip/ -f hive.hql >op.txt

Here is the output

0: jdbc:hive2://10.211.1.5:10000/> use db;
0: jdbc:hive2://10.211.1.5:10000/> select count(*) from sample_table;
+------+--+
| _c0  |
+------+--+
| 131  |
+------+--+
0: jdbc:hive2://10.211.1.5:10000/>
0: jdbc:hive2://10.211.1.5:10000/>
0: jdbc:hive2://10.211.1.5:10000/>

Can someone let me know how to get only the query output in the file, the way we get it with the Hive CLI?

1 ACCEPTED SOLUTION

Master Guru

You have different options.

1) You can control, up to a point, how Beeline formats its output and then just save it to a file with standard Linux redirection. For example:

beeline --outputformat=csv2 xxx > output.csv (see the relevant parameters from the Beeline help below)

2) For more control and better performance, I once wrote a little Java tool. It's really only a couple of lines of JDBC code.

3) And finally, as Ana wrote, you can write the query results into a table in HDFS and specify the output format you want.

Like

create table test ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LOCATION '/tmp/myfolder' AS select * from mytable;

(Note: Hive's CTAS cannot create an EXTERNAL table, so this uses a managed table with an explicit LOCATION instead.)

You can then get that output onto the local file system with:

hadoop fs -getmerge /tmp/myfolder myoutput.csv

   --showHeader=[true/false]       show column names in query results
   --headerInterval=ROWS;          the interval between which headers are displayed
   --fastConnect=[true/false]      skip building table/column list for tab-completion
   --autoCommit=[true/false]       enable/disable automatic transaction commit
   --verbose=[true/false]          show verbose error messages and debug info
   --showWarnings=[true/false]     display connection warnings
   --showNestedErrs=[true/false]   display nested errors
   --numberFormat=[pattern]        format numbers using DecimalFormat pattern
   --force=[true/false]            continue running script even after errors
   --maxWidth=MAXWIDTH             the maximum width of the terminal
   --maxColumnWidth=MAXCOLWIDTH    the maximum width to use when displaying columns
   --silent=[true/false]           be more silent
   --autosave=[true/false]         automatically save preferences
   --outputformat=[table/vertical/csv2/tsv2/dsv/csv/tsv]  format mode for result display
                                   Note that csv and tsv are deprecated - use csv2, tsv2 instead


16 REPLIES


Hi @Alex Raj, to clarify, do you just want the "131" output to a file? You can try this in your hql:

INSERT OVERWRITE DIRECTORY '/user/user1/results' select count(*) from sample_table;

This will write the output of your query into the results directory on HDFS.

Expert Contributor

Hi @Ana Gillan, thank you for your answer. This method is perfect for dumping the output into a file; however, the HQL I write has some echo statements that get printed before the result, which is why I wanted to redirect the output shell-style. After reading @Benjamin Leonhardi's answer I got a clue from it. Here is the modified command, which prints a clean output without any logs included. Thanks to @Benjamin Leonhardi.

beeline -u jdbc:hive2://somehost_ip/ --silent=true -f hive.hql >op.txt

But there are still a couple of blank lines printed in the output file. Here is a screenshot of the output.

3202-samp.jpg

In the output file there are two blank lines at the beginning and three trailing blank lines. If this could be fixed, it would be great.

Explorer

Does 'INSERT OVERWRITE DIRECTORY' work with Beeline? I used it with the Hive CLI and it worked well, but when I switch to Beeline, nothing is output.

Thank you


Expert Contributor

Hi @Benjamin Leonhardi, thanks for your answer. It gave me a clue that got the output almost to my expectation. If you go through my comments on @Ana Gillan's answer, I have described my remaining issue; if you can answer that, I am done for the day 🙂

Master Guru

How about this?

beeline -u jdbc:hive2://localhost:10000/default --silent=true --outputformat=csv2 -e "select * from sample_07 limit 10" > out.txt

[root@sandbox ~]# cat out.txt
sample_07.code,sample_07.description,sample_07.total_emp,sample_07.salary
00-0000,All Occupations,134354250,40690
11-0000,Management occupations,6003930,96150
11-1011,Chief executives,299160,151370
11-1021,General and operations managers,1655410,103780
11-1031,Legislators,61110,33880
11-2011,Advertising and promotions managers,36300,91100
11-2021,Marketing managers,165240,113400
11-2022,Sales managers,322170,106790
11-2031,Public relations managers,47210,97170
11-3011,Administrative services managers,239360,76370

Expert Contributor

Hi @Benjamin Leonhardi, still the same problem.

Here is what I get.

3234-hort.png

I am using CDH 5.3.6 with Beeline version 0.13.1.

Also, I have another question, if you don't mind answering. How can I add a string to the output file while executing the above command? Here is what I want to see in my output file:

3235-hort.png

In Hive I can get this using '!echo', but the same does not work with Beeline. I have also tried '!sh echo', but that does not seem to work either. I much appreciate your time.

Master Guru

No idea about the empty lines, sorry; I don't get them, as you can see. You could just delete them with sed (search for "delete empty lines linux").
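The sed approach can be sketched as a minimal runnable example, where printf simulates the output file with blank lines described above (the real file would of course come from beeline; GNU sed's -i flag is assumed, on BSD/macOS sed it would be -i ''):

```shell
# Simulate a Beeline output file with leading and trailing blank lines,
# like the op.txt described above.
printf '\n\n131\n\n\n' > op.txt

# Delete every empty line from the file in place (GNU sed syntax).
sed -i '/^$/d' op.txt

cat op.txt
```

After this, op.txt contains only the result line.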

To add a string at the top of the file, you can also do it from Linux, outside of Beeline, i.e.:

echo "The count is:" > out.txt

beeline .... >>out.txt

The double ">>" appends to the file.
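As a runnable sketch of this header-then-append pattern, with printf standing in for the real beeline call (which needs a live cluster) and the count 131 taken from the question above:

```shell
# Write the header line, creating/truncating out.txt.
echo "The count is:" > out.txt

# Append the query result; printf stands in here for the real call, e.g.
#   beeline -u jdbc:hive2://host:10000/ --silent=true -f hive.hql >> out.txt
printf '131\n' >> out.txt

cat out.txt
```

The ">" on the first command truncates the file, so the header always starts fresh; every subsequent ">>" appends below it.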


Hi @Alex Raj,

You are missing --outputformat=csv2 in the command; add it, and make the output file a .csv. It will work; I have tried it. Since you have only one value, only one value will appear in the CSV file; try exporting more columns and rows to get a better picture.

beeline -u jdbc:hive2://somehost_ip/ --silent=true --outputformat=csv2 -f hive.hql >op.csv