Created 04-04-2016 11:54 AM
Hi,
I am trying to dump the output of a beeline query (below) to a file, but the file ends up containing all the log messages along with the query output.
beeline -u jdbc:hive2://somehost_ip/ -f hive.hql >op.txt
Here is the output
0: jdbc:hive2://10.211.1.5:10000/> use db;
0: jdbc:hive2://10.211.1.5:10000/> select count(*) from sample_table;
+------+--+
| _c0  |
+------+--+
| 131  |
+------+--+
0: jdbc:hive2://10.211.1.5:10000/>
0: jdbc:hive2://10.211.1.5:10000/>
0: jdbc:hive2://10.211.1.5:10000/>
Can someone let me know how to get the output alone in a file as we get using Hive?
Created 04-04-2016 01:51 PM
You have different options.
1) You can control, up to a point, how beeline formats its output and then just redirect it to a file in Linux. For example:
beeline --outputformat=csv2 xxx > output.csv (see the relevant parameters from the beeline help below)
2) For more control and better performance, I once wrote a little Java tool. It's really only a couple of lines of JDBC code.
3) And finally, as Ana wrote, you can write the result into an external table in HDFS and specify the output format you want.
Like
create external table test ROW FORMAT delimited fields terminated by '|' location "/tmp/myfolder" as select * from mytable;
you can then get that output in the local file system with
hadoop fs -getmerge /tmp/myfolder myoutput.csv
--showHeader=[true/false]        show column names in query results
--headerInterval=ROWS            the interval between which headers are displayed
--fastConnect=[true/false]       skip building table/column list for tab-completion
--autoCommit=[true/false]        enable/disable automatic transaction commit
--verbose=[true/false]           show verbose error messages and debug info
--showWarnings=[true/false]      display connection warnings
--showNestedErrs=[true/false]    display nested errors
--numberFormat=[pattern]         format numbers using DecimalFormat pattern
--force=[true/false]             continue running script even after errors
--maxWidth=MAXWIDTH              the maximum width of the terminal
--maxColumnWidth=MAXCOLWIDTH     the maximum width to use when displaying columns
--silent=[true/false]            be more silent
--autosave=[true/false]          automatically save preferences
--outputformat=[table/vertical/csv2/tsv2/dsv/csv/tsv]  format mode for result display
Note that csv and tsv are deprecated - use csv2, tsv2 instead
Created 04-04-2016 12:43 PM
Hi @Alex Raj, to clarify, do you just want the "131" output to a file? You can try this in your hql:
INSERT OVERWRITE DIRECTORY '/user/user1/results' select count(*) from sample_table;
This will write the output of your query into the results directory on HDFS.
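If you then want that result as a single local file, one way is to merge the part-files out of HDFS. A sketch, assuming the `/user/user1/results` directory from the command above and an `hdfs` client configured on your PATH:

```shell
# INSERT OVERWRITE DIRECTORY writes one or more part-files into the
# target HDFS directory; getmerge concatenates them into a single
# local file.
hdfs dfs -getmerge /user/user1/results results.txt
```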
Created on 04-05-2016 05:26 AM - edited 08-18-2019 05:30 AM
Hi @Ana Gillan, thank you for your answer. This method is perfect for dumping the output into a file. However, the HQL I write has some echo statements that are printed before the result, which is why I wanted to redirect the output in the SQL style. After reading @Benjamin Leonhardi's answer, though, I got a clue from it. Here is the modified command that prints a neat output without any logs included. Thanks to @Benjamin Leonhardi.
beeline -u jdbc:hive2://somehost_ip/ --silent=true -f hive.hql >op.txt
But in the output file there are still a couple of blank lines printed. Here is the screenshot of the output.
There are two blank lines at the beginning and three trailing blank lines. If this can be fixed, it would be great.
Created 06-12-2016 06:51 AM
Does 'INSERT OVERWRITE DIRECTORY' work with beeline? I use it with Hive and it works well, but when I switch to beeline, nothing is output.
Thank you
Created 04-05-2016 05:31 AM
Hi @Benjamin Leonhardi, thanks for your answer. I got a clue from it that has brought the output almost close to my expectation. If you go through my comments on @Ana Gillan's answer, I have described the remaining issue, and if you can answer it, I am done for the day 🙂
Created 04-05-2016 10:25 AM
How about this?
beeline -u jdbc:hive2://localhost:10000/default --silent=true --outputformat=csv2 -e "select * from sample_07 limit 10" > out.txt
[root@sandbox ~]# cat out.txt
sample_07.code,sample_07.description,sample_07.total_emp,sample_07.salary
00-0000,All Occupations,134354250,40690
11-0000,Management occupations,6003930,96150
11-1011,Chief executives,299160,151370
11-1021,General and operations managers,1655410,103780
11-1031,Legislators,61110,33880
11-2011,Advertising and promotions managers,36300,91100
11-2021,Marketing managers,165240,113400
11-2022,Sales managers,322170,106790
11-2031,Public relations managers,47210,97170
11-3011,Administrative services managers,239360,76370
Created on 04-06-2016 07:12 AM - edited 08-18-2019 05:29 AM
Hi @Benjamin Leonhardi, still the same problem.
Here is what I get.
I am using CDH 5.3.6 with 0.13.1 beeline version.
Also, I have another question, if you don't mind answering. How can I add a string to the output file while executing the above command? Here is what I want to see in my output file.
In Hive I can get this using '!echo', but the same does not work with beeline. I have tried '!sh echo' as well, but that does not seem to work either. I much appreciate your time.
Created 04-06-2016 11:42 AM
No idea about the empty lines, sorry; I don't have these lines, as you can see. Just delete them with sed? (google: delete empty lines linux)
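For instance, a small self-contained sketch of the sed approach (the printf line just simulates the leading and trailing blank lines described above; in practice you would run sed directly on your op.txt):

```shell
# Simulate beeline output that contains stray blank lines.
printf '\n\n131\n\n\n' > op.txt

# Delete every empty (or whitespace-only) line, editing the file in place.
sed -i '/^[[:space:]]*$/d' op.txt

cat op.txt   # only "131" remains
```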
To add a string on top of a file you could just use linux as well from the outside i.e. :
echo "The count is:" > out.txt
beeline .... >>out.txt
the double ">>" appends to a file
Created 05-08-2017 07:40 AM
Hi Alex Raj,
You are missing --outputformat=csv2 in the command; kindly add it, and the output file should then be a CSV. It will work; I have tried it. As you have only one value, only one value will appear in the CSV file; try exporting more columns and rows and it will give you a better picture.
beeline -u jdbc:hive2://somehost_ip/ --silent=true --outputformat=csv2 -f hive.hql >op.csv