
How to dump the output to a file from Beeline?

Expert Contributor

Hi,

I am trying to dump the output of a Beeline query (below) to a file, but it prints all the logs along with the output in the file.

beeline -u jdbc:hive2://somehost_ip/ -f hive.hql >op.txt

Here is the output

0: jdbc:hive2://10.211.1.5:10000/> use db;
0: jdbc:hive2://10.211.1.5:10000/> select count(*) from sample_table;
+------+--+
| _c0  |
+------+--+
| 131  |
+------+--+
0: jdbc:hive2://10.211.1.5:10000/>
0: jdbc:hive2://10.211.1.5:10000/>
0: jdbc:hive2://10.211.1.5:10000/>

Can someone let me know how to get only the query output in a file, as we could with the Hive CLI?

1 ACCEPTED SOLUTION

Master Guru

You have different options.

1) You can control, up to a point, how the Beeline output is formatted, and then just save it to a file with Linux redirection. For example:

beeline --outputformat=csv2 xxx > output.csv

(see the relevant parameters from the beeline help below)
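Putting that together with the command from the question (the hostname is the placeholder from the original post), a combination along these lines usually leaves only the result rows in the file; --silent=true suppresses the prompts and progress chatter:

```shell
# Hedged sketch: host and file names are placeholders from the question.
beeline -u jdbc:hive2://somehost_ip/ \
        --silent=true \
        --outputformat=csv2 \
        -f hive.hql > output.csv
```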

2) For more control and better performance, I once wrote a little Java tool. It's really only a couple of lines of JDBC code.

3) And finally, as Ana wrote, you can write the result into a table at a known HDFS location and specify the output format you want.

Like

create table test ROW FORMAT delimited fields terminated by '|' location "/tmp/myfolder" as select * from mytable;

(In most Hive versions CREATE-TABLE-AS-SELECT cannot create an EXTERNAL table, but a managed table with an explicit LOCATION has the same effect here.)

you can then get that output in the local file system with

hadoop fs -getmerge /tmp/myfolder myoutput.csv

   --showHeader=[true/false]       show column names in query results
   --headerInterval=ROWS;          the interval between which headers are displayed
   --fastConnect=[true/false]      skip building table/column list for tab-completion
   --autoCommit=[true/false]       enable/disable automatic transaction commit
   --verbose=[true/false]          show verbose error messages and debug info
   --showWarnings=[true/false]     display connection warnings
   --showNestedErrs=[true/false]   display nested errors
   --numberFormat=[pattern]        format numbers using DecimalFormat pattern
   --force=[true/false]            continue running script even after errors
   --maxWidth=MAXWIDTH             the maximum width of the terminal
   --maxColumnWidth=MAXCOLWIDTH    the maximum width to use when displaying columns
   --silent=[true/false]           be more silent
   --autosave=[true/false]         automatically save preferences
   --outputformat=[table/vertical/csv2/tsv2/dsv/csv/tsv]  format mode for result display
                                   Note that csv and tsv are deprecated - use csv2, tsv2 instead




Hi 苏 斌,

I have not tried INSERT OVERWRITE DIRECTORY since I switched to Beeline. Try this command:

beeline -u jdbc:hive2://somehost_ip/ --silent=true --outputformat=csv2 -f hive.hql >op.csv

New Contributor

Hi Team,

I am trying to print the output along with the "number of records selected" / "number of records affected" messages in the output file, but I am unable to.

I have tried --verbose=true as well, but it does not help.

beeline -u "jdbc:hive2://abc.nsroot.net:10000/;principal=hive/abc.nsroot.net@ABC.NSROOT.NET;ssl=true" -f /home/test1.hql > testresult1.txt

The .hql file has only one query: "select * from mytable;"

Can anyone advise on this?

Thanks & regards,

Kamleshkumar Gujarathi

New Contributor

@Benjamin Leonhardi I am facing a problem while fetching output in csv2 format. Since my data itself contains commas, that specific column is wrapped in the special character "^@". Can you please help me avoid this, or show how to get the output in pipe-delimited form?
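For pipe-delimited output, Beeline's dsv format with an explicit delimiter may be what you want; a hedged sketch, with the host and file names as placeholders:

```shell
# dsv = delimiter-separated values; the delimiter is set with --delimiterForDSV.
beeline -u jdbc:hive2://somehost_ip/ \
        --silent=true \
        --outputformat=dsv \
        --delimiterForDSV='|' \
        -f hive.hql > output.psv
```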

Expert Contributor

Also, just wanted to point out that, depending on the version of Hive being used, Beeline may default to buffering the entire result set in the client. Be sure to enable 'incremental' fetches of data from the Hive server when dealing with large result sets.

 

--incremental=[true/false]

Defaults to true from Hive 2.3 onwards, before it defaulted to false. When set to false, the entire result set is fetched and buffered before being displayed, yielding optimal display column sizing. When set to true, result rows are displayed immediately as they are fetched, yielding lower latency and memory usage at the price of extra display column padding. Setting --incremental=true is recommended if you encounter an OutOfMemoryException.
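On versions before Hive 2.3 the flag has to be passed explicitly; a sketch for a large result set (host and file names are placeholders):

```shell
# Stream rows as they arrive instead of buffering the whole result set.
beeline -u jdbc:hive2://somehost_ip/ \
        --incremental=true \
        --outputformat=csv2 \
        -f big_query.hql > big_result.csv
```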

 

https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients

Expert Contributor

Another option for large data sets, if ordering doesn't matter, is to create an EXTERNAL table with the necessary delimiters and issue an INSERT statement into the table instead of a SELECT statement in beeline.  To copy the data locally, issue a:

 

hdfs dfs -cat /my/table/*

"Order doesn't matter' because the cat application will not necessarily read the files in proper order.  If an ORDER BY is included in the query, the contents of each file will be in order, but the files may be read out of order by the 'cat' application.

 

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/FileSystemShell.html#cat

New Contributor

Please try this; it will send only the standard output to the file, not the errors or logs:

 

beeline -u jdbc:hive2://somehost_ip/ -f hive.hql 2> /dev/null >op.txt
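Why this works: most of Beeline's log chatter goes to stderr while query results go to stdout, so the two streams can be redirected independently. A minimal stand-in demonstrates the redirection without a cluster (fake_beeline is a made-up function, not a real command):

```shell
# Stand-in that behaves like beeline: results on stdout, logs on stderr.
fake_beeline() {
  echo "131"                    # query result -> stdout
  echo "INFO : completed" >&2   # log noise -> stderr
}

# Discard stderr, capture stdout in the file.
fake_beeline 2>/dev/null > op.txt
cat op.txt    # contains only the result row
```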

 

New Contributor

Please check the command below. Here 2> /dev/null discards all the logs and errors while still allowing standard output to be redirected to the file:

beeline -u jdbc:hive2://somehost_ip/ -f hive.hql 2> /dev/null >op.txt

If you like this, please give me kudos. Thanks!