
How to store Query Results to Local Drive

Explorer

Hello Community,

The following Hive script retrieves data from HDFS on the Hadoop cluster, from the directory '/user/hive/geography'.

I would like to store the results on a local drive called '/hadoop/hdfs'.

Can someone please show me how to modify the script so that it doesn't store the results of the query to '/user/hive/geography', but instead stores them to '/hadoop/hdfs' (or any local drive)?

The script is as follows:

DROP TABLE IF EXISTS HiveSampleIn; 
CREATE EXTERNAL TABLE HiveSampleIn 
(
 anonid int,
 eprofileclass int,
 fueltypes STRING,
 acorn_category int,
 acorn_group STRING,
 acorn_type int,
 nuts4 STRING,
 lacode STRING,
 nuts1 STRING,
 gspgroup STRING,
 ldz STRING,
 gas_elec STRING,
 gas_tout STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION '/user/hive/geography'; 
DROP TABLE IF EXISTS HiveSampleOut; 
CREATE EXTERNAL TABLE HiveSampleOut 
(
 acorn_category int,
 acorn_categorycount int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION '/user/hive/geography';
INSERT OVERWRITE TABLE HiveSampleOut
SELECT
 acorn_category,
 count(*) AS acorn_categorycount
FROM HiveSampleIn
GROUP BY acorn_category;

Many thanks

1 ACCEPTED SOLUTION

Contributor

@Carlton Patterson

If you are using Ambari Views, you can use the Hive View; after executing your query, you can export the results to a CSV file on your local system or to HDFS.

ambari-hive-view.png


8 REPLIES


@Carlton Patterson

Copy your whole query and paste it into a file called table.hql. Then run the command below, passing your Beeline connection string and table.hql (which contains your query) as input, so that the output is stored in the format you need. The output format can be any of the formats listed below.

beeline -u 'jdbc:hive2://zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2' --outputformat=csv2 -f table.hql > /hadoop/hdfs/tableslist.csv

--outputformat=[table/vertical/csv/tsv/dsv/csv2/tsv2]: Format mode for result display. The default is table. See the Separated-Value Output Formats section of the documentation linked below for a description of the recommended sv options.

Usage: beeline --outputformat=tsv

https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-OutputFormat...
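
For example, here is a minimal sketch of the whole sequence, assuming the HiveSampleIn table from your original script already exists; the file name table.hql and the output path are just placeholders:

# Write the query to table.hql, then run it through Beeline and
# redirect the CSV output to a file on the local disk.
cat > table.hql <<'EOF'
SELECT acorn_category, count(*) AS acorn_categorycount
FROM HiveSampleIn
GROUP BY acorn_category;
EOF

beeline -u 'jdbc:hive2://zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2' \
        --outputformat=csv2 -f table.hql > /hadoop/hdfs/tableslist.csv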

Hope this helps you.

Explorer

Hi Sridhar

Thanks for reaching out

Unfortunately, I’m a real novice and I don’t understand Beeline.

I love the fact that I can store the results in multiple formats, but I need to store them in a destination that is easily accessible. For example, I can access certain directories using a regular FTP client, but I can’t readily access the HDFS directory, so I would like to store the results in a local directory like /hadoop/hive/ or any other easily accessible directory.

I hope you can understand me

Cheers

Super Collaborator

@Carlton Patterson, there are multiple ways of doing that. One of them is explained above by Sridhar.

Another way: after you execute your script, all of your output files will be CSV-formatted under '/user/hive/geography' on HDFS.

So you will need to run the following on the command line:

hdfs dfs -get /user/hive/geography /hadoop/hdfs/
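
If the copy succeeds, you should see the part files on the local file system, for example (a sketch; the exact file names depend on how many part files the query produced):

# List the files copied from HDFS into the local directory.
ls /hadoop/hdfs/geography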

Super Collaborator

Use "-getmerge" to combine all files into one.

Super Collaborator

@Carlton Patterson, you don't need to understand Beeline; if you take the command that was given to you, it is mostly copy and paste. The URL is even on the Ambari dashboard for you to copy exactly:

beeline -u <URL> --outputformat=<FORMAT> -f <YOUR_SCRIPT> > <DESTINATION FILE ON LOCAL DISK>

The only confusing part there, if you are unfamiliar with shell commands, is the output redirection to a file (the '>'). The rest is very similar to any terminal-based execution of a SQL script.
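
If the redirection part is new to you, here is a tiny illustration with nothing Hive-specific in it (the file path is arbitrary):

# '>' sends a command's output into a file instead of printing it to the terminal.
echo "acorn_category,acorn_categorycount" > /tmp/example.csv
cat /tmp/example.csv   # prints the line that was just written to the file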

As far as I know, this is the only free (as in money) way to get the data out to a file in full. The alternative is to download a trial version of RazorSQL or pay for a tool like Tableau that can export the SQL results.

Depending on your data size, Excel or LibreOffice might also work.

Contributor

@Carlton Patterson

If you are using Ambari Views, you can use the Hive View; after executing your query, you can export the results to a CSV file on your local system or to HDFS.

ambari-hive-view.png

Explorer

Sankaru,

I just realised that ..

Super Collaborator

I assume this returns a limited result set, though, for large tables?