Support Questions

barlow · ‎01-31-2018

Hello Community,

The following Hive Script retrieves data from the hdfs dfs drive on hadoop from the directory '/user/hive/geography'

I would like to store the results on a local drive called /hadoop/hdfs'

Can someone please show me how to modify the script so that it doesn't retrieve and store the results of the query to 'user/hive/geography', but instead stores the results from the query to '/hadoop/hdfs' (or any local drive)

The script is as follows:

DROP TABLE IF EXISTS HiveSampleIn; 
CREATE EXTERNAL TABLE HiveSampleIn 
(
 anonid int,
 eprofileclass int,
 fueltypes STRING,
 acorn_category int,
 acorn_group STRING,
 acorn_type int,
 nuts4 STRING,
 lacode STRING,
 nuts1 STRING,
 gspgroup STRING,
 ldz STRING,
 gas_elec STRING,
 gas_tout STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION '/user/hive/geography'; 
DROP TABLE IF EXISTS HiveSampleOut; 
CREATE EXTERNAL TABLE HiveSampleOut 
(
acorn_category int,
acorn_categorycount int )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION '/user/hive/geography';
INSERT OVERWRITE TABLE HiveSampleOut
Select 
   acorn_category,
   count(*) as acorn_categorycount 
FROM HiveSampleIn Group by acorn_category

Many thanks

sankar_tumuluru · ‎02-02-2018

@Carlton Patterson

If you are using ambari views, you can use hive view and after executing your query, you can export the results to csv file on your local sytem or to HDFS.

ambari-hive-view.png

View solution in original post

bandarusridhar1 · ‎01-31-2018

@Carlton Patterson

In your table.hql, please copy all your query and past it. Then run below command, by passing your beeline connection and input as table.hql(which has your query) so that it will allow you to store your output in required format. The output format can anything from below formats.

beeline -u 'jdbc:hive2://zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'--outputformat=csv2 -f table.hql > /hadoop/hdfs/tableslist.csv

--outputformat=[table/vertical/csv/tsv/dsv/csv2/tsv2]

Format mode for result display. Default is table. See Separated-Value Output Formats below for description of recommended sv options.

Usage: beeline --outputformat=tsv

https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-OutputFormat...

Hope this helps you.

barlow · ‎01-31-2018

Hi Sridhar

Thanks for reaching out

Unfortunately’, I’m a real novice and I don’t understand beeline.

I love the fact that I can store the results in multiple formats, however I need to store the results in a destination that is easily accessible. For example, I can access certain directories using a regular ftp client, but I can’t readily access the hdfs dfs directory so I would like to store the results on a local directory like Hadoop/hive/ or any other easily accsssible directory.

I hope you can understand me

Cheers

eberezitsky · ‎02-01-2018

@Carlton Patterson, There are multiple ways of doing that. One of them explained above by Sridhar.

Another way: after you execute your script, all your files will be CSV formatted under '/user/hive/geography' on HDFS.

So, you will need to run in command line:

hdfs dfs -get /user/hive/geography /hadoop/hdfs/

JordanMoore · ‎02-02-2018

Use "-getmerge" to combine all files into one.

JordanMoore · ‎02-02-2018

@Carlton Patterson , you don't need to understand beeline, if you take that command that was given to you, it is mostly copy-paste. The URL is even on the Ambari dashboard for you to copy exactly

beeline -u <URL> --outputformat=<FORMAT> -f <YOUR_SCRIPT> > <DESTINATION FILE ON LOCAL DISK>

The only confusing part there if you are unfamiliar with shell commands is Output Redirection to a file. The rest is very similar to any terminal based execution of a SQL script.

As far as I know, this is the only free way (as in, money) to get the data out to a file, in full. The alternative solution is to download a trial version of RazorSQL or pay for a tool like Tableau, that can export the SQL results.

Depending on your data size, Excel or LibreOffice might work

sankar_tumuluru · ‎02-02-2018

@Carlton Patterson

If you are using ambari views, you can use hive view and after executing your query, you can export the results to csv file on your local sytem or to HDFS.

ambari-hive-view.png

barlow · ‎02-02-2018

Sankaru,

I just realised that ..

JordanMoore · ‎02-02-2018

I assume this returns a limited result set, though, for large tables?

Cloudera Community

Support Questions

How to store Query Results to Local Drive

Spark Python Integration Test Result Exceptions

Store intermediate Query Result to variable in Bee...

How to use query result in another query Apache Ni...

Apache Spark Fine Grain Security with LLAP Test Dr...

HBase – Query Performance when storing and retriev...

Auth-to-local Rules Syntax

How to integrate a Feature Store on Cloudera Machi...

Apache Atlas as an Avro Schema Registry Test Drive

Storing Apache NiFi Versioned Flows in a Git Repos...

Running Apache Pig Scripts from Apache NiFi and St...