Created 01-31-2018 09:06 PM
Hello Community,
The following Hive Script retrieves data from the hdfs dfs drive on hadoop from the directory '/user/hive/geography'
I would like to store the results on a local drive called /hadoop/hdfs'
Can someone please show me how to modify the script so that it doesn't retrieve and store the results of the query to 'user/hive/geography', but instead stores the results from the query to '/hadoop/hdfs' (or any local drive)
The script is as follows:
DROP TABLE IF EXISTS HiveSampleIn; CREATE EXTERNAL TABLE HiveSampleIn ( anonid int, eprofileclass int, fueltypes STRING, acorn_category int, acorn_group STRING, acorn_type int, nuts4 STRING, lacode STRING, nuts1 STRING, gspgroup STRING, ldz STRING, gas_elec STRING, gas_tout STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION '/user/hive/geography'; DROP TABLE IF EXISTS HiveSampleOut; CREATE EXTERNAL TABLE HiveSampleOut ( acorn_category int, acorn_categorycount int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION '/user/hive/geography'; INSERT OVERWRITE TABLE HiveSampleOut Select acorn_category, count(*) as acorn_categorycount FROM HiveSampleIn Group by acorn_category
Many thanks
Created 02-02-2018 06:10 PM
Created 01-31-2018 11:07 PM
In your table.hql, please copy all your query and past it. Then run below command, by passing your beeline connection and input as table.hql(which has your query) so that it will allow you to store your output in required format. The output format can anything from below formats.
beeline -u 'jdbc:hive2://zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'--outputformat=csv2 -f table.hql > /hadoop/hdfs/tableslist.csv
--outputformat=[table/vertical/csv/tsv/dsv/csv2/tsv2] | Format mode for result display. Default is table. See Separated-Value Output Formats below for description of recommended sv options. Usage: |
Hope this helps you.
Created 01-31-2018 11:39 PM
Hi Sridhar
Thanks for reaching out
Unfortunately’, I’m a real novice and I don’t understand beeline.
I love the fact that I can store the results in multiple formats, however I need to store the results in a destination that is easily accessible. For example, I can access certain directories using a regular ftp client, but I can’t readily access the hdfs dfs directory so I would like to store the results on a local directory like Hadoop/hive/ or any other easily accsssible directory.
I hope you can understand me
Cheers
Created 02-01-2018 09:36 PM
@Carlton Patterson, There are multiple ways of doing that. One of them explained above by Sridhar.
Another way: after you execute your script, all your files will be CSV formatted under '/user/hive/geography' on HDFS.
So, you will need to run in command line:
hdfs dfs -get /user/hive/geography /hadoop/hdfs/
Created 02-02-2018 03:56 PM
Use "-getmerge" to combine all files into one.
Created 02-02-2018 04:07 PM
@Carlton Patterson , you don't need to understand beeline, if you take that command that was given to you, it is mostly copy-paste. The URL is even on the Ambari dashboard for you to copy exactly
beeline -u <URL> --outputformat=<FORMAT> -f <YOUR_SCRIPT> > <DESTINATION FILE ON LOCAL DISK>
The only confusing part there if you are unfamiliar with shell commands is Output Redirection to a file. The rest is very similar to any terminal based execution of a SQL script.
As far as I know, this is the only free way (as in, money) to get the data out to a file, in full. The alternative solution is to download a trial version of RazorSQL or pay for a tool like Tableau, that can export the SQL results.
Depending on your data size, Excel or LibreOffice might work
Created 02-02-2018 06:10 PM
Created 02-02-2018 08:15 PM
Sankaru,
I just realised that ..
Created 02-02-2018 09:58 PM
I assume this returns a limited result set, though, for large tables?