Created on 05-22-2017 08:16 PM - edited 08-17-2019 12:52 PM
Backup Files from Hadoop
- ListHDFS
- Set the parameters: point the Hadoop configuration at /etc/hadoop/conf/core-site.xml, then pick a high-level directory and work down.
- FetchHDFS
- ${path}/${filename}
- PutFile
- Store in local
Backup Hive Tables
- SelectHiveQL
- Output format AVRO, with SQL: select * from beaconstatus
- ConvertAvroToORC
- generic for all the tables
- UpdateAttribute
- tablename ${hive.ddl:substringAfter('CREATE EXTERNAL TABLE IF NOT EXISTS '):substringBefore(' (')}
- PutFile
- Use the replace conflict resolution strategy and create missing directories, with directory: /Volumes/Transcend/HadoopBackups/hive/${tablename}
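To see what the tablename expression in UpdateAttribute pulls out of the hive.ddl attribute, here is a rough shell imitation of the substringAfter / substringBefore chain. The function name extract_table_name and the sample DDL string are hypothetical and only for illustration; NiFi itself evaluates the expression, not this script.

```shell
#!/usr/bin/env bash
# Rough imitation of the NiFi Expression Language chain
#   ${hive.ddl:substringAfter('CREATE EXTERNAL TABLE IF NOT EXISTS '):substringBefore(' (')}
# extract_table_name is a hypothetical helper, not part of NiFi.
extract_table_name() {
  local ddl="$1"
  local prefix='CREATE EXTERNAL TABLE IF NOT EXISTS '
  local marker=' ('
  # substringAfter: drop everything up to and including the first prefix match
  local rest="${ddl#*"$prefix"}"
  # substringBefore: keep only the text before the first ' ('
  printf '%s\n' "${rest%%"$marker"*}"
}
```

Calling `extract_table_name 'CREATE EXTERNAL TABLE IF NOT EXISTS beaconstatus (id STRING) STORED AS ORC'` prints the table name alone, which is the value PutFile then uses in its directory path.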
For Phoenix tables, I use the same ConvertAvroToORC, UpdateAttribute and PutFile boxes and just add ExecuteSQL to ingest Phoenix data.
For every new table, I add one box and link it to ConvertAvroToORC. Done!
This is enough of a backup that if I need to rebuild and refill my development cluster, I can do so easily. I also have the flows scheduled to run once a day and rewrite everything.
This is Not For Production or Extremely Large Data! This works great for a development cluster or personal dev cluster.
You can easily back up local files by ingesting them with GetFile, and other things can be backed up by calling ExecuteStreamCommand.
Local File Storage of Backed up Data
drwxr-xr-x 3 tspann staff 102 May 20 23:00 any_data_trials2
drwxr-xr-x 3 tspann staff 102 May 20 22:59 any_data_meetup
drwxr-xr-x 3 tspann staff 102 May 20 22:59 any_data_ibeacon
drwxr-xr-x 3 tspann staff 102 May 20 22:57 any_data_gpsweather
drwxr-xr-x 3 tspann staff 102 May 20 10:53 any_data_beaconstatus
drwxr-xr-x 3 tspann staff 102 May 20 10:52 any_data_beacongateway
drwxr-xr-x 3 tspann staff 102 May 19 17:36 any_data_atweetshive2
drwxr-xr-x 3 tspann staff 102 May 19 17:31 any_data_atweetshive
Other Tools to Extract Data
Run show tables to get your list of tables, and then you can grab all the DDL for the Hive tables.
ddl.sql
show create table atweetshive;
show create table atweetshive2;
show create table beacongateway;
show create table beaconstatus;
show create table dronedata;
show create table gps;
show create table gpsweather;
show create table ibeacon;
show create table meetup;
show create table trials2;
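Writing ddl.sql by hand gets tedious as tables are added. As a minimal sketch, the file could be generated from a list of table names instead. The helper name make_ddl_sql is hypothetical, and the sketch assumes the names arrive one per line, unquoted, on standard input.

```shell
#!/usr/bin/env bash
# make_ddl_sql is a hypothetical helper: it reads table names from stdin
# (one per line, assumed unquoted) and prints one statement per table.
make_ddl_sql() {
  local table
  while IFS= read -r table; do
    # Skip blank lines so they do not produce empty statements
    if [ -n "$table" ]; then
      printf 'show create table %s;\n' "$table"
    fi
  done
}
```

For example, `printf 'gps\nmeetup\n' | make_ddl_sql > ddl.sql` writes two statements. Piping in the output of beeline run with `-e 'show tables;'` should work the same way, provided the chosen output format leaves the names unquoted.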
Hive Script to Export Table DDL
beeline -u jdbc:hive2://myhiveserverthrift:10000/default --color=false --showHeader=false --verbose=false --silent=true --outputformat=csv -f ddl.sql
Backup Zeppelin Notebooks in Bulk
tar -cvf notebooks.tar /usr/hdp/current/zeppelin-server/notebook/
gzip -9 notebooks.tar
scp userid@pservername:/opt/demo/notebooks.tar.gz .
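The archive and compress steps can also be done in one pass. The following is a sketch only: backup_dir is a hypothetical helper name, and it relies on tar's -z option, which uses gzip's default compression level rather than the -9 used above.

```shell
#!/usr/bin/env bash
# backup_dir is a hypothetical helper: it archives and gzips a directory in one step.
#   $1 = directory to back up, $2 = output .tar.gz path
backup_dir() {
  local src="$1" out="$2"
  # -C switches to the parent directory so the archive stores relative paths
  tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")"
}
```

For the notebooks, that would be something like `backup_dir /usr/hdp/current/zeppelin-server/notebook notebooks.tar.gz` on the server, followed by the same scp to pull the file down.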