Backup Files from Hadoop

  • ListHDFS
    • Set the parameters, including the Hadoop configuration resources (/etc/hadoop/conf/core-site.xml), then pick a high-level directory and work down.
  • FetchHDFS
    • ${path}/${filename}
  • PutFile
    • Store on the local file system

Backup Hive Tables

  • SelectHiveQL
    • Output format Avro, with SQL: select * from beaconstatus
  • ConvertAvroToORC
    • Generic, shared by all the tables
  • UpdateAttribute
    • tablename ${hive.ddl:substringAfter('CREATE EXTERNAL TABLE IF NOT EXISTS '):substringBefore(' (')}
  • PutFile
    • Set conflict resolution to replace and enable create missing directories, with directory: /Volumes/Transcend/HadoopBackups/hive/${tablename}
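
The tablename expression above takes the text after 'CREATE EXTERNAL TABLE IF NOT EXISTS ' and before the first ' (' in the hive.ddl attribute. To see what that produces outside NiFi, the same string logic can be sketched in shell. The function name extract_table_name and the sample DDL string are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: mimics the substringAfter/substringBefore chain
# from the UpdateAttribute step using POSIX parameter expansion.
extract_table_name() {
  ddl=$1
  # Keep everything after the first occurrence of the CREATE prefix.
  rest=${ddl#*"CREATE EXTERNAL TABLE IF NOT EXISTS "}
  # Keep everything before the first " (".
  name=${rest%%" ("*}
  printf '%s\n' "$name"
}

# Sample DDL string (made up for illustration).
extract_table_name "CREATE EXTERNAL TABLE IF NOT EXISTS beaconstatus (uuid STRING, rssi INT) STORED AS ORC"
```

With that sample input the function prints beaconstatus, which is the value PutFile would use for the ${tablename} directory.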

For Phoenix tables, I use the same ConvertAvroToORC, UpdateAttribute and PutFile boxes and just add ExecuteSQL to ingest Phoenix data.

For every new table, I add one box and link it to ConvertAvroToORC. Done!

This is enough of a backup that if I need to rebuild and refill my development cluster, I can do so easily. I also have the flows scheduled to run once a day and rewrite everything.

This is not for production or extremely large data! It works great for a development cluster or personal dev cluster.

You can easily back up local files by ingesting them with GetFile, and other things can be backed up by calling ExecuteStreamCommand.

Local File Storage of Backed up Data

drwxr-xr-x  3 tspann  staff  102 May 20 23:00 any_data_trials2
drwxr-xr-x  3 tspann  staff  102 May 20 22:59 any_data_meetup
drwxr-xr-x  3 tspann  staff  102 May 20 22:59 any_data_ibeacon
drwxr-xr-x  3 tspann  staff  102 May 20 22:57 any_data_gpsweather
drwxr-xr-x  3 tspann  staff  102 May 20 10:53 any_data_beaconstatus
drwxr-xr-x  3 tspann  staff  102 May 20 10:52 any_data_beacongateway
drwxr-xr-x  3 tspann  staff  102 May 19 17:36 any_data_atweetshive2
drwxr-xr-x  3 tspann  staff  102 May 19 17:31 any_data_atweetshive

Other Tools to Extract Data

Run show tables to get your list of tables, and then you can grab all the DDL for the Hive tables.


show create table atweetshive;
show create table atweetshive2;
show create table beacongateway;
show create table beaconstatus;
show create table dronedata;
show create table gps;
show create table gpsweather;
show create table ibeacon;
show create table meetup;
show create table trials2;

Hive Script to Export Table DDL

beeline -u jdbc:hive2://myhiveserverthrift:10000/default --color=false --showHeader=false --verbose=false --silent=true --outputformat=csv -f ddl.sql
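
The ddl.sql file passed to beeline above holds one show create table statement per table, so it can be generated rather than typed by hand. This is a minimal sketch, assuming the table names arrive one per line on standard input (for example, saved from show tables output). The function name make_ddl_script is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn table names (one per line on stdin)
# into "show create table" statements.
make_ddl_script() {
  while IFS= read -r table; do
    # Skip blank lines so they do not produce empty statements.
    [ -n "$table" ] || continue
    printf 'show create table %s;\n' "$table"
  done
}

# Example with three of the tables named in this article.
printf '%s\n' beaconstatus gpsweather meetup | make_ddl_script
```

Redirecting the output to ddl.sql gives a file in the shape the beeline command expects.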

Backup Zeppelin Notebooks in Bulk

tar -cvf notebooks.tar /usr/hdp/current/zeppelin-server/notebook/
gzip -9 notebooks.tar
scp userid@pservername:/opt/demo/notebooks.tar.gz .
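
If the notebook backup runs every day, it may help to fold the tar and gzip steps into one date-stamped archive so earlier copies are not overwritten. This is a minimal sketch, not part of the original commands: the function name backup_notebooks, the notebooks-YYYYMMDD.tar.gz naming, and the destination argument are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: archive a directory into a date-stamped .tar.gz.
# $1 = directory to back up, $2 = existing directory to write the archive into.
# Note: -z uses tar's default gzip level, not the -9 used above.
backup_notebooks() {
  src=$1
  dest=$2
  archive="$dest/notebooks-$(date +%Y%m%d).tar.gz"
  # -C switches to the parent directory so the archive stores relative paths.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  # Print the archive path so a caller can copy it elsewhere, e.g. with scp.
  printf '%s\n' "$archive"
}

# Usage (source path taken from the tar command above; /opt/demo must exist):
# backup_notebooks /usr/hdp/current/zeppelin-server/notebook /opt/demo
```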