Scheduled Data Synchronization and Backup on Linux Servers

生信宝典 · WeChat official account · Biology · 2017-07-11 06:54


Data safety is a major concern for anyone who analyzes data. The key data we analyze and the key scripts we use should all be backed up regularly.

scp

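The simplest way to back up is to use cp (for a local disk) or scp (for a remote disk) to make a copy of your result files, and to copy them again whenever they change. The commands are: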

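# -f: overwrite, -u: copy only newer files, -r: recurse; source_project/. copies the contents, so repeat runs update project_bak in place
cp -fur source_project/. project_bak
# once project_bak exists on the remote, scp -r copies into project_bak/source_project instead
scp -r source_project user@remote_server_ip:project_bak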
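The sketch below rehearses a repeated local cp backup inside a throwaway directory created with mktemp, so it can be run safely anywhere GNU coreutils is available (GNU cp is assumed for -u). The names source_project, project_bak and results.txt are illustrative only. Copying source_project/. (the directory's contents) rather than source_project means a second run updates project_bak in place instead of creating project_bak/source_project, and -u skips files whose copy is already up to date.

```shell
#!/bin/sh
# Rehearse a repeated cp backup in a temporary directory (GNU cp assumed for -u).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT   # remove the demo directory when the script exits
cd "$work" || exit 1

mkdir source_project
echo "version 1" > source_project/results.txt

# First run: project_bak does not exist yet, so it is created with the contents.
cp -fur source_project/. project_bak

# Update the result file; sleep so its mtime is clearly newer than the copy's.
sleep 1
echo "version 2" > source_project/results.txt

# Second run: -u copies only files that are newer than their copies in project_bak.
cp -fur source_project/. project_bak

cat project_bak/results.txt   # prints: version 2
ls project_bak                # prints: results.txt
```

With plain `cp -fur source_project project_bak`, the second run would instead create project_bak/source_project/results.txt and leave the stale version 1 file at the top level of project_bak.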

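To make the backup periodic, we can put the commands above into crontab and have them run every night at 23:00. For backups to a remote server, set up passwordless (key-based) SSH login so the job can run unattended. (To get our guide on passwordless login, send the keyword 免密码登录服务器 to this WeChat account.)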

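# Crontab format
# Minute  Hour  Day-of-month  Month  Day-of-week  command
# * means every minute/hour/day/month/weekday
# Run cp every day at 23:00
0          23      *       *       *      cp -fur source_project/. project_bak
# */2 means every 2 units of that field (minutes, hours, days, ...)
# Every 24 hours: in the hour field */24 only matches hour 0, so this runs daily at 00:00
0          */24      *       *       *      cp -fur source_project/. project_bak
# Every day at 00:00
0          0          */1     *        *     scp -r source_project user@remote_server_ip:project_bak

# crontab also supports special time strings, e.g.
# @reboot: run the command once at system startup
@reboot cmd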
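Editing the crontab by hand with `crontab -e` works, but a setup script may want to add its backup entry without creating a duplicate every time it runs. Below is a minimal sketch of a hypothetical helper, add_cron_line, that reads an existing crontab on standard input and appends the new line only if an identical line is not already there; it never touches the real crontab by itself. Because cron runs jobs from the home directory with a minimal environment, the example job uses absolute paths (the /home/user paths are placeholders).

```shell
# Append a cron line only if it is not already present.
# stdin: the current crontab; stdout: the new crontab.
add_cron_line() {
    local line="$1" existing
    existing=$(cat)                      # current crontab (trailing newlines dropped)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"        # keep all existing entries unchanged
    fi
    # -F: fixed string, -x: match whole lines only, -q: no output
    if ! printf '%s\n' "$existing" | grep -Fqx -- "$line"; then
        printf '%s\n' "$line"            # append the new job only if it is missing
    fi
}

# Usage against the real crontab (commented out so the sketch has no side effects):
# job='0 23 * * * cp -fur /home/user/source_project/. /home/user/project_bak'
# crontab -l 2>/dev/null | add_cron_line "$job" | crontab -
```

Piping into `crontab -` replaces the whole crontab with the filtered output, so it is worth running the pipeline once without the final `| crontab -` and reading what it would install.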

rsync

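cp and scp are easy to use, but they are wasteful: scp copies every file on every run, and even cp -u, which skips unchanged files, still copies each modified file in full. When there is a lot of data to back up, this repeated copying costs both time and disk I/O.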

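rsync, by contrast, is an incremental backup tool: it synchronizes only the files that have changed and, when copying over the network, transfers only the changed parts of those files. This greatly reduces the amount of data transferred and the transfer time. Usage: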

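# Back up the contents of the local project directory to /backup/project on the remote server.
# Note the trailing slash after the first project: it means "copy the contents of this directory",
# without creating another project folder inside the destination. Compare with the second command; both do the same thing.
# -a: archive mode, equals -rlptgoD (so the -r, -p and -t below are already implied; listing them is harmless)
# -r: recurse into directories
# -p: preserve permissions
# -u: skip files that are newer on the receiving side, so changes made on the remote are not overwritten
# -L: copy the files that symlinks point to instead of the links, so links do not break on the remote because of mismatched paths
# -t: preserve modification times
# -v: show what is being updated
# -z: compress data during transfer; useful on slow connections
# --delete: remove files on the remote that no longer exist locally
rsync -aruLptvz --delete project/ user@remoteServer:/backup/project
rsync -aruLptvz --delete project user@remoteServer:/backup/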
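The trailing-slash rule from the comments above can be written down as a tiny shell function. rsync_target_dir is a hypothetical helper, not part of rsync: it only computes the directory on the receiving side that will hold the source's files, and never runs rsync. It shows why the two commands above end up in the same place, and why syncing `project` (no slash) to `/backup/project` would create `/backup/project/project`.

```shell
# Predict where rsync puts the source's files on the receiving side.
rsync_target_dir() {
    local src="$1" dest="${2%/}"         # drop one trailing slash from the destination
    case $src in
        */) printf '%s\n' "$dest" ;;                  # "dir/": contents land directly in dest
        *)  printf '%s/%s\n' "$dest" "${src##*/}" ;;  # "dir": a new dir/ is created inside dest
    esac
}

rsync_target_dir project/ user@remoteServer:/backup/project   # prints: user@remoteServer:/backup/project
rsync_target_dir project  user@remoteServer:/backup/          # prints: user@remoteServer:/backup/project
```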

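What rsync does is mirroring: it keeps the remote copy identical to the local files. As long as the local files are fine, the remote copy is fine too. But if files are deleted by mistake or damaged by a program that ran incorrectly, and nobody notices before the next sync, the damage is copied to the remote side as well and the backup loses its purpose. Accidental deletions are fairly easy to notice and correct in time; files damaged by a faulty program may go unnoticed.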
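One inexpensive safeguard for this mirroring problem, sketched under the assumption that a missing or empty source directory always means something went wrong: check the source before running rsync with --delete, so that an unmounted disk or a mistyped path does not empty the backup too. The function name safe_to_mirror is hypothetical. It cannot detect files that a faulty program has corrupted; catching those needs versioned backups such as the ones rdiff-backup keeps.

```shell
# Return non-zero if the source directory is missing or empty.
safe_to_mirror() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "safe_to_mirror: $dir is not a directory" >&2
        return 1
    fi
    # ls -A lists everything except . and ..; no output means the directory is empty
    if [ -z "$(ls -A "$dir")" ]; then
        echo "safe_to_mirror: $dir is empty" >&2
        return 1
    fi
    return 0
}

# Usage: only mirror when the check passes.
# safe_to_mirror project && rsync -aLuvz --delete project/ user@remoteServer:/backup/project
```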

rdiff-backup

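Here we recommend rdiff-backup. Besides doing incremental backups, it keeps the current mirror plus the differences recorded at every earlier backup, so you can easily go back to a previous version. The only requirement is that the local and remote servers run the same version of rdiff-backup. Two other tools, duplicity and Rsnapshot, do similar jobs in different ways and use different amounts of disk space; see the comparison linked from the original article.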

Installing and using rdiff-backup is described below (these notes were originally written in English and are fairly brief):

Install rdiff-backup on both the local and the remote computer

  • Install requirements

    #install for ubuntu, debian
    sudo apt-get install python-dev librsync-dev
    #or compile librsync yourself
    #download librsync from https://sourceforge.net/project/showfiles.php?group_id=56125
    tar xvzf librsync-0.9.7.tar.gz
    cd librsync-0.9.7
    export CFLAGS="$CFLAGS -fPIC"
    ./configure --prefix=/home/user/rsync --with-pic
    make
    make install
  • Install rdiff-backup

    #Download the source from the rdiff-backup homepage,
    # http://www.nongnu.org/rdiff-backup/
    #then run one of the following in the extracted source directory
    python setup.py install --prefix=/home/user/rdiff-backup
    #If you compiled librsync yourself, specify its location
    python setup.py --librsync-dir=/home/user/rsync install --prefix=/home/user/rdiff-backup
  • Add the executable files and Python modules to environment variables

    #Add the following lines to .bashrc, .bash_profile or another shell config file
    export PATH=${PATH}:/home/user/rdiff-backup/bin
    export PYTHONPATH=${PYTHONPATH}:/home/user/rdiff-backup/lib/python2.x/site-packages
    #replace the x in python2.x above with 6 or 7, depending on
    #the Python version used.
  • Test environment variables when executing commands through ssh

    ssh user@host 'echo ${PATH}' #When I ran this command on my local computer,
                                 #only the system environment variables were used;
                                 #none of my self-defined environment variables were set.
    #So I modified the following lines in the file 'SetConnections.py' in
    #/home/user/rdiff-backup/lib/python2.x/site-packages/rdiff_backup
    #to set the environment explicitly at login.
    #Note the single quotes inside the double quotes.
    __cmd_schema = "ssh -C %s 'source ~/.bash_profile; rdiff-backup --server'"
    __cmd_schema_no_compress = "ssh %s 'source ~/.bash_profile; rdiff-backup --server'"
    #Source whichever of .bash_profile and .bashrc defines the environment variables for rdiff-backup.
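Because rdiff-backup requires the same version on both machines, it can help to compare them before the first backup. The sketch assumes that `rdiff-backup --version` prints a line such as `rdiff-backup 1.2.8` on standard output and takes the last word as the version; the function name same_rdiff_version is hypothetical. The remote call in the usage comment sources ~/.bash_profile for the same reason as the modified schema above.

```shell
# Compare two "rdiff-backup X.Y.Z" strings by their last word.
same_rdiff_version() {
    local local_v="${1##* }" remote_v="${2##* }"   # keep the last word of each string
    if [ "$local_v" = "$remote_v" ]; then
        echo "OK: both sides run rdiff-backup $local_v"
        return 0
    fi
    echo "Version mismatch: local $local_v, remote $remote_v" >&2
    return 1
}

# Usage:
# same_rdiff_version "$(rdiff-backup --version)" \
#     "$(ssh user@host 'source ~/.bash_profile; rdiff-backup --version')"
```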

Use rdiff-backup

  • Start backup

    • rdiff-backup --no-compression --print-statistics user@host::/home/user/source_dir destination_dir

    • If destination_dir already exists, add --force, as in rdiff-backup --no-compression --force --print-statistics user@host::/home/user/source_dir destination_dir . Everything previously in destination_dir will be lost.

    • To exclude or include specific files or directories, specify them like --exclude '**trash' or --include /home/user/source_dir/important .

  • Schedule regular backups

    • Add the above command to crontab (run 'crontab -e' in a terminal to open it) in a format like 5   22  */1    *   *   command , which runs the command at 22:05 every day.

  • Restore data

    • Restore the latest data by running rdiff-backup -r now destination_dir user@host::/home/user/source_dir.restore . Add --force if you want to restore to source_dir .

    • Restore files as they were 10 days ago by running rdiff-backup -r 10D destination_dir user@host::/home/user/source_dir.restore . Other accepted time formats include 5m4s (5 minutes 4 seconds) and 2014-01-01 (January 1st, 2014).

    • Restore a file from a specific increment file by running rdiff-backup destination_dir/rdiff-backup-data/increments/server_add.2014-02-21T09:22:45+08:00.missing user@host::/home/user/source_dir.restore/server_add . Increment files are stored under destination_dir/rdiff-backup-data/increments/ .

  • Remove older records to save space

    • Delete all information about file versions that have not been current for 2 weeks by running rdiff-backup --remove-older-than 2W --force destination_dir . An existing file that has not changed for a year is still preserved, but a file deleted 15 days ago can no longer be restored after this command. Normally you should add --force, because by default --remove-older-than refuses to delete more than one increment at a time.

    • Keep only the last n rdiff-backup sessions (here 20) by running rdiff-backup --remove-older-than 20B --force destination_dir .

  • Statistics

    • List the increments in a given folder by running rdiff-backup --list-increments destination_dir/ .

    • List the files changed in the last 5 days by running rdiff-backup --list-changed-since 5D destination_dir/ .

    • Compare the source with the backup by running rdiff-backup --compare user@host::source-dir destination_dir .

    • Compare the source with the backup as it was two weeks ago by running rdiff-backup --compare-at-time 2W user@host::source-dir destination_dir .
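The backup and cleanup steps above are usually run together from cron. The following sketch chains them with the flags listed above and skips the cleanup when the backup fails. The helper names run and nightly_rdiff_backup and the DRY_RUN switch are hypothetical; with DRY_RUN=1 the commands are only printed, which is a convenient way to check a job before adding it to crontab.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Back up, then drop increments older than the retention period.
# Arguments: source (e.g. user@host::/home/user/source_dir), destination, retention (e.g. 2W)
nightly_rdiff_backup() {
    local src="$1" dest="$2" keep="$3"
    # Do not prune if the backup itself failed.
    run rdiff-backup --no-compression --print-statistics "$src" "$dest" || return 1
    run rdiff-backup --remove-older-than "$keep" --force "$dest"
}

# Usage:
# DRY_RUN=1
# nightly_rdiff_backup user@host::/home/user/source_dir destination_dir 2W
```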

A complete script (run automatically from crontab)

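#!/bin/bash

# Make the self-installed rdiff-backup visible to cron's minimal environment
# (assumes it was installed with --prefix=/soft/rdiff_backup)
export PATH=${PATH}:/soft/rdiff_backup/bin
export PYTHONPATH=${PYTHONPATH}:/soft/rdiff_backup/lib/python2.7/site-packages/

# Recipient of the status mails (placeholder, replace with your own address)
MAIL_TO=you@example.com

rdiff-backup --no-compression -v5 --exclude '**trash' user@server::source/ bak_dir/

ret=$?
if test $ret -ne 0; then
    echo "Wrong in bak" | mutt -s "Wrong in bak" "$MAIL_TO"
else
    echo "Right in bak" | mutt -s "Right in bak" "$MAIL_TO"
fi

echo "Finish rdiff-backup $0 ---`date`---"  >>bak.log 2>&1

# Mail the list of files that differ from the backup as it was one day ago
rdiff-backup --exclude '**trash' --compare-at-time 1D user@server::source/ bak_dir/ | mutt -s "Lists of backed-up files" "$MAIL_TO"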
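If a nightly run ever takes longer than the interval between cron runs, two copies of the script can work on bak_dir at the same time. A minimal guard, sketched here as hypothetical acquire_lock and release_lock functions with an assumed lock path under /tmp, relies on mkdir being atomic: it either creates the lock directory or fails because another run already holds it.

```shell
# Lock location shared by all runs of the backup script (assumed path).
LOCK_DIR=${LOCK_DIR:-/tmp/rdiff-backup-job.lock}

acquire_lock() {
    # mkdir fails if the directory already exists, so only one run can succeed
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        return 0
    fi
    echo "Another backup run holds $LOCK_DIR; skipping this run." >&2
    return 1
}

release_lock() {
    rmdir "$LOCK_DIR"
}

# Usage at the top of the backup script:
# acquire_lock || exit 0
# trap release_lock EXIT
# ... rdiff-backup commands ...
```

A run killed with SIGKILL skips the EXIT trap and leaves a stale lock that has to be removed by hand; where util-linux is installed, its flock command avoids this because the kernel releases the lock when the process dies.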
