
[English] Backing Up GitHub and GitLab with Go

狗厂 · 掘金 · 2018-06-06 03:44


Want to learn Golang and build something useful? Learn how to write a tool to back up your GitHub and GitLab repositories.

GitHub and GitLab are two popular Git repository hosting services used to host and manage open-source projects. They have also become an easy way for content creators to invite others to share and collaborate without having to set up their own infrastructure.

Using hosted services that you don't manage yourself, however, comes with a downside. Systems fail, services go down and disks crash. Content hosted on remote services can simply vanish. Wouldn't it be nice if you could have an easy way to back up your git repositories periodically into a place you control?

If you follow along with this article, you will write a Golang program to back up git repositories from GitHub and GitLab (including custom GitLab installations). Being familiar with Golang basics will be helpful, but not required. Let's get started!

Hello Golang

The latest stable release of Golang at the time of this writing is 1.8. The package name is usually golang, but if your Linux distro doesn't have this release, you can download the Golang compiler and other tools for Linux. Once downloaded, extract it to /usr/local:


$ sudo tar -C /usr/local -xzf <filename-from-above>
$ export PATH=$PATH:/usr/local/go/bin

Typing go version in the same terminal (or in a new one, once the export line has been added to your shell profile) should show the following:


$ go version
go version go1.8 linux/amd64

Let's write your first program. Listing 1 shows a program that expects a -name flag (or argument) when run and prints a greeting using the specified name. Compile and run the program as follows:


$ go build listing1.go
$ ./listing1 -name "Amit"
Hello Amit

$ ./listing1
2017/02/18 22:48:25 Please specify your name using -name
$ echo $?
1

If you don't specify the -name argument, the program prints a message and exits with a non-zero exit code. You can combine compiling and running the program using go run:


$ go run listing1.go -name Amit
2017/03/04 23:08:11 Hello Amit

Listing 1. Example Program listing1.go


package main

import (
    "flag"
    "log"
)

func main() {
    name := flag.String("name", "", "Your Name")
    flag.Parse()

    if len(*name) != 0 {
        log.Printf("Hello %s", *name)
    } else {
        log.Fatal("Please specify your name using -name")
    }
}

The first line in the program declares the package for the program. The main package is special, and any executable Go program must live in the main package. Next, the program imports two packages from the Golang standard library using the import statement:


import (
        "flag"
        "log"
)

The "flag" package is used to handle command-line arguments to programs, and the "log" package is used for logging.

Next, the program defines the main() function where the program execution starts:


func main() {
    name := flag.String("name", "", "Your Name")
    flag.Parse()

    if len(*name) != 0 {
        log.Printf("Hello %s", *name)
    } else {
        log.Fatal("Please specify your name using -name")
    }
}

Unlike other functions you'll write, the main function neither takes arguments nor returns anything. The first statement in the main() function above defines a string flag, "name", with an empty string as the default value and "Your Name" as the help message. The flag.String() function returns a string pointer, which is stored in the variable name. The := is a shorthand for declaring a variable whose type is inferred from the value being assigned to it. In this case, it is of type *string, a pointer to a string value.

The Parse() function parses the flags and makes the specified flag values available via the returned pointer. If a value has been provided to the "-name" flag when executing the program, the value will be stored in "name" and is accessible via *name (recall that name is a string pointer). Hence, you can check whether the length of the string referred to via name is non-zero, and if so, print a greeting via the Printf() function of the log package. If, however, no value was specified, you use the Fatal() function to print a message. The Fatal() function prints the specified message and terminates the program execution.
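The same parse-then-dereference pattern can be sketched on a private FlagSet, which makes it possible to feed in any argument slice instead of the real command line. This is only an illustration: parseName is a hypothetical helper that is not part of the original program, and it returns an error rather than calling log.Fatal().

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// parseName is a hypothetical helper (not in the original article).
// It defines the same -name flag as Listing 1, but on its own FlagSet
// so that the arguments to parse can be supplied by the caller.
func parseName(args []string) (string, error) {
	fs := flag.NewFlagSet("listing1", flag.ContinueOnError)
	name := fs.String("name", "", "Your Name")

	// Until Parse runs, *name holds only the default value.
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *name == "" {
		return "", errors.New("please specify your name using -name")
	}
	return *name, nil
}

func main() {
	name, err := parseName([]string{"-name", "Amit"})
	fmt.Println(name, err)

	// No -name flag: the pointer still refers to the empty default.
	_, err = parseName(nil)
	fmt.Println(err)
}
```

Returning an error keeps the helper easy to call repeatedly; a command-line program such as Listing 1 can still choose to pass that error to log.Fatal().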

Structures, Slices and Maps

The program shown in Listing 2 demonstrates the following different things:

  • Defining a struct data type.
  • Creating a map.
  • Creating a slice and iterating over it.
  • Defining a user-defined function.

Listing 2. Structures, Slices and Maps Example


package main


import (
    "log"
)

type Repository struct {
    GitURL string
    Name   string
}

func getRepo(id int) Repository {
    repos := map[int]Repository{
        1: Repository{GitURL: "ssh://github.com/amitsaha/gitbackup", Name: "gitbackup"},
        2: Repository{GitURL: "ssh://github.com/amitsaha/lj_gitbackup", Name: "lj_gitbackup"},
    }

    return repos[id]
}

func backUp(r *Repository) {
    log.Printf("Backing up %s\n", r.Name)
}

func main() {
    var repositories []Repository
    repositories = append(repositories, getRepo(1))
    repositories = append(repositories, getRepo(2))
    repositories = append(repositories, getRepo(3))

    for _, r := range repositories {
        if (Repository{}) != r {
            backUp(&r)
        }
    }
}

At the beginning, you define a new struct data type Repository as follows:


type Repository struct {
    GitURL string
    Name   string
}

The structure Repository has two members: GitURL and Name , both of type string . You can define a variable of this structure type using r := Repository{"git+ssh://git.mydomain.com/myrepo", "myrepo"} . You can choose to leave one or both members out when defining a structure variable. For example, you can leave the GitURL unset using r := Repository{Name: "myrepo"} , or you even can leave both out. When you leave a member unset, the value defaults to the zero value for that type—0 for int, empty string for string type.
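As a small sketch of the three ways of initializing the structure described above, the following program prints each variant and checks it against the zero value. The isEmpty function is a hypothetical helper introduced only for this illustration.

```go
package main

import "fmt"

type Repository struct {
	GitURL string
	Name   string
}

// isEmpty is a hypothetical helper: it reports whether every member
// of r still holds its zero value (the empty string for both fields).
func isEmpty(r Repository) bool {
	return r == (Repository{})
}

func main() {
	// Positional form: values are assigned in declaration order.
	full := Repository{"git+ssh://git.mydomain.com/myrepo", "myrepo"}

	// Named form: GitURL is left out and defaults to "".
	partial := Repository{Name: "myrepo"}

	// No initializer at all: both members are "".
	var none Repository

	fmt.Printf("%+v\n%+v\n%+v\n", full, partial, none)
	fmt.Println(isEmpty(full), isEmpty(partial), isEmpty(none))
}
```

Structures whose members are all comparable, as here, can be compared with == and !=, which is what the later listings rely on to skip empty entries.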

Next, you define a function, getRepo , which takes an integer as argument and returns a value of type Repository :


func getRepo(id int) Repository {
    repos := map[int]Repository{
        1: Repository{GitURL: "ssh://github.com/amitsaha/gitbackup", Name: "gitbackup"},
        2: Repository{GitURL: "ssh://github.com/amitsaha/lj_gitbackup", Name: "lj_gitbackup"},
    }

    return repos[id]
}

In the getRepo() function, you create a map or a hash table of key-value pairs—the key being an integer and a value of type Repository . The map is initialized with two key-value pairs.

The function returns the Repository , which corresponds to the specified integer. If a specified key is not found in a map, a zero value of the value's type is returned. In this case, if an integer other than 1 or 2 is supplied, a value of type Repository is returned with both the members set to empty strings.
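Because a missing key silently yields a zero value, it can be useful to ask the map whether the key was present at all. The sketch below uses Go's two-value map lookup for that; findRepo is a hypothetical variant of getRepo() written for this illustration, not the function used in the listings.

```go
package main

import "fmt"

type Repository struct {
	GitURL string
	Name   string
}

var repos = map[int]Repository{
	1: {GitURL: "ssh://github.com/amitsaha/gitbackup", Name: "gitbackup"},
	2: {GitURL: "ssh://github.com/amitsaha/lj_gitbackup", Name: "lj_gitbackup"},
}

// findRepo is a hypothetical variant of getRepo. The second return
// value is false when id is not a key in the map, in which case the
// returned Repository is the zero value.
func findRepo(id int) (Repository, bool) {
	r, ok := repos[id]
	return r, ok
}

func main() {
	if r, ok := findRepo(1); ok {
		fmt.Println("found", r.Name)
	}
	if _, ok := findRepo(3); !ok {
		fmt.Println("no repository with id 3")
	}
}
```

With this form, the caller does not need to compare the result against an empty Repository to detect a missing entry.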

Next, you define a function backUp() , which accepts a pointer to a variable of type Repository as an argument and prints the Name of the repository. In the final program, this function actually will create a backup of a repository.

Finally, there is the main() function:


func main() {
    var repositories []Repository
    repositories = append(repositories, getRepo(1))
    repositories = append(repositories, getRepo(2))
    repositories = append(repositories, getRepo(3))

    for _, r := range repositories {
        if (Repository{}) != r {
            backUp(&r)
        }
    }
}

In the first statement, you declare a slice, repositories, that will store elements of type Repository. A slice in Golang is a dynamically sized array, similar to a list in Python. You then call the getRepo() function to obtain the repository corresponding to the key 1 and store the returned value in the repositories slice using the append() function. You do the same in the next two statements. When you call the getRepo() function with the key 3, you get back an empty value of type Repository.

You then use a for loop with the range clause to iterate over the elements of the slice, repositories. On each iteration, range yields the element's index and a copy of the element. The index is discarded by assigning it to the blank identifier, _, and the element is available via the r variable. You check whether the element is not an empty Repository value, and if it isn't, you call the backUp() function, passing the address of r. Passing an address is not strictly necessary here; you could have passed the value itself. However, passing by address avoids a copy and is common practice when a structure has a large number of members.
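The difference between passing a value and passing an address can be sketched as follows. Both setNameByValue and setNameByPointer are hypothetical functions introduced for this illustration; the example also keeps the index from range instead of discarding it, so that it can take the address of the slice element itself.

```go
package main

import "fmt"

type Repository struct {
	GitURL string
	Name   string
}

// setNameByValue (hypothetical) receives a copy of the structure,
// so the caller's Repository is left unchanged.
func setNameByValue(r Repository, name string) {
	r.Name = name
}

// setNameByPointer (hypothetical) receives an address, so the
// assignment is visible to the caller.
func setNameByPointer(r *Repository, name string) {
	r.Name = name
}

func main() {
	repositories := []Repository{{Name: "gitbackup"}, {Name: "lj_gitbackup"}}

	for i, r := range repositories {
		// r is a copy of repositories[i]; changing it has no effect
		// on the slice.
		setNameByValue(r, "copy")

		// Taking the address of the element updates the slice itself.
		setNameByPointer(&repositories[i], r.Name+"-mirror")
	}

	fmt.Println(repositories[0].Name, repositories[1].Name)
}
```

For a read-only function such as backUp(), either form behaves the same; the pointer form matters once the function needs to modify the structure or the structure is expensive to copy.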

When you build and run this program, you'll see the following output:


$ go run listing2.go
2017/02/19 19:44:32 Backing up gitbackup
2017/02/19 19:44:32 Backing up lj_gitbackup

Goroutines and Channels

Consider the previous program (Listing 2). You call the backUp() function with every repo in the repositories serially. When you actually create a backup of a large number of repositories, doing them serially can be slow. Since each repository backup is independent of any other, they can be run in parallel. Golang makes it really easy to have multiple simultaneous units of execution in a program using goroutines.

A goroutine is what other programming languages refer to as a lightweight thread or green thread. A Golang program starts out executing in a main goroutine, which can spawn other goroutines. Using a variable of the WaitGroup type, the main goroutine can wait for all the spawned goroutines to finish before it finishes itself, as you'll see next.

Listing 3 modifies the previous program such that the backUp() function is called in a goroutine. The main() function declares a variable, wg of type WaitGroup defined in the sync package, and then sets up a deferred call to the Wait() function of this variable. The defer statement is used to execute any function just before the current function returns. Thus, you ensure that you wait for all the goroutines to finish before exiting the program.

Listing 3. Goroutine Example


package main

import (
    "log"
    "sync"
)

type Repository struct {
    GitURL string
    Name   string
}

func getRepo(id int) Repository {

    repos := map[int]Repository{
        1: Repository{GitURL: "ssh://github.com/amitsaha/gitbackup", Name: "gitbackup"},
        2: Repository{GitURL: "ssh://github.com/amitsaha/lj_gitbackup", Name: "lj_gitbackup"},
    }

    return repos[id]
}

func backUp(r *Repository, wg *sync.WaitGroup) {
    defer wg.Done()
    log.Printf("Backing up %s\n", r.Name)
}

func main() {
    var wg sync.WaitGroup
    defer wg.Wait()

    var repositories []Repository
    repositories = append(repositories, getRepo(1))
    repositories = append(repositories, getRepo(2))
    repositories = append(repositories, getRepo(3))

    for _, r := range repositories {
        if (Repository{}) != r {
            wg.Add(1)
            go func(r Repository) {
                backUp(&r, &wg)
            }(r)
        }
    }
}

The other primary change in the main() function is how you call the backUp() function. Instead of calling this function directly, you call it in a new goroutine as follows:


wg.Add(1)
go func(r Repository) {
        backUp(&r, &wg)
}(r)

You call the Add() function with an argument 1 to indicate that you'll be creating a new goroutine that you want to wait for before you exit. Then, you define an anonymous function taking an argument, r of type Repository , which calls the function backUp() with an additional argument, a reference to the variable, wg —the WaitGroup variable declared earlier.
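The Add/Done/Wait pattern can also be wrapped in a function that calls Wait() explicitly and only returns once every goroutine has finished. The following is a minimal sketch under that assumption; backUpAll is a hypothetical function, and the atomic counter merely stands in for real backup work so there is something to observe.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type Repository struct {
	GitURL string
	Name   string
}

// backUpAll is a hypothetical function: it starts one goroutine per
// repository, waits for all of them, and returns how many completed.
func backUpAll(repos []Repository) int64 {
	var wg sync.WaitGroup
	var completed int64

	for _, r := range repos {
		wg.Add(1) // register the goroutine before starting it
		go func(r Repository) {
			defer wg.Done() // signal completion when this function returns
			_ = r.Name      // placeholder for the real backup work
			atomic.AddInt64(&completed, 1)
		}(r)
	}

	wg.Wait() // block until every Done() has been called
	return atomic.LoadInt64(&completed)
}

func main() {
	repos := []Repository{{Name: "gitbackup"}, {Name: "lj_gitbackup"}}
	fmt.Println("completed:", backUpAll(repos))
}
```

Passing r as an argument to the anonymous function, as the listing does, gives each goroutine its own copy of the element rather than a shared loop variable.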

Consider the scenario where you have a large number of elements in your repositories slice, which is a very realistic scenario for this backup tool. Spawning a goroutine for each element of the slice can easily lead to an uncontrolled number of goroutines running concurrently. This can cause the program to hit the per-process memory and file-descriptor limits imposed by the operating system.

Thus, you would want to regulate the maximum number of goroutines spawned by the program and spawn a new goroutine only when the ones executing have finished. Channels in Golang allow you to achieve this and other synchronization operations among goroutines. Listing 4 shows how you can regulate the maximum number of goroutines spawned.

Listing 4. Channels Example


package main

import (
    "log"
    "sync"
)

type Repository struct {
    GitURL string
    Name   string
}

func getRepo(id int) Repository {

    repos := map[int]Repository{
        1:  Repository{GitURL: "ssh://github.com/amitsaha/gitbackup", Name: "gitbackup"},
        2:  Repository{GitURL: "ssh://github.com/amitsaha/lj_gitbackup", Name: "lj_gitbackup"},
        3:  Repository{GitURL: "ssh://github.com/amitsaha/gitbackup", Name: "gitbackup"},
        4:  Repository{GitURL: "ssh://github.com/amitsaha/lj_gitbackup", Name: "lj_gitbackup"},
        5:  Repository{GitURL: "ssh://github.com/amitsaha/gitbackup", Name: "gitbackup"},
        6:  Repository{GitURL: "ssh://github.com/amitsaha/lj_gitbackup", Name: "lj_gitbackup"},
        7:  Repository{GitURL: "ssh://github.com/amitsaha/gitbackup", Name: "gitbackup"},
        8:  Repository{GitURL: "ssh://github.com/amitsaha/lj_gitbackup", Name: "lj_gitbackup"},
        9:  Repository{GitURL: "ssh://github.com/amitsaha/gitbackup", Name: "gitbackup"},
        10: Repository{GitURL: "ssh://github.com/amitsaha/lj_gitbackup", Name: "lj_gitbackup"},
    }

    return repos[id]
}

func backUp(r *Repository, wg *sync.WaitGroup) {
    defer wg.Done()
    log.Printf("Backing up %s\n", r.Name)
}

func main() {
    var wg sync.WaitGroup
    defer wg.Wait()

    var repositories []Repository
    repositories = append(repositories, getRepo(1))
    repositories = append(repositories, getRepo(2))
    repositories = append(repositories, getRepo(3))
    repositories = append(repositories, getRepo(4))
    repositories = append(repositories, getRepo(5))
    repositories = append(repositories, getRepo(6))
    repositories = append(repositories, getRepo(7))
    repositories = append(repositories, getRepo(8))
    repositories = append(repositories, getRepo(9))
    repositories = append(repositories, getRepo(10))

    // Create a channel of capacity 5
    tokens := make(chan bool, 5)

    for _, r := range repositories {
        if (Repository{}) != r {
            wg.Add(1)
            // Get a token
            tokens <- true
            go func(r Repository) {
                backUp(&r, &wg)
                // release the token
                <-tokens
            }(r)
        }
    }
}

You create a channel of capacity 5 and use it to implement a token system. The channel is created using make :


tokens := make(chan bool, 5)

The above statement creates a "buffered channel"—a channel with a capacity of 5 and that can store only values of type "bool". If a buffered channel is full, writes to it will block, and if a channel is empty, reads from it will block. This property allows you to implement your token system.
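To see the blocking behavior act as a limit, the sketch below runs a number of jobs through the same token scheme and records the highest number that were ever running at once. It is only an illustration: runLimited is a hypothetical function, the jobs do no real work, and limit is assumed to be at least 1 (with a capacity of 0 the first send would block forever).

```go
package main

import (
	"fmt"
	"sync"
)

// runLimited is a hypothetical function. It runs n jobs, allowing at
// most limit of them to run at the same time, and returns the highest
// number of jobs observed running simultaneously.
// Assumption: limit >= 1.
func runLimited(n, limit int) int {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		running int
		peak    int
	)
	tokens := make(chan bool, limit)

	for i := 0; i < n; i++ {
		wg.Add(1)
		tokens <- true // blocks while limit jobs already hold a token
		go func() {
			defer wg.Done()

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			// The real backup work would happen here.

			mu.Lock()
			running--
			mu.Unlock()

			<-tokens // release the token so another job can start
		}()
	}

	wg.Wait()
	return peak
}

func main() {
	peak := runLimited(10, 5)
	fmt.Println("peak within limit:", peak <= 5)
}
```

Because a token is taken before each goroutine starts and given back only after its work is done, the number of running jobs can never exceed the channel's capacity.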
