Want to learn Golang and build something useful? Learn how to write a
tool to back up your GitHub and GitLab repositories.
GitHub and GitLab are two popular Git repository hosting services that are used to host and manage open-source projects. They have also become an easy way for content creators to invite others to share and collaborate without needing to set up their own infrastructure.
Using hosted services that you don't manage yourself, however, comes with a downside. Systems fail, services go down and disks crash. Content hosted on remote services can simply vanish. Wouldn't it be nice if you could have an easy way to back up
your git repositories periodically into a place you control?
If you follow along with this article, you will write a Golang program to back up git repositories from GitHub and GitLab (including custom GitLab installations). Being familiar with Golang basics will be helpful, but not required. Let's get started!
Hello Golang
The latest stable release of Golang at the time of this writing is 1.8. Your distro's package is usually named golang, but if your Linux distro doesn't have this release, you can download the Golang compiler and other tools for Linux. Once downloaded, extract it to /usr/local:
$ sudo tar -C /usr/local -xzf <filename-from-above>
$ export PATH=$PATH:/usr/local/go/bin
Typing go version in the same terminal should show the following (to keep the PATH change in new terminals, add the export line to your shell profile):
$ go version
go version go1.8 linux/amd64
Let's write your first program. Listing 1 shows a program that expects a -name flag (or argument) when run and prints a greeting using the specified name. Compile and run the program as follows:
$ go build listing1.go
$ ./listing1 -name "Amit"
2017/02/18 22:48:25 Hello Amit
$ ./listing1
2017/02/18 22:48:25 Please specify your name using -name
$ echo $?
1
If you don't specify the -name argument, the program prints a message and exits with a non-zero exit code. You can combine compiling and running the program using go run:
$ go run listing1.go -name Amit
2017/03/04 23:08:11 Hello Amit
Listing 1. Example Program listing1.go
package main

import (
    "flag"
    "log"
)

func main() {
    name := flag.String("name", "", "Your Name")
    flag.Parse()
    if len(*name) != 0 {
        log.Printf("Hello %s", *name)
    } else {
        log.Fatal("Please specify your name using -name")
    }
}
The first line in the program declares the package for the program. The main package is special, and any executable Go program must live in the main package. Next, the program imports two packages from the Golang standard library using the import statement:
import (
    "flag"
    "log"
)
The "flag" package is used to handle command-line arguments to programs, and the "log" package is used for logging.
Next, the program defines the main() function where the program execution starts:
func main() {
    name := flag.String("name", "", "Your Name")
    flag.Parse()
    if len(*name) != 0 {
        log.Printf("Hello %s", *name)
    } else {
        log.Fatal("Please specify your name using -name")
    }
}
Unlike other functions you'll write, the main function neither takes any arguments nor returns anything. The first statement in the main() function above defines a string flag, "name", with a default value of an empty string and "Your Name" as the help message. The return value of flag.String() is a string pointer stored in the variable name. The := operator is shorthand for declaring a variable whose type is inferred from the value being assigned to it. In this case, it is of type *string, a pointer to a string value.
The Parse() function parses the command-line flags and makes the specified flag values available via the returned pointer. If a value has been provided to the "-name" flag when executing the program, the value is stored in the string that name points to and is accessible via *name (recall that name is a string pointer). Hence, you can check whether the length of the string referred to via name is non-zero, and if so, print a greeting via the Printf() function of the log package. If, however, no value was specified, you use the Fatal() function to print a message. The Fatal() function prints the specified message and terminates the program with exit code 1.
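To make the flag handling easier to experiment with, here is a minimal sketch that moves the parsing into a function. The helper name greet is hypothetical and not part of Listing 1, and the sketch uses flag.NewFlagSet instead of the package-level flags so that the arguments can be passed in directly. It also assumes Go 1.16 or newer for io.Discard.

```go
package main

import (
    "errors"
    "flag"
    "fmt"
    "io"
    "os"
)

// greet is a hypothetical helper (not part of Listing 1). It parses args,
// which must not include the program name, and returns the greeting.
func greet(args []string) (string, error) {
    // ContinueOnError makes Parse return an error instead of exiting.
    fs := flag.NewFlagSet("listing1", flag.ContinueOnError)
    fs.SetOutput(io.Discard) // keep usage text quiet in this sketch

    // As in Listing 1, String returns a *string that is filled in by Parse.
    name := fs.String("name", "", "Your Name")
    if err := fs.Parse(args); err != nil {
        return "", err
    }
    if *name == "" {
        return "", errors.New("please specify your name using -name")
    }
    return "Hello " + *name, nil
}

func main() {
    msg, err := greet(os.Args[1:])
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1) // same exit code that log.Fatal uses
    }
    fmt.Println(msg)
}
```

Returning an error instead of calling log.Fatal inside the helper is a design choice for this sketch: it leaves the decision to exit with main(), which also makes the function usable from tests.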
Structures, Slices and Maps
The program shown in Listing 2 demonstrates the following:
- Defining a struct data type.
- Creating a map.
- Creating a slice and iterating over it.
- Defining a user-defined function.
Listing 2. Structures, Slices and Maps Example
package main

import (
    "log"
)

type Repository struct {
    GitURL string
    Name   string
}

func getRepo(id int) Repository {
    repos := map[int]Repository{
        1: Repository{GitURL: "ssh://github.com/amitsaha/gitbackup", Name: "gitbackup"},
        2: Repository{GitURL: "ssh://github.com/amitsaha/lj_gitbackup", Name: "lj_gitbackup"},
    }
    return repos[id]
}

func backUp(r *Repository) {
    log.Printf("Backing up %s\n", r.Name)
}

func main() {
    var repositories []Repository
    repositories = append(repositories, getRepo(1))
    repositories = append(repositories, getRepo(2))
    repositories = append(repositories, getRepo(3))
    for _, r := range repositories {
        if (Repository{}) != r {
            backUp(&r)
        }
    }
}
At the beginning, you define a new struct data type Repository as follows:
type Repository struct {
    GitURL string
    Name   string
}
The structure Repository has two members: GitURL and Name, both of type string. You can define a variable of this structure type using r := Repository{"git+ssh://git.mydomain.com/myrepo", "myrepo"}. You can choose to leave one or both members out when defining a structure variable. For example, you can leave the GitURL unset using r := Repository{Name: "myrepo"}, or you even can leave both out. When you leave a member unset, the value defaults to the zero value for that type: 0 for int and the empty string for string.
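The following minimal sketch illustrates the zero-value behavior described above. The helper isEmpty is a hypothetical name introduced only for this illustration; it relies on the fact that structs whose members are all comparable can be compared with ==.

```go
package main

import "fmt"

type Repository struct {
    GitURL string
    Name   string
}

// isEmpty is a hypothetical helper that reports whether every member of r
// still holds its zero value.
func isEmpty(r Repository) bool {
    return r == Repository{}
}

func main() {
    full := Repository{"git+ssh://git.mydomain.com/myrepo", "myrepo"}
    partial := Repository{Name: "myrepo"} // GitURL is left as ""
    var empty Repository                  // both members are ""

    fmt.Println(isEmpty(full))       // false
    fmt.Println(partial.GitURL == "") // true
    fmt.Println(isEmpty(partial))    // false, because Name is set
    fmt.Println(isEmpty(empty))      // true
}
```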
Next, you define a function, getRepo, which takes an integer as argument and returns a value of type Repository:
func getRepo(id int) Repository {
    repos := map[int]Repository{
        1: Repository{GitURL: "ssh://github.com/amitsaha/gitbackup", Name: "gitbackup"},
        2: Repository{GitURL: "ssh://github.com/amitsaha/lj_gitbackup", Name: "lj_gitbackup"},
    }
    return repos[id]
}
In the getRepo() function, you create a map (a hash table of key-value pairs) with keys of type int and values of type Repository. The map is initialized with two key-value pairs.
The function returns the Repository that corresponds to the specified integer. If a specified key is not found in a map, the zero value of the map's value type is returned. In this case, if an integer other than 1 or 2 is supplied, a value of type Repository is returned with both members set to empty strings.
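Go also lets you distinguish a missing key from a stored zero value by using the two-value form of a map lookup. The sketch below shows this with a hypothetical lookupRepo function, a variant of getRepo that is not part of the article's listings.

```go
package main

import "fmt"

type Repository struct {
    GitURL string
    Name   string
}

var repos = map[int]Repository{
    1: {GitURL: "ssh://github.com/amitsaha/gitbackup", Name: "gitbackup"},
    2: {GitURL: "ssh://github.com/amitsaha/lj_gitbackup", Name: "lj_gitbackup"},
}

// lookupRepo is a hypothetical variant of getRepo. The second return value
// is false when id is not a key in the map.
func lookupRepo(id int) (Repository, bool) {
    r, ok := repos[id]
    return r, ok
}

func main() {
    for _, id := range []int{1, 3} {
        if r, ok := lookupRepo(id); ok {
            fmt.Printf("id %d: %s\n", id, r.Name)
        } else {
            fmt.Printf("id %d: not found\n", id)
        }
    }
}
```

With this form, the caller does not need to compare the result against an empty Repository to find out whether the key existed.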
Next, you define a function backUp(), which accepts a pointer to a variable of type Repository as an argument and prints the Name of the repository. In the final program, this function actually will create a backup of a repository.
Finally, there is the main() function:
func main() {
    var repositories []Repository
    repositories = append(repositories, getRepo(1))
    repositories = append(repositories, getRepo(2))
    repositories = append(repositories, getRepo(3))
    for _, r := range repositories {
        if (Repository{}) != r {
            backUp(&r)
        }
    }
}
In the first statement, you create a slice, repositories, that will store elements of type Repository. A slice in Golang is a dynamically sized array, similar to a list in Python. You then call the getRepo() function to obtain a repository corresponding to the key 1 and store the returned value in the repositories slice using the append() function. You do the same in the next two statements. When you call the getRepo() function with the key 3, you get back an empty value of type Repository.
You then use a for loop with the range clause to iterate over the elements of the slice, repositories. The range clause yields the index of each element and a copy of the element itself. The index is not needed here, so it is discarded by assigning it to the blank identifier _, and the element is referred to via the r variable. You check if the element is not an empty Repository variable, and if it isn't, you call the backUp() function, passing the address of r. It is worth mentioning that there is no particular need to pass an address here; you could have passed the value itself. However, passing by address is a good practice when a structure has a large number of members.
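One detail of the range clause is worth a short illustration: the loop variable holds a copy of each element, so its address is not the address of the element stored in the slice. This does not matter for Listing 2, where backUp() only reads the Name, but it matters as soon as a function modifies the value it is given. The sketch below uses two hypothetical functions, tagCopies and tagElements, which are not part of the article's program.

```go
package main

import "fmt"

type Repository struct {
    GitURL string
    Name   string
}

// tagCopies changes only the loop variable r, which is a copy of each
// element, so the slice passed in is left untouched.
func tagCopies(repos []Repository, prefix string) {
    for _, r := range repos {
        p := &r // address of the copy, not of the slice element
        p.Name = prefix + p.Name
    }
}

// tagElements indexes into the slice, so the stored elements change.
func tagElements(repos []Repository, prefix string) {
    for i := range repos {
        p := &repos[i] // address of the element inside the slice
        p.Name = prefix + p.Name
    }
}

func main() {
    repos := []Repository{{Name: "gitbackup"}, {Name: "lj_gitbackup"}}

    tagCopies(repos, "copy-")
    fmt.Println(repos[0].Name) // gitbackup

    tagElements(repos, "bak-")
    fmt.Println(repos[0].Name, repos[1].Name) // bak-gitbackup bak-lj_gitbackup
}
```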
When you build and run this program, you'll see the following output:
$ go run listing2.go
2017/02/19 19:44:32 Backing up gitbackup
2017/02/19 19:44:32 Backing up lj_gitbackup
Goroutines and Channels
Consider the previous program (Listing 2). You call the backUp() function with every repo in the repositories slice serially. When you actually create a backup of a large number of repositories, doing them serially can be slow. Since each repository backup is independent of any other, they can be run in parallel. Golang makes it really easy to have multiple simultaneous units of execution in a program using goroutines.
A goroutine is what other programming languages refer to as a lightweight thread or green thread. By default, a Golang program is said to be executing in a main goroutine, which can spawn other goroutines. A main goroutine can wait for all the spawned goroutines to finish before finishing up using a variable of WaitGroup type, as you'll see next.
Listing 3 modifies the previous program such that the backUp() function is called in a goroutine. The main() function declares a variable, wg, of type WaitGroup defined in the sync package, and then sets up a deferred call to the Wait() function of this variable. The defer statement schedules a function call to run just before the current function returns. Thus, you ensure that you wait for all the goroutines to finish before exiting the program.
Listing 3. Goroutine Example
package main

import (
    "log"
    "sync"
)

type Repository struct {
    GitURL string
    Name   string
}

func getRepo(id int) Repository {
    repos := map[int]Repository{
        1: Repository{GitURL: "ssh://github.com/amitsaha/gitbackup", Name: "gitbackup"},
        2: Repository{GitURL: "ssh://github.com/amitsaha/lj_gitbackup", Name: "lj_gitbackup"},
    }
    return repos[id]
}

func backUp(r *Repository, wg *sync.WaitGroup) {
    defer wg.Done()
    log.Printf("Backing up %s\n", r.Name)
}

func main() {
    var wg sync.WaitGroup
    defer wg.Wait()
    var repositories []Repository
    repositories = append(repositories, getRepo(1))
    repositories = append(repositories, getRepo(2))
    repositories = append(repositories, getRepo(3))
    for _, r := range repositories {
        if (Repository{}) != r {
            wg.Add(1)
            go func(r Repository) {
                backUp(&r, &wg)
            }(r)
        }
    }
}
The other primary change in the main() function is how you call the backUp() function. Instead of calling this function directly, you call it in a new goroutine as follows:
wg.Add(1)
go func(r Repository) {
    backUp(&r, &wg)
}(r)
You call the Add() function with an argument 1 to indicate that you'll be creating a new goroutine that you want to wait for before you exit. Then, you define an anonymous function taking an argument, r, of type Repository, which calls the function backUp() with an additional argument: a reference to wg, the WaitGroup variable declared earlier.
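The same Add(), Done() and Wait() pattern can be wrapped in a function that returns only after every goroutine has finished. The following is a minimal sketch under that assumption; backUpAll is a hypothetical name, and the "backup" is simulated by recording the repository name.

```go
package main

import (
    "fmt"
    "sync"
)

type Repository struct {
    GitURL string
    Name   string
}

// backUpAll is a hypothetical function that starts one goroutine per
// repository and returns the names it processed, in input order.
func backUpAll(repos []Repository) []string {
    var wg sync.WaitGroup
    // Each goroutine writes only to its own slot, so no lock is needed.
    done := make([]string, len(repos))
    for i, r := range repos {
        wg.Add(1) // one more goroutine to wait for
        go func(i int, r Repository) {
            defer wg.Done() // signal completion when this goroutine returns
            done[i] = r.Name
        }(i, r)
    }
    wg.Wait() // block until every goroutine has called Done
    return done
}

func main() {
    repos := []Repository{{Name: "gitbackup"}, {Name: "lj_gitbackup"}}
    fmt.Println(backUpAll(repos)) // [gitbackup lj_gitbackup]
}
```

Calling Wait() explicitly rather than deferring it is a choice made here because the function needs the results before it returns; in Listing 3, deferring the call in main() achieves the same waiting behavior.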
Consider the scenario where you have a large number of elements in your repositories slice, a very realistic scenario for this backup tool. Spawning a goroutine for each element in the slice can easily lead to having an uncontrolled number of goroutines running concurrently. This can lead to the program hitting per-process memory and file-descriptor limits imposed by the operating system.
Thus, you would want to regulate the maximum number of goroutines spawned by the program and spawn a new goroutine only when the ones executing have finished. Channels in Golang allow you to achieve this and other synchronization operations among goroutines. Listing 4 shows how you can regulate the maximum number of goroutines spawned.
Listing 4. Channels Example
package main

import (
    "log"
    "sync"
)

type Repository struct {
    GitURL string
    Name   string
}

func getRepo(id int) Repository {
    repos := map[int]Repository{
        1:  Repository{GitURL: "ssh://github.com/amitsaha/gitbackup", Name: "gitbackup"},
        2:  Repository{GitURL: "ssh://github.com/amitsaha/lj_gitbackup", Name: "lj_gitbackup"},
        3:  Repository{GitURL: "ssh://github.com/amitsaha/gitbackup", Name: "gitbackup"},
        4:  Repository{GitURL: "ssh://github.com/amitsaha/lj_gitbackup", Name: "lj_gitbackup"},
        5:  Repository{GitURL: "ssh://github.com/amitsaha/gitbackup", Name: "gitbackup"},
        6:  Repository{GitURL: "ssh://github.com/amitsaha/lj_gitbackup", Name: "lj_gitbackup"},
        7:  Repository{GitURL: "ssh://github.com/amitsaha/gitbackup", Name: "gitbackup"},
        8:  Repository{GitURL: "ssh://github.com/amitsaha/lj_gitbackup", Name: "lj_gitbackup"},
        9:  Repository{GitURL: "ssh://github.com/amitsaha/gitbackup", Name: "gitbackup"},
        10: Repository{GitURL: "ssh://github.com/amitsaha/lj_gitbackup", Name: "lj_gitbackup"},
    }
    return repos[id]
}

func backUp(r *Repository, wg *sync.WaitGroup) {
    defer wg.Done()
    log.Printf("Backing up %s\n", r.Name)
}

func main() {
    var wg sync.WaitGroup
    defer wg.Wait()
    var repositories []Repository
    repositories = append(repositories, getRepo(1))
    repositories = append(repositories, getRepo(2))
    repositories = append(repositories, getRepo(3))
    repositories = append(repositories, getRepo(4))
    repositories = append(repositories, getRepo(5))
    repositories = append(repositories, getRepo(6))
    repositories = append(repositories, getRepo(7))
    repositories = append(repositories, getRepo(8))
    repositories = append(repositories, getRepo(9))
    repositories = append(repositories, getRepo(10))

    // Create a channel of capacity 5
    tokens := make(chan bool, 5)
    for _, r := range repositories {
        if (Repository{}) != r {
            wg.Add(1)
            // Get a token
            tokens <- true
            go func(r Repository) {
                backUp(&r, &wg)
                // release the token
                <-tokens
            }(r)
        }
    }
}
You create a channel of capacity 5 and use it to implement a token system. The channel is created using make:
tokens := make(chan bool, 5)
The above statement creates a "buffered channel": a channel that has a capacity of 5 and can store only values of type "bool". If a buffered channel is full, writes to it will block, and if a channel is empty, reads from it will block. This property allows you to implement your token system.
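To see the token system at work, the sketch below counts how many jobs are running at the same time and reports the peak. It is an illustration under stated assumptions rather than part of the backup tool: runLimited is a hypothetical function, and a short time.Sleep stands in for the real backup work.

```go
package main

import (
    "fmt"
    "sync"
    "time"
)

// runLimited is a hypothetical function that runs n jobs with at most limit
// of them executing at once, and returns the highest concurrency observed.
func runLimited(n, limit int) int {
    var (
        wg      sync.WaitGroup
        mu      sync.Mutex // protects running and peak
        running int
        peak    int
    )
    tokens := make(chan bool, limit) // buffered channel used as a token pool

    for i := 0; i < n; i++ {
        wg.Add(1)
        tokens <- true // blocks while limit goroutines hold a token
        go func() {
            // Deferred calls run last-in first-out: the token is released
            // first, then the WaitGroup is notified.
            defer wg.Done()
            defer func() { <-tokens }()

            mu.Lock()
            running++
            if running > peak {
                peak = running
            }
            mu.Unlock()

            time.Sleep(10 * time.Millisecond) // stand-in for a real backup

            mu.Lock()
            running--
            mu.Unlock()
        }()
    }
    wg.Wait()
    return peak
}

func main() {
    // The reported peak never exceeds the channel capacity.
    fmt.Println("peak concurrency:", runLimited(10, 5))
}
```

Because a goroutine is started only after the main goroutine has placed a token in the channel, and the token is removed only after the job has finished, the number of jobs in flight cannot exceed the channel's capacity.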