Database Testing Patterns in Go

At clypd, we place a lot of value on testing as a mechanism for ensuring code correctness, and our approach to testing is constantly evolving. We’ve spent the past few months learning and creating a new approach to mocking and testing functions that need to access data, whether from a database, a file, or over the network.

Growing pains

When we wrote our first lines of Go code in January 2014, our tests were exclusively unit tests. We wrote our processes so that we could ingest all the required data at the very beginning, operate on it, and output the result. The inputs and outputs were everything from files to databases to network calls – but it didn’t matter, because we could test them separately from the actual logic. Most of our top-level functions looked something like this:

func main() {
    database := Open("dbname")
    data := GetData(database)
    results := DoSomethingWith(data)
    CommitResults(database, results)
}

Eventually the size of our data outgrew this pattern — we couldn’t hold all the data we needed to operate on in memory, so we had to gather new data and output results during the execution of the program. We took the simplest approach (which is also often the best) and dealt with subsets of the data at a time. The function above turned into:

func main() {
    database := Open("dbname")
    ids := GetIDs(database)
    for _, id := range ids {
        data := GetDataFor(database, id)
        result := DoSomethingWith(data)
        CommitResult(database, result)
    }
}

We’d retrieve a slice of all the primary keys we wanted to operate on from the database at the start of execution. We’d then take a subset of those keys, retrieve the associated information, consume it, and repeat. Database calls moved lower and lower into the application and, consequently, our test coverage dropped. Eventually, we moved to correct this.

Mocking the database

Our first approach was very simple: mock out the database. We wrapped our database library with various methods like Select() and Exec() that accepted queries as arguments. These methods formed an interface for our database. To help with unit testing, we created an alternate implementation backed by a map that matched SQL queries to canned responses.
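To make that concrete, here is a rough sketch of the shape such a mock might take. The names (mockDB, responses) and the exact method signatures are our illustration for this post, not the real wrapper:

import (
    "fmt"
    "reflect"
)

// mockDB satisfies the same interface as the real wrapper, but answers
// queries from a map of canned responses instead of a live database.
type mockDB struct {
    responses map[string]interface{} // query text -> canned response
}

func (m *mockDB) Select(dest interface{}, query string, args ...interface{}) error {
    resp, ok := m.responses[query]
    if !ok {
        return fmt.Errorf("no canned response for query %q", query)
    }
    // Copy the canned response into dest via reflection, mirroring what
    // the real database package does. This sketch assumes dest is a
    // pointer to a value of exactly the response's type.
    reflect.ValueOf(dest).Elem().Set(reflect.ValueOf(resp))
    return nil
}

func (m *mockDB) Exec(query string, args ...interface{}) error {
    return nil // writes are accepted but never validated, a weakness described below
}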

It solved our immediate problem and allowed us to provide data to our programs. By writing tests for functions that accessed the database, our coverage improved. However, after a short time using this approach, we grew increasingly dissatisfied. The tests just weren’t very rigorous; code coverage isn’t necessarily a good measurement of test quality.

Some of the weaknesses of this approach worried us. First, we wanted some degree of input validation. While the ability to provide data to our programs improved, we couldn’t validate the correctness of the output (what was inserted, updated, or deleted in the database). We also had some difficulty exactly replicating the behaviour of our database package.

Our database package uses reflection for much of its API, so our test solution had to make use of reflection in much the same way. We are fairly inexperienced with reflection and tend to avoid it because of the additional complexity associated with generic solutions. There are exceptions: times when reflection has drastically reduced code duplication by enabling a generic solution.

We had some ideas to improve this implementation, but weren’t very excited about them. There was the possibility of doing input validation with some ugly type assertions. While not inherently evil, type assertions add a burden of knowledge on the programmer and generally reduce the benefit of writing in a language with strict typing. To solve the bugs relating to an inexact implementation of the database library, we could limit the power of our database wrapper to only the features we could replicate faithfully. But if many on the team were already unhappy with the design, piling additional features onto it was not the right approach.
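As an illustration of why those assertions felt ugly, validating even a single query argument means unwrapping interface{} values by hand (validateArg is a hypothetical helper, not code from our wrapper):

import "fmt"

// validateArg is a hypothetical sketch of input validation via type
// assertion: every expected type has to be unwrapped and checked manually.
func validateArg(arg interface{}) error {
    id, ok := arg.(int) // assert the dynamic type by hand
    if !ok {
        return fmt.Errorf("expected int argument, got %T", arg)
    }
    if id < 0 {
        return fmt.Errorf("invalid id: %d", id)
    }
    return nil
}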

Interfaces

One of the interesting things about Go is that there is usually an idiomatic solution to a problem that is far cleaner and easier to implement than any other. Clearly, if we were already struggling this much, we had not found that solution. It was time to scrap the mock and try again.

Our second approach relied heavily on Go’s interfaces. We chose an approach based on interfaces because, at the very least, we’d learn more about interfaces and become more comfortable with them. So we picked an isolated area of the codebase, a place that could be a prototype for the pattern, and came up with a structure that encapsulated our database calls. The implementation was a series of one-line functions that passed their data straight through to our database package. The example above becomes:

// dataType and resultType stand in for our real domain types.
type datasource interface {
    GetIDs() []int
    GetDataFor(id int) dataType
    CommitResult(result resultType)
}

type db struct {
    database
}

func (d *db) GetIDs() []int {
    return GetIDs(d.database)
}

func (d *db) GetDataFor(id int) dataType {
    return GetDataFor(d.database, id)
}

func (d *db) CommitResult(result resultType) {
    CommitResult(d.database, result)
}

func Run(ds datasource) {
    ids := ds.GetIDs()
    for _, id := range ids {
        data := ds.GetDataFor(id)
        result := DoSomethingWith(data)
        ds.CommitResult(result)
    }
}

func main() {
    database := Open("dbname")
    Run(&db{database})
}

We can now write a test for the function Run(). The test implementation of datasource is a structure with two slices for each function in the interface: one to capture inputs and one to provide outputs. Each function definition, however, was only two lines, which surprised us: one line to append the inputs to a slice and one line to pop the next output off a slice. The new design was simpler, with no need to worry about the implementation of the database wrapper, and it benefitted from Go’s strict typing. It seemed like a good solution, but how did it scale?
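Here is a minimal sketch of what that test implementation might look like, reusing the hypothetical dataType and resultType from above (the fakeSource name and its fields are ours, for illustration):

import "testing"

// fakeSource implements datasource with slices of canned outputs
// and slices that capture the inputs it receives.
type fakeSource struct {
    ids        []int        // canned output for GetIDs
    data       []dataType   // canned outputs for GetDataFor, popped in order
    gotIDs     []int        // captured inputs to GetDataFor
    gotResults []resultType // captured inputs to CommitResult
}

func (f *fakeSource) GetIDs() []int {
    return f.ids
}

func (f *fakeSource) GetDataFor(id int) dataType {
    f.gotIDs = append(f.gotIDs, id) // record the input
    d := f.data[0]                  // pop the next canned output
    f.data = f.data[1:]
    return d
}

func (f *fakeSource) CommitResult(result resultType) {
    f.gotResults = append(f.gotResults, result) // record the input
}

func TestRun(t *testing.T) {
    ds := &fakeSource{
        ids:  []int{1, 2},
        data: make([]dataType, 2), // one canned value per id
    }
    Run(ds)
    if got := len(ds.gotResults); got != 2 {
        t.Errorf("expected 2 committed results, got %d", got)
    }
}

Because fakeSource captures everything that Run() passes to CommitResult(), the test can validate outputs as well as provide inputs – exactly what the map-backed mock could not do.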

As we rolled out the new pattern, the solution scaled without much pain. Many of the benefits can be attributed to Go’s interfaces. We kept our interfaces function-local: each function that had to access data had an interface associated with it, and those interfaces were combined as we climbed higher in the code, as sketched below.
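For example (these names are illustrative, not from our codebase), Go lets small interfaces be embedded in larger ones, so each function demands only what it needs while higher-level code asks for the union:

// Small, function-local interfaces: each one declares only
// what a single function actually needs.
type idLister interface {
    GetIDs() []int
}

type dataGetter interface {
    GetDataFor(id int) dataType
}

type resultCommitter interface {
    CommitResult(result resultType)
}

// Higher in the code, the pieces are combined by embedding. This
// composes back into the datasource interface that Run() takes, while
// a helper that only reads data can accept just a dataGetter.
type datasource interface {
    idLister
    dataGetter
    resultCommitter
}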

The end result was one structure per package that handled all the production logic and several much smaller, more specialized structures for testing purposes. While this new pattern did introduce more test code than the previous approach, the additional boilerplate of setting up test structures was expected. Test code bloat is much less worrying than production code bloat, and the increased clarity is well worth the additional cost. There is some extra production code, but it has the nice side effect of concentrating all of our external calls in one place.

With this new pattern in place, other programmers can easily see, at a glance, all the information that a program needs to run. The approach also has the nice side effect of letting much of the same code run against any data source. We can run the same processes from a CSV file; all that’s needed is another implementation of the data interface.
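As a sketch of what that might look like, here is a hypothetical CSV-backed implementation of the read side (csvSource, the column layout, and the error handling are all our assumptions):

import (
    "encoding/csv"
    "os"
    "strconv"
)

// csvSource serves the same interface as db, but from a CSV file
// whose first column we assume holds the id.
type csvSource struct {
    records [][]string
}

func newCSVSource(path string) (*csvSource, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer f.Close()
    records, err := csv.NewReader(f).ReadAll()
    if err != nil {
        return nil, err
    }
    return &csvSource{records: records}, nil
}

func (c *csvSource) GetIDs() []int {
    ids := make([]int, 0, len(c.records))
    for _, rec := range c.records {
        if id, err := strconv.Atoi(rec[0]); err == nil {
            ids = append(ids, id)
        }
    }
    return ids
}

// GetDataFor and CommitResult would be filled in similarly, mapping
// rows to dataType values and appending results to an output file.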

This new pattern suits us, for now. It’s a clean solution that avoids reflection entirely. We don’t have to worry about the implementation details of how we get our data, and we can test the functions that gather and output it.

While tests can always be improved, this pattern has now been rolled out to the entire codebase as a means of testing data-access calls, and we’ve witnessed the power of Go’s interfaces firsthand.
