Motivation

It has been more than a year since I was introduced to Go by the lead architect at the company I was working for.

I really liked the language: strongly typed, with its very decoupled interface declarations (or protocols, whatever you prefer to call them), and especially its simple concurrency model (deeply rooted in the language with goroutines and channels).

After that I did not give it much more attention. But recently I decided that I want to become proficient with it.

Instead of starting my own project, I decided to try to contribute to an existing one: Krakend, an API Gateway. (You can look at the Krakend features on its official webpage.)

So, this is the worklog of a small contribution: unlimiting the whitelist depth.

Introduction to the issue

One of the main concerns of an API Gateway is to be fast and to support a lot of connections. That is the reason why whitelisting (see the response manipulation documentation) only supported two-level-deep element selection (e.g. foo.bar, but not foo.bar.baz).

Adding more depth means more processing: traversing nodes to check for the field that we want to pick. But even knowing that deep fields slow down the data processing, the feature could be useful.

So, if implemented, it should not perform worse than before for non-deep cases.

How is response data handled internally in krakend?

The data in krakend is represented as a tree of hashmaps, mapping a string to "anything" (map[string]interface{}).
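As an illustration (the field names here are made up, not taken from Krakend), a decoded JSON response is held as nested maps, and picking a deep field means walking the tree with a type assertion at each level:

```go
package main

import "fmt"

func main() {
	// A response body is decoded into nested map[string]interface{}
	// values; each nested object becomes another map node in the tree.
	response := map[string]interface{}{
		"foo": map[string]interface{}{
			"bar": map[string]interface{}{
				"baz": 1,
			},
		},
		"status": "ok",
	}
	// Selecting the deep field foo.bar.baz means descending one map
	// per path segment.
	foo := response["foo"].(map[string]interface{})
	bar := foo["bar"].(map[string]interface{})
	fmt.Println(bar["baz"]) // 1
}
```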

The go built-in map type is implemented using a hash table.

But, we must take into account that:

The specifics of that data structure are an implementation detail of the runtime and are not specified by the language itself.

Initial approach

The first implementation involved creating a new dictionary and adding to it the fields that matched the whitelist (creating all the intermediate nodes for the data).

That worked; however, it allocated a lot of new maps (one for each new node). This is a problem, not only because allocating memory is slow, but also because we add memory pressure and give the garbage collector more work to do once that memory is no longer needed.
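A minimal sketch of that copy-based idea (hypothetical names, not the actual Krakend code): for each whitelisted dotted path, walk the source tree and recreate the intermediate nodes in a fresh map, which is where the extra allocations come from.

```go
package main

import (
	"fmt"
	"strings"
)

// copyWhitelisted builds a new tree containing only the whitelisted
// paths (e.g. "foo.bar"), allocating a fresh map for every intermediate
// node it recreates -- which is exactly the problem described above.
func copyWhitelisted(src map[string]interface{}, whitelist []string) map[string]interface{} {
	dst := map[string]interface{}{}
	for _, path := range whitelist {
		keys := strings.Split(path, ".")
		s, d := src, dst
		for i, k := range keys {
			v, ok := s[k]
			if !ok {
				break // path not present in the source data
			}
			if i == len(keys)-1 {
				d[k] = v // leaf: copy the value
				break
			}
			next, ok := v.(map[string]interface{})
			if !ok {
				break // path expects deeper structure
			}
			child, ok := d[k].(map[string]interface{})
			if !ok {
				child = map[string]interface{}{} // one allocation per intermediate node
				d[k] = child
			}
			s, d = next, child
		}
	}
	return dst
}

func main() {
	src := map[string]interface{}{
		"foo":   map[string]interface{}{"bar": 1, "junk": 2},
		"other": 3,
	}
	fmt.Println(copyWhitelisted(src, []string{"foo.bar"})) // map[foo:map[bar:1]]
}
```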

A better approach is to actually delete the nodes that are not part of any whitelisted element.

Recursive implementation

The initial implementation uses recursive calls. At this point I do not really know how the stack frame is allocated (I've read somewhere that goroutine stacks are actually allocated on the heap), or how expensive it is to create a stack frame for each call.
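A rough sketch of the recursive, delete-in-place idea (simplified and hypothetical; the actual PR code differs): represent the whitelist as a small tree, and at each level delete every key that is not covered by it, recursing into nodes that are only partially whitelisted. Note that deleting entries from a map while ranging over it is explicitly allowed by the Go spec.

```go
package main

import "fmt"

// wlNode is a hypothetical representation of the whitelist as a tree:
// a nil child means "keep this whole subtree".
type wlNode map[string]wlNode

// prune deletes, in place, every key not covered by the whitelist tree,
// recursing only into nodes that are partially whitelisted.
func prune(data map[string]interface{}, wl wlNode) {
	for k, v := range data {
		child, ok := wl[k]
		if !ok {
			delete(data, k) // not whitelisted at all
			continue
		}
		if child == nil {
			continue // whole subtree kept
		}
		if m, ok := v.(map[string]interface{}); ok {
			prune(m, child) // one recursive call per partially-kept node
		} else {
			delete(data, k) // the path expects deeper structure
		}
	}
}

func main() {
	data := map[string]interface{}{
		"foo": map[string]interface{}{
			"bar": map[string]interface{}{"baz": 1, "junk": 2},
			"x":   3,
		},
		"y": 4,
	}
	prune(data, wlNode{"foo": {"bar": {"baz": nil}}})
	fmt.Println(data) // map[foo:map[bar:map[baz:1]]]
}
```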

After the initial implementation, these are the benchmark numbers:

dhontecillas@doozer:~/prg/tools/gopackages/src/krakend$ go test -bench=deepWhite -benchtime=3s -run xxx -v krakend/proxy
goos: linux
goarch: amd64
pkg: krakend/proxy
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:0,depth:1,extraFields:2,extraSiblings:0-8            100000000               56.6 ns/op            48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:0,depth:3,extraFields:6,extraSiblings:0-8            100000000               55.9 ns/op            48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:0,depth:7,extraFields:14,extraSiblings:0-8           100000000               56.8 ns/op            48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:1,depth:1,extraFields:3,extraSiblings:1-8            30000000               150 ns/op              48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:1,depth:3,extraFields:7,extraSiblings:1-8            20000000               267 ns/op              48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:1,depth:7,extraFields:15,extraSiblings:1-8           10000000               485 ns/op              48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:2,depth:1,extraFields:4,extraSiblings:2-8            20000000               215 ns/op              48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:2,depth:3,extraFields:8,extraSiblings:2-8            10000000               436 ns/op              48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:2,depth:7,extraFields:16,extraSiblings:2-8            5000000               823 ns/op              48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:5,depth:1,extraFields:7,extraSiblings:5-8            10000000               466 ns/op              48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:5,depth:3,extraFields:11,extraSiblings:5-8            5000000               988 ns/op              48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:5,depth:7,extraFields:19,extraSiblings:5-8            2000000              1932 ns/op              48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:10,depth:1,extraFields:12,extraSiblings:10-8          5000000              1090 ns/op              48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:10,depth:3,extraFields:16,extraSiblings:10-8          2000000              2743 ns/op              48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:10,depth:7,extraFields:24,extraSiblings:10-8          1000000              5707 ns/op              48 B/op          1 allocs/op
PASS
ok      krakend/proxy   84.534s

Recursive implementation optimization attempt:

Another implementation I tried performed two consecutive passes:

  • One that stores, at each node, the siblings to be kept (using a list)
  • Another that, if there is at least one sibling to keep, deletes the others; if there is no sibling to keep, we simply delegate deleting the full node to the parent.
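Roughly, for a single node, those two passes could look like this (a hypothetical, simplified sketch of the idea, not the code from the PR; the extra list is the second allocation that shows up in the benchmarks below):

```go
package main

import "fmt"

// filterNode sketches the two-pass idea for one node: the first pass
// collects the siblings to keep in a slice, the second pass deletes the
// rest. It reports whether anything was kept, so a parent could decide
// to delete the whole node instead.
func filterNode(data map[string]interface{}, whitelist []string) bool {
	// Pass one: store the siblings to be kept.
	var keep []string
	for _, k := range whitelist {
		if _, ok := data[k]; ok {
			keep = append(keep, k)
		}
	}
	if len(keep) == 0 {
		return false // delegate deleting the full node to the parent
	}
	// Pass two: delete every sibling that is not in keep.
	for k := range data {
		found := false
		for _, kk := range keep {
			if k == kk {
				found = true
				break
			}
		}
		if !found {
			delete(data, k)
		}
	}
	return true
}

func main() {
	data := map[string]interface{}{"a": 1, "b": 2, "c": 3}
	fmt.Println(filterNode(data, []string{"b"}), len(data)) // true 1
}
```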

Well, the description is a little vague, but that is not important, as the result was worse than the previous implementation:

Benchmark results with new implementation:

dhontecillas@doozer:~/prg/tools/gopackages/src/krakend$ go test -bench=deepWhite -benchtime=3s -run xxx -v krakend/proxy
goos: linux
goarch: amd64
pkg: krakend/proxy
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:0,depth:1,extraFields:2,extraSiblings:0-8            100000000               64.5 ns/op            48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:0,depth:3,extraFields:6,extraSiblings:0-8            100000000               58.2 ns/op            48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:0,depth:7,extraFields:14,extraSiblings:0-8           100000000               61.8 ns/op            48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:1,depth:1,extraFields:3,extraSiblings:1-8            20000000               221 ns/op              64 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:1,depth:3,extraFields:7,extraSiblings:1-8            20000000               329 ns/op              64 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:1,depth:7,extraFields:15,extraSiblings:1-8           10000000               569 ns/op              64 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:2,depth:1,extraFields:4,extraSiblings:2-8            20000000               322 ns/op              80 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:2,depth:3,extraFields:8,extraSiblings:2-8            10000000               536 ns/op              80 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:2,depth:7,extraFields:16,extraSiblings:2-8            5000000               963 ns/op              80 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:5,depth:1,extraFields:7,extraSiblings:5-8            10000000               676 ns/op             128 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:5,depth:3,extraFields:11,extraSiblings:5-8            5000000              1147 ns/op             128 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:5,depth:7,extraFields:19,extraSiblings:5-8            2000000              2165 ns/op             128 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:10,depth:1,extraFields:12,extraSiblings:10-8          3000000              1617 ns/op             208 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:10,depth:3,extraFields:16,extraSiblings:10-8          1000000              3245 ns/op             208 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:10,depth:7,extraFields:24,extraSiblings:10-8           500000              6183 ns/op             208 B/op          2 allocs/op
PASS
ok      krakend/proxy   88.827s

Also, it required two allocations per call instead of one.

Recursive implementation optimization (take two):

I could save the wlKeys that should be deleted (instead of the ones that should be kept), and only use them in case we find we need to keep any of the whitelisted fields. That improved a little on the previous benchmark, but still did not beat the initial recursive version.

dhontecillas@doozer:~/prg/tools/gopackages/src/krakend$ go test -bench=deepWhite -benchtime=3s -run xxx -v krakend/proxy
goos: linux
goarch: amd64
pkg: krakend/proxy
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:0,depth:1,extraFields:2,extraSiblings:0-8            100000000               68.2 ns/op            48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:0,depth:3,extraFields:6,extraSiblings:0-8            100000000               70.4 ns/op            48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:0,depth:7,extraFields:14,extraSiblings:0-8           100000000               56.9 ns/op            48 B/op          1 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:1,depth:1,extraFields:3,extraSiblings:1-8            20000000               223 ns/op              64 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:1,depth:3,extraFields:7,extraSiblings:1-8            20000000               347 ns/op              64 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:1,depth:7,extraFields:15,extraSiblings:1-8           10000000               571 ns/op              64 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:2,depth:1,extraFields:4,extraSiblings:2-8            20000000               331 ns/op              80 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:2,depth:3,extraFields:8,extraSiblings:2-8            10000000               539 ns/op              80 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:2,depth:7,extraFields:16,extraSiblings:2-8            5000000               962 ns/op              80 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:5,depth:1,extraFields:7,extraSiblings:5-8            10000000               660 ns/op             128 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:5,depth:3,extraFields:11,extraSiblings:5-8            5000000              1123 ns/op             128 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:5,depth:7,extraFields:19,extraSiblings:5-8            2000000              2118 ns/op             128 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:10,depth:1,extraFields:12,extraSiblings:10-8          3000000              1409 ns/op             208 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:10,depth:3,extraFields:16,extraSiblings:10-8          1000000              3013 ns/op             208 B/op          2 allocs/op
BenchmarkEntityFormatter_deepWhitelistingFilter/numTargets:10,depth:7,extraFields:24,extraSiblings:10-8          1000000              6021 ns/op             208 B/op          2 allocs/op
PASS
ok      krakend/proxy   91.893s

Perhaps this approach would work better if we had more cases where we do not find any key from the whitelist (which I think would not happen too often).

Go back to first recursive approach

So, I decided to clean up the first implementation and go with it.

Converting recursive calls to iterative

A common undergraduate exercise in computer science is to convert a recursive algorithm into an iterative one. Could that be applied here, to avoid the cost of the recursive calls?

Since we know in advance the maximum depth of the elements in the whitelist, we could construct (in a single allocation?) a list of references to ranges, as long as we could stop iterating one range, start another one, and, when finished with that one, continue with the first.

The range statement is explained in the "For statements with range clause" section of the spec. What I am looking for is something that behaves like a C++ iterator, so we can save it and later continue iterating the range.

We can look at the Go compiler source code to find out how range is implemented for maps.

In this Google Groups thread, we find someone asking for the same feature, giving as an example what could be a path-finding algorithm.

In the responses there are some alternatives that save the state of the iteration by using function calls (which is exactly what we are trying to avoid).
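For illustration, one such function-based workaround is a goroutine feeding a channel (not code from that thread, just a common pattern): the receiver can stop and resume the iteration at will, but it costs a goroutine plus channel synchronization per element, which is the kind of overhead we want to avoid here.

```go
package main

import "fmt"

// keys yields the keys of m one by one over a channel; the receiving
// side can "pause" the range by simply not receiving, at the cost of a
// goroutine and a channel operation per element.
func keys(m map[string]interface{}) <-chan string {
	ch := make(chan string)
	go func() {
		for k := range m {
			ch <- k
		}
		close(ch)
	}()
	return ch
}

func main() {
	m := map[string]interface{}{"a": 1, "b": 2}
	it := keys(m)
	first := <-it // take one key, pause the iteration...
	_ = <-it      // ...and resume it later
	fmt.Println(first != "") // true
}
```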

The approach would be to store a stack of keys to save, and then delete all the keys that are not in there. So, first, instead of iterating over the input dictionary keys, we iterate over the whitelist candidates at the root level, and store a list of pointers to the strings in the whitelist.

Then we delete the keys that are not in the dontDelete slice for that level.
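A sketch of that iterative idea (hypothetical code, keeping the dontDelete name from the text): an explicit stack of pending nodes replaces the recursive call stack. Since a Go range over a map cannot be paused and resumed, each level is processed completely before its children are pushed.

```go
package main

import (
	"fmt"
	"strings"
)

// frame is a hypothetical work item: a map node together with the
// whitelist paths that apply below it.
type frame struct {
	node map[string]interface{}
	wl   []string
}

// pruneIter filters the tree in place without recursion, using an
// explicit stack of frames instead of the call stack.
func pruneIter(root map[string]interface{}, whitelist []string) {
	stack := []frame{{root, whitelist}}
	for len(stack) > 0 {
		f := stack[len(stack)-1]
		stack = stack[:len(stack)-1]

		// dontDelete maps each key to keep at this level to its
		// remaining sub-paths (an empty slice keeps the whole subtree).
		dontDelete := map[string][]string{}
		for _, path := range f.wl {
			parts := strings.SplitN(path, ".", 2)
			if len(parts) == 1 {
				dontDelete[parts[0]] = []string{} // keep the whole value
			} else if rest, seen := dontDelete[parts[0]]; !seen || len(rest) > 0 {
				dontDelete[parts[0]] = append(rest, parts[1])
			}
		}

		for k, v := range f.node {
			rest, keep := dontDelete[k]
			if !keep {
				delete(f.node, k)
				continue
			}
			if len(rest) == 0 {
				continue // whole subtree whitelisted
			}
			if m, ok := v.(map[string]interface{}); ok {
				stack = append(stack, frame{m, rest}) // descend later
			} else {
				delete(f.node, k) // the path expects deeper structure
			}
		}
	}
}

func main() {
	data := map[string]interface{}{
		"foo": map[string]interface{}{"bar": 1, "junk": 2},
		"x":   3,
	}
	pruneIter(data, []string{"foo.bar"})
	fmt.Println(data) // map[foo:map[bar:1]]
}
```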

But this is far more complex than the recursive implementation.

Contributing is a way to learn

Even though I am not used to Go, I've been able to modify the code and run the tests to create the PR.

I surely spent more time than strictly required for this feature enhancement, but I've been learning along the way.

From my experience, contributing to a project written in a language you do not know is a very good way to learn.

References

Hashmap Analysis

Golang Hashmap source code

Go Range loop internals