Motivation

I have a server that collects some data, and I want to do some offline processing with that data on my local computer, and then upload the computation results to 'enhance' the data on the server.

The data is not so small, but also not too big (on the order of tens of megs), and I am currently running an instance on Google Cloud (with free credits).

The tool is not for production use, so the first approach I thought of was something quick and dirty, like PUTting the file into Cloud Storage and creating a 'lambda' (a Google Cloud Function) to perform the operation, but at this moment Cloud Functions only supports JavaScript (Node.js).

I could set up notifications to build a pipeline using the Google Cloud Pub/Sub service, but that is too much work for this job.

So, I decided to go with a simple HTTP API (I could call it REST), and it is a good opportunity to play with, and learn a little bit more about, Python's aiohttp library.

Minimal server

At first sight, something that catches my attention is that, to declare a route, the method name carries the HTTP verb to use (add_get, add_post, and so on).

This is a minimal server script, test_server.py:

from aiohttp import web

async def handler_func(request):
    # aiohttp handlers must be coroutines
    return web.json_response({'bar': 2})

def entry_func(args):
    app = web.Application()
    app.router.add_get('/path', handler_func)
    return app

We can also use the more generic method, passing * as the HTTP verb to match any method:

app.router.add_route('*', '/path', handler_func)

Run the server from the command line:

python -m aiohttp.web -H 0.0.0.0 -P 8080 test_server:entry_func [some_extra_args] ...

The list of some_extra_args will be passed to entry_func, so you can take configuration parameters from the command line.
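
For illustration, here is a sketch of how entry_func could pick up a hypothetical --db-path parameter from those extra arguments (the parameter name is made up for this example; handler_func is the one defined above):

from aiohttp import web

def entry_func(args):
    # args holds everything after `test_server:entry_func` on the
    # command line, e.g. ['--db-path', '/tmp/data.db'] (hypothetical)
    db_path = args[args.index('--db-path') + 1] if '--db-path' in args else 'data.db'
    app = web.Application()
    app['db_path'] = db_path  # available later as request.app['db_path']
    app.router.add_get('/path', handler_func)
    return app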

Faster iteration

We can run the server easily, but a way to 'hot reload' the code would be nice (especially when toying with it). For that we have the aiohttp-devtools package, which gives us exactly that.

pip install aiohttp-devtools

We can run the auto-reload server with the adev script installed with the package. However, the entry function differs from the one used by aiohttp.web: it does not allow extra arguments. In this case we can get the arguments from the sys module, taking into account that it won't strip the adev parameters.

import sys

def app_factory():
    # Note: sys.argv still contains the adev arguments here
    return entry_func(sys.argv)

And we launch the dev server with:

adev runserver test_server.py --app-factory app_factory

(There is also another option: using Gunicorn with the --reload flag.)
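
For example, a sketch of that, assuming test_server exposes a module-level app object (e.g. app = entry_func([])):

gunicorn test_server:app --bind 0.0.0.0:8080 --worker-class aiohttp.GunicornWebWorker --reload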

Sending POST data

I already have a GET endpoint to retrieve the information, but I need to send the computed data to the server.

Given the size of the JSON we are posting, we can directly read it all into memory:

async def update_tile(request):
    print('received request')
    # Read the whole body into memory (the payload is only tens of megs)
    content = await request.content.read()
    content = str(content, 'utf-8')
    return web.json_response({'status': 'ok'})
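
(aiohttp also provides the await request.text() and await request.json() shortcuts, which read the whole body and, in the latter case, parse it as JSON.)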

And to test that it works as expected, I use the curl command:

curl -X POST -d @temp/deleteme.json http://localhost:8080/update_tile

Going the dirty way

For each request we should create a DB session/connection that can be rolled back, as most frameworks do (or, better said, provide a connection from a DB connection pool). However, given this is only a small tool, I'm going the dirty way: I will share a single connection to the DB and reuse it for each request :/.

We can store "global vars" (yeah, that's ugly) in the app directly, using dictionary syntax:

    app['session'] = create_session()

(Here we should put a reference to a function that creates the session, instead of directly putting the created session.)

The same can be done for a request: we can hold per-request 'context' using the same dictionary syntax. So the clean way to do it would be:

def setup_app(args):
    db_session_factory = create_db_session_factory_from_args(args)
    app = web.Application()
    # Store the factory, not a session: each request creates its own
    app['db_session_factory'] = db_session_factory
    return app

async def handler_func(request):
    db_session_factory = request.app['db_session_factory']
    db_session = db_session_factory()
    request['db_session'] = db_session
    call_whatever_function(request)

Actually, the handler function should be more like a decorator that wraps all the other handler functions; that way we would provide DB access per request. But for that we have aiohttp's web middlewares.
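
A sketch of such a middleware, assuming the db_session_factory stored above and a session object that has a close() method:

from aiohttp import web

@web.middleware
async def db_session_middleware(request, handler):
    # Open a fresh session for this request and expose it to the handler
    session = request.app['db_session_factory']()
    request['db_session'] = session
    try:
        return await handler(request)
    finally:
        session.close()

app = web.Application(middlewares=[db_session_middleware])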

Deploying the app

We can run the server from the command line in production, under supervisord, but it is better to follow the official documentation about aiohttp deployment.
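
If we do go the supervisord route, a minimal program section could look something like this (a sketch, reusing the module and entry function from above):

[program:test_server]
command=python -m aiohttp.web -H 0.0.0.0 -P 8080 test_server:entry_func
autostart=true
autorestart=true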

Testing in a local environment

Creating VMs

We could use Vagrant to spin up a new VM; however, I have a reference VM that I clone to create new instances, which already has my public key so the root user can log in through SSH.

To create a new VM I use this script:


#!/bin/bash

export SRCNAME=debian9base
export SRCIP=192.168.56.180
export DSTIP=192.168.56.70

# First argument: name of the new VM (required)
if [ $# -gt 0 ]
then
    export DSTNAME=$1
fi

# Second argument: IP for the new VM (optional, defaults to $DSTIP)
if [ $# -gt 1 ]
then
    export DSTIP=$2
fi

echo "Creating $DSTNAME from $SRCNAME at $DSTIP"

VBoxManage clonevm $SRCNAME --mode machine --name $DSTNAME --register
VBoxManage startvm --type headless $DSTNAME

sleep 10

# The clone boots with the source VM's static IP, so log in there,
# set the new hostname and IP, and shut it down.
ssh -T -l root $SRCIP bash -c "'
echo $DSTNAME > /etc/hostname
sed s/$SRCIP/$DSTIP/ /etc/network/interfaces > tst
cp tst /etc/network/interfaces
rm tst
shutdown -h now
'"

echo "Created VM $1 at $2"

What the script does is:

  • cloning the base virtual machine: VBoxManage clonevm $SRCNAME --mode machine --name $DSTNAME --register

  • then we start the VM: VBoxManage startvm --type headless $DSTNAME

  • since it is a clone of the base VM, we must log in and change its hostname and static IP. We could create an Ansible playbook, but that is too much work for this simple 'setup' of our clean-install machine

  • to delete the VMs: VBoxManage unregistervm AIOHTTPServer --delete

Then I add friendly names to the /etc/hosts file:

192.168.56.90   server.aiohttp.lh
192.168.56.91   client.aiohttp.lh

Deploying a client process

Now I only need to create my client process, which fetches the information, performs the computation, and then reports the results back.

I could use a cron job to run the script every X minutes to do the job, but instead I am going to run a process, on a Raspberry Pi, that keeps asking the server for tasks to do.
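
A sketch of that polling loop using aiohttp's client side (the server hostname comes from the /etc/hosts entries above; compute() and the 60-second interval are placeholders):

import asyncio

import aiohttp

SERVER = 'http://server.aiohttp.lh:8080'

async def poll_forever():
    async with aiohttp.ClientSession() as session:
        while True:
            # Ask the server for data to process
            async with session.get(SERVER + '/path') as resp:
                data = await resp.json()
            result = compute(data)  # placeholder for the offline computation
            # Report the results back
            await session.post(SERVER + '/update_tile', json=result)
            await asyncio.sleep(60)  # placeholder polling interval

asyncio.run(poll_forever())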
