Changing Primary Key in Django Postgresql Setup

After revisiting my own code after 8 months to make a change, I realised that the table's schema needed an update, which meant changing the primary key field. What I had assumed would be unique turned out not to be unique after all. So, I needed to make the default "id" field the auto-incrementing primary key.

Changing the primary key with raw SQL in PostgreSQL is simple. The tricky part is telling Django that you have added a new field and changed the primary key, so that it doesn't try to create the field again when you run "python makemigrations". This is where "state_operations" comes to help.

By specifying state_operations, you tell Django that you have done "stuff" in raw SQL that is equivalent to running "x operations" via an ORM migration.


  • Create an empty migration file: python makemigrations <app name> --empty -n change_primary_key
  • Put the following code in the migration's operations (reconstructed here; substitute the $placeholders with your own table, column, constraint and model names):
    operations = [
        migrations.RunSQL(
            sql=[
                "ALTER TABLE $your_table DROP CONSTRAINT $pk_constraint_name;",
                "ALTER TABLE $your_table ALTER COLUMN $old_primary_key DROP NOT NULL;",
                "ALTER TABLE $your_table ADD COLUMN id SERIAL PRIMARY KEY;",
            ],
            reverse_sql=[
                "ALTER TABLE $your_table DROP COLUMN id;",
                "ALTER TABLE $your_table ALTER COLUMN $old_primary_key SET NOT NULL;",
                "ALTER TABLE $your_table ADD PRIMARY KEY ($old_primary_key);",
            ],
            state_operations=[
                migrations.RemoveIndex(model_name='$model_name_as_string', name='$pk_constraint_name'),
                migrations.AlterField(model_name='$model_name_as_string', name='$old_primary_key',
                                      field=models.CharField(max_length=100, null=True, blank=True)),
                migrations.AddField(model_name='$model_name_as_string', name='id',
                                    field=models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
            ],
        ),
    ]
  • The only change that needs to go in the model code for the table is:
    $old_primary_key = models.CharField(max_length=100, null=True, blank=True)
  • Now, if you run makemigrations, Django should not create any new migrations for the primary key change.

The best part about PostgreSQL here is that if you add an auto-incrementing integer primary key, it automatically fills the existing rows with keys starting from 1.

Things i have become better at in 2018

There are about 43 minutes left for 2018 to end in my timezone and I have no fancy plans.

So, here is an attempt to list the things I have learnt this year. There is no way I can list all of them without having kept track (which I haven't).

  1. Django: I had already been working with Django for 6 months at my previous job, apart from a couple of projects in college. But in this job, I think I have gone in depth with Django. I have also realised that although it makes it easier to ship products faster, that comes at a cost. For example: the Django admin interface. It's good as a reflection of the database and for performing simple operations; the minute you want to do something even a little complex, it becomes really hard and leads to ugly code.
  2. Debugger: Right now, I can't believe that I wasn't using pdb for the last 4 years. I used to rely on logging instead. The credit has to go to my Engineering Manager, Manish. Writing code and debugging with him has helped me become a much better coder in 2018.
  3. SQL: I have gotten better at writing SQL. Earlier, I never needed to think about it much: in all of the web applications I had built, including open source and at my previous job, we always used an ORM. This year, we needed optimisations and raw SQL queries for Metabase, for which I wrote queries.
  4. Python: I have been writing Python code since 2014; it is my language of preference. My knowledge of the standard library has grown this year.
  5. HTML/CSS/Javascript: I have a front-end nanodegree now ( \o/ ). It took me a long time but, yes, as of December 2018 and by the degree given to me by Udacity, I can totally pronounce (using Joey's voice when he was marrying Chandler and Monica) myself: A Full Stack Developer.
  6. RabbitMQ/Celery/Redis: I can say that I had some practice with these already; the way I am using them now is the same as before. Having said that, I did try to dig deep into Celery, and I don't think Celery's codebase is easy to mess with. (Neither is Django's, btw.)
  7. AWS/EBS: Towards the latter part of this year, I started using more of the AWS console and EBS.

These were the broad categories of things I have gotten better at. Topic-wise there is actually nothing new apart from EBS, but there are smaller details within each category that I didn't know about.

Apart from all the progress I have listed, there is one thing I haven't done this year: I haven't contributed to Open Source as much as I would have liked to. In fact, since the time I started contributing, this is the year I have contributed the least. I don't think I have been able to give my 100% every day, which kills me sometimes. I know when I am at my 100% and I know when I am not. If I go by this logic, 2018 has been a crappy year.

Django’s transaction.on_commit

I am writing this with Django 1.11 in mind.

Django provides a method with which you can execute a particular function once you are sure that a particular entry has made it to the database: transaction.on_commit.

There is also Django's post_save signal. Both have their use cases. The difference between the two is that on_commit is executed when Django finally commits to the database, and it's only useful when you are inside a transaction. With the post_save signal, on the other hand, you are not even sure that the object you are currently saving will finally be committed to the database (if it's inside a transaction block). post_save has its use case when you need to know which object just got changed or created, since you get a boolean telling you whether it was updated or created, and also the changed/created object itself.

How to work with transaction.on_commit?

When the code is running, you can register a function to be called, by calling:

    transaction.on_commit(my_fancy_function_to_be_called_on_commit)
Remember, you are passing the function here, not the function's returned value. What happens is that Django adds my_fancy_function_to_be_called_on_commit to a list of functions which it calls when it finally commits the ongoing transaction. There are a few small things here:

  1. Since transactions can be nested, when does Django finally execute these functions? Django executes them after it is no longer inside an atomic block (wrapped in a transaction). If there is no transaction block and you try to register the function, it won't get registered but will get called immediately instead. Here are the links to the source code:
    Registering and What happens on registering
    Things happening while executing the registered functions
  2. How do I pass arguments to the above functions? This is actually a basic Python thing: you can wrap your function and its arguments inside a lambda expression or a normal function.

    transaction.on_commit(lambda: my_fancier_func_with_args(a, b))


  3. From where do we get this "transaction"? By importing:
    from django.db import transaction
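On the argument-passing point above, functools.partial from the standard library achieves the same binding without a lambda. A standalone sketch (plain Python, no Django required; the function name is illustrative):

```python
import functools

def my_fancier_func_with_args(a, b):
    return a + b

# partial binds the arguments now; the resulting callable runs later,
# the way Django would run it on commit
callback = functools.partial(my_fancier_func_with_args, 2, 3)
assert callback() == 5
```

A partial also survives introspection and pickling better than a lambda, which can matter if you register many callbacks.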


Running Unique tasks in celery

At my day job, we had a requirement to make one of the API endpoints fast. The endpoint was computation intensive and was taking a lot of time. After profiling the endpoint using django-silk, we came to the conclusion that SQL wasn't the issue.

One possible solution was to move the computation into a celery worker, put the results in a separate table and serve the web requests from that table directly. The computation for a particular loan was triggered by a couple of events which were easy to determine. An added benefit for us was that we had other use cases for the table.

The uniqueness part comes in because the end result of computing multiple times is the same as computing once (the computation is idempotent). If two workers start working on the computations for the same loan, they will produce the same results and both end up updating the database with them. To maintain uniqueness, we used locking via redis.

While working on this, I came across two solutions:

  1. Discarding the task on the worker

In this method, we decorate our task with a function that checks whether it can acquire the lock for the loan. If it can't, that means there is already a worker working on this loan and we don't need to compute again, so it doesn't run any further. Here we are using a transaction aware task (it should work on the Task class exposed by celery as well but, for our use case, we need TransactionAwareTask).

def should_compute_for_loan(key):
    def decorated_func(func):
        def inner(*args, **kwargs):
            '''
            Apply a lock on a key and check if we should go ahead
            and run the celery task
            '''
            has_lock, return_value = False, False
            loan_id = args[0]
            lock = cache.lock(key.format(loan_id=loan_id), timeout=600)
            try:
                has_lock = lock.acquire(blocking=False)
                if has_lock:
                    return_value = func(*args, **kwargs)
            finally:
                if has_lock:
                    lock.release()
            return return_value

        return inner

    return decorated_func


# the lock key below is illustrative
@app.task(base=TransactionAwareTask)
@should_compute_for_loan('compute-loan-{loan_id}')
def recompute_heavy_computation(loan_id):
    ...

The shortcoming of this method is that even if the computation takes place only once, we still need to publish the task, which means the queue still gets flooded.

  2. Discarding the task in django

class TransactionAwareUniqueTask(TransactionAwareTask):
    '''
    Makes sure that a task is computed only once using locking.
    The task itself is triggered by django as a callback when
    the transaction is committed successfully.
    Usage: subclassed_task.delay(some_args, key='some-namespacing-id-for-uniqueness')
    '''
    abstract = True

    def delay(self, *args, **kwargs):
        ''' Makes a lock using redis for the given key '''
        has_lock = False
        key = kwargs['key']
        lock = cache.lock(
            key, timeout=600, blocking_timeout=0.00001
        )
        has_lock = lock.acquire(blocking=False)
        if has_lock:
            LOGGER.debug("Lock acquired: %s", key)
            super(TransactionAwareUniqueTask, self).delay(*args, **kwargs)
        else:
            LOGGER.debug("Can not get lock: %s", key)

and use it like:

@app.task(base=TransactionAwareUniqueTask, acks_late=True)
def recompute_heavy_computation(loan_id, key=None):
    ...

I have used acks_late instead of the default because we want to ack rabbitmq when the task has finished, not when it has been received. This means that in case the worker dies after taking up the job, rabbitmq will make sure the task isn't removed from the queue.

The lock still needs to be removed by the worker because, once the computation is completed, you want to make sure that if there is a requirement to compute again within the timeout period of the lock, it is possible to do so. This can be achieved using the task_postrun signal provided by celery. It also gets triggered when the task fails to run for some reason, i.e., if an exception happens within the task.

@task_postrun.connect
def release_the_lock(*args, **kwargs):
    ''' Release the redis lock '''
    key = kwargs['kwargs']['key']
    LOGGER.debug("About to delete: %s", key)
    cache.delete(key)
    LOGGER.debug("Deleted lock: %s", key)

This deletes the key instead of using the locking interface, which is one thing that bothers me a little, but I couldn't find any better solution. The lock itself needs to be acquired before it can be released, and you can't pass the lock object to the worker thread because it isn't JSON serialisable. This feels a bit hacky, but it sure worked.
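The acquire-or-skip pattern used in both methods can be seen in miniature with nothing but the standard library; threading.Lock.acquire(blocking=False) behaves like the non-blocking redis lock here (a sketch, not the production code):

```python
import threading

lock = threading.Lock()

def try_compute():
    # mirror of cache.lock(...).acquire(blocking=False): skip the work
    # if another worker already holds the lock
    has_lock = lock.acquire(blocking=False)
    if not has_lock:
        return "skipped"
    try:
        return "computed"
    finally:
        lock.release()

assert try_compute() == "computed"
lock.acquire()  # simulate another worker holding the lock
assert try_compute() == "skipped"
lock.release()
```

The try/finally is the important part: the lock is released even if the computation raises, which is exactly what the task_postrun handler guarantees in the celery version.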

Detecting USB Insertion/Removal using Python

In my last blog, I wrote about how usbmount can be used to automatically mount a USB drive. Today, I had to detect this mounting from Python and show in a web application that a USB has been inserted. I used pyudev for this and ran it in a different thread so that the main application thread is not affected by pyudev's monitor loop.

import threading

import pyudev


class USBDetector():
    ''' Monitor udev for detection of usb '''

    def __init__(self):
        ''' Initiate the object '''
        thread = threading.Thread(target=self._work)
        thread.daemon = True
        thread.start()

    def _work(self):
        ''' Runs the actual loop to detect the events '''
        self.context = pyudev.Context()
        self.monitor = pyudev.Monitor.from_netlink(self.context)
        self.monitor.filter_by(subsystem='usb')
        # LOGGER is a module level logger, can be ignored
        LOGGER.info("Starting to monitor for usb")
        self.monitor.start()
        for device in iter(self.monitor.poll, None):
            LOGGER.info("Got USB event: %s", device.action)
            if device.action == 'add':
                pass  # some function to run on insertion of usb
            else:
                pass  # some function to run on removal of usb

Since usbmount mounts the filesystem of the USB at particular locations (/media/usb0 to /media/usb7), you can easily check for files in those folders and do whatever you wish with them.
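Building on that, a small helper can report which of those folders currently have something mounted (a sketch; the paths are usbmount's defaults):

```python
import os

def mounted_usb_paths():
    # usbmount's default mount points are /media/usb0 through /media/usb7
    candidates = ['/media/usb%d' % i for i in range(8)]
    # os.path.ismount is False for paths that don't exist, so this is
    # safe to run even on machines without usbmount
    return [path for path in candidates if os.path.ismount(path)]

print(mounted_usb_paths())
```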

Using the above code is as easy as creating an object of the USBDetector class.

Detecting and Automatically Mounting Pendrive on Raspbian Stretch Lite

In the lite version of Raspbian Stretch, I didn't expect that I would have to manually mount a pendrive. I had a use case in which mounting it automatically was a necessity, and thus I came across usbmount.

You can install it in raspbian using apt-get.

sudo apt-get install usbmount

Usbmount gives a lot of configuration options in /etc/usbmount/usbmount.conf but, for my use case, the defaults were good enough.

This wasn't enough though: it wasn't detecting the USB stick. On searching, I found out that I wasn't the only one having this problem. For usbmount to work in Raspbian Stretch, you have to manually edit the systemd file for udevd located at /lib/systemd/system/systemd-udevd.service and change MountFlags=slave to MountFlags=shared, as someone commented here.

Setting up postgres for python development

PostgreSQL is my database of choice, mainly because almost all (if not all) web applications in Fedora Infrastructure use postgres and I have been using the same. But every time I use a fresh system, I face some issues getting started, almost all of them with client authentication. This post is here in case the same happens in future.

  1. Install the dependencies:
    1. sudo dnf install postgresql postgresql-devel postgresql-server postgresql-contrib
  2. Create a db user other than postgres:
    1. sudo -i
    2. su - postgres
    3. initdb
    4. Start the database: pg_ctl -D /var/lib/pgsql/data -l logfile start
    5. psql -U postgres
    6. create database pagure;
    7. create user pagure with password 'pagure';
    8. grant all privileges on database pagure to pagure;
    9. show hba_file;
    10. Open that file and change the auth mode to trust.
  3. If you run into: unregistered authentication agent for unix-process

Tornado with systemd

Unlike Django or Flask, Tornado is not WSGI based. It also ships with its own HTTP server, and running the tornado server in production is not very different from running it while developing. You may want to run the server in the background though.

The systemd config file for tornado server that i am currently using is:

[Unit]
Description=Tornado server service file
After=rabbitmq-server.service

[Service]
ExecStart=/usr/bin/python /path/to/tornado/server/main/

[Install]

For my use case, it should start after rabbitmq-server has started. After putting this script in /etc/systemd/system/, you can start the server by:

sudo systemctl start servicename.service

Its status can be checked using:

sudo systemctl status servicename.service

For people with trust issues, you can use curl or visit the site itself to reconfirm. In my case, the tornado server was a websocket server, not HTTP, so I had to put some additional parameters with curl, which I got from here.

➜ curl --include \
    --no-buffer \
    --header "Connection: Upgrade" \
    --header "Upgrade: websocket" \
    --header "Host: $host" \
    --header "Origin: $origin" \
    --header "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
    --header "Sec-WebSocket-Version: 13" \
    $websocket_url
That’s it.

Token based Websocket Authentication

At my day job, I had to implement websockets, and thus authentication of the websocket connection came up. There were two different types of clients, but the authentication for the browser client was the biggest headache.

For a normal HTTP request, we use cookies for authentication of a user. The websocket protocol itself doesn't define how a websocket connection should be authenticated. According to RFC 6455, section 10.5:

   This protocol doesn't prescribe any particular way that servers can
   authenticate clients during the WebSocket handshake. The WebSocket
   server can use any client authentication mechanism available to a
   generic HTTP server, such as cookies, HTTP authentication, or TLS
   authentication.

So, the first thing that comes to mind is: why not use the same cookies that we use for an HTTP request? I thought the same too but eventually decided to use token based authentication.

Why not use the cookie?
We are using Django for our main web application. Django, and WSGI based python frameworks in general, are not built for long lived connections. So, for websockets, we are using Tornado.

In Django, by default, cookies are not readable by javascript. They are marked as HTTP only, and thus the browser uses the cookie only for making http/https requests to the origin server. It can be turned off by using:

    SESSION_COOKIE_HTTPONLY = False
Screenshot from 2017-10-15 00-16-41

The above image is when you have SESSION_COOKIE_HTTPONLY = True .

Screenshot from 2017-10-15 00-42-28

This is when you set SESSION_COOKIE_HTTPONLY to False. The sessionid cookie is the one which the server uses to identify the user.

The main benefit of not exposing sessionid to JS in the browser is that if someone successfully performs an XSS attack, they won't be able to hijack the session. Setting the cookie to be not HTTP only would have been the easiest option for me but, as it is not recommended, I went for token based authentication.

Token based authentication

For token based authentication to work, the Django server has to generate a token on every request (for the endpoints which require the websocket connection). Once the browser gets the token, it can initiate a websocket connection to the tornado server, sending the token along while opening the connection. On the server side, there should be a common store where Django can store the token and Tornado can retrieve it to verify the request.
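The whole flow can be sketched in plain Python with a dict standing in for the shared redis store (the names here are illustrative, not the production code):

```python
import json
import uuid
import datetime

store = {}  # stands in for the redis instance shared by Django and Tornado

def issue_token(username):
    # Django side: mint a token and record who it belongs to and when
    token = 'wstoken' + uuid.uuid4().hex
    store[token] = json.dumps({
        'user': username,
        'time': datetime.datetime.utcnow().strftime("%d:%m:%Y:%H:%M:%S"),
    })
    return token

def verify_token(token):
    # Tornado side: look the token up in the shared store
    raw = store.get(token)
    return json.loads(raw)['user'] if raw else None

token = issue_token('someuser')
assert verify_token(token) == 'someuser'
assert verify_token('wstoken' + 'f' * 32) is None
```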

Generating the token on the server side for multiple views can be done with a python decorator. But if you make a decorator and want to pass a variable on to the decorated function itself, you have to add an extra parameter on the function to receive the variable's value. This was a big task and would have meant a lot of changes across the project. Instead, I went on to make a project wide template tag.

Making a project wide template tag in django for creating tokens

  1. Create a folder named templatetags under the project's main directory and create two files in it: __init__.py and
  2. In, you can put something like this:
    import uuid
    import json
    import datetime

    from django import template

    from project_name import redis_conn

    register = template.Library()


    @register.simple_tag(takes_context=True)
    def create_ws_token(context):
        request = context['request']
        if not request.user.is_authenticated():
            return 'Not authed'
        user = request.user.username
        current_time = datetime.datetime.strftime(
            datetime.datetime.utcnow(), "%d:%m:%Y:%H:%M:%S"
        )
        token = 'wstoken' + uuid.uuid4().hex
        output = {
            'user': user,
            'time': current_time,
        }
        redis_conn.set(token, json.dumps(output))
        return token
  3. Put the following snippet inside TEMPLATES -> OPTIONS in the settings of the project:
    'libraries': {
        'create_ws_token': 'project_name.templatetags.create_ws_token',
    },
  4. Now, to use this template tag in any template, you will need to load it:
    {% load create_ws_token %}
    <script>
        var token = '{% create_ws_token %}';
        if (token.startsWith('wstoken')) {
            socket(token);
        }
    </script>

socket is a function defined in another js file which creates the websocket connection.

ws = create_ws("ws://localhost:8080/wsb?ws_token="+ws_token);

On the tornado side, we need to get the ws_token and query redis for verification.

def open(self):
    ''' Called by tornado when a new connection opens up '''
    self.user = None
    if 'ws_token' in self.request.arguments:
        token = self.request.arguments['ws_token'][0]
        self.user = self.authenticate(token)
        if self.user:
            tsockets.add_socket(self.user, self)
            print 'New connection from browser!'

The authenticate method would be like:

def authenticate(self, token):
    ''' Check for an authentic token in redis '''
    inredis = self.application.redis_conn.get(token)
    if inredis:
        inredis = json.loads(inredis)
        current_time = datetime.datetime.utcnow()
        valid_time = current_time - datetime.timedelta(seconds=20)
        inredis_time = datetime.datetime.strptime(
            inredis['time'], "%d:%m:%Y:%H:%M:%S"
        )
        if valid_time <= inredis_time:
            return inredis['user']
    return False
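The 20-second validity window in authenticate can be exercised in isolation; this standalone sketch repeats the same timestamp arithmetic with fixed datetimes:

```python
import datetime

FMT = "%d:%m:%Y:%H:%M:%S"

def token_is_fresh(created_str, now, window_seconds=20):
    # same check as authenticate(): the token must have been minted
    # within the last window_seconds
    created = datetime.datetime.strptime(created_str, FMT)
    return now - datetime.timedelta(seconds=window_seconds) <= created

now = datetime.datetime(2017, 10, 15, 0, 0, 30)
fresh = datetime.datetime(2017, 10, 15, 0, 0, 15).strftime(FMT)
stale = datetime.datetime(2017, 10, 14, 23, 59, 0).strftime(FMT)
assert token_is_fresh(fresh, now) is True
assert token_is_fresh(stale, now) is False
```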

I chose redis because tornado is a single threaded server, and connecting to a db that is not async results in a blocking connection, which means the real time features get affected.

That’s it.

Using C function from Python

ctypes is a python library which allows using C data types and functions from a python script. It's in the standard python library. To use C functions via ctypes, you need to compile the C code and create a shared library.


#include <stdio.h>

int add_two_numbers(int num1, int num2) {
    return num1 + num2;
}
I am using a very simple C function here which adds two given numbers.

Now compile this file using:
gcc -fPIC -shared -o add.c

This will create a shared library named which, for now, contains only one function.

# coding=utf-8

import ctypes

_add = ctypes.CDLL('/home/vivek/ctypestuts/')
_add.add_two_numbers.argtypes = (ctypes.c_int, ctypes.c_int)

def add_two_numbers(num1, num2):
    ''' Adds two numbers '''
    return _add.add_two_numbers(ctypes.c_int(num1), ctypes.c_int(num2))

I am using fedora 26. If you are using Windows, you will need to use ctypes.WinDLL.

_add here is the shared library object, and we can access the C function using the dot (.) operator.
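The same argtypes/restype dance works against any shared library, for example the system C library (this sketch assumes a Linux system with glibc; on Windows you would go through WinDLL as noted above):

```python
import ctypes
import ctypes.util

# load libc and declare the signature of its abs(), just like we
# declared add_two_numbers above
libc = ctypes.CDLL(ctypes.util.find_library('c'))
libc.abs.argtypes = (ctypes.c_int,)
libc.abs.restype = ctypes.c_int

assert libc.abs(-5) == 5
```

Setting restype is good practice even though c_int is the default: it documents the contract and matters as soon as the C function returns anything else.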

# coding=utf-8

import add

num1 = int(raw_input("Enter num1: "))
num2 = int(raw_input("Enter num2: "))
print add.add_two_numbers(num1, num2)

That’s it.