Changing Primary Key in Django Postgresql Setup

After revisiting my own code after 8 months to make a change, i realised that the schema of the table needed an update, which meant changing the primary key field. What i had assumed would be unique turned out not to be unique after all. So, i needed to add the default “id” field as an auto-incrementing primary key.

Changing the primary key with raw sql in postgresql is simple. The tricky part is telling django that you have added a new field and changed the primary key, so that it doesn’t try to create the field again when you run “python manage.py makemigrations”. This is where “state_operations” comes to help.

By specifying state_operations, you tell django that you have done “stuff” in raw sql that is equivalent to running “x operations” via an orm migration.

Steps:

  • Create an empty migration file: “python manage.py makemigrations <app name> --empty -n change_primary_key”
  • Put the following code in the migration’s operations (a full migration file skeleton is sketched after this list):

    migrations.RunSQL(
        sql=[
            "ALTER TABLE $your_table DROP CONSTRAINT $primary_key_constraint_name;",
            "ALTER TABLE $your_table ADD COLUMN id SERIAL PRIMARY KEY;",
            "ALTER TABLE $your_table ALTER COLUMN $old_primary_key DROP NOT NULL;",
        ],
        reverse_sql=[
            "ALTER TABLE $your_table DROP CONSTRAINT $primary_key_constraint_name;",
            "ALTER TABLE $your_table DROP COLUMN id;",
            "ALTER TABLE $your_table ALTER COLUMN $old_primary_key SET NOT NULL;",
            "ALTER TABLE $your_table ADD PRIMARY KEY ($old_primary_key);",
        ],
        state_operations=[
            migrations.RemoveIndex(
                model_name='$model_name_as_string',
                name='$primary_key_constraint_name',
            ),
            migrations.AlterField(
                model_name='$model_name_as_string', name='$old_primary_key',
                field=models.CharField(max_length=100, null=True, blank=True),
            ),
            migrations.AddField(
                model_name='$model_name_as_string', name='id',
                field=models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID'),
            ),
        ],
    )

  • The only change that needs to go in the model code for the table is:
    $old_primary_key = models.CharField(max_length=100, null=True, blank=True)
    
  • Now, if you run makemigrations, django should not create any new migrations for the primary key change.
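For reference, here is roughly what the complete empty migration file might look like once the RunSQL block above is dropped in (the app name and the dependency below are placeholders that depend on your project):

# $your_app and $previous_migration are placeholders
from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ('$your_app', '$previous_migration'),
    ]

    operations = [
        # the migrations.RunSQL(...) call from the step above goes here
    ]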

The best part about postgresql here is that when you add an auto-incrementing integer primary key (SERIAL), it automatically fills the existing rows with keys starting from 1.

Things i have become better at in 2018

There are about 43 minutes left for 2018 to end in my timezone and i have no fancy plans.

So, here is an attempt to list down the things i have learnt this year. There is no way i can list all of them unless i had been keeping track (which i haven’t).

  1. Django: I had already worked with Django for 6 months in my previous job. Apart from that, i had done a couple of projects in college. But, in this job, i think i have gone in depth with Django. I have also realised that although it makes it easier to ship products faster, it comes at a cost. For example: the admin interface of django. It’s good as a reflection of the database and for performing simple operations. The minute you want to do something even a little complex, it becomes really hard and leads to ugly code.
  2. Debugger: Right now, i can’t believe that i hadn’t been using pdb for the last 4 years. I used to rely on logging instead. The credit has to go to my Engineering Manager Manish. Writing code and debugging with him has helped me become a much better coder in 2018.
  3. SQL: I have gotten better at writing sql. Earlier, i never needed to think about it much. In all of the web applications i had built, including open source and at my previous job, we always used an ORM. This year, we needed optimisations and raw sql queries for Metabase, so i ended up writing queries by hand.
  4. Python: I have been writing python code since 2014 and it is my language of preference. My knowledge of the standard library has grown this year.
  5. HTML/CSS/Javascript: I have a front end nano degree now ( \o/ ). It took me a long time but, yes, as of December 2018 and by the degree given to me by Udacity, i can totally pronounce myself (using Joey’s voice when he was marrying Chandler and Monica): A Full Stack Developer.
  6. Rabbitmq/Celery/Redis: I can say that i already had some practice with these. I had already been using them, and the way i am using them right now is the same as before. Having said that, i did try to dig deep into Celery and i don’t think celery’s codebase is easy to mess with. (Neither is Django’s, btw)
  7. AWS/EBS: Towards the latter part of this year, i started using more of the AWS console and EBS.

These were the broad categories of things i have gotten better at. Actually, there is nothing new topic-wise apart from EBS but, there are smaller details within each category that i didn’t know about before.

Apart from all the progress that i have listed, there is a thing i haven’t done this year. I haven’t contributed to Open Source as much as i would have liked to. Actually, of all the years since i started contributing, this year i have contributed the least. I don’t think i have been able to give my 100% every day, which kills me sometimes. I know when i am not at my 100% and i know when i am. If i go by this logic, 2018 has been a crappy year.

Django’s transaction.on_commit

I am writing this with Django 1.11 in mind.

Django provides a method with which you can execute a particular function once you are sure that a particular entry has made it to the database: that’s transaction.on_commit.

There is also Django’s post save signal. Both have their use cases. The difference between the two is that on_commit is executed when Django finally commits to the database, and it’s only useful when you are inside a transaction. In the case of the post save signal, you are not even sure that the object you are currently saving will finally be committed to the database (if it’s inside a transaction block). Post save has its use case when you need to know the object which just got changed or created, since you get a boolean telling you whether it was updated or created, and also the changed/created object itself.
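For contrast, a post_save receiver looks roughly like this (MyModel is just a made-up model name for illustration):

from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=MyModel)
def my_model_saved(sender, instance, created, **kwargs):
    if created:
        # the instance was just created (it may still get rolled back
        # if this save happened inside a transaction block)
        pass
    else:
        # the instance was updated
        pass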

How to work with transaction.on_commit?

While the code is running, you can register a function to be called by calling

transaction.on_commit(my_fancy_function_to_be_called_on_commit).

Remember, you are passing a function here and not the function’s return value. What happens is, Django adds my_fancy_function_to_be_called_on_commit to a list of functions which it calls when it finally commits the ongoing transaction. There are a few small things here:

  1. Since transactions can be nested, when does django finally execute these functions? Django executes them once it is no longer inside an atomic block (wrapped in a transaction). If there is no transaction block and you try to register the function, it won’t get registered; it will simply be called right away instead. Here are the links to the source code:
    Registering and What happens on registering
    Things happening while executing the registered functions
  2. How do i pass arguments to the above functions? This is actually a basic python thing. You can wrap your function, which takes arguments, inside a lambda expression or a normal function:

    transaction.on_commit(lambda: my_fancier_func_with_args(a, b))

     

  3. From where do we get this “transaction”? By importing:
    from django.db import transaction
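Putting it together, a minimal sketch of how this might be used inside a view (Order and send_confirmation_email are made-up names):

from django.db import transaction

def confirm_order(request, order_id):
    with transaction.atomic():
        order = Order.objects.get(id=order_id)
        order.status = 'confirmed'
        order.save()
        # runs only if the surrounding transaction actually commits
        transaction.on_commit(lambda: send_confirmation_email(order.id))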

 

Running Unique tasks in celery

At my day job, we had a requirement to make one of the api endpoints fast. The endpoint was computation intensive and was taking a lot of time. After profiling the endpoint using django-silk, we came to the conclusion that sql wasn’t the issue.

One possible solution was to move the computation to a celery worker, put the results in a separate table and serve the web requests from that table directly. The computations for a particular loan were triggered by a couple of events which were easy to determine. An added benefit for us was that we had other use cases for the table.

The uniqueness part comes in because the end result of computing multiple times is the same as computing once (the computation is idempotent). If two workers start working on the computations for the same loan, they will produce the same results and both will end up updating the database with those results. To maintain uniqueness, we used locking with redis.

While working on this, i came across two solutions:

  1. Discarding the task on the worker

In this method, we decorate our task with a function that checks whether it can acquire the lock for the loan. If it can’t acquire the lock, it means there is already a worker working on this loan and we don’t need to compute again, so it doesn’t run any further. Here, we are using a transaction aware task (this should work with the Task class exposed by celery as well but, for our use case, we needed TransactionAwareTask; https://gist.github.com/tapanpandita/46d2e2f63c7425547a865cb6298a172f )

import functools

# assumption: 'cache' is a django-redis backed cache, which exposes .lock()
from django.core.cache import cache


def should_compute_for_loan(key):
    def decorated_func(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            """
                Acquire a lock on the key and check whether we should
                go ahead and run the celery task
            """
            has_lock, return_value = False, False
            loan_id = args[0]
            lock = cache.lock(key.format(loan_id=loan_id), timeout=600)
            try:
                has_lock = lock.acquire(blocking=False)
                if has_lock:
                    return_value = func(*args, **kwargs)
            finally:
                if has_lock:
                    lock.release()
            return return_value

        return inner

    return decorated_func


# TransactionAwareTask comes from the gist linked above; app is the celery app
@app.task(base=TransactionAwareTask)
@should_compute_for_loan(key='heavy_computation:{loan_id}')
def recompute_heavy_computation(loan_id):
    # ... the actual heavy computation for the loan goes here ...
    pass

The shortcoming of this method is that even if the computation takes place only once, we still need to publish the task, which means the queue still gets flooded.

  2. Discarding the task in django

import logging

from django.core.cache import cache  # again, assuming a django-redis backed cache

LOGGER = logging.getLogger(__name__)


# TransactionAwareTask comes from the gist linked in the previous section
class TransactionAwareUniqueTask(TransactionAwareTask):
    '''
        Makes sure that a task is computed only once using locking.
        The task itself is triggered by django as a callback when
        the transaction is committed successfully.
        Usage: subclassed_task.delay(some_args, key='some-namespacing-id-for-uniqueness')
    '''
    abstract = True

    def delay(self, *args, **kwargs):
        '''
            Takes a lock using redis for the given key
        '''

        has_lock = False
        key = kwargs['key']
        lock = cache.lock(
            key, timeout=600, blocking_timeout=0.00001
        )
        has_lock = lock.acquire(blocking=False)
        if has_lock:
            LOGGER.debug("Lock acquired: %s", key)
            # publish the task only if we are the ones holding the lock
            super(TransactionAwareUniqueTask, self).delay(*args, **kwargs)
        else:
            LOGGER.debug("Can not get lock: %s", key)

and use it like:

@app.task(base=TransactionAwareUniqueTask, acks_late=True)
def recompute_heavy_computation(loan_id, **kwargs):
    # the 'key' kwarg used for locking arrives here too, since delay() forwards it
    ...
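Enqueuing it then looks roughly like this (the key is whatever namespacing string you pick, here the same format used in the first approach):

recompute_heavy_computation.delay(
    loan_id, key='heavy_computation:{}'.format(loan_id)
)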

I have used acks_late instead of the default because we want to ack rmq when the task has finished and not when it has been received. This means that in case the worker dies after picking up the job, rmq will make sure the task is not removed from the queue.

The lock still needs to be released by the worker because, once the computation is completed, you want to make sure that if there is a need to compute again within the timeout period of the lock, it’s possible to do so. This can be achieved using the task_postrun signal provided by celery. This also gets triggered when the task fails to run for some reason, i.e., if an exception happens within the task.

from celery.signals import task_postrun


@task_postrun.connect(sender=recompute_heavy_computation)
def release_the_lock(*args, **kwargs):
    """
        Release the redis lock
    """

    # the task's own kwargs are available under kwargs['kwargs']
    key = kwargs['kwargs']['key']
    LOGGER.debug("About to delete: %s", key)
    cache.client.delete(key)
    LOGGER.debug("Deleted lock: %s", key)

This deletes the key directly instead of using the locking interface. This is one thing that bothers me a little but, i couldn’t find a better solution. The lock needs to be acquired before it can be released, and you can’t pass the lock object to the worker thread because it isn’t json serialisable. This feels a bit hacky but, it sure worked.

Detecting USB Insertion/Removal using Python

In my last blog, i wrote about how usbmount can be used to automatically mount a usb. Today, i had to detect this mounting from python and show in a web application that a usb has been inserted. I used pyudev for this and ran it in a different thread so that the main application thread is not affected by pyudev’s monitor loop.

import logging
import threading

import pyudev

LOGGER = logging.getLogger(__name__)


class USBDetector():
    ''' Monitor udev for detection of usb '''

    def __init__(self):
        ''' Initiate the object '''
        thread = threading.Thread(target=self._work)
        thread.daemon = True
        thread.start()

    def _work(self):
        ''' Runs the actual loop to detect the events '''
        self.context = pyudev.Context()
        self.monitor = pyudev.Monitor.from_netlink(self.context)
        self.monitor.filter_by(subsystem='usb')
        # this is module level logger, can be ignored
        LOGGER.info("Starting to monitor for usb")
        self.monitor.start()
        for device in iter(self.monitor.poll, None):
            LOGGER.info("Got USB event: %s", device.action)
            if device.action == 'add':
                # some function to run on insertion of usb
                self.on_created()
            else:
                # some function to run on removal of usb
                self.on_deleted()

Since usbmount mounts the usb’s filesystem at particular locations (/media/usb0 to /media/usb7), you can easily check for files in those folders and do whatever you wish with them.

Using the above code is as easy as creating an object of the USBDetector class.
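For instance, a minimal sketch of the usage, with made-up on_created/on_deleted callbacks that just log what usbmount has mounted:

import os

class MyUSBDetector(USBDetector):
    ''' Example subclass providing the callbacks the monitor loop expects '''

    def on_created(self):
        # usbmount should have mounted the stick under one of /media/usb0-7
        LOGGER.info("usb inserted, /media/usb0 has: %s", os.listdir('/media/usb0'))

    def on_deleted(self):
        LOGGER.info("usb removed")

detector = MyUSBDetector()  # starts the daemon thread and begins monitoring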

Detecting and Automatically Mounting Pendrive on Raspbian Stretch Lite

In the lite version of Raspbian Stretch, i didn’t expect that i would have to manually mount the pendrive. I had a use case in which mounting it automatically was a necessity, and that’s how i came across usbmount.

You can install it in raspbian using apt-get.

sudo apt-get install usbmount

Usbmount gives a lot of configuration options in /etc/usbmount/usbmount.conf but, for my use case, the defaults were good enough.

This wasn’t enough though. It wasn’t detecting the usb stick. And, on searching, i found out that i wasn’t the only one having this problem. For usbmount to work on raspbian stretch, you have to manually edit the systemd file for udevd located at /lib/systemd/system/systemd-udevd.service and change MountFlags=slave to MountFlags=shared, as someone commented here.

Setting up postgres for python development

Postgresql is my database of choice, mainly because almost all (if not all) web applications in fedora infrastructure use postgres and i have been using the same. But, every time i use a fresh system, i face some issues with getting started. Almost all of them are about client authentication. This post is here in case the same happens in the future.

  1. Install the dependencies:
    1. sudo dnf install postgresql postgresql-devel postgresql-server postgresql-contrib
  2. Create a db user other than postgres:
    1. sudo -i
    2. su - postgres
    3. initdb
    4. Start the database using command: pg_ctl -D /var/lib/pgsql/data -l logfile start
    5. psql -U postgres
    6. create database pagure;
    7. create user pagure with password 'pagure';
    8. grant all privileges on database pagure to pagure;
    9. show hba_file;
    10. Go to that file, change the auth method to trust, and restart postgres;
  3. If you run into “unregistered authentication agent for unix-process”, see https://techglimpse.com/solution-polkitd-postgresql-start-error/
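Once this is done, a quick sanity check from python could look something like this (assuming psycopg2 is installed; the database/user are the ones created above):

import psycopg2

# connect with the user and database created in the steps above
conn = psycopg2.connect(
    dbname='pagure', user='pagure', password='pagure', host='localhost'
)
cur = conn.cursor()
cur.execute('SELECT version();')
print(cur.fetchone())
conn.close()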