updating shared variables in theano

Background: I am running python with Theano on a GPU, and I care about speed.

Scenario: I have a large-ish matrix (C) which is stored as a shared variable, and I need to update a subset of its rows (indexed by modified_rows) by adding another matrix (C_delta). What should I do?

Initialising, e.g.:

    import numpy as np
    from theano import function, shared
    from theano.tensor import fmatrix, ivector, set_subtensor

    # float32, so the shared variable can live on the GPU
    C = shared(np.random.normal(size=(70000, 100)).astype(np.float32))
    # 200 row indices to update, and the values to add to those rows
    modified_rows = np.random.randint(low=0, high=70000, size=200)
    C_delta = np.random.normal(size=(len(modified_rows), 100))
    # symbolic stand-ins for C_delta and modified_rows
    C_d = fmatrix('C_delta')
    mod_rows = ivector('modified_rows')

Slow method: manually reset the values:

    # pull the whole matrix out of the shared variable, modify it, push it back
    C_temp = C.get_value()
    C_temp[modified_rows, :] = C_temp[modified_rows, :] + C_delta
    C.set_value(C_temp)

Speed:

    32 function calls in 0.055 seconds
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    ...
    1 0.000 0.000 0.026 0.026 sharedvalue.py:100(set_value)
    1 0.000 0.000 0.027 0.027 sharedvalue.py:80(get_value)
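
(For the record, the profiles in this post come from Python's built-in cProfile; a minimal sketch of the harness, with a hypothetical wrapper name slow_update:)

    import cProfile

    def slow_update():
        # the get_value / set_value round trip from above
        C_temp = C.get_value()
        C_temp[modified_rows, :] = C_temp[modified_rows, :] + C_delta
        C.set_value(C_temp)

    cProfile.run('slow_update()')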

This is bad because it round-trips the entire value of the shared variable through get_value and set_value: the whole 70000 x 100 array gets copied out, modified, and copied back in (and if C lives on the GPU, transferred to and from the device as well). We only need to modify a small number of the rows (200 out of 70,000), so touching every single one is extremely wasteful.
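
(If you are stuck with this pattern, one partial mitigation is the borrow flag on get_value/set_value, which asks Theano to skip its defensive copies. A sketch, assuming you are on the CPU; on the GPU a host transfer still happens either way, and aliasing the internal buffer is at your own risk:)

    # borrow=True returns the internal buffer without a defensive copy
    # (CPU only; a GPU-resident shared variable is still copied to the host)
    C_temp = C.get_value(borrow=True)
    C_temp[modified_rows, :] += C_delta
    # hand the buffer back, again without copying
    C.set_value(C_temp, borrow=True)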

Part of the 'nice thing' about shared variables is that they can be updated by functions which use them, so you might try:

    # NB: this does NOT work -- see below
    update_C = function([C_d, mod_rows], [],
                        updates=[(C[mod_rows, :], C[mod_rows, :] + C_d)],
                        allow_input_downcast=True)

(remember, C_d and mod_rows are the symbolic variables (specifically an fmatrix and an ivector) defined above). The allow_input_downcast=True deals with numpy's love of dealing in double-precision floats, which Theano rejects for GPU work: the float64 inputs are silently downcast to float32. This loss of precision may be important to you.
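
(If the silent downcast bothers you, a sketch of the alternative: build the inputs as float32/int32 from the start, so there is nothing to downcast.)

    # float32 values and int32 indices match fmatrix/ivector exactly
    modified_rows = np.random.randint(low=0, high=70000, size=200).astype(np.int32)
    C_delta = np.random.normal(size=(len(modified_rows), 100)).astype(np.float32)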

So then a simple call to update_C(C_delta, modified_rows) will do what you want, except that what I just wrote won't work. You can't update shared variables like that: the first element of each update pair must be the shared variable itself, and C[mod_rows, :] is not the shared variable but a new symbolic variable derived from it, so Theano refuses.
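
(You can see the mismatch by inspecting types, assuming the variables defined above:)

    # C itself is a shared variable; indexing it yields an ordinary
    # symbolic TensorVariable, which is not a valid update target
    print(type(C))               # a SharedVariable subclass
    print(type(C[mod_rows, :]))  # a plain TensorVariable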

Focusing solely on the updates=[...] part (everything else should be OK), you need to do:

    updates = [(C, set_subtensor(C[mod_rows], C[mod_rows] + C_d))]

So the full command (if you are lazily copying and pasting this into IPython to test the speed):

    update_C = function([C_d, mod_rows], [],
                        updates=[(C, set_subtensor(C[mod_rows], C[mod_rows] + C_d))],
                        allow_input_downcast=True)
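
(A quick sanity check that this agrees with the slow method; a sketch, assuming modified_rows happens to contain no duplicate indices:)

    before = C.get_value()
    update_C(C_delta, modified_rows)
    after = C.get_value()
    # only the selected rows should have changed, by exactly C_delta
    assert np.allclose(after[modified_rows, :],
                       before[modified_rows, :] + C_delta, atol=1e-4)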

Substitutions which won't work (for reasons I haven't fully pinned down; my guess is that set_subtensor handles the plain integer-vector indexing C[mod_rows] but not the mixed vector-plus-slice form, at least in the first case):

    C[mod_rows]  ->  C[mod_rows, :]
    C_d          ->  C_d[:, :]
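
(An arguably tidier spelling of the same update uses inc_subtensor, which adds to the selected rows directly; be aware that it accumulates over duplicate indices in mod_rows, whereas the set_subtensor version, like the numpy one, is last-write-wins:)

    from theano.tensor import inc_subtensor

    update_C = function([C_d, mod_rows], [],
                        updates=[(C, inc_subtensor(C[mod_rows], C_d))],
                        allow_input_downcast=True)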

As for the speed of this method: well,

    28 function calls in 0.001 seconds
    ncalls tottime percall cumtime percall filename:lineno(function)
    ...
    1 0.001 0.001 0.001 0.001 subtensor.py:1644(perform)

I think that solves the problem.

Related:
https://stackoverflow.com/questions/24229361/theano-indexing-inside-a-compiled-function-gpu
https://stackoverflow.com/questions/15917849/how-can-i-assign-update-subset-of-tensor-shared-variable-in-theano

Possibly relevant technical information:
Theano version is 0.6.0.
Numpy version is 1.8.2, using Intel's Math Kernel Library (MKL) as part of Anaconda.
GPU is a GeForce GTX 680.

Note: this post first appeared on my wordpress blog.