Background: I am running Python with Theano on a GPU, and I care about speed.
Scenario: I have a largeish matrix (`C`) which is stored as a shared variable, and I need to update a subset of its rows (`modified_rows`) by adding some other matrix (`C_delta`). What should I do?
```python
import numpy as np
from theano import function, shared
from theano.tensor import fmatrix, ivector, set_subtensor

C = shared(np.random.normal(size=(70000, 100)))
modified_rows = np.random.randint(low=0, high=70000, size=200)
C_delta = np.random.normal(size=(len(modified_rows), 100))
C_d = fmatrix('C_delta')
mod_rows = ivector('modified_rows')
```
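One caveat worth flagging before timing anything (my addition, not part of the original recipe): `np.random.normal` returns `float64`, and Theano's CUDA backend in this era only keeps `float32` shared variables on the device, so `C` as constructed above will live on the host. Casting up front avoids that:

```python
import numpy as np

# float64 by default; the old Theano CUDA backend stores only float32
# on the GPU, so cast before handing the array to shared() if you want
# the variable to actually live on the device.
C_values = np.random.normal(size=(70000, 100)).astype(np.float32)
print(C_values.dtype)  # float32
```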
Slow method: manually reset the values:
```python
C_temp = C.get_value()
C_temp[modified_rows, :] = C_temp[modified_rows, :] + C_delta
C.set_value(C_temp)
```
```
32 function calls in 0.055 seconds

   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
   ...
        1    0.000    0.000    0.026    0.026  sharedvalue.py:100(set_value)
        1    0.000    0.000    0.027    0.027  sharedvalue.py:80(get_value)
```
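If you want to reproduce profiles like the one above yourself, the standard library's `cProfile` will print such a table. A host-only sketch (the `slow_update` helper and its names are my own, for illustration):

```python
import cProfile
import numpy as np

C_host = np.random.normal(size=(70000, 100))
rows = np.random.randint(low=0, high=70000, size=200)
delta = np.random.normal(size=(len(rows), 100))

def slow_update(values):
    # Mimics the get_value / modify / set_value pattern: copy the whole
    # matrix out, edit 200 rows, copy the whole matrix back.
    tmp = values.copy()
    tmp[rows, :] = tmp[rows, :] + delta
    return tmp

# Prints a table in the same format as the one above.
cProfile.runctx('slow_update(C_host)', globals(), locals())
```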
This is bad because it requires unpacking and repacking the value in the shared variable (via `get_value` and `set_value`). We only need to modify a small number of the rows (200 out of 70,000), so having to copy every single one seems extremely wasteful.
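One host-side subtlety worth noting along the way (a plain-numpy aside, not part of the original recipe): `modified_rows` comes from `randint` and so may contain duplicate indices. Fancy-indexed `+=` applies a duplicated row's increment only once, while `np.add.at` accumulates every occurrence:

```python
import numpy as np

C_small = np.zeros((5, 3))
rows = np.array([1, 1, 4])          # note the duplicated row index
delta = np.ones((3, 3))

# Fancy-indexed += is a buffered update: row 1 is incremented only once.
C_fancy = C_small.copy()
C_fancy[rows, :] += delta
print(C_fancy[1, 0])                # 1.0, not 2.0

# np.add.at performs an unbuffered, accumulating update.
C_acc = C_small.copy()
np.add.at(C_acc, rows, delta)
print(C_acc[1, 0])                  # 2.0
```

Whether dropped duplicates matter depends on your application; if they do, deduplicate the indices or accumulate the deltas first.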
Part of the 'nice thing' about shared variables is that they can be updated by functions which use them, so you might try:
```python
update_C = function([C_d, mod_rows], [],
                    updates=[(C[mod_rows, :], C[mod_rows, :] + C_d)],
                    allow_input_downcast=True)
```
Here `C_d` and `mod_rows` are the symbolic variables (an `fmatrix` and an `ivector` respectively) defined above.
`allow_input_downcast=True` deals with numpy's love of double-precision floats, which Theano rejects for GPU work (the GPU wants `float32`). This loss of precision may be important to you.
So then a simple call to `update_C(C_delta, modified_rows)` will do what you want, except that what I just wrote won't work. You can't update shared variables like that. I think it's because the first element of the update tuple must be the shared variable itself, not a slice of it, so Theano freaks out. (Full disclosure: I have little idea of Theano's inner workings.)
Focusing solely on the `updates=[...]` part (everything else should be OK), you need to do:

```python
updates=[(C, set_subtensor(C[mod_rows, :], C[mod_rows, :] + C_d))]
```
So the full command (if you are lazily copying and pasting this into IPython to test speed):

```python
update_C = function([C_d, mod_rows], [],
                    updates=[(C, set_subtensor(C[mod_rows, :], C[mod_rows, :] + C_d))],
                    allow_input_downcast=True)
```
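For completeness: Theano also provides `inc_subtensor`, which expresses "add into these rows" directly, so the update could equivalently be written as `updates=[(C, inc_subtensor(C[mod_rows, :], C_d))]` (I haven't benchmarked that variant). The set-versus-increment semantics, sketched in plain numpy:

```python
import numpy as np

C_host = np.arange(12, dtype=float).reshape(4, 3)
rows = np.array([0, 2])
delta = np.full((2, 3), 10.0)

# set_subtensor(C[rows], C[rows] + delta): build a result whose selected
# rows are replaced by (old rows + delta).
C_set = C_host.copy()
C_set[rows, :] = C_host[rows, :] + delta

# inc_subtensor(C[rows], delta): add delta into the selected rows.
C_inc = C_host.copy()
C_inc[rows, :] += delta

print(np.array_equal(C_set, C_inc))  # True: the two spellings agree here
```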
Things which won't work (for reasons unknown to me):

- `C[modified_rows]` → `C[modified_rows, :]`
- `C_delta` → `C_delta[:, :]`
As for the speed of this method: well,

```
28 function calls in 0.001 seconds

   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
   ...
        1    0.001    0.001    0.001    0.001  subtensor.py:1644(perform)
```
I think that solves the problem.
Possibly relevant technical information:

- Theano version 0.6.0.
- Numpy version 1.8.2, using Intel's Math Kernel Library (MKL) as part of Anaconda.
- GPU is a GeForce GTX 680.
Note: this post first appeared on my WordPress blog.