Background: I am running Python with Theano on a GPU, and I care about speed.

Scenario: I have a largeish matrix (`C`) which is stored as a shared variable, and I need to update a subset of its rows (`modified_rows`) by adding some other matrix (`C_delta`). What should I do?

Initialising, e.g.:

```
import numpy as np
from theano import function, shared
from theano.tensor import fmatrix, ivector, set_subtensor
C = shared(np.random.normal(size=(70000, 100)))
modified_rows = np.random.randint(low=0, high=70000, size=200)
C_delta = np.random.normal(size=(len(modified_rows), 100))
C_d = fmatrix('C_delta')
mod_rows = ivector('modified_rows')
```

Slow method: manually reset the values:

```
C_temp = C.get_value()
C_temp[modified_rows, :] = C_temp[modified_rows, :] + C_delta
C.set_value(C_temp)
```
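For reference, the timings quoted in this post came from profiling calls like the one above. A minimal harness using the standard-library `cProfile`, with plain NumPy arrays standing in for the shared variable so the sketch runs without Theano (the names `C_host` and `slow_update` are mine, not from the original code):

```python
import cProfile
import io
import pstats

import numpy as np

# NumPy stand-ins for the shared variable and the update data.
C_host = np.random.normal(size=(70000, 100))
modified_rows = np.random.randint(low=0, high=70000, size=200)
C_delta = np.random.normal(size=(len(modified_rows), 100))

def slow_update():
    # Mirrors the get_value / modify / set_value round trip above:
    # the whole 70000 x 100 matrix gets copied out and back.
    C_temp = C_host.copy()            # analogous to C.get_value()
    C_temp[modified_rows, :] = C_temp[modified_rows, :] + C_delta
    return C_temp                     # analogous to C.set_value(C_temp)

profiler = cProfile.Profile()
profiler.enable()
slow_update()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```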

Speed:

```
32 function calls in 0.055 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
...
1 0.000 0.000 0.026 0.026 sharedvalue.py:100(set_value)
1 0.000 0.000 0.027 0.027 sharedvalue.py:80(get_value)
```

This is bad because it requires unpacking and repacking the value in the shared variable (via `get_value` and `set_value`). We only need to modify a small number of the rows (200 out of 70,000), so copying the whole matrix out and back seems extremely wasteful.

Part of the 'nice thing' about shared variables is that they can be updated by functions which use them, so you might try:

```
update_C = function([C_d, mod_rows], [],
updates=[(C[mod_rows, :], C[mod_rows, :] + C_d)],
allow_input_downcast=True)
```

(Remember, `C_d` and `mod_rows` are *symbolic variables*, specifically an `fmatrix` and an `ivector`, defined above.)

The `allow_input_downcast=True` will deal with NumPy's love of dealing in double-precision floats, which Theano rejects for GPU work. This loss of precision *may* be important to you.

So then a simple call to `update_C(C_delta, modified_rows)` will do what you want, except that what I just wrote won't work: you can't update shared variables like that. I think it's because the first element of the tuple is not *really* the shared variable itself, so Theano freaks out. (Full disclosure: I have little idea of Theano's inner workings.)

Focusing solely on the `updates=[...]` part (everything else should be OK), you need to do:

```
updates = [(C, set_subtensor(C[mod_rows], C[mod_rows] + C_d))]
```
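For intuition: `set_subtensor` does not mutate `C` in place; it builds a new tensor equal to `C` with the indexed rows replaced, and the `updates` mechanism then swaps that result into the shared variable. A NumPy analogue (my sketch of the semantics, not Theano's actual implementation):

```python
import numpy as np

def set_subtensor_like(x, rows, values):
    """NumPy analogue of theano.tensor.set_subtensor(x[rows], values):
    return a copy of x with x[rows] replaced by values (x is not mutated)."""
    out = x.copy()
    out[rows] = values
    return out

M = np.zeros((5, 3))
rows = np.array([1, 3])
delta = np.ones((2, 3))

# The update rule above, in NumPy terms: replace the selected rows with
# (their old values + delta).
M_new = set_subtensor_like(M, rows, M[rows] + delta)
print(M_new[rows])   # the selected rows were incremented
print(M[rows])       # the original array is untouched
```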

So the full command (if you are lazily copying and pasting this into IPython to test speed):

```
update_C = function([C_d, mod_rows], [],
updates=[(C, set_subtensor(C[mod_rows, :], C[mod_rows, :] + C_d))],
allow_input_downcast=True)
```
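One caveat worth flagging: `modified_rows` comes from `randint`, so it can contain duplicate indices. With `set_subtensor` (as with plain NumPy fancy assignment, i.e. the slow method above) duplicates are last-write-wins, so the two methods agree; but if you want repeated indices to *accumulate* their deltas, Theano's `inc_subtensor(C[mod_rows], C_d)` expresses that increment directly. A NumPy analogue of the accumulating behaviour, via `np.add.at` (my sketch, not Theano code):

```python
import numpy as np

def inc_subtensor_like(x, rows, y):
    """NumPy analogue of theano.tensor.inc_subtensor(x[rows], y): return a
    copy of x with y added into the selected rows; np.add.at makes duplicate
    indices accumulate rather than overwrite."""
    out = x.copy()
    np.add.at(out, rows, y)
    return out

M = np.zeros((5, 3))
rows = np.array([1, 1, 3])     # note the duplicated row index
y = np.ones((3, 3))

M_new = inc_subtensor_like(M, rows, y)
print(M_new[1, 0])   # 2.0 -- both contributions to row 1 accumulate
print(M_new[3, 0])   # 1.0
```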

Things which won't work (for reasons unknown to me):

```
C[modified_rows] -> C[modified_rows, :]
C_delta -> C_delta[:, :]
```

As for the speed of this method:

```
28 function calls in 0.001 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
...
1 0.001 0.001 0.001 0.001 subtensor.py:1644(perform)
```

I think that solves the problem.

Related:

https://stackoverflow.com/questions/24229361/theano-indexing-inside-a-compiled-function-gpu

https://stackoverflow.com/questions/15917849/how-can-i-assign-update-subset-of-tensor-shared-variable-in-theano

Possibly relevant technical information:

*Theano version is 0.6.0.
Numpy version is 1.8.2, using Intel's Math Kernel Library (MKL) as part of Anaconda.
GPU is a GeForce GTX 680.*

**Note:** this post first appeared on my wordpress blog.