This documentation is for a development version. The latest stable release is v2.1.0.

API reference


This is the NengoOCL simulator. It uses the Nengo builder to take a model and turn it into signals and operators. Then, we copy all the signals into OpenCL, and create OpenCL versions of all the operators. This is what the Simulator.plan_* functions do; each one of them is responsible for creating an OpenCL kernel (or kernels) to execute the corresponding operator.

class nengo_ocl.Simulator(network, dt=0.001, seed=None, model=None, context=None, n_prealloc_probes=32, profiling=None, if_python_code='none', planner=<function greedy_planner>, progress_bar=True)[source]

Simulator for running Nengo models in OpenCL.

network, dt, seed, model

These parameters are the same as in nengo.Simulator.

context : pyopencl.Context (optional)

OpenCL context specifying which device(s) to run on. By default, we will create a context by calling pyopencl.create_some_context and use this context as the default for all subsequent instances.

n_prealloc_probes : int (optional)

Number of timesteps to buffer when probing. Larger numbers mean less data transfer with the device (faster), but use more device memory.

profiling : boolean (optional)

If True, print_profiling() will show profiling information. By default, the simulator checks the environment variable NENGO_OCL_PROFILING.

if_python_code : ‘none’ | ‘warn’ | ‘error’

How the simulator should react if a Python function cannot be converted to OpenCL code.


planner : callable (optional)

A function to plan operator order. See nengo_ocl.planners.

property dt

(float) The time step of the simulator.

property n_steps

(int) The current time step of the simulator.

property time

(float) The current time of the simulator.

property signals

Get/set [properly-shaped] signal value (either 0d, 1d, or 2d)


clear_probes(self)

Clear all probe histories.

New in version 2.0.0.


close(self)

Closes the simulator.

Any call to Simulator.run, Simulator.run_steps, Simulator.step, or Simulator.reset on a closed simulator raises a nengo.exceptions.SimulatorClosed exception.

reset(self, seed=None)[source]

Reset the simulator state.


seed : int, optional

Not implemented. Changing the simulator seed during reset is not supported by NengoOCL.

run(self, time_in_seconds, progress_bar=None)[source]

Simulate for the given length of time.

If the given length of time is not a multiple of dt, it will be rounded to the nearest dt. For example, if dt is 0.001 and run is called with time_in_seconds=0.0006, the simulator will advance one timestep, resulting in the actual simulator time being 0.001.

The given length of time must be positive. The simulator cannot be run backwards.
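The rounding rule described above can be sketched in a few lines. This is an illustration only: `steps_for` is a hypothetical helper, not part of the NengoOCL API, and it assumes plain round-to-nearest behaviour.

```python
def steps_for(time_in_seconds, dt=0.001):
    """Hypothetical helper: number of dt steps for a requested run time."""
    if time_in_seconds <= 0:
        raise ValueError("time_in_seconds must be positive")
    # Round the requested duration to the nearest whole number of timesteps.
    return int(round(time_in_seconds / dt))


# 0.0006 s is 0.6 of a timestep, which rounds to one full step.
n = steps_for(0.0006, dt=0.001)
```

With these inputs the simulator time after the run would be `n * dt`, matching the example in the text.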


time_in_seconds : float

Amount of time to run the simulation for. Must be positive.

progress_bar : bool or nengo.utils.progress.ProgressBar, optional

Progress bar for displaying the progress of the simulation run.

If True, the default progress bar will be used. If False, the progress bar will be disabled. For more control over the progress bar, pass in a nengo.utils.progress.ProgressBar instance.

run_steps(self, steps, progress_bar=True)[source]

Simulate for the given number of dt steps.


steps : int

Number of steps to run the simulation for.

progress_bar : bool or nengo.utils.progress.ProgressBar, optional

Progress bar for displaying the progress of the simulation run.

If True, the default progress bar will be used. If False, the progress bar will be disabled. For more control over the progress bar, pass in a nengo.utils.progress.ProgressBar instance.


step(self)

Advance the simulator by 1 step (dt seconds).

trange(self, sample_every=None, dt=None)[source]

Create a vector of times matching probed data.

Note that the range does not start at 0 as one might expect, but at the first timestep (i.e., dt).

sample_every : float, optional

The sampling period of the probe to create a range for. If None, a time value for every dt will be produced.

Changed in version 2.0.0: Renamed from dt to sample_every
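A minimal sketch of the time vector that trange describes, written in plain Python. `sample_times` is a hypothetical helper, and it assumes that `sample_every` is a whole multiple of `dt`.

```python
def sample_times(n_steps, dt=0.001, sample_every=None):
    """Hypothetical helper: times at which probe samples are recorded."""
    # One sample per timestep unless a coarser sampling period is given.
    period = 1 if sample_every is None else int(round(sample_every / dt))
    # Steps are counted from 1, so the first time is dt rather than 0.
    return [step * dt for step in range(1, n_steps + 1) if step % period == 0]


every_step = sample_times(5, dt=0.001)
every_other = sample_times(10, dt=0.001, sample_every=0.002)
```

Here `every_step` holds five times starting at `dt`, and `every_other` holds five times starting at `2 * dt`.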

print_profiling(self, sort=None)[source]

Print recorded profiling information in a sorted table.

To enable profiling, pass the profiling=True argument when creating the Simulator.

sort : column to sort by (negative number sorts ascending)

(0 = n_calls, 1 = runtime, 2 = q-time, 3 = subtime)


The nengo_ocl.clra_nonlinearities module is where the kernels for most operators live (i.e., most of the Simulator.plan_* functions call a plan_* function in this module to generate the kernel). Each plan function follows roughly the same template:

  1. Do some checks on the arguments.

  2. Generate the C code for the kernel, using Mako to fill in variable things like datatypes.

  3. Compile the C code as a PyOpenCL Program, and set the arguments.

  4. Create and return a Plan object responsible for executing that pyopencl.Program.
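Steps 1 and 2 of this template can be sketched without an OpenCL device. The example below is an assumption-laden illustration: it uses the standard library's `string.Template` as a stand-in for Mako, and the `scale` kernel, `CTYPES` table, and `render_scale_kernel` function are all hypothetical names that do not appear in NengoOCL.

```python
from string import Template

# Kernel source with a placeholder for the element datatype.
KERNEL_TEMPLATE = Template("""
__kernel void scale(__global const ${ctype} *x, __global ${ctype} *y, ${ctype} alpha)
{
    const int i = get_global_id(0);
    y[i] = alpha * x[i];
}
""")

# Hypothetical mapping from array dtype names to C type names.
CTYPES = {"float32": "float", "float64": "double", "int32": "int"}


def render_scale_kernel(dtype_name):
    # Step 1: check the arguments before generating any code.
    if dtype_name not in CTYPES:
        raise ValueError("unsupported dtype: %s" % dtype_name)
    # Step 2: fill in the datatype to produce the C source for the kernel.
    return KERNEL_TEMPLATE.substitute(ctype=CTYPES[dtype_name])


source = render_scale_kernel("float32")
```

For steps 3 and 4, the rendered `source` string would then be compiled (for example with `pyopencl.Program(context, source).build()`) and wrapped in a Plan.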

OpenCL kernels for everything other than GEMV.

nengo_ocl.clra_nonlinearities.blockify_ij(max_size, ra)[source]

Blockify a single matrix or vector using the offset method

nengo_ocl.clra_nonlinearities.plan_elementwise_inc(queue, A, X, Y, alpha=None, outer=False, inc=True, tag=None)[source]

Implements an element-wise increment Y += alpha * A * X

alpha : np.ndarray or None

Scalars to apply to each operation.


outer : bool

Perform an outer product. A and X must be vectors.


inc : bool

Whether to increment Y (True), or set it (False).
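The intended arithmetic can be written as a host-side reference in plain Python. This sketch is not the OpenCL kernel: `elementwise_inc` is a hypothetical function, it handles only a single operation on Python lists, and it treats `alpha` as one scalar rather than an array of scalars.

```python
def elementwise_inc(A, X, Y, alpha=1.0, outer=False, inc=True):
    """Reference semantics only: returns the new value of Y."""
    if outer:
        # Outer product: A and X are vectors, Y is a len(A) x len(X) matrix.
        return [
            [(Y[i][j] if inc else 0.0) + alpha * A[i] * X[j] for j in range(len(X))]
            for i in range(len(A))
        ]
    # Element-wise product of two equal-length vectors.
    return [(Y[i] if inc else 0.0) + alpha * A[i] * X[i] for i in range(len(A))]


incremented = elementwise_inc([1, 2], [3, 4], [10, 10])
overwritten = elementwise_inc([1, 2], [3, 4], [10, 10], inc=False)
outer_prod = elementwise_inc([1, 2], [3, 4], [[0, 0], [0, 0]], outer=True)
```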

nengo_ocl.clra_nonlinearities.plan_linearfilter(queue, X, Y, A, B, Xbuf, Ybuf, tag=None)[source]

Implements a filter of the form

y[n+1] + a[0] y[n] + … + a[i] y[n-i] = b[0] x[n] + … + b[j] x[n-j]
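Solving the difference equation above for y[n+1] gives y[n+1] = b[0] x[n] + … + b[j] x[n-j] - a[0] y[n] - … - a[i] y[n-i]. The sketch below applies that update in plain Python; `make_filter` is a hypothetical helper, and the low-pass coefficients in the usage example are an assumed discretization chosen only for illustration.

```python
import math
from collections import deque


def make_filter(a, b):
    """Hypothetical helper returning a step function for the difference equation."""
    x_hist = deque([0.0] * len(b), maxlen=len(b))  # x[n], x[n-1], ...
    y_hist = deque([0.0] * len(a), maxlen=len(a))  # y[n], y[n-1], ...

    def step(x_n):
        x_hist.appendleft(x_n)
        # y[n+1] = sum_k b[k] x[n-k] - sum_k a[k] y[n-k]
        y_next = sum(bk * xk for bk, xk in zip(b, x_hist))
        y_next -= sum(ak * yk for ak, yk in zip(a, y_hist))
        y_hist.appendleft(y_next)
        return y_next

    return step


# Assumed first-order low-pass: y[n+1] = d * y[n] + (1 - d) * x[n].
dt, tau = 0.001, 0.005
d = math.exp(-dt / tau)
lowpass = make_filter(a=[-d], b=[1.0 - d])
outputs = [lowpass(1.0) for _ in range(100)]
```

With a constant input of 1, the output rises from `1 - d` on the first step towards 1.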

nengo_ocl.clra_nonlinearities.plan_probes(queue, periods, X, Y, tag=None)[source]

periods : raggedarray of ints

The period (in time-steps) of each probe

nengo_ocl.clra_nonlinearities.plan_conv2d(queue, X, Y, filters, shape_in, shape_out, kernel_shape, conv=True, biases=None, padding=(0, 0), strides=(1, 1), channels_last=True, tag=None, transposed=False)[source]

filters = ch x size_i x size_j x nf # conv transposed

filters = ch x size_i x size_j x nf x ni x nj # local transposed

biases = nf x ni x nj

conv : whether this is a convolution (true) or local filtering (false)


The nengo_ocl.clra_gemv module contains kernels specific to the GEMV (matrix-vector multiply) operation, since it is such an important and specialized operation. Most people will not need to know the details of this module.

OpenCL kernels for performing general matrix-vector multiplies (GEMV).

nengo_ocl.clra_gemv.ref_impl(p, items)[source]

Return an OCL function to calculate items of gemv operation p.

In this reference implementation, we create a work item per output number, or more specifically, a work grid of shape (max_y_len, len(items)). Each work item loops over the dot products and the elements within each dot product to compute the output value Y[global_id(1)][global_id(0)].
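The work-grid layout described above can be emulated on the host with nested loops. This is a sketch under simplifying assumptions: `gemv_reference` and its input layout (a list of `(A, X)` pairs per output vector) are hypothetical, and any scaling or offset terms of the real operation are left out.

```python
def gemv_reference(items):
    """items[i] is a list of (A, X) pairs that contribute to output vector Y[i]."""
    y_lens = [len(pairs[0][0]) for pairs in items]
    max_y_len = max(y_lens)
    Y = [[0.0] * n for n in y_lens]
    # Emulate the 2-D work grid of shape (max_y_len, len(items)).
    for gid1 in range(len(items)):        # which output vector
        for gid0 in range(max_y_len):     # which element of that vector
            if gid0 >= y_lens[gid1]:
                continue                  # work item falls outside this output
            total = 0.0
            for A, X in items[gid1]:      # loop over the dot products
                for k in range(len(X)):   # loop within one dot product
                    total += A[gid0][k] * X[k]
            Y[gid1][gid0] = total
    return Y


result = gemv_reference([
    [([[1, 2], [3, 4]], [1, 1])],
    [([[2, 0, 1]], [1, 2, 3]), ([[1]], [10])],
])
```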

nengo_ocl.clra_gemv.plan_sparse_dot_inc(queue, A_indices, A_indptr, A_data, X, Y, inc=False, tag=None)[source]

Implements a sparse matrix-vector multiply: Y += A * X or Y = A * X

A_indices, A_indptr : PyOpenCL array

Compressed sparse row (CSR) index specifications

A_data : PyOpenCL array

Matrix values at those indices

X, Y : CLRaggedArrays of length 1

Input/output data.


inc : bool

Whether to increment Y (True), or set it (False).
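As a host-side reference for what this plan computes, the sketch below performs the same multiply on Python lists. It assumes the usual compressed sparse row layout, where the nonzeros of row r are stored in `data[indptr[r]:indptr[r + 1]]` and their column numbers in the matching slice of `indices`. `csr_dot_inc` is a hypothetical name.

```python
def csr_dot_inc(indices, indptr, data, x, y, inc=False):
    """Reference semantics: returns y + A @ x (inc=True) or A @ x (inc=False)."""
    out = list(y) if inc else [0.0] * len(y)
    for row in range(len(indptr) - 1):
        # Walk the stored nonzeros of this row.
        for k in range(indptr[row], indptr[row + 1]):
            out[row] += data[k] * x[indices[k]]
    return out


# A = [[1, 0, 2],
#      [0, 0, 3]]
indices, indptr, data = [0, 2, 2], [0, 2, 3], [1.0, 2.0, 3.0]
set_result = csr_dot_inc(indices, indptr, data, [1.0, 2.0, 3.0], [1.0, 1.0])
inc_result = csr_dot_inc(indices, indptr, data, [1.0, 2.0, 3.0], [1.0, 1.0], inc=True)
```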


This function crashes when there are >10M nonzero weights. A potential solution would be some way to tell each work item to do multiple rows.


You can think of a RaggedArray as a list of arrays. Whereas NumPy allows lists of arrays of the same size to be made into one big array (e.g. if you’ve got five 3x2 arrays, you can make a 5x3x2 array), there’s no way to make arrays of different sizes into one big one. RaggedArray does this by making one big memory buffer where all the data is stored, and managing the reading and writing of that.

class nengo_ocl.raggedarray.RaggedArray(arrays, names=None, dtype=None, align=False)[source]

A linear buffer partitioned into sections of various lengths.

Can also be viewed as an efficient way of storing a list of arrays, in the same underlying buffer.
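The idea of one flat buffer plus per-section bookkeeping can be illustrated with a toy class. `TinyRagged` is a hypothetical stand-in limited to one-dimensional sections. It is not the nengo_ocl class and does not model names, dtypes, or alignment.

```python
class TinyRagged:
    """Hypothetical stand-in: 1-D sections of different lengths in one flat buffer."""

    def __init__(self, arrays):
        self.starts, self.lengths, self.buf = [], [], []
        for arr in arrays:
            self.starts.append(len(self.buf))  # offset of this section
            self.lengths.append(len(arr))
            self.buf.extend(arr)

    def __len__(self):
        return len(self.starts)

    def __getitem__(self, i):
        s = self.starts[i]
        return self.buf[s:s + self.lengths[i]]

    def __setitem__(self, i, values):
        if len(values) != self.lengths[i]:
            raise ValueError("section length is fixed")
        s = self.starts[i]
        self.buf[s:s + self.lengths[i]] = values


ra = TinyRagged([[1, 2, 3], [4], [5, 6]])
ra[1] = [9]  # writes through to the shared buffer
```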

class nengo_ocl.clraggedarray.CLRaggedArray(queue, np_raggedarray)[source]

A linear device buffer partitioned into sections of various lengths.

Can also be viewed as an efficient way of storing a list of arrays on the device, in the same underlying buffer.


Copy the whole object to a host RaggedArray


Given an Array, get a Buffer that starts at the right offset

This fails unless array.offset is a multiple of queue.device.mem_base_addr_align, which is rare, so this isn’t really a good function.

nengo_ocl.clraggedarray.to_host(queue, data, dtype, start, shape, elemstrides, is_blocking=True)[source]

Copy memory off the device, into a Numpy array.

If the requested array is discontiguous, the whole block is copied off the device, and a view is created to show the appropriate part.


Additional operators, and functions for pruning/simplifying operators.

class nengo_ocl.operators.MultiDotInc(Y, Y_in, beta, gamma, tag=None)[source]

y <- gamma + beta * y_in + \sum_i dot(A_i, x_i)
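The update rule can be spelled out on plain Python lists. This sketch assumes scalar `beta` and `gamma` and dense matrices stored as lists of rows; `multi_dot_inc` is a hypothetical function, not the operator's actual step implementation.

```python
def multi_dot_inc(y_in, beta, gamma, As, xs):
    """Reference semantics for y <- gamma + beta * y_in + sum_i dot(A_i, x_i)."""
    y = [gamma + beta * v for v in y_in]
    for A, x in zip(As, xs):
        for row, A_row in enumerate(A):
            # Add this matrix-vector product's contribution to the output row.
            y[row] += sum(a * b for a, b in zip(A_row, x))
    return y


y = multi_dot_inc(
    y_in=[1.0, 2.0],
    beta=0.5,
    gamma=1.0,
    As=[[[1, 0], [0, 1]], [[2], [3]]],
    xs=[[10, 20], [1]],
)
```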

property reads

Signals that are read and not modified by this operator.

Reads occur after increments, and before updates.

property incs

Signals incremented by this operator.

Increments will be applied after sets (if it is set), and before reads.

property sets

Signals set by this operator.

Sets occur first, before increments. A signal that is set here cannot be set or updated by any other operator.

property updates

Signals updated by this operator.

Updates are the last operation to occur to a signal.

make_step(self, signals, dt, rng)[source]

Returns a callable that performs the desired computation.

This method must be implemented by subclasses. To fully understand what an operator does, look at its implementation of make_step.


signals : SignalDict

A mapping from signals to their associated live ndarrays.

dt : float

Length of each simulation timestep, in seconds.

rng : numpy.random.RandomState

Random number generator for stochastic operators.


Organizes operators into dictionaries according to the signals they set/inc/read/update.

Copied from Nengo DL. See there for full documentation.


Remove any Reset operators that are targeting a signal that is never modified.

If a signal is reset, but never inced/updated after that, we can just set the default signal value to the reset value and remove the reset. Note: this wouldn’t normally happen, but it can happen if we removed some of the incs (e.g. in remove_zero_incs).

operators : list of Operator

Operators in the model

new_operators : list of Operator

Modified list of operators


remove_zero_incs(operators)

Remove any operators where we know the input (and therefore output) is zero.

If the input to a DotInc/ElementwiseInc/Copy/ConvInc is zero then we know that the output of the op will be zero, so we can just get rid of it.

operators : list of Operator

Operators in the model

new_operators : list of Operator

Modified list of operators


Copied from nengo_dl/


Apply simplifications to a list of operators, returning a simplified list.

Applies all the simplifications in nengo_ocl.operators.simplifications.

Python AST conversion

This module contains a parser to turn Python functions into OCL code.
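To give a flavour of the approach, the sketch below walks a Python expression tree with the standard `ast` module and emits a C-style expression string. It is a toy under stated assumptions: `to_c` and `C_OPS` are hypothetical names, only four arithmetic operators and unary minus are handled, and it does not reflect how OclTranslator is actually structured.

```python
import ast

C_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_c(node):
    """Translate a tiny subset of Python expression nodes into a C expression."""
    if isinstance(node, (ast.Expression, ast.Lambda)):
        return to_c(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in C_OPS:
        return "(%s %s %s)" % (to_c(node.left), C_OPS[type(node.op)], to_c(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return "(-%s)" % to_c(node.operand)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return "%sf" % float(node.value)  # emit a single-precision literal
    # Anything else is reported rather than silently mistranslated.
    raise NotImplementedError(type(node).__name__)


code = to_c(ast.parse("lambda x: 2 * x + 1", mode="eval"))
```

Raising on unsupported nodes mirrors the situation that the if_python_code setting of the simulator is meant to handle.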


Better testing, i.e., write test cases for all the test functions at the bottom of this file.


Get binary_and, or, xor, etc. functions working (priority = low)

  • this will require the ability to specify integer input variables

  • or perhaps just cast inputs to these functions to integers


A danger right now is that there is no check that the user uses all passed inputs in their function. For example, if the fn is meant to act on three arguments, and the user makes a mistake in their model that passes a 5-vector to the function, no warning is issued. There’s no obvious way to deal with this better, though.

class nengo_ocl.ast_conversion.Expression[source]

Represents a numerical expression

class nengo_ocl.ast_conversion.VarExp(name)[source]
class nengo_ocl.ast_conversion.NumExp(value)[source]
class nengo_ocl.ast_conversion.UnaryExp(op, right)[source]
class nengo_ocl.ast_conversion.BinExp(left, op, right)[source]
class nengo_ocl.ast_conversion.FuncExp(fn, *args)[source]
class nengo_ocl.ast_conversion.IfExp(cond, true, false)[source]
class nengo_ocl.ast_conversion.FunctionFinder[source]

Finds a FunctionDef or Lambda in an Abstract Syntax Tree

generic_visit(self, stmt)[source]

Called if no explicit visitor function exists for a node.

class nengo_ocl.ast_conversion.OclTranslator(source, globals_dict, closure_dict, in_dims=None, out_dim=None)[source]

visit(self, node)[source]

Visit a node.


Utility functions and compatibility imports.

nengo_ocl.utils.equal_strides(strides1, strides2, shape)[source]

Check whether two arrays have equal strides.

Code from

nengo_ocl.utils.split(iterator, criterion)[source]

Returns two lists: the objects that match criterion and those that do not.
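The described behaviour can be sketched as follows. `split_by` is a hypothetical re-implementation written from the description above, assuming that `criterion` is a callable returning a truth value for each object.

```python
def split_by(iterable, criterion):
    """Hypothetical sketch: partition items by whether criterion(item) is true."""
    matches, rest = [], []
    for item in iterable:
        (matches if criterion(item) else rest).append(item)
    return matches, rest


evens, odds = split_by(range(6), lambda n: n % 2 == 0)
```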