NengoDL Simulator

This is the class that allows users to access the nengo_dl backend. This can be used as a drop-in replacement for nengo.Simulator (i.e., simply replace any instance of nengo.Simulator with nengo_dl.Simulator and everything will continue to function as normal).

In addition, the Simulator exposes features unique to the NengoDL backend, such as Simulator.train.

The full class documentation can be viewed in the API Reference; here we will explain the practical usage of the Simulator in more depth.

Simulator arguments

The NengoDL Simulator has a number of optional arguments, beyond those in nengo.Simulator, which control features specific to the nengo_dl backend.


device

This specifies the computational device on which the simulation will run. The default is None, which means that operations will be assigned according to TensorFlow’s internal logic (generally speaking, this means that things will be assigned to the GPU if tensorflow-gpu is installed, otherwise everything will be assigned to the CPU). The device can be set manually by passing the TensorFlow device specification to this parameter. For example, setting device="/cpu:0" will force everything to run on the CPU. This may be worthwhile for small models, where the overhead of communicating with the GPU outweighs the cost of the actual computations. On systems with multiple GPUs, device="/gpu:0", device="/gpu:1", etc. will select which one to use.
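As a minimal sketch of the options above, the helpers below build a device string and pass it to the Simulator. The names device_spec and build_simulator are hypothetical and not part of nengo_dl; only the device argument and the "/cpu:0" and "/gpu:N" strings come from the description above.

```python
def device_spec(gpu_index=None):
    """Return a TensorFlow device string: the CPU if gpu_index is None, else that GPU."""
    if gpu_index is None:
        return "/cpu:0"
    if gpu_index < 0:
        raise ValueError("gpu_index must be non-negative")
    return "/gpu:%d" % gpu_index


def build_simulator(net, gpu_index=None):
    """Create a nengo_dl.Simulator pinned to the chosen device (hypothetical helper)."""
    import nengo_dl  # imported lazily so device_spec can be used on its own

    return nengo_dl.Simulator(net, device=device_spec(gpu_index))
```

For a small model one might call build_simulator(net) to stay on the CPU, and build_simulator(net, gpu_index=1) to select the second GPU.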


unroll_simulation

This controls how many simulation iterations are executed each time through the outer simulation loop. That is, we could run 20 timesteps as

for i in range(20):
    <run 1 step>


or

for i in range(5):
    <run 1 step>
    <run 1 step>
    <run 1 step>
    <run 1 step>

This is an optimization process known as “loop unrolling”, and unroll_simulation controls how many simulation steps are unrolled. The first example above would correspond to unroll_simulation=1, and the second would be unroll_simulation=4.
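The equivalence of the two loops can be illustrated in plain Python. This is only a toy sketch: step is a stand-in for one simulation timestep and is not nengo_dl code, and the function names are hypothetical.

```python
def step(state):
    """Toy stand-in for one simulation timestep: a leaky accumulator."""
    return 0.9 * state + 1.0


def run_rolled(state, n_steps):
    """One step per loop iteration (analogous to unroll_simulation=1)."""
    for _ in range(n_steps):
        state = step(state)
    return state


def run_unrolled_by_4(state, n_steps):
    """Four steps per loop iteration (analogous to unroll_simulation=4)."""
    if n_steps % 4 != 0:
        raise ValueError("this sketch only handles multiples of 4")
    for _ in range(n_steps // 4):
        state = step(state)
        state = step(state)
        state = step(state)
        state = step(state)
    return state


# Both versions apply the same 20 steps in the same order.
same = run_rolled(0.0, 20) == run_unrolled_by_4(0.0, 20)
```

Because the same operations run in the same order, the two results are identical; only the number of loop iterations differs.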

Unrolling the simulation will result in faster simulation speed, but increased build time and memory usage.

In general, unrolling the simulation will have no impact on the output of a simulation. The only case in which unrolling may have an impact is if the number of simulation steps is not evenly divisible by unroll_simulation. In that case extra simulation steps will be executed, and then data will be truncated to the correct number of steps. However, those extra steps could still change the internal state of the simulation, which will affect any subsequent calls to Simulator.run. It is therefore recommended that the number of steps always be evenly divisible by unroll_simulation.
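The arithmetic behind this recommendation can be sketched as follows, assuming (as the paragraph above describes) that the simulation always runs whole unrolled blocks, so the step count is rounded up to the next multiple of unroll_simulation. The function names are hypothetical.

```python
def executed_steps(n_steps, unroll_simulation):
    """Steps actually executed when only whole unrolled blocks can be run."""
    n_blocks = -(-n_steps // unroll_simulation)  # ceiling division
    return n_blocks * unroll_simulation


def extra_steps(n_steps, unroll_simulation):
    """Steps executed beyond the requested number (their data is truncated)."""
    return executed_steps(n_steps, unroll_simulation) - n_steps
```

Under this assumption, requesting 22 steps with unroll_simulation=4 would execute 24 steps, so 2 extra steps could modify the internal state, whereas requesting 20 steps executes exactly 20.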


minibatch_size

nengo_dl allows a model to be simulated with multiple simultaneous inputs, processing those values in parallel through the network. For example, instead of executing a model three times with three different inputs, the model can be executed once with those three inputs in parallel. minibatch_size specifies how many inputs will be processed at a time. The default is None, meaning that this feature is not used and only one input will be processed at a time (as in standard Nengo simulators).

In order to take advantage of the parallel inputs, multiple inputs need to be passed to Simulator.run via the data argument. This is discussed in more detail below.

When using Simulator.train, this parameter controls how many items from the training data will be used for each optimization iteration.


tensorboard

This can be used to specify an output directory if you would like to export data from the simulation in a format that can be visualized in TensorBoard.

To view the collected data, run the command

tensorboard --logdir <tensorboard_dir>

(where tensorboard_dir is the directory name passed to the tensorboard argument), then open a web browser and navigate to http://localhost:6006.

By default the TensorBoard output will only contain a visualization of the TensorFlow graph constructed for this Simulator. However, TensorBoard can also be used to track various aspects of the simulation throughout the training process; see the sim.train documentation for details.

Repeated Simulator calls with the same output directory will be organized into subfolders according to run number (e.g., <tensorboard_dir>/run_0).

Simulator.run arguments

Simulator.run (and its variations Simulator.step/Simulator.run_steps) also has some optional parameters beyond those in the standard Nengo simulator.


data

This parameter can be used to override the value of any input Node in a model (an input node is defined as a node with no incoming connections). For example,

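import nengo
import nengo_dl
import numpy as np  # used in the examples further below

n_steps = 5

with nengo.Network() as net:
    node = nengo.Node([0])
    p = nengo.Probe(node)

with nengo_dl.Simulator(net) as sim:
    sim.run_steps(n_steps)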

will execute the model in the standard way, and if we check the output of node

print(sim.data[p])
>>> [[ 0.] [ 0.] [ 0.] [ 0.] [ 0.]]

we see that it is all zero, as defined.

data is specified as a dictionary of {my_node: override_value} pairs, where my_node is the Node to be overridden and override_value is a numpy array with shape (minibatch_size, n_steps, my_node.size_out) that gives the Node output value on each simulation step. For example, if we instead run the model via

with nengo_dl.Simulator(net) as sim:
    sim.run_steps(n_steps, data={node: np.ones((1, n_steps, 1))})
    print(sim.data[p])
>>> [[ 1.] [ 1.] [ 1.] [ 1.] [ 1.]]

we see that the output of node is all ones, which is the override value we specified.
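A shape mismatch is an easy mistake to make when building override values, so a small check can help. The sketch below uses hypothetical helper names that are not part of nengo_dl, and it assumes that with minibatch_size=None the leading axis has length 1, as in the np.ones((1, n_steps, 1)) example above.

```python
def expected_override_shape(size_out, n_steps, minibatch_size=None):
    """Shape an override array needs: (minibatch_size, n_steps, size_out).

    Assumption: minibatch_size=None corresponds to a leading axis of length 1.
    """
    batch = 1 if minibatch_size is None else minibatch_size
    return (batch, n_steps, size_out)


def check_override(value, size_out, n_steps, minibatch_size=None):
    """Return value unchanged, or raise ValueError if its shape is wrong."""
    expected = expected_override_shape(size_out, n_steps, minibatch_size)
    actual = tuple(value.shape)
    if actual != expected:
        raise ValueError("override has shape %s, expected %s" % (actual, expected))
    return value
```

With the network above, this could be used as data={node: check_override(np.ones((1, n_steps, 1)), node.size_out, n_steps)}.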

data is usually used in concert with the minibatching feature of nengo_dl (see above). nengo_dl allows multiple inputs to be processed simultaneously, but when we construct a Node we can only specify one value. For example, if we use minibatching on the above network

mini = 3
with nengo_dl.Simulator(net, minibatch_size=mini) as sim:
    sim.run_steps(n_steps)
    print(sim.data[p])
>>> [[[ 0.] [ 0.] [ 0.] [ 0.] [ 0.]]
     [[ 0.] [ 0.] [ 0.] [ 0.] [ 0.]]
     [[ 0.] [ 0.] [ 0.] [ 0.] [ 0.]]]

we see that the output is an array of zeros with size (mini, n_steps, 1). That is, we simulated 3 inputs simultaneously, but those inputs all had the same value (the one we defined when the Node was constructed) so it wasn’t very useful. To take full advantage of the minibatching we need to override the node values, so that we can specify a different value for each item in the minibatch:

with nengo_dl.Simulator(net, minibatch_size=mini) as sim:
    sim.run_steps(n_steps, data={
        node: np.zeros((mini, n_steps, 1)) + np.arange(mini)[:, None, None]})
    print(sim.data[p])
>>> [[[ 0.] [ 0.] [ 0.] [ 0.] [ 0.]]
     [[ 1.] [ 1.] [ 1.] [ 1.] [ 1.]]
     [[ 2.] [ 2.] [ 2.] [ 2.] [ 2.]]]

Here we can see that 3 independent inputs have been processed during the simulation. In a simple network such as this, minibatching will not make much difference. But for larger models it will be much more efficient to process multiple inputs in parallel rather than one at a time.


profile

If set to True, profiling data will be collected while the simulation runs. This will significantly slow down the simulation, so it should be left as False (the default) in most cases. It is mainly used by developers, in order to help identify performance bottlenecks.

Profiling data will be saved to a file named nengo_dl_profile.json. It can be viewed by opening a Chrome browser, navigating to chrome://tracing and loading the nengo_dl_profile.json file. A filename can be passed instead of True, to change the output filename.
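Besides viewing the file in chrome://tracing, it can be useful to summarize it programmatically. The sketch below is not part of nengo_dl; it assumes the file follows the Chrome Trace Event Format in its object form, i.e. a top-level "traceEvents" list in which complete events have "ph" equal to "X" and a "dur" field in microseconds. The function names are hypothetical.

```python
import json
from collections import defaultdict


def total_duration_by_name(trace):
    """Sum the durations (microseconds) of complete events, grouped by name."""
    totals = defaultdict(int)
    for event in trace.get("traceEvents", []):
        if event.get("ph") == "X":  # "X" marks a complete event with a duration
            totals[event.get("name", "<unnamed>")] += event.get("dur", 0)
    # Largest total first, so likely bottlenecks appear at the top.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def summarize_profile(path="nengo_dl_profile.json", top=10):
    """Load a trace file and return its `top` most expensive event names."""
    with open(path) as f:
        trace = json.load(f)
    return total_duration_by_name(trace)[:top]
```

Calling summarize_profile() after a profiled run would then list the event names that account for the most time, under the format assumption stated above.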

Note that in order for GPU profiling to work, you need to manually add <cuda>/extras/CUPTI/lib64 to the LD_LIBRARY_PATH environment variable (where <cuda> is your CUDA installation directory).