Optimization & machine learning with TensorFlow

Note

The content on this page is aimed at more advanced users who already have an understanding of Strawberry Fields, e.g., those who have completed the teleportation tutorial. Some basic knowledge of TensorFlow is also helpful.

Note

This tutorial requires TensorFlow 2.0 or above. TensorFlow can be installed via pip:

pip install tensorflow

For more installation details and instructions, please refer to the TensorFlow documentation.

In this demonstration, we show how the user can carry out optimization and machine learning on quantum circuits in Strawberry Fields. This functionality is provided via the TensorFlow simulator backend. By leveraging TensorFlow, we have access to a number of additional functionalities, including GPU integration, automatic gradient computation, built-in optimization algorithms, and other machine learning tools.

Basic functionality

As usual, we can initialize a Strawberry Fields program:

import numpy as np
import strawberryfields as sf
from strawberryfields.ops import *

prog = sf.Program(2)

Replacing numbers with Tensors

When a circuit contains only numerical parameters, the TensorFlow backend works the same as the other backends. However, with the TensorFlow backend, we have the additional option to use TensorFlow tf.Variable and tf.Tensor objects for the parameters of Blackbird states, gates, and measurements.

To construct a circuit that accepts TensorFlow objects, we must use symbolic parameters as the gate arguments. These can be created via the Program.params() method:

import tensorflow as tf

# we can create symbolic parameters one by one
alpha = prog.params("alpha")

# or create multiple at the same time
theta_bs, phi_bs = prog.params("theta_bs", "phi_bs")

with prog.context as q:
    # States
    Coherent(alpha) | q[0]
    # Gates
    BSgate(theta_bs, phi_bs) | (q[0], q[1])
    # Measurements
    MeasureHomodyne(0.0) | q[0]

To run a Strawberry Fields simulation with the TensorFlow backend, we need to specify 'tf' as the backend argument when initializing the engine:

eng = sf.Engine(backend="tf", backend_options={"cutoff_dim": 7})

We can now run our program using eng.run(). However, directly evaluating a circuit which contains symbolic parameters using eng.run() will produce errors. The reason for this is that eng.run() tries, by default, to numerically evaluate any measurement result. But we have not provided numerical values for the symbolic arguments yet!
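For example, attempting to run the program while the parameters are still unbound raises an error (here, purely for illustration, we catch the exception in order to print its message):

try:
    eng.run(prog)
except Exception as e:
    print(type(e).__name__ + ":", e)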

Out:

ParameterError: {alpha}: unbound parameter with no default value.

To bind a numerical value to the free parameters, we pass a mapping from parameter name to value using the args keyword argument when running the engine:

mapping = {"alpha": tf.Variable(0.5), "theta_bs": tf.constant(0.4), "phi_bs": tf.constant(0.0)}

result = eng.run(prog, args=mapping)

This code will execute without error, and both the output results and the output samples will contain numeric values based on the bound parameter values.
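For example, we can print the homodyne measurement sample returned for mode 0:

print(result.samples)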

Out:

tf.Tensor([[-0.1355+0.j]], shape=(1, 1), dtype=complex64)

We can simulate the circuit with different parameter values simply by supplying a different mapping. We can also return and perform processing on the underlying state:

state = result.state
print("Density matrix element [0,0,1,2]:", state.dm()[0, 0, 1, 2])

Out:

Density matrix element [0,0,1,2]: tf.Tensor((0.0050297542-1.4593575e-15j), shape=(), dtype=complex64)

Computing gradients

When used within a gradient tape context, any output generated from the state class can be differentiated natively using GradientTape().gradient(). For example, let's displace a one-mode program, and then compute the gradient of the mean photon number:

eng.reset()
prog = sf.Program(1)

alpha = prog.params("alpha")

with prog.context as q:
    Dgate(alpha) | q

# Assign our TensorFlow variables, so that we can
# refer to them later when differentiating/training.
a = tf.Variable(0.43)

with tf.GradientTape() as tape:
    # Here, we map our quantum free parameter `alpha`
    # to our TensorFlow variable `a` and pass it to the engine.

    result = eng.run(prog, args={"alpha": a})
    state = result.state

    # Note that all processing, including state-based post-processing,
    # must be done within the gradient tape context!
    mean, var = state.mean_photon(0)

# test that the gradient of the mean photon number is correct

grad = tape.gradient(mean, [a])
print("Gradient:", grad)

Out:

Gradient: [<tf.Tensor: shape=(), dtype=float32, numpy=0.85999966>]

The mean photon number of a displaced state \(|\alpha\rangle\) is given by \(\langle \hat{n} \rangle(\alpha) = |\alpha|^2\), and thus, for real-valued \(\alpha\), the gradient is given by \(\langle \hat{n}\rangle'(\alpha) = 2\alpha\):

print("Exact gradient:", 2 * a)
print("Exact and TensorFlow gradient agree:", np.allclose(grad, 2 * a))

Out:

Exact gradient: tf.Tensor(0.86, shape=(), dtype=float32)
Exact and TensorFlow gradient agree: True

Processing data

The parameters for Blackbird states, gates, and measurements may be more complex than just raw data or machine learning weights. They can themselves be the outputs of some learnable function, such as a neural network [1].
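As a minimal sketch of this idea (the names net, data, nn_prog, and nn_eng below are purely illustrative, not part of Strawberry Fields), a gate parameter can be fed by the output of a small Keras layer, and gradients will flow back through the circuit into the network weights:

net = tf.keras.layers.Dense(1)
data = tf.constant([[0.1, 0.7]])

nn_prog = sf.Program(1)
d = nn_prog.params("d")

with nn_prog.context as q:
    Dgate(d) | q[0]

nn_eng = sf.Engine("tf", backend_options={"cutoff_dim": 7})

with tf.GradientTape() as tape:
    d_val = net(data)[0, 0]  # scalar displacement produced by the network
    state = nn_eng.run(nn_prog, args={"d": d_val}).state
    mean, _ = state.mean_photon(0)

# gradients with respect to the network weights
grads = tape.gradient(mean, net.trainable_variables)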

We can also use the sf.math module to allow arbitrary processing of measurement results within the TensorFlow backend, by manipulating the .par attribute of the measured register. Note that the math functions in sf.math will be automatically converted to the equivalent TensorFlow tf.math functions:

eng.reset()
prog = sf.Program(2)

with prog.context as q:
    MeasureX | q[0]
    Dgate(sf.math.sin(q[0].par)) | q[1]

result = eng.run(prog)
print("Measured Homodyne sample from mode 0:", result.samples[0][0])

mean, var = result.state.mean_photon(0)
print("Mean photon number of mode 0:", mean)

mean, var = result.state.mean_photon(1)
print("Mean photon number of mode 1:", mean)

Out:

Measured Homodyne sample from mode 0: tf.Tensor((-0.53070533+0j), shape=(), dtype=complex64)
Mean photon number of mode 0: tf.Tensor(0.0, shape=(), dtype=float32)
Mean photon number of mode 1: tf.Tensor(0.2561794, shape=(), dtype=float32)

The mean photon number of mode 1 should be given by \(\sin^2(q_0)\), where \(q_0\) is the measured homodyne sample value:

q0 = result.samples[0][0]
print(np.sin(q0) ** 2)

Out:

(0.25617950329070993-0j)

Working with batches

It is common in machine learning to process data in batches. Strawberry Fields supports both unbatched and batched data when using the TensorFlow backend. Unbatched operation is the default behaviour (shown above). To enable batched operation, you should provide an extra batch_size argument within the backend_options dictionary [2], e.g.,

# run simulation in batched-processing mode
batch_size = 3
prog = sf.Program(1)
eng = sf.Engine("tf", backend_options={"cutoff_dim": 15, "batch_size": batch_size})

x = prog.params("x")

with prog.context as q:
    Thermal(x) | q[0]

x_val = tf.Variable([0.1, 0.2, 0.3])
result = eng.run(prog, args={"x": x_val})
print("Mean photon number of mode 0 (batched):", result.state.mean_photon(0)[0])

Out:

Mean photon number of mode 0 (batched): tf.Tensor([0.1 0.2 0.3], shape=(3,), dtype=float32)

Note

The batch size should be static, i.e., not changing over the course of a computation.

Parameters supplied to a circuit in batch-mode operation can either be scalars or vectors (of length batch_size). Scalars are automatically broadcast over the batch dimension.

alpha = tf.Variable([0.5] * batch_size)
theta = tf.constant(0.0)
phi = tf.Variable([0.1, 0.33, 0.5])
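As a minimal sketch (the engine and program names below are illustrative), these parameters can be passed to a batched simulation in exactly the same way as before:

batched_eng = sf.Engine("tf", backend_options={"cutoff_dim": 7, "batch_size": batch_size})
batched_prog = sf.Program(2)
a, th, ph = batched_prog.params("a", "th", "ph")

with batched_prog.context as q:
    Dgate(a) | q[0]
    # the scalar `theta` is broadcast over the batch dimension
    BSgate(th, ph) | (q[0], q[1])

result = batched_eng.run(batched_prog, args={"a": alpha, "th": theta, "ph": phi})
print(result.state.mean_photon(0)[0])  # Tensor of shape (batch_size,)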

Measurement results will be returned as Tensors with shape (batch_size,). We can picture batch-mode operation as simulating multiple circuit configurations at the same time. Combined with appropriate parallelized hardware like GPUs, this can result in significant speedups compared to serial evaluation.

Example: variational quantum circuit optimization

A key element of machine learning is optimization. We can use TensorFlow’s automatic differentiation tools to optimize the parameters of variational quantum circuits. In this approach, we fix a circuit architecture where the states, gates, and/or measurements may have learnable parameters associated with them. We then define a loss function based on the output state of this circuit.

Warning

The state representation in the simulator can change from a ket (pure) to a density matrix (mixed) if we use certain operations (e.g., state preparations). We can check state.is_pure to determine which representation is being used.
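For example (an illustrative one-line check on any simulation result):

print("State is pure:", result.state.is_pure)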

In the example below, we optimize a Dgate to produce an output with the largest overlap with the Fock state \(n=1\). In this example, we compute the loss function imperatively within a gradient tape, and then apply the gradients using the chosen optimizer:

# initialize engine and program objects
eng = sf.Engine(backend="tf", backend_options={"cutoff_dim": 7})
circuit = sf.Program(1)

tf_alpha = tf.Variable(0.1)
tf_phi = tf.Variable(0.1)

alpha, phi = circuit.params("alpha", "phi")

with circuit.context as q:
    Dgate(alpha, phi) | q[0]

opt = tf.keras.optimizers.Adam(learning_rate=0.1)
steps = 50

for step in range(steps):

    # reset the engine if it has already been executed
    if eng.run_progs:
        eng.reset()

    with tf.GradientTape() as tape:
        # execute the engine
        results = eng.run(circuit, args={"alpha": tf_alpha, "phi": tf_phi})
        # get the probability of fock state |1>
        prob = results.state.fock_prob([1])
        # negative sign to maximize prob
        loss = -prob

    gradients = tape.gradient(loss, [tf_alpha, tf_phi])
    opt.apply_gradients(zip(gradients, [tf_alpha, tf_phi]))
    print("Probability at step {}: {}".format(step, prob))

Out:

Probability at step 0: 0.009900499135255814
Probability at step 1: 0.03843098133802414
Probability at step 2: 0.0808338075876236
Probability at step 3: 0.1331305354833603
Probability at step 4: 0.19035020470619202
Probability at step 5: 0.24677760899066925
Probability at step 6: 0.2965959310531616
Probability at step 7: 0.3348417580127716
Probability at step 8: 0.35853689908981323
Probability at step 9: 0.3676064908504486
Probability at step 10: 0.3649454414844513
Probability at step 11: 0.3552566468715668
Probability at step 12: 0.3432542383670807
Probability at step 13: 0.33238688111305237
Probability at step 14: 0.32456544041633606
Probability at step 15: 0.3205086290836334
Probability at step 16: 0.32019278407096863
Probability at step 17: 0.3231736719608307
Probability at step 18: 0.3287637233734131
Probability at step 19: 0.3361213207244873
Probability at step 20: 0.3443049192428589
Probability at step 21: 0.35233551263809204
Probability at step 22: 0.3592878580093384
Probability at step 23: 0.3644119203090668
Probability at step 24: 0.367265909910202
Probability at step 25: 0.3678153157234192
Probability at step 26: 0.36645036935806274
Probability at step 27: 0.36389175057411194
Probability at step 28: 0.36100494861602783
Probability at step 29: 0.35858556628227234
Probability at step 30: 0.35718992352485657
Probability at step 31: 0.35705265402793884
Probability at step 32: 0.3580944836139679
Probability at step 33: 0.35999566316604614
Probability at step 34: 0.36230403184890747
Probability at step 35: 0.3645508885383606
Probability at step 36: 0.3663528263568878
Probability at step 37: 0.36747899651527405
Probability at step 38: 0.3678767681121826
Probability at step 39: 0.36765244603157043
Probability at step 40: 0.3670196831226349
Probability at step 41: 0.366233766078949
Probability at step 42: 0.3655299246311188
Probability at step 43: 0.36507946252822876
Probability at step 44: 0.3649671971797943
Probability at step 45: 0.36519041657447815
Probability at step 46: 0.36567458510398865
Probability at step 47: 0.36629951000213623
Probability at step 48: 0.36692991852760315
Probability at step 49: 0.3674468994140625

After 50 or so iterations, the optimization should converge to the optimal probability value of \(e^{-1}\approx 0.3678\) at a parameter of \(\alpha=1\) [3].
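As a quick sanity check, we can compare the probability from the final iteration of the loop above against this analytic optimum:

print("Final probability:", prob.numpy())
print("exp(-1) =", np.exp(-1))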

In cases where we want to take advantage of TensorFlow's graph mode, or do not need to extract or accumulate intermediate values used to calculate the cost function, we can also make use of TensorFlow's Optimizer().minimize method, which requires the loss to be defined as a function that takes no arguments:

def loss():
    # reset the engine if it has already been executed
    if eng.run_progs:
        eng.reset()

    # execute the engine
    results = eng.run(circuit, args={"alpha": tf_alpha, "phi": tf_phi})
    # get the probability of fock state |1>
    prob = results.state.fock_prob([1])
    # negative sign to maximize prob
    return -prob


tf_alpha = tf.Variable(0.1)
tf_phi = tf.Variable(0.1)


for step in range(steps):
    # In eager mode, calling Optimizer.minimize performs a single optimization step,
    # and automatically updates the parameter values.
    _ = opt.minimize(loss, [tf_alpha, tf_phi])
    parameter_vals = [tf_alpha.numpy(), tf_phi.numpy()]
    print("Parameter values at step {}: {}".format(step, parameter_vals))

Out:

Parameter values at step 0: [0.17085811, 0.1]
Parameter values at step 1: [0.2653996, 0.1]
Parameter values at step 2: [0.3756343, 0.1]
Parameter values at step 3: [0.49840698, 0.1]
Parameter values at step 4: [0.63179624, 0.1]
Parameter values at step 5: [0.77331054, 0.1]
Parameter values at step 6: [0.917955, 0.1]
Parameter values at step 7: [1.056465, 0.1]
Parameter values at step 8: [1.1767575, 0.1]
Parameter values at step 9: [1.2695966, 0.1]
Parameter values at step 10: [1.3319763, 0.1]
Parameter values at step 11: [1.3654183, 0.1]
Parameter values at step 12: [1.3732628, 0.1]
Parameter values at step 13: [1.3591938, 0.1]
Parameter values at step 14: [1.3267637, 0.1]
Parameter values at step 15: [1.2793745, 0.1]
Parameter values at step 16: [1.2204301, 0.1]
Parameter values at step 17: [1.1535463, 0.09980923]
Parameter values at step 18: [1.0827549, 0.09982842]
Parameter values at step 19: [1.0126098, 0.09984581]
Parameter values at step 20: [0.94803596, 0.10005598]
Parameter values at step 21: [0.8937802, 0.1002464]
Parameter values at step 22: [0.85358983, 0.10041891]
Parameter values at step 23: [0.8295507, 0.10077327]
Parameter values at step 24: [0.82193893, 0.10089472]
Parameter values at step 25: [0.8295212, 0.10100472]
Parameter values at step 26: [0.8500015, 0.10090252]
Parameter values at step 27: [0.8804064, 0.10080997]
Parameter values at step 28: [0.91736925, 0.100726165]
Parameter values at step 29: [0.9573659, 0.10065028]
Parameter values at step 30: [0.99695784, 0.10058158]
Parameter values at step 31: [1.0330575, 0.10072726]
Parameter values at step 32: [1.0631744, 0.100859135]
Parameter values at step 33: [1.0855811, 0.10076829]
Parameter values at step 34: [1.0993627, 0.100686066]
Parameter values at step 35: [1.1043644, 0.10061165]
Parameter values at step 36: [1.1010801, 0.1005443]
Parameter values at step 37: [1.090526, 0.10048336]
Parameter values at step 38: [1.0741205, 0.10042821]
Parameter values at step 39: [1.0535735, 0.1001612]
Parameter values at step 40: [1.0307757, 0.09991963]
Parameter values at step 41: [1.0076759, 0.09970108]
Parameter values at step 42: [0.9861408, 0.099723905]
Parameter values at step 43: [0.967804, 0.09952304]
Parameter values at step 44: [0.9539274, 0.09911885]
Parameter values at step 45: [0.9452985, 0.098753266]
Parameter values at step 46: [0.9421848, 0.09842261]
Parameter values at step 47: [0.94434804, 0.09812356]
Parameter values at step 48: [0.9511101, 0.09785311]
Parameter values at step 49: [0.9614545, 0.097608544]

Other high-level optimization interfaces, such as Keras, may also be used to train models built using the Strawberry Fields TensorFlow backend.

Warning

When optimizing circuits which contain energy-changing operations (displacement, squeezing, etc.), one should be careful to monitor that the state does not leak out of the given truncation level. This can be accomplished by regularizing the relevant parameters, or by using tf.clip_by_value on them.
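A minimal sketch of the latter (the clipping range here is arbitrary and should be chosen to suit the cutoff and circuit at hand):

# after each optimization step, clamp the displacement amplitude so that the
# state remains well represented within the chosen Fock-space cutoff
tf_alpha.assign(tf.clip_by_value(tf_alpha, -1.5, 1.5))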

TensorFlow supports a large set of mathematical and machine learning operations which can be applied to a circuit's output state to enable further processing. Examples include tf.norm, tf.linalg.eigh, tf.linalg.svd, and tf.linalg.inv [4].
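As a brief sketch (assuming results holds the output of one of the eng.run calls above, for a single-mode state):

rho = results.state.dm()                 # single-mode density matrix, shape (cutoff, cutoff)
eigvals, _ = tf.linalg.eigh(rho)         # populations of the state's eigenbasis
print("Trace:", tf.reduce_sum(eigvals))  # a value noticeably below 1 signals truncation leakage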

Exercise: Hong-Ou-Mandel Effect

Use the optimization methods outlined above to find the famous Hong-Ou-Mandel effect, where photons bunch together in the same mode. Your circuit should contain two modes, each with a single-photon input state, and a beamsplitter with variable parameters. By optimizing the beamsplitter parameters, minimize the probability of the \(|1,1\rangle\) Fock-basis element of the output state.
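A possible starting point for the circuit (a sketch only; the optimization loop is left as the exercise):

prog = sf.Program(2)
theta, phi = prog.params("theta", "phi")

with prog.context as q:
    Fock(1) | q[0]
    Fock(1) | q[1]
    BSgate(theta, phi) | (q[0], q[1])

# the quantity to minimize is the probability of the |1, 1> element,
# e.g. result.state.fock_prob([1, 1])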

Footnotes

[1] Note that certain operations (in particular, measurements) may not have gradients defined within TensorFlow. When optimizing via gradient descent, we must be careful to define a circuit which is end-to-end differentiable.

[2] Note that batch_size should not be set to 1. Instead, use batch_size=None, or simply omit the batch_size argument.

[3] In this tutorial, we have applied classical machine learning tools to learn a quantum optical circuit. Of course, there are many other possibilities for combining machine learning and quantum computing, e.g., using quantum algorithms to speed up machine learning subroutines, or fully quantum learning on unprocessed quantum data.

[4] Remember that it might be necessary to reshape a ket or density matrix before using some of these functions.
