r"""
.. role:: html(raw)
   :format: html

.. _qnn:

Quantum neural network
======================

"Neural Network are not black boxes. They are a big pile of linear algebra." - Randall Munroe, xkcd

Machine learning has a wide range of models for tasks such as classification, regression, and
clustering. Neural networks are one of the most successful models, having experienced a resurgence
in use over the past decade due to improvements in computational power and advanced software
libraries. The typical structure of a neural network consists of a series of interacting layers
that perform transformations on data passing through the network. An archetypal neural network
structure is the feedforward neural network, visualized by the following example:

.. image:: /tutorials/images/neural_network.svg
    :align: center
    :width: 85%
    :target: javascript:void(0);

Here, the neural network depth is determined by the number of layers, while the maximum width is
given by the layer with the greatest number of neurons. The network begins with an input layer of
real-valued neurons, which feed forward onto a series of one or more hidden layers. Following the
notation of [1], if the :math:`n` neurons at one layer are given by the vector
:math:`\mathbf{x} \in \mathbb{R}^{n}`, the :math:`m` neurons of the next layer take the values

.. math:: \mathcal{L}(\mathbf{x}) = \varphi (W \mathbf{x} + \mathbf{b}),

where

* :math:`W \in \mathbb{R}^{m \times n}` is a matrix,
* :math:`\mathbf{b} \in \mathbb{R}^{m}` is a vector, and
* :math:`\varphi` is a nonlinear function (also known as the activation function).

The matrix multiplication :math:`W \mathbf{x}` is a linear transformation on :math:`\mathbf{x}`,
while :math:`W \mathbf{x} + \mathbf{b}` represents an **affine transformation**. In principle, any
nonlinear function can be chosen for :math:`\varphi`, but often the choice is fixed from a
standard set of activations that includes the rectified linear unit (ReLU) and the sigmoid function
acting on each neuron. Finally, the output layer enacts an affine transformation on the last hidden
layer, but the activation function may be linear (including the identity), or a different nonlinear
function such as softmax (for classification).

Layers in the feedforward neural network above are called **fully connected** as every neuron in a
given hidden layer or output layer can be connected to all neurons in the previous layer through
the matrix :math:`W`. Over time, specialized versions of layers have been developed to focus on
different problems. For example, convolutional layers have a restricted form of connectivity and
are suited to machine learning with images. We focus here on fully connected layers as the most
general type.
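
As a purely classical illustration of the layer :math:`\varphi (W \mathbf{x} + \mathbf{b})`, here is
a minimal NumPy sketch of one fully connected layer. The helper names ``relu`` and ``dense_layer``
and all numerical values are assumptions chosen for illustration; they are not part of Strawberry
Fields.

```python
import numpy as np


def relu(z):
    # Rectified linear unit, applied elementwise.
    return np.maximum(z, 0.0)


def dense_layer(x, W, b, activation=relu):
    # Affine transformation W x + b followed by the activation function.
    return activation(W @ x + b)


# Illustrative layer mapping n = 3 input neurons to m = 2 output neurons.
W = np.array([[1.0, -1.0, 0.0],
              [0.5, 0.5, 0.5]])
b = np.array([0.0, -1.0])
x = np.array([1.0, 2.0, 3.0])

y = dense_layer(x, W, b)
print(y)
```

Here :math:`W \mathbf{x} + \mathbf{b} = (-1, 2)`, and ReLU sets the negative entry to zero.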

Training of neural networks uses variations of the gradient descent algorithm on a cost function
characterizing the similarity between outputs of the neural network and training data. The gradient
of the cost function can be calculated using automatic differentiation, with knowledge of the
feedforward network structure.

Quantum neural networks aim to encode neural networks into a quantum system, with the intention of
benefiting from quantum information processing. There have been numerous attempts to define a
quantum neural network, each with varying advantages and disadvantages. The quantum neural network
detailed below, following the work of [1], has a CV architecture and is realized using standard CV
gates from Strawberry Fields. One advantage of this CV architecture is that it naturally
accommodates the continuous nature of neural networks. Additionally, the CV model is able to easily
apply non-linear transformations using the phase space picture, a task which qubit-based models
struggle with, often relying on measurement postselection, which has a probability of failure.

Implementation
--------------

A CV quantum neural network layer can be defined as

.. math:: \mathcal{L} := \Phi \circ \mathcal{D} \circ \mathcal{U}_{2} \circ \mathcal{S} \circ \mathcal{U}_{1},

where

* :math:`\mathcal{U}_{k}=U_{k}(\boldsymbol{\theta}_{k},\boldsymbol{\phi}_{k})` is an :math:`N` mode
  interferometer,

* :math:`\mathcal{D}=\otimes_{i=1}^{N}D(\alpha_{i})` is a single mode displacement gate
  (:class:`~strawberryfields.ops.Dgate`) acting on each mode with complex displacement
  :math:`\alpha_{i} \in \mathbb{C}`,

* :math:`\mathcal{S}=\otimes_{i=1}^{N}S(r_{i})` is a single mode squeezing gate
  (:class:`~strawberryfields.ops.Sgate`) acting on each mode with squeezing parameter
  :math:`r_{i} \in \mathbb{R}`, and

* :math:`\Phi=\otimes_{i=1}^{N}\Phi(\lambda_{i})` is a non-Gaussian gate on each mode with parameter
  :math:`\lambda_{i} \in \mathbb{R}`.

.. note::

    Any non-Gaussian gate such as the cubic phase gate (:class:`~strawberryfields.ops.Vgate`)
    represents a valid choice, but we recommend the Kerr gate
    (:class:`~strawberryfields.ops.Kgate`) for simulations in Strawberry Fields. The Kerr gate is
    more accurate numerically because it is diagonal in the Fock basis.

The layer is shown below as a circuit:

.. image:: /tutorials/images/layer.svg
    :align: center
    :width: 70%
    :target: javascript:void(0);

These layers can then be composed to form a quantum neural network. The width of the network can
also be varied between layers [1].

Reproducing classical neural networks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's see how the quantum layer can embed the transformation
:math:`\mathcal{L}(\mathbf{x}) = \varphi (W \mathbf{x} + \mathbf{b})` of a classical neural network
layer. Suppose :math:`N`-dimensional data is encoded in position eigenstates so that

.. math:: \mathbf{x} \Leftrightarrow \ket{\mathbf{x}} := \ket{x_{1}} \otimes \ldots \otimes \ket{x_{N}}.

We want to perform the transformation

.. math:: \ket{\mathbf{x}} \Rightarrow \ket{\varphi (W \mathbf{x} + \mathbf{b})}.

It turns out that the quantum circuit above can do precisely this! Consider first the affine
transformation :math:`W \mathbf{x} + \mathbf{b}`. Leveraging the singular value decomposition, we
can always write :math:`W = O_{2} \Sigma O_{1}` with :math:`O_{k}` orthogonal matrices and
:math:`\Sigma` a positive diagonal matrix. These orthogonal transformations can be carried out
using interferometers without access to phase, i.e., with :math:`\boldsymbol{\phi}_{k} = 0`:

.. math:: U_{k}(\boldsymbol{\theta}_{k},\mathbf{0})\ket{\mathbf{x}} = \ket{O_{k} \mathbf{x}}.

On the other hand, the diagonal matrix
:math:`\Sigma = {\rm diag}\left(\{c_{i}\}_{i=1}^{N}\right)` can be achieved through squeezing:

.. math:: \otimes_{i=1}^{N}S(r_{i})\ket{\mathbf{x}} \propto \ket{\Sigma \mathbf{x}},

with :math:`r_{i} = \log (c_{i})`. Finally, the addition of a bias vector :math:`\mathbf{b}` is done
using position displacement gates:

.. math:: \otimes_{i=1}^{N}D(\alpha_{i})\ket{\mathbf{x}} = \ket{\mathbf{x} + \mathbf{b}},

with :math:`\mathbf{b} = \{\alpha_{i}\}_{i=1}^{N}` and :math:`\alpha_{i} \in \mathbb{R}`.
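
The decomposition above can be checked numerically at the level of the classical vector
:math:`\mathbf{x}`. The following is a minimal NumPy sketch, not a quantum simulation: it assumes a
random square :math:`W` (so all singular values are strictly positive) and follows the convention
:math:`r_{i} = \log (c_{i})` used in the text. The sign convention relating :math:`r_{i}` to the
scaling in an actual squeezing gate depends on the gate definition and is not verified here.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 3
W = rng.normal(size=(N, N))  # illustrative weight matrix (assumed full rank)
b = rng.normal(size=N)       # illustrative bias vector
x = rng.normal(size=N)       # illustrative input data

# Singular value decomposition W = O2 @ diag(c) @ O1.
O2, c, O1 = np.linalg.svd(W)
r = np.log(c)  # "squeezing parameters" in the convention of the text

# Apply the four steps in the order U1, S, U2, D.
out = O1 @ x          # first phaseless interferometer
out = np.exp(r) * out  # per-mode scaling by c_i = exp(r_i)
out = O2 @ out        # second phaseless interferometer
out = out + b         # position displacement by the bias

print(np.allclose(out, W @ x + b))
```

The final comparison confirms that the four steps reproduce :math:`W \mathbf{x} + \mathbf{b}` for
this example.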

Putting this all together, we see that the operation
:math:`\mathcal{D} \circ \mathcal{U}_{2} \circ \mathcal{S} \circ \mathcal{U}_{1}` with phaseless
interferometers and position displacement performs the transformation
:math:`\ket{\mathbf{x}} \Rightarrow \ket{W \mathbf{x} + \mathbf{b}}` on position eigenstates.

.. warning::

    The TensorFlow backend is the natural simulator for quantum neural networks in Strawberry
    Fields, but this backend cannot naturally accommodate position eigenstates, which require
    infinite squeezing. For simulation of position eigenstates in this backend, the best approach
    is to use a displaced squeezed state (``prepare_displaced_squeezed_state``) with high squeezing
    value :math:`r`. However, to avoid significant numerical error, it is important to make sure
    that all initial states have negligible amplitude for Fock states :math:`\ket{n}` with
    :math:`n\geq \texttt{cutoff_dim}`, where :math:`\texttt{cutoff_dim}` is the cutoff dimension.

Finally, the nonlinear function :math:`\varphi` can be achieved through a restricted type of
non-Gaussian gates :math:`\otimes_{i=1}^{N}\Phi(\lambda_{i})` acting on each mode (see [1] for more
details), resulting in the transformation

.. math:: \otimes_{i=1}^{N}\Phi(\lambda_{i})\ket{\mathbf{x}} = \ket{\varphi(\mathbf{x})}.

The operation
:math:`\mathcal{L} = \Phi \circ \mathcal{D} \circ \mathcal{U}_{2} \circ \mathcal{S} \circ \mathcal{U}_{1}`
with phaseless interferometers, position displacements, and restricted non-Gaussian gates can hence
be seen as enacting a classical neural network layer
:math:`\ket{\mathbf{x}} \Rightarrow \ket{\varphi(W \mathbf{x} + \mathbf{b})}` on position eigenstates.

Extending to quantum neural networks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In fact, CV quantum neural network layers can be made more expressive than their classical
counterparts.
We can do this by lifting the above restrictions on :math:`\mathcal{L}`, i.e.:

- Using arbitrary interferometers :math:`U_{k}(\boldsymbol{\theta}_{k},\boldsymbol{\phi}_{k})` with
  access to phase and general displacement gates (i.e., not necessarily position displacement).
  This allows :math:`\mathcal{D} \circ \mathcal{U}_{2} \circ \mathcal{S} \circ \mathcal{U}_{1}` to
  represent a general Gaussian operation.
- Using arbitrary non-Gaussian gates :math:`\Phi(\lambda_{i})`, such as the Kerr gate.
- Encoding data outside of the position eigenbasis, for example using instead the Fock basis.

In fact, gates in a single layer form a universal gate set, making the CV quantum neural network a
model for universal quantum computing, i.e., a sufficient number of layers can carry out any quantum
algorithm implementable on a CV quantum computer.

CV quantum neural networks can be trained both through classical simulation and directly on quantum
hardware. Strawberry Fields relies on classical simulation to evaluate cost functions of the CV
quantum neural network and the resultant gradients with respect to parameters of each layer.
However, this becomes an intractable task with increasing network depth and width. Ultimately,
direct evaluation on hardware will likely be necessary for large-scale networks; an approach for
hardware-based training is mapped out in [2]. The PennyLane library provides tools for training
hybrid quantum-classical machine learning models, using both simulators and real-world quantum
hardware.

Example CV quantum neural network layers are shown, for one to four modes, below:

.. figure:: /tutorials/images/layer_1mode.svg
    :align: center
    :width: 31%
    :target: javascript:void(0);

    One mode layer

.. figure:: /tutorials/images/layer_2mode.svg
    :align: center
    :width: 46%
    :target: javascript:void(0);

    Two mode layer

.. figure:: /tutorials/images/layer_3mode.svg
    :align: center
    :width: 75%
    :target: javascript:void(0);

    Three mode layer

.. figure:: /tutorials/images/layer_4mode.svg
    :align: center
    :width: 90%
    :target: javascript:void(0);

    Four mode layer

Here, the multimode linear interferometers :math:`U_{1}` and :math:`U_{2}` have been decomposed into
two-mode phaseless beamsplitters (:class:`~strawberryfields.ops.BSgate`) and single-mode phase
shifters (:class:`~strawberryfields.ops.Rgate`) using the Clements decomposition [3]. The Kerr gate
is used as the non-Gaussian gate.

Code
----

First, we import Strawberry Fields, TensorFlow, and NumPy:
"""

import numpy as np
import tensorflow as tf
import strawberryfields as sf
from strawberryfields import ops

######################################################################
# Before we begin defining our optimization problem, let's first create
# some convenient utility functions.
#
# Utility functions
# ~~~~~~~~~~~~~~~~~
#
# The first step to writing a CV quantum neural network layer in Strawberry Fields is to define a
# function for the two interferometers:


def interferometer(params, q):
    """Parameterised interferometer acting on N modes.

    Args:
        params (list[float]): list of length max(1, N-1) + (N-1)*N parameters.

            * The first N(N-1)/2 parameters correspond to the beamsplitter angles
            * The second N(N-1)/2 parameters correspond to the beamsplitter phases
            * The final N-1 parameters correspond to local rotation on the first N-1 modes

        q (list[RegRef]): list of Strawberry Fields quantum registers the interferometer
            is to be applied to
    """
    N = len(q)
    theta = params[:N*(N-1)//2]
    phi = params[N*(N-1)//2:N*(N-1)]
    rphi = params[-N+1:]

    if N == 1:
        # the interferometer is a single rotation
        ops.Rgate(rphi[0]) | q[0]
        return

    n = 0  # keep track of free parameters

    # Apply the rectangular beamsplitter array
    # The array depth is N
    for l in range(N):
        for k, (q1, q2) in enumerate(zip(q[:-1], q[1:])):
            # skip even or odd pairs depending on layer
            if (l + k) % 2 != 1:
                ops.BSgate(theta[n], phi[n]) | (q1, q2)
                n += 1

    # apply the final local phase shifts to all modes except the last one
    for i in range(max(1, N - 1)):
        ops.Rgate(rphi[i]) | q[i]


######################################################################
# .. warning::
#
#     The :class:`~strawberryfields.ops.Interferometer` class in Strawberry Fields does not reproduce
#     the functionality above. Instead, :class:`~strawberryfields.ops.Interferometer` applies a given
#     input unitary matrix according to the Clements decomposition.
#
# Using the above ``interferometer`` function, an :math:`N` mode CV quantum neural network layer is
# given by the function:


def layer(params, q):
    """CV quantum neural network layer acting on N modes.

    Args:
        params (list[float]): list of length 2*(max(1, N-1) + N**2 + N) containing
            the parameters for the layer
        q (list[RegRef]): list of Strawberry Fields quantum registers the layer
            is to be applied to
    """
    N = len(q)
    M = int(N * (N - 1)) + max(1, N - 1)

    # Slice the flat parameter list into the per-gate parameter groups
    int1 = params[:M]
    s = params[M:M+N]
    int2 = params[M+N:2*M+N]
    dr = params[2*M+N:2*M+2*N]
    dp = params[2*M+2*N:2*M+3*N]
    k = params[2*M+3*N:2*M+4*N]

    # begin layer
    interferometer(int1, q)

    for i in range(N):
        ops.Sgate(s[i]) | q[i]

    interferometer(int2, q)

    for i in range(N):
        ops.Dgate(dr[i], dp[i]) | q[i]
        ops.Kgate(k[i]) | q[i]


######################################################################
# Finally, we define one more utility function to help us initialize
# the TensorFlow weights for our quantum neural network layers:


def init_weights(modes, layers, active_sd=0.0001, passive_sd=0.1):
    """Initialize a 2D TensorFlow Variable containing normally-distributed
    random weights for an N mode quantum neural network with L layers.

    Args:
        modes (int): the number of modes in the quantum neural network
        layers (int): the number of layers in the quantum neural network
        active_sd (float): the standard deviation used when initializing
            the normally-distributed weights for the active parameters
            (displacement, squeezing, and Kerr magnitude)
        passive_sd (float): the standard deviation used when initializing
            the normally-distributed weights for the passive parameters
            (beamsplitter angles and all gate phases)

    Returns:
        tf.Variable[tf.float32]: A TensorFlow Variable of shape
        [layers, 2*(max(1, modes-1) + modes**2 + modes)], where the Lth
        row represents the layer parameters for the Lth layer.
    """
    # Number of interferometer parameters:
    M = int(modes * (modes - 1)) + max(1, modes - 1)

    # Create the TensorFlow variables
    int1_weights = tf.random.normal(shape=[layers, M], stddev=passive_sd)
    s_weights = tf.random.normal(shape=[layers, modes], stddev=active_sd)
    int2_weights = tf.random.normal(shape=[layers, M], stddev=passive_sd)
    dr_weights = tf.random.normal(shape=[layers, modes], stddev=active_sd)
    dp_weights = tf.random.normal(shape=[layers, modes], stddev=passive_sd)
    k_weights = tf.random.normal(shape=[layers, modes], stddev=active_sd)

    weights = tf.concat(
        [int1_weights, s_weights, int2_weights, dr_weights, dp_weights, k_weights], axis=1
    )

    weights = tf.Variable(weights)

    return weights


######################################################################
# Optimization
# ~~~~~~~~~~~~
#
# Now that we have our utility functions, let's begin defining our optimization problem.
# In this particular example, let's create a 1 mode CVQNN with 8 layers and a Fock-basis
# cutoff dimension of 6. We will train this QNN to output a desired target state:
# a single photon state.

# set the random seed
tf.random.set_seed(137)
np.random.seed(137)


# define width and depth of CV quantum neural network
modes = 1
layers = 8
cutoff_dim = 6


# defining desired state (single photon state): amplitude 1 on the Fock state |1>
target_state = np.zeros(cutoff_dim)
target_state[1] = 1
target_state = tf.constant(target_state, dtype=tf.complex64)


######################################################################
# Now, let's initialize an engine with the TensorFlow ``"tf"`` backend,
# and begin constructing our QNN program.

# initialize engine and program
eng = sf.Engine(backend="tf", backend_options={"cutoff_dim": cutoff_dim})
qnn = sf.Program(modes)

# initialize QNN weights
weights = init_weights(modes, layers)  # our TensorFlow weights
num_params = np.prod(weights.shape)  # total number of parameters in our model


######################################################################
# To construct the program, we must create and use Strawberry Fields symbolic
# gate arguments. These will be mapped to the TensorFlow variables on engine
# execution.

# Create array of Strawberry Fields symbolic gate arguments, matching
# the size of the weights Variable.
sf_params = np.arange(num_params).reshape(weights.shape).astype(str)
sf_params = np.array([qnn.params(*i) for i in sf_params])


# Construct the symbolic Strawberry Fields program by
# looping and applying layers to the program.
with qnn.context as q:
    for k in range(layers):
        layer(sf_params[k], q)


######################################################################
# where ``sf_params`` is an array of shape ``[layers, 2*(max(1, modes-1) + modes**2 + modes)]``
# containing the symbolic gate arguments for the quantum neural network.
#
# Now that our QNN program is defined, we can create our **cost function**.
# Our cost function simply executes the QNN on our engine using the values of the
# input weights.
#
# Since we want to maximize the fidelity :math:`f(w) = |\langle \psi(w) | \psi_t\rangle|^2`
# between our QNN output state :math:`|\psi(w)\rangle` and our target state
# :math:`|\psi_t\rangle`, we compute the inner product between the two statevectors,
# as well as the 1-norm :math:`\left\lVert \psi(w) - \psi_t\right\rVert_1`, which is
# the quantity we minimize.
#
# Finally, we also return the trace of the output QNN state. This should always
# have a value close to 1. If it deviates significantly from 1, this is an
# indication that we need to increase our Fock-basis cutoff.


def cost(weights):
    # Create a dictionary mapping from the names of the Strawberry Fields
    # symbolic gate parameters to the TensorFlow weight values.
    mapping = {p.name: w for p, w in zip(sf_params.flatten(), tf.reshape(weights, [-1]))}

    # run the engine
    state = eng.run(qnn, args=mapping).state
    ket = state.ket()

    difference = tf.reduce_sum(tf.abs(ket - target_state))
    fidelity = tf.abs(tf.reduce_sum(tf.math.conj(ket) * target_state)) ** 2
    return difference, fidelity, ket, tf.math.real(state.trace())


######################################################################
# We are now ready to minimize our cost function using TensorFlow:

# set up the optimizer
opt = tf.keras.optimizers.Adam()
cost_before, fidelity_before, _, _ = cost(weights)

# Perform the optimization
for i in range(1000):
    # reset the engine if it has already been executed
    if eng.run_progs:
        eng.reset()

    with tf.GradientTape() as tape:
        loss, fid, ket, trace = cost(weights)

    # one repetition of the optimization
    gradients = tape.gradient(loss, weights)
    opt.apply_gradients(zip([gradients], [weights]))

    # Prints progress at every rep
    if i % 1 == 0:
        print("Rep: {} Cost: {:.4f} Fidelity: {:.4f} Trace: {:.4f}".format(i, loss, fid, trace))


print("\nFidelity before optimization: ", fidelity_before.numpy())
print("Fidelity after optimization: ", fid.numpy())
print("\nTarget state: ", target_state.numpy())
print("Output state: ", np.round(ket.numpy(), decimals=3))

######################################################################
# For more applications of CV quantum neural networks, see the state learning
# and gate synthesis demonstrations.
#
# References
# ----------
#
# .. [1] Nathan Killoran, Thomas R Bromley, Juan Miguel Arrazola, Maria Schuld, Nicolás Quesada, and
#    Seth Lloyd. Continuous-variable quantum neural networks. arXiv preprint arXiv:1806.06871,
#    2018.
#
# .. [2] Maria Schuld, Ville Bergholm, Christian Gogolin, Josh Izaac, and Nathan Killoran. Evaluating
#    analytic gradients on quantum hardware. Physical Review A, 99(3):032331, 2019.
#
# .. [3] William R Clements, Peter C Humphreys, Benjamin J Metcalf, W Steven Kolthammer, and Ian A
#    Walmsley. Optimal design for universal multiport interferometers. Optica, 3(12):1460–1465,
#    2016. doi:10.1364/OPTICA.3.001460.