r"""
.. role:: html(raw)
    :format: html

.. _qnn:

Quantum neural network
======================

    "Neural Network are not black boxes. They are a big pile of linear algebra." - `Randall Munroe,
    xkcd <https://xkcd.com/1838/>`_

Machine learning has a wide range of models for tasks such as classification, regression, and
clustering. Neural networks are one of the most successful models, having experienced a resurgence
in use over the past decade due to improvements in computational power and advanced software
libraries. The typical structure of a neural network consists of a series of interacting layers that
perform transformations on data passing through the network. An archetypal neural network structure
is the feedforward neural network, visualized by the following example:

:html:`<br>`

.. image:: /tutorials/images/neural_network.svg
    :align: center
    :width: 85%
    :target: javascript:void(0);

:html:`<br>`

Here, the neural network depth is determined by the number of layers, while the maximum width is
given by the layer with the greatest number of neurons. The network begins with an input layer of
real-valued neurons, which feed forward onto a series of one or more hidden layers. Following the
notation of [[1]_], if the :math:`n` neurons at one layer are given by the
vector :math:`\mathbf{x} \in \mathbb{R}^{n}`, the :math:`m` neurons of the next layer take the
values

.. math:: \mathcal{L}(\mathbf{x}) = \varphi (W \mathbf{x} + \mathbf{b}),

where

* :math:`W \in \mathbb{R}^{m \times n}` is a matrix,

* :math:`\mathbf{b} \in \mathbb{R}^{m}` is a bias vector, and

* :math:`\varphi` is a nonlinear function (also known as the activation function).

The matrix multiplication :math:`W \mathbf{x}` is a linear transformation on :math:`\mathbf{x}`,
while :math:`W \mathbf{x} + \mathbf{b}` represents an **affine transformation**. In principle, any
nonlinear function can be chosen for :math:`\varphi`, but often the choice is fixed from a `standard
set of activations <https://en.wikipedia.org/wiki/Activation_function>`_ that include the rectified
linear unit (ReLU) and the sigmoid function acting on each neuron. Finally, the output layer enacts
an affine transformation on the last hidden layer, but the activation function may be linear
(including the identity), or a different nonlinear function such as `softmax
<https://en.wikipedia.org/wiki/Softmax_function>`_ (for classification).
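
As a minimal sketch of the layer map above, assuming NumPy and a ReLU activation (the names
``relu`` and ``classical_layer`` are illustrative and not part of any library), a single fully
connected layer could be written as:

.. code-block:: python

    import numpy as np

    def relu(z):
        # Rectified linear unit, applied elementwise
        return np.maximum(z, 0.0)

    def classical_layer(W, b, x, activation=relu):
        # Affine transformation W x + b followed by the nonlinearity
        return activation(W @ x + b)

    W = np.array([[1.0, -2.0], [0.5, 1.0], [0.0, 3.0]])  # m = 3 outputs, n = 2 inputs
    b = np.array([0.0, 1.0, -1.0])
    x = np.array([1.0, 1.0])
    y = classical_layer(W, b, x)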

Layers in the feedforward neural network above are called **fully connected** as every neuron in a
given hidden layer or output layer can be connected to all neurons in the previous layer through the
matrix :math:`W`. Over time, specialized versions of layers have been developed to focus on
different problems. For example, convolutional layers have a restricted form of connectivity and are
suited to machine learning with images. We focus here on fully connected layers as the most general
type.

Training of neural networks uses variations of the `gradient descent
<https://en.wikipedia.org/wiki/Gradient_descent>`_ algorithm on a cost function characterizing the
similarity between outputs of the neural network and training data. The gradient of the cost
function can be calculated using `automatic differentiation
<https://en.wikipedia.org/wiki/Automatic_differentiation>`_, with knowledge of the feedforward
network structure.

Quantum neural networks aim to encode neural networks into a quantum system, with the intention of
benefiting from quantum information processing. There have been numerous attempts to define a
quantum neural network, each with varying advantages and disadvantages. The quantum neural network
detailed below, following the work of [[1]_], has a continuous-variable (CV) architecture and is
realized using standard CV gates from Strawberry Fields. One advantage of this CV architecture is
that it naturally accommodates the continuous nature of neural networks. Additionally, the CV
model can easily apply nonlinear transformations using the phase space picture, a task
that qubit-based models struggle with, often relying on measurement postselection, which has a
probability of failure.

Implementation
--------------

A CV quantum neural network layer can be defined as

.. math:: \mathcal{L} := \Phi \circ \mathcal{D} \circ \mathcal{U}_{2} \circ \mathcal{S} \circ \mathcal{U}_{1},

where

* :math:`\mathcal{U}_{k}=U_{k}(\boldsymbol{\theta}_{k},\boldsymbol{\phi}_{k})` is an :math:`N` mode
  interferometer,

* :math:`\mathcal{D}=\otimes_{i=1}^{N}D(\alpha_{i})` is a single mode displacement gate
  (:class:`~strawberryfields.ops.Dgate`) acting on each mode with complex displacement
  :math:`\alpha_{i} \in \mathbb{C}`,

* :math:`\mathcal{S}=\otimes_{i=1}^{N}S(r_{i})` is a single mode squeezing gate
  (:class:`~strawberryfields.ops.Sgate`)
  acting on each mode with squeezing parameter :math:`r_{i} \in \mathbb{R}`, and

* :math:`\Phi=\otimes_{i=1}^{N}\Phi(\lambda_{i})` is a non-Gaussian gate on each mode with parameter
  :math:`\lambda_{i} \in \mathbb{R}`.

.. note::

    Any non-Gaussian gate such as the cubic phase gate (:class:`~strawberryfields.ops.Vgate`)
    represents a valid choice, but we recommend the Kerr gate (:class:`~strawberryfields.ops.Kgate`)
    for simulations in Strawberry Fields. The Kerr gate is more accurate numerically because it is
    diagonal in the Fock basis.

The layer is shown below as a circuit:

:html:`<br>`

.. image:: /tutorials/images/layer.svg
    :align: center
    :width: 70%
    :target: javascript:void(0);

:html:`<br>`

These layers can then be composed to form a quantum neural network. The width of the network can
also be varied between layers [[1]_].

Reproducing classical neural networks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's see how the quantum layer can embed the transformation :math:`\mathcal{L}(\mathbf{x}) =
\varphi (W \mathbf{x} + \mathbf{b})` of a classical neural network layer. Suppose
:math:`N`-dimensional data is encoded in position eigenstates so that

.. math:: \mathbf{x} \Leftrightarrow \ket{\mathbf{x}} := \ket{x_{1}} \otimes \ldots \otimes \ket{x_{N}}.

We want to perform the transformation

.. math:: \ket{\mathbf{x}} \Rightarrow \ket{\varphi (W \mathbf{x} + \mathbf{b})}.

It turns out that the quantum circuit above can do precisely this! Consider first the affine
transformation :math:`W \mathbf{x} + \mathbf{b}`. Leveraging the singular value decomposition, we
can always write :math:`W = O_{2} \Sigma O_{1}` with :math:`O_{k}` orthogonal matrices and
:math:`\Sigma` a positive diagonal matrix. These orthogonal transformations can be carried out using
interferometers without access to phase, i.e., with :math:`\boldsymbol{\phi}_{k} = 0`:

.. math:: U_{k}(\boldsymbol{\theta}_{k},\mathbf{0})\ket{\mathbf{x}} = \ket{O_{k} \mathbf{x}}.

On the other hand, the diagonal matrix :math:`\Sigma = {\rm diag}\left(\{c_{i}\}_{i=1}^{N}\right)`
can be achieved through squeezing:

.. math:: \otimes_{i=1}^{N}S(r_{i})\ket{\mathbf{x}} \propto \ket{\Sigma \mathbf{x}},

with :math:`r_{i} = \log (c_{i})`. Finally, the addition of a bias vector :math:`\mathbf{b}` is done
using position displacement gates:

.. math:: \otimes_{i=1}^{N}D(\alpha_{i})\ket{\mathbf{x}} = \ket{\mathbf{x} + \mathbf{b}},

with :math:`\mathbf{b} = \{\alpha_{i}\}_{i=1}^{N}` and :math:`\alpha_{i} \in \mathbb{R}`. Putting
this all together, we see that the operation :math:`\mathcal{D} \circ \mathcal{U}_{2} \circ
\mathcal{S} \circ \mathcal{U}_{1}` with phaseless interferometers and position displacement performs
the transformation :math:`\ket{\mathbf{x}} \Rightarrow \ket{W \mathbf{x} + \mathbf{b}}` on position
eigenstates.
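
The decomposition can be checked numerically. The following sketch, assuming NumPy and a randomly
chosen square matrix, applies the three steps (orthogonal map, diagonal rescaling, orthogonal map)
followed by a shift, and compares the result with :math:`W \mathbf{x} + \mathbf{b}` computed
directly. It acts on ordinary real vectors, not on quantum states, and all variable names are
illustrative only.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    N = 3
    W = rng.normal(size=(N, N))
    b = rng.normal(size=N)
    x = rng.normal(size=N)

    # Singular value decomposition W = O2 @ diag(c) @ O1
    O2, c, O1 = np.linalg.svd(W)

    # Squeezing parameters, following the relation r_i = log(c_i) quoted above
    r = np.log(c)

    step1 = O1 @ x             # role of the first interferometer
    step2 = np.exp(r) * step1  # squeezing rescales each coordinate by c_i
    step3 = O2 @ step2         # role of the second interferometer
    out = step3 + b            # role of the position displacement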

.. warning::

    The TensorFlow backend is the natural simulator for quantum neural networks in Strawberry
    Fields, but this backend cannot naturally accommodate position eigenstates, which require
    infinite squeezing. For simulation of position eigenstates in this backend, the best approach is
    to use a displaced squeezed state (:class:`prepare_displaced_squeezed_state
    <strawberryfields.backends.tfbackend.TFBackend.prepare_displaced_squeezed_state>`) with high
    squeezing value :math:`r`. However, to avoid significant numerical error, it is important to make sure
    that all initial states have negligible amplitude for Fock states :math:`\ket{n}` with
    :math:`n\geq \texttt{cutoff_dim}`, where :math:`\texttt{cutoff_dim}` is the cutoff dimension.
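
To get a feel for this constraint, the sketch below estimates how much of a squeezed vacuum state
lies below a given cutoff, using the standard photon-number distribution
:math:`P(2n) = \binom{2n}{n} \left(\tfrac{\tanh^{2} r}{4}\right)^{n} / \cosh r`. It assumes zero
displacement for simplicity, and ``mass_below_cutoff`` is a hypothetical helper, not a Strawberry
Fields function.

.. code-block:: python

    import math

    def mass_below_cutoff(r, cutoff_dim):
        # Only even photon numbers 2n are populated in a squeezed vacuum
        t2 = math.tanh(r) ** 2
        total = 0.0
        for n in range((cutoff_dim + 1) // 2):
            total += math.comb(2 * n, n) * (t2 / 4) ** n / math.cosh(r)
        return total

    for r in (0.5, 1.0, 2.0):
        print(r, round(mass_below_cutoff(r, cutoff_dim=6), 4))

A value well below 1 suggests that the cutoff is too small for that amount of squeezing.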

Finally, the nonlinear function :math:`\varphi` can be achieved through a restricted type of
non-Gaussian gates :math:`\otimes_{i=1}^{N}\Phi(\lambda_{i})` acting on each mode (see
[[1]_] for more details), resulting in the transformation

.. math:: \otimes_{i=1}^{N}\Phi(\lambda_{i})\ket{\mathbf{x}} = \ket{\varphi(\mathbf{x})}.

The operation :math:`\mathcal{L} = \Phi \circ \mathcal{D} \circ \mathcal{U}_{2} \circ \mathcal{S}
\circ \mathcal{U}_{1}` with phaseless interferometers, position displacements, and restricted
non-Gaussian gates can hence be seen as enacting a classical neural network layer
:math:`\ket{\mathbf{x}} \Rightarrow \ket{\varphi(W \mathbf{x} + \mathbf{b})}` on position eigenstates.

Extending to quantum neural networks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In fact, CV quantum neural network layers can be made more expressive than their classical
counterparts. We can do this by lifting the above restrictions on :math:`\mathcal{L}`, i.e.:

- Using arbitrary interferometers :math:`U_{k}(\boldsymbol{\theta}_{k},\boldsymbol{\phi}_{k})` with
  access to phase and general displacement gates (i.e., not necessarily position displacement). This
  allows :math:`\mathcal{D} \circ \mathcal{U}_{2} \circ \mathcal{S} \circ \mathcal{U}_{1}` to
  represent a general Gaussian operation.
- Using arbitrary non-Gaussian gates :math:`\Phi(\lambda_{i})`, such as the Kerr gate.
- Encoding data outside of the position eigenbasis, for example using instead the Fock basis.

Moreover, the gates in a single layer form a universal gate set, making the CV quantum neural
network a model for universal quantum computing, i.e., a sufficient number of layers can carry out
any quantum algorithm implementable on a CV quantum computer.

CV quantum neural networks can be trained both through classical simulation and directly on quantum
hardware. Strawberry Fields relies on classical simulation to evaluate cost functions of the CV
quantum neural network and the resultant gradients with respect to parameters of each layer.
However, this becomes an intractable task with increasing network depth and width. Ultimately,
direct evaluation on hardware will likely be necessary for large-scale networks; an approach for
hardware-based training is mapped out in [[2]_]. The `PennyLane
<https://pennylane.readthedocs.io/en/latest/>`_ library provides tools for training hybrid
quantum-classical machine learning models, using both simulators and real-world quantum hardware.

Example CV quantum neural network layers are shown, for one to four modes, below:

:html:`<br>`

.. figure:: /tutorials/images/layer_1mode.svg
    :align: center
    :width: 31%
    :target: javascript:void(0);

    One mode layer

:html:`<br>`


.. figure:: /tutorials/images/layer_2mode.svg
    :align: center
    :width: 46%
    :target: javascript:void(0);

    Two mode layer

:html:`<br>`



.. figure:: /tutorials/images/layer_3mode.svg
    :align: center
    :width: 75%
    :target: javascript:void(0);

    Three mode layer

:html:`<br>`

.. figure:: /tutorials/images/layer_4mode.svg
    :align: center
    :width: 90%
    :target: javascript:void(0);

    Four mode layer

:html:`<br>`

Here, the multimode linear interferometers :math:`U_{1}` and :math:`U_{2}` have been decomposed into
two-mode phaseless beamsplitters (:class:`~strawberryfields.ops.BSgate`) and single-mode phase shifters
(:class:`~strawberryfields.ops.Rgate`) using the Clements decomposition [[3]_]. The Kerr gate is used as
the non-Gaussian gate.

Code
----

First, we import Strawberry Fields, TensorFlow, and NumPy:
"""
import numpy as np
import tensorflow as tf
import strawberryfields as sf
from strawberryfields import ops

######################################################################
# Before we begin defining our optimization problem, let's first create
# some convenient utility functions.
#
# Utility functions
# ~~~~~~~~~~~~~~~~~
#
# The first step to writing a CV quantum neural network layer in Strawberry Fields is to define a
# function for the two interferometers:

def interferometer(params, q):
    """Parameterised interferometer acting on ``N`` modes.

    Args:
        params (list[float]): list of ``max(1, N-1) + (N-1)*N`` parameters.

            * The first ``N(N-1)/2`` parameters correspond to the beamsplitter angles
            * The second ``N(N-1)/2`` parameters correspond to the beamsplitter phases
            * The final ``max(1, N-1)`` parameters correspond to local rotations on the
              first ``max(1, N-1)`` modes

        q (list[RegRef]): list of Strawberry Fields quantum registers the interferometer
            is to be applied to
    """
    N = len(q)
    theta = params[:N*(N-1)//2]
    phi = params[N*(N-1)//2:N*(N-1)]
    rphi = params[-N+1:]

    if N == 1:
        # the interferometer is a single rotation
        ops.Rgate(rphi[0]) | q[0]
        return

    n = 0  # keep track of free parameters

    # Apply the rectangular beamsplitter array
    # The array depth is N
    for l in range(N):
        for k, (q1, q2) in enumerate(zip(q[:-1], q[1:])):
            # skip even or odd pairs depending on layer
            if (l + k) % 2 != 1:
                ops.BSgate(theta[n], phi[n]) | (q1, q2)
                n += 1

    # apply the final local phase shifts to all modes except the last one
    for i in range(max(1, N - 1)):
        ops.Rgate(rphi[i]) | q[i]
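
######################################################################
# As a quick sanity check on the parameter bookkeeping, the sketch below mirrors only the
# loop structure of the rectangular array and applies no gates. The helper
# ``count_beamsplitters`` is hypothetical and exists purely for illustration. For ``N``
# modes, the array should place ``N*(N-1)/2`` beamsplitters:

def count_beamsplitters(N):
    """Count the beamsplitters placed by the rectangular array for ``N`` modes."""
    count = 0
    for l in range(N):
        for k in range(N - 1):
            # same parity rule as in ``interferometer``
            if (l + k) % 2 != 1:
                count += 1
    return count

for num_modes in range(2, 6):
    assert count_beamsplitters(num_modes) == num_modes * (num_modes - 1) // 2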

######################################################################
# .. warning::
#
#     The :class:`~strawberryfields.ops.Interferometer` class in Strawberry Fields does not reproduce
#     the functionality above. Instead, :class:`~strawberryfields.ops.Interferometer` applies a given
#     input unitary matrix according to the Clements decomposition.
#
# Using the above ``interferometer`` function, an :math:`N` mode CV quantum neural network layer is
# given by the function:

def layer(params, q):
    """CV quantum neural network layer acting on ``N`` modes.

    Args:
        params (list[float]): list of length ``2*(max(1, N-1) + N**2 + N)`` containing
            the parameters for the layer
        q (list[RegRef]): list of Strawberry Fields quantum registers the layer
            is to be applied to
    """
    N = len(q)
    M = int(N * (N - 1)) + max(1, N - 1)

    int1 = params[:M]
    s = params[M:M+N]
    int2 = params[M+N:2*M+N]
    dr = params[2*M+N:2*M+2*N]
    dp = params[2*M+2*N:2*M+3*N]
    k = params[2*M+3*N:2*M+4*N]

    # begin layer
    interferometer(int1, q)

    for i in range(N):
        ops.Sgate(s[i]) | q[i]

    interferometer(int2, q)

    for i in range(N):
        ops.Dgate(dr[i], dp[i]) | q[i]
        ops.Kgate(k[i]) | q[i]
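
######################################################################
# The slicing above consumes ``2*M + 4*N`` parameters per layer. The short check below
# (plain Python; ``layer_param_count`` is a hypothetical helper) confirms that this
# agrees with the closed form ``2*(max(1, N-1) + N**2 + N)``:

def layer_param_count(N):
    """Number of parameters consumed by ``layer`` for ``N`` modes."""
    M = N * (N - 1) + max(1, N - 1)  # parameters per interferometer
    return 2 * M + 4 * N             # two interferometers, plus s, dr, dp and k

for num_modes in range(1, 5):
    assert layer_param_count(num_modes) == 2 * (max(1, num_modes - 1) + num_modes**2 + num_modes)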

######################################################################
# Finally, we define one more utility function to help us initialize
# the TensorFlow weights for our quantum neural network layers:

def init_weights(modes, layers, active_sd=0.0001, passive_sd=0.1):
    """Initialize a 2D TensorFlow Variable containing normally-distributed
    random weights for an ``N`` mode quantum neural network with ``L`` layers.

    Args:
        modes (int): the number of modes in the quantum neural network
        layers (int): the number of layers in the quantum neural network
        active_sd (float): the standard deviation used when initializing
            the normally-distributed weights for the active parameters
            (displacement, squeezing, and Kerr magnitude)
        passive_sd (float): the standard deviation used when initializing
            the normally-distributed weights for the passive parameters
            (beamsplitter angles and all gate phases)

    Returns:
        tf.Variable[tf.float32]: A TensorFlow Variable of shape
        ``[layers, 2*(max(1, modes-1) + modes**2 + modes)]``, where the Lth
        row represents the layer parameters for the Lth layer.
    """
    # Number of interferometer parameters:
    M = int(modes * (modes - 1)) + max(1, modes - 1)

    # Create the TensorFlow variables
    int1_weights = tf.random.normal(shape=[layers, M], stddev=passive_sd)
    s_weights = tf.random.normal(shape=[layers, modes], stddev=active_sd)
    int2_weights = tf.random.normal(shape=[layers, M], stddev=passive_sd)
    dr_weights = tf.random.normal(shape=[layers, modes], stddev=active_sd)
    dp_weights = tf.random.normal(shape=[layers, modes], stddev=passive_sd)
    k_weights = tf.random.normal(shape=[layers, modes], stddev=active_sd)

    weights = tf.concat(
        [int1_weights, s_weights, int2_weights, dr_weights, dp_weights, k_weights], axis=1
    )

    weights = tf.Variable(weights)

    return weights

######################################################################
# Optimization
# ~~~~~~~~~~~~
#
# Now that we have our utility functions, let's begin defining our optimization problem.
# In this particular example, let's create a 1 mode CVQNN with 8 layers and a Fock-basis
# cutoff dimension of 6. We will train this QNN to output a desired target state:
# a single photon state.

# set the random seed
tf.random.set_seed(137)
np.random.seed(137)


# define width and depth of CV quantum neural network
modes = 1
layers = 8
cutoff_dim = 6


# defining desired state (single photon state)
target_state = np.zeros(cutoff_dim)
target_state[1] = 1
target_state = tf.constant(target_state, dtype=tf.complex64)


######################################################################
# Now, let's initialize an engine with the TensorFlow ``"tf"`` backend,
# and begin constructing our QNN program.

# initialize engine and program
eng = sf.Engine(backend="tf", backend_options={"cutoff_dim": cutoff_dim})
qnn = sf.Program(modes)

# initialize QNN weights
weights = init_weights(modes, layers) # our TensorFlow weights
num_params = np.prod(weights.shape)   # total number of parameters in our model

######################################################################
# To construct the program, we must create and use Strawberry Fields symbolic
# gate arguments. These will be mapped to the TensorFlow variables on engine
# execution.


# Create array of Strawberry Fields symbolic gate arguments, matching
# the size of the weights Variable.
sf_params = np.arange(num_params).reshape(weights.shape).astype(str)
sf_params = np.array([qnn.params(*i) for i in sf_params])


# Construct the symbolic Strawberry Fields program by
# looping and applying layers to the program.
with qnn.context as q:
    for k in range(layers):
        layer(sf_params[k], q)


######################################################################
# Here, ``sf_params`` is an array of shape ``[layers, 2*(max(1, modes-1) + modes**2 + modes)]``
# containing the symbolic gate arguments for the quantum neural network.
#
# Now that our QNN program is defined, we can create our **cost function**.
# Our cost function simply executes the QNN on our engine using the values of the
# input weights.
#
# Since we want to maximize the fidelity :math:`f(w) = |\langle \psi(w) | \psi_t\rangle|^2`
# between our QNN output state :math:`|\psi(w)\rangle` and our target state
# :math:`|\psi_t\rangle`, we compute the inner product between the two statevectors,
# as well as the sum of absolute differences between their Fock-basis amplitudes
# (the :math:`\ell_1` norm :math:`\left\lVert \psi(w) - \psi_t\right\rVert_1`), which is
# the quantity we minimize.
#
# Finally, we also return the trace of the output QNN state. This should always
# have a value close to 1. If it deviates significantly from 1, this is an
# indication that we need to increase our Fock-basis cutoff.

def cost(weights):
    # Create a dictionary mapping from the names of the Strawberry Fields
    # symbolic gate parameters to the TensorFlow weight values.
    mapping = {p.name: w for p, w in zip(sf_params.flatten(), tf.reshape(weights, [-1]))}

    # run the engine
    state = eng.run(qnn, args=mapping).state
    ket = state.ket()

    difference = tf.reduce_sum(tf.abs(ket - target_state))
    fidelity = tf.abs(tf.reduce_sum(tf.math.conj(ket) * target_state)) ** 2
    return difference, fidelity, ket, tf.math.real(state.trace())
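
######################################################################
# To make the first two returned quantities concrete, here is a NumPy-only sketch (no
# engine involved; ``overlap_metrics`` is a hypothetical helper) that evaluates the same
# expressions for hand-written state vectors:

import numpy as np

def overlap_metrics(ket_vec, target_vec):
    """Return the summed absolute difference and the fidelity of two state vectors."""
    ket_vec = np.asarray(ket_vec, dtype=complex)
    target_vec = np.asarray(target_vec, dtype=complex)
    difference = np.sum(np.abs(ket_vec - target_vec))
    fidelity = np.abs(np.sum(np.conj(ket_vec) * target_vec)) ** 2
    return difference, fidelity

# an equal superposition of vacuum and one photon, compared with a single photon
demo_difference, demo_fidelity = overlap_metrics([1 / np.sqrt(2), 1 / np.sqrt(2), 0], [0, 1, 0])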


######################################################################
# We are now ready to minimize our cost function using TensorFlow:

# set up the optimizer
opt = tf.keras.optimizers.Adam()
cost_before, fidelity_before, _, _ = cost(weights)

# Perform the optimization
for i in range(1000):
    # reset the engine if it has already been executed
    if eng.run_progs:
        eng.reset()

    with tf.GradientTape() as tape:
        loss, fid, ket, trace = cost(weights)

    # one repetition of the optimization
    gradients = tape.gradient(loss, weights)
    opt.apply_gradients(zip([gradients], [weights]))

    # Prints progress at every rep
    if i % 1 == 0:
        print("Rep: {} Cost: {:.4f} Fidelity: {:.4f} Trace: {:.4f}".format(i, loss, fid, trace))


print("\nFidelity before optimization: ", fidelity_before.numpy())
print("Fidelity after optimization: ", fid.numpy())
print("\nTarget state: ", target_state.numpy())
print("Output state: ", np.round(ket.numpy(), decimals=3))


######################################################################
# For more applications of CV quantum neural networks, see the :doc:`state learning </demos/run_state_learner>`
# and :doc:`gate synthesis </demos/run_gate_synthesis>` demonstrations.
#
# References
# ----------
#
# .. [1] Nathan Killoran, Thomas R Bromley, Juan Miguel Arrazola, Maria Schuld, Nicolás Quesada, and
#        Seth Lloyd. Continuous-variable quantum neural networks. arXiv preprint arXiv:1806.06871,
#        2018.
#
# .. [2] Maria Schuld, Ville Bergholm, Christian Gogolin, Josh Izaac, and Nathan Killoran. Evaluating
#        analytic gradients on quantum hardware. Physical Review A, 99(3):032331, 2019.
#
# .. [3] William R Clements, Peter C Humphreys, Benjamin J Metcalf, W Steven Kolthammer, and Ian A
#        Walmsley. Optimal design for universal multiport interferometers. Optica, 3(12):1460–1465,
#        2016. doi:10.1364/OPTICA.3.001460.