{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# This cell is added by sphinx-gallery\n# It can be customized to whatever you like\n%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n\nTraining variational GBS distributions\n======================================\n\n*Technical details are available in the API documentation:*\n:doc:`code/api/strawberryfields.apps.train`\n\nMany quantum algorithms rely on the ability to train the parameters of quantum circuits, a\nstrategy inspired by the success of neural networks in machine learning. Training is often\nperformed by evaluating gradients of a cost function with respect to circuit parameters,\nthen employing gradient-based optimization methods. In this demonstration, we outline the\ntheoretical principles for training Gaussian Boson Sampling (GBS) circuits, first\nintroduced in Ref. [[#banchi2020training]_]. We then explain how to employ Strawberry Fields\nto perform the training by looking at basic examples in stochastic optimization and unsupervised\nlearning. Let's go! \ud83d\ude80\n\nTheory\n------\n\nAs explained in more detail in :doc:`/concepts/gbs`, for a GBS device, the probability\n$\\Pr(S)$ of observing an output $S=(s_1, s_2, \\ldots, s_m)$, where $s_i$ is the\nnumber of photons detected in the $i$-th mode, is given by\n\n\\begin{align}\\Pr(S) = \\frac{1}{\\mathcal{N}} \\frac{|\\text{Haf}(A_S)|^2}{\n s_1!\\ldots s_m!},\\end{align}\n\nwhere $\\mathcal{N}$ is a normalization constant, $A$ is an arbitrary symmetric\nmatrix with eigenvalues bounded between $-1$ and $1$, and $A_S$ is the submatrix of $A$\nobtained by repeating its $i$-th row and column $s_i$ times. The matrix\n$A$ can also be rescaled by a constant factor, which is equivalent to fixing a total\nmean photon number in the distribution.\n\nWe want to *train* this distribution to perform a specific task. For example, we may wish to\nreproduce the statistical properties of a given dataset, or to optimize the circuit to sample\nspecific patterns with high probability. The strategy is to identify a parametrization of\nthe distribution and compute gradients that allow us to optimize the parameters using\ngradient-based techniques. We refer to these circuits as variational GBS circuits, or VGBS for short.\n\nGradient formulas are generally challenging to calculate, but there is a particular\nstrategy known as the WAW parametrization (wow! what a great name) that leads to gradients that\nare simpler to compute. It involves transforming the symmetric matrix $A$ as\n\n\\begin{align}A \\rightarrow A_W = W A W,\\end{align}\n\nwhere $W = \\text{diag}(\\sqrt{w_1}, \\sqrt{w_2}, \\ldots, \\sqrt{w_m})$ is a diagonal weight matrix and $m$\nis the number of modes. This parametrization is useful because the hafnian of $A_W$\nfactorizes into two separate components\n\n\\begin{align}\\text{Haf}(A_W) = \\text{Haf}(A)\\text{det}(W),\\end{align}\n\na property that can be cleverly exploited to compute gradients more efficiently. More broadly,\nit is convenient to embed trainable parameters $\\theta = (\\theta_1, \\ldots, \\theta_d)$ into\nthe weights $w_k$. Several choices are possible, but here we focus on an exponential embedding\n\n\\begin{align}w_k = \\exp(-\\theta^T f^{(k)}),\\end{align}\n\nwhere each $f^{(k)}$ is a $d$-dimensional vector. The simplest case is obtained by setting\n$d=m$ and choosing these vectors to satisfy $\\theta^T f^{(k)} = \\theta_k$ such that\n$w_k = \\exp(-\\theta_k)$.\n\nTraining tasks\n^^^^^^^^^^^^^^\n\nIn stochastic optimization, we are given a function $h(S)$ and the goal is to optimize the\nparameters to sample from a distribution $P_{\\theta}(S)$ that minimizes the\nexpectation value\n\n\\begin{align}C (\\theta) = \\sum_{S} h(S) P_{\\theta}(S).\\end{align}\n\nAs shown in [[#banchi2020training]_], the gradient of the cost function $C (\\theta)$ is given\nby\n\n\\begin{align}\\partial_{\\theta} C (\\theta) = \\sum_{S} h(S) P_{\\theta}(S)\n \\sum_{k=1}^{m} (s_k - \\langle s_{k} \\rangle) \\partial_{\\theta} \\log w_{k},\\end{align}\n\nwhere $\\langle s_k\\rangle$ denotes the average photon number in mode $k$. This gradient is an\nexpectation value with respect to the GBS distribution, so it can be estimated by generating\nsamples from the device. Convenient!\n\nIn a standard unsupervised learning scenario, data are assumed to be sampled from an unknown\ndistribution and a common goal is to learn that distribution. More precisely, the goal is\nto use the data to train a model that can sample from a similar distribution, thus being able to\ngenerate new data. Training can be performed by minimizing the Kullback-Leibler (KL) divergence,\nwhich up to additive constants can be written as:\n\n\\begin{align}KL(\\theta) = -\\frac{1}{T}\\sum_S \\log[P_{\\theta}(S)].\\end{align}\n\nIn this case $S$ is an element of the data, $P_{\\theta}(S)$ is the probability of observing that\nelement when sampling from the GBS distribution, and $T$ is the total number of elements\nin the data. For the GBS distribution in the WAW parametrization, the gradient of the KL\ndivergence can be written as\n\n\\begin{align}\\partial_\\theta KL(\\theta) = - \\sum_{k=1}^m\\frac{1}{w_k}(\\langle s_k\\rangle_{\\text{data}}-\n \\langle s_k\\rangle_{\\text{GBS}})\\partial_\\theta w_k.\\end{align}\n\nRemarkably, this gradient can be evaluated without a quantum computer because the\nmean photon numbers $\\langle s_k\\rangle_{\\text{GBS}}$ over the GBS distribution can be\nefficiently computed classically [[#banchi2020training]_]. As we'll show later, this leads to\nvery fast training. This is true even if sampling the distribution remains classically\nintractable! \ud83e\udd2f\n\nStochastic optimization\n-----------------------\nWe're ready to start using Strawberry Fields to train GBS distributions. The main functions needed\ncan be found in the :mod:`~strawberryfields.apps.train` module, so let's start by importing it.\n"
]
},
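{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check of the WAW parametrization (a minimal numerical sketch with made-up parameter values, not part of the original demonstration), we can verify $\\text{Haf}(A_W) = \\text{Haf}(A)\\text{det}(W)$ for two modes, using the fact that the hafnian of a $2 \\times 2$ symmetric matrix is simply its off-diagonal entry:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import math\n\n# Hypothetical trainable parameters for a two-mode example\ntheta = [0.3, 0.7]\nw = [math.exp(-t) for t in theta]  # exponential embedding w_k = exp(-theta_k)\n\nA12 = 0.5  # off-diagonal entry of a 2x2 symmetric matrix A\nhaf_A = A12  # Haf([[a, b], [b, c]]) = b\ndet_W = math.sqrt(w[0] * w[1])  # det(W) for W = diag(sqrt(w_1), sqrt(w_2))\nhaf_AW = math.sqrt(w[0]) * A12 * math.sqrt(w[1])  # off-diagonal entry of W A W\n\nprint(abs(haf_AW - haf_A * det_W) < 1e-12)  # True: the hafnian factorizes"
]
},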
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from strawberryfields.apps import train"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We explore a basic example where the goal is to optimize the distribution to favour photons\nappearing in a specific subset of modes, while minimizing the number of photons in the\nremaining modes. This can be achieved with the following cost function\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n\ndef h(s):\n not_subset = [k for k in range(len(s)) if k not in subset]\n return sum(s[not_subset]) - sum(s[subset])"
]
},
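{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the behaviour of the cost function concrete, here is a small pure-Python version of the same computation, evaluated on a made-up five-mode sample (illustrative values only): photons outside ``subset`` add to the cost, while photons inside it are subtracted.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Pure-Python sketch of the cost on a hypothetical 5-mode sample\nsubset = [0, 1, 2]\nsample = [2, 1, 1, 0, 3]  # 4 photons inside the subset, 3 outside\nnot_subset = [k for k in range(len(sample)) if k not in subset]\ncost_value = sum(sample[k] for k in not_subset) - sum(sample[k] for k in subset)\nprint(cost_value)  # (0 + 3) - (2 + 1 + 1) = -1"
]
},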
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The function is defined with respect to the subset of modes for which we want to\nobserve many photons. This subset can be specified later on. Then, for a given sample ``s``,\nwe want to *maximize* the total number of photons in the subset. This can be achieved by\nminimizing its negative value, hence the term ``-sum(s[subset])``. Similarly, for modes\noutside of the specified subset, we want to minimize their total sum, which explains the\nappearance of ``sum(s[not_subset])``.\n\nIt's now time to define the variational circuit. We'll train a distribution based on\na simple lollipop \ud83c\udf6d graph with five nodes:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import networkx as nx\nfrom strawberryfields.apps import plot\n\ngraph = nx.lollipop_graph(3, 2)\nA = nx.to_numpy_array(graph)\nplot.graph(graph)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Defining a variational GBS circuit consists of three steps: (i) specifying the embedding,\n(ii) building the circuit, and (iii) defining the cost function with respect to the circuit and\nembedding. We'll go through each step one at a time. For the embedding of trainable parameters,\nwe'll use the simple form $w_k = \\exp(-\\theta_k)$ outlined above, which can be accessed\nthrough the :class:`~strawberryfields.apps.train.Exp` class. Its only input is the number of\nmodes in the device, which is equal to the number of nodes in the graph.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"nr_modes = len(A)\nweights = train.Exp(nr_modes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Easy! The GBS distribution is determined by the symmetric matrix $A$ --- which we\ntrain using the WAW parametrization --- and by the total mean photon number. There is freedom\nin choosing $A$, but here we'll just use the graph's adjacency matrix. The total\nmean photon number is a hyperparameter of the distribution: in general, different choices may\nlead to different results in training. In fact, the mean photon number may change during\ntraining as a consequence of the weights being optimized. Finally, GBS devices can operate either\nwith photon number-resolving detectors or threshold detectors, so there is an option to specify\nwhich one we intend to use. We'll stick to detectors that can count photons.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"n_mean = 6\nvgbs = train.VGBS(A, n_mean, weights, threshold=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The last step before training is to define the cost function with respect to our\nprevious choices. Since this is a stochastic optimization task, we employ the\n:class:`~strawberryfields.apps.train.Stochastic` class and input our previously defined cost\nfunction ``h``.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"cost = train.Stochastic(h, vgbs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"During training, we'll calculate gradients and evaluate the average of this cost function with\nrespect to the GBS distribution. Both of these actions require estimating expectation values, so\nthe number of samples in the estimation also needs to be specified. The parameters also need to be\ninitialized. There is freedom in this choice, but here we'll set them all to zero. Finally, we'll\naim to increase the number of photons in the \"candy\" part of the lollipop graph,\nwhich corresponds to the subset of modes ``[0, 1, 2]``.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"np.random.seed(1969) # for reproducibility\nd = nr_modes\nparams = np.zeros(d)\nsubset = [0, 1, 2]\nnr_samples = 100\n\nprint('Initial mean photon numbers = ', vgbs.mean_photons_by_mode(params))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If training is successful, we should see the mean photon numbers of the first three modes\nincreasing, while those of the last two modes become close to zero. We perform training over\n200 steps of gradient descent with a learning rate of 0.01:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"nr_steps = 200\nrate = 0.01\n\nfor i in range(nr_steps):\n params -= rate * cost.grad(params, nr_samples)\n if i % 50 == 0:\n print('Cost = {:.3f}'.format(cost.evaluate(params, nr_samples)))\n\nprint('Final mean photon numbers = ', vgbs.mean_photons_by_mode(params))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Great! The cost function decreases smoothly and there is a clear increase in the mean photon\nnumbers of the target modes, with a corresponding decrease in the remaining modes.\n\nThe transformed matrix $A_W = W A W$ also needs to have eigenvalues bounded between\n$-1$ and $1$, so continuing training indefinitely can lead to unphysical distributions when\nthe weights become too large. It's important to monitor this behaviour. We can confirm that the\ntrained model is behaving according to plan by generating some samples. Although we still\nobserve a couple of photons in the last two modes, most of the detections happen in the first\nthree modes that we are targeting, just as intended.\n\n"
]
},
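{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make this check concrete, here is a small sketch (with made-up weights, not the values from the training above) that computes the eigenvalues of $A_W = W A W$ directly and confirms they remain strictly between $-1$ and $1$:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n\nA_toy = np.array([[0.0, 0.4], [0.4, 0.0]])  # toy symmetric matrix\nw_toy = np.array([1.5, 2.0])  # illustrative trained weights\nW = np.diag(np.sqrt(w_toy))\nA_W = W @ A_toy @ W\neigs = np.linalg.eigvalsh(A_W)\nprint(np.all(np.abs(eigs) < 1))  # the distribution stays physical while this holds"
]
},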
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"Aw = vgbs.A(params)\nsamples = vgbs.generate_samples(Aw, n_samples=10)\nprint(samples)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unsupervised learning\n---------------------\nWe are going to train a circuit based on the pre-generated datasets in\nthe :mod:`~strawberryfields.apps.data` module. Gradients in this setting can be calculated\nefficiently, so we can study a larger graph with 30 nodes.\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from strawberryfields.apps import data\n\ndata_pl = data.Planted()\ndata_samples = data_pl[:1000] # we use only the first one thousand samples\nA = data_pl.adj\nnr_modes = len(A)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As before, we use the exponential embedding, but this time we define the VGBS circuit in\nthreshold mode, since this is how the data samples were generated. The cost function is the\nKullback-Leibler divergence, which depends on the data samples and can be accessed\nusing :class:`~strawberryfields.apps.train.KL`:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"weights = train.Exp(nr_modes)\nn_mean = 1\nvgbs = train.VGBS(A, n_mean, weights, threshold=True)\ncost = train.KL(data_samples, vgbs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We initialize parameters to zero and perform a longer optimization over one thousand\nsteps with a learning rate of 0.15. This will allow us to reach a highly trained model.\nGradients can be computed efficiently, but evaluating the cost function is challenging\nbecause it requires calculating GBS probabilities, which generally take exponential time.\nInstead, we'll keep track of the differences in the mean number of clicks per mode between\nthe data and model distributions.\n\n"
]
},
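{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see why these gradients are so cheap, note that for the exponential embedding $w_k = \\exp(-\\theta_k)$ we have $\\partial_{\\theta_k} w_k = -w_k$, so the KL gradient formula from the theory section reduces to the per-mode difference of mean photon (or click) numbers. A pure-Python sketch with hypothetical values:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import math\n\n# Hypothetical per-mode mean photon numbers for the data and the model\nmean_data = [0.8, 0.5, 0.3]\nmean_gbs = [0.6, 0.7, 0.3]\n\ntheta = [0.1, 0.2, 0.3]\nw = [math.exp(-t) for t in theta]\ndw = [-wk for wk in w]  # d w_k / d theta_k = -w_k\n\n# Gradient formula: -(1/w_k) (<s_k>_data - <s_k>_GBS) dw_k\ngrad = [-(1 / w[k]) * (mean_data[k] - mean_gbs[k]) * dw[k] for k in range(3)]\nprint(grad)  # simplifies to <s_k>_data - <s_k>_GBS for each mode"
]
},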
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from numpy.linalg import norm\n\nparams = np.zeros(nr_modes)\nsteps = 1000\nrate = 0.15\n\nfor i in range(steps):\n params -= rate * cost.grad(params)\n if i % 100 == 0:\n diff = cost.mean_n_data - vgbs.mean_clicks_by_mode(params)\n print('Norm of difference = {:.5f}'.format(norm(diff)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wow! WAW! We reach almost perfect agreement between the data and the trained model. We can also\ngenerate a few samples, this time creating photon patterns that need not\nappear in the original training data.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"Aw = vgbs.A(params)\nsamples = vgbs.generate_samples(Aw, n_samples=10)\nprint(samples)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The examples we have covered are introductory tasks aimed at mastering the basics of training\nvariational GBS circuits. These ideas are new and there is much left to explore in terms of\ntheir scope and extensions. For example, the original paper [[#banchi2020training]_] studies how\nGBS devices can be trained to find solutions to max clique problems. What new applications come\nto your mind?\n\nReferences\n----------\n\n.. [#banchi2020training]\n\n Leonardo Banchi, Nicol\u00e1s Quesada, and Juan Miguel Arrazola. Training Gaussian Boson\n Sampling Distributions. arXiv:2004.04770. 2020.\n\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}