{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# This cell is added by sphinx-gallery\n# It can be customized to whatever you like\n%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n\nPoint processes\n===============\n\n*Technical details are available in the API documentation:* :doc:`code/api/strawberryfields.apps.points`\n\nThis section shows how to generate GBS point process samples and use them to detect outlier\npoints in a data set. Point processes are models for generating random point patterns and can be\nuseful in machine learning, providing a source of randomness with\npreference towards both diversity [[#kulesza2012determinantal]_] and similarity in data. GBS\ndevices can be programmed to operate as special types of point processes that generate clustered\nrandom point patterns [[#jahangiri2019point]_].\n\nThe probability of generating a specific pattern of points in GBS point processes depends on\nmatrix functions of a kernel matrix $K$ that describes the similarity between the points.\nMatrix functions that appear in GBS point processes are typically\n`permanents <https://en.wikipedia.org/wiki/Permanent_(mathematics)>`__ and\n`hafnians <https://the-walrus.readthedocs.io/en/latest/hafnian.html>`__. Here we use\nthe permanental point process, in which the probability of observing a pattern of points $S$\ndepends on the permanent of their corresponding kernel submatrix $K_S$ as\n[[#jahangiri2019point]_]:\n\n\\begin{align}\\mathcal{P}(S) = \\frac{1}{\\alpha(S)}\\text{per}(K_S),\\end{align}\n\nwhere $\\alpha$ is a normalization function that depends on $S$ and the average number\nof points. Let's look at a simple example to better understand the permanental point process.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We first import the modules we need. Note that the :mod:`~strawberryfields.apps.points` module has most of\nthe core functionalities exploring point processes.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\nimport plotly\nfrom sklearn.datasets import make_blobs\nfrom strawberryfields.apps import points, plot"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We define a space where the GBS point process patterns are generated. This\nspace is referred to as the state space and is defined by a set of points. The\npoint process selects a subset of these points in each sample. Here we create\na 20 $\\times$ 20 square grid of points.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "R = np.array([(i, j) for i in range(20) for j in range(20)])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The rows of R are the coordinates of the points.\n\nNext step is to create the kernel matrix for the points of this discrete space. We call the\n:func:`~strawberryfields.apps.points.rbf_kernel` function which uses the *radial basis function*\n(RBF) kernel defined as:\n\n\\begin{align}K_{i,j} = e^{-\\|\\bf{r}_i-\\bf{r}_j\\|^2/2\\sigma^2},\\end{align}\n\nwhere $\\bf{r}_i$ are the coordinates of point $i$ and $\\sigma$ is a kernel\nparameter that determines the scale of the kernel.\n\nIn the RBF kernel, points that are much further than a distance $\\sigma$ from each other\nlead to small entries of the kernel matrix, whereas points much closer than $\\sigma$\ngenerate large entries. Now consider a specific point pattern in which all points\nare close to each other, which simply means that their matrix elements have larger entries. The\npermanent of a matrix is a sum over the product of some matrix entries. Therefore,\nthe submatrix that corresponds to those points has a large permanent and the probability of\nobserving them in a sample is larger.\n\nFor kernel matrices that are positive-semidefinite, such as the RBF kernel, there exist efficient\nquantum-inspired classical algorithms for permanental point process sampling\n[[#jahangiri2019point]_]. In this tutorial we use positive-semidefinite kernels and the\nquantum-inspired classical algorithm.\n\nLet's construct the RBF kernel with the parameter $\\sigma$ set to 2.5.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "K = points.rbf_kernel(R, 2.5)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We generate 10 samples with an average number of 50 points per sample by calling\nthe :func:`~strawberryfields.apps.points.sample` function of the :mod:`~strawberryfields.apps.points` module.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "samples = points.sample(K, 50.0, 10)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We visualize the first sample by using the :func:`~strawberryfields.apps.plot.points` function of\nthe :mod:`~strawberryfields.apps.plot` module. The point patterns generated by the permanental point process\nusually have a higher degree of clustering compared to a uniformly random pattern.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "plot.points(R, samples[0], point_size=10)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Outlier Detection\n-----------------\n\nWhen the distribution of points in a given space is inhomogeneous, GBS point processes\nsample points from the dense regions with higher probability. This feature of the GBS point\nprocesses can be used to detect outlier points in a data set. In this example, we create two\ndense clusters and place them in a two-dimensional space containing some randomly distributed\npoints in the background. We consider the random background points as outliers to the clustered\npoints and show that the permanental point process selects points from the dense clusters with\na higher probability.\n\nWe first create the data points. The clusters have 50 points each and the points have a\nstandard deviation of 0.3. The clusters are centered at $[x = 2, y = 2]$ and $[x = 4,\ny = 4]$, respectively. We also add 25 randomly generated points to the data set.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "clusters = make_blobs(n_samples=100, centers=[[2, 2], [4, 4]], cluster_std=0.3)[0]\n\nnoise = np.random.rand(25, 2) * 6.0\n\nR = np.concatenate((clusters, noise))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Then construct the kernel matrix and generate 10000 samples.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "K = points.rbf_kernel(R, 1.0)\n\nsamples = points.sample(K, 10.0, 10000)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We obtain the indices of 100 points that appear most frequently in the permanental point\nprocess samples and visualize them. The majority of the commonly appearing points belong\nto the clusters and the points that do not appear frequently are the outlier points. Note that\nsome of the background points might overlap with the clusters.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "gbs_frequent_points = np.argsort(np.sum(samples, axis=0))[-100:]\n\nplot.points(\n    R, [1 if i in gbs_frequent_points else 0 for i in range(len(samples[0]))], point_size=10\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The two-dimensional examples considered here can be easily extended to higher dimensions. The\nGBS point processes retain their clustering property in higher dimensions but visual inspection\nof this clustering feature might not be very straightforward.\n\nGBS point processes can potentially be used in other applications such as clustering data points\nand finding correlations in time series data. Can you design your own example for using GBS point\nprocesses in a new application?\n\nReferences\n----------\n\n.. [#kulesza2012determinantal]\n\n    Alex Kulesza, Ben Taskar, and others. Determinantal point processes for machine learning.\n    Foundations and Trends\u00ae in Machine Learning, 5(2\u20133):123\u2013286, 2012.\n\n.. [#jahangiri2019point]\n\n    Soran Jahangiri, Juan Miguel Arrazola, Nicol\u00e1s Quesada, and Nathan Killoran. Point processes with\n    gaussian boson sampling. 2019. arXiv:1906.11972.\n\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.17"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}