# pylint: disable=wrong-import-position,wrong-import-order,ungrouped-imports,invalid-name
"""
.. _apps-sim-tutorial:
Graph similarity
================
*Technical details are available in the API documentation:*
:doc:`code/api/strawberryfields.apps.similarity`
This page looks at how to use GBS to construct a similarity measure between graphs,
known as a graph kernel [[#schuld2019quantum]_]. Kernels can be applied to graph-based
data for machine learning tasks such as classification using a support vector machine.
Graph data
----------
We begin by fixing a dataset of graphs to consider and loading GBS samples from these graphs,
which will be needed in the following.
Let's use the MUTAG dataset of graphs [[#debnath1991structure]_][[#kriege2012subgraph]_]. This is a
dataset of 188 different graphs that each correspond to the structure of a chemical compound. Our
goal is to use GBS samples from these graphs to measure their similarity.
The :mod:`~strawberryfields.apps.data` module provides pre-calculated GBS samples for selected
graphs in the MUTAG dataset. Each set of samples is generated by encoding the graph into a GBS
device, and collecting photon click events. We'll start by loading four sets of samples and
visualizing the corresponding graphs.
"""
from strawberryfields.apps import data, plot, similarity
m0 = data.Mutag0()
m1 = data.Mutag1()
m2 = data.Mutag2()
m3 = data.Mutag3()
##############################################################################
# These datasets contain both the adjacency matrix of the graph and the samples generated through
# GBS. We can access the adjacency matrix through:
m0_a = m0.adj
m1_a = m1.adj
m2_a = m2.adj
m3_a = m3.adj
##############################################################################
# Samples from these graphs can be accessed by indexing:
print(m0[0])
##############################################################################
# We can now plot the four graphs using the :mod:`~strawberryfields.apps.plot` module. To use
# this module, we need to convert the adjacency matrices into NetworkX Graphs:
import networkx as nx
import plotly
##############################################################################
# Plotting ``m0_a``:
plot.graph(nx.Graph(m0_a))
##############################################################################
# Plotting ``m1_a``:
plot.graph(nx.Graph(m1_a))
##############################################################################
# Plotting ``m2_a``:
plot.graph(nx.Graph(m2_a))
##############################################################################
# Plotting ``m3_a``:
plot.graph(nx.Graph(m3_a))
##############################################################################
# The graphs of ``m1_a`` and ``m2_a`` look very similar. In fact,
# it turns out that they are *isomorphic* to each other, which means that the graphs can be made
# identical by permuting their node labels.
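##############################################################################
# To make the notion of isomorphism concrete: two adjacency matrices describe
# isomorphic graphs when one can be obtained from the other by conjugating
# with a permutation matrix. A small sketch on toy matrices (not the MUTAG
# data):

```python
import numpy as np

# Adjacency matrix of a toy path graph 0 - 1 - 2
A1 = np.array([[0, 1, 0],
               [1, 0, 1],
               [0, 1, 0]])

# Permutation matrix that swaps the labels of nodes 0 and 1
P = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])

# Relabelling the nodes gives the adjacency matrix of an isomorphic graph
A2 = P @ A1 @ P.T
print(A2)
```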
##############################################################################
# Creating a feature vector
# -------------------------
#
# Following [#schuld2019quantum]_, we can create a *feature vector* to describe each graph.
# These feature vectors contain information about the graphs and can be viewed as a mapping to a
# high-dimensional feature space, a technique often used in machine learning that allows us to
# employ properties of the feature space to separate and classify the vectors.
#
# The feature vector of a graph can be composed in a variety of ways. One approach is to
# associate features with the relative frequencies of certain types of measurements being
# recorded from a GBS device configured to sample from the graph, as we now discuss.
#
# We begin by defining the concept of an *orbit*, which is the set of all GBS samples that are
# equivalent under permutation of the modes. A sample can be converted to its corresponding orbit
# using the :func:`~strawberryfields.apps.similarity.sample_to_orbit` function. For example, the
# first sample of ``m0`` is ``[0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]`` and has orbit:
print(similarity.sample_to_orbit(m0[0]))
##############################################################################
# Here, ``[1, 1]`` means that two photons were detected, each in a separate mode. Other samples
# can be randomly generated from the ``[1, 1]`` orbit using:
print(similarity.orbit_to_sample([1, 1], modes=m0.modes))
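##############################################################################
# Conceptually, converting a sample to its orbit just keeps the nonzero photon
# counts and sorts them in descending order. A minimal sketch of this idea
# (the ``to_orbit`` helper below is illustrative, not the library code):

```python
def to_orbit(sample):
    """Return the orbit of a sample: its nonzero photon counts,
    sorted in descending order."""
    return sorted((count for count in sample if count > 0), reverse=True)

print(to_orbit([0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]))  # [1, 1]
print(to_orbit([0, 3, 0, 1]))  # [3, 1]
```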
##############################################################################
# Orbits provide a useful way to coarse-grain the samples from GBS into outcomes that are
# statistically more likely to be observed. However, we are interested in coarse-graining further
# into *events*, which correspond to a combination of orbits with the same photon number such
# that the number of photons counted in each mode does not exceed a fixed value
# ``max_count_per_mode``. To understand this, let's look at all of the orbits with a photon
# number of 5:
print(list(similarity.orbits(5)))
##############################################################################
# All 5-photon samples belong to one of the orbits above. A 5-photon event with
# ``max_count_per_mode = 3`` means that we include the orbits: ``[[1, 1, 1, 1, 1], [2, 1, 1, 1],
# [3, 1, 1], [2, 2, 1], [3, 2]]`` and ignore the orbits ``[[4, 1], [5]]``. For example,
# the sample ``[0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 3, 0, 0, 0]`` is a 5-photon event:
print(similarity.sample_to_event([0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 3, 0, 0, 0], 3))
##############################################################################
# Samples with more than ``max_count_per_mode`` in any mode are not counted as part of the event:
print(similarity.sample_to_event([0, 4, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0], 3))
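##############################################################################
# The event rule itself is simple enough to sketch: a sample maps to the event
# labelled by its total photon number, unless some mode exceeds
# ``max_count_per_mode``, in which case it is excluded. An illustrative
# ``to_event`` helper (not the library implementation):

```python
def to_event(sample, max_count_per_mode):
    """Return the total photon number of a sample, or None when any
    mode contains more than max_count_per_mode photons."""
    if max(sample) > max_count_per_mode:
        return None
    return sum(sample)

print(to_event([0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 3, 0, 0, 0], 3))  # 5
print(to_event([0, 4, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0], 3))  # None
```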
##############################################################################
# Now that we have mastered orbits and events, how can we make a feature vector? It was shown in
# [#schuld2019quantum]_ that one way of making a feature vector of a graph is through the
# frequencies of orbits or events. For example, for a :math:`k`-photon event :math:`E_{k, n_{\max}}`
# with maximum count per mode :math:`n_{\max}` and corresponding probability :math:`p_{k,
# n_{\max}}:=p_{E_{k, n_{\max}}}(G)` with respect to a graph :math:`G`, a feature vector can be
# written as
#
# .. math::
#     f_{\mathbf{k}, n_{\max}} = (p_{k_{1}, n_{\max}}, p_{k_{2}, n_{\max}}, \ldots, p_{k_{K}, n_{\max}}),
#
# where :math:`\mathbf{k} := (k_{1}, k_{2}, \ldots , k_{K})` is a list of different total photon
# numbers.
#
# For example, if :math:`\mathbf{k} := (2, 4, 6, 8)` and :math:`n_{\max} = 2`, we have
#
# .. math::
#     f_{(2, 4, 6, 8), 2} = (p_{2, 2}, p_{4, 2}, p_{6, 2}, p_{8, 2}).
#
# In this case, we are interested in the probabilities of the events :math:`E_{2, 2}`,
# :math:`E_{4, 2}`, :math:`E_{6, 2}`, and :math:`E_{8, 2}`. Suppose we are sampling from a
# four-mode device and have the samples
# ``[0, 3, 0, 1]`` and ``[1, 2, 0, 1]``. These samples are part of the orbits ``[3, 1]`` and
# ``[2, 1, 1]``, respectively. However, ``[3, 1]`` is not part of the :math:`E_{4, 2}` event while
# ``[2, 1, 1]`` is.
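##############################################################################
# Putting these pieces together, event probabilities can be estimated from raw
# samples by simple counting. A toy sketch with hand-picked four-mode samples
# and hypothetical helpers mirroring the logic above:

```python
def to_event(sample, max_count_per_mode):
    """Map a sample to its event's photon number, or None if excluded."""
    return sum(sample) if max(sample) <= max_count_per_mode else None

# Hand-picked four-mode samples (toy data, not GBS output)
samples = [[1, 1, 0, 0], [0, 3, 0, 1], [1, 2, 0, 1], [2, 0, 0, 0], [1, 1, 1, 1]]

def feature_vector(samples, photon_numbers, max_count_per_mode):
    """Relative frequency of each event among the samples."""
    events = [to_event(s, max_count_per_mode) for s in samples]
    return [events.count(k) / len(samples) for k in photon_numbers]

print(feature_vector(samples, [2, 4], 2))  # [0.4, 0.4]
```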
#
# Calculating a feature vector
# ----------------------------
#
# We provide three methods for calculating a feature vector in the
# :mod:`~strawberryfields.apps.similarity` module of Strawberry Fields:
#
# 1. Through sampling.
# 2. Using exact probability calculations.
# 3. Using a Monte Carlo estimate of the probability.
#
# The first method uses statistical inference. Hence, all one needs to do is generate some GBS
# samples from the graph of interest and fix the composition of the feature vector. For example,
# to obtain feature vector :math:`f_{\mathbf{k} = (2, 4), n_{\max}=2}` for the first MUTAG graph,
# we use:
print(similarity.feature_vector_events_sampling(m0, [2, 4], 2))
##############################################################################
# We can also use any orbits of our choice instead of events:
print(similarity.feature_vector_orbits_sampling(m0, [[1, 1], [2], [1, 1, 1, 1], [2, 1, 1]]))
##############################################################################
# For the second method, we calculate the orbit probabilities exactly rather than
# through sampling. Considering a feature vector of orbit probabilities,
# the probability of a single orbit :math:`p(O)` is given by:
#
# .. math::
#     p(O) = \sum_{S \in O} p(S),
#
# where :math:`S` represents a GBS output click pattern. Calculating each :math:`p(S)` requires
# computing a hafnian, which becomes exponentially difficult with increasing photon number. The
# exact probability of an event :math:`p_{k,n_{\max}}` can be calculated in a similar manner.
#
# Built-in functions :func:`~strawberryfields.apps.similarity.feature_vector_orbits` and
# :func:`~strawberryfields.apps.similarity.feature_vector_events`
# can be used to compute exact feature vectors. These functions
# use a keyword argument ``samples`` to choose between exact and Monte Carlo estimated
# probabilities, as shown later. By default, ``samples`` is set to ``None``, which gives an
# exact feature vector. To use Monte Carlo estimation instead, set ``samples`` to the number
# of samples desired for the estimation. For example, to get the exact event probabilities in
# the feature vector example :math:`f_{\mathbf{k} = (2, 4), n_{\max}=2}` seen previously, we use:
print(similarity.feature_vector_events(nx.Graph(m0_a), [2, 4], 2))
##############################################################################
# Although precise, exact calculations become computationally expensive for large matrices.
# Additionally, what makes calculating :math:`p_{k, n_{\max}}` particularly challenging is the number
# of GBS samples the corresponding event contains. For example, a 6-photon event over 17 modes
# :math:`E_{k=6, n_{\max}=2}` contains the following number of samples:
print(similarity.event_cardinality(6, 2, 17))
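##############################################################################
# This cardinality is a counting problem: the number of click patterns with a
# fixed total photon number and a cap on the count per mode. For small cases
# it can be checked by brute force (an illustrative sketch, not the library's
# method):

```python
from itertools import product

def brute_force_cardinality(photons, max_count_per_mode, modes):
    """Count every click pattern with the given total photon number and
    at most max_count_per_mode photons per mode."""
    patterns = product(range(max_count_per_mode + 1), repeat=modes)
    return sum(1 for p in patterns if sum(p) == photons)

print(brute_force_cardinality(3, 2, 4))  # 16
```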
##############################################################################
# To avoid calculating a large number of sample probabilities, an alternative is to perform
# Monte Carlo estimation. Here, samples within an orbit or event are selected uniformly
# at random and their exact probabilities are calculated. For example, for an event
# :math:`E_{k, n_{\max}}`, if :math:`N` samples :math:`\{S_{1}, S_{2}, \ldots , S_{N}\}`
# are randomly selected from all possible samples in this event, then the event probability can
# be approximated as
#
# .. math::
#     p(E_{k, n_{\max}}) \approx \frac{1}{N}\sum_{i=1}^N p(S_i) |E_{k, n_{\max}}|,
#
# with :math:`|E_{k, n_{\max}}|` denoting the cardinality of the event.
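##############################################################################
# The estimator above is easy to sketch on a toy event with made-up sample
# probabilities, where the exact answer can also be computed directly:

```python
import random

# Toy event: 2 photons over 3 modes, at most 1 photon per mode
event = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]

# Hypothetical exact probabilities for each sample (made-up numbers)
p = {(1, 1, 0): 0.03, (1, 0, 1): 0.02, (0, 1, 1): 0.01}

exact = sum(p[s] for s in event)

# Monte Carlo: draw samples uniformly from the event, average their
# probabilities, and multiply by the event cardinality
random.seed(1916)
N = 5000
draws = [random.choice(event) for _ in range(N)]
estimate = sum(p[s] for s in draws) / N * len(event)
print(exact, estimate)
```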
#
# This method can be accessed using the
# :func:`~strawberryfields.apps.similarity.feature_vector_events` function
# with ``samples`` set to the desired number of samples for the estimation.
# For example, to get MC-estimated probabilities for our example feature vector
# :math:`f_{\mathbf{k} = (2, 4), n_{\max}=2}`, we use:
print(similarity.feature_vector_events(nx.Graph(m0_a), [2, 4], 2, samples=1000))
##############################################################################
#
# .. note::
# The results of using Monte Carlo estimation with
# :func:`~strawberryfields.apps.similarity.feature_vector_orbits` and
# :func:`~strawberryfields.apps.similarity.feature_vector_events` are probabilistic and may
# vary between runs. Increasing the ``samples`` parameter will increase the precision but
# slow down the calculation.
#
# Machine learning with GBS graph kernels
# ---------------------------------------
#
# The power of feature vectors that embed graphs in a vector space of real numbers is that we can
# now measure similarities between graphs. This is very useful in machine learning, where similar
# labels are assigned to graphs that are close to each other. GBS feature vectors therefore give
# rise to a similarity measure between graphs!
#
# Let's build this up a bit more. The MUTAG dataset we are considering contains not only graphs
# corresponding to the structure of chemical compounds, but also a *label* of each
# compound based upon its mutagenic effect. The four graphs we consider here have labels:
#
# - MUTAG0: Class 1
# - MUTAG1: Class 0
# - MUTAG2: Class 0
# - MUTAG3: Class 1
classes = [1, 0, 0, 1]
##############################################################################
# We can use GBS feature vectors in a support vector machine (SVM) that finds a separating
# hyperplane between classes in the feature space. We start by defining two-dimensional feature
# vectors:
events = [8, 10]
max_count = 2
f1 = similarity.feature_vector_events_sampling(m0, events, max_count)
f2 = similarity.feature_vector_events_sampling(m1, events, max_count)
f3 = similarity.feature_vector_events_sampling(m2, events, max_count)
f4 = similarity.feature_vector_events_sampling(m3, events, max_count)
import numpy as np
R = np.array([f1, f2, f3, f4])
print(R)
##############################################################################
# The choice of what ``events`` to use for the feature vectors can be significant and we
# encourage the reader to explore different combinations. We can also use any orbits of our
# choice instead of events. Note, however, that GBS samples with an odd total number of
# photons have zero probability in ideal GBS, which generates and outputs photons only in pairs.
#
# Given our points in the feature space and their target labels, we can use
# scikit-learn's ``LinearSVC`` support vector machine as our model to train:
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
R_scaled = StandardScaler().fit_transform(R) # Transform data to zero mean and unit variance
classifier = LinearSVC()
classifier.fit(R_scaled, classes)
##############################################################################
# Here, the term "linear" refers to the *kernel* function used to calculate inner products
# between vectors in the space. We can use a linear SVM because we have already embedded the
# graphs in a feature space based on GBS. We have also rescaled the feature vectors to have
# zero mean and unit variance using scikit-learn's ``StandardScaler``, a technique often used
# in machine learning.
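##############################################################################
# The standardization step itself amounts to subtracting each column's mean
# and dividing by its standard deviation; on toy data (equivalent to
# ``StandardScaler`` with default settings):

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Subtract each column's mean and divide by its standard deviation,
# so every column ends up with zero mean and unit variance
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled)
```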
#
# We can then visualize the trained SVM by plotting the decision boundary with respect to the
# points:
w = classifier.coef_[0]
i = classifier.intercept_[0]
m = -w[0] / w[1]  # slope of the decision boundary y = m * x + b
b = -i / w[1]  # intercept of the decision boundary
xx = [-1, 1]
yy = [m * x + b for x in xx]
fig = plot.points(R_scaled, classes)
fig.add_trace(plotly.graph_objects.Scatter(x=xx, y=yy, mode="lines"))
fig
##############################################################################
# This plot shows the two classes (grey points for class 0 and red points for class 1)
# successfully separated by the linear hyperplane using the GBS feature space. Moreover,
# recall that the MUTAG1 and MUTAG2 graphs of class 0 are actually isomorphic. Reassuringly,
# their corresponding feature vectors are very similar. In fact, the feature vectors of
# isomorphic graphs should always be identical [#bradler2018graph]_; the small discrepancy
# in this plot is due to the statistical approximation from sampling.
#
# References
# ----------
#
# .. [#schuld2019quantum]
#
# Maria Schuld, Kamil Brádler, Robert Israel, Daiqin Su, and Brajesh Gupt. A quantum hardware-induced
# graph kernel based on gaussian boson sampling. arXiv preprint arXiv:1905.12646, 2019.
#
# .. [#debnath1991structure]
#
# Asim Kumar Debnath, Rosa L Lopez de Compadre, Gargi Debnath, Alan J Shusterman, and Corwin Hansch.
# Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds.
# correlation with molecular orbital energies and hydrophobicity. Journal of medicinal chemistry,
# 34(2):786–797, 1991.
#
# .. [#kriege2012subgraph]
#
# Nils Kriege and Petra Mutzel. Subgraph matching kernels for attributed graphs. In Proceedings of the
# 29th International Conference on Machine Learning. 2012.
#
# .. [#bradler2018graph]
#
# Kamil Bradler, Shmuel Friedland, Josh Izaac, Nathan Killoran, and Daiqin Su. Graph isomorphism and
# gaussian boson sampling. arXiv preprint arXiv:1810.10644, 2018.