Advanced time-domain photonic circuits

*Authors: Fabian Laudenbach and Theodor Isacsson

TODO: * Inlcude schematic figure (Fabian/Theo) * Include figure explaining vacuum business (Fabian/Theo) * Create mock device_spec (Fabian/Theo/Leo) * Include introduction (Fabian) * Compare remote with local simulation (Fabian)

Introduction

demos/images/delorean_schematic.pdf:alt:delorean_schematic

Schematic of Xanadu’s Gaussian-BosonSampling device DeLorean

The demonstration of quantum-computational advantage, i.e. the ability of a quantum processor to accomplish a task exponentially faster than any super computer could, is considered an important milestone towards useful quantum computers. One of these tasks that are proven to be computationally hard is to sample the photon numbers of multimode squeezed states that have travelled through a non-trivial interferometer – a problem which is referred to as Gaussian BosonSampling (GBS). In our publication [REFERENCE TO SUPREMACY PAPER], we showcase a programmable loop interferometer that successfully sampled GBS instances orders of magnitudes faster than any known classical algorithm could. The experiment is based on temporal-division multiplexing (TMD) which allows for a comparatively simple experimental setup in which the number of optical parts and devices is independent from the number of optical modes. A single squeezed-light source emits batches of 216 subsequent squeezed-light pulses. These pulses are interfered with one another by the help of optical delay lines (acting as memory buffer), programmable beamsplitters and phase shifters.

The architecture of our GBS machine – which we call DeLorean – is layed out as follows: A squeezed-light source injects pulse trains of 216 modes, temporally spaced by \(\tau = 167~\text{ns}\), into the interferometer which consists of three delay loops, each one of them mainly characterized by the roundtrip time it takes a mode to travel trough it. The first loop has a roundtrip time of \(1 \tau\), the second and third loop have \(6 \tau\) and \(36 \tau\), respectively. Each loop is associated with two programmable Gaussian gates: a rotation gate right before the loop and a variable beamsplitter that interferes modes approaching the loop with modes coming from within the loop.

In this tutorial, we will discuss how a GBS circuit on DeLorean can be programmed using the sf.TDMProgram class. This class allows us to define and run time-multiplexed photonic circuits in a very efficient way, even if these circuits include large numbers of modes. For more background on the sf.TDMProgram, please refer to the beginner and advanced tutorial on time-domain photonic circuits.

After defining our GBS circuit, we can either run a simulation or submit the program to the hardware.

Defining a GBS circuit

import json

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors

import strawberryfields as sf
from strawberryfields.tdm.tdmprogram import get_mode_indices
from strawberryfields.ops import Sgate, Rgate, BSgate, LossChannel

from thewalrus.quantum import (
    reduced_gaussian,
    photon_number_mean_vector,
    photon_number_covmat,
)

A time-domain program is defined by the sequences of arguments each quantum gate applies at each time bin within the length of the program. As DeLorean is an interferometer with a total of six Gaussian gates (3 rotation gates, 3 tunable beamsplitters), we require six lists of arguments:

  • phi_0, phi_1, phi_2: sequences of arguments applied by the rotation gates

  • alpha_0, alpha_1, alpha_2: sequences of arguments applied by the beamsplitters

For GBS, these gates need to be set randomly within the range that the devices in our lab are comfortable with. Our rotation gates can apply phases between \(-\pi / 2\) and \(\pi / 2\). The beamsplitter intensity transmission can be set anywhere between 0 and 1. And given some intensity transmission \(T\), the argument passed to our beamsplitter will be \(\alpha=\text{arccos}(\sqrt{T})\). Considering this conversion, the allowed intervals, and the fact that our source emits batches of 216 squeezed-light modes, we can define our random sequences by:

modes = 216
min_phi, max_phi = -np.pi / 2, np.pi / 2
min_T, max_T = 0, 1

# rotation gate parameters
phi_0 = np.random.uniform(low=min_phi, high=max_phi, size=modes).tolist()
phi_1 = np.random.uniform(low=min_phi, high=max_phi, size=modes).tolist()
phi_2 = np.random.uniform(low=min_phi, high=max_phi, size=modes).tolist()

# beamsplitter parameters
alpha_0 = np.arccos(
    np.sqrt(np.random.uniform(low=min_T, high=max_T, size=modes))
).tolist()
alpha_1 = np.arccos(
    np.sqrt(np.random.uniform(low=min_T, high=max_T, size=modes))
).tolist()
alpha_2 = np.arccos(
    np.sqrt(np.random.uniform(low=min_T, high=max_T, size=modes))
).tolist()

Now here comes a tricky part. Although these gate sequences are of length 216 (the number of squeezed-light modes in our GBS instance) the actual time-domain program needs to be longer than that. Why is that the case? Simply put, we want to make sure that the GBS unitary acts on our 216 computational modes exclusively and not on any vacuum modes. This is not as easy as it sounds because there is a bunch of vacuum already in the delay lines before the first squeezed-light mode enters the interferometer.

If a squeezed-light mode interferes with a vacuum mode at a beamsplitter it will effectively undergo loss which we want to avoid. Therefore, we need to set the first arguments of each loop to \(T=1\) (\(\alpha=0\)). This will couple an incoming computational mode entirely into the loop, and a vacuum mode that was inside the loop will be coupled out — no interference between the two will occur. For each loop this needs to be repeated until the respective delay line is completely emptied from vacuum modes and filled with computational modes.

delay_0, delay_1, delay_2 = 1, 6, 36

alpha_0[:delay_0] = [0] * delay_0
alpha_1[:delay_1] = [0] * delay_1
alpha_2[:delay_2] = [0] * delay_2

This still doesn’t explain why the arrays need to be longer than 216.

We just imposed that the first computational mode is fully coupled into the first loop. This means that it will arrive at the second loop after exactly one time bin (i.e. one roundtrip in the first delay line). We also imposed that the first mode should be completely coupled into the second loop as well, which means it will arrive at the third loop no earlier 7 time bins after the program has started. Therefore, the gates associated to the second and third loop need some idle time bins, awaiting the first computational mode.

We thus need to append 0-valued parameters (i.e., full transmission) to the gates in the first two loops. This will make sure that the non-trivial random gates are applied to the computational modes exclusively.

# second loop needs to await the first mode for 1 time bin (delay_0 = 1)
phi_1 = [0] * delay_0 + phi_1
alpha_1 = [0] * delay_0 + alpha_1

# second loop needs to await the first mode for 7 time bins (delay_0 + delay_1 = 1 + 6)
phi_2 = [0] * (delay_0 + delay_1) + phi_2
alpha_2 = [0] * (delay_0 + delay_1) + alpha_2

Towards the end of the time-domain program, we need to make sure that all computational mode in the loops is entirely coupled out again, avoiding interfering with the vacuum modes that will follow the pulse train of 216 squeezed-light modes. This means we need to set the beamsplitters to full transmission again, and, on top of this, since all gate sequences should be of same length, the gates of the first and second loop need to remain idle until the very last computational mode has exited the third loop.

This requires us to add 0-valued parameters corresponding to the number of modes required to exit through the current, and all coming, loop(s); i.e., 43 (1 + 6 + 36) for the first loop, 42 (6 + 36) for the second loop, and 36 for the third loop.

# empty first loop (delay_0) and await the emptying of second and third loop (delay_1 + delay_2)
phi_0 += [0] * (delay_0 + delay_1 + delay_2)
alpha_0 += [0] * (delay_0 + delay_1 + delay_2)

# empty second loop (delay_1) and await the emptying of third loop (delay_2)
phi_1 += [0] * (delay_1 + delay_2)
alpha_1 += [0] * (delay_1 + delay_2)

# empty third loop
phi_2 += [0] * delay_2
alpha_2 += [0] * delay_2
demos/images/td3_vacuum_padding.png:alt:vacuum_padding

Preamble and epilogue for each programmable gate to account for pre-existing vacuum modes and loop delays.

Unfortunately, all this padding with idle time bins before and after the random gates is necessary to avoid interference of computational modes with vacuum. This simply reflects the reality that we need a total unitary that fits the size of the total input state: 216 computational modes and 43 vacuum modes (1, 6, 36 of them in the first, second and third loop, respectively). Our total unitary thus needs to be of size 259 \(\times\) 259.

However, the reward of all precautions taken above is that the unitary that we care about is now a block matrix within the total unitary, acting on the 216 computational modes exclusively. Moreover, the length of the argument lists submitted to the six quantum gates is now exactly 259, matching the size of the total unitary.

[len(parameters) for parameters in (phi_0, phi_1, phi_2, alpha_0, alpha_1, alpha_2)]

Out:

[259, 259, 259, 259, 259, 259]

After definition of the programmable gates, we need to take care of the parameters that we cannot program. In particular, these are the squeezing parameters of the 216 modes (not programmable yet) and the intrinsic phase offsets of the three loops. If we want to run an accurate simulation of the GBS instance, we also need various efficiency parameters.

If you have an API key, you can retrieve these non-programmable parameters from the platform upon connecting to DeLorean, calling the eng.device_spec property on the RemoteEngine object. If not, you can simply download the dummy device specification from here. This file can then be loaded directly into the tutorial.

with open("td3_device_spec.json") as file_:
    device_spec = json.load(file_)

The non-programmable parameters can now be retrieved from the device specification loaded above. There are 5 different sets of parameters that we need to apply to the circuit (with some more detailed explanations below):

  • r: the squeezing parameters set at the start of the circuit (259 values)

  • phi_loop: list with the three offsets used in the rotation gates in each respective loop (3 values)

  • eta_comm: the common efficiency for the full circuit (1 value)

  • eta_loop: list with three efficiencies for each individual loop (3 values)

  • eta_ch_rel: the efficiencies for the connections to the 16 photon-number

    detectors (259 values)

# squeezing parameters
r = device_spec["r"]

# list of three loop offsets
phi_loop = device_spec["phi_loop"]

# common efficiency
eta_comm = device_spec["eta_comm"]

# list of three loop efficiencies
eta_loop = device_spec["eta_loop"]

# relative demux/PNR efficiencies
eta_ch_rel = device_spec["eta_ch_rel"]

Some notes on the paramters we just loaded:

  • r: Our squeezed-light source emits batches of 216 subsequent pulses, so why is r of length 259? As established in the previous section, the time-domain program does not stop after 216 time bins. Rather, it is over as soon as the last computational mode has left the third loop. After 216 time bins, the light source will not emit squeezed-light pulses anymore and vacuum modes will be injected into the interferometer instead. This is why the squeezing array r is a concatenation of 216 non-zero values and 43 zeros.

  • phi_loop: Each loop applies a static phase rotation to the modes in it. Currently, this phase rotation cannot be programmed. All we can do is to measure this phase and take it into account whenever we run a program on the machine. This is why the three loop offsets phi_loop` are part of the device spec. Even though we have no direct influence on them, we need to add them to the TDMProgram in order to obtain a correct representation of the circuit that is applied.

  • eta_comm: The total loss experienced by the different modes depends on which loops they have travelled through and on which detector they ended up in. So it will be highly inhomogeneous within the 216 computational modes. eta_comm, on the other hand, describes their common efficiency. It is experienced by a mode that bypasses all three loops and ends up in our most efficient detector. In other words 1 - eta_comm is the minimal loss that every mode will suffer. On top of that, most modes will suffer additional loss imposed by the loops and less efficient detectors.

  • eta_loop: This is simply a list containing the respective efficiencies of the three loops.

  • eta_ch_rel: In the experiment, two consecutive pulses are only 167 ns apart which exceeds the rate that one single photon-number resolving (PNR) detector can handle. So although our batches of 216 squeezed-light modes travel towards the detection module in one single fibre, they need to be demultiplexed into 16 spatial modes, each one of them probed by its one PNR detector. See Figure [ADD FIG REFERENCE] for a schematic. Now, instead of one PNR operating at 6 MHz, we have 16 PNRs operating at 6/16 MHz. The demultiplexing pattern is repetitive, meaning that detector \(i\) will detect all temporal modes whose index satisfies \(i + 16j\) where \(j\) is an integer in the interval \([0, 17]\). Since the 16 spatial modes and their respective detectors do not share the exact same efficiency, eta_ch_rel keeps track of the loss suffered by each one of our 259 modes individually. If you look at it closer, you will see that it is just a list of 16 different relative efficiencies that is repeated until 259 time bins are filled.

Excecuting the circuit

Let’s collect all the (programmable and non-programmable) gate sequences in one list that we can submit to the sf.TDMProgram. The list, aptly named gate_args, contains all gate parameters in the order in which the corresponding gates will be applied. Note that the channel efficiencies are contained in the gate arguments, while the other two are kept as external lists and do not take part in the circuit unrolling. This is because common efficiency and loop efficiencies remain static throughout the program whereas the relative detection efficiency experienced by the modes needs to be updated at each time bin.

# gate arguments for the TDM program
gate_args = [r, phi_0, alpha_0, phi_1, alpha_1, phi_2, alpha_2, eta_ch_rel]

Next, we declare the number of non-vacuum modes used in the circuit — in this case 216 — as well as the delays in the loops, i.e., the number of modes that will fit in each loop at the same time. We can then use a helper function called get_mode_indices to get the correct mode indices n for the circuit gates as well as the total number of modes N used to initialize the program.

Note: the number of modes plus the number of vacuum modes (i.e., the sum of the delays) equals the length of the gate parameter arrays. This is the total number of modes that will pass through each gate, and is thus also the number of gate parameters needed.

modes = 216
delays = [1, 6, 36]
vac_modes = sum(delays)

n, N = get_mode_indices(delays)

We can now finally construct the circuit. The layout of this circuit is set by the hardware device and cannot be changed. If the wrong gates are used or they are put in the wrong order, the circuit validation will fail.

prog = sf.TDMProgram(N)

with prog.context(*gate_args) as (p, q):
    Sgate(p[0]) | q[n[0]]
    LossChannel(eta_comm) | q[n[0]]

    for i in range(len(delays)):
        Rgate(p[2 * i + 1]) | q[n[i]]
        BSgate(p[2 * i + 2], np.pi / 2) | (q[n[i]], q[n[i + 1]])
        Rgate(phi_loop[i]) | q[n[i]]
        LossChannel(eta_loop[i]) | q[n[i]]

    LossChannel(p[7]) | q[0]

The circuit above will be rolled-up, using symbolic placeholder variables for the gate parameters. The circuit must be unrolled with each mode only being measured once, a process we’ve deciced to call space-unrolling.

Out:

<strawberryfields.tdm.tdmprogram.TDMProgram object at 0x7f10acbb8588>

Space-unrolling might seem slightly confusing, but it’s actually fairly straight-forward. Instead of having several modes going through the same path, we interpret the TDM loop interactions as spatially separate. This way, each measurement is made on a different spatial mode, as can be seen in the right-hand side of the image below.

The left-handside depicts how an actual single loop TDM circuit would work, while the right-hand side is how Strawberry Fields simulates it. Since each mode can only be measured once (i.e., modes cannot be reused when using Fock measurements, opposite to the TD2 case, which uses homodyne measurements), the circuit needs to be unrolled. To get an understanding for how this space-unrolling works, try to follow the interactions happening in the left-hand side single loop setup, and try to spot them in the right-hand side circuit diagram. They are in practice equal!

space-unrolling

Space-unrolling the TD3 circuit

Runing the circuit locally on the gaussian backend will return a results object, which contains the samples that are returned by the TD3 program. The samples returned by a sf.TDMProgram are of shape (shots, spatial_modes, temporal_modes), so in our case whe should have gotten (shots, 1, 259).

eng = sf.Engine(backend="gaussian")

results = eng.run(prog)

Note: If you have an API key, and have used it to configure Strawberry Fields correctly, you can try running it on our TD3 hardware device using eng = sf.RemoteEngine("TD3").

If executing the program on a simulator device, we can also extract the total covariance matrix by calling the cov() method on the returned state. This is the covariance matrix for all 259 modes, i.e., the 216 computational modes along with the 43 vacuum modes that we added above. We can then reduce this down to the covariance matrix over the 216 computational modes using the reduced_gaussian method, imported from the The Walrus package.

cov_tot = results.state.cov()

mu_tot = np.zeros(len(cov_tot))
modes = range(vac_modes, vac_modes + modes)

mu, cov = reduced_gaussian(mu_tot, cov_tot, modes)

The covariance matrix describes the quadrature relations of the individual modes and, in the absence of displacements, contains the full information on the Gaussian state. However, if we want to be able to compare the simulation against experimental data, we would like to obtain the photon-number statistics. The following two functions will return an array of mean photon numbers for the 216 modes and a \(216 \times 216\) photon-number covariance matrix whose elements are defined by \(\text{Cov}_{i,j} = E(n_{i} n_{j}) - E(n_{i}) E(n_{j})\).

mean_n = photon_number_mean_vector(mu, cov)
cov_n = photon_number_covmat(mu, cov)

Let’s visualize these two results in a figure.

fig, ax = plt.subplots(1, 2, figsize=(14, 5))

ax[0].bar(range(len(mean_n)), mean_n, width=0.75, align="center")
ax[0].set_title(
    f"<n> = {round(np.mean(mean_n), 3)} // <n_tot> = {round(np.sum(mean_n), 3)}"
)
ax[0].set_xlabel("pulse index")
ax[0].set_ylabel("mean photon number")
ax[0].grid()

ax[1].imshow(
    cov_n,
    norm=matplotlib.colors.SymLogNorm(
        linthresh=10e-6, linscale=1e-4, vmin=-cov_n.max(), vmax=cov_n.max()
    ),
    cmap="RdBu_r",
)
ax[1].set_title("Cov(n_i, n_j)")

plt.show()
../_images/sphx_glr_run_td3_001.png

Out:

/home/circleci/project/photonics/tutorials/run_td3.py:507: MatplotlibDeprecationWarning: default base may change from np.e to 10.  To suppress this warning specify the base keyword argument.

We can obtain the same kind of statistics in a very straightforward way from the experimental data and put them into direct comparison.

By design, DeLorean is an experiment that is difficult to sample from classically. So what the simulator devices cannot do is return actual samples as the remote engine does; at least not in a reasonable amount of time. Instead, we can obtain the covariance matrix which provides a full description of the Gaussian state we produced.

Calculating the transfer matrix

The circuit above can be described as a combination of initial state preparation, followed by an interferometer over the 216 modes used in the circuit. This interferometer setup can be described by a so-called transfer matrix. By having access to the tranfer matrix it’s possible to recreate the same circuit using different initial states. Instead of starting with a squeezed state, as we did above, we could e.g. apply the transfer matrix to a coherent state or GKP states.

There’s a way to quickly calculate the transfer matrix of a circuit — as long as it only contains beamsplitters, rotation gates and loss channels — without having to execute it. This is done by using the “passive” compiler to compile the circuit into a single transfer gate. We thus remove the squeezing from the circuit and simply compile the program using the “passive” compiler, extracting the parameter from the restulting PassiveGate.

prog = sf.TDMProgram(N)

with prog.context(*gate_args) as (p, q):
    LossChannel(eta_comm) | q[n[0]]

    for i in range(len(delays)):
        Rgate(p[2 * i + 1]) | q[n[i]]
        BSgate(p[2 * i + 2], np.pi / 2) | (q[n[i]], q[n[i + 1]])
        Rgate(phi_loop[i]) | q[n[i]]
        LossChannel(eta_loop[i]) | q[n[i]]

    LossChannel(p[7]) | q[0]

prog.space_unroll()

prog = prog.compile(compiler="passive")

Note that the circuit now only consist of a single PassiveGate, which has a single matrix parameter, which is the circuits transfer matrix. This could then be used to play around with the same setup, but different state preparations.

transfer_matrix = prog.circuit[0].op.p[0]

print(transfer_matrix.shape)

Out:

(259, 259)

Total running time of the script: ( 0 minutes 58.894 seconds)

Gallery generated by Sphinx-Gallery