Advanced time-domain photonic circuits

Authors: Fabian Laudenbach and Theodor Isacsson

The demonstration of quantum-computational advantage, i.e. the ability of a quantum processor to accomplish a task exponentially faster than any super computer could, is considered an important milestone towards useful quantum computers. One of these tasks that are proven to be computationally hard is to sample the photon numbers of multimode squeezed states that have travelled through a non-trivial interferometer — a problem which is referred to as Gaussian Boson Sampling (GBS). In our publication [1], we showcase a programmable loop interferometer that successfully samples GBS instances orders of magnitudes faster than any known classical algorithm could. The experiment is based on temporal-division multiplexing (TDM) — also known as time-domain multiplexing — which allows for a comparatively simple experimental setup in which the number of optical parts and devices is independent from the number of optical modes. A single squeezed-light source emits batches of 216 subsequent squeezed-light pulses. These pulses are interfered with one another by the help of optical delay lines (acting as memory buffers), programmable beamsplitters and phase shifters. A simple schematic of the setup can be seen below.

tutorials/images/delorean_schematic.png

The architecture of our GBS machine – which we call DeLorean – is layed out as follows. A squeezed-light source injects pulse trains of 216 modes, temporally spaced by \(\tau = 167~\text{ns}\), into the interferometer which consists of three delay loops, each one of them mainly characterized by the roundtrip time it takes a mode to travel trough it. The first loop has a roundtrip time of \(1 \tau\), the second and third loop have \(6 \tau\) and \(36 \tau\), respectively. Each loop is associated with two programmable Gaussian gates; a rotation gate right before the loop and a variable beamsplitter that interferes modes approaching the loop with modes coming from within the loop.

In this tutorial, we will discuss how a GBS circuit on DeLorean can be programmed using the strawberryfields.TDMProgram class. This class allows us to define and run time-multiplexed photonic circuits in a very efficient way, even if these circuits include large numbers of modes. For more background on the TDMProgram, please refer to the beginner and advanced tutorials on time-domain photonic circuits.

After defining our GBS circuit, we can either run a simulation or submit the program to the hardware.

Defining a GBS circuit

We begin by importing NumPy and the Strawberry Fields package, which will be used both to connect to the TD3 hardware as well as to run the local TDM simulator.

import numpy as np
import strawberryfields as sf

A time-domain program is defined by the sequences of arguments each quantum gate applies at each time bin within the length of the program. As DeLorean is an interferometer with a total of six Gaussian gates (3 rotation gates, 3 tunable beamsplitters), we require six lists of arguments:

  • phi_0, phi_1, phi_2: sequences of arguments applied by the rotation gates

  • alpha_0, alpha_1, alpha_2: sequences of arguments applied by the beamsplitters

For GBS, these gates need to be set randomly within the range that the devices in our lab are comfortable with. Our rotation gates can apply phases between \(-\pi / 2\) and \(\pi / 2\). The beamsplitter intensity transmission can be set anywhere between 0 and 1. And given some intensity transmission \(T\), the argument passed to our beamsplitter will be \(\alpha=\text{arccos}(\sqrt{T})\). Considering this conversion, the allowed intervals, and the fact that our source emits batches of 216 squeezed-light modes, we can define our random sequences by:

modes = 216
min_phi, max_phi = -np.pi / 2, np.pi / 2
min_T, max_T = 0, 1

# rotation gate parameters
phi_0 = np.random.uniform(low=min_phi, high=max_phi, size=modes).tolist()
phi_1 = np.random.uniform(low=min_phi, high=max_phi, size=modes).tolist()
phi_2 = np.random.uniform(low=min_phi, high=max_phi, size=modes).tolist()

# beamsplitter parameters
alpha_0 = np.arccos(
    np.sqrt(np.random.uniform(low=min_T, high=max_T, size=modes))
).tolist()
alpha_1 = np.arccos(
    np.sqrt(np.random.uniform(low=min_T, high=max_T, size=modes))
).tolist()
alpha_2 = np.arccos(
    np.sqrt(np.random.uniform(low=min_T, high=max_T, size=modes))
).tolist()

Now here comes a tricky part. Although these gate sequences are of length 216 (the number of squeezed-light modes in our GBS instance) the actual time-domain program needs to be longer than that. Why is that the case? Simply put, we want to make sure that the GBS unitary acts on our 216 computational modes exclusively and not on any vacuum modes. This is not as easy as it sounds because there is a bunch of vacuum already in the delay lines before the first squeezed-light mode enters the interferometer.

If a squeezed-light mode interferes with a vacuum mode at a beamsplitter it will effectively undergo loss which we want to avoid. Therefore, we need to set the first arguments of each loop to \(T=1\) (\(\alpha=0\)). This will couple an incoming computational mode entirely into the loop, and a vacuum mode that was inside the loop will be coupled out — no interference between the two will occur. For each loop this needs to be repeated until the respective delay line is completely emptied from vacuum modes and filled with computational modes.

delay_0, delay_1, delay_2 = 1, 6, 36

alpha_0[:delay_0] = [0] * delay_0
alpha_1[:delay_1] = [0] * delay_1
alpha_2[:delay_2] = [0] * delay_2

This still doesn’t explain why the arrays need to be longer than 216.

We just imposed that the first computational mode is fully coupled into the first loop. This means that it will arrive at the second loop after exactly one time bin (i.e. one roundtrip in the first delay line). We also imposed that the first mode should be completely coupled into the second loop as well, which means it will arrive at the third loop no earlier 7 time bins after the program has started. Therefore, the gates associated to the second and third loop need some idle time bins, awaiting the first computational mode.

We thus need to append 0-valued parameters (i.e., full transmission) to the gates in the first two loops. This will make sure that the non-trivial random gates are applied to the computational modes exclusively.

# second loop needs to await the first mode for 1 time bin (delay_0 = 1)
phi_1 = [0] * delay_0 + phi_1
alpha_1 = [0] * delay_0 + alpha_1

# second loop needs to await the first mode for 7 time bins (delay_0 + delay_1 = 1 + 6)
phi_2 = [0] * (delay_0 + delay_1) + phi_2
alpha_2 = [0] * (delay_0 + delay_1) + alpha_2

Towards the end of the time-domain program, we need to make sure that all computational modes in the loops are entirely coupled out again, avoiding interfering with the vacuum modes that will follow the pulse train of 216 squeezed-light modes. This means we need to set the beamsplitters to full transmission again, and, on top of this, since all gate sequences should be of same length, the gates of the first and second loop need to remain idle until the very last computational mode has exited the third loop.

This requires us to add 0-valued parameters corresponding to the number of modes required to exit through the current, and all coming, loop(s); i.e., 43 (1 + 6 + 36) for the first loop, 42 (6 + 36) for the second loop, and 36 for the third loop.

# empty first loop (delay_0) and await the emptying of second and third loop (delay_1 + delay_2)
phi_0 += [0] * (delay_0 + delay_1 + delay_2)
alpha_0 += [0] * (delay_0 + delay_1 + delay_2)

# empty second loop (delay_1) and await the emptying of third loop (delay_2)
phi_1 += [0] * (delay_1 + delay_2)
alpha_1 += [0] * (delay_1 + delay_2)

# empty third loop
phi_2 += [0] * delay_2
alpha_2 += [0] * delay_2
tutorials/images/td3_vacuum_padding.png

Unfortunately, all this padding with idle time bins before and after the random gates is necessary to avoid interference of computational modes with vacuum. This simply reflects the reality that we need a total unitary that fits the size of the total input state: 216 computational modes and 43 vacuum modes (1, 6, 36 of them in the first, second and third loop, respectively). Our total unitary thus needs to be of size 259 \(\times\) 259. Check out the image above for an overview of the necessary vacuum mode padding.

However, the reward of all precautions taken above is that the unitary that we care about is now a block matrix within the total unitary, acting on the 216 computational modes exclusively. Moreover, the length of the argument lists submitted to the six quantum gates is now exactly 259, matching the size of the total unitary.

[len(parameters) for parameters in (phi_0, phi_1, phi_2, alpha_0, alpha_1, alpha_2)]

Out:

[259, 259, 259, 259, 259, 259]

After declaring the programmable gates, we need to take care of the parameters that we cannot program. In particular, these are the squeezing parameters of the 216 modes (not programmable yet) and the intrinsic phase offsets of the three loops. If we want to run an accurate simulation of the GBS instance, we also need various efficiency parameters.

If you have an API key, you can retrieve these non-programmable parameters from the platform upon connecting to DeLorean, calling the eng.device_spec property on the RemoteEngine object. If not, you can simply download the dummy device specification from here. This file can then be loaded directly into the tutorial.

import json

with open("td3_device_spec.json") as file_:
    device_spec = json.load(file_)

The non-programmable parameters can now be retrieved from the device specification loaded above. There are 5 different sets of parameters that we need to apply to the circuit (with some more detailed explanations below):

  • r: the squeezing parameters set at the start of the circuit (259 values)

  • phi_loop: list with the three offsets used in the rotation gates in each respective loop (3 values)

  • eta_comm: the common efficiency for the full circuit (1 value)

  • eta_loop: list with three efficiencies for each individual loop (3 values)

  • eta_ch_rel: the efficiencies for the connections to the 16 photon-number detectors (259 values)

# squeezing parameters
r = device_spec["r"]

# list of three loop offsets
phi_loop = device_spec["phi_loop"]

# common efficiency
eta_comm = device_spec["eta_comm"]

# list of three loop efficiencies
eta_loop = device_spec["eta_loop"]

# relative demux/PNR efficiencies
eta_ch_rel = device_spec["eta_ch_rel"]

Note

  • r: Our squeezed-light source emits batches of 216 subsequent pulses, so why is r of length 259? As established in the previous section, the time-domain program does not stop after 216 time bins. Rather, it is over as soon as the last computational mode has left the third loop. After 216 time bins, the light source will not emit squeezed-light pulses anymore and vacuum modes will be injected into the interferometer instead. This is why the squeezing array r is a concatenation of 216 non-zero values and 43 zeros.

  • phi_loop: Each loop applies a static phase rotation to the modes in it. Currently, this phase rotation cannot be programmed. All we can do is to measure this phase and take it into account whenever we run a program on the machine. This is why the three loop offsets phi_loop` are part of the device spec. Even though we have no direct influence on them, we need to add them to the TDMProgram in order to obtain a correct representation of the circuit that is applied.

  • eta_comm: The total loss experienced by the different modes depends on which loops they have travelled through and on which detector they ended up in. So it will be highly inhomogeneous within the 216 computational modes. eta_comm, on the other hand, describes their common efficiency. It is experienced by a mode that bypasses all three loops and ends up in our most efficient detector. In other words 1 - eta_comm is the minimal loss that every mode will suffer. On top of that, most modes will suffer additional loss imposed by the loops and less efficient detectors.

  • eta_loop: This is simply a list containing the respective efficiencies of the three loops.

  • eta_ch_rel: In the experiment, two consecutive pulses are only 167 ns apart which exceeds the rate that one single photon-number resolving (PNR) detector can handle. So although our batches of 216 squeezed-light modes travel towards the detection module in one single fibre, they need to be demultiplexed into 16 spatial modes, each one of them probed by its one PNR detector. (You can find a schematic of our demultiplexing-and-detection module illustrated in the top figure of this tutorial.) Now, instead of one PNR operating at 6 MHz, we have 16 PNRs operating at 6/16 MHz. The demultiplexing pattern is repetitive, meaning that detector \(i\) will detect all temporal modes whose index satisfies \(i + 16j\) where \(j\) is an integer in the interval \([0, 17]\). Since the 16 spatial modes and their respective detectors do not share the exact same efficiency, eta_ch_rel keeps track of the loss suffered by each one of our 259 modes individually. If you look at it closer, you will see that it is just a list of 16 different relative efficiencies that is repeated until 259 time bins are filled.

Excecuting the circuit

Let’s collect all the (programmable and non-programmable) gate sequences in one list that we can submit to the sf.TDMProgram. The list, aptly named gate_args, contains all gate parameters in the order in which the corresponding gates will be applied. Note that the channel efficiencies are contained in the gate arguments, while the other two are kept as external lists and do not take part in the circuit unrolling. This is because common efficiency and loop efficiencies remain static throughout the program whereas the relative detection efficiency experienced by the modes needs to be updated at each time bin.

gate_args = [r, phi_0, alpha_0, phi_1, alpha_1, phi_2, alpha_2, eta_ch_rel]

Next, we declare the number of non-vacuum modes used in the circuit — in this case 216 — as well as the delays in the loops, i.e., the number of modes that will fit in each loop at the same time. We can then use a helper function called get_mode_indices to get the correct mode indices n for the circuit gates as well as the total number of modes N used to initialize the program. We also import the gates needed to construct the TD3 circuit.

Note

The number of modes plus the number of vacuum modes (i.e., the sum of the delays) equals the length of the gate parameter arrays. This is the total number of modes that will pass through each gate, and is thus also the number of gate parameters needed.

from strawberryfields.tdm.tdmprogram import get_mode_indices
from strawberryfields.ops import Sgate, Rgate, BSgate, LossChannel

modes = 216
delays = [1, 6, 36]
vac_modes = sum(delays)

n, N = get_mode_indices(delays)

We can now finally construct the circuit. The layout of this circuit is set by the hardware device and cannot be changed. If the wrong gates are used or they are put in the wrong order, the circuit validation will fail.

prog = sf.TDMProgram(N)

with prog.context(*gate_args) as (p, q):
    Sgate(p[0]) | q[n[0]]
    LossChannel(eta_comm) | q[n[0]]

    for i in range(len(delays)):
        Rgate(p[2 * i + 1]) | q[n[i]]
        BSgate(p[2 * i + 2], np.pi / 2) | (q[n[i]], q[n[i + 1]])
        Rgate(phi_loop[i]) | q[n[i]]
        LossChannel(eta_loop[i]) | q[n[i]]

    LossChannel(p[7]) | q[0]

The circuit above will be rolled-up, using symbolic placeholder variables for the gate parameters. The circuit must be unrolled with each mode only being measured once, a process we’ve deciced to call space-unrolling.

prog.space_unroll()

Space-unrolling might seem slightly confusing, but it’s actually fairly straight-forward. Instead of having several modes going through the same path, we interpret the TDM loop interactions as spatially separate. This way, each measurement is made on a different spatial mode, as can be seen in the right-hand side of the image below.

The left-handside depicts how an actual single loop TDM circuit would work, while the right-hand side is how Strawberry Fields simulates it. Since each mode can only be measured once (i.e., modes cannot be reused when using Fock measurements, opposite to the TD2 case, which uses homodyne measurements), the circuit needs to be unrolled. To get an understanding for how this space-unrolling works, try to follow the interactions happening in the left-hand side single loop setup, and try to spot them in the right-hand side circuit diagram. They are in practice equal!

tutorials/images/td3_space_unrolling.png

Runing the circuit remotely on the DeLorean will return a results object, which contains the samples that are obtained by the the machine. The samples returned by any sf.TDMProgram are of shape (shots, spatial_modes, temporal_modes), so in our case whe should have gotten (shots, 1, 259). But since the first 43 modes that arrived at the detectors are the vacuum modes that had lived in the loop before the program started, let’s crop the samples array and only focus on the 216 computational modes.

Note

If you wish to run the circuit on our TD3 hardware deive, you need an API key, and must have used it to configure Strawberry Fields correctly.

shots = 1_000_000

eng = sf.RemoteEngine("TD3")
results = eng.run(prog, shots=shots)

samples = results.samples[..., 43:]

The following two functions will return an array of mean photon numbers for the 216 modes and a \(216 \times 216\) photon-number covariance matrix whose elements are defined by \(\text{Cov}(n_{i}, n_{j}) = E(n_{i} n_{j}) - E(n_{i}) E(n_{j})\).

mean_n = np.mean(samples, axis=(0, 1))
cov_n = np.cov(samples[:, 0, :].T)

Let’s visualize these two results in a figure. For this, we will use the Matplotlib.

import matplotlib.pyplot as plt
import matplotlib.colors


def plot_photon_number_stats(mean_n, cov_n):

    fig, ax = plt.subplots(1, 2, figsize=(14, 5))

    ax[0].bar(range(len(mean_n)), mean_n, width=0.75, align="center")
    ax[0].set_title(
        f"<n> = {round(np.mean(mean_n), 3)} // <n_tot> = {round(np.sum(mean_n), 3)}"
    )
    ax[0].set_xlabel("pulse index")
    ax[0].set_ylabel("mean photon number")
    ax[0].grid()

    ax[1].imshow(
        cov_n,
        norm=matplotlib.colors.SymLogNorm(
            linthresh=10e-6, linscale=1e-4, vmin=0, vmax=2
        ),
        cmap="rainbow",
    )
    ax[1].set_title("Cov(n_i, n_j)")

    plt.show()


plot_photon_number_stats(mean_n, cov_n)
tutorials/images/td3_plot_experiment.png

Now, let us switch to a local simulation and see how the results compare against the experimental data. By design, DeLorean is an experiment that is difficult to sample from classically. So what the simulator devices cannot do is return actual samples as the remote engine does; at least not in a reasonable amount of time. Still, running the circuit locally on the gaussian backend returns a results objects with many other interesting properties.

eng = sf.Engine(backend="gaussian")

results = eng.run(prog)

If executing the program on a simulator device, we can also extract the total covariance matrix by calling the cov() method on the returned state. This is the covariance matrix for all 259 modes, i.e., the 216 computational modes along with the 43 vacuum modes that we added above. We can then reduce this down to the covariance matrix over the 216 computational modes using the reduced_gaussian method from the The Walrus package. We will also need to import two helper functions to calculate the photon number statistics.

from thewalrus.quantum import (
    reduced_gaussian,
    photon_number_mean_vector,
    photon_number_covmat,
)

cov_tot = results.state.cov()

mu_tot = np.zeros(len(cov_tot))
modes = range(vac_modes, vac_modes + modes)

mu, cov = reduced_gaussian(mu_tot, cov_tot, modes)

The covariance matrix describes the quadrature relations of the individual modes and, in the absence of displacements, contains the full information on the Gaussian state. However, if we want to be able to compare the simulation against experimental data, we would like to obtain the same photon-number statistics as above; the array of mean photon numbers for the 216 modes and a \(216 \times 216\) photon-number covariance matrix

mean_n_sim = photon_number_mean_vector(mu, cov)
cov_n_sim = photon_number_covmat(mu, cov)

plot_photon_number_stats(mean_n_sim, cov_n_sim)
tutorials/images/td3_plot_simulation.png

Hopefully, the figures obtained from experiment and simulation look alike to a reasonable degree. Now, on top of a visual comparison, let us compare the first and second-order statistics numerically and see how well they match. An appropriate measure to do so is the total variation distance (TVD), given by \(1/2 \sum_{i} \|x_{i} - y_{i} \|\) where \(x_{i}\) and \(y_{i}\) are the elements of the normalized arrays we want to compare.

def tvd(x, y):
    x /= np.sum(x)
    y /= np.sum(y)
    return 1 / 2 * np.sum(np.abs(x - y))


print("TVD 1st moment:", tvd(mean_n, mean_n_sim))
print("TVD 2nd moment:", tvd(cov_n, cov_n_sim))

Out:

TVD 1st moment: 0.0027349155377118474
TVD 2nd moment: 0.07165464626893124

Calculating the transfer matrix

The circuit above can be described as a combination of initial state preparation, followed by an interferometer over the 216 modes used in the circuit. This interferometer setup can be described by a so-called transfer matrix. By having access to the tranfer matrix it’s possible to recreate the same circuit using different initial states. Instead of starting with a squeezed state, as we did above, we could e.g. apply the transfer matrix to a coherent state or GKP states.

There’s a way to quickly calculate the transfer matrix of a circuit — as long as it only contains beamsplitters, rotation gates and loss channels — without having to execute it. This is done by using the “passive” compiler to compile the circuit into a single transfer gate. We thus remove the squeezing from the circuit and simply compile the program using the “passive” compiler, extracting the parameter from the restulting PassiveGate.

prog = sf.TDMProgram(N)

with prog.context(*gate_args) as (p, q):
    LossChannel(eta_comm) | q[n[0]]

    for i in range(len(delays)):
        Rgate(p[2 * i + 1]) | q[n[i]]
        BSgate(p[2 * i + 2], np.pi / 2) | (q[n[i]], q[n[i + 1]])
        Rgate(phi_loop[i]) | q[n[i]]
        LossChannel(eta_loop[i]) | q[n[i]]

    LossChannel(p[7]) | q[0]

prog.space_unroll()

prog = prog.compile(compiler="passive")

Keep in mind that the circuit now only consist of a single PassiveGate, which has a single matrix parameter. This parameter is the circuits transfer matrix, for the full 259 modes, including the vacuum modes. The transfer matrix then allows us to experiment using the same setup, but with different state preparations.

transfer_matrix = prog.circuit[0].op.p[0]

print(transfer_matrix.shape)

Out:

(259, 259)

Conclusion

This tutorial covered how to submit jobs to our temporal-division multiplexing device, the DeLorean. It’s rather exciting to be able to demonstrate the concept of quantum-computational advantage by examples running on actual hardware. As you now have seen, there are quite a few details to keep track of, but, as long as the circuit follows the correct layout, and the parameters are correctly declared, anyone with Xanadu cloud access can submit jobs to be excecuted on the TD3 hardware, and get back samples that would be difficult, if not outright impossible, to retrieve using a local simulator.

We hope that you’ve enjoyed this demonstration, and that you’re as excited as we are about the possibilities. If you wish to dig deeper into the science behind all of this, we recommend you to read our paper [1].

References

1(1,2)

Quantum computational advantage with a programmable photonic processor

Total running time of the script: ( 0 minutes 0.000 seconds)

Gallery generated by Sphinx-Gallery