Note

Click here to download the full example code

# Advanced time-domain photonic circuits¶

*Authors: Fabian Laudenbach and Theodor Isacsson

TODO: * Inlcude schematic figure (Fabian/Theo) * Include figure explaining vacuum business (Fabian/Theo) * Create mock device_spec (Fabian/Theo/Leo) * Include introduction (Fabian) * Compare remote with local simulation (Fabian)

## Introduction¶

The demonstration of quantum-computational advantage, i.e. the ability of a quantum processor to accomplish a task exponentially faster than any super computer could, is considered an important milestone towards useful quantum computers. One of these tasks that are proven to be computationally hard is to sample the photon numbers of multimode squeezed states that have travelled through a non-trivial interferometer – a problem which is referred to as Gaussian BosonSampling (GBS). In our publication [REFERENCE TO SUPREMACY PAPER], we showcase a programmable loop interferometer that successfully sampled GBS instances orders of magnitudes faster than any known classical algorithm could. The experiment is based on temporal-division multiplexing (TMD) which allows for a comparatively simple experimental setup in which the number of optical parts and devices is independent from the number of optical modes. A single squeezed-light source emits batches of 216 subsequent squeezed-light pulses. These pulses are interfered with one another by the help of optical delay lines (acting as memory buffer), programmable beamsplitters and phase shifters.

The architecture of our GBS machine – which we call *DeLorean* – is layed
out as follows: A squeezed-light source injects pulse trains of 216 modes,
temporally spaced by \(\tau = 167~\text{ns}\), into the interferometer which
consists of three delay loops, each one of them mainly characterized by the
roundtrip time it takes a mode to travel trough it. The first loop has a
roundtrip time of \(1 \tau\), the second and third loop have \(6
\tau\) and \(36 \tau\), respectively. Each loop is associated with two
programmable Gaussian gates: a rotation gate right before the loop and a
variable beamsplitter that interferes modes approaching the loop with modes
coming from *within* the loop.

In this tutorial, we will discuss how a GBS circuit on *DeLorean* can be
programmed using the `sf.TDMProgram`

class. This class allows us to define
and run time-multiplexed photonic circuits in a very efficient way, even if
these circuits include large numbers of modes. For more background on the
`sf.TDMProgram`

, please refer to the beginner and
advanced
tutorial on time-domain photonic circuits.

After defining our GBS circuit, we can either run a simulation or submit the program to the hardware.

## Defining a GBS circuit¶

```
import json
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
import strawberryfields as sf
from strawberryfields.tdm.tdmprogram import get_mode_indices
from strawberryfields.ops import Sgate, Rgate, BSgate, LossChannel
from thewalrus.quantum import (
reduced_gaussian,
photon_number_mean_vector,
photon_number_covmat,
)
```

A time-domain program is defined by the sequences of arguments each
quantum gate applies at each time bin within the length of the program.
As *DeLorean* is an interferometer with a total of six Gaussian gates (3
rotation gates, 3 tunable beamsplitters), we require six lists of
arguments:

`phi_0`

,`phi_1`

,`phi_2`

: sequences of arguments applied by the rotation gates`alpha_0`

,`alpha_1`

,`alpha_2`

: sequences of arguments applied by the beamsplitters

For GBS, these gates need to be set randomly within the range that the devices in our lab are comfortable with. Our rotation gates can apply phases between \(-\pi / 2\) and \(\pi / 2\). The beamsplitter intensity transmission can be set anywhere between 0 and 1. And given some intensity transmission \(T\), the argument passed to our beamsplitter will be \(\alpha=\text{arccos}(\sqrt{T})\). Considering this conversion, the allowed intervals, and the fact that our source emits batches of 216 squeezed-light modes, we can define our random sequences by:

```
modes = 216
min_phi, max_phi = -np.pi / 2, np.pi / 2
min_T, max_T = 0, 1
# rotation gate parameters
phi_0 = np.random.uniform(low=min_phi, high=max_phi, size=modes).tolist()
phi_1 = np.random.uniform(low=min_phi, high=max_phi, size=modes).tolist()
phi_2 = np.random.uniform(low=min_phi, high=max_phi, size=modes).tolist()
# beamsplitter parameters
alpha_0 = np.arccos(
np.sqrt(np.random.uniform(low=min_T, high=max_T, size=modes))
).tolist()
alpha_1 = np.arccos(
np.sqrt(np.random.uniform(low=min_T, high=max_T, size=modes))
).tolist()
alpha_2 = np.arccos(
np.sqrt(np.random.uniform(low=min_T, high=max_T, size=modes))
).tolist()
```

Now here comes a tricky part. Although these gate sequences are of length 216
(the number of squeezed-light modes in our GBS instance) the actual
time-domain program needs to be longer than that. Why is that the case? Simply
put, we want to make sure that the GBS unitary acts on our 216 computational
modes exclusively and *not* on any vacuum modes. This is not as easy as it
sounds because there is a bunch of vacuum already in the delay lines before
the first squeezed-light mode enters the interferometer.

If a squeezed-light mode interferes with a vacuum mode at a beamsplitter it will effectively undergo loss which we want to avoid. Therefore, we need to set the first arguments of each loop to \(T=1\) (\(\alpha=0\)). This will couple an incoming computational mode entirely into the loop, and a vacuum mode that was inside the loop will be coupled out — no interference between the two will occur. For each loop this needs to be repeated until the respective delay line is completely emptied from vacuum modes and filled with computational modes.

```
delay_0, delay_1, delay_2 = 1, 6, 36
alpha_0[:delay_0] = [0] * delay_0
alpha_1[:delay_1] = [0] * delay_1
alpha_2[:delay_2] = [0] * delay_2
```

This still doesn’t explain why the arrays need to be longer than 216.

We just imposed that the first computational mode is fully coupled into the first loop. This means that it will arrive at the second loop after exactly one time bin (i.e. one roundtrip in the first delay line). We also imposed that the first mode should be completely coupled into the second loop as well, which means it will arrive at the third loop no earlier 7 time bins after the program has started. Therefore, the gates associated to the second and third loop need some idle time bins, awaiting the first computational mode.

We thus need to append 0-valued parameters (i.e., full transmission) to the gates in the first two loops. This will make sure that the non-trivial random gates are applied to the computational modes exclusively.

```
# second loop needs to await the first mode for 1 time bin (delay_0 = 1)
phi_1 = [0] * delay_0 + phi_1
alpha_1 = [0] * delay_0 + alpha_1
# second loop needs to await the first mode for 7 time bins (delay_0 + delay_1 = 1 + 6)
phi_2 = [0] * (delay_0 + delay_1) + phi_2
alpha_2 = [0] * (delay_0 + delay_1) + alpha_2
```

Towards the end of the time-domain program, we need to make sure that all computational mode in the loops is entirely coupled out again, avoiding interfering with the vacuum modes that will follow the pulse train of 216 squeezed-light modes. This means we need to set the beamsplitters to full transmission again, and, on top of this, since all gate sequences should be of same length, the gates of the first and second loop need to remain idle until the very last computational mode has exited the third loop.

This requires us to add 0-valued parameters corresponding to the number of modes required to exit through the current, and all coming, loop(s); i.e., 43 (1 + 6 + 36) for the first loop, 42 (6 + 36) for the second loop, and 36 for the third loop.

```
# empty first loop (delay_0) and await the emptying of second and third loop (delay_1 + delay_2)
phi_0 += [0] * (delay_0 + delay_1 + delay_2)
alpha_0 += [0] * (delay_0 + delay_1 + delay_2)
# empty second loop (delay_1) and await the emptying of third loop (delay_2)
phi_1 += [0] * (delay_1 + delay_2)
alpha_1 += [0] * (delay_1 + delay_2)
# empty third loop
phi_2 += [0] * delay_2
alpha_2 += [0] * delay_2
```

Unfortunately, all this padding with idle time bins before and after the random gates is necessary to avoid interference of computational modes with vacuum. This simply reflects the reality that we need a total unitary that fits the size of the total input state: 216 computational modes and 43 vacuum modes (1, 6, 36 of them in the first, second and third loop, respectively). Our total unitary thus needs to be of size 259 \(\times\) 259.

However, the reward of all precautions taken above is that the unitary
that we *care* about is now a block matrix within the total unitary,
acting on the 216 computational modes exclusively. Moreover, the length
of the argument lists submitted to the six quantum gates is now exactly
259, matching the size of the total unitary.

```
[len(parameters) for parameters in (phi_0, phi_1, phi_2, alpha_0, alpha_1, alpha_2)]
```

Out:

```
[259, 259, 259, 259, 259, 259]
```

After definition of the programmable gates, we need to take care of the
parameters that we *cannot* program. In particular, these are the
squeezing parameters of the 216 modes (not programmable *yet*) and the
intrinsic phase offsets of the three loops. If we want to run an
accurate *simulation* of the GBS instance, we also need various
efficiency parameters.

If you have an API key, you can retrieve these non-programmable
parameters from the platform upon connecting to *DeLorean*, calling the
`eng.device_spec`

property on the `RemoteEngine`

object. If not, you
can simply download the dummy device specification from
here. This file can then be loaded
directly into the tutorial.

```
with open("td3_device_spec.json") as file_:
device_spec = json.load(file_)
```

The non-programmable parameters can now be retrieved from the device specification loaded above. There are 5 different sets of parameters that we need to apply to the circuit (with some more detailed explanations below):

`r`

: the squeezing parameters set at the start of the circuit (259 values)`phi_loop`

: list with the three offsets used in the rotation gates in each respective loop (3 values)`eta_comm`

: the common efficiency for the full circuit (1 value)`eta_loop`

: list with three efficiencies for each individual loop (3 values)`eta_ch_rel`

: the efficiencies for the connections to the 16 photon-numberdetectors (259 values)

```
# squeezing parameters
r = device_spec["r"]
# list of three loop offsets
phi_loop = device_spec["phi_loop"]
# common efficiency
eta_comm = device_spec["eta_comm"]
# list of three loop efficiencies
eta_loop = device_spec["eta_loop"]
# relative demux/PNR efficiencies
eta_ch_rel = device_spec["eta_ch_rel"]
```

Some notes on the paramters we just loaded:

`r`

: Our squeezed-light source emits batches of 216 subsequent pulses, so why is`r`

of length 259? As established in the previous section, the time-domain program does not stop after 216 time bins. Rather, it is over as soon as the last computational mode has left the third loop. After 216 time bins, the light source will not emit squeezed-light pulses anymore and vacuum modes will be injected into the interferometer instead. This is why the squeezing array`r`

is a concatenation of 216 non-zero values and 43 zeros.`phi_loop`

: Each loop applies a static phase rotation to the modes in it. Currently, this phase rotation cannot be programmed. All we can do is to*measure*this phase and take it into account whenever we run a program on the machine. This is why the three loop offsets`phi_loop``

are part of the device spec. Even though we have no direct influence on them, we need to add them to the`TDMProgram`

in order to obtain a correct representation of the circuit that is applied.`eta_comm`

: The total loss experienced by the different modes depends on which loops they have travelled through and on which detector they ended up in. So it will be highly inhomogeneous within the 216 computational modes.`eta_comm`

, on the other hand, describes their*common*efficiency. It is experienced by a mode that bypasses all three loops and ends up in our most efficient detector. In other words`1 - eta_comm`

is the minimal loss that every mode will suffer. On top of that, most modes will suffer additional loss imposed by the loops and less efficient detectors.`eta_loop`

: This is simply a list containing the respective efficiencies of the three loops.`eta_ch_rel`

: In the experiment, two consecutive pulses are only 167 ns apart which exceeds the rate that one single photon-number resolving (PNR) detector can handle. So although our batches of 216 squeezed-light modes travel towards the detection module in one single fibre, they need to be demultiplexed into 16 spatial modes, each one of them probed by its one PNR detector. See Figure [ADD FIG REFERENCE] for a schematic. Now, instead of one PNR operating at 6 MHz, we have 16 PNRs operating at 6/16 MHz. The demultiplexing pattern is repetitive, meaning that detector \(i\) will detect all temporal modes whose index satisfies \(i + 16j\) where \(j\) is an integer in the interval \([0, 17]\). Since the 16 spatial modes and their respective detectors do not share the exact same efficiency,`eta_ch_rel`

keeps track of the loss suffered by each one of our 259 modes individually. If you look at it closer, you will see that it is just a list of 16 different relative efficiencies that is repeated until 259 time bins are filled.

### Excecuting the circuit¶

Let’s collect all the (programmable and non-programmable) gate sequences in
one list that we can submit to the `sf.TDMProgram`

. The list, aptly named
`gate_args`

, contains all gate parameters in the order in which the
corresponding gates will be applied. Note that the channel efficiencies are
contained in the gate arguments, while the other two are kept as external
lists and do not take part in the circuit unrolling. This is because common
efficiency and loop efficiencies remain static throughout the program whereas
the relative detection efficiency experienced by the modes needs to be updated
at each time bin.

```
# gate arguments for the TDM program
gate_args = [r, phi_0, alpha_0, phi_1, alpha_1, phi_2, alpha_2, eta_ch_rel]
```

Next, we declare the number of non-vacuum modes used in the circuit — in
this case 216 — as well as the delays in the loops, i.e., the number of
modes that will fit in each loop at the same time. We can then use a
helper function called `get_mode_indices`

to get the correct mode
indices `n`

for the circuit gates as well as the total number of modes
`N`

used to initialize the program.

**Note**: the number of modes plus the number of vacuum modes (i.e., the
sum of the delays) equals the length of the gate parameter arrays. This
is the total number of modes that will pass through each gate, and is
thus also the number of gate parameters needed.

```
modes = 216
delays = [1, 6, 36]
vac_modes = sum(delays)
n, N = get_mode_indices(delays)
```

We can now finally construct the circuit. The layout of this circuit is set by the hardware device and cannot be changed. If the wrong gates are used or they are put in the wrong order, the circuit validation will fail.

```
prog = sf.TDMProgram(N)
with prog.context(*gate_args) as (p, q):
Sgate(p[0]) | q[n[0]]
LossChannel(eta_comm) | q[n[0]]
for i in range(len(delays)):
Rgate(p[2 * i + 1]) | q[n[i]]
BSgate(p[2 * i + 2], np.pi / 2) | (q[n[i]], q[n[i + 1]])
Rgate(phi_loop[i]) | q[n[i]]
LossChannel(eta_loop[i]) | q[n[i]]
LossChannel(p[7]) | q[0]
```

The circuit above will be rolled-up, using symbolic placeholder variables for the gate parameters. The circuit must be unrolled with each mode only being measured once, a process we’ve deciced to call space-unrolling.

Out:

```
<strawberryfields.tdm.tdmprogram.TDMProgram object at 0x7f10acbb8588>
```

Space-unrolling might seem slightly confusing, but it’s actually fairly straight-forward. Instead of having several modes going through the same path, we interpret the TDM loop interactions as spatially separate. This way, each measurement is made on a different spatial mode, as can be seen in the right-hand side of the image below.

The left-handside depicts how an actual single loop TDM circuit would work, while the right-hand side is how Strawberry Fields simulates it. Since each mode can only be measured once (i.e., modes cannot be reused when using Fock measurements, opposite to the TD2 case, which uses homodyne measurements), the circuit needs to be unrolled. To get an understanding for how this space-unrolling works, try to follow the interactions happening in the left-hand side single loop setup, and try to spot them in the right-hand side circuit diagram. They are in practice equal!

Runing the circuit locally on the gaussian backend will return a results
object, which contains the samples that are returned by the TD3 program.
The samples returned by a `sf.TDMProgram`

are of shape
`(shots, spatial_modes, temporal_modes)`

, so in our case whe should
have gotten `(shots, 1, 259)`

.

**Note**: If you have an API key, and have used it to configure
Strawberry
Fields
correctly, you can try running it on our TD3 hardware device using
`eng = sf.RemoteEngine("TD3")`

.

If executing the program on a simulator device, we can also extract the total
covariance matrix by calling the `cov()`

method on the returned state. This
is the covariance matrix for all 259 modes, i.e., the 216 computational modes
along with the 43 vacuum modes that we added above. We can then reduce this
down to the covariance matrix over the 216 computational modes using the
`reduced_gaussian`

method, imported from the The Walrus package.

```
cov_tot = results.state.cov()
mu_tot = np.zeros(len(cov_tot))
modes = range(vac_modes, vac_modes + modes)
mu, cov = reduced_gaussian(mu_tot, cov_tot, modes)
```

The covariance matrix describes the quadrature relations of the individual modes and, in the absence of displacements, contains the full information on the Gaussian state. However, if we want to be able to compare the simulation against experimental data, we would like to obtain the photon-number statistics. The following two functions will return an array of mean photon numbers for the 216 modes and a \(216 \times 216\) photon-number covariance matrix whose elements are defined by \(\text{Cov}_{i,j} = E(n_{i} n_{j}) - E(n_{i}) E(n_{j})\).

```
mean_n = photon_number_mean_vector(mu, cov)
cov_n = photon_number_covmat(mu, cov)
```

Let’s visualize these two results in a figure.

```
fig, ax = plt.subplots(1, 2, figsize=(14, 5))
ax[0].bar(range(len(mean_n)), mean_n, width=0.75, align="center")
ax[0].set_title(
f"<n> = {round(np.mean(mean_n), 3)} // <n_tot> = {round(np.sum(mean_n), 3)}"
)
ax[0].set_xlabel("pulse index")
ax[0].set_ylabel("mean photon number")
ax[0].grid()
ax[1].imshow(
cov_n,
norm=matplotlib.colors.SymLogNorm(
linthresh=10e-6, linscale=1e-4, vmin=-cov_n.max(), vmax=cov_n.max()
),
cmap="RdBu_r",
)
ax[1].set_title("Cov(n_i, n_j)")
plt.show()
```

Out:

```
/home/circleci/project/photonics/tutorials/run_td3.py:507: MatplotlibDeprecationWarning: default base may change from np.e to 10. To suppress this warning specify the base keyword argument.
```

We can obtain the same kind of statistics in a very straightforward way from the experimental data and put them into direct comparison.

By design, *DeLorean* is an experiment that is difficult to sample from
classically. So what the simulator devices *cannot* do is return actual
samples as the remote engine does; at least not in a reasonable amount
of time. Instead, we can obtain the covariance matrix which provides a
full description of the Gaussian state we produced.

### Calculating the transfer matrix¶

The circuit above can be described as a combination of initial state preparation, followed by an interferometer over the 216 modes used in the circuit. This interferometer setup can be described by a so-called transfer matrix. By having access to the tranfer matrix it’s possible to recreate the same circuit using different initial states. Instead of starting with a squeezed state, as we did above, we could e.g. apply the transfer matrix to a coherent state or GKP states.

There’s a way to quickly calculate the transfer matrix of a circuit — as
long as it only contains beamsplitters, rotation gates and loss channels
— without having to execute it. This is done by using the “passive”
compiler to compile the circuit into a single transfer gate. We thus
remove the squeezing from the circuit and simply compile the program
using the “passive” compiler, extracting the parameter from the
restulting `PassiveGate`

.

```
prog = sf.TDMProgram(N)
with prog.context(*gate_args) as (p, q):
LossChannel(eta_comm) | q[n[0]]
for i in range(len(delays)):
Rgate(p[2 * i + 1]) | q[n[i]]
BSgate(p[2 * i + 2], np.pi / 2) | (q[n[i]], q[n[i + 1]])
Rgate(phi_loop[i]) | q[n[i]]
LossChannel(eta_loop[i]) | q[n[i]]
LossChannel(p[7]) | q[0]
prog.space_unroll()
prog = prog.compile(compiler="passive")
```

Note that the circuit now only consist of a single `PassiveGate`

,
which has a single matrix parameter, which is the circuits transfer
matrix. This could then be used to play around with the same setup, but
different state preparations.

```
transfer_matrix = prog.circuit[0].op.p[0]
print(transfer_matrix.shape)
```

Out:

```
(259, 259)
```

**Total running time of the script:** ( 0 minutes 58.894 seconds)

## Contents

## Downloads

## Related tutorials