Note

Click here to download the full example code

# Dense Subgraphs¶

*Technical details are available in the API documentation:* sf.apps.subgraph

Graphs can be used to model a wide variety of concepts: social networks, financial markets, biological networks, and many others. A common problem of interest is to find subgraphs that contain a large number of connections between their nodes. These subgraphs may correspond to communities in social networks, correlated assets in a market, or mutually influential proteins in a biological network.

Mathematically, this task is known as the dense subgraph problem. The density of a \(k\)-node subgraph is equal to the number of its edges divided by the maximum possible number of edges. Identifying the densest graph of a given size, known as the densest-\(k\) subgraph problem, is NP-Hard.

As shown in [1], a defining feature of GBS is that when we encode a graph into a GBS device, it samples dense subgraphs with high probability. This property can be used to find dense subgraphs by sampling from a GBS device and postprocessing the outputs. Let’s take a look!

## Finding dense subgraphs¶

The first step is to import all required modules. We’ll need the `data`

module to load pre-generated samples, the `sample`

module to postselect samples, the
`subgraph`

module to search for dense subgraphs, and the `plot`

module to
visualize the graphs. We’ll also use Plotly which is required for the `plot`

module and
NetworkX for graph operations.

```
from strawberryfields.apps import data, sample, subgraph, plot
import plotly
import networkx as nx
```

Here we’ll study a 30-node graph with a planted 10-node graph, as considered in [1]. The graph is generated by joining two Erdős–Rényi random graphs. The first graph of 20 nodes is created with edge probability of 0.5. The second planted graph is generated with edge probability of 0.875. The planted nodes are the last ten nodes in the graph. The two graphs are joined by selecting 8 nodes at random from both graphs and adding an edge between them. This graph has the sneaky property that even though the planted subgraph is the densest of its size, its nodes have a lower average degree than the nodes in the rest of the graph.

The `data`

module has pre-generated GBS samples from this graph. Let’s load them,
postselect on samples with a large number of clicks, and convert them to subgraphs:

```
planted = data.Planted()
postselected = sample.postselect(planted, 16, 30)
pl_graph = nx.to_networkx_graph(planted.adj)
samples = sample.to_subgraphs(postselected, pl_graph)
print(len(samples))
```

Out:

```
2181
```

Not bad! We have more than 2000 samples to play with 😎. The planted subgraph is actually easy to identify; it even appears clearly from the force-directed Kamada-Kawai algorithm that is used to plot graphs in Strawberry Fields:

```
sub = list(range(20, 30))
plot.graph(pl_graph, sub)
```

A more interesting challenge is to find dense subgraphs of different sizes; it is often useful to
identify many high-density subgraphs, not just the densest ones. This is the purpose of the
`search()`

function in the
`subgraph`

module: to identify collections of dense subgraphs for a
range of sizes. The output of this function is a dictionary whose keys correspond to subgraph
sizes within the specified range. The values in the dictionary are the top subgraphs of that size
and their corresponding density.

```
dense = subgraph.search(samples, pl_graph, 8, 16, max_count=3) # we look at top 3 densest subgraphs
for k in range(8, 17):
print(dense[k][0]) # print only the densest subgraph of each size
```

Out:

```
(1.0, [21, 22, 24, 25, 26, 27, 28, 29])
(0.9722222222222222, [21, 22, 23, 24, 25, 26, 27, 28, 29])
(0.9333333333333333, [20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
(0.7818181818181819, [17, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
(0.696969696969697, [0, 2, 3, 5, 6, 8, 9, 10, 14, 16, 17, 18])
(0.6666666666666666, [2, 3, 6, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18])
(0.6483516483516484, [0, 3, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18])
(0.6285714285714286, [0, 3, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18])
(0.6083333333333333, [0, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18])
```

From the results of the search we learn that, depending on their size, the densest subgraphs belong to different regions of the graph: dense subgraphs of less than ten nodes are contained within the planted subgraph, whereas larger dense subgraphs appear outside of the planted subgraph. Smaller dense subgraphs can be cliques, characterized by having maximum density of 1, while larger subgraphs are less dense. Let’s see what the smallest and largest subgraphs look like.

Smallest subgraph:

```
plot.graph(pl_graph, dense[8][0][1])
```

Largest subgraph:

```
plot.graph(pl_graph, dense[12][0][1])
```

In principle there are different methods to postprocess GBS outputs to identify dense
subgraphs. For example, techniques for finding maximum cliques, included in the
`clique`

module could help provide initial subgraphs that can be resized to find
larger dense subgraphs. Such methods are hybrid algorithms combining the ability of GBS to
sample dense subgraphs with clever classical techniques. Can you think of your own hybrid
algorithm? 🤔

## Contents

## Downloads

## Related tutorials