or how we accidentally reinvented deep sets and forgot about graph neural networks
We proposed the idea of mean embeddings in my first JMLR paper, Deep Reinforcement Learning for Swarm Systems.
Instead, we were looking for a more compact representation that was invariant to the ordering of the agents and of a fixed size, independent of the number of agents. In the context of swarms, where agents are assumed to be homogeneous and therefore interchangeable, we treated the observations an agent makes of its neighboring agents as independent samples from a probability distribution. We then calculated an empirical estimate of the expected feature map (the mean embedding) to represent the distribution of observations as an element in a reproducing kernel Hilbert space. This feature map provided an effective representation that worked well in the case of a fully connected graph.
In this post, I will quickly reintroduce the mean embedding and connect it to modern graph neural networks. An implementation using PyTorch can be found here.
Before we start, let’s recap some definitions from the paper.
As mentioned before, simply concatenating the items in $O^i$ and $o^i_\text{loc}$ has various drawbacks as it ignores the permutation invariance inherent to a homogeneous agent network. Furthermore, it grows linearly with the number of agents in the swarm and is therefore limited to a fixed number of neighbors when used in combination with neural network policies. A simple way to achieve permutation invariance of the elements of $O^i$ as well as flexibility to the size of $O^i$ is to use a mean feature embedding, i.e.,
\[\hat{\mu}_{O^i} = \frac{1}{|O^i|} \sum_{o^{i,j} \in O^i} \phi(o^{i, j}),\]where $\phi$ defines the feature space of the mean embedding, realized by a standard feedforward neural network. This embedding can then be further processed by the policy neural network $\pi$. The resulting policy model
\[\pi(O^i, o^i_\text{loc}) = \rho(o^i_\text{loc}, \hat{\mu}_{O^i})\]is an instance of the invariant model proposed in the Deep Sets paper.
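This policy can be sketched in PyTorch roughly as follows. The layer widths, dimension names, and class name are my own illustrative choices, not taken from the paper; the structure (a feedforward $\phi$, a mean over neighbors, and a network $\rho$ on the concatenation) is the part that matters:

```python
import torch
import torch.nn as nn

class MeanEmbeddingPolicy(nn.Module):
    """Sketch of pi(O^i, o^i_loc) = rho(o^i_loc, mu_hat).

    Layer sizes are illustrative assumptions, not the paper's.
    """
    def __init__(self, obs_dim, loc_dim, embed_dim, act_dim):
        super().__init__()
        # phi: feedforward feature map applied to each neighbor observation
        self.phi = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, embed_dim))
        # rho: processes local features together with the mean embedding
        self.rho = nn.Sequential(nn.Linear(loc_dim + embed_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim))

    def forward(self, O_i, o_loc):
        # O_i: (num_neighbors, obs_dim) -- any number of neighbors, any order
        mu_hat = self.phi(O_i).mean(dim=0)           # (embed_dim,)
        return self.rho(torch.cat([o_loc, mu_hat]))  # (act_dim,)
```

Because the neighbor features only enter through the mean, the output is unchanged under any permutation of the rows of `O_i`, and the same weights handle any number of neighbors.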
Earlier, we defined the swarm as a graph where the agents are the nodes and information can flow between them along edges. More precisely, the local features $o^i_\text{loc}$ describe the node features and $o^{i, j}$ the edge features in a directed graph. These features are usually denoted as $x_i$ and $e_{i, j}$. Using graph terminology, the action for agent $i$ is determined as a function
\[\pi(\mathcal{G}, i) = \rho(x_i, \bigoplus_{j \in \mathcal{N}_\mathcal{G}(i)} \phi(e_{i, j}))\]where choosing $\bigoplus$ to be the mean results exactly in the mean embedding policy defined in the previous section.
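The aggregation $\bigoplus$ just needs to be permutation invariant; mean, sum, and max are the common choices. A minimal sketch of this choice (the function name and the dense `(num_neighbors, feat_dim)` layout are my own assumptions):

```python
import torch

def aggregate(edge_feats, op="mean"):
    """Permutation-invariant aggregation over a node's incoming edge features.

    edge_feats: (num_neighbors, feat_dim) tensor holding phi(e_{i,j}) values.
    Choosing op="mean" recovers the mean embedding from the previous section.
    """
    if op == "mean":
        return edge_feats.mean(dim=0)
    if op == "sum":
        return edge_feats.sum(dim=0)
    if op == "max":
        return edge_feats.max(dim=0).values
    raise ValueError(f"unknown aggregation: {op}")
```

Note that sum and max, unlike mean, change their scale or value distribution with the neighborhood size, which can matter when the number of neighbors varies a lot.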
If we assume certain communication abilities between agents, we can perform multiple rounds of message passing. The node features are first mapped to a latent space using a learned linear embedding $x^0_v = M_x x_v$; in each round, the latent node feature is then updated based on the latent node and edge features. After $K$ steps, the action is determined as a function $\pi(x^K_i)$ where information has flowed over up to $K$ hops.
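To make the $K$-hop update concrete, here is a rough sketch of such a message-passing policy in PyTorch. The GRU-based update, the layer sizes, and the choice of mean aggregation are my own assumptions; the `(2, E)` edge-list layout follows a common GNN convention:

```python
import torch
import torch.nn as nn

class MessagePassingPolicy(nn.Module):
    """Sketch of a K-step message-passing policy (design choices are assumptions)."""
    def __init__(self, node_dim, edge_dim, hidden_dim, act_dim, K=2):
        super().__init__()
        self.K = K
        self.M_x = nn.Linear(node_dim, hidden_dim)  # x^0_v = M_x x_v
        self.phi = nn.Sequential(
            nn.Linear(hidden_dim + edge_dim, hidden_dim), nn.ReLU())
        self.update = nn.GRUCell(hidden_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, act_dim)

    def forward(self, x, edge_index, e):
        # x: (N, node_dim) node features, e: (E, edge_dim) edge features
        # edge_index: (2, E), row 0 = source node j, row 1 = target node i
        h = self.M_x(x)
        src, dst = edge_index
        for _ in range(self.K):
            # per-edge messages from latent source features and edge features
            msgs = self.phi(torch.cat([h[src], e], dim=-1))
            agg = torch.zeros_like(h).index_add_(0, dst, msgs)
            deg = torch.zeros(h.size(0), 1).index_add_(
                0, dst, torch.ones(src.size(0), 1)).clamp(min=1)
            h = self.update(agg / deg, h)  # mean aggregation, GRU state update
        return self.head(h)                # one action per node
```

After $K$ iterations, each node's latent state depends on its $K$-hop neighborhood, so $\pi(x^K_i)$ can exploit information beyond the immediate neighbors.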