publications | Maximilian Hüttenrauch

2024

Robust Black-Box Optimization for Stochastic Search and Episodic Reinforcement Learning

Maximilian Hüttenrauch and Gerhard Neumann

Journal of Machine Learning Research, 2024

2023

Information-Theoretic Trust Regions for Stochastic Gradient-Based Optimization

Philipp Dahlinger, Philipp Becker, Maximilian Hüttenrauch, and 1 more author

In OPT 2023: Optimization for Machine Learning, 2023

2021

Deep Reinforcement Learning for Attacking Wireless Sensor Networks

Juan Parras, Maximilian Hüttenrauch, Santiago Zazo, and 1 more author

Sensors, 2021

Recent advances in Deep Reinforcement Learning allow solving increasingly complex problems. In this work, we show how current defense mechanisms in Wireless Sensor Networks are vulnerable to attacks that use these advances. We use a Deep Reinforcement Learning attacker architecture that supports one or more attacking agents, which can learn to attack using only partial observations. Then, we subject our architecture to a test bench consisting of two defense mechanisms against a distributed spectrum sensing attack and a backoff attack. Our simulations show that our attacker learns to exploit these systems without a priori information about the defense mechanism used or its concrete parameters. Since our attacker requires minimal hyper-parameter tuning, scales with the number of attackers, and learns only by interacting with the defense mechanism, it poses a significant threat to current defense procedures.

Coordinate ascent MORE with adaptive entropy control for population-based regret minimization

Maximilian Hüttenrauch and Gerhard Neumann

In Proceedings of the Genetic and Evolutionary Computation Conference Companion, Lille, France, 2021

Model-based Relative Entropy Policy Search (MORE) is a population-based stochastic search algorithm with desirable properties such as a well-defined policy search objective, i.e., it optimizes the expected return, and exact closed-form, information-theoretic update rules. This is in contrast with existing population-based methods, often referred to as evolutionary strategies, such as CMA-ES. While these methods work very well in practice, the updates of the search distribution are often based on heuristics, and they do not optimize the expected return of the population but instead implicitly optimize the return of elite samples, which may yield a poor expected return and unreliable or risky solutions. We show that the MORE algorithm can be improved with distinct updates based on coordinate ascent on the mean and covariance of the search distribution, which considerably improves the convergence speed while maintaining the exact closed-form updates. In this way, we can match the performance of elite samples of CMA-ES while also showing a considerably improved performance of the sample average. We evaluate our new algorithm on simulated robotic tasks and compare it to the state-of-the-art CMA-ES.

2019

Deep Reinforcement Learning for Swarm Systems

Maximilian Hüttenrauch, Adrian Šošić, and Gerhard Neumann

Journal of Machine Learning Research, 2019

2018

Local Communication Protocols for Learning Complex Swarm Behaviors with Deep Reinforcement Learning

Maximilian Hüttenrauch, Adrian Šošić, and Gerhard Neumann

In Swarm Intelligence, 2018

Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. While it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building and building a communication link. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.
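
As a rough illustration of the histogram idea mentioned in the abstract above, the following sketch bins the neighbors an agent can sense by distance and bearing. It is not the authors' implementation: the function name `neighborhood_histogram`, the bin counts, and the choice of a distance/bearing grid are all assumptions made for this example.

```python
import math


def neighborhood_histogram(agent, neighbors, radius, n_dist_bins=4, n_angle_bins=8):
    """Count neighbors within `radius` of `agent` in a (distance, bearing) grid.

    Hypothetical helper for illustration only. `agent` is an (x, y) tuple and
    `neighbors` is an iterable of (x, y) tuples. Returns a nested list of
    shape [n_dist_bins][n_angle_bins] holding neighbor counts.
    """
    hist = [[0] * n_angle_bins for _ in range(n_dist_bins)]
    ax, ay = agent
    for nx, ny in neighbors:
        dx, dy = nx - ax, ny - ay
        dist = math.hypot(dx, dy)
        # Skip the agent itself and anything outside the sensing radius.
        if dist == 0.0 or dist >= radius:
            continue
        # Bearing of the neighbor, mapped into [0, 2*pi).
        bearing = math.atan2(dy, dx) % (2 * math.pi)
        # Clamp both indices to guard against floating-point edge cases.
        d_bin = min(int(dist / radius * n_dist_bins), n_dist_bins - 1)
        a_bin = min(int(bearing / (2 * math.pi) * n_angle_bins), n_angle_bins - 1)
        hist[d_bin][a_bin] += 1
    return hist
```

The resulting fixed-size grid does not depend on how many neighbors are in range, which is one plausible reason a histogram-style observation suits a policy network with a fixed input size; that motivation is an assumption here, not a claim taken from the paper.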

2017

Guided Deep Reinforcement Learning for Swarm Systems

Maximilian Hüttenrauch, Adrian Šošić, and Gerhard Neumann

In Autonomous Robots and Multirobot Systems (ARMS) Workshop, AAMAS 2017, São Paulo, Brazil, 2017
