Fuzzy Logic Theory-based Adaptive Reward Shaping for Robust Reinforcement Learning (FARS)

arXiv cs.RO / 4/20/2026


Key Points

  • Reinforcement learning often underperforms in real-world, long-horizon, high-dimensional problems when rewards are sparse or poorly designed, leading to slow exploration and local optima.
  • The paper proposes FARS, a fuzzy-logic-based adaptive reward shaping approach that encodes human intuition as interpretable fuzzy rules.
  • FARS dynamically adjusts how reward components contribute depending on the agent’s state, improving training stability and reducing sensitivity to hyperparameters.
  • Experiments on autonomous drone racing benchmarks indicate faster convergence and lower performance variance, with success rates improving by up to about 5% versus non-fuzzy reward designs.
  • Overall, the method targets robust navigation behaviors, including smoother switching between fast motion and precise control in increasingly difficult scenarios.

Abstract

Reinforcement learning (RL) often struggles in real-world tasks with high-dimensional state spaces and long horizons, where sparse or fixed rewards severely slow down exploration and cause agents to get trapped in local optima. This paper presents a fuzzy-logic-based reward shaping method that integrates human intuition into RL reward design. By encoding expert knowledge into adaptive and interpretable terms, fuzzy rules promote stable learning and reduce sensitivity to hyperparameters. The proposed method leverages these properties to adapt reward contributions based on the agent's state, enabling smoother transitions between fast motion and precise control in challenging navigation tasks. Extensive simulation results on autonomous drone racing benchmarks show stable learning behavior and consistent task performance across scenarios of increasing difficulty. The proposed method achieves faster convergence and reduced performance variability across training seeds in more challenging environments, with success rates improving by up to approximately 5 percent compared to non-fuzzy reward formulations.
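To make the core idea concrete, here is a minimal sketch of state-dependent fuzzy reward blending. It is not the paper's actual rule base: the membership function `near_gate`, the `radius` parameter, and the two reward terms are hypothetical stand-ins for whatever components FARS weighs. The sketch shows only the general mechanism the abstract describes, namely fuzzy degrees of membership smoothly shifting reward weight from fast motion (far from a gate) toward precise control (near a gate).

```python
def near_gate(d, radius=5.0):
    """Fuzzy membership: degree to which distance d counts as 'near' the gate.

    Returns 1.0 at d=0, falls linearly to 0.0 at d >= radius.
    (Hypothetical ramp membership; the paper's actual fuzzy sets may differ.)
    """
    return max(0.0, min(1.0, (radius - d) / radius))


def fars_style_reward(d, speed_term, precision_term, radius=5.0):
    """Blend two reward components by fuzzy rule activations.

    Rule 1: IF near gate THEN emphasize precision_term.
    Rule 2: IF far from gate THEN emphasize speed_term.
    The complementary memberships give a Sugeno-style weighted average,
    so the effective reward shifts smoothly as the agent's state changes.
    """
    mu_near = near_gate(d, radius)
    mu_far = 1.0 - mu_near
    return mu_near * precision_term + mu_far * speed_term
```

Because the memberships vary continuously with the state, the shaped reward has no hard switching threshold, which is the property the paper credits for smoother transitions between aggressive flight and precise gate traversal.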