Designing effective reward functions is a central challenge in reinforcement learning for both autonomous driving and robotics. Hand-crafted rewards often rely on task-specific knowledge, privileged information, or tedious engineering, making them difficult to scale to complex, real-world settings. Recent approaches learn rewards from demonstrations or auxiliary objectives such as video prediction, but these methods can struggle in uncertain scenarios and may fail to generalize beyond expert behaviour. To address these limitations, this project advocates for data-driven reward learning: models that directly infer the quality of behaviour from raw observations across a diverse range of scenarios, including both successful and suboptimal trajectories. By capturing preferences such as safety, efficiency, and goal completion without requiring manual specification, learned reward models can provide richer supervision for planning and control. Beyond driving, this approach naturally extends to robotic manipulation, where specifying all desired behaviours through hand-coded objectives is similarly impractical. Learned reward functions hold promise not only as objective signals for reinforcement learning, but also as tools for evaluating behaviour, annotating data, and enabling interpretable decision-making in autonomous systems.
Principal Investigator, Company and Country
Fatma Guney, Koc University, Turkey