In this post, we look at the question: why is the mean reward per episode of my PPO and DQN agents decreasing over time?

Question: I am training an RL agent to optimise dispatching in a job shop manufacturing system. ...