10 5 the use of reinforcement learning reduces the dynamic scheduling problem to that of learning a stochastic approximation of an unknown average error surface