Gaussian Process Q-Learning for Finite-Horizon Markov Decision Process