In the context of Galerkin discretizations of a partial differential equation (PDE), the modes of the classical method of proper orthogonal decomposition (POD) can be interpreted as the ansatz and test functions of a low-dimensional Galerkin scheme. If one also considers a Galerkin method for the time integration, one can similarly define a POD reduction of the temporal component. This has been described earlier but not expanded upon, probably because the reduced time discretization globalizes time, which is computationally inefficient. However, in finite-time optimal control systems, time is a global variable, and there is no disadvantage in using a POD-reduced Galerkin scheme in time. In this paper, we provide a newly developed generalized theory for space-time Galerkin POD, prove its optimality in the relevant function spaces, show its application to the optimal control of nonlinear PDEs, and, by means of a numerical example with Burgers’ equation, discuss its competitiveness in comparison with standard approaches.