Data-driven Convex Policy Optimization in an Assemble-to-order System

Release time: 2023-10-11

Speaker: Tianhu Deng

Abstract: This paper investigates the optimization of periodic-review assemble-to-order (ATO) production systems in which multiple products are assembled from multiple components. We consider the data-driven setting, where only historical demand data are available and the demand distributions are unknown. To address this challenge, we propose a semi-model-based fitted Q iteration (S-FQI) algorithm framework that leverages the known transition dynamics. We prove a statistical convergence rate for the proposed algorithm in terms of the number of iterations, the number of demand samples, and the number of generated trajectories.
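To make the idea of combining known transition dynamics with historical demand samples concrete, the following is a minimal sketch of fitted Q iteration on a toy single-item inventory problem. It is not the paper's S-FQI algorithm: the single-item lost-sales dynamics, the cost parameters, the quadratic features, and the names `features`, `fitted_q_iteration`, and `greedy_order` are all assumptions chosen for illustration, and no convergence guarantee is implied for this toy.

```python
import numpy as np

def features(x, a):
    # Quadratic features of (inventory, order quantity); an illustrative choice.
    x, a = np.broadcast_arrays(np.asarray(x, float), np.asarray(a, float))
    return np.stack([np.ones_like(x), x, a, x * x, a * a, x * a], axis=-1)

def fitted_q_iteration(demands, max_inv=10, max_order=5, n_iter=30,
                       gamma=0.9, hold=1.0, penalty=4.0):
    """Fit a linear-in-features Q (expected discounted cost) from demand samples."""
    states = np.arange(max_inv + 1)
    actions = np.arange(max_order + 1)
    X, A = np.meshgrid(states, actions, indexing="ij")
    x, a = X.ravel(), A.ravel()                # every (state, action) pair
    phi = features(x, a)                       # design matrix, shape (S*A, 6)
    d = np.asarray(demands)[None, :]           # historical demand, shape (1, N)

    # Known dynamics (assumed): lost sales, stock above max_inv is discarded.
    on_hand = (x + a)[:, None]
    nxt = np.clip(on_hand - d, 0, max_inv)     # next inventory, shape (S*A, N)
    cost = hold * np.maximum(on_hand - d, 0) + penalty * np.maximum(d - on_hand, 0)

    w = np.zeros(phi.shape[1])
    for _ in range(n_iter):
        # Evaluate the current Q at every sampled next state and every action.
        q_next = features(nxt[..., None], actions[None, None, :]) @ w
        # Bellman target: average over the empirical demand samples.
        target = (cost + gamma * q_next.min(axis=-1)).mean(axis=1)
        # Regression step: least-squares fit of the targets.
        w = np.linalg.lstsq(phi, target, rcond=None)[0]
    return w

def greedy_order(w, x, max_order=5):
    # Order quantity that minimizes the fitted Q at inventory level x.
    actions = np.arange(max_order + 1)
    return int(actions[np.argmin(features(x, actions) @ w)])

# Synthetic stand-in for historical demand data.
demands = np.random.default_rng(0).poisson(3, size=200)
w = fitted_q_iteration(demands)
```

Because the transition function is known, each Bellman target is an average over all demand samples rather than a single observed transition, which is the sense in which such a scheme sits between model-based and model-free methods.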

Additionally, to tackle practical challenges, we introduce the convex-TD3 (CTD3) algorithm, which incorporates the convexity property of ATO systems and uses an input convex neural network (ICNN) to improve efficiency and effectiveness.
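For readers unfamiliar with ICNNs, the sketch below shows a forward pass of a small fully connected input convex network in NumPy. Convexity in the input follows from two constraints: the weights applied to the previous hidden layer are non-negative, and the activation is convex and non-decreasing. The class name `ICNN`, the layer sizes, the softplus reparameterization, and the random initialization are assumptions for illustration; how CTD3 wires such a network into TD3 is not described in this abstract.

```python
import numpy as np

def softplus(z):
    # Convex, non-decreasing activation; logaddexp avoids overflow.
    return np.logaddexp(0.0, z)

class ICNN:
    """Scalar-valued network whose output is convex in its input x."""

    def __init__(self, in_dim, hidden=(16, 16), seed=0):
        rng = np.random.default_rng(seed)
        sizes = list(hidden) + [1]
        # Direct input-to-layer weights: unconstrained (affine in x).
        self.Wx = [0.5 * rng.normal(size=(in_dim, h)) for h in sizes]
        self.b = [np.zeros(h) for h in sizes]
        # Hidden-to-hidden weights: stored raw, mapped to >= 0 in the forward pass.
        self.Wz_raw = [rng.normal(size=(hp, h))
                       for hp, h in zip(sizes[:-1], sizes[1:])]

    def __call__(self, x):
        x = np.atleast_2d(x)
        z = softplus(x @ self.Wx[0] + self.b[0])
        last = len(self.Wz_raw)
        for k, Wz_raw in enumerate(self.Wz_raw, start=1):
            # Non-negative combination of convex functions plus an affine term.
            pre = z @ softplus(Wz_raw) + x @ self.Wx[k] + self.b[k]
            z = pre if k == last else softplus(pre)
        return z[:, 0]

# Midpoint-convexity check on random pairs of inputs.
rng = np.random.default_rng(1)
f = ICNN(in_dim=3)
a, b = rng.normal(size=(100, 3)), rng.normal(size=(100, 3))
is_convex_on_samples = bool(np.all(f((a + b) / 2) <= (f(a) + f(b)) / 2 + 1e-9))
```

Mapping the raw hidden weights through softplus keeps them non-negative without a projection step, so the constraint would hold automatically under gradient-based training.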
