A Minibatch-SGD-based Learning Meta-Policy for Inventory Systems with Myopic Optimal Policy, Jiameng Lyu, Jinxing Xie, Shilin Yuan, Yuan Zhou, Management Science, to appear
Optimal Policies for Dynamic Pricing and Inventory Control with Nonparametric Censored Demands, Boxiao Chen, Yining Wang, Yuan Zhou, Management Science, 70(5), pp. 3362–3380 (2024)
Network Revenue Management with Demand Learning and Fair Resource-Consumption Balancing, Xi Chen, Jiameng Lyu, Yining Wang, Yuan Zhou, Production and Operations Management 33(2), pp. 494–511 (2024)
Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits, Yingkai Li, Yining Wang, Yuan Zhou, IEEE Transactions on Information Theory 70(1), pp. 372–388 (2024); preliminary version appeared in COLT 2019
Robust Situational Reinforcement Learning in face of Context Disturbances, Jinpeng Zhang, Yufeng Zheng, Chuheng Zhang, Li Zhao, Lei Song, Yuan Zhou, Jiang Bian, ICML 2023
Learning Sparse Group Models Through Boolean Relaxation, Yijie Wang, Yuan Zhou, Xiaoqing Huang, Kun Huang, Jie Zhang, Jianzhu Ma, ICLR 2023
Dynamic Pricing and Inventory Control with Fixed Ordering Cost and Incomplete Demand Information, Boxiao Chen, David Simchi-Levi, Yining Wang, Yuan Zhou, Management Science, 68(8), pp. 5684–5703 (2022)
Near-optimal Regret Bounds for Multi-batch Reinforcement Learning, Zihan Zhang, Yuhang Jiang, Yuan Zhou, Xiangyang Ji, NeurIPS 2022
Off-policy Reinforcement Learning with Delayed Rewards, Beining Han, Zhizhou Ren, Zuofan Wu, Yuan Zhou, Jian Peng, ICML 2022
Proximal Exploration for Model-guided Protein Sequence Design, Zhizhou Ren, Jiahan Li, Fan Ding, Yuan Zhou, Jianzhu Ma, Jian Peng, ICML 2022
Learning Long-term Reward Redistribution via Randomized Return Decomposition, Zhizhou Ren, Ruihan Guo, Yuan Zhou, Jian Peng, ICLR 2022
Imitation Learning from Observations under Transition Model Disparity, Tanmay Gangwani, Yuan Zhou, Jian Peng, ICLR 2022
Linear Bandits with Limited Adaptivity and Learning Distributional Optimal Design, Yufei Ruan, Jiaqi Yang, Yuan Zhou, STOC 2021
Model-free Reinforcement Learning: from Clipped Pseudo-regret to Sample Complexity, Zihan Zhang, Yuan Zhou, Xiangyang Ji, ICML 2021
Optimal Policy for Dynamic Assortment Planning under Multinomial Logit Models, Xi Chen, Yining Wang, Yuan Zhou, Mathematics of Operations Research, 46(4), pp. 1639–1657 (2021)
Dynamic Assortment Planning under Nested Logit Models, Xi Chen, Chao Shi, Yining Wang, Yuan Zhou, Production and Operations Management, 30(1), pp. 85–102 (January 2021)
Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition, Zihan Zhang, Yuan Zhou, Xiangyang Ji, NeurIPS 2020
Learning Guidance Rewards with Trajectory-space Smoothing, Tanmay Gangwani, Yuan Zhou, Jian Peng, NeurIPS 2020
Dynamic Assortment Optimization with Changing Contextual Information, Xi Chen, Yining Wang, Yuan Zhou, Journal of Machine Learning Research, 21(216), pp. 1–44 (2020)
Collaborative Top Distribution Identifications with Limited Interaction, Nikolai Karpov, Qin Zhang, Yuan Zhou, FOCS 2020
Multinomial Logit Bandit with Low Switching Cost, Kefan Dong, Yingkai Li, Qin Zhang, Yuan Zhou, ICML 2020
Root-n-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank, Kefan Dong, Jian Peng, Yining Wang, Yuan Zhou, COLT 2020
Collaborative Learning with Limited Interaction: Tight Bounds for Distributed Exploration in Multi-Armed Bandits, Chao Tao, Qin Zhang, Yuan Zhou, FOCS 2019
Exploration via Hindsight Goal Generation, Zhizhou Ren, Kefan Dong, Yuan Zhou, Qiang Liu, Jian Peng, NeurIPS 2019
Thresholding Bandit with Optimal Aggregate Regret, Chao Tao, Saúl Blanco, Jian Peng, Yuan Zhou, NeurIPS 2019
Optimal Design of Process Flexibility for General Production Systems, Xi Chen, Tengyu Ma, Jiawei Zhang, Yuan Zhou, Operations Research 67(2), pp. 516–531 (2019)
Off-policy Evaluation and Learning from Logged Bandit Feedback: Error reduction via surrogate policy, Yuan Xie, Boyi Liu, Qiang Liu, Zhaoran Wang, Yuan Zhou, Jian Peng, ICLR 2019
Near-optimal Policies for Dynamic Multinomial Logit Assortment Selection Models, Yining Wang, Xi Chen, Yuan Zhou, NeurIPS 2018
Tight Bounds for Collaborative PAC Learning via Multiplicative Weights, Jiecao Chen, Qin Zhang, Yuan Zhou, NeurIPS 2018
Best Arm Identification in Linear Bandits with Linear Dimension Dependency, Chao Tao, Saúl Blanco, Yuan Zhou, ICML 2018
Adaptive Multiple-arm Identification, Jiecao Chen, Xi Chen, Qin Zhang, Yuan Zhou, ICML 2017
Parameterized Algorithms for Constraint Satisfaction Problems Above Average with Global Cardinality Constraints, Xue Chen, Yuan Zhou, SODA 2017
Satisfiability of Ordering CSPs Above Average Is Fixed-Parameter Tractable, Konstantin Makarychev, Yury Makarychev, Yuan Zhou, FOCS 2015
Optimal Sparse Designs for Process Flexibility via Probabilistic Expanders, Xi Chen, Jiawei Zhang, Yuan Zhou, Operations Research 63(5), pp. 1159–1176 (2015)
Optimal PAC Multiple Arm Identification with Applications to Crowdsourcing, Yuan Zhou, Xi Chen, Jian Li, ICML 2014
Constant Factor Lasserre Gaps for Graph Partitioning Problems, Venkatesan Guruswami, Ali Kemal Sinop, Yuan Zhou, SIAM Journal on Optimization 24(4), pp. 1698–1717 (2014)
Hardness of Robust Graph Isomorphism, Lasserre Gaps, and Asymmetry of Random Graphs, Ryan O’Donnell, John Wright, Chenggang Wu, Yuan Zhou, SODA 2014
Hypercontractive inequalities via SOS, with an application to Vertex-Cover, Manuel Kauers, Ryan O’Donnell, Li-Yang Tan, Yuan Zhou, SODA 2014
Approximability and proof complexity, Ryan O’Donnell, Yuan Zhou, SODA 2013
Hypercontractivity, Sum-of-Squares Proofs, and their Applications, Boaz Barak, Fernando Brandao, Aram Harrow, Jonathan Kelner, David Steurer, Yuan Zhou, STOC 2012
Polynomial integrality gaps for strong SDP relaxations of Densest k-Subgraph, Aditya Bhaskara, Moses Charikar, Venkatesan Guruswami, Aravindan Vijayaraghavan, Yuan Zhou, SODA 2012
Approximation Algorithms and Hardness of the k-Route Cut Problem, Julia Chuzhoy, Yury Makarychev, Aravindan Vijayaraghavan, Yuan Zhou, SODA 2012
Tight Inapproximability Bounds for Almost-satisfiable Horn SAT and Exact Hitting Set, Venkatesan Guruswami, Yuan Zhou, SODA 2011