Algorithms for Solving Bilevel Optimization Problems
Implicit Function Theorem (in Stackelberg Games)
Introduction
- The theorem plays a pivotal role in defining solutions for the Stackelberg game.
- Provides intuition for finding the strict local Stackelberg equilibrium using gradient descent.
Key Insights
- The follower y responds optimally to the leader’s decision, making y in the upper-level problem a function of x.
- The upper-level objective can then be expressed as a function g(x) of x alone, since y is determined by x.
- Using the theorem, the derivative of g w.r.t. x can be expressed in terms of derivatives of f w.r.t. x and y.
- Applying the theorem, the leader updates its state using the resulting derivative of g, while the follower updates with gradient descent.
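A minimal sketch of computing the leader’s total derivative via the implicit function theorem, on an illustrative quadratic game (the objectives, step size, and iteration count are assumptions, not from the notes):

```python
# Illustrative Stackelberg game:
#   leader:   min_x f1(x, y) = 0.5*x**2 + x*y
#   follower: min_y f2(x, y) = 0.5*y**2 - x*y   =>  best response y*(x) = x
# Implicit function theorem:  dy*/dx = -(d2f2/dy2)^-1 * (d2f2/dy dx)
# Chain rule through y*(x):   dg/dx  = df1/dx + (dy*/dx) * df1/dy

def hypergradient(x, y):
    df1_dx = x + y            # partial of f1 w.r.t. x
    df1_dy = x                # partial of f1 w.r.t. y
    d2f2_dy2 = 1.0            # follower Hessian (scalar in this toy example)
    d2f2_dydx = -1.0          # mixed partial of f2
    dy_dx = -d2f2_dydx / d2f2_dy2      # implicit function theorem
    return df1_dx + dy_dx * df1_dy     # total derivative of g(x) = f1(x, y*(x))

# Leader descends on g(x); follower replies with its best response.
x, lr = 2.0, 0.1
for _ in range(100):
    y = x                     # follower's optimal reply y*(x) = x
    x -= lr * hypergradient(x, y)
# g(x) = 1.5*x**2 is minimized at x = 0
print(round(x, 4))  # → 0.0
```

In practice the follower would itself run gradient descent on f2 rather than being solved in closed form; the closed-form reply here just keeps the sketch short.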
Research Findings
- In zero-sum games with deterministic updates, the dynamics converge only to Stackelberg equilibria, with a local convergence rate guarantee.
- For general-sum games, convergence to local Stackelberg equilibrium isn’t guaranteed.
- The theorem’s direct implementation involves the leader updating its state using the derivative of g derived from the theorem.
References
- “Implicit Learning Dynamics in Stackelberg Games: Equilibria Characterization, Convergence Analysis, and Empirical Study”
- “On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach”
- “Finding and Only Finding Differential Nash Equilibria by Both Pretending to be a Follower”
Penalty Based Method
Overview of the Penalty-Based Method
- Definition:
- A method that transforms a bilevel optimization problem into a single-objective minimization problem.
- Details of the Method:
- Initial Approach:
- The method is introduced through the Stackelberg game formulation.
- Transformation:
- First, the bilevel problem is converted into a constrained single-level optimization problem; then, using a Lagrange multiplier (penalty coefficient), it is transformed into an unconstrained problem.
- Optimality Condition:
- The function h describes the violation of the optimality condition, where f2* represents the optimal value of f2.
- Updates:
- The values of x and y are updated by a step Δ, chosen so that it does not deviate significantly from the gradient of f1.
- Advantages:
- Stabilizes the learning dynamics in games.
- Allows the application of adaptive optimization methods.
- Disadvantages:
- Inefficient: Requires many gradient descent steps for each update.
- In cases with poor loss landscapes, the optimal f2* found in each iteration can vary significantly.
Notations:
f1: The primary objective function in the bilevel optimization problem.
f2: The secondary (lower-level) objective function, nested within the problem for f1; its solution depends on the decision made for f1.
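A minimal sketch of the penalty-based reformulation above, on illustrative quadratics (the objectives, penalty weight, and learning rates are assumptions, not from the notes):

```python
# Illustrative bilevel problem:
#   f1(x, y) = (x - 1)**2 + (y - x)**2    # upper-level objective
#   f2(x, y) = (y - x)**2                 # lower-level objective, min at y = x
# Penalized single-level problem:  min_{x,y}  f1 + lam * h,
# where h(x, y) = f2(x, y) - f2*(x) >= 0 measures lower-level suboptimality.

def f2(x, y):
    return (y - x) ** 2

def approx_f2_star(x, y0, steps=20, lr=0.2):
    """Estimate f2*(x) by inner gradient descent on y -- the costly inner
    loop behind the 'many gradient steps per update' disadvantage."""
    y = y0
    for _ in range(steps):
        y -= lr * 2 * (y - x)
    return f2(x, y)

x, y, lam, lr = 0.0, 3.0, 10.0, 0.02
for _ in range(500):
    f2_star = approx_f2_star(x, y)   # re-estimated every outer iteration
    # gradients of f1 + lam * (f2 - f2_star) w.r.t. x and y;
    # f2*(x) == 0 for every x here, so treating it as constant is exact
    gx = 2 * (x - 1) - 2 * (y - x) - lam * 2 * (y - x)
    gy = 2 * (y - x) + lam * 2 * (y - x)
    x, y = x - lr * gx, y - lr * gy
# approaches the bilevel solution x = 1, y = 1
print(round(x, 2), round(y, 2))
```

The inner loop in `approx_f2_star` illustrates the inefficiency noted above: f2* must be re-estimated with many gradient steps at each outer update.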
Two Time Scale Update Rule (TTUR)
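A minimal sketch of a two time-scale update: both players take simultaneous gradient steps, but with different learning rates so that the leader effectively moves on a slower time scale (the objective and rates below are illustrative assumptions, not from the notes):

```python
# Toy zero-sum objective (illustrative):
#   min_x max_y  f(x, y) = 0.5*x**2 + x*y - 0.5*y**2
# Equilibrium at (x, y) = (0, 0).
x, y = 1.0, -1.0
lr_leader, lr_follower = 0.01, 0.1   # follower updates on the faster time scale
for _ in range(2000):
    gx = x + y           # df/dx
    gy = x - y           # df/dy
    x -= lr_leader * gx      # leader descends (slow)
    y += lr_follower * gy    # follower ascends (fast)
print(round(x, 4), round(y, 4))
```

In the GAN setting (as in the TTUR literature), the discriminator typically takes the faster time scale and the generator the slower one.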
Application
Hyperparameter Optimization
Model-based Reinforcement Learning
GANs
Latent Diffusion Models
IRM (Invariant Risk Minimization)
Misc