Algorithms for Solving Bilevel Optimization Problems
Implicit Function Theorem (in Stackelberg Games)
Introduction
- The theorem plays a pivotal role in defining solutions for the Stackelberg game.
- Provides intuition for finding a strict local Stackelberg equilibrium using gradient descent.
Key Insights
- The follower y responds optimally to the leader’s decision, making y in the upper-level problem a function of x.
- The upper-level objective can then be expressed as a function g(x) = f(x, y*(x)) of x alone, since y is replaced by the follower’s best response y*(x).
- Using the theorem, the derivative of g with respect to x can be expressed in terms of the derivatives of f with respect to x and y.
- Applying the theorem, the leader updates its state using this derivative of g, while the follower updates with gradient descent (the implicit gradient is sketched below).
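For a bilevel problem in which the leader minimizes f1(x, y) and the follower minimizes f2(x, y), the follower’s optimality condition ∇_y f2(x, y*(x)) = 0 and the implicit function theorem give dy*/dx = −(∇²_yy f2)⁻¹ ∇²_yx f2, and hence dg/dx = ∇_x f1 + (dy*/dx)ᵀ ∇_y f1 evaluated at y = y*(x) (in a zero-sum game, f2 = −f1). The sketch below checks this implicit gradient against finite differences on a toy quadratic problem; the specific quadratics, dimensions, and seed are illustrative assumptions, not from these notes.

```python
import numpy as np

# Toy bilevel problem (illustrative assumption):
#   leader:   min_x g(x) = f1(x, y*(x))
#   follower: y*(x) = argmin_y f2(x, y)
# The quadratics are chosen so the follower's best response is y*(x) = A x.
rng = np.random.default_rng(0)
dx, dy = 3, 2
A = rng.standard_normal((dy, dx))
B = 2.0 * np.eye(dy)                  # follower curvature (positive definite)
c = rng.standard_normal(dy)

def f1(x, y):                         # upper-level (leader) objective
    return 0.5 * x @ x + 0.5 * (y - c) @ (y - c)

def f2(x, y):                         # lower-level (follower) objective
    r = y - A @ x
    return 0.5 * r @ B @ r

def best_response(x):                 # argmin_y f2(x, y), analytic here
    return A @ x

def hypergradient(x):
    """dg/dx via the implicit function theorem, using the analytic
    derivatives of the toy quadratics above."""
    y = best_response(x)
    grad_x_f1 = x                     # d f1 / dx
    grad_y_f1 = y - c                 # d f1 / dy
    grad_yy_f2 = B                    # d^2 f2 / dy^2
    jac_x_grad_y_f2 = -B @ A          # d/dx of (d f2 / dy), shape (dy, dx)
    dydx = -np.linalg.solve(grad_yy_f2, jac_x_grad_y_f2)   # dy*/dx, shape (dy, dx)
    return grad_x_f1 + dydx.T @ grad_y_f1

# Sanity check against central finite differences of g(x) = f1(x, y*(x)).
x0 = rng.standard_normal(dx)
eps = 1e-6
fd = np.array([(f1(x0 + eps * e, best_response(x0 + eps * e))
                - f1(x0 - eps * e, best_response(x0 - eps * e))) / (2 * eps)
               for e in np.eye(dx)])
print(np.allclose(hypergradient(x0), fd, atol=1e-5))       # expected: True
```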
Research Findings
- In zero-sum games with deterministic updates, these dynamics converge only to (local) Stackelberg equilibria, with a local convergence-rate guarantee.
- For general-sum games, convergence to a local Stackelberg equilibrium is not guaranteed.
- A direct implementation has the leader update its state using the derivative of g derived from the theorem, while the follower runs gradient descent (a sketch of these dynamics follows this list).
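A rough sketch of these leader-follower dynamics, continuing the toy quadratic example and the hypergradient function from the previous snippet; the step sizes and iteration counts are assumptions:

```python
def run_dynamics(x, y, lr_x=0.05, lr_y=0.2, inner_steps=20, outer_steps=200):
    """Follower approximately best-responds by gradient descent on f2,
    then the leader takes one step along the implicit gradient of g."""
    for _ in range(outer_steps):
        for _ in range(inner_steps):
            y = y - lr_y * B @ (y - A @ x)   # grad_y f2 for the toy quadratic
        x = x - lr_x * hypergradient(x)      # leader update via the theorem
    return x, y

x_final, y_final = run_dynamics(np.zeros(dx), np.zeros(dy))
```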
References
- “Implicit Learning Dynamics in Stackelberg Games: Equilibria Characterization, Convergence Analysis, and Empirical Study”
- “On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach”
- “Finding and Only Finding Differential Nash Equilibria by Both Pretending to be a Follower”
Penalty-Based Method
Overview of the Penalty-Based Method
- Definition:
- A method that transforms a bilevel optimization problem into a single-objective minimization problem.
- Details of the Method:
- Initial Approach:
- The method is explained in terms of the Stackelberg game formulation.
- Transformation:
- The bilevel problem is first rewritten as a constrained single-level problem, with the lower-level optimality condition as the constraint; a Lagrange multiplier (penalty term) then moves the constraint into the objective, yielding an unconstrained problem.
- Optimality Condition:
- The function h measures the violation of the lower-level optimality condition, and f2* denotes the optimal value of f2 for the current x (so h can, for instance, be taken as f2 − f2*).
- Updates:
- The values of x and y are updated by an increment Δ; Δ is chosen so that the update does not deviate significantly from the gradient of f1 (a sketch follows the notations below).
- Advantages:
- Stabilizes the learning dynamics in games.
- Allows the application of adaptive optimization methods.
- Disadvantages:
- Inefficient: Requires many gradient descent steps for each update.
- When the loss landscape is ill-behaved, the estimate of the optimal value f2* found at each iteration can vary significantly.
Notations:
f1: The primary (upper-level) objective function in the bilevel optimization problem.
f2: The secondary (lower-level) objective function, which is nested within f1 and depends on the decisions made for f1.
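As a rough, non-authoritative sketch of the penalty-based method described above: at each outer step, f2*(x) is re-estimated by inner gradient descent on f2, and x and y then take a gradient step on the penalized objective f1 + λ·(f2 − f2*). The one-dimensional toy objectives, penalty weight, step sizes, and the use of finite differences below are assumptions for illustration:

```python
def f1(x, y):                       # upper-level (leader) objective
    return 0.5 * x**2 + 0.5 * (y - 1.0)**2

def f2(x, y):                       # lower-level (follower) objective
    return 0.5 * (y - 2.0 * x)**2   # minimized at y = 2x, so f2*(x) = 0

def approx_f2_star(x, y0, lr=0.3, steps=50):
    """Approximate f2*(x) = min_y f2(x, y) by inner gradient descent."""
    y = y0
    for _ in range(steps):
        y -= lr * (y - 2.0 * x)     # grad_y f2
    return f2(x, y)

def penalty_step(x, y, lam=10.0, lr=0.02, eps=1e-5):
    """One gradient step on the penalized objective
       F(x, y) = f1(x, y) + lam * (f2(x, y) - f2*(x)),
       with gradients taken by central finite differences for simplicity."""
    def F(xx, yy):
        return f1(xx, yy) + lam * (f2(xx, yy) - approx_f2_star(xx, yy))
    gx = (F(x + eps, y) - F(x - eps, y)) / (2 * eps)
    gy = (F(x, y + eps) - F(x, y - eps)) / (2 * eps)
    return x - lr * gx, y - lr * gy

x, y = 1.0, 0.0
for _ in range(500):
    x, y = penalty_step(x, y)
print(x, y)   # approaches the bilevel solution (x = 0.4, y = 0.8) as lam grows
```

Note that each outer update re-runs the inner loop to estimate f2*, which is exactly the inefficiency listed under the disadvantages above.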
Two Time Scale Update Rule (TTUR)
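A minimal sketch of the two time-scale idea, assuming the common setup in which both players update simultaneously by gradient steps but with different learning rates, so the follower effectively moves on a faster time scale; the toy saddle-point objective and the step sizes are illustrative assumptions:

```python
# Toy saddle-point problem: leader minimizes f over x, follower maximizes over y,
#   f(x, y) = 0.5*x**2 + x*y - 0.5*y**2, with its unique saddle point at (0, 0).
lr_follower, lr_leader = 0.05, 0.005   # two time scales: follower updates faster
x, y = 1.0, -1.0
for _ in range(5000):
    grad_x = x + y                     # df/dx
    grad_y = x - y                     # df/dy
    x, y = x - lr_leader * grad_x, y + lr_follower * grad_y
print(x, y)                            # both coordinates approach 0
```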
Applications
Hyperparameter Optimization
Model-based Reinforcement Learning
GANs
Latent Diffusion Models
IRM (Invariant Risk Minimization)
Misc