# Deep Reinforcement Learning Explainability
Exploring AI decision-making through Integrated Gradients in RL environments
## How This Works

This Space demonstrates applying Integrated Gradients to Deep Reinforcement Learning. We use the Captum library (PyTorch) for interpretability and Gymnasium for the continuous Lunar Lander environment.
## Training Algorithm: DDPG

The agent is trained with Deep Deterministic Policy Gradient (DDPG) and achieves an average reward of 260.8 per episode (successful landings).
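The core DDPG actor update can be sketched as follows; the network sizes, batch size, and learning rate are illustrative placeholders, not the Space's actual training configuration.

```python
# Minimal sketch of the DDPG actor update (deterministic policy gradient):
# the actor is pushed toward actions the critic scores highly.
import torch
import torch.nn as nn

actor = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2), nn.Tanh())
critic = nn.Sequential(nn.Linear(8 + 2, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

states = torch.rand(32, 8)                  # a batch from the replay buffer
actions = actor(states)                     # deterministic policy a = mu(s)
q_values = critic(torch.cat([states, actions], dim=1))
actor_loss = -q_values.mean()               # ascend Q by descending -Q

opt.zero_grad()
actor_loss.backward()
opt.step()
```

A full DDPG loop also maintains a critic loss against bootstrapped targets, target networks, and exploration noise, which are omitted here for brevity.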
## How to Use This Space

1. **Select Environment**: choose the Lunar Lander environment
2. **Choose Baseline**: select between the zero-tensor and running-average baselines
3. **Generate Attributions**: click "ATTRIBUTE" and wait ~20-25 seconds
4. **Explore Results**: use the slider to examine attributions at different timesteps
The attributions are normalized with a softmax, so each timestep yields an interpretable probability distribution over the state features.
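The normalization step above is just a softmax over the per-feature attribution scores, as this small sketch shows (the raw values are hypothetical):

```python
# Softmax normalization: raw attributions (any sign, any scale) become a
# probability distribution over state features.
import torch

raw = torch.tensor([0.8, -0.2, 1.5, 0.1])  # hypothetical per-feature attributions
probs = torch.softmax(raw, dim=0)
print(probs.sum())  # sums to 1 -- a distribution over the features
```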
## Environment Setup

- **Initialize Environment**: click to initialize the training environment
- Select the RL environment to analyze
## Attribution Configuration
Choose the baseline for Integrated Gradients
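The two baseline options can be sketched like this; the incremental-mean update is one plausible way to maintain the running average, not necessarily the Space's exact implementation.

```python
# The two Integrated Gradients baselines offered: a zero tensor, or a running
# average of the states observed so far (incremental mean update).
import torch

state_dim = 8                              # Lunar Lander state size
zero_baseline = torch.zeros(state_dim)     # option 1: zero tensor

running_avg = torch.zeros(state_dim)       # option 2: running average
count = 0
for state in [torch.rand(state_dim) for _ in range(100)]:  # stand-in trajectory
    count += 1
    running_avg += (state - running_avg) / count
```

A running-average baseline compares each state against "typical" behavior rather than an all-zero state, which can make attributions less sensitive to features that are rarely zero in practice.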
## Results Visualization
Real-time Attribution Analysis
## Local Usage & Installation

### Required Packages

```bash
pip install torch captum gradio gymnasium 'gymnasium[box2d]'
```

### Box2D Installation (macOS)

```bash
brew install swig
pip install box2d
```
## Lunar Lander Environment Details

### Reward Structure
- **Position**: increased/decreased based on distance to the landing pad
- **Velocity**: increased/decreased based on speed (slower is better)
- **Angle**: decreased when the lander is tilted (horizontal is ideal)
- **Landing**: +10 points for each leg touching the ground
- **Fuel**: -0.03 points per frame for the side engines, -0.3 for the main engine
- **Episode end**: -100 for a crash, +100 for a safe landing
Success Threshold: 200+ points per episode
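A worked tally of the fixed reward components listed above, using hypothetical frame counts (the position, velocity, and angle shaping terms are continuous and omitted here):

```python
# Illustrative episode tally using the documented reward constants.
side_engine_frames = 50
main_engine_frames = 100
fuel_cost = side_engine_frames * -0.03 + main_engine_frames * -0.3  # -31.5

legs_down = 2
reward = fuel_cost + legs_down * 10 + 100  # both legs down + safe-landing bonus
print(reward)  # 88.5
```

The shaping terms for position, velocity, and angle supply the rest of the margin needed to clear the 200-point success threshold.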
### Training Functions

- `load_trained()`: loads the pre-trained model (1000 episodes)
- `train()`: trains from scratch
- Set `render_mode=False` for faster training
Built with ❤️ using Gradio, PyTorch, and Captum