1/11/2025

FSDP: difference between `fsdp_config.activation_checkpointing` and `fsdp_config.activation_checkpointing_reentrant`

Here are the key differences between these two FSDP (Fully Sharded Data Parallel) configuration parameters:

`fsdp_config.activation_checkpointing`:

- This is the main switch that enables/disables activation checkpointing

- When set to `true`, it saves memory by discarding intermediate activations during the forward pass and recomputing them during the backward pass (illustrated in the sketch after this list)

- In your command, it's set to `false`, meaning no activation checkpointing will be performed

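To make the tradeoff concrete, here is a minimal PyTorch sketch of what checkpointing a single block does (the `Block` module is illustrative, not from your model):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    # A toy feed-forward block standing in for a transformer layer.
    def __init__(self, dim: int = 256):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        return x + self.ff(x)

block = Block()
x = torch.randn(8, 256, requires_grad=True)

# Plain forward: every intermediate activation inside `block`
# stays in memory until backward consumes it.
y = block(x)

# Checkpointed forward: only the block's input is saved, and the
# intermediates are recomputed when backward reaches this block.
y_ckpt = checkpoint(block, x, use_reentrant=False)
y_ckpt.sum().backward()
```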

`fsdp_config.activation_checkpointing_reentrant`:

- This is a more specific setting that controls HOW activation checkpointing is implemented

- When set to `true` (as in your command), it selects PyTorch's older reentrant implementation, which reruns the checkpointed forward pass inside a custom autograd function

- When set to `false`, it selects the newer non-reentrant implementation, which supports nested checkpointing and keyword arguments and is the variant PyTorch now recommends (both are contrasted in the sketch after this list)

- This setting only has an effect if `activation_checkpointing` is enabled

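The flag name suggests it maps onto the `use_reentrant` argument of `torch.utils.checkpoint.checkpoint` (an assumption about your framework's plumbing, but that is the standard PyTorch switch). A minimal sketch of the difference:

```python
import torch
from torch.utils.checkpoint import checkpoint

def inner(x):
    return torch.relu(x) * 2

def outer(x):
    # Nested checkpointing: only the non-reentrant variant supports this.
    return checkpoint(inner, x, use_reentrant=False) + 1

x = torch.randn(4, 4, requires_grad=True)

# Legacy reentrant implementation (activation_checkpointing_reentrant: true).
y_reentrant = checkpoint(torch.relu, x, use_reentrant=True)
y_reentrant.sum().backward()

# Non-reentrant implementation (activation_checkpointing_reentrant: false):
# supports nesting and keyword arguments, and is what PyTorch now recommends.
y_nested = checkpoint(outer, x, use_reentrant=False)
y_nested.sum().backward()
```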

In your specific case, since `activation_checkpointing=false`, the `activation_checkpointing_reentrant=true` setting won't have any actual effect on the training process.

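For reference, here is a sketch of the speed-oriented settings your command implies (other `fsdp_config` keys omitted):

```yaml
fsdp_config:
  activation_checkpointing: false           # keep activations; faster, uses more memory
  activation_checkpointing_reentrant: true  # ignored while checkpointing is disabled
```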

A typical memory-optimized configuration would be:

```yaml
fsdp_config:
  activation_checkpointing: true
  activation_checkpointing_reentrant: false  # non-reentrant variant, recommended by recent PyTorch
```


This trades extra recomputation in the backward pass for maximum memory savings. Your configuration, however, appears to be tuned for speed rather than memory usage, which makes sense for a performance-focused training setup (as suggested by your YAML filename containing "performance").
