
minor update to readme.md

foksly 3 years ago
parent commit 1eb7522b89
1 changed file with 2 additions and 2 deletions

+ 2 - 2
examples/ppo/README.md

@@ -1,6 +1,6 @@
 # Training PPO with decentralized averaging
 
-This tutorial will walk you through the steps to set up collaborative training of an off-policy reinforcement learning algorithm, [PPO](https://arxiv.org/pdf/1707.06347.pdf), to play Atari Breakout. It uses the [stable-baselines3 implementation of PPO](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html); hyperparameters for the algorithm are taken from [rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/ppo.yml), and collaborative training is built on `hivemind.Optimizer` to exchange information between peers.
+This tutorial will walk you through the steps to set up collaborative training of an on-policy reinforcement learning algorithm, [PPO](https://arxiv.org/pdf/1707.06347.pdf), to play Atari Breakout. It uses the [stable-baselines3 implementation of PPO](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html); hyperparameters for the algorithm are taken from [rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/ppo.yml), and collaborative training is built on `hivemind.Optimizer` to exchange information between peers.
 
 ## Preparation
 
@@ -42,4 +42,4 @@ Logging to logs/bs-256.target_bs-32768.n_envs-8.n_steps-128.n_epochs-1_1
 Jun 20 13:23:23.525 [INFO] ppo_hivemind accumulated 1024 samples for epoch #0 from 1 peers. ETA 52.20 sec (refresh in 1.00 sec)
 
-```
+```
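
For orientation, here is a minimal sketch of the setup the README describes: wrapping a regular PyTorch optimizer in `hivemind.Optimizer` so that peers jointly accumulate a global batch before averaging. The toy model, learning rate, and single-peer DHT are illustrative assumptions; the run name and batch sizes mirror the log output above (`ppo_hivemind`, `bs-256`, `target_bs-32768`).

```python
import torch
import hivemind

# Start a DHT node; additional peers would join via initial_peers=[...].
dht = hivemind.DHT(start=True)

# Placeholder network standing in for the PPO policy (illustrative only).
model = torch.nn.Linear(64, 4)

opt = hivemind.Optimizer(
    dht=dht,
    run_id="ppo_hivemind",       # run name shared by all collaborating peers
    batch_size_per_step=256,     # samples this peer contributes per step
    target_batch_size=32768,     # global batch to accumulate before averaging
    optimizer=torch.optim.Adam(model.parameters(), lr=1e-4),  # assumed lr
    verbose=True,                # prints progress lines like the log above
)

# hivemind.Optimizer is used like a regular torch optimizer.
loss = model(torch.randn(256, 64)).pow(2).mean()
loss.backward()
opt.step()
opt.zero_grad()
```

In the actual tutorial, this optimizer would stand in for the one inside the stable-baselines3 PPO policy rather than drive a toy loss as above.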