
minor update to readme.md

foksly 3 years ago
parent commit 1eb7522b89
1 changed file with 2 additions and 2 deletions

+ 2 - 2
examples/ppo/README.md

@@ -1,6 +1,6 @@
 # Training PPO with decentralized averaging
 
-This tutorial will walk you through the steps to set up collaborative training of an off-policy reinforcement learning algorithm, [PPO](https://arxiv.org/pdf/1707.06347.pdf), to play Atari Breakout. It uses the [stable-baselines3 implementation of PPO](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html); hyperparameters for the algorithm are taken from [rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/ppo.yml), and collaborative training is built on `hivemind.Optimizer` to exchange information between peers.
+This tutorial will walk you through the steps to set up collaborative training of an on-policy reinforcement learning algorithm, [PPO](https://arxiv.org/pdf/1707.06347.pdf), to play Atari Breakout. It uses the [stable-baselines3 implementation of PPO](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html); hyperparameters for the algorithm are taken from [rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/ppo.yml), and collaborative training is built on `hivemind.Optimizer` to exchange information between peers.
 
 ## Preparation
 
@@ -42,4 +42,4 @@ Logging to logs/bs-256.target_bs-32768.n_envs-8.n_steps-128.n_epochs-1_1
 Jun 20 13:23:23.525 [INFO] ppo_hivemind accumulated 1024 samples for epoch #0 from 1 peers. ETA 52.20 sec (refresh in 1.00 sec)
 
-```
+```
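
For orientation, here is a minimal sketch of the setup the README describes: wrapping a regular PyTorch optimizer in `hivemind.Optimizer` so that peers jointly accumulate a global batch before averaging. The toy model, learning rate, and single-peer DHT are illustrative assumptions; the run name and batch sizes mirror the log output above (`ppo_hivemind`, `bs-256`, `target_bs-32768`).

```python
import torch
import hivemind

# Start a DHT node; additional peers would join via initial_peers=[...].
dht = hivemind.DHT(start=True)

# Placeholder network standing in for the PPO policy (illustrative only).
model = torch.nn.Linear(64, 4)

opt = hivemind.Optimizer(
    dht=dht,
    run_id="ppo_hivemind",       # run name shared by all collaborating peers
    batch_size_per_step=256,     # samples this peer contributes per step
    target_batch_size=32768,     # global batch to accumulate before averaging
    optimizer=torch.optim.Adam(model.parameters(), lr=1e-4),  # assumed lr
    verbose=True,                # prints progress lines like the log above
)

# hivemind.Optimizer is used like a regular torch optimizer.
loss = model(torch.randn(256, 64)).pow(2).mean()
loss.backward()
opt.step()
opt.zero_grad()
```

In the actual tutorial, this optimizer would stand in for the one inside the stable-baselines3 PPO policy rather than drive a toy loss as above.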