This tutorial will walk you through the steps to set up collaborative training of PPO, an on-policy reinforcement learning algorithm, to play Atari Breakout. It uses the stable-baselines3 implementation of PPO, the hyperparameters for the algorithm are taken from rl-baselines3-zoo, and collaborative training is built on hivemind.Optimizer, which exchanges information between peers.
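As a rough sketch of what this wiring looks like, the snippet below wraps a plain torch optimizer in hivemind.Optimizer. The tiny placeholder model and the exact values are assumptions chosen to match the run name in the log further down (bs-256, target_bs-32768), not code copied from ppo.py:

import torch
import hivemind

model = torch.nn.Linear(4, 2)  # placeholder for the real policy network

# Every peer runs its own DHT node; peers joining later also pass initial_peers=[...]
dht = hivemind.DHT(start=True)

opt = hivemind.Optimizer(
    dht=dht,
    run_id="ppo_hivemind",       # peers sharing this run_id train together
    batch_size_per_step=256,     # samples this peer contributes per optimizer step
    target_batch_size=32768,     # global sample count to accumulate before averaging
    optimizer=torch.optim.Adam(model.parameters(), lr=2.5e-4),
    verbose=True,
)

Once wrapped, opt.step() accumulates gradients locally and synchronizes with the rest of the swarm whenever the peers collectively reach the target batch size.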
First, install hivemind and the example's dependencies:

pip install git+https://github.com/learning-at-home/hivemind.git
pip install -r requirements.txt
Run the first DHT peer to welcome trainers and record training statistics (e.g., loss and performance):
$ python3 ppo.py
To connect other peers to this one, use --initial_peers /ip4/127.0.0.1/tcp/41926/p2p/QmUmiebP4BxdEPEpQb28cqyhaheDugFRn7MCoLJr556xYt
A.L.E: Arcade Learning Environment (version 0.7.4+069f8bd)
[Powered by Stella]
Using cuda device
Wrapping the env in a VecTransposeImage.
[W NNPACK.cpp:51] Could not initialize NNPACK! Reason: Unsupported hardware.
Jun 20 13:23:20.515 [INFO] Found no active peers: None
Jun 20 13:23:20.533 [INFO] Initializing optimizer manually since it has no tensors in state dict. To override this, provide initialize_optimizer=False
Logging to logs/bs-256.target_bs-32768.n_envs-8.n_steps-128.n_epochs-1_1
---------------------------------
| rollout/ | |
| ep_len_mean | 521 |
| ep_rew_mean | 0 |
| time/ | |
| fps | 582 |
| iterations | 1 |
| time_elapsed | 1 |
| total_timesteps | 1024 |
| train/ | |
| timesteps | 1024 |
---------------------------------
Jun 20 13:23:23.525 [INFO] ppo_hivemind accumulated 1024 samples for epoch #0 from 1 peers. ETA 52.20 sec (refresh in 1.00 sec)
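To join the run from a second terminal or machine, start another trainer and pass the multiaddress printed by the first peer (the address below is the one from the log above; yours will differ):

$ python3 ppo.py --initial_peers /ip4/127.0.0.1/tcp/41926/p2p/QmUmiebP4BxdEPEpQb28cqyhaheDugFRn7MCoLJr556xYt

Each new peer announces itself through the DHT, and the swarm's progress is tracked under the shared run_id, so the sample counts and peer counts reported in the log will reflect all connected trainers.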