
Quickstart: typos and references (#420)

- fix minor typo in Optimizer docstring
- refer to Optimizer docs from quickstart.md
justheuristic 3 years ago
parent
commit
c328698b86

+ 1 - 0
docs/user/quickstart.md

@@ -183,6 +183,7 @@ we show how to use a more advanced version of DecentralizedOptimizer to collabor
 
 If you want to learn more about each individual component,
 - Learn how to use `hivemind.DHT` using this basic [DHT tutorial](https://learning-at-home.readthedocs.io/en/latest/user/dht.html),
+- Read more on how to use `hivemind.Optimizer` in its [documentation page](https://learning-at-home.readthedocs.io/en/latest/modules/optim.html), 
 - Learn the underlying math behind hivemind.Optimizer in [Diskin et al., (2021)](https://arxiv.org/abs/2106.10207), 
   [Li et al. (2020)](https://arxiv.org/abs/2005.00124) and [Ryabinin et al. (2021)](https://arxiv.org/abs/2103.03239).
 - Read about setting up Mixture-of-Experts training in [this guide](https://learning-at-home.readthedocs.io/en/latest/user/moe.html),

+ 1 - 1
hivemind/optim/experimental/optimizer.py

@@ -64,7 +64,7 @@ class Optimizer(torch.optim.Optimizer):
       Like in PyTorch LR Scheduler, **epoch does not necessarily correspond to a full pass over the training data.**
       At the end of epoch, peers perform synchronous actions such as averaging gradients for a global optimizer update,
       updating the learning rate scheduler or simply averaging parameters (if using local updates).
-      The purpose of this is to ensure that changing the number of peers does not reqire changing hyperparameters.
+      The purpose of this is to ensure that changing the number of peers does not require changing hyperparameters.
       For instance, if the number of peers doubles, they will run all-reduce more frequently to adjust for faster training.
 
     :Configuration guide: This guide will help you set up your first collaborative training run. It covers the most
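
To make the epoch semantics described in the docstring above concrete, here is a minimal usage sketch (not part of this commit) modeled on the hivemind quickstart; the toy model, the `run_id` value, and the batch-size numbers are illustrative assumptions, not values from the source.

```python
import torch
import torch.nn as nn
import hivemind

# Placeholder model and wrapped torch optimizer; any torch module works the same way.
model = nn.Linear(16, 2)
base_opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Start (or join) a DHT; additional peers would pass initial_peers=... here.
dht = hivemind.DHT(start=True)

opt = hivemind.Optimizer(
    dht=dht,
    run_id="demo_run",        # peers with the same run_id train together (illustrative name)
    batch_size_per_step=32,   # samples processed per local step
    target_batch_size=4096,   # global samples that make up one hivemind "epoch"
    optimizer=base_opt,
    matchmaking_time=3.0,
    averaging_timeout=10.0,
    verbose=True,
)

for _ in range(100):
    x, y = torch.randn(32, 16), torch.randint(0, 2, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # opt.step() reports local progress; once peers jointly accumulate
    # target_batch_size samples, they run the synchronous epoch-end actions
    # (gradient or parameter averaging, scheduler update) and advance the epoch.
    opt.step()
    opt.zero_grad()
```

Because the epoch is defined by the global `target_batch_size` rather than by dataset passes, adding or removing peers only changes how quickly that threshold is reached, which is the hyperparameter-stability property the docstring change refers to.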