AIForce/hivemind @ ee53720d38dcbc5bcc7fb96c53066c7144d8eeda

Max Ryabinin ee53720d38 Hotfix wrong getattr in run_server (#122)		5 years ago
.circleci	c450a43fd0 Fix flaky test_remote_module_call, extract requirements for docs/tests (#118)	5 years ago
.github	9ba811788c add blank issue	5 years ago
docs	c450a43fd0 Fix flaky test_remote_module_call, extract requirements for docs/tests (#118)	5 years ago
hivemind	82c3e51131 Fix DHT listening address, allow starting a server with no experts (#121)	5 years ago
scripts	ee53720d38 Hotfix wrong getattr in run_server (#122)	5 years ago
tests	0595f4af90 Group AllReduce protocol (#119)	5 years ago
.gitignore	c43fabcddb Add .gitignore	5 years ago
.readthedocs.yml	c450a43fd0 Fix flaky test_remote_module_call, extract requirements for docs/tests (#118)	5 years ago
LICENSE	f386fb4d42 Create LICENSE	5 years ago
README.md	46c3b85550 Add references, expand README.md (#117)	5 years ago
requirements-dev.txt	c450a43fd0 Fix flaky test_remote_module_call, extract requirements for docs/tests (#118)	5 years ago
requirements-docs.txt	c450a43fd0 Fix flaky test_remote_module_call, extract requirements for docs/tests (#118)	5 years ago
requirements.txt	2bd481c73f added torch1.7 support, switch to grpc 1.33, grpc bump, improved tests & logging, (#116)	5 years ago
setup.py	c450a43fd0 Fix flaky test_remote_module_call, extract requirements for docs/tests (#118)	5 years ago

hivemind: decentralized deep learning in PyTorch

Hivemind is a PyTorch library to train large neural networks across the Internet. Imagine training one huge Transformer model on thousands of computers from different universities, companies, and volunteers.

Key Features

Train neural networks of arbitrary size: parts of their layers are distributed across the participants
Run distributed training without a master node: a Distributed Hash Table connects computers in a decentralized network
Fault-tolerant backpropagation: forward and backward passes succeed even if some nodes are unresponsive or take too long to respond
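
To make the fault-tolerance idea concrete, here is a minimal sketch in plain Python. It is not hivemind's API: `call_experts` and the toy experts are hypothetical names invented for illustration, and simple averaging stands in for whatever aggregation the library actually uses. The sketch queries several "experts" in parallel and averages only the answers that arrive before a deadline, so one slow or failing peer does not break the forward pass.

```python
import concurrent.futures as cf
import time


def call_experts(experts, x, timeout=0.3):
    """Query all experts in parallel and average the outputs that arrive in time.

    Hypothetical helper for illustration only; not part of hivemind.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(experts))
    futures = [pool.submit(expert, x) for expert in experts]
    # Wait up to `timeout` seconds; anything still running is treated as unresponsive.
    done, _not_done = cf.wait(futures, timeout=timeout)
    # Keep only the experts that finished without raising an exception.
    outputs = [f.result() for f in done if f.exception() is None]
    pool.shutdown(wait=False)  # do not block on stragglers
    if not outputs:
        raise RuntimeError("no expert responded in time")
    return sum(outputs) / len(outputs)


# Toy "experts": two healthy peers, one straggler, one offline peer.
def fast_expert(x):
    return 2.0 * x


def other_expert(x):
    return 4.0 * x


def slow_expert(x):
    time.sleep(1.0)  # exceeds the timeout, so its answer is ignored
    return 100.0 * x


def broken_expert(x):
    raise ConnectionError("peer offline")


result = call_experts([fast_expert, other_expert, slow_expert, broken_expert], 1.0)
print(result)  # prints 3.0: the average of 2.0 and 4.0
```

In a real deployment the experts would be remote neural network layers rather than local functions, and the backward pass would need the same tolerance to missing responses.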

To learn more about the idea behind this library and its components, see https://learning-at-home.github.io or read the NeurIPS 2020 paper.

Documentation

Quickstart tutorial: install hivemind, set up a server and train experts
Documentation & guides: learning-at-home.readthedocs.io

Contributing

Hivemind is under active development, and we welcome all contributions, from bug fixes and documentation improvements to entirely new features. If you want to contribute to hivemind, take a look at the issues or join our chat room. The Developer's guide page contains best practices, as well as descriptions of tests and performance benchmarks.

References

You can read the paper that inspired hivemind here:

Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts (Max Ryabinin and Anton Gusev, NeurIPS 2020).

@misc{ryabinin2020crowdsourced,
      title={Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts}, 
      author={Max Ryabinin and Anton Gusev},
      year={2020},
      eprint={2002.04013},
      archivePrefix={arXiv},
      primaryClass={cs.DC}
}

The initial implementation of hivemind, used to conduct experiments for the paper, is available here: mryab/learning-at-home.

In the docs, we list several related projects and acknowledgements.
