
hivemind: decentralized deep learning in PyTorch


Hivemind is a PyTorch library to train large neural networks across the Internet. Imagine training one huge Transformer model on thousands of computers from different universities, companies, and volunteers.


Key Features

  • Train neural networks of arbitrary size: parts of their layers are distributed across the participants.
  • Run distributed training without a master node: a Distributed Hash Table allows connecting computers in a decentralized network.
  • Fault-tolerant backpropagation: forward and backward passes succeed even if some nodes are unresponsive or take too long to respond.
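The README does not spell out how the Distributed Hash Table lets peers find each other without a master node. As a minimal sketch, assuming a Kademlia-style design (160-bit IDs compared by XOR distance), the code below shows how any participant can compute which peers are responsible for a key such as an expert name. The names `to_id`, `nearest_peers`, the peer addresses, and the key `"expert.0.3"` are hypothetical and are not part of hivemind's API.

```python
import hashlib
from typing import Dict, List

ID_BITS = 160  # assumed Kademlia-style ID size (the SHA-1 digest length)


def to_id(name: str) -> int:
    """Map an arbitrary string (peer address or key) to a 160-bit integer ID."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of two IDs, read as an integer."""
    return a ^ b


def nearest_peers(key: str, peers: Dict[str, int], k: int) -> List[str]:
    """Return the k peers whose IDs are closest to hash(key) by XOR distance.

    In a Kademlia-style DHT these peers are the ones responsible for storing
    `key`, so every participant agrees on them without a coordinator.
    """
    key_id = to_id(key)
    return sorted(peers, key=lambda name: xor_distance(peers[name], key_id))[:k]


if __name__ == "__main__":
    # Hypothetical peer addresses; each peer derives its ID from its address here.
    peers = {f"peer{i}:1337": to_id(f"peer{i}:1337") for i in range(10)}
    print(nearest_peers("expert.0.3", peers, k=3))
```

In a real DHT no node knows every peer: Kademlia reaches the k nearest peers iteratively by querying the closest peers it already knows, so the flat `sorted` call above only illustrates the distance metric.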

To learn more about the idea behind this library and its components, see https://learning-at-home.github.io or read the NeurIPS 2020 paper.
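The fault-tolerance feature listed above can be illustrated with a simplified, pure-Python sketch of one mixture-of-experts forward pass: experts that fail or miss a deadline are dropped, and the softmax gating weights are renormalized over the experts that did respond. This is not hivemind's implementation, which operates on PyTorch tensors exchanged with remote peers; the function `fault_tolerant_moe`, the stand-in experts, and the `timeout` value are hypothetical.

```python
import concurrent.futures as cf
import math
import time
from typing import Callable, Dict, List, Sequence

Expert = Callable[[List[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of gating logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fault_tolerant_moe(x: List[float], experts: Sequence[Expert],
                       gate_logits: Sequence[float], timeout: float) -> List[float]:
    """Weighted sum of the outputs of the experts that answer within `timeout`.

    Experts that raise (e.g. an unreachable peer) or miss the deadline are
    skipped, and the gating weights are renormalized over the responders.
    """
    if not experts:
        raise ValueError("need at least one expert")
    pool = cf.ThreadPoolExecutor(max_workers=len(experts))
    futures: Dict[cf.Future, int] = {pool.submit(e, x): i for i, e in enumerate(experts)}
    done, _ = cf.wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block this call on stragglers

    responders, outputs = [], []
    for fut in sorted(done, key=futures.__getitem__):  # stable expert order
        if fut.exception() is None:
            responders.append(futures[fut])
            outputs.append(fut.result())
    if not responders:
        raise RuntimeError("no expert responded in time")

    weights = softmax([gate_logits[i] for i in responders])
    return [sum(w * out[d] for w, out in zip(weights, outputs))
            for d in range(len(outputs[0]))]


# Hypothetical stand-ins for remote experts hosted by other participants.
def double(v: List[float]) -> List[float]:
    return [2 * a for a in v]


def negate(v: List[float]) -> List[float]:
    return [-a for a in v]


def slow_peer(v: List[float]) -> List[float]:
    time.sleep(1.5)  # simulates a peer that takes too long to respond
    return [100.0 for _ in v]


def dead_peer(v: List[float]) -> List[float]:
    raise ConnectionError("peer unreachable")


if __name__ == "__main__":
    y = fault_tolerant_moe([1.0, 2.0], [double, negate, slow_peer, dead_peer],
                           gate_logits=[0.0, 0.0, 5.0, 5.0], timeout=0.3)
    print(y)  # only `double` and `negate` contribute, with equal weights
```

Skipping slow or failed experts means a single unresponsive peer cannot stall the whole step; the trade-off is that some steps combine fewer experts than the gating function selected.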

Documentation

Contributing

Hivemind is currently in active development, and we welcome all contributions, from bug fixes and documentation improvements to entirely new features. If you want to contribute to hivemind, take a look at the issues or join our chat room. The Developer's guide page contains best practices, as well as a description of tests and performance benchmarks.

References

You can read the paper that inspired hivemind here:

Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts (Max Ryabinin and Anton Gusev, NeurIPS 2020).

@misc{ryabinin2020crowdsourced,
      title={Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts}, 
      author={Max Ryabinin and Anton Gusev},
      year={2020},
      eprint={2002.04013},
      archivePrefix={arXiv},
      primaryClass={cs.DC}
}

The initial implementation of hivemind, which was used to conduct the experiments for the paper, is available here: mryab/learning-at-home.

In the docs, we list several related projects and acknowledgements.