# Hivemind: decentralized deep learning in PyTorch


Hivemind is a PyTorch library to train large neural networks across the Internet. Its intended usage is training a single Transformer model on hundreds of computers from different universities, companies, and volunteers.


## Key Features

* Train neural networks of arbitrary size: parts of their layers are distributed across the participants.
* Distributed training without a master node: Distributed Hash Table allows connecting computers in a decentralized network.
* Fault-tolerant backpropagation: forward and backward passes succeed even if some nodes are unresponsive or take too long to respond.
* Decentralized parameter averaging: iteratively aggregate updates from multiple workers without the need to synchronize across the entire network (see the sketch below).
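
The sketch below illustrates the last two building blocks: a peer starts a DHT node and runs one round of decentralized averaging over a local tensor. Treat it as a minimal illustration rather than canonical usage: it assumes the `hivemind.DHT` and `hivemind.DecentralizedAverager` interfaces, and the `prefix`/`target_group_size` arguments and exact names may differ between versions.

```python
import torch
import hivemind

# Start a DHT node. Pass initial_peers=[...] with addresses of existing
# peers to join a running network instead of bootstrapping a new one.
dht = hivemind.DHT(start=True)

# Match peers into groups through the DHT and average their tensors
# without a master node.
averager = hivemind.DecentralizedAverager(
    averaged_tensors=[torch.randn(16)],  # this peer's local parameters or gradients
    dht=dht,
    prefix="demo_run",        # hypothetical name: peers with the same prefix average together
    target_group_size=4,      # assumed parameter: max peers per averaging group
    start=True,
)

# Run one averaging round; this blocks until a group forms and finishes,
# so with no other peers it will wait until the matchmaking times out.
averager.step()
```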

To learn more about the ideas behind this library, see https://learning-at-home.github.io or read the NeurIPS 2020 paper.

## Installation

Before installing hivemind, make sure that your environment has Python 3.7+ and PyTorch 1.6.0 or newer.
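
If you are unsure, a quick way to check both versions from the interpreter (plain standard-library and PyTorch calls, nothing hivemind-specific):

```python
import sys
import torch

print(sys.version_info)   # should be (3, 7) or newer
print(torch.__version__)  # should be 1.6.0 or newer
```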

To start using this library, you can either use the pip package manager or build it from source. Since the release cycle is not yet established, we recommend installing hivemind from source to keep up with the latest bugfixes and improvements.

### With pip

If your versions of Python and PyTorch match the requirements, you can install hivemind with pip:

```shell
pip install hivemind
```

### From source

To install hivemind from source, simply clone the repository and install it:

```shell
git clone https://github.com/learning-at-home/hivemind.git
cd hivemind
pip install .
```

If you would like to verify that your installation is working properly, you can install with `pip install -e .[dev]` instead. Then, you can run the tests with `pytest tests/`.
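
Either way, a simple import serves as a smoke test (assuming the package exposes `__version__`, as hivemind releases do):

```python
import hivemind

# If the installation succeeded, this prints the installed version
# and raises no import errors.
print(hivemind.__version__)
```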

## Documentation

The documentation is built from the `docs/` folder and hosted on Read the Docs; see https://learning-at-home.readthedocs.io for the API reference and guides.

## Contributing

Hivemind is currently in active development, and we welcome all contributions. Everything, from bug fixes and documentation improvements to entirely new features, is equally appreciated.

If you want to contribute to hivemind but don't know where to start, take a look at the unresolved issues. Open a new issue or join our chat room if you want to discuss new functionality or report a possible bug. Bug fixes are always welcome, but new features should preferably be discussed with maintainers beforehand.

If you want to start contributing to the source code of hivemind, please see the contributing guidelines first. To learn more about other ways to contribute, read our guide.

## Citation

If you found hivemind or its underlying algorithms useful for your experiments, please cite the following source:

```bibtex
@misc{hivemind,
  author = {Learning@home team},
  title = {{H}ivemind: a {L}ibrary for {D}ecentralized {D}eep {L}earning},
  year = 2020,
  howpublished = {\url{https://github.com/learning-at-home/hivemind}},
}
```

Also, you can cite the paper that inspired the creation of this library:

```bibtex
@inproceedings{ryabinin2020crowdsourced,
  author = {Ryabinin, Max and Gusev, Anton},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},
  pages = {3659--3672},
  publisher = {Curran Associates, Inc.},
  title = {Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts},
  url = {https://proceedings.neurips.cc/paper/2020/file/25ddc0f8c9d3e22e03d3076f98d83cb2-Paper.pdf},
  volume = {33},
  year = {2020}
}
```

The initial implementation of hivemind used for the paper is available at mryab/learning-at-home.

In the documentation, we list several related projects and acknowledgements.