Distributed training of large neural networks across volunteer computers.
[WIP] - this branch is a work in progress. If you're interested in the supplementary code for the Learning@home paper, you can find it at https://github.com/mryab/learning-at-home.
Currently, there is no easy way to use tesseract out of the box. There are some tests (check `./tests/benchmark_throughput.py` or look into the CI logs), and we want to expand them. If you want to do something complex with it, please contact us by opening an issue (less preferred: Telegram).
## tesseract quick tour

**Trainer process:**
- `RemoteExpert` (`tesseract/client/remote_expert.py`) behaves like a PyTorch module with autograd support, but actually sends requests to a remote runtime.
- `RemoteMixtureOfExperts` (`tesseract/client/remote_moe.py`) finds the best experts for a given input and either returns them as `RemoteExpert` instances or applies them right away.

**Runtime process:**
- `TesseractRuntime` (`tesseract/runtime/__init__.py`) aggregates batches and performs inference/training of experts according to their priority.
- `TesseractServer` (`tesseract/server/__init__.py`) wraps the runtime and periodically uploads its experts into `TesseractNetwork`.

**DHT:**
- `TesseractNetwork` (`tesseract/network/__init__.py`) is a node of a Kademlia-based DHT that stores the metadata used by the trainer and runtime.
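To give a flavor of the routing idea behind `RemoteMixtureOfExperts`, here is a minimal sketch in plain Python. It is not the tesseract API: the stub experts, the toy gating function, and all names below are hypothetical stand-ins for `RemoteExpert` handles that would normally forward inputs to a remote runtime. The sketch scores the available experts for an input, picks the top-k, and averages their outputs by normalized gating weight:

```python
import heapq

# Stub "experts": in tesseract these would be RemoteExpert handles that
# send the input to a remote TesseractRuntime; here they are plain
# functions so the sketch runs standalone. All names are hypothetical.
EXPERTS = {
    "expert.0": lambda x: [v * 2.0 for v in x],
    "expert.1": lambda x: [v + 1.0 for v in x],
    "expert.2": lambda x: [-v for v in x],
}

def score_experts(x):
    """Toy gating function: score each expert by a fixed affinity with the
    input's mean. A real mixture-of-experts learns these scores."""
    mean = sum(x) / len(x)
    return {name: mean * (i + 1) for i, name in enumerate(sorted(EXPERTS))}

def mixture_forward(x, k=2):
    """Select the top-k experts for x and average their outputs,
    weighted by normalized gating scores."""
    scores = score_experts(x)
    top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
    total = sum(s for _, s in top)
    weights = [(name, s / total) for name, s in top]
    out = [0.0] * len(x)
    for name, w in weights:
        y = EXPERTS[name](x)  # in tesseract: a remote call to the runtime
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, [name for name, _ in weights]

# Route a toy input through the two highest-scoring experts.
out, chosen = mixture_forward([1.0, 2.0, 3.0], k=2)
```

In the real system the gating scores and expert outputs come back over the network, and the DHT (`TesseractNetwork`) is what lets the client discover which experts exist in the first place.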