@@ -85,10 +85,10 @@ This is important because it's technically possible for peers serving model laye

## Installation

-Here's how to install the dependencies with conda:
+Here's how to install Petals with conda:
```
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
-pip install -r requirements.txt
+pip install git+https://github.com/bigscience-workshop/petals
```

This script uses Anaconda to install cuda-enabled PyTorch.
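
After installation, it can help to confirm that both packages are visible to the interpreter before launching a server. This is a minimal sketch, assuming the import names `torch` and `petals` (the latter taken from the `petals.cli.run_server` commands in this diff); `missing_packages` is a hypothetical helper and not part of Petals:

```python
import importlib.util


def missing_packages(names):
    """Return the entries of `names` that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed;
    # it looks the module up without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names: "torch" (PyTorch) and "petals" (this package).
    missing = missing_packages(["torch", "petals"])
    print("missing:", missing if missing else "none")
```

An empty result only shows that the packages can be located; it does not check that PyTorch was built with CUDA support.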
@@ -107,7 +107,7 @@ For a detailed instruction with larger models, see ["Launch your own swarm"](htt

First, run a couple of servers, each in a separate shell. To launch your first server, run:
```bash
-python -m cli.run_server bloom-testing/test-bloomd-560m-main --num_blocks 8 --torch_dtype float32 \
+python -m petals.cli.run_server bloom-testing/test-bloomd-560m-main --num_blocks 8 --torch_dtype float32 \
--host_maddrs /ip4/127.0.0.1/tcp/31337 # use port 31337, local connections only
```

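
The `--host_maddrs` value is a multiaddr: a slash-separated sequence of protocol/value pairs. The following simplified sketch shows how `/ip4/127.0.0.1/tcp/31337` breaks down. It assumes every protocol in the address carries exactly one value, which holds for `ip4`, `tcp`, and `p2p` but not for every multiaddr protocol. `parse_multiaddr` is a hypothetical helper, not a Petals or hivemind API:

```python
def parse_multiaddr(addr):
    """Split a multiaddr string into ordered (protocol, value) pairs.

    Simplifying assumption: each protocol is followed by exactly one value.
    """
    parts = addr.strip("/").split("/")
    # An odd number of components or an empty component means the
    # string does not fit the protocol/value pattern assumed here.
    if len(parts) % 2 != 0 or "" in parts:
        raise ValueError(f"unexpected multiaddr layout: {addr!r}")
    return list(zip(parts[0::2], parts[1::2]))


if __name__ == "__main__":
    for protocol, value in parse_multiaddr("/ip4/127.0.0.1/tcp/31337"):
        print(protocol, value)
```

Here the `ip4` component is the loopback address, which is what restricts the server to local connections, and the `tcp` component is the port.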
@@ -124,7 +124,7 @@ Mon Day 01:23:45.678 [INFO] Running DHT node on ['/ip4/127.0.0.1/tcp/31337/p2p/A
You can use this address (`/ip4/whatever/else`) to connect additional servers. Open another terminal and run:

```bash
-python -m cli.run_server bloom-testing/test-bloomd-560m-main --num_blocks 8 --torch_dtype float32 \
+python -m petals.cli.run_server bloom-testing/test-bloomd-560m-main --num_blocks 8 --torch_dtype float32 \
--host_maddrs /ip4/127.0.0.1/tcp/0 \
--initial_peers /ip4/127.0... # <-- TODO: Copy the address of another server here
# e.g. --initial_peers /ip4/127.0.0.1/tcp/31337/p2p/QmS1GecIfYouAreReadingThisYouNeedToCopyYourServerAddressCBBq
@@ -140,11 +140,10 @@ Once your have enough servers, you can use them to train and/or inference the mo
```python
import torch
import torch.nn.functional as F
-import transformers
-from src import DistributedBloomForCausalLM
+from petals.client import BloomTokenizerFast, DistributedBloomForCausalLM

initial_peers = [TODO_put_one_or_more_server_addresses_here] # e.g. ["/ip4/127.0.0.1/tcp/more/stuff/here"]
-tokenizer = transformers.BloomTokenizerFast.from_pretrained("bloom-testing/test-bloomd-560m-main")
+tokenizer = BloomTokenizerFast.from_pretrained("bloom-testing/test-bloomd-560m-main")
model = DistributedBloomForCausalLM.from_pretrained(
"bloom-testing/test-bloomd-560m-main", initial_peers=initial_peers, low_cpu_mem_usage=True, torch_dtype=torch.float32
) # this model has only embeddings / logits, all transformer blocks rely on remote servers
@@ -177,14 +176,14 @@ To run minimalistic tests, spin up some servers:
```bash
export MODEL_NAME=bloom-testing/test-bloomd-560m-main
export INITIAL_PEERS=/ip4/127.0.0.1/tcp/31337/p2p/QmS9KwZptnVdB9FFV7uGgaTq4sEKBwcYeKZDfSpyKDUd1g
-python -m cli.run_server $MODEL_NAME --block_indices 0:12 --throughput 1 --torch_dtype float32 \
+python -m petals.cli.run_server $MODEL_NAME --block_indices 0:12 --throughput 1 --torch_dtype float32 \
--identity tests/test.id --host_maddrs /ip4/127.0.0.1/tcp/31337 &> server1.log &
sleep 5 # wait for the first server to initialize DHT
-python -m cli.run_server $MODEL_NAME --block_indices 12:24 --throughput 1 --torch_dtype float32 \
+python -m petals.cli.run_server $MODEL_NAME --block_indices 12:24 --throughput 1 --torch_dtype float32 \
--initial_peers /ip4/127.0.0.1/tcp/31337/p2p/QmS9KwZptnVdB9FFV7uGgaTq4sEKBwcYeKZDfSpyKDUd1g &> server2.log &

tail -f server1.log server2.log # view logs for both servers
-# after you're done, kill servers with 'pkill -f cli.run_server'
+# after you're done, kill servers with 'pkill -f petals.cli.run_server'
```

Then launch pytest: