
Mention 1 sec/token explicitly

Alexander Borzunov 2 years ago
parent
commit
955eae30b3
1 changed file with 1 addition and 1 deletion

README.md +1 -1

@@ -51,7 +51,7 @@ Check out more tutorials:
 
 - **Petals** runs inference or fine-tunes large language models like [BLOOM-176B](https://huggingface.co/bigscience/bloom) by joining compute resources with people all over the Internet.
 - One participant with weak GPU can load a small part of the model, then team up with people serving the other parts to run inference or fine-tuning.
-- This way, one inference step takes ≈ 1 sec — 10x faster than possible with offloading. Enough for chatbots and other interactive apps.
+- This way, inference takes ≈ 1 sec/token — 10x faster than possible with offloading. Enough for chatbots and other interactive apps.
 - Beyond classic language model APIs — you can employ any fine-tuning and sampling methods by executing custom paths through the model or accessing its hidden states. This combines the comforts of an API with the flexibility of PyTorch.
 
 <p align="center">
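For context on the bullets above about custom paths and hidden states: they refer to the Petals client API of this era. Below is a minimal sketch of distributed inference as the README of the time described it; the `DistributedBloomForCausalLM` class and the `bigscience/bloom-petals` model ID are taken from that era's Petals docs and should be treated as assumptions, not verified against the current API.

```python
# Minimal sketch of Petals distributed inference (assumes the petals
# package and the bigscience/bloom-petals checkpoint from that era's docs).
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

MODEL_NAME = "bigscience/bloom-petals"  # assumption: Petals-enabled BLOOM checkpoint

tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
# Only the embeddings live locally; the transformer blocks run on
# remote peers contributed by other participants in the swarm.
model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)

inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
# Each generated token costs roughly one inference step, i.e. about
# 1 sec/token, which is the figure this commit makes explicit.
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))
```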