Alexander Borzunov
|
056f22515a
Prioritize short inference, unmerge pools for long inference (#458)
|
2 years ago |
Alexander Borzunov
|
8c546d988a
Test Llama, rebalancing, throughput eval, and all CLI scripts (#452)
|
2 years ago |
Alexander Borzunov
|
cb3f018f9f
Add LLaMA support (#323)
|
2 years ago |
Max Ryabinin
|
3e7ae5116d
Remove unused imports and attributes (#324)
|
2 years ago |
Alexander Borzunov
|
8f6342a861
Refactor RemoteSequenceManager (#309)
|
2 years ago |
Max Ryabinin
|
793726b041
Speed up loading blocks using init with meta weights (#285)
|
2 years ago |
justheuristic
|
ae9e71fe8e
Add local tensor-parallel fwd/bwd (#143)
|
2 years ago |
Alexander Borzunov
|
7bd5916744
Make Petals a pip-installable package (attempt 2) (#102)
|
2 years ago |
Alexander Borzunov
|
11d6ba683c
Make inference, forward, and backward fully fault-tolerant (#91)
|
2 years ago |
Pavel Samygin
|
0be21775af
remove transformer block, implement as sequential of size 1 (#54)
|
3 years ago |
justheuristic
|
d271b75dd4
Let users specify sequence length instead of assuming 2048 (#52)
|
3 years ago |
justheuristic
|
f0c7383181
Implement RemoteSequential slicing and extra repr, add tests (#30)
|
3 years ago |
justheuristic
|
e2711a033b
Add automated tests (#23)
|
3 years ago |
Aleksandr Borzunov
|
b78d713347
refactor, add swarm info
|
3 years ago |
justheuristic
|
34b8b86673
disclaimer: this part is a work-in-progress
|
3 years ago |
justheuristic
|
2eb47cbedd
support hosting multiple instances of the same block
|
3 years ago |
justheuristic
|
2bf83b42e5
add testing guide
|
3 years ago |