Aleksandr Borzunov 5578378202 Lower --max_batch_size and --inference_max_length defaults to 2048 2 years ago
__init__.py 05faa0b3c8 add quantization script for cpu 3 years ago
config.json a798ea04a6 add minimalistic benchmarks 3 years ago
convert_model.py a2634001e9 Reduce vocabulary size in test model, fix bug in routing when overlapped (#45) 3 years ago
deploy_server.sh 11a424837f integrate mixed-8bit model (#39) 3 years ago
inference_one_block.py 4695071ad2 WIP: make DistributedBloom compliant with HF interface 3 years ago
local_server_config_example.cfg f60a7dd183 deploy swarm on local & remote machines 3 years ago
remote_server_config_example.cfg f60a7dd183 deploy swarm on local & remote machines 3 years ago
run_local_servers.sh 11a424837f integrate mixed-8bit model (#39) 3 years ago
run_remote_servers.sh 6573076883 Sequential and parallel forward / backward (#36) 3 years ago
run_server.py 5578378202 Lower --max_batch_size and --inference_max_length defaults to 2048 2 years ago
speed_test.py e2711a033b Add automated tests (#23) 3 years ago