justheuristic
|
d703c8d4c5
background_server is now a contextmanager
|
5 years ago |
justheuristic
|
9a8320c106
pep8
|
5 years ago |
justheuristic
|
aa0743c587
pep8
|
5 years ago |
justheuristic
|
6605b00d05
safer shutdown order
|
5 years ago |
justheuristic
|
f9798a474a
unified prefix scheme
|
5 years ago |
justheuristic
|
cbf1c42df1
unified prefix scheme
|
5 years ago |
justheuristic
|
dfa9dfaae2
move to notes
|
5 years ago |
justheuristic
|
8931c56f73
move to notes
|
5 years ago |
justheuristic
|
b20f3ee985
grad logits wrt actual logits
|
5 years ago |
justheuristic
|
be3119b12e
add basic moe correctness test
|
5 years ago |
justheuristic
|
662357fcb3
reweigh grads correctly
|
5 years ago |
justheuristic
|
153ab20232
change order of grads
|
5 years ago |
justheuristic
|
284250d00c
change order of grads
|
5 years ago |
justheuristic
|
c5ee3d6041
only return grad w.r.t. inputs
|
5 years ago |
justheuristic
|
05e7c92f3d
unpack tuple
|
5 years ago |
justheuristic
|
5cbcf79b00
list -> tensor
|
5 years ago |
justheuristic
|
c8889bde96
list -> tensor
|
5 years ago |
justheuristic
|
8030c075c9
use lists for gatehr
|
5 years ago |
justheuristic
|
49e4459ec8
do not .detach non-tensor parameters
|
5 years ago |
justheuristic
|
97c4003e5c
enumerate
|
5 years ago |
justheuristic
|
60af3952c9
flag to remove optimizer
|
5 years ago |
justheuristic
|
9a4e306f39
flag to remove optimizer
|
5 years ago |
justheuristic
|
80ab75583f
wip: parallel fault-tolerant moe backward pass
|
5 years ago |
justheuristic
|
2b2ddf8280
wip: parallel fault-tolerant moe backward pass
|
5 years ago |
justheuristic
|
6fb99c8746
wip: parallel fault-tolerant moe backward pass
|
5 years ago |
justheuristic
|
ebe07eebfd
typo
|
5 years ago |
justheuristic
|
88d1bdc025
unused imports
|
5 years ago |
justheuristic
|
c58d08cc06
remove run_and_await_k completely, rename gating_function to moe
|
5 years ago |
justheuristic
|
4a33e155b6
remove run_and_await_k completely
|
5 years ago |
justheuristic
|
5016002186
remove dependency on run_and_await_k, rename GatingFunction to RemoteMixtureOfExperts
|
5 years ago |