justheuristic
|
b0c7b5c30f
wip: implement grad wrt logits
|
5 years ago |
justheuristic
|
785b029d48
wip: implement grad wrt logits
|
5 years ago |
justheuristic
|
359624fcd1
wip: implement grad wrt logits
|
5 years ago |
justheuristic
|
e8ee28a392
wip: implement grad wrt logits
|
5 years ago |
justheuristic
|
5cc3cd99c3
wip: implement grad wrt logits
|
5 years ago |
justheuristic
|
2146fb6d0e
wip: implement grad wrt logits
|
5 years ago |
justheuristic
|
b79d05e037
wip: implement grad wrt logits
|
5 years ago |
justheuristic
|
785e115d89
wip: implement grad wrt logits
|
5 years ago |
justheuristic
|
077ce58323
wip: implement grad wrt logits
|
5 years ago |
justheuristic
|
c005da2089
wip: implement grad wrt logits
|
5 years ago |
justheuristic
|
676066baed
wip: implement grad wrt logits
|
5 years ago |
justheuristic
|
87b2f8b635
wip: implement grad wrt logits
|
5 years ago |
justheuristic
|
8931c56f73
move to notes
|
5 years ago |
justheuristic
|
662357fcb3
reweigh grads correctly
|
5 years ago |
justheuristic
|
153ab20232
change order of grads
|
5 years ago |
justheuristic
|
284250d00c
change order of grads
|
5 years ago |
justheuristic
|
c5ee3d6041
only return grad w.r.t. inputs
|
5 years ago |
justheuristic
|
05e7c92f3d
unpack tuple
|
5 years ago |
justheuristic
|
5cbcf79b00
list -> tensor
|
5 years ago |
justheuristic
|
c8889bde96
list -> tensor
|
5 years ago |
justheuristic
|
8030c075c9
use lists for gather
|
5 years ago |
justheuristic
|
60af3952c9
flag to remove optimizer
|
5 years ago |
justheuristic
|
80ab75583f
wip: parallel fault-tolerant moe backward pass
|
5 years ago |
justheuristic
|
2b2ddf8280
wip: parallel fault-tolerant moe backward pass
|
5 years ago |
justheuristic
|
6fb99c8746
wip: parallel fault-tolerant moe backward pass
|
5 years ago |
justheuristic
|
c58d08cc06
remove run_and_await_k completely, rename gating_function to moe
|
5 years ago |