I just tried hooking up local Minimax 2.7 to Opencode on my M3 Ultra. I'm pretty impressed that it can run so many agents churning through work in parallel so quickly! Batching like this feels like it's really making the most of the hardware.
EDIT: more details
llama.cpp, unsloth IQ2_XXS UD
slot get_availabl: id 3 | task -1 | selected slot by LCP similarity, sim_best = 0.708 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 3 | task 2488 | processing task, is_child = 0
slot update_slots: id 3 | task 2488 | new prompt, n_ctx_slot = 196608, n_keep = 0, task.n_tokens = 49213
slot update_slots: id 3 | task 2488 | n_tokens = 34849, memory_seq_rm [34849, end)
slot update_slots: id 3 | task 2488 | prompt processing progress, n_tokens = 36897, batch.n_tokens = 2048, progress = 0.749741
slot update_slots: id 3 | task 2488 | n_tokens = 36897, memory_seq_rm [36897, end)
slot update_slots: id 3 | task 2488 | prompt processing progress, n_tokens = 38945, batch.n_tokens = 2048, progress = 0.791356
slot update_slots: id 3 | task 2488 | n_tokens = 38945, memory_seq_rm [38945, end)
slot update_slots: id 3 | task 2488 | prompt processing progress, n_tokens = 40993, batch.n_tokens = 2048, progress = 0.832971
slot update_slots: id 3 | task 2488 | n_tokens = 40993, memory_seq_rm [40993, end)
slot update_slots: id 3 | task 2488 | prompt processing progress, n_tokens = 43041, batch.n_tokens = 2048, progress = 0.874586
slot update_slots: id 3 | task 2488 | n_tokens = 43041, memory_seq_rm [43041, end)
slot update_slots: id 3 | task 2488 | prompt processing progress, n_tokens = 45089, batch.n_tokens = 2048, progress = 0.916201
slot update_slots: id 3 | task 2488 | n_tokens = 45089, memory_seq_rm [45089, end)
slot update_slots: id 3 | task 2488 | prompt processing progress, n_tokens = 47137, batch.n_tokens = 2048, progress = 0.957816
slot update_slots: id 3 | task 2488 | n_tokens = 47137, memory_seq_rm [47137, end)
slot update_slots: id 3 | task 2488 | prompt processing progress, n_tokens = 49185, batch.n_tokens = 2048, progress = 0.999431
slot update_slots: id 3 | task 2488 | n_tokens = 49185, memory_seq_rm [49185, end)
reasoning-budget: activated, budget=2147483647 tokens
reasoning-budget: deactivated (natural end)
slot init_sampler: id 3 | task 2488 | init sampler, took 4.23 ms, tokens: text = 49213, total = 49213
slot update_slots: id 3 | task 2488 | prompt processing done, n_tokens = 49213, batch.n_tokens = 28
srv log_server_r: done request: POST /v1/chat/completions 200
slot print_timing: id 3 | task 2488 | prompt eval time = 72627.76 ms / 14364 tokens ( 5.06 ms per token, 197.78 tokens per second)
eval time = 4712.60 ms / 118 tokens ( 39.94 ms per token, 25.04 tokens per second)
total time = 77340.36 ms / 14482 tokens
slot release: id 3 | task 2488 | stop processing: n_tokens = 49330, truncated = 0
srv update_slots: all slots are idle
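For anyone who wants to sanity-check the numbers in that log, here is a rough Python sketch that recomputes the throughput from the two timing lines and works out how much of the prompt was new. The regex and the helper name `tokens_per_second` are my own, not part of llama.cpp, and I am assuming that the first `n_tokens = 34849` line is the prefix reused from the slot's cache, which would explain why only 14364 of the 49213 prompt tokens show up in the prompt eval timing.

```python
import re

# Timing lines copied from the llama.cpp server log in this post.
PROMPT_LINE = "prompt eval time = 72627.76 ms / 14364 tokens ( 5.06 ms per token, 197.78 tokens per second)"
EVAL_LINE = "eval time = 4712.60 ms / 118 tokens ( 39.94 ms per token, 25.04 tokens per second)"

# Hypothetical helper pattern: captures "<milliseconds> ms / <count> tokens".
TIMING_RE = re.compile(r"=\s*([\d.]+) ms /\s*(\d+) tokens")


def tokens_per_second(line: str) -> float:
    """Recompute tokens per second from a llama.cpp timing line."""
    match = TIMING_RE.search(line)
    if match is None:
        raise ValueError(f"not a timing line: {line!r}")
    elapsed_ms = float(match.group(1))
    n_tokens = int(match.group(2))
    return n_tokens / (elapsed_ms / 1000.0)


prompt_tps = tokens_per_second(PROMPT_LINE)  # prompt processing speed
gen_tps = tokens_per_second(EVAL_LINE)       # generation speed

# Values from the log: task.n_tokens and the first n_tokens line.
# Assumption: 34849 is the prefix that was already in the slot's cache.
task_tokens = 49213
cached_tokens = 34849
new_tokens = task_tokens - cached_tokens
cache_fraction = cached_tokens / task_tokens

print(f"prompt: {prompt_tps:.2f} tok/s, generation: {gen_tps:.2f} tok/s")
print(f"new prompt tokens: {new_tokens}, reused fraction: {cache_fraction:.3f}")
```

If that reading is right, roughly 71% of the prompt was reused, which lines up with the `sim_best = 0.708` reported when the slot was selected, so the ~73 s of prompt processing covered only the 14364 new tokens.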