Minimax 2.7 running sub-agents locally

Reddit r/LocalLLaMA / 4/13/2026

💬 OpinionDeveloper Stack & InfrastructureSignals & Early TrendsTools & Practical UsageModels & Research

共有:

Key Points

Minimax 2.7 is reported to run “sub-agents” locally and to execute many parallel agent tasks quickly on an Apple M3 Ultra.
The user connects Minimax 2.7 to Opencode and observes that batching helps maximize available hardware performance.
Logs from llama.cpp show slot selection based on LCP similarity, task slot launching, and large context windows (up to ~196k tokens) with iterative prompt processing progress updates.
The configuration references a quantized model (llama.cpp, unsloth IQ2_XXS UD) and a sampler chain with typical sampling controls (top-k/top-p/min-p/typical, etc.).

I just tried hooking up local Minimax 2.7 to Opencode on my M3 Ultra. I'm pretty impressed that it can run so many agents churning through work in parallel so quickly! Batching like this feels like it's really making the most of the hardware.

EDIT: more details

llama.cpp, unsloth IQ2_XXS UD

slot get_availabl: id 3 | task -1 | selected slot by LCP similarity, sim_best = 0.708 (> 0.100 thold), f_keep = 1.000 slot launch_slot_: id 3 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist slot launch_slot_: id 3 | task 2488 | processing task, is_child = 0 slot update_slots: id 3 | task 2488 | new prompt, n_ctx_slot = 196608, n_keep = 0, task.n_tokens = 49213 slot update_slots: id 3 | task 2488 | n_tokens = 34849, memory_seq_rm [34849, end) slot update_slots: id 3 | task 2488 | prompt processing progress, n_tokens = 36897, batch.n_tokens = 2048, progress = 0.749741 slot update_slots: id 3 | task 2488 | n_tokens = 36897, memory_seq_rm [36897, end) slot update_slots: id 3 | task 2488 | prompt processing progress, n_tokens = 38945, batch.n_tokens = 2048, progress = 0.791356 slot update_slots: id 3 | task 2488 | n_tokens = 38945, memory_seq_rm [38945, end) slot update_slots: id 3 | task 2488 | prompt processing progress, n_tokens = 40993, batch.n_tokens = 2048, progress = 0.832971 slot update_slots: id 3 | task 2488 | n_tokens = 40993, memory_seq_rm [40993, end) slot update_slots: id 3 | task 2488 | prompt processing progress, n_tokens = 43041, batch.n_tokens = 2048, progress = 0.874586 slot update_slots: id 3 | task 2488 | n_tokens = 43041, memory_seq_rm [43041, end) slot update_slots: id 3 | task 2488 | prompt processing progress, n_tokens = 45089, batch.n_tokens = 2048, progress = 0.916201 slot update_slots: id 3 | task 2488 | n_tokens = 45089, memory_seq_rm [45089, end) slot update_slots: id 3 | task 2488 | prompt processing progress, n_tokens = 47137, batch.n_tokens = 2048, progress = 0.957816 slot update_slots: id 3 | task 2488 | n_tokens = 47137, memory_seq_rm [47137, end) slot update_slots: id 3 | task 2488 | prompt processing progress, n_tokens = 49185, batch.n_tokens = 2048, progress = 0.999431 slot update_slots: id 3 | task 2488 | n_tokens = 49185, memory_seq_rm [49185, end) reasoning-budget: activated, budget=2147483647 tokens reasoning-budget: deactivated (natural end) slot init_sampler: id 3 | task 2488 | init sampler, took 4.23 ms, tokens: text = 49213, total = 49213 slot update_slots: id 3 | task 2488 | prompt processing done, n_tokens = 49213, batch.n_tokens = 28 srv log_server_r: done request: POST /v1/chat/completions 200 slot print_timing: id 3 | task 2488 | prompt eval time = 72627.76 ms / 14364 tokens ( 5.06 ms per token, 197.78 tokens per second) eval time = 4712.60 ms / 118 tokens ( 39.94 ms per token, 25.04 tokens per second) total time = 77340.36 ms / 14482 tokens slot release: id 3 | task 2488 | stop processing: n_tokens = 49330, truncated = 0 srv update_slots: all slots are idle

submitted by /u/-dysangel-
[link] [comments]