AI Navigate

Last Week in Multimodal AI - Local Edition

Reddit r/LocalLLaMA / 3/18/2026

💬 OpinionSignals & Early TrendsTools & Practical UsageModels & Research

Read original →

共有:

Key Points

The post is a local/open-source roundup of recent multimodal AI tools and models from last week, highlighting several projects and where to find their resources.
FlashMotion claims a 50x speedup over state-of-the-art methods for controllable video generation on Wan2.2-TI2V with multi-object box/mask guidance, and provides weights.
Foundation 1 presents a text-to-sample music production model that runs on 7 GB VRAM, with links to a post and weights for access.
GlyphPrinter offers glyph-accurate multilingual text rendering for image generation, handling complex Chinese characters with open weights.
The roundup also notes MatAnyone 2 for video object matting (open code and demo) and ViFeEdit for editing video from image pairs (no video training needed), both with code and demos.

Last Week in Multimodal AI - Local Edition

I curate a weekly multimodal AI roundup, here are the local/open-source highlights from last week:

FlashMotion - Controllable Video Generation

Few-step video gen on Wan2.2-TI2V with multi-object box/mask guidance.
50x speedup over SOTA. Weights available.
Project | Weights

https://reddit.com/link/1rwuxs1/video/d9qi6xl0mqpg1/player

Foundation 1 - Music Production Model

Text-to-sample model built for music workflows. Runs on 7 GB VRAM.
Post | Weights

https://reddit.com/link/1rwuxs1/video/y6wtywk1mqpg1/player

GlyphPrinter - Accurate Text Rendering for Image Gen

Glyph-accurate multilingual text rendering for text-to-image models.
Handles complex Chinese characters. Open weights.
Project | Code | Weights

https://preview.redd.it/2i60hgm2mqpg1.png?width=1456&format=png&auto=webp&s=f82a1729c13b45849c60155620e0782bcd5bafe6

MatAnyone 2 - Video Object Matting

Cuts out moving objects from video with a self-evaluating quality loop.
Open code and demo.
Demo | Code

https://reddit.com/link/1rwuxs1/video/4uzxhij3mqpg1/player

ViFeEdit - Video Editing from Image Pairs

Edits video using only 2D image pairs. No video training needed. Built on Wan2.1/2.2 + LoRA.
Code

https://reddit.com/link/1rwuxs1/video/yajih834mqpg1/player

Anima Preview 2

Latest preview of the Anima diffusion models.
Weights

https://preview.redd.it/ilenx525mqpg1.png?width=1456&format=png&auto=webp&s=b9f883365c8964cea17883447cce3e420a53231b

LTX-2.3 Colorizer LoRA

Colorizes B&W footage via IC-LoRA with prompt-based control.
Weights

https://preview.redd.it/jw2t6966mqpg1.png?width=1456&format=png&auto=webp&s=d4b0dc1f2541c09659e34b2e07407bbd70fc960d

Honorable mention:

MJ1 - 3B Multimodal Judge (code not yet available but impressive results for 3B active)

RL-trained multimodal judge with just 3B active parameters.
Outperforms Gemini-3-Pro on Multimodal RewardBench 2 (77.0% accuracy).
Paper

MJ1 grounded verification chain.

Checkout the full newsletter for more demos, papers, and resources.

submitted by /u/Vast_Yak_4147
[link] [comments]

Related Articles

What 81,000 people want from AI

What 81,000 people want from AI

Anthropic News

ラピダス、半導体設計AIエージェント「国内2社海外1社が使用中」

ラピダス、半導体設計AIエージェント「国内2社海外1社が使用中」

日経XTECH

生成AIで盛り上がる「推論専用チップ」、著名科学者が示す進化の行方

生成AIで盛り上がる「推論専用チップ」、著名科学者が示す進化の行方

日経XTECH

ベテランの若手育成負担を減らせ、PLC制御の「ラダー図」をAIで生成

ベテランの若手育成負担を減らせ、PLC制御の「ラダー図」をAIで生成

日経XTECH

中国AI企業が他社製AIを「ただ乗り蒸留」か米社が主張、安全保障リスクも

中国AI企業が他社製AIを「ただ乗り蒸留」か米社が主張、安全保障リスクも

日経XTECH

関連おすすめサービス

※当サイトはアフィリエイト広告を利用しています

Notta搭載AI議事録イヤホン ZENCHORD1

AI時代の仕事術。Notta搭載で会議の議事録を自動生成するスマートイヤホン。

AI搭載ボイスレコーダー Plaud

世界100万人が愛用。AIで文字起こし・要約を自動化するボイスレコーダー。

画像高画質化AIツール Aiarty Image Enhancer

AIで画像を高画質化。写真・イラストを簡単にアップスケール。