HappyHorse may be open-weights soon (it beat Seedance 2.0 on Artificial Analysis!)

Reddit r/LocalLLaMA / 4/8/2026

💬 Opinion · Signals & Early Trends · Industry & Market Moves · Models & Research

Key Points

  • HappyHorse is described as a multimodal, open-source unified model for text-to-video, image-to-video, and audio, developed by the TTG Future Life Lab under Taobao/Tmall Group (TTG).
  • The post claims the model uses a “single transformer” approach with CFG-less (classifier-free guidance-less) inference and is reported to use 8 inference steps.
  • Reported generation specs include 1280×720 (720p) resolution, 24fps, and 5-second clips, with audio generation supporting sound effects, ambient sound, and voiceover across multiple languages.
  • The article suggests the team may release HappyHorse 1.0 on the 10th and possibly publish multiple model variants, after intensive testing and earlier leaked information.
  • It states HappyHorse is planned to be fully open source, including the base model, distilled model, super-resolution components, and inference code.

The multimodal large model HappyHorse (an open-source unified large model for text-to-video/image-to-video + audio) has recently been making waves on the international stage. After verification from multiple sources, the team behind it has been revealed: they are from the Taobao and Tmall Group (TTG) Future Life Lab, led by Zhang Di. (The lab was created by the ATH-AI Innovation Business Department and has since become an independent entity.)

Profile of Zhang Di: He holds both a Bachelor's and a Master's degree from Shanghai Jiao Tong University. He is the head of the TTG Future Life Lab (rank: P11) and reports to Zheng Bo, Chief Scientist of TTG and CTO of Alimama. He previously served as the lead (No. 1 position) for Kuaishou's Kling; prior to that, he was the head of Big Data and Machine Learning Engineering Architecture at Alimama.

P.S.

  1. It is rumored that HappyHorse 1.0 will be officially released on the 10th of this month. (It has been undergoing intensive testing recently; in fact, information was leaked back in March, but Alibaba PR immediately deleted the relevant sources). Word is that the team will also release several different types of models, so stay tuned.
  2. Alimama is the algorithm platform within the Taobao and Tmall ecosystem and has produced many renowned algorithm experts (it is also the birthplace of the Wan model). After honing his skills at Kuaishou's Kling, Zhang Di's return is described as "a fish back in water," and he is reportedly extremely excited lately. The team at Xixi District C works late every night and even happily puts in overtime on Saturdays.

[Basic Information]

  1. Model Type: Open-source unified model for Text-to-Video / Image-to-Video + Audio.
  2. Inference Paradigm: Single Transformer Transfusion, CFG-less (Classifier-Free Guidance-less).
  3. Inference Steps: 8 steps.
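The "CFG-less" claim matters mainly for cost: standard classifier-free guidance runs two forward passes per denoising step (conditional and unconditional) and blends them, while a CFG-less sampler needs only one. The sketch below is purely illustrative (the `model` callable, the update rule, and the guidance weight are stand-ins, not anything from HappyHorse) but shows why an 8-step CFG-less schedule costs 8 forward passes where CFG would cost 16.

```python
def denoise_cfg(model, x, cond, steps=8, guidance=5.0):
    """Standard CFG: two model calls per step (conditional +
    unconditional), combined with a guidance weight."""
    for t in reversed(range(steps)):
        eps_cond = model(x, t, cond)
        eps_uncond = model(x, t, None)
        eps = eps_uncond + guidance * (eps_cond - eps_uncond)
        x = x - eps  # placeholder update rule, not a real scheduler
    return x

def denoise_cfg_less(model, x, cond, steps=8):
    """CFG-less: a single conditional call per step, halving the
    per-step compute relative to CFG at the same step count."""
    for t in reversed(range(steps)):
        x = x - model(x, t, cond)  # placeholder update rule
    return x
```

With 8 steps, `denoise_cfg` makes 16 model calls and `denoise_cfg_less` makes 8, which is why CFG-less inference pairs naturally with aggressively distilled, few-step samplers.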

[Video Parameters]

Resolution: 1280×720 (720p)

Frame Rate: 24fps

Duration: 5 seconds
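For a sense of scale, the reported parameters pin down the clip size exactly; the arithmetic below (my own back-of-the-envelope calculation, not from the post) shows the frame count and the uncompressed 8-bit RGB footprint a 720p/24fps/5s clip implies.

```python
# Numbers implied by the reported video parameters.
width, height = 1280, 720
fps, seconds = 24, 5

frames = fps * seconds                   # frames per clip
raw_bytes = width * height * 3 * frames  # uncompressed 8-bit RGB

print(frames)                 # 120 frames
print(raw_bytes / 2**20)      # ~316 MiB of raw pixels per clip
```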

[Audio Capabilities]

Native Synchronous Generation: Sound effects / Ambient sound / Voiceover

Supported Languages: Chinese, English, Japanese, Korean, German, French

[Open Source Status]

Fully Open Source: Base model + Distilled model + Super-resolution + Inference code

Source: https://mp.weixin.qq.com/s/n66lk5q_Mm10UYTnpEOf3w?poc_token=HKwe1mmjFX-RhveuVjk_MbRgFTcirVE2tKrRP_gS



submitted by /u/External_Mood4719