Happy Horse 1.1: video and sound generated together

Q: How is Happy Horse 1.1 different from Happy Horse 1.0?

Happy Horse 1.1 is the production-focused update to the original model. It sharpens multilingual lip-sync, holds character identity and motion more steadily across shots, extends clips to 15 seconds, and generates richer native audio, adding ambient sound, music, and Foley alongside dialogue in a single pass.

Q: What is Happy Horse 1.1?

Happy Horse 1.1 is Alibaba's AI video model that generates video and synchronized audio in a single pass. It runs text-to-video, image-to-video, and reference-to-video through one pipeline, producing 720p or 1080p clips of 3 to 15 seconds with dialogue, ambient sound, music, and Foley.

Q: Does Happy Horse 1.1 generate sound with the video?

Yes. Happy Horse 1.1 synthesizes audio jointly with the picture in the same pass, so dialogue, ambient noise, music, and Foley arrive built into the clip. It also produces phoneme-level lip-sync across seven languages: English, Mandarin, Cantonese, Japanese, Korean, German, and French.

Q: What resolution and clip length does Happy Horse 1.1 support?

Happy Horse 1.1 outputs 720p or 1080p at 24 fps, with clips from 3 to 15 seconds and a default length of 5 seconds. There is no native 4K, and longer sequences are built by stitching multiple generations together.

Q: Can I use Happy Horse 1.1 videos commercially?

Yes. Happy Horse 1.1 is cleared for commercial use. Confirm how Artlist's plan and license terms apply to your specific project before publishing, since commercial scope and any usage limits are set at the platform level.
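The answers above note that clips run 3 to 15 seconds and that longer sequences are built by stitching multiple generations together. As a rough planning aid, the sketch below splits a target running time into per-generation lengths that each stay inside that range. The function name `plan_segments` is hypothetical, and the use of whole-second lengths is an assumption; the text does not say which duration values are accepted.

```python
MIN_SECONDS = 3   # shortest clip quoted in the text
MAX_SECONDS = 15  # longest clip quoted in the text


def plan_segments(total_seconds: int) -> list[int]:
    """Split a target length into clip lengths of 3 to 15 seconds each.

    Hypothetical helper: uses the fewest clips possible and spreads the
    time as evenly as it can across them.
    """
    if total_seconds < MIN_SECONDS:
        raise ValueError(f"need at least {MIN_SECONDS} seconds")
    # Ceiling division: the fewest clips that can cover the total.
    clip_count = -(-total_seconds // MAX_SECONDS)
    base, extra = divmod(total_seconds, clip_count)
    # The first `extra` clips carry one additional second each.
    return [base + 1 if i < extra else base for i in range(clip_count)]


if __name__ == "__main__":
    print(plan_segments(40))
```

Each planned length would then be generated separately and joined in an editor; the stitching step itself is outside what this sketch covers.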

Alibaba's video model turns a prompt, a still image, or up to nine reference images into 1080p clips, with dialogue, ambient sound, music, and Foley produced in the same pass, and lip-sync that holds across seven languages.

Inside HappyHorse 1.1

Three ways to start a scene
Generate from a text prompt, animate a still as your first frame, or drive a scene from 1 to 9 reference images. One model handles text-to-video, image-to-video, and reference-to-video.
Sound made with the picture
Dialogue, ambient noise, music, and Foley are generated in the same pass as the visuals, not layered on afterward, so a clip arrives already scored and mixed to the action.
Lip-sync in seven languages
Phoneme-level lip-sync matches the spoken audio across English, Mandarin, Cantonese, Japanese, Korean, German, and French. Put the lines in your prompt and the mouth follows.
Characters that stay on-model
Reference up to nine subjects as character1 through character9 and they hold their look across shots, so the same face and wardrobe carry through a sequence.
Multi-shot in a single prompt
Sequence several shots in one generation by leading each segment with a timecode range, like 00-05 then 05-10, each with its own action and framing.
The specs that matter
720p or 1080p, 24 fps, 3 to 15 seconds per clip, aspect ratios from 16:9 to 9:16 and beyond. Served on fal.ai and cleared for commercial use.
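The limits listed above can be checked before a generation is submitted. The sketch below is a minimal illustration, not an official client: `ClipSettings`, `check_settings`, and `frame_count` are hypothetical names, and only the numeric limits (720p or 1080p, 24 fps, 3 to 15 seconds, up to nine reference images) come from the text.

```python
from dataclasses import dataclass

# Limits quoted in the feature list above.
ALLOWED_RESOLUTIONS = ("720p", "1080p")
FRAMES_PER_SECOND = 24
MIN_SECONDS, MAX_SECONDS = 3, 15
MAX_REFERENCE_IMAGES = 9


@dataclass(frozen=True)
class ClipSettings:
    """Hypothetical settings container; not part of any official SDK."""
    resolution: str
    duration_seconds: int
    reference_images: int = 0  # 0 means no reference images are attached


def check_settings(settings: ClipSettings) -> list[str]:
    """Return a list of human-readable problems; empty means all limits hold."""
    problems = []
    if settings.resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    if not MIN_SECONDS <= settings.duration_seconds <= MAX_SECONDS:
        problems.append(f"duration must be {MIN_SECONDS} to {MAX_SECONDS} seconds")
    if not 0 <= settings.reference_images <= MAX_REFERENCE_IMAGES:
        problems.append(f"at most {MAX_REFERENCE_IMAGES} reference images")
    return problems


def frame_count(settings: ClipSettings) -> int:
    """Number of frames in the clip at the quoted 24 fps."""
    return settings.duration_seconds * FRAMES_PER_SECOND
```

Returning a list of problems instead of raising on the first one lets a form or script report every out-of-range value at once.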

What is Happy Horse 1.1?

Happy Horse 1.1 is Alibaba's AI video model that generates video and synchronized audio together in a single pass. It works from a text prompt, a still image, or up to nine reference images, producing 720p or 1080p clips of 3 to 15 seconds with native dialogue. On Artlist, you can generate with Happy Horse 1.1 and use the output commercially.

How do you prompt Happy Horse 1.1?

Happy Horse 1.1 follows scene direction closely, so the more you specify, the more control you keep. Spell out motion, framing, lighting, and pacing rather than leaving them to the model. A few patterns get the most out of it:

01. Write the lines, hear them spoken
For talking-head shots, write the spoken lines directly in the prompt. The audio and lip-sync are generated to match them.
02. Sequence shots with timecodes
To sequence multiple shots in one clip, lead each segment with a timecode range, like 00-05 then 05-10, and give each its own action.
03. Name your characters, lock their look
In reference-to-video, name your subjects character1 through character9 in upload order so they stay consistent across shots.
04. Lock identity with clean references
Use clean, high-resolution reference images with a single clear subject to lock identity and wardrobe.
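The timecode and character-naming patterns above can be assembled programmatically. The sketch below is one possible way to do it, under stated assumptions: the text gives the range style (00-05, 05-10) and the names character1 through character9, but not the exact layout the model expects, so placing one segment per line is a guess. `Shot`, `character_names`, and `build_multishot_prompt` are hypothetical helpers, not part of any published API.

```python
from dataclasses import dataclass

MIN_CLIP_SECONDS = 3
MAX_CLIP_SECONDS = 15
MAX_REFERENCES = 9


@dataclass
class Shot:
    start: int      # seconds from the start of the clip
    end: int        # seconds from the start of the clip
    direction: str  # action, framing, and any spoken lines for this segment


def character_names(reference_count: int) -> list[str]:
    """Names for the uploaded references, in upload order."""
    if not 1 <= reference_count <= MAX_REFERENCES:
        raise ValueError(f"expected 1 to {MAX_REFERENCES} reference images")
    return [f"character{i}" for i in range(1, reference_count + 1)]


def build_multishot_prompt(shots: list[Shot]) -> str:
    """Lead each segment with its timecode range, one segment per line (assumed layout)."""
    if not shots:
        raise ValueError("at least one shot is required")
    expected_start = 0
    lines = []
    for shot in shots:
        # Segments must begin at 0 and follow one another without gaps.
        if shot.start != expected_start or shot.end <= shot.start:
            raise ValueError("shots must be contiguous and start at 0")
        lines.append(f"{shot.start:02d}-{shot.end:02d} {shot.direction}")
        expected_start = shot.end
    if not MIN_CLIP_SECONDS <= expected_start <= MAX_CLIP_SECONDS:
        raise ValueError("total length must be 3 to 15 seconds")
    return "\n".join(lines)


if __name__ == "__main__":
    hero, friend = character_names(2)
    print(build_multishot_prompt([
        Shot(0, 5, f'Wide shot. {hero} walks into a cafe and says "Is this seat taken?"'),
        Shot(5, 10, f"Close-up. {friend} looks up and smiles."),
    ]))
```

Validating that the segments are contiguous and that the total stays within 3 to 15 seconds catches the most likely mistakes before a generation is spent on them.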

Learn what to do with the HappyHorse AI family

Discover how to use the HappyHorse 1.1 model to create videos

HappyHorse 1.0 on Artlist: Alibaba's first AI video model

Presenting the exclusive AI video generator designed specifically for video producers.

Meet Artlist Studio
