Happy Horse 1.1: video and sound generated together

Alibaba's video model turns a prompt, a still image, or up to nine reference images into 1080p clips, with dialogue, ambient sound, music, and Foley produced in the same pass, and lip-sync that holds across seven languages.

Inside Happy Horse 1.1

  • Three ways to start a scene

    Generate from a text prompt, animate a still as your first frame, or drive a scene from 1 to 9 reference images. One model handles text-to-video, image-to-video, and reference-to-video.

  • Sound made with the picture

    Dialogue, ambient noise, music, and Foley are generated in the same pass as the visuals, not layered on afterward, so a clip arrives already scored and mixed to the action.

  • Lip-sync in seven languages

    Phoneme-level lip-sync matches the spoken audio across English, Mandarin, Cantonese, Japanese, Korean, German, and French. Put the lines in your prompt and the mouth follows.

  • Characters that stay on-model

    Reference up to nine subjects as character1 through character9 and they hold their look across shots, so the same face and wardrobe carry through a sequence.

  • Multi-shot in a single prompt

    Sequence several shots in one generation by leading each segment with a timecode range, like 00-05 then 05-10, each with its own action and framing.

  • The specs that matter

    720p or 1080p, 24 fps, 3 to 15 seconds per clip, aspect ratios from 16:9 to 9:16 and beyond. Served on fal.ai and cleared for commercial use.
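The limits above are easy to check before a generation is submitted. The sketch below is only an illustration of those published numbers: the `ClipSettings` fields and the `check_settings` helper are hypothetical names, not part of any Happy Horse, fal.ai, or Artlist API, and whole-second durations are an assumption.

```python
from dataclasses import dataclass

# Limits copied from the spec list above. All names here are illustrative.
ALLOWED_RESOLUTIONS = {"720p", "1080p"}
MIN_SECONDS, MAX_SECONDS = 3, 15
MAX_REFERENCE_IMAGES = 9
FPS = 24


@dataclass
class ClipSettings:
    resolution: str
    duration_seconds: int = 5   # assumed whole seconds
    reference_images: int = 0   # 0 means no reference images are used


def check_settings(s: ClipSettings) -> list[str]:
    """Return a list of problems; an empty list means the settings fit the specs."""
    problems = []
    if s.resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if not MIN_SECONDS <= s.duration_seconds <= MAX_SECONDS:
        problems.append(f"duration must be {MIN_SECONDS} to {MAX_SECONDS} seconds")
    if not 0 <= s.reference_images <= MAX_REFERENCE_IMAGES:
        problems.append(f"use at most {MAX_REFERENCE_IMAGES} reference images")
    return problems


def frame_count(s: ClipSettings) -> int:
    """Number of frames in the clip at 24 fps."""
    return s.duration_seconds * FPS
```

For example, `check_settings(ClipSettings("1080p", 10, 3))` returns an empty list, and a 5-second clip works out to 120 frames.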

What is Happy Horse 1.1?

Happy Horse 1.1 is Alibaba's AI video model that generates video and synchronized audio together in a single pass. It works from a text prompt, a still image, or up to nine reference images, producing 720p or 1080p clips of 3 to 15 seconds with native dialogue. On Artlist, you can generate with Happy Horse 1.1 and use the output commercially.

How do you prompt Happy Horse 1.1?

Happy Horse 1.1 follows scene direction closely, so the more you specify, the more control you keep. Spell out motion, framing, lighting, and pacing rather than leaving them to the model. A few patterns get the most out of it:

  1. Write the lines, hear them spoken

    For talking-head shots, write the spoken lines directly in the prompt. The audio and lip-sync are generated to match them.

  2. Sequence shots with timecodes

    To sequence multiple shots in one clip, lead each segment with a timecode range, like 00-05 then 05-10, and give each its own action.

  3. Name your characters, lock their look

    In reference-to-video, name your subjects character1 through character9 in upload order so they stay consistent across shots.

  4. Lock identity with clean references

    Use clean, high-resolution reference images with a single clear subject to lock identity and wardrobe.
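The timecode and character-naming patterns above can be assembled programmatically. This is a minimal sketch under stated assumptions: the `Shot` class and `build_prompt` helper are hypothetical, and the exact layout (one shot per line, a space after the timecode range) is a guess based on the "00-05 then 05-10" example rather than a documented prompt syntax.

```python
import re
from dataclasses import dataclass


@dataclass
class Shot:
    start: int       # seconds from the start of the clip
    end: int         # seconds from the start of the clip
    direction: str   # action, framing, lighting, and any spoken lines


def build_prompt(shots: list[Shot], num_characters: int = 0) -> str:
    """Join shots into one prompt, each led by a timecode range such as 00-05."""
    if not 0 <= num_characters <= 9:
        raise ValueError("at most nine reference subjects are supported")
    lines = []
    previous_end = 0
    for shot in shots:
        # Shots should follow one another without gaps or overlaps.
        if shot.start != previous_end or shot.end <= shot.start:
            raise ValueError("shots must be contiguous and non-empty")
        # Only labels character1..characterN, in upload order, are valid.
        for number in re.findall(r"character(\d+)", shot.direction):
            if not 1 <= int(number) <= num_characters:
                raise ValueError(f"character{number} has no reference image")
        lines.append(f"{shot.start:02d}-{shot.end:02d} {shot.direction}")
        previous_end = shot.end
    if not 3 <= previous_end <= 15:
        raise ValueError("total clip length must be 3 to 15 seconds")
    return "\n".join(lines)


prompt = build_prompt(
    [
        Shot(0, 5, "Wide shot, character1 walks into a cafe at dusk."),
        Shot(5, 10, 'Close-up, character1 says: "Table for two, please."'),
    ],
    num_characters=1,
)
print(prompt)
```

Keeping the spoken line inside the shot that needs it follows the first pattern: the dialogue sits in the prompt, so the audio and lip-sync have something to match.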

Frequently asked questions

What's new in Happy Horse 1.1?

Happy Horse 1.1 is the production-focused update to the original model. It sharpens multilingual lip-sync, holds character identity and motion more steadily across shots, extends clips to 15 seconds, and generates richer native audio, adding ambient sound, music, and Foley alongside dialogue in a single pass.

What is Happy Horse 1.1?

Happy Horse 1.1 is Alibaba's AI video model that generates video and synchronized audio in a single pass. It runs text-to-video, image-to-video, and reference-to-video through one pipeline, producing 720p or 1080p clips of 3 to 15 seconds with dialogue, ambient sound, music, and Foley.

Does Happy Horse 1.1 generate audio?

Yes. Happy Horse 1.1 synthesizes audio jointly with the picture in the same pass, so dialogue, ambient noise, music, and Foley arrive built into the clip. It also produces phoneme-level lip-sync across seven languages: English, Mandarin, Cantonese, Japanese, Korean, German, and French.

What resolution and clip length does Happy Horse 1.1 support?

Happy Horse 1.1 outputs 720p or 1080p at 24 fps, with clips from 3 to 15 seconds and a default length of 5 seconds. There is no native 4K, and longer sequences are built by stitching multiple generations together.

Can I use Happy Horse 1.1 videos commercially?

Yes. Happy Horse 1.1 is cleared for commercial use. Confirm how Artlist's plan and license terms apply to your specific project before publishing, since commercial scope and any usage limits are set at the platform level.
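Because a single generation runs 3 to 15 seconds and longer sequences are stitched from several generations, it can help to plan the clip lengths up front. The sketch below splits a target running time into the fewest clips that each stay within that range. The `plan_clips` helper is a hypothetical name, and whole-second durations are an assumption.

```python
MIN_SECONDS, MAX_SECONDS = 3, 15  # per-clip limits stated in the FAQ


def plan_clips(total_seconds: int) -> list[int]:
    """Split a running time into the fewest clips of 3 to 15 seconds each."""
    if total_seconds < MIN_SECONDS:
        raise ValueError(f"need at least {MIN_SECONDS} seconds")
    count = -(-total_seconds // MAX_SECONDS)   # ceiling division
    base, extra = divmod(total_seconds, count)
    # Spread the remainder so clip lengths differ by at most one second.
    return [base + 1] * extra + [base] * (count - extra)


print(plan_clips(40))  # [14, 13, 13]
```

Evening out the lengths avoids ending on a very short final clip, which would otherwise be the result of filling each generation to the full 15 seconds.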