HeyGen AI — Build expressive avatars that speak for you

Turn a photo and audio file into a talking avatar with HeyGen AI. Your input drives natural facial movement, expression, and lip sync, creating presenter-style videos. No production crew, no studio, no reshoots.

Try HeyGen Avatar 4

What is HeyGen AI?

HeyGen AI is a generation platform that creates talking avatar videos and translates existing video content into new languages. On Artlist, two HeyGen models are available inside the AI Toolkit.

Start Creating

How HeyGen AI changes traditional video production

Every shoot costs time, money, and coordination. Creators using AI avatars report 80% less production time, 90% lower costs, and 10x the output. Here’s how HeyGen AI makes that possible.

Start Creating

Lip sync that actually looks right
Mouth and facial movements update to match the audio, whether you’re generating a talking avatar from a photo or dubbing an existing video into a new language. The result reads as native. How close it gets depends on source quality and language pair.
Audio as the creative input
Audio is the main input across HeyGen’s models. The voice drives expression, gesture, and delivery. What goes into the audio is what comes out on screen, in any language you choose.
The voice stays yours
HeyGen AI doesn’t replace your voice; it preserves it. Vocal tone matches expression on one side, and carries across every language on the other. Delivery stays intact either way.
Same presence, every video
Consistency is usually the first thing that breaks at scale. HeyGen’s technology keeps the same face, delivery, and on-screen presence across every piece of content. A reusable avatar is consistent by design, which means it doesn’t surprise you.
One asset, any language
Multilingual output is built in from the start. This is really where you save yourself time. One strong source video can cover dozens of markets without touching the production again.

What creators are actually making with HeyGen AI

From presenter-led social content to localized campaign videos, here's where HeyGen AI talking avatar tools fit into a real production workflow.

Try HeyGen AI

Personal avatars and digital clones
Create a reusable digital version of yourself from an image and audio file. First generation takes minutes, then the same face is available for every project.
Marketing and social media assets
HeyGen’s AI avatar generator is built for short-form, presenter-led content on TikTok, Instagram, and YouTube. Creating ten variations of the same ad with different CTAs costs the same as creating one.
Education and training resources
Generate avatar-led training videos and onboarding walkthroughs that stay consistent across teams. For longer content, segment your audio into focused sections.
Localization and global scaling
Finished videos can be translated into 76 languages (and 99 dialects) without rebuilding the production, with voice cloning to preserve the original speaker.
Singing and music content
Upload a song as your audio input, and the avatar sings it back with lip-sync precision. It’s one of the less obvious use cases, but an impressive example of what the model can actually do.

How to create your first HeyGen AI video on Artlist

HeyGen AI lives inside the Artlist AI Toolkit. Here’s how you get the most out of the available models.

Start Creating

Open the AI Toolkit and select your model
Navigate to the AI Toolkit. Choose Avatar IV to create from a photo and audio file, or Video Translate to dub existing footage.
Upload your inputs
For Avatar IV, upload one image (JPG, PNG, WebP, GIF, or AVIF) and one audio file (MP3, WAV, M4A, OGG, or AAC). For Video Translate, upload your video file and select a target language from the 175+ language and dialect options.
Configure before you generate
For Avatar IV, set output resolution up to 1080p. For HeyGen Video Translate, choose lip-sync on or off, and toggle dynamic duration based on whether output length needs to match the original.
Generate, preview, and export
Run the generation and preview inside the Toolkit before exporting. Avatar IV automatically saves generated avatars. Same face, next project, no re-upload.
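
The supported formats listed in the steps above can be checked locally before you upload. The sketch below is not part of Artlist or HeyGen: `check_avatar_inputs` and its parameters are hypothetical names, the three-minute cap is the limit stated in the FAQ on this page, and accepting `.jpeg` alongside `.jpg` is an assumption.

```python
from pathlib import Path

# Formats listed on this page for Avatar IV inputs.
# Assumption: ".jpeg" is treated the same as ".jpg".
IMAGE_FORMATS = {".jpg", ".jpeg", ".png", ".webp", ".gif", ".avif"}
AUDIO_FORMATS = {".mp3", ".wav", ".m4a", ".ogg", ".aac"}
MAX_DURATION_SECONDS = 180  # three minutes per run, per the FAQ on this page


def check_avatar_inputs(image_path, audio_path, audio_seconds):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    image_ext = Path(image_path).suffix.lower()
    audio_ext = Path(audio_path).suffix.lower()
    if image_ext not in IMAGE_FORMATS:
        problems.append(f"unsupported image format: {image_ext or 'none'}")
    if audio_ext not in AUDIO_FORMATS:
        problems.append(f"unsupported audio format: {audio_ext or 'none'}")
    if audio_seconds > MAX_DURATION_SECONDS:
        problems.append(
            f"audio is {audio_seconds}s, over the {MAX_DURATION_SECONDS}s limit"
        )
    return problems


print(check_avatar_inputs("presenter.PNG", "script_take2.wav", 95))  # → []
```

The check only looks at file extensions and a duration you supply yourself, so it catches obvious mismatches rather than guaranteeing the upload will be accepted.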

Expert tips to get the most out of HeyGen AI

Most issues with Avatar IV and HeyGen Translate come from the same place: treating them as automatic when they’re actually responsive to what you feed them. Here’s where people usually go wrong.

Treat the audio as the brief
Avatar IV extracts expression from your recording. It doesn’t invent it. Before generating, listen back and ask whether it sounds like a performance or a first read. Vary pace, use natural pauses, let inflection carry the emotion. You can rerun with better audio, but you can’t fix a flat avatar in post.
Segment long audio, don’t run it as one block
For anything over 30 seconds, break the audio into logical sections before generating. Avatar IV maintains expression alignment better across shorter segments than across a continuous input.
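
As a rough planning aid for the tip above, the sketch below divides a recording into equal segments of at most 30 seconds. `plan_segments` is a hypothetical helper, not an Artlist or HeyGen function, and the equal split is only a starting point: the tip asks for logical sections, so each cut should be nudged to the nearest pause or sentence break by ear.

```python
import math


def plan_segments(total_seconds, max_segment_seconds=30.0):
    """Split a duration into equal segments no longer than max_segment_seconds.

    Returns a list of (start, end) times in seconds.
    """
    if total_seconds <= 0:
        return []
    # Smallest number of segments that keeps each one within the limit.
    count = math.ceil(total_seconds / max_segment_seconds)
    length = total_seconds / count
    return [
        (round(i * length, 2), round((i + 1) * length, 2))
        for i in range(count)
    ]


print(plan_segments(95))
# → [(0.0, 23.75), (23.75, 47.5), (47.5, 71.25), (71.25, 95.0)]
```

Equal lengths keep the last segment from ending up as a short leftover, which is a design choice here rather than anything the model requires.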
Plan your framing before you upload
Avatar IV outputs in 16:9 horizontal only. If your source image is portrait or square, white bars get added, and that's not fixable after generation. Crop or reframe to 16:9 before uploading.
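
To see what a 16:9 reframe leaves you with before opening an editor, the crop box can be worked out from the image dimensions alone. This is a minimal sketch with a hypothetical `center_crop_16_9` helper; centering the crop is an assumption, and for portraits you will usually want to move the box so it stays on the face.

```python
def center_crop_16_9(width, height):
    """Return (left, top, right, bottom) of the largest centered 16:9 box."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width * 9 >= height * 16:
        # Image is 16:9 or wider: keep the full height, trim the sides.
        crop_h = height
        crop_w = height * 16 // 9
    else:
        # Image is taller than 16:9: keep the full width, trim top and bottom.
        crop_w = width
        crop_h = width * 9 // 16
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


print(center_crop_16_9(1080, 1920))  # portrait → (0, 656, 1080, 1263)
print(center_crop_16_9(1000, 1000))  # square   → (0, 219, 1000, 781)
```

The tuple order (left, top, right, bottom) matches the box that Pillow's `Image.crop` accepts, so the result can be passed straight to it if you use that library. Note how little of a portrait survives: a 1080x1920 photo keeps only a 1080x607 strip, which is why framing is worth planning before you shoot or pick the image.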
Lock duration for ads, go dynamic for the rest
Dynamic duration is the better default. It produces more natural pacing. Lock to original length only when timing is non-negotiable, like ads or synced presentations.
Portraits, half-bodies, and full-bodies all work
Unlike most avatar tools, Avatar IV handles portrait, half-body, and full-body image formats, and works from angled or profile shots, not just front-facing headshots. Sharper, better-lit images still produce more detailed outputs.

Is HeyGen AI right for your workflow?

HeyGen AI is built for anyone producing video at scale without a camera, a crew, or a full production budget.

Try HeyGen AI

Marketing and performance teams
Produce UGC-style ads, product promos, and multiple campaign variations without reshooting. Change the hook, swap the CTA, test across channels, all from the same source assets.
Education and L&D teams
With HeyGen AI, you can create training modules and internal explainers that stay consistent across teams and locations. When content needs updating, regenerate the relevant segment.
Sales and outreach teams
Use avatar-led video for client updates, product walkthroughs, or outreach campaigns. It keeps a human presenter format without requiring a shoot every time the message changes.
Content creators
Stay off camera without losing your on-screen presence. Build a reusable digital presenter, keep a consistent face across every video, and produce recurring avatar-led content.

Learn more about HeyGen AI

HeyGen Avatar 4: The strong all-rounder for video creators making AI avatars

HeyGen Translate for video creators: localize your content without losing your voice

Meet your digital cast: a guide to AI Avatars on Artlist

Frequently asked questions

What does HeyGen AI excel in?

HeyGen AI is strongest at two things: generating expressive talking-avatar videos from a single image and audio file, and translating existing videos into 76 languages without losing the original speaker’s voice. Both Avatar IV and HeyGen Video Translate are built for presenter-led video at scale. One creates it from scratch, the other adapts what already exists.

Are there any HeyGen avatar maker limitations I should be aware of?

Avatar IV has a few hard constraints worth knowing before you start:

Output is 16:9 horizontal only: portrait or square images will have white bars added, and that’s not adjustable after generation.
It’s single-speaker only, so multi-character scenes aren’t possible in one generation.
Maximum duration is three minutes per run, and resolution caps at 1080p.
On the behavioral side: flat, monotone recordings produce flat avatars. The model reads what’s in the audio, but doesn’t compensate for a low-energy performance.

For HeyGen Translate, the main limitation is cost. It’s the more expensive of the two models. Output resolution and aspect ratio match the input video, so quality is only as good as your source file.

What inputs are available to start creating an avatar?

Avatar IV requires an image and an audio file. The image becomes the visual base of the avatar. The audio drives speech, expression, and delivery. It’s the only creative lever you have over how the avatar performs. Supported image formats are JPG, PNG, WebP, GIF, and AVIF. Supported audio formats are MP3, WAV, M4A, OGG, and AAC.

Do I need permission to add someone else’s picture in HeyGen Avatar IV within Artlist’s Toolkit?

Yes. Only use images you have the rights to. Using someone’s likeness without consent can violate privacy and intellectual property laws, and goes against Artlist’s Terms of Use. When in doubt, use your own image or a purpose-built character. If you’re using a real person’s photo, make sure you have explicit permission before generating.

How can I make my HeyGen-generated avatar more realistic?

Realism comes mostly from the inputs, not the settings. To create your AI Avatar, start with a clear, well-lit photo where the face isn’t heavily obscured or angled. For audio, keep pacing natural and expressive, because tone directly shapes facial movement.

Record in short, natural segments rather than one long continuous take. Matching gesture intensity to the content also helps avoid a staged look.

How can I use the Toolkit to create a ready-made production that will include my avatar video?

Generate your avatar video inside the Artlist AI Toolkit using Avatar IV:

Upload your image and audio file, configure your settings, and export once you’re happy with the output.
From there, bring it into your wider Artlist project and build around it with AI music, stock footage, and motion elements.
Edit everything into a finished piece, such as an ad, explainer, or social video, with the avatar as the presenter.

How many languages can I translate with HeyGen Video Translate?

HeyGen Translate supports 76 languages in total:

38 European, including French, German, Italian, Spanish, and Portuguese
30 Asian, spanning Hindi, Japanese, Korean, Chinese/Mandarin, and Indonesian
5 African (Afrikaans, Amharic, Somali, Swahili, and Zulu)
3 Middle Eastern (Arabic, Hebrew, and Persian)

Does HeyGen Video Translate also provide different accents?

Yes. HeyGen Translate also supports 99 regional dialects, so accent coverage goes well beyond basic translation. English alone has 16 accents, Spanish covers 22 regional variants across Latin America and Spain, Arabic has 16 dialects, and Chinese/Mandarin offers 9. French, German, Dutch, Portuguese, Tamil, Urdu, and Swahili also include multiple regional options.

Does HeyGen Video Translate include lip syncing?

Yes. HeyGen Video Translate includes automatic lip syncing. The model updates the speaker’s lip and facial movements to match the translated audio, so the output looks and sounds like it was recorded in the target language.

Still have questions? We’re here to help.