Their tool, known as HART (short for Hybrid Autoregressive Transformer), can generate images that match or exceed the quality of state-of-the-art diffusion models ... or a person's hair, eyes, or ...
Generates a video with synchronized audio from text input. Converts the text to speech, then combines the images with that audio to create a cohesive video.
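
The description does not say how the images and the speech audio are kept in sync. One common approach, shown here only as a minimal sketch, is to show each image for exactly as long as its narration clip plays. It assumes one image per clip and that the clip durations are already known. The names `Segment` and `build_timeline` and the file names are hypothetical and do not come from the tool itself.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Segment:
    """One image shown for the length of its narration clip (hypothetical type)."""
    image: str    # path to the image file (illustrative)
    start: float  # time the image appears, in seconds
    end: float    # time the image disappears, in seconds


def build_timeline(images: Sequence[str],
                   audio_durations: Sequence[float]) -> List[Segment]:
    """Assign each image a time window equal to its narration duration.

    Assumption: images[i] is narrated by the clip whose length is
    audio_durations[i], and clips play back to back with no gaps.
    """
    if len(images) != len(audio_durations):
        raise ValueError("need exactly one audio duration per image")

    timeline: List[Segment] = []
    cursor = 0.0  # running position in the final video, in seconds
    for image, duration in zip(images, audio_durations):
        if duration <= 0:
            raise ValueError(f"non-positive duration for {image!r}")
        timeline.append(Segment(image=image, start=cursor, end=cursor + duration))
        cursor += duration
    return timeline


if __name__ == "__main__":
    for seg in build_timeline(["intro.png", "demo.png"], [2.0, 3.5]):
        print(f"{seg.image}: {seg.start:.1f}s to {seg.end:.1f}s")
```

A video library could then render each `Segment` as a still frame of that length and lay the concatenated narration underneath. That rendering step is left out here because the description does not name the library the tool uses.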