Abstract: High-quality image captions play a crucial role in improving the performance of cross-modal applications such as text-to-image generation, text-to-video generation, and text-image retrieval.
Abstract: Recent studies have explored the combination of different LoRAs to jointly generate learned style and content. However, existing methods either fail to effectively preserve both the original ...