Abstract: Automatic evaluation of sequence generation, which has traditionally relied on metrics such as BLEU and ROUGE, often struggles to capture the semantic accuracy of generated text due to an ...