Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

Streaming Sequence Transduction through Dynamic Compression

A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models

Condensing Multilingual Knowledge with Lightweight Language-Specific Modules

Efficiently Harnessing Parameter Importance for Better Training

Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity

Language-Aware Multilingual Machine Translation with Self-Supervised Learning

The Importance of Being Parameters: An Intra-Distillation Method for Serious Gains

Por Qué Não Utiliser Alla Språk? Mixed Training with Gradient Optimization in Few-Shot Cross-Lingual Transfer