[ot][aiml] Merge Higher-Performing Models Without Compute Across Modes?
From March, via X (my X feed content is restabilizing after the change):

https://arxiv.org/abs/2403.13187
https://github.com/SakanaAI/evolutionary-model-merge

The Open LLM Leaderboard is now dominated by merged models, showcasing the potential of model merging for democratizing foundation model development.
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

1. Automated Model Composition: We introduce Evolutionary Model Merge, a general evolutionary method to automatically discover optimal combinations of diverse open-source models for creating new foundation models with user-specified capabilities. This approach harnesses the collective intelligence of existing open models, enabling the creation of powerful models without the need for extensive training data or compute.

2. Cross-Domain Merging: We demonstrate that our method can discover novel ways to merge models from disparate domains (e.g., non-English language and Math, non-English language and Vision), potentially exceeding the capabilities achievable through conventional human design strategies.

3. State-of-the-Art Performance: We showcase the effectiveness of our method by automatically generating a Japanese LLM with Math reasoning capability and a Japanese Vision-Language Model (VLM). Notably, both models achieve state-of-the-art performance on various benchmarks, even without explicit optimization for those tasks.

4. High Efficiency and Surprising Generalizability: We observe that our 7B parameter LLM surpasses the performance of some previous 70B parameter Japanese LLMs on benchmark datasets, highlighting the high efficiency and surprising generalization capability of our approach. We believe this model can serve as a strong general-purpose Japanese LLM.
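To make the "evolutionary search over merge recipes" idea concrete, here is a minimal toy sketch. It is not the Sakana code: the paper uses CMA-ES on real checkpoints, while this swaps in a plain elitist evolution strategy over per-model mixing weights, and the "models" are just short lists of floats. The names `merge`, `evolve_weights`, `toy_fitness` and the toy data are all made up for illustration.

```python
import random

def merge(models, weights):
    """Weighted average of flat parameter lists; weights are normalized to sum to 1."""
    total = sum(weights)
    return [
        sum(w * m[i] for w, m in zip(weights, models)) / total
        for i in range(len(models[0]))
    ]

def evolve_weights(models, fitness, generations=50, pop=16, elite=4, sigma=0.2, seed=0):
    """Toy elitist evolution strategy over merge weights (an assumption, not CMA-ES)."""
    rng = random.Random(seed)
    score = lambda w: fitness(merge(models, w))
    parents = [[1.0 / len(models)] * len(models)]  # start from a uniform merge
    for _ in range(generations):
        # Mutate randomly chosen parents with Gaussian noise, keeping weights positive.
        children = [
            [max(1e-6, w + rng.gauss(0.0, sigma)) for w in rng.choice(parents)]
            for _ in range(pop)
        ]
        # Parents compete with children, so the best score never gets worse.
        parents = sorted(parents + children, key=score, reverse=True)[:elite]
    best = parents[0]
    return best, score(best)

# Hypothetical toy data: two "models" and a target that stands in for a benchmark.
model_a = [1.0, 0.0]
model_b = [0.0, 1.0]
target = [0.25, 0.75]

def toy_fitness(params):
    """Higher is better: negative squared distance to the target parameters."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))

best_weights, best_score = evolve_weights([model_a, model_b], toy_fitness)
print(best_weights, best_score)
```

In a real setup the fitness call would be a benchmark evaluation of the merged checkpoint, which is where nearly all the cost goes; the search itself needs no gradient updates.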
Evolutionary merging looks like a WIP addition to mergekit:
https://github.com/arcee-ai/mergekit/blob/main/mergekit/scripts/evolve.py
documented at
https://github.com/arcee-ai/mergekit/blob/main/docs/evolve.md

A couple more implementation approaches and repos are discussed at
https://github.com/arcee-ai/mergekit/issues/207

mergekit seems a reasonable resource for this for now. They just implemented a newer merge method called DELLA:
https://arxiv.org/abs/2406.11617
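For a rough feel of what a DELLA-style merge does to each fine-tuned model's delta (fine-tuned minus base parameters), here is a loose sketch of magnitude-ranked drop-and-rescale. It is my own simplification, not mergekit's implementation and not the paper's exact procedure: it covers only the drop step, on a flat list, and skips the later sign election and fusion. The function name and the parameters `p` and `eps` are hypothetical.

```python
import random

def magnitude_drop_rescale(delta, p=0.5, eps=0.2, seed=0):
    """Randomly drop delta entries, dropping small-magnitude ones more often.

    Drop probability runs linearly from p + eps/2 (smallest magnitude) down to
    p - eps/2 (largest magnitude). Survivors are rescaled by 1 / (1 - p_i), so
    each entry keeps its original value in expectation.
    """
    if not (0.0 <= p - eps / 2 and p + eps / 2 < 1.0):
        raise ValueError("drop probabilities must stay within [0, 1)")
    rng = random.Random(seed)
    n = len(delta)
    order = sorted(range(n), key=lambda i: abs(delta[i]))  # smallest magnitude first
    out = [0.0] * n
    for rank, i in enumerate(order):
        frac = rank / (n - 1) if n > 1 else 0.5
        p_i = p + eps / 2 - eps * frac
        if rng.random() >= p_i:  # entry survives with probability 1 - p_i
            out[i] = delta[i] / (1.0 - p_i)
    return out

# Hypothetical delta vector for illustration.
sparse_delta = magnitude_drop_rescale([0.1, -0.5, 2.0, -3.0], p=0.5, eps=0.2, seed=1)
print(sparse_delta)
```

The sparsified deltas from several fine-tunes would then be combined and added back to the base model; that combination step is omitted here.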
participants (1)
-
Undescribed Horrific Abuse, One Victim & Survivor of Many