HeartMuLa

An open-source framework for music generation and understanding, featuring high-fidelity song synthesis and controllable song structure

Features

Open Source, Music

System Requirements

RAM: 32GB recommended. Storage: 30GB+ free space recommended.
macOS 15+: an Apple M-series chip is required.
Windows 10/11: an NVIDIA GPU with 12GB+ VRAM is recommended. Compatibility with Intel and AMD GPUs has not been verified.
Note: On NVIDIA GPUs, install a recent driver.
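To check a machine against the requirements above before downloading the weights, the thresholds can be encoded in a small helper. This is a hypothetical sketch, not part of HeartMuLa: the `Machine` record and `check_requirements` function are illustrative names, and the caller fills in the hardware values by hand (nothing is auto-detected). It only mirrors the figures listed above, so platforms the page does not mention are reported as unlisted rather than unsupported.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical hardware description, filled in by hand (nothing is auto-detected)."""
    os_name: str                 # "macos" or "windows"
    os_major: int                # e.g. 15 for macOS 15, 11 for Windows 11
    ram_gb: float
    free_disk_gb: float
    apple_silicon: bool = False  # True for an M-series Mac
    nvidia_vram_gb: float = 0.0  # 0 if there is no NVIDIA GPU


def check_requirements(m: Machine) -> list[str]:
    """Return the listed requirements this machine misses; an empty list means all are met.

    Thresholds are copied from the System Requirements list; this helper is
    illustrative and not part of HeartMuLa.
    """
    problems = []
    if m.ram_gb < 32:
        problems.append("32GB RAM recommended")
    if m.free_disk_gb < 30:
        problems.append("30GB+ free storage recommended")
    if m.os_name == "macos":
        if m.os_major < 15:
            problems.append("macOS 15 or later required")
        if not m.apple_silicon:
            problems.append("M-series chip required on macOS")
    elif m.os_name == "windows":
        if m.os_major not in (10, 11):
            problems.append("Windows 10 or 11 expected")
        if m.nvidia_vram_gb < 12:
            problems.append("NVIDIA GPU with 12GB+ VRAM recommended")
    else:
        # The requirements list only covers macOS and Windows.
        problems.append(f"platform not listed in requirements: {m.os_name}")
    return problems
```

For example, `check_requirements(Machine("windows", 11, ram_gb=32, free_disk_gb=50, nvidia_vram_gb=8))` returns `["NVIDIA GPU with 12GB+ VRAM recommended"]`, since only the VRAM threshold is missed.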

Introduction

HeartMuLa is a versatile "AI music virtuoso" that understands and creates music across languages and cultures:

  • Multilingual Expertise: Robust multilingual support, including but not limited to English, Chinese, Japanese, Korean, and Spanish.
  • Text-to-Song: Provide lyrics or a description, and it generates high-quality songs with synthesized vocals and full instrumentation.
  • Structural Control: Act as a director by specifying the musical energy and style of individual sections (e.g., Verse, Chorus, Outro).
  • Lyric Transcription: It can "listen" to complex audio tracks and accurately transcribe their lyrics in multiple languages.
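Section-level direction is easiest to reason about as structured data. The sketch below assumes a bracket-tag lyric format (`[Verse]`, `[Chorus: style]`), a convention common among lyrics-to-song tools; HeartMuLa's actual input format is not specified here, so the `Section` class, the `render_lyrics` function, and the `[Name: style]` header syntax are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """One song section. The fields and tag syntax are hypothetical, not HeartMuLa's API."""
    name: str         # e.g. "Verse", "Chorus", "Outro"
    lyrics: str = ""  # lines sung in this section; empty for an instrumental part
    style: str = ""   # optional direction, e.g. "soft, low energy"


def render_lyrics(sections: list[Section]) -> str:
    """Render sections as bracket-tagged lyrics with "[Name]" or "[Name: style]" headers.

    This tag format is an assumed convention for illustration only.
    """
    blocks = []
    for s in sections:
        header = f"[{s.name}: {s.style}]" if s.style else f"[{s.name}]"
        body = s.lyrics.strip()
        blocks.append(f"{header}\n{body}" if body else header)
    # Blank line between sections, trailing newline at the end.
    return "\n\n".join(blocks) + "\n"
```

For example, `render_lyrics([Section("Verse", "City lights are fading slow", "soft, low energy"), Section("Outro", style="piano fade-out")])` yields a `[Verse: soft, low energy]` block with its lyric, followed by a lyric-free `[Outro: piano fade-out]` tag. Keeping sections as data makes it easy to reorder or restyle them before rendering.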

Key Features & Capabilities

  • Global Language Support: Seamlessly handles prompts and lyrics in English, Chinese, Japanese, Korean, Spanish, and more.
  • All-in-One Framework: Integrates music generation, understanding, lyric recognition, and audio-text alignment into a single library.
  • Pro-Level Quality: Aims to match leading commercial AI music services, such as Suno, in acoustic fidelity and musicality.
  • Open Source: The code and model weights are released to the community, fostering transparency and local innovation.

Team & Core Technology

  • The Team: HeartMuLa is the result of a collaboration among academic and research institutions, including:
      • Peking University
      • The Chinese University of Hong Kong
      • Scale Global / Ario
      • Independent researchers, who also contributed expertise

  • Underlying Technology:
      • HeartMuLa LLM: A large language model that treats music generation as a sequence modeling task.
      • HeartCodec: An in-house high-fidelity audio codec designed for clear, faithful sound output.
      • HeartCLAP: A cross-modal alignment model that connects natural-language text with musical audio.
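HeartCLAP's training recipe is not described on this page. Assuming, as the name suggests, that it follows the CLAP (Contrastive Language-Audio Pretraining) pattern, models of this kind are commonly trained with a symmetric contrastive loss that pulls matched text/audio embedding pairs together and pushes mismatched pairs apart. The toy sketch below shows that objective and the text-to-audio retrieval it enables, in pure Python over precomputed embeddings; the function names and the `temperature=0.07` default (borrowed from CLIP) are assumptions, not HeartCLAP values.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a non-zero vector to unit length so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def similarity_matrix(text_embs, audio_embs):
    """Cosine similarity of every text embedding (rows) with every audio embedding (columns)."""
    t = [normalize(v) for v in text_embs]
    a = [normalize(v) for v in audio_embs]
    return [[sum(x * y for x, y in zip(ti, aj)) for aj in a] for ti in t]


def _diag_cross_entropy(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(text_embs, audio_embs, temperature=0.07):
    """Symmetric InfoNCE over a batch where text_embs[i] and audio_embs[i] are a matched pair.

    Toy stand-in: HeartCLAP's actual objective and encoders are assumptions here.
    """
    logits = [[s / temperature for s in row]
              for row in similarity_matrix(text_embs, audio_embs)]
    transposed = [list(col) for col in zip(*logits)]  # audio -> text direction
    return 0.5 * (_diag_cross_entropy(logits) + _diag_cross_entropy(transposed))


def retrieve(query_emb, audio_embs):
    """Index of the audio clip whose embedding is most similar to a text query."""
    sims = similarity_matrix([query_emb], audio_embs)[0]
    return max(range(len(sims)), key=sims.__getitem__)
```

Training drives `contrastive_loss` down for matched pairs; once text and audio share an embedding space, the same cosine similarity supports text-to-music search via `retrieve`.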