Audio Pre-process

Introduction

Because this model uses advanced phoneme-level speech synthesis, audio files must be preprocessed before use. The preprocessing steps are part of standard music production workflows. If you are unsure about any of them, please discuss them with your music producer or audio engineer, or contact us to confirm the details.

Lyrics Swap

You need to export the song's MMA (Minus Music All) and its vocal track, regardless of whether the vocal performer is the same as the target singer. For example, if you want singer A's voice to sing "Twinkle Twinkle Little Star," you need to provide a vocal track of "Twinkle Twinkle Little Star" as well as its MMA.

  • VOCAL: The vocal track is used to generate guide data, so pitch and rhythm accuracy are very important (timbre is not).
    • Format: dry vocal track, 16-bit, 44.1 kHz, mono WAV.
  • MMA: The MMA does not take part in AI vocal synthesis; it is only used to create a mixed audio file (MMM) for the user during output. If mixing is not needed, you may choose not to provide the MMA.
    • Format: Minus Music All, 16-bit, 44.1 kHz, mono WAV.
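
The format requirements above can be checked before submitting files. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` and the constant names are illustrative assumptions, not part of any official tooling. It verifies only bit depth, sample rate, and channel count. Whether a vocal is actually dry (free of reverb and other effects) cannot be determined this way and still needs to be confirmed by ear.

```python
import wave

# Target values taken from the format bullets above.
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16 bits per sample
EXPECTED_SAMPLE_RATE_HZ = 44100   # 44.1 kHz
EXPECTED_CHANNELS = 1             # mono


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches.

    Hypothetical helper for illustration. Raises wave.Error if the file
    is not a PCM WAV file that the wave module can read.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(
                f"bit depth is {wav.getsampwidth() * 8}, expected 16"
            )
        if wav.getframerate() != EXPECTED_SAMPLE_RATE_HZ:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected 44100"
            )
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(
                f"channel count is {wav.getnchannels()}, expected 1"
            )
    return problems
```

Calling `check_wav_format` on each exported file (for example, a vocal file and an MMA file, with whatever names you use) returns an empty list when the file matches the required format, or a list of messages describing each mismatch.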