YTPMV, or 音MAD: very generally speaking, these are videos in which a musical composition is recreated using sounds from often absurd sources. Some tunes get recreated in this format more often than others; four of them are by the same author and form a tetralogy. These pieces are:
The idea that has stuck with me for over two years and refuses to let go is a YTPMV with Miodowe Lata (a popular Polish sitcom) as the sound source and The Bluefin Tuna Comes Flying as the song being recreated. Inspired by the Pony Preservation Project, I wanted to train a voice synthesis model and use it to hand-pick and generate the relevant parts of the song (that seemed more obvious to me than sentence-mixing the sitcom). Over a year ago, I wrote a list describing the next steps of the project.
I managed to get to step 2.1 and got stuck: figuring out the diarization was too much for me. That was unimaginably frustrating, because the program I was using for diarization somehow worked. Its sample script took what the program was spitting out and generated a nice video chart showing who was speaking and with what confidence. But when I tried to redirect the information fed to the graphing library (matplotlib) into my own script, which was supposed to cut out the parts where one character was speaking and the other was not, everything fell apart. I couldn't debug it, so I shelved the project. I have a nice video graph saved, and it looked like this:
I was able to generate the Polish speech mentioned in 1.1 using TransformerTTS2 with the M-AILABS dataset. In the Transformer itself I only had to change a few lines of code (my mate helped me with that); such a small change was enough because the package used for phonetization supports Polish. I managed to train the model on a male voice from M-AILABS, and I also managed to swap the source of the files during training, and the female model quickly learned as well.
All this was done a year ago: it started around March, and the audio files are from July. If I had made my progress available online back then, I would have been (at least as far as I know) the first Polish guy to put out speech synthesized in Polish by new-generation AI synthesizers. What I managed to produce was bad, though, and by now there are various beautiful Polish models, with sites where you can freely synthesize whatever you like, e.g. mekatron.
Well, I got to that point and couldn't do anything else. In terms of preparation for the next phases outlined in the list, I also found the original project files and MIDI files of all the music tracks I wanted to base the YTPMVs on (I would use them as reference). I'm still a long way from being able to tackle this project.
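Looking back at the cutting step that stopped me at 2.1, it may not need the plotting data at all. Below is a minimal sketch in plain Python. It assumes the diarizer's output can be reduced to (speaker, start, end) tuples in seconds; that tuple format, the names `merge`, `subtract`, and `solo_spans`, and the 0.5 s minimum length are all my assumptions for illustration, not the API of the program I actually used.

```python
from typing import List, Tuple

# Assumed input format: (speaker label, start in seconds, end in seconds).
Segment = Tuple[str, float, float]
Span = Tuple[float, float]


def merge(spans: List[Span]) -> List[Span]:
    """Sort spans and fuse any that overlap or touch."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract(keep: List[Span], remove: List[Span]) -> List[Span]:
    """Return the parts of `keep` not covered by `remove`.

    Both lists must already be merged (sorted, non-overlapping).
    """
    result: List[Span] = []
    for start, end in keep:
        cursor = start  # left edge of the part not yet accounted for
        for r_start, r_end in remove:
            if r_end <= cursor:
                continue  # this removal ends before the open part
            if r_start >= end:
                break  # this and all later removals start after the span
            if r_start > cursor:
                result.append((cursor, r_start))
            cursor = max(cursor, r_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))
    return result


def solo_spans(segments: List[Segment], target: str,
               min_len: float = 0.5) -> List[Span]:
    """Spans where `target` speaks and nobody else does.

    Spans shorter than `min_len` seconds are dropped (hypothetical cutoff).
    """
    mine = merge([(s, e) for spk, s, e in segments if spk == target])
    others = merge([(s, e) for spk, s, e in segments if spk != target])
    return [(s, e) for s, e in subtract(mine, others) if e - s >= min_len]
```

Each returned span could then be handed to whatever does the actual cutting, for example ffmpeg with its `-ss` and `-to` options. Treating overlapping speech as unusable is a deliberate choice here: for training a voice model, a shorter clean clip seems safer than a longer one with a second voice bleeding in.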
I assume that if I were to take up this project again (and I plan to), I would have to:
Start with a different architecture (it's hard to imagine how fast everything moves in the CompSci field; TalkNET looks promising)
Figure out a comfy way to collect data; diarization has certainly jumped forward, so if I search I should find something (god, I love the internet!)
Ask one of my few friends with good graphics cards to give me SSH access to some linux-usb to train the model once I have everything ready (my poor x970 is a bit lacking in the VRAM department, and some applications require a minimum of 10 GB of VRAM)
Figure out:
Reaper (I heard it's the best DAW for YTPMV)
Adobe AE or Blender (for finishing up with a stunning visual experience)
I think that after the first one, the later ones would be easier and better. For sure, if I decided to try this project again, I would have to make two or three unrelated YTPMVs first (quite a challenging task in itself), so that the first part of the tetralogy would be something I could be proud of.
I'm thinking of sampling Polish sitcoms in this order:
Miodowe Lata
Święta wojna
Trzynasty posterunek
and last, with a bang, Świat wg. Kiepskich.
To close this post, I'm sharing with every internet traveller my favourite fragment of YTPMV/音MAD (unbeaten as my favourite in 2 years of exploring the cavities of the interwebz). The part between 3:52 and 4:12 is an absolute masterpiece, and I invite you to watch it in full screen. The next two submissions after it are also phenomenal (the third one wasn't my thing).