Ovi SECourses Premium App to Generate Audio Having 121 Frames Videos from Text and Images - Supports all GPUs Including RTX 5000 Series - Has Flash Attention + Batch Processing and Block Swapping - As Low as 8 GB VRAM - Like VEO 3 and SORA 2 - 1-Click to Install on Windows, RunPod and Massed Compute
Added 2025-10-08 21:00:00 +0000 UTC
Patreon exclusive posts index to find our scripts easily, Patreon scripts updates history to see which updates arrived to which scripts, and amazing Patreon special generative scripts list that you can use in any of your tasks.
Join discord to get help, chat, discuss and also tell me your discord username to get your special rank : SECourses Discord
Please also Star, Watch and Fork our Stable Diffusion & Generative AI GitHub repository and join our Reddit subreddit and follow me on LinkedIn (my real profile)
=======
Latest zip file : Ovi_Pro_v8.zip
Full scale ultra advanced app for Ovi - an open source project that can generate videos from both text prompts and image + text prompts with real audio.
When Clear All Memory is selected (the default in the 32 GB and below presets), make sure to click the Cancel button first and then close the CMD window, otherwise the generation will keep running as a subprocess
Project page is here : https://aaxwaz.github.io/Ovi/
I have developed an ultra advanced and easy to use Gradio app and a much better pipeline that fully supports block swapping
Our block swapping is based on the Kohya Musubi Tuner implementation, thus it is the best in the world right now
Our app also supports Block Based FP8 Scaling, which is also based on Kohya Musubi Tuner and is also the best in the world right now from a quality point of view
So when it is enabled we are not using base FP8 but the higher quality FP8_Scaled
With intelligent Block Based Scaling there is almost no quality loss
Our FP8_Scaled Base model reduces VRAM usage by about 10 GB and its safetensors file will be auto downloaded as well
Now we can generate full quality videos with as low as 6 GB VRAM with Block Swapping + Tiled VAE
Our tiled-VAE is implemented the same way ComfyUI does it, so it is perfect quality and the best out there
The 1-click installer will install into Python 3.10.11 venv and will auto download models as well so it is literally 1-click
My installer auto installs with Torch 2.8, CUDA 12.9, Flash Attention 2.8.3 and it supports literally all GPUs like RTX 3000 series, 4000 series, 5000 series, H100, B200, etc
All generations will be saved inside outputs folder and we support so many features like batch folder processing, number of generations, full preset save and load
All generations will have metadata txt files saved as well
Look at the examples to understand how to prompt the model, this is extremely important
You can use Google AI Studio and Gemini for free to write new amazing prompts, hopefully I will show this in an upcoming tutorial
Look at our screenshots below to see the app features
50 steps are recommended but you can go lower too, like 20
1-Click to install on Windows, RunPod and Massed Compute
Optimized presets for literally every GPU (starting from 6 GB to 96 GB)
15 October 2025 V8.3
New checkbox Merge LoRAs on GPU added
This gives faster LoRA merging on cloud services
It is auto enabled in the 80 GB and 96 GB configs but should work with 48 GB GPUs as well
9 October 2025 V8.1
Lots of amazing new examples added
Added new example tabs T2V Video Extend Examples and I2V Video Extend Examples
These tabs will auto set the Video Extend examples and a duration of 4 seconds for each clip, 12 seconds in total
Now when you click load example, it will auto switch back to generate tab
New zip file has prompt_generate_guide_in_Gemini which you can use to generate prompts
Bug fixed: automatic setting of Aspect Ratio with different base resolutions
Bugs fixed: manual setting of Video Width and Video Height
Please run Windows_Install_and_Update.bat to very quickly update to the latest version, you should then see 8.1 at the top left of your Gradio interface
Windows Requirements
Python 3.10.11, FFmpeg, CUDA 12.9, cuDNN 9.12, C++ Tools, MSVC and Git
If you get any errors, follow the video below and its source link
Source post : https://www.patreon.com/posts/click-to-open-post-used-in-tutorial-111553210
Massed Compute (Recommended Cloud) :
Please register via this link : https://vm.massedcompute.com/signup?linkId=lp_034338&sourceId=secourses&tenantId=massed-compute
Use our coupon SECourses
Our coupon works on all GPUs now
H100 has amazing price and speed but you can use like RTX A6000 ADA as well
Full details here : https://www.patreon.com/posts/26671823
Then select our image SECourses from Creator dropdown
Then follow Massed_Compute_Instructions_READ.txt
Same as any of my other Massed Compute installer scripts
Example tutorial to learn how to install and use Massed Compute
(Starts at 12:58) : https://youtu.be/KW-MHmoNcqo?si=G1WbG-Qw4ujWvOtG&t=778
RunPod (Cloud):
Please register via this link : https://runpod.io?ref=1aka98lq
Then follow Runpod_Instructions_READ.txt
Same as any of my other RunPod installer scripts
Use the template written in the Runpod_Instructions_READ.txt file
Example tutorial to learn how to install and use RunPod
(starts at 22:03) : https://youtu.be/KW-MHmoNcqo?si=QN8X8Sjn13ZYu-EU&t=1323
9 October 2025 V7.6
This is a massive bug fix and performance improvement update
This time we fixed it for real, finally :D
Now we have prompt caching feature - auto enabled
It will generate hash value of your Video Prompt + Video Negative Prompt + Audio Negative Prompt + FP8_Scaled enabled or not
With this hash value, it will check if T5 encoding exists in prompt_cache or not
If it exists, T5 encoding is skipped, which speeds things up immensely; if not, the encoding is cached and saved
This should work in all cases we have like single processing, batch processing, multi line, video extend, etc
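The cache lookup described above can be sketched roughly as follows. This is only an illustration and not the app's actual code: the function names `prompt_cache_key` and `get_or_encode`, the use of SHA-256, the `.bin` file layout and the `encode` callback are all assumptions. Only the four hashed inputs and the idea of a `prompt_cache` folder come from the notes above, and a real cache would store T5 tensors rather than raw bytes.

```python
import hashlib
import json
from pathlib import Path


def prompt_cache_key(video_prompt, video_negative, audio_negative, fp8_scaled):
    """Hash the four inputs the changelog says the cache depends on."""
    payload = json.dumps(
        [video_prompt, video_negative, audio_negative, bool(fp8_scaled)],
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def get_or_encode(cache_dir, key, encode):
    """Return (data, was_cached); `encode()` only runs on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{key}.bin"  # hypothetical file layout
    if cache_file.exists():
        return cache_file.read_bytes(), True  # hit: the slow encoder is skipped
    data = encode()  # miss: run the encoder once and store the result
    cache_file.write_bytes(data)
    return data, False
```

Because the FP8_Scaled flag is part of the key, the same prompt encoded with and without FP8_Scaled ends up in two separate cache entries.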
I have entirely re-factored T5 system for above change and hopefully now we have finally fixed T5 infinite loop error, so update and try again
The loop error was caused by auto enabling Delete T5 After Encoding and Clear All Memory at the same time, and this issue is now fixed
I have fixed the RAM leak that was happening when Clear All Memory was not enabled
Therefore PCs with 96 GB RAM can now disable Clear All Memory. It may work with 64 GB RAM too, so test it
If you still get an error, enable Clear All Memory
When Clear All Memory was not enabled, changing video duration after first run was not working since it was initializing and never changing again
This bug fixed and changing video duration should work in all cases now
Auto pad for 32px divisibility checkbox added
When enabled, it won't crop any part of the image; it will auto downscale to the target resolution and fill the missing parts with black pixels to make it divisible by 32
Use with Auto Crop Image - don't disable it
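To make the pad-instead-of-crop idea concrete, here is a small sketch that only computes the geometry (no image library involved). The function name `fit_and_pad`, the centred placement of the image and the rounding of the canvas up to the next multiple of 32 are assumptions for illustration; the app may place or round differently. Note that this sketch scales in either direction to fit the canvas, while the note above only mentions downscaling.

```python
import math


def fit_and_pad(img_w, img_h, target_w, target_h, multiple=32):
    """Plan a no-crop resize: scale to fit, then pad with black to the canvas."""
    # Canvas is the target rounded up to the next multiple (assumed behaviour).
    canvas_w = math.ceil(target_w / multiple) * multiple
    canvas_h = math.ceil(target_h / multiple) * multiple
    # One uniform scale so the whole image fits inside the canvas uncropped.
    scale = min(canvas_w / img_w, canvas_h / img_h)
    new_w = min(canvas_w, max(1, round(img_w * scale)))
    new_h = min(canvas_h, max(1, round(img_h * scale)))
    pad_x, pad_y = canvas_w - new_w, canvas_h - new_h
    return {
        "canvas": (canvas_w, canvas_h),
        "resized": (new_w, new_h),
        # (left, top, right, bottom) black borders, image centred
        "padding": (pad_x // 2, pad_y // 2, pad_x - pad_x // 2, pad_y - pad_y // 2),
    }
```

For example, a 1920x1080 image with a 960x544 target is resized to 960x540 and gets 2 black rows on the top and 2 on the bottom, so nothing is cropped and both sides stay divisible by 32.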
Do not auto enforce validation check was not working in all cases but this issue is fixed and should work now
Prompt validation system improved
Now it will show you errors of single generation on Gradio
So you will see your error
When doing batch processing, it won't start the batch and will show the errors on cmd
You can always enable Do not auto enforce validation check and skip auto enforce
Sorry for the errors
8 October 2025 V6.4
LoRAs were not working accurately with the FP8_Scaled base model and this issue is fixed
However, for LoRAs to work, we have to re-scale on the fly each time, so the cached scale won't be used
This adds a delay of about 10 seconds
The FP8_Scaled version with the LoRA applied won't be saved; we can add a save feature if you wish but that means an 11.5 GB model file for every LoRA combination
I will look further into whether there is any way to apply a LoRA to the FP8_Scaled base model cache
In some cases, users reported that it was infinite looping T5 encoding, this issue hopefully fixed
Hopefully I will add T5 text encoder caching to cache directory, so same prompt will use cache directly not re-cache
Hopefully I will add a load background sound feature, so you can upload background music and have it added automatically
Just run Windows_Install_and_Update.bat to very quickly update to latest version and see on your Gradio top left 6.4
6 October 2025 V6.3
Exact resolution issue fixed and the model will use exactly the resolution you give on interface Video Width and Video Height
When doing batch processing, based on your base Width and Height, it will auto crop and resize your input folder images accurately to exact resolutions
e.g. when base resolution is 960x960, 1152x1728px image will be generated as 768x1184
Be careful bigger resolution uses more VRAM
When doing video extension, it was not keeping Sage Attention selection, now will respect your selection
First example prompt issue fixed
Do not auto enforce validation check added
So you can generate videos without any speech tags
Now we are supporting up to 4 LoRAs
Put your loras into lora or loras folder - case insensitive
You can apply LoRA to Video Layer, Sound Layer or Both
An example working LoRA : https://civitai.com/models/1936797/glowing-eyes-wan-22-5b-i2v?modelVersionId=2192059
Verified working
LoRA feature not fully tested yet but seems like working perfect
Just run Windows_Install_and_Update.bat to very quickly update to latest version and see on your Gradio top left 6.3

6 October 2025 V5.9
This is a super important major update that almost completes our app into maximum quality
Now you can use both videos and images as input
When you upload a video as input, it will get the last frame, auto crop it if enabled, use it as a reference image, then it will generate your video and merge back with your input video, so you have basically video extension of existing videos feature right now
If you don't want auto combine, enable Don't auto combine video input checkbox and it won't auto combine just use last frame of video as an input
Now we have Multi-line Prompts feature
When this is enabled, the prompt box input will be separated into lines, every line will become an individual prompt and it will generate a video for each prompt
Lines shorter than 3 characters will be ignored, so you can leave 2 empty lines between prompts if you wish
This will work with batch processing as well, just have your prompts multiple lines in your batch processing folder
Don't enable Multi-line Prompts and Video Extension at the same time
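The line splitting rule above is simple enough to sketch. The function name `split_multiline_prompts` is hypothetical, and stripping whitespace before measuring the length is an assumption; the notes only say that lines shorter than 3 characters are ignored.

```python
def split_multiline_prompts(text, min_chars=3):
    """Turn a prompt box value into one prompt per usable line."""
    prompts = []
    for line in text.splitlines():
        line = line.strip()  # assumption: surrounding whitespace does not count
        if len(line) < min_chars:
            continue  # blank or near-blank separator lines are skipped
        prompts.append(line)
    return prompts
```

With this rule, empty separator lines between prompts never produce an extra generation.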
Now we have Video Extension (Last Frame Based) feature
When this is enabled, it will extend your video based on the number of lines you have
Let's say you have a prompt that is 3 lines
So first line will be base prompt and will generate 0001.mp4
The second line will be second prompt, it will get last frame of 0001.mp4 and use it as an input image, use second line prompt and will generate 0001_ext1.mp4
The third line will be third prompt, it will get last frame of 0001_ext1.mp4 and use it as an input image, use third line prompt and will generate 0001_ext2.mp4
After all generations done it will merge all generations and generate 0001_final.mp4
You can extend as many times as you want with the number of lines, it is fully automatic and works pretty well
I will hopefully add example for this into examples tab soon
Don't enable Multi-line Prompts and Video Extension at the same time
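The chaining used by Video Extension can be sketched as a small planning function. The file names follow the pattern given above (`0001.mp4`, `0001_ext1.mp4`, `0001_final.mp4`); the function name `plan_video_extension` and the dictionary layout are hypothetical, and the actual last-frame extraction and merging are left out.

```python
def plan_video_extension(prompt_lines, index=1):
    """List which clip each prompt line produces and which clip feeds it."""
    base = f"{index:04d}"
    steps = []
    previous = None  # the first clip has no earlier clip to take a frame from
    for i, prompt in enumerate(prompt_lines):
        output = f"{base}.mp4" if i == 0 else f"{base}_ext{i}.mp4"
        steps.append(
            {
                "prompt": prompt,
                "last_frame_from": previous,  # clip whose last frame is the input image
                "output": output,
            }
        )
        previous = output
    return steps, f"{base}_final.mp4"  # every clip is merged into this file
```

For a 3-line prompt this yields three clips, where each extension starts from the last frame of the clip before it, plus the merged final file.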
Now we have presets for all lower VRAM GPUs for Scaled FP8 Base Model
Scaled FP8 Base Model is working perfect and 24 GB GPUs can generate 5 second videos without any Block Swap, thus ultra fast
I have implemented Sage Attention and working perfect
It gave about a 15% speed up during inference and it is now auto enabled in all presets
Auto cropping logic improved, some bugs fixed and made more robust
Automatic prompt format validation system implemented
When you click Generate button it will check and throw error if not valid
There is also Validate Prompt Format button now for you to validate and see errors
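The exact rules of the validator are not listed here, so the following is only a guess at the kind of check involved: it verifies that every `<S>` speech tag is closed by an `<E>` and, optionally, that at least one speech segment exists. The function name `validate_speech_tags`, the error messages and the `require_speech` switch are all hypothetical.

```python
import re


def validate_speech_tags(prompt, require_speech=True):
    """Return a list of error strings; an empty list means the prompt passed."""
    errors = []
    inside_speech = False
    for tag in re.findall(r"<S>|<E>", prompt):
        if tag == "<S>":
            if inside_speech:
                errors.append("<S> found before the previous speech was closed with <E>")
            inside_speech = True
        else:
            if not inside_speech:
                errors.append("<E> found without a matching <S>")
            inside_speech = False
    if inside_speech:
        errors.append("speech opened with <S> is missing its closing <E>")
    if require_speech and "<S>" not in prompt:
        errors.append("prompt has no <S>...<E> speech segment")
    return errors
```

Passing `require_speech=False` mirrors the idea of generating without any speech tags, while still catching unbalanced ones.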
Entire Gradio app font changed to Tahoma for better readability
New prompting feature setting duration in prompts
If you write in beginning of your prompt like {2} it will make that video generation as 2 seconds
This feature is useful for multi line generation and video extension features
So the format is {x}, if there is no such format, it will use duration slider set value
Example prompt
{4} A man is doing a podcast video. He is saying <S> Hi guys! How are you! Did you know, I am not real? <E> He continues to talk.
{2} A pod cast making man talking. He is saying <S> Like for real, I just found out. I was made by Furkan's Ovi app! <E> He then giggles <S> Hi-hi-hi! <E>
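The `{x}` duration prefix used in the example prompts can be parsed with a short regular expression. This is a sketch under assumptions: the function name `parse_duration` is hypothetical, and accepting decimal values such as `{2.5}` is a guess, since only whole numbers are shown above.

```python
import re

# Matches a leading {number} and any whitespace that follows it.
_DURATION_PREFIX = re.compile(r"^\s*\{(\d+(?:\.\d+)?)\}\s*")


def parse_duration(prompt, slider_seconds):
    """Return (duration_seconds, prompt_without_prefix)."""
    match = _DURATION_PREFIX.match(prompt)
    if match is None:
        return slider_seconds, prompt  # no prefix: fall back to the slider value
    return float(match.group(1)), prompt[match.end():]
```

Only a prefix at the very beginning of the prompt is treated as a duration, so braces later in the text are left untouched.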
You can also write speaking prompts like this
<S>[strong sound] I am an artificial intelligence android robot.<E>
<S>[soft whisper] I am an artificial intelligence android robot.<E>
Don't modify our presets and save as your modified presets since when you update, they will be overwritten back to originals
Don't close running cmd window immediately, first click cancel and then close cmd
5 October 2025 V4.4
This is a super important update performance-wise
Now Delete T5 After Encoding will be auto enabled if your RAM is under 64 GB, but if you don't have 96 GB RAM I really recommend enabling it
Now when Delete T5 After Encoding is enabled, it will start it as a sub-process therefore there will be 0 RAM and VRAM leakage
Now CPU-Only T5 will load like 2x faster
New option Scaled FP8 Base Model added
This saves like 10 GB VRAM
If you had generated the Scaled FP8 Base Model file with V4.0, delete it from the Ovi_Pro\ckpts\Ovi folder and regenerate
With FP8 Base Model, 24 GB GPUs can generate without any Block Swap
It uses like 17 GB VRAM at the moment during inference without Block Swap
T5 loading speed increased for all BF16, FP8_Scaled and CPU
FP8 Scaled T5 and Base model will be auto downloaded now
Working on fixing longer generation - reported to be broken
Just run Windows_Install_and_Update.bat to very quickly update to latest version
5 October 2025 V3.8
This is a big update and I am still testing so many new amazing features
Still in testing so report errors
Fully automatic aspect ratio resolutions and detection based on your entered Base Width and Base Height
Like set 960x960 and you will get 1280x704 automatically for 16:9
Moreover, you can now generate bigger resolution videos like 1280x704, but remember it uses more VRAM than 960x544, which is what the 720x720 base resolution gives
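One way to derive such resolutions is to keep the pixel area of the base resolution, apply the aspect ratio and snap both sides to a multiple of 32. The sketch below is an assumption about the method, not the app's confirmed logic, and `resolution_for_aspect` is a hypothetical name. It does reproduce the figures quoted in this post: 960x960 gives 1280x704 for 16:9, 720x720 gives 960x544, and a 2:3 image on a 960x960 base gives 768x1184.

```python
import math


def resolution_for_aspect(base_w, base_h, ratio_w, ratio_h, multiple=32):
    """Keep roughly base_w*base_h pixels at the given aspect ratio."""
    area = base_w * base_h
    width = math.sqrt(area * ratio_w / ratio_h)
    height = math.sqrt(area * ratio_h / ratio_w)

    def snap(value):
        # Python's round() sends exact halves to the even neighbour,
        # which is why 720 (22.5 blocks of 32) snaps to 704 here, not 736.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)
```

Since the area stays roughly constant, a higher base resolution raises VRAM use for every aspect ratio at once.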
Longer generations are also available so give them a try
Now it will show uploaded image resolution
Now it will auto crop to the new desired resolution you give immediately and show cropped image resolution too
Batch processing fixed and auto cropping perfected
It will auto recognize your images in your folder and process them with their closest aspect ratio based on your base resolution
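Picking the closest aspect ratio for each image in a batch folder can be sketched like this. The candidate list is hypothetical (the post does not enumerate the supported ratios), as is the function name `closest_ratio`; comparing ratios on a logarithmic scale is a design choice made here so that wide and tall images are treated symmetrically.

```python
import math

# Hypothetical candidate list; the ratios the app really supports may differ.
CANDIDATE_RATIOS = [(16, 9), (9, 16), (4, 3), (3, 4), (3, 2), (2, 3), (1, 1)]


def closest_ratio(img_w, img_h, candidates=CANDIDATE_RATIOS):
    """Return the (w, h) ratio whose shape is nearest to the image's shape."""
    target = math.log(img_w / img_h)
    return min(candidates, key=lambda r: abs(math.log(r[0] / r[1]) - target))
```

The chosen ratio, combined with the base resolution, then determines the size each image is cropped and resized to.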
Just run Windows_Install_and_Update.bat to very quickly update to latest version

5 October 2025 V3.4
Ok this is a massive update
We have added full tiled-VAE same as ComfyUI and working amazing
Now with Block Swapping + tiled-VAE + T5 Text Encoding on CPU (still super fast) we can generate 121 frames 5 second videos as low as on 6 GB GPUs
I have added presets for every GPU out there and the app will automatically detect your GPU and select your preset the first time you install and start it
Cancel button was not working properly and now working perfect
3:4 and 4:3 aspect ratios added as well
The original repo was forcing all resolutions to be 720x720. I have added a new feature called Force Exact Resolution and with this you can generate at higher resolutions like 1280x704
The resolution must be divisible by 32
Auto crop will auto handle this
All presets have this feature enabled by default but remember VRAM presets are made for 720x720 base resolution, higher resolution uses more VRAM
Much more robust preset system developed to not have any errors when saving or loading older presets
If you are low on both VRAM and RAM try this Delete T5 After Encoding + Scaled FP8 T5 + CPU-Only T5
Inaccurately showing previous generation result on interface fixed
Just run Windows_Install_and_Update.bat to very quickly update to latest version
4 October 2025 V2.9
Delete Text Encoder After Encoding : Now you can enable or disable
Now will load Text Encoder directly into VRAM, encode and then delete or move according to your selection - Therefore it is way faster than before
Clear All Memory added and recommended - 0 VRAM and RAM leak
Scaled FP8 T5 - reduces T5 VRAM usage but slower to load - quality same
It will be auto enabled when your VRAM is below 23 GB and you don't load a preset
Delete T5 After Encoding - enable if low on RAM
Preset save and load fully working now
Consecutive generation error fixed with Clear All Memory
With CPU Offload now we will move VAE to RAM while not needed and thus with 29 Block Swap now it uses only 6 GB VRAM during inference
Upcoming tiled VAE hopefully and FP8_Scaled model loading and video extending - loop
With v2.9 it will save the FP8_Scaled version of T5 and use it the next time you use Scaled FP8 T5 to speed up loading
The file will be saved inside ckpts\Wan2.2-TI2V-5B
Just run Windows_Install_and_Update.bat to very quickly update to latest version
Full tutorial video coming soon hopefully


Comments
looks like your lora is incompatible. so far we have verified Wan 2.2 5B model loras working. what lora are you trying?
Furkan Gözükara
2025-10-26 16:42:02 +0000 UTCHi!, everything is great, the samples work I'm using 3080Ti 12GB . However, I'm having issues with LORA, it will not apply, please assist: [VIDEO MODEL] Merging 1 LoRA(s)... WARNING:root:⚠ DETECTED: bfloat16 CPU matmul is catastrophically slow on this system! WARNING:root: Automatically enabling float32 workaround for LoRA merging WARNING:root: Recommendation: Downgrade PyTorch to 2.4.x or 2.5.x (current: 2.8.0+cu129) Merging LoRA layers: 0%| | 0/1035 [00:00CPU transfers for each layer! WARNING:root:Failed to merge LoRA for layer blocks.0.self_attn.q.weight: shape '[3072, 3072]' is invalid for input of size 26214400 WARNING:root:Failed to merge LoRA for layer blocks.0.self_attn.k.weight: shape '[3072, 3072]' is invalid for input of size 26214400 WARNING:root:Failed to merge LoRA for layer blocks.0.self_attn.v.weight: shape '[3072, 3072]' is invalid for input of size 26214400 Merging LoRA layers: 1%|▊ | 15/1035 [00:00<00:07, 139.50it/s]WARNING:root:Failed to merge LoRA for layer blocks.0.self_attn.o.weight: shape '[3072, 3072]' is invalid for input of size 26214400 WARNING:root:Failed to merge LoRA for layer blocks.0.cross_attn.q.weight: shape '[3072, 3072]' is invalid for input of size 26214400 WARNING:root:Failed to merge LoRA for layer blocks.0.cross_attn.k.weight: shape '[3072, 3072]' is invalid for input of size 26214400 WARNING:root:Failed to merge LoRA for layer blocks.0.cross_attn.v.weight: shape '[3072, 3072]' is invalid for input of size 26214400 WARNING:root:Failed to merge LoRA for layer blocks.0.cross_attn.o.weight: shape '[3072, 3072]' is invalid for input of size 26214400 WARNING:root:Failed to merge LoRA for layer blocks.0.ffn.0.weight: shape '[14336, 3072]' is invalid for input of size 70778880 WARNING:root:Failed to merge LoRA for layer blocks.0.ffn.2.weight: shape '[3072, 14336]' is invalid for input of size 70778880 WARNING:root:Failed to merge LoRA for layer blocks.1.self_attn.q.weight: shape '[3072, 3072]' is 
invalid for input of size 26214400 Merging LoRA layers: 4%|██ | 37/1035 [00:00<00:05, 170.06it/s]WARNING:root:Failed to merge LoRA for layer blocks.1.self_attn.k.weight: shape '[3072, 3072]' is invalid for input of size 26214400
TokyoIdolsAFK
2025-10-26 11:57:37 +0000 UTCHello again. Not VRAM how much RAM you have? Did you set 100 GB virtual RAM? can you set and let me know after restarting windows : https://www.windowscentral.com/software-apps/windows-11/how-to-manage-virtual-memory-on-windows-11
Furkan Gözükara
2025-10-25 19:26:12 +0000 UTCHi. I have VRAM 24Gb, all parameters as you write. I checked all items and then I reinstalled all. Again, The app calculates/generates, but in the end I always have only this message : "Error during video generation: [WinError 2] The system cannot find the file specified" without any video in the Output. only wav founded in Output folder. ================================================================================ STARTING VAE DECODE - VRAM before: 2.07 GB ================================================================================ VAE DECODE PROGRESS: Decoding video (standard mode)... VAE DECODE PROGRESS: Standard decode completed ================================================================================ VAE DECODE COMPLETE VRAM after: 2.81 GB Peak during decode: 14.33 GB VRAM used by decode: 12.25 GB ================================================================================ Error during video generation: [WinError 2] The system cannot find the file specified [SINGLE-GEN] Failed - no result returned [SUBPROCESS] Generation failed with return code: 1 [GENERATION 1/1] No output file found in N:\AI\AI_Video\Ovi_Pro_v8\Ovi_Pro\outputs after retries Total generation time: 332.54 seconds ================================================================================ VIDEO GENERATION COMPLETED Final output path: None File exists: No ================================================================================ [MEMORY CLEANUP] Final cleanup completed - all generation memory freed Please I need help. Thank you for your job and other programs and scripts.
Ant-2014
2025-10-25 18:50:22 +0000 UTCyes out of RAM. how much RAM you have?
Furkan Gözükara
2025-10-25 18:26:28 +0000 UTCHello, I have all python, cuda etc locally, but always get this error , whatever t2v, i2v : Can you help me, please? VAE DECODE COMPLETE VRAM after: 2.38 GB Peak during decode: 7.51 GB VRAM used by decode: 5.46 GB ================================================================================ Error during video generation: [WinError 2] The system cannot find the file specified [SINGLE-GEN] Failed - no result returned [SUBPROCESS] Generation failed with return code: 1 [GENERATION 1/1] No output file found in N:\AI\AI_Video\Ovi_Pro_v8\Ovi_Pro\outputs after retries Total generation time: 188.18 seconds ================================================================================ VIDEO GENERATION COMPLETED Final output path: None File exists: No ================================================================================ [MEMORY CLEANUP] Final cleanup completed - all generation memory freed
Ant-2014
2025-10-25 17:51:49 +0000 UTCthanks. next gen will be hopefully even better model and app :)
Furkan Gözükara
2025-10-20 22:09:40 +0000 UTCSo after trying again this isn't for me especially how you can't let the app generate what it wants to say for you instead of you telling it everything. Anyway great job on the app
James Woodill
2025-10-20 21:56:08 +0000 UTCi am looking as well. i hope there will be speed loras
Furkan Gözükara
2025-10-19 10:51:35 +0000 UTCThank you very much for this great work. I've already done some tests and am happy about the possibility of working locally. For the first time, I have video and sound, and it works with German voice. I hope there will soon be an option to achieve significantly reduced generation times with a 4- or 8-step LORA. Are you working on this?
thom mick
2025-10-18 01:04:55 +0000 UTCwhich preset you using? changed any settings? it should be instant normally
Furkan Gözükara
2025-10-13 15:18:30 +0000 UTCIf I load a lora, the process requires quite some (big) time: [VIDEO MODEL] Merging 1 LoRA(s)... Merging LoRA layers: 79%|█ It takes about 5 minutes, but I see my runpod (H100 + cpu xeon platinum 8352Y), the cpu is stuck at almost 100%, and gpu not running. Maybe there's space for some optimization on lora loading (maybe now device is set on cpu). Hope this can help!
FalconBravery
2025-10-13 15:13:07 +0000 UTCout of RAM. how much RAM you have? did you set 100 gb virtual disk?
Furkan Gözükara
2025-10-13 10:35:36 +0000 UTCI've got this erro on first generation. What it could be? Initial VRAM: 0.00 GB Removing weight norm... ================================================================================ SCALED FP8 T5: Loading T5 in Scaled FP8 format Expected VRAM savings: ~50% (~5-6GB saved) ================================================================================ [FP8 CACHE] Found cached FP8 checkpoint: E:\AI\Ovi_Pro_v8\Ovi_Pro\ckpts\Wan2.2-TI2V-5B\models_t5_umt5-xxl-enc-fp8_scaled.safetensors [FP8 CACHE] Creating structure on CPU first (avoids BF16 VRAM allocation) [T5 LOAD][FP8] Structure created on CPU in 32.66s (FP8 cached path) [SUBPROCESS] Generation failed with return code: 3221225477 [GENERATION 1/1] No output file found in E:\AI\Ovi_Pro_v8\Ovi_Pro\outputs after retries Total generation time: 63.85 seconds ================================================================================ VIDEO GENERATION COMPLETED Final output path: None File exists: No
Pedro Burle
2025-10-13 01:44:28 +0000 UTCWan 2.2 5b loras tested and working. i didnt test others. sadly you cant set ending frame. only beginning frame
Furkan Gözükara
2025-10-12 23:51:20 +0000 UTCWould it ever be possible to set a starting frame and ending frame with this? Also, what all are the types of LoRA that can be used?
Diggy Dre
2025-10-12 22:38:02 +0000 UTCi suppose closer shot is better. also it has 2 options you can try and see : Audio Guidance Scale - SLG Layer (Skip Layer Guidance layer - affects audio-video synchronization)
Furkan Gözükara
2025-10-12 19:05:53 +0000 UTCThis is very nice. I can't wait to try the batch feature. I'm trying to use an anime-style character. How can I get the mouth movements to be more accurate (this is for pronunciation teaching)
Taiga
2025-10-12 14:00:30 +0000 UTChi we have it in requirements. that means your install failed for some reason. can you run installer again and email me logs? please delete venv before : monstermmorpg@gmail.com
Furkan Gözükara
2025-10-12 12:20:02 +0000 UTCHi, I get this error when trying to run the app, I have downloaded everything and run the update. Do you know this error? [STARTUP] Set MKL/OMP threads to 20 for optimal CPU performance Traceback (most recent call last): File "C:\OVI\Ovi_Pro\premium.py", line 21, in from ovi.utils.io_utils import save_video File "C:\OVI\Ovi_Pro\ovi\utils\io_utils.py", line 5, in from moviepy.editor import ImageSequenceClip, AudioFileClip ModuleNotFoundError: No module named 'moviepy' Press any key to continue . . .
Dan
2025-10-12 11:51:46 +0000 UTCnah they are official sources no issues. possibly safe tensor can be used if there is accurate version
Furkan Gözükara
2025-10-11 19:46:18 +0000 UTCWould you be able to make it use all .safetensors files instead of it having some .pt and .pth files (that I think are riskier because those could contain pickled content in theory)?
cool1
2025-10-11 17:26:28 +0000 UTCthanks a lot for comment. i believe what you want and what i also want will become available soon. currently vibevoice can generate other languages. hopefully i will publish an app for it soon
Furkan Gözükara
2025-10-11 15:03:23 +0000 UTCThank you always! If I may express a personal wish, it would be great if Korean and Japanese voice options were also available. And if there were features like generating videos that lip-sync to the input voice, or generating voices that match sample voices (e.g., GPT-SoVITS), that would be truly amazing. I believe that someday, something even better encompassing these features will emerge. :) Anyway, thank you so much!!
Mimic
2025-10-11 14:25:38 +0000 UTCthis app can't do it but multi talk is doing that : https://youtu.be/8cMIwS9qo4M
Furkan Gözükara
2025-10-11 08:44:53 +0000 UTCI have an interesting question. Suppose you had some pre-recorded audio that you wanted this thing to animate to, would that ever be a feature? Almost like giving it something to lip-sync to?
Diggy Dre
2025-10-11 07:03:06 +0000 UTCi dont know if any flags sadly. but --share will start on gradio live so you can use from anywhere
Furkan Gözükara
2025-10-10 23:36:14 +0000 UTCthat is gradio error. not important. what else do you see after this?
Furkan Gözükara
2025-10-09 22:36:35 +0000 UTCHello there. I have just tried one of the examples and this is the error im getting: traceback (most recent call last): File "C:\Python310\lib\asyncio\events.py", line 80, in _run self._context.run(self._callback, *self._args) File "C:\Python310\lib\asyncio\proactor_events.py", line 165, in _call_connection_lost self._sock.shutdown(socket.SHUT_RDWR) ConnectionResetError: [WinError 10054]
rSandor
2025-10-09 22:34:19 +0000 UTC Hey Furkan, thanks for fixing the looping bug. I am able to get video output now. Is there a flag to run the Gradio server on the local network?
Justin
2025-10-09 15:26:19 +0000 UTCDid you try v8 zip file? it only has txt and bat files you can see what is inside. so please allow and download it
Furkan Gözükara
2025-10-09 14:06:02 +0000 UTCits getting flagged as a virus when downloading the installer
Abu Bhakar
2025-10-09 13:16:16 +0000 UTCbecause it is wrong. it is not s s. we even added check :D it is <s> </s>
Furkan Gözükara
2025-10-09 00:42:07 +0000 UTCI am now able to get video, which is pretty cool, but I can't get the voice part to actually read through the script. It randomly bounces back and forth between previous words within the S and E speech brackets.
Diggy Dre
2025-10-08 23:34:48 +0000 UTChi are you on 7.2? can you copy cmd logs starting from first line copy into txt file and email me? monstermmorpg@gmail.com
Furkan Gözükara
2025-10-08 17:43:07 +0000 UTCJuanmyth
2025-10-08 17:41:53 +0000 UTCWindows error. You can see every file content with notepad. You have to allow it to download.
Furkan Gözükara
2025-10-08 10:08:22 +0000 UTCYes I think so
Furkan Gözükara
2025-10-08 10:08:03 +0000 UTCWhat languages does this version support? Only English?
Hoàng Giang Sơn Trương
2025-10-08 02:41:48 +0000 UTCThe latest version does not allow downloading, the Windows system marks it as a virus. I was able to download it without any problem. Thank you very much.
carlos chavez
2025-10-08 02:31:06 +0000 UTCWell you can mute sound. I added checkbox to generate without speech tags option. Try that too . But will check if no audio option available or not
Furkan Gözükara
2025-10-07 20:03:44 +0000 UTCHi Furkan. Can I generate videos without speech or background sounds, or with one or the other?
michele carlone
2025-10-07 19:26:27 +0000 UTCUpdate latest version, delete outputs and send me entire cmd log as email : monstermmorpg@gmail.com You can use T5 cpu almost same speed
Furkan Gözükara
2025-10-07 17:26:46 +0000 UTCI wasn't able to get this to work on my 4080. Also, for some reason, it kept trying to offload the text encoder to my 9950x.
Diggy Dre
2025-10-07 16:54:32 +0000 UTCclear your output folder. i need to make a fix for this. after doing that restart and let me know. also try v 6.2
Furkan Gözükara
2025-10-07 16:04:52 +0000 UTCGreetings. I did a clean install from v3 (working) to v5. Now my generations are consistently stuck in a loop back into the T5 encoding process. Only way to stop the process is closing Python. Thanks in advance
Oliver
2025-10-07 15:33:18 +0000 UTCyou can edit all of the files with notepad and see content. 100% false positive. also virustotal has 0 : https://www.virustotal.com/gui/file/9b4b81a000308cc5ce9d01a138cbd0737820331b6ed24799521637dea3b5336e
Furkan Gözükara
2025-10-07 10:24:07 +0000 UTCyou can change seed. currently it is 99 unless you enable randomize seed
Furkan Gözükara
2025-10-07 10:22:54 +0000 UTCIn the samples on the project page https://aaxwaz.github.io/Ovi/ the voices vary, mine sound all the same. Is there a way to vary the voice when using an image as a starting point for the video? A prompt trigger word or something
Neil Rhodes
2025-10-07 08:07:07 +0000 UTCThe tags are not allow on the comment lol so I will use * instead of the tag symbols :p In the "how to use" windows, you said : Check *S*...*/S* tag format Add *AUDCAP*...*/ENDAUDCAP* descriptions ect… but in the examples and on the page of the model they use the tag formats *S* *E* (for speech) *AUDCAP* *ENDAUDCAP* (for audio description). which ones are the good ones ?
thecatzman
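To make the tag question concrete, here is a small sketch that pulls the speech and audio-description spans out of a prompt. It assumes the format the commenter reports from the examples and the model's page, written with angle brackets: `<S>`...`<E>` for speech and `<AUDCAP>`...`<ENDAUDCAP>` for the audio description. The function name is made up for illustration, and this is not the app's actual parser.

```python
import re

# Assumed tag format (as reported from the project page examples).
SPEECH_RE = re.compile(r"<S>(.*?)<E>", re.DOTALL)
AUDCAP_RE = re.compile(r"<AUDCAP>(.*?)<ENDAUDCAP>", re.DOTALL)


def split_prompt(prompt: str) -> dict:
    """Separate speech, audio description, and remaining visual text."""
    speech = [s.strip() for s in SPEECH_RE.findall(prompt)]
    audio = [a.strip() for a in AUDCAP_RE.findall(prompt)]
    # Whatever is left after removing both tag types is the visual description.
    visual = AUDCAP_RE.sub("", SPEECH_RE.sub("", prompt))
    visual = " ".join(visual.split())
    return {"speech": speech, "audio": audio, "visual": visual}


example = "A man waves. <S>Hello there<E> He smiles. <AUDCAP>Soft wind<ENDAUDCAP>"
print(split_prompt(example))
```

A prompt written with the other closing style (`</S>`) would produce an empty speech list here, which is a quick way to spot the mismatch.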
2025-10-07 05:25:24 +0000 UTCIn the "how to use" window, you said: Check <S>...</S> tag format, Add <AUDCAP>...</ENDAUDCAP> descriptions, etc., but in the examples and on the model's page they use the tag formats <S> <E> (for speech) and <AUDCAP> <ENDAUDCAP> (for audio description). Which ones are the correct ones?
thecatzman
2025-10-07 05:20:26 +0000 UTCNew version is lighting up my antivirus "Threat found - action needed. 10/6/2025 9:38 PM Severe Detected: Trojan:Script/Wacatac.C!ml Status: Active Active threats have not been remediated and are running on your device. Date: 10/6/2025 9:38 PM Details: This program is dangerous and executes commands from an attacker. Affected items: file: C:\Users\ Downloads\Ovi_Pro_v5.zip Learn more"
leem0nchu
2025-10-07 01:42:45 +0000 UTCI have dual GPUs too. REM means the line is commented out. So make a copy of the bat file and add this line before the call line: SET CUDA_VISIBLE_DEVICES=1. Message me on Discord and I will send you the bat file.
Furkan Gözükara
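The same idea as adding SET CUDA_VISIBLE_DEVICES=1 to the bat file can be sketched in Python: the variable has to be in the environment before the CUDA runtime initializes, so it is set for the process being launched. A minimal sketch; the helper names and the script name in the usage comment are assumptions for illustration, not part of the app.

```python
import os
import subprocess
import sys


def env_for_gpu(gpu_index: int, base_env=None) -> dict:
    """Copy an environment and restrict CUDA to a single GPU index."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch_on_gpu(script: str, gpu_index: int) -> int:
    """Run a Python script so that it only sees the chosen GPU."""
    completed = subprocess.run([sys.executable, script], env=env_for_gpu(gpu_index))
    return completed.returncode


# Usage (hypothetical script name):
# launch_on_gpu("app.py", 1)
```

Inside the launched process the selected card shows up as device 0, because CUDA renumbers the visible devices. The index order may also differ from what nvidia-smi shows unless CUDA_DEVICE_ORDER=PCI_BUS_ID is set, so if index 1 picks the wrong card it is worth trying the other index.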
2025-10-06 22:42:45 +0000 UTCHi, great work. Everything worked and I managed to create a video just fine. I have two GPUs, a 4060 8 GB and a 5060 Ti 16 GB, yet it ignores the 16 GB card even though I changed the REM SET CUDA_VISIBLE_DEVICES=0 line to =1 (or set CUDA_VISIBLE_DEVICES=0). The 8 GB card is plugged into the monitor, so I use the 5060 in OVI/ComfyUI etc. for rendering. What can I do to fix this?
Daniel Smith
2025-10-06 21:45:39 +0000 UTCTry v4.5 and email me the CMD logs: monstermmorpg@gmail.com
Furkan Gözükara
2025-10-06 09:12:05 +0000 UTCI’ve updated to version 3 and v2 worked better. It did find my card but not building. It gets stuck.
James Charleston II
2025-10-06 02:18:04 +0000 UTCWhat is 1tgb? Are you on v4.5? How much RAM do you have, and which GPU?
Furkan Gözükara
2025-10-05 20:52:28 +0000 UTCHello, I got everything installed and running, but running this even with the 1tgb preset it slows down my computer a lot and a few times it slowed it down so much I had to hold in the button to turn it off. I'm not sure what to really do about it.
James Woodill
2025-10-05 20:48:28 +0000 UTCHi, that is an install error. Please have Python 3.10.11 installed. Follow this tutorial: https://youtu.be/DrhUHnYfwC0
Furkan Gözükara
2025-10-05 17:01:03 +0000 UTCI ran the installer ok on Windows 64 but when trying to run I get the error:
The system cannot find the path specified.
Traceback (most recent call last):
  File "C:\Ovi_Pro_v3\Ovi_Pro\premium.py", line 1, in <module>
    import gradio as gr
ModuleNotFoundError: No module named 'gradio'
Alexandre Rangel
2025-10-05 16:53:52 +0000 UTCthanks gonna check now
Furkan Gözükara
2025-10-05 15:30:45 +0000 UTCThank you for this awesome release. I'm currently testing on an H100 with 60 steps (3 min generation per video). I may have found a bug: no matter which video length I set, the videos are always 5 seconds long, even if I set 10 seconds or any other number.
FalconBravery
2025-10-05 15:27:34 +0000 UTCI think you need to change the prompt. Did you test the example prompts in the example tab? They are perfectly animated with the default presets.
Furkan Gözükara
2025-10-05 10:30:24 +0000 UTCMost of the time I get a static video with sound. Which parameter do I have to change to avoid this?
Fco Muñoz
2025-10-05 10:16:07 +0000 UTCI just updated the app to v3.3 for you and made the preset more robust. Please email me the logs of the Windows_Install_and_Update.bat file: monstermmorpg@gmail.com and try again.
Furkan Gözükara
2025-10-05 09:14:12 +0000 UTCHow much RAM and VRAM do you have, and which preset are you using?
Furkan Gözükara
2025-10-05 08:40:16 +0000 UTCThis one is easy to solve. Set your virtual memory (paging file) to 100 GB: https://www.windowscentral.com/software-apps/windows-11/how-to-manage-virtual-memory-on-windows-11 and try again. I will try to add FP8 scaled loading today, which will reduce the RAM needed.
Furkan Gözükara
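A rough estimate shows why loading the checkpoint to CPU can exhaust RAM plus paging file, and why FP8 loading would help. The parameter count below is the one printed in the log in this thread ("Score model (Fusion) all parameters"); the bytes-per-parameter values (2 for BF16, 1 for FP8) are standard. Treating the whole model as a single dtype and ignoring the T5 encoder, the VAE, and loader overhead are simplifying assumptions, so the real peak will be higher.

```python
# Parameter count as printed in the log: "Score model (Fusion) all parameters"
FUSION_PARAMS = 11_660_753_108

# Storage size per parameter for each dtype.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weights_gib(num_params: int, dtype: str) -> float:
    """Approximate size of the raw weights in GiB for a given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in ("bf16", "fp8"):
    print(f"{dtype}: {weights_gib(FUSION_PARAMS, dtype):.1f} GiB")
```

Under these assumptions the weights alone come to about 21.7 GiB in BF16 and about 10.9 GiB in FP8, a difference that is in line with the roughly 10 GB reduction mentioned for the FP8_Scaled model in the post.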
2025-10-05 08:39:25 +0000 UTCupdated to v3, rtx 4060 8gb gpu and 64gb ram, got this error
[OK] OviFusionEngine initialized successfully (models will load on first generation)
[GENERATION 1/1] Starting with seed: 99
================================================================================
STEP 1/2: Loading T5 text encoder FIRST to minimize RAM usage
================================================================================
================================================================================
Loading OVI models for first generation...
Block Swap: 29 blocks
CPU Offload: True
================================================================================
Initial VRAM: 0.00 GB
Removing weight norm...
================================================================================
T5 CPU-ONLY MODE: Loading T5 on CPU for CPU inference
This saves VRAM but text encoding will be slower
================================================================================
[T5 CPU-ONLY MODE] Loading T5 text encoder on CPU for CPU inference
[T5 CPU-ONLY MODE] This saves VRAM but encoding will be slower
[T5 LOAD][BF16] Encoder structure created in 33.61s
[T5 LOAD][BF16] Weights loaded in 6.59s (total 40.20s)
[T5 CPU-ONLY MODE] T5 encoder ready on CPU
================================================================================
T5 loaded. Fusion model will load AFTER text encoding.
================================================================================
STEP 2/2: Encoding text and optionally deleting T5 before loading fusion model
================================================================================
Encoding text prompts...
Text embeddings encoded on CPU and moved to GPU
Keeping T5 on CPU (already in CPU-only mode)
================================================================================
STEP 3/3: Loading fusion model (T5 already deleted if enabled)
================================================================================
================================================================================
Loading OVI models for first generation...
Block Swap: 29 blocks
CPU Offload: True
================================================================================
Initial VRAM: 0.00 GB
Step 1/6: Creating model structure on meta device...
Score model (Fusion) all parameters:11660753108
Step 2/6: Loading checkpoint weights to CPU...
Error during video generation: The paging file is too small for this operation to complete. (os error 1455)
[SINGLE-GEN] Failed - no result returned
[SUBPROCESS] Generation failed with return code: 1
[GENERATION 1/1] No output file found
b
2025-10-05 06:51:02 +0000 UTCThanks for that! A few things to mention (I used the update batch file to update to 3.0, but I'll try deleting venv and running it again just to make sure the update really happened):
- Cancelling doesn't work.
- Saving a new preset is also not working:
Traceback (most recent call last):
  File "P:\AI\Ovi_Pro_v1\Ovi_Pro\venv\lib\site-packages\gradio\queueing.py", line 759, in process_events
    response = await route_utils.call_process_api(
  File "P:\AI\Ovi_Pro_v1\Ovi_Pro\venv\lib\site-packages\gradio\route_utils.py", line 354, in call_process_api
    output = await app.get_blocks().process_api(
  File "P:\AI\Ovi_Pro_v1\Ovi_Pro\venv\lib\site-packages\gradio\blocks.py", line 2112, in process_api
    inputs = await self.preprocess_data(
  File "P:\AI\Ovi_Pro_v1\Ovi_Pro\venv\lib\site-packages\gradio\blocks.py", line 1774, in preprocess_data
    processed_input.append(block.preprocess(inputs_cached))
  File "P:\AI\Ovi_Pro_v1\Ovi_Pro\venv\lib\site-packages\gradio\components\dropdown.py", line 206, in preprocess
    raise Error(
gradio.exceptions.Error: "Value: True is not in the list of choices: ['BlahBlah']"
Traceback (most recent call last):
  File "P:\AI\Ovi_Pro_v1\Ovi_Pro\venv\lib\site-packages\gradio\queueing.py", line 759, in process_events
    response = await route_utils.call_process_api(
  File "P:\AI\Ovi_Pro_v1\Ovi_Pro\venv\lib\site-packages\gradio\route_utils.py", line 354, in call_process_api
    output = await app.get_blocks().process_api(
  File "P:\AI\Ovi_Pro_v1\Ovi_Pro\venv\lib\site-packages\gradio\blocks.py", line 2112, in process_api
    inputs = await self.preprocess_data(
  File "P:\AI\Ovi_Pro_v1\Ovi_Pro\venv\lib\site-packages\gradio\blocks.py", line 1774, in preprocess_data
    processed_input.append(block.preprocess(inputs_cached))
  File "P:\AI\Ovi_Pro_v1\Ovi_Pro\venv\lib\site-packages\gradio\components\dropdown.py", line 206, in preprocess
    raise Error(
gradio.exceptions.Error: "Value: True is not in the list of choices: ['BlahBlah']"
Richard Nagy
2025-10-05 06:09:16 +0000 UTCI'm getting this error with 3.1:
Initial VRAM: 0.02 GB
Step 1/6: Creating model structure on meta device...
Score model (Fusion) all parameters:11660753108
Step 2/6: Loading checkpoint weights to CPU...
[SUBPROCESS] Generation failed with return code: 3221225477
[GENERATION 1/1] No output file found
Agino Terra
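The return code 3221225477 in this log is easier to read in hexadecimal: it is 0xC0000005, the Windows status code for an access violation (STATUS_ACCESS_VIOLATION). That indicates the subprocess crashed in native code rather than exiting with a Python exception; the log alone does not say what triggered it. A small sketch for decoding such codes, where the helper name is illustrative and the table lists only a few well-known values:

```python
# A few well-known Windows NTSTATUS values that show up as process exit codes.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
    0xC000013A: "STATUS_CONTROL_C_EXIT",
}


def describe_return_code(code: int) -> str:
    """Format a process return code as hex, with a name when it is known."""
    # Some tools report the same code as a negative signed 32-bit number.
    unsigned = code & 0xFFFFFFFF
    hex_code = f"0x{unsigned:08X}"
    name = KNOWN_NTSTATUS.get(unsigned)
    return f"{hex_code} ({name})" if name else hex_code


print(describe_return_code(3221225477))
```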
2025-10-05 04:18:46 +0000 UTCLet me check whether it is broken.
Furkan Gözükara
2025-10-05 00:05:14 +0000 UTCIt won’t run in text to video mode for me. I have to start with an image or nothing happens.
James Charleston II
2025-10-05 00:03:05 +0000 UTCI don't know, sadly. I don't have an AMD card to test with.
Furkan Gözükara
2025-10-04 22:09:21 +0000 UTCI don't know about either of those. I am working on improving the VAE for lower-VRAM GPUs, but I will check them later.
Furkan Gözükara
2025-10-04 22:09:10 +0000 UTCHi Furkan. Thanks for this fantastic work. I tried generating videos in Italian, but the audio is horrible. Is it possible to improve this language? Can videos be generated that are at least 10 seconds long?
michele carlone
2025-10-04 22:02:20 +0000 UTCDoes it run with AMD GPUs?
Alexander Hempel
2025-10-04 21:18:26 +0000 UTCYes, it works perfectly. You can use the default config.
Furkan Gözükara
2025-10-04 21:07:53 +0000 UTCWill this work with 4090?
guni
2025-10-04 19:45:04 +0000 UTCThe 5090 is really fast, but I don't know about the 3080 Ti. Give it a try. I am also making improvements. You can also generate in 20 steps to speed it up.
Furkan Gözükara
2025-10-04 18:51:56 +0000 UTCWhat are the generation times, for example on an RTX 3080 Ti with 16 GB VRAM and 32 GB RAM?
ranjeet
2025-10-04 17:06:51 +0000 UTCYou are welcome, thanks for the comment.
Furkan Gözükara
2025-10-04 14:35:02 +0000 UTCAmazing work, really. Thank you so much :)
Damjan Žakelj
2025-10-04 13:50:12 +0000 UTCUpdate to the latest version and try block swap 12.
Furkan Gözükara
2025-10-04 13:11:22 +0000 UTC[GENERATE_VIDEO] Called with clear_all=False, num_generations=1
================================================================================
INITIALIZING OVI FUSION ENGINE IN MAIN PROCESS
Block Swap: 8 blocks (0 = disabled)
CPU Offload: True
Image Generation: False
No Block Prep: False
Note: Models will be loaded in main process (Clear All Memory disabled)
================================================================================
[OK] OviFusionEngine initialized successfully (models will load on first generation)
[GENERATION 1/1] Starting with seed: 99
================================================================================
STEP 1/2: Loading T5 text encoder FIRST to minimize RAM usage
================================================================================
================================================================================
Loading OVI models for first generation...
Block Swap: 8 blocks
CPU Offload: True
================================================================================
Initial VRAM: 0.00 GB
Removing weight norm...
Loading T5 text encoder directly to GPU (BEFORE fusion model to save RAM)...
Error during video generation: CUDA out of memory. Tried to allocate 160.00 MiB. GPU 0 has a total capacity of 12.00 GiB of which 0 bytes is free. Of the allocated memory 18.23 GiB is allocated by PyTorch, and 361.40 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
[SINGLE-GEN] Failed - no result returned
[SUBPROCESS] Generation failed with return code: 1
[GENERATION 1/1] Failed in subprocess
REGINALDO BARBOSA
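The error message itself suggests setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. PyTorch reads that variable when its CUDA allocator initializes, so it has to be set before the first CUDA allocation. Below is a minimal sketch of setting it at the top of a script; the helper name is made up for illustration. Whether it helps in this case is uncertain: the log reports 18.23 GiB allocated against a 12 GiB card, which points at model size rather than fragmentation, and the option may not be supported on every platform.

```python
import os


def enable_expandable_segments(environ=os.environ) -> str:
    """Add expandable_segments:True to PYTORCH_CUDA_ALLOC_CONF if absent."""
    key = "PYTORCH_CUDA_ALLOC_CONF"
    # The variable holds comma-separated option:value pairs.
    options = [opt for opt in environ.get(key, "").split(",") if opt]
    if not any(opt.startswith("expandable_segments:") for opt in options):
        options.append("expandable_segments:True")
    environ[key] = ",".join(options)
    return environ[key]


# Call this before importing torch or touching CUDA:
# enable_expandable_segments()
# import torch
```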
2025-10-04 12:58:41 +0000 UTCFixed in v2.5. Just run the installer again and make sure Clear All Memory is enabled.
Furkan Gözükara
2025-10-04 12:15:28 +0000 UTCFixed in v2.5. Just run the installer again and make sure Clear All Memory is enabled.
Furkan Gözükara
2025-10-04 12:14:57 +0000 UTCHi Furkan, thanks a lot for your great work on Ovi Pro Fusion! I found a bug on my RTX 4090: With block swap = 12 and CPU offload = true, the first generation works perfectly. But the second generation with the same settings always fails. Error message: RuntimeError: Input type (CUDABFloat16Type) and weight type (CPUBFloat16Type) should be the same It looks like the patch_embedding layer stays on CPU after the first run, while the inputs are already on CUDA. Without block swap/offload it doesn’t run well on 24 GB, so this is important. Restarting the app always fixes it for one run. Maybe the block swap step needs to re-sync that layer on every run. Thanks again for this amazing project!
macmotu
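The error in this report means that a layer's weights are on the CPU while its input is already on the GPU. One way to narrow down such a bug is to list, before each run, which parameters are not on the expected device. The sketch below is framework-free: it takes (name, device) pairs, which with PyTorch could be built from `model.named_parameters()`. The function name and the `blocks.` prefix are hypothetical; `patch_embedding` is the layer named in the report.

```python
from typing import Iterable, List, Tuple


def find_misplaced(
    params: Iterable[Tuple[str, str]],
    expected_prefix: str = "cuda",
    allowed_on_cpu: Iterable[str] = (),
) -> List[str]:
    """Return names of parameters that are unexpectedly off the target device.

    Parameters whose names start with one of `allowed_on_cpu` are skipped,
    since swapped blocks are meant to live on the CPU between uses.
    """
    allowed = tuple(allowed_on_cpu)
    misplaced = []
    for name, device in params:
        if device.startswith(expected_prefix):
            continue
        if allowed and name.startswith(allowed):
            continue
        misplaced.append(name)
    return misplaced


# With PyTorch (not imported here), the pairs could be built as:
#   pairs = ((n, str(p.device)) for n, p in model.named_parameters())
pairs = [
    ("patch_embedding.weight", "cpu"),
    ("blocks.0.attn.weight", "cpu"),
    ("head.weight", "cuda:0"),
]
print(find_misplaced(pairs, allowed_on_cpu=["blocks."]))
```

Running a check like this before the second generation would show whether any non-swapped layer was left on the CPU after the first run.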
2025-10-04 11:09:53 +0000 UTCThanks for your amazing work, dear Furkan. Unfortunately, it did not work for me, I will wait for the next version. Here is the error I got, in case you're interested:
"INFERENCE STARTING - VRAM: 16.58 GB
Block swap active: 12/30 blocks on CPU
================================================================================
2it [08:37, 258.78s/it]
ERROR:root:Traceback (most recent call last):
  File "E:\AI\Ovi_Pro\Ovi_Pro\ovi\ovi_fusion_engine.py", line 455, in generate
    pred_vid_pos, pred_audio_pos = self.model(
  File "E:\AI\Ovi_Pro\Ovi_Pro\venv\lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\AI\Ovi_Pro\Ovi_Pro\venv\lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\AI\Ovi_Pro\Ovi_Pro\ovi\modules\fusion.py", line 310, in forward
    vid, audio = gradient_checkpointing(
  File "E:\AI\Ovi_Pro\Ovi_Pro\ovi\modules\model.py", line 22, in gradient_checkpointing
    return module(*args, **kwargs)
  File "E:\AI\Ovi_Pro\Ovi_Pro\ovi\modules\fusion.py", line 223, in single_fusion_block_forward
    assert not torch.equal(og_audio, audio), "Audio should be changed after cross-attention!"
torch.AcceleratorError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Error during video generation: cannot unpack non-iterable NoneType object"
JP LONB
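The final line of this log, "cannot unpack non-iterable NoneType object", is a secondary error: it is what Python raises when a function that normally returns a pair returns None and the caller unpacks the result anyway. The real failure is the CUDA error above it. The sketch below reproduces the pattern with stand-in functions (none of these names come from the app) and shows a guard that keeps the original failure visible.

```python
import logging


def generate_or_none(fail: bool):
    """Stand-in for a generate call that logs errors and returns None."""
    try:
        if fail:
            raise RuntimeError("simulated CUDA error")
        return "video", "audio"
    except RuntimeError:
        logging.exception("generation failed")
        return None


def run(fail: bool) -> str:
    """Check for None before unpacking so the first error is not masked."""
    result = generate_or_none(fail)
    if result is None:
        return "generation failed, see the traceback logged above"
    video, audio = result
    return f"got {video} and {audio}"


print(run(False))
print(run(True))
# Without the guard, `video, audio = generate_or_none(True)` raises:
# TypeError: cannot unpack non-iterable NoneType object
```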
2025-10-04 09:51:42 +0000 UTC8 GB of memory is too low. So far I have gotten as low as 8.2 GB. I am trying to add FP8_Scaled right now; let's see if that can fix your issue.
Furkan Gözükara
2025-10-04 08:53:26 +0000 UTCRTX 4060 with 8 GB VRAM and 64 GB RAM is not working; it seems like a memory issue.
b
2025-10-04 08:51:06 +0000 UTCThanks a lot. I am working to add more features today, hopefully.
Furkan Gözükara
2025-10-04 08:28:34 +0000 UTCYou are amazing Dr! thank you once again!!
Hipno
2025-10-04 03:05:33 +0000 UTCLooks like it's not going to work for me. Started fresh. 512X512. Max block swapping. Ran out of memory.
DanO..
2025-10-04 03:03:10 +0000 UTCJust gave it a try. Ran out of memory with RTX 3060 12GB VRAM. I lowered resolution and maxed out swapped blocks but wouldn't generate. It appears as though it leaves system memory (40GB) and VRAM maxed out after failed attempt. Needs to clean it out after failure or add some mechanism to clear out memory. Restarting to try with lower resolution and maxed block swapping.
DanO..
2025-10-04 03:02:16 +0000 UTC