I have 6 machines I can render on: one has a GTX1080 8GB graphics card, one a GTX1660 Super 6GB, three have GTX1060 6GB and one a GTX970 4GB graphics card (see the note at the end about this card: it's really only 3.5GB!).
I was rendering a scene with Farmerjoe using all six and noticed that a few of the PCs were taking a lot longer than the others. The normal render time on a GTX1060 was 10 seconds, but these were taking well over a minute. The 1060s slowed down at certain points in the animation, but the 970 did it all the time.
The scene I was rendering had suddenly got bigger: I had duplicated everything in it twice (I did have reasons), and that's when the problems started. The thing was, nothing in the scene had very big geometry. I reduced a few textures that I had unwittingly made too large, but it made no difference. Because I had duplicated the scene by duplicating collections, I could turn those collections off, and that fixed the problem. But I wanted to know what was contributing most, and from what I'd read, graphics card memory (or the lack of it) was the likely cause.
There aren't really any good tools for looking at how memory is allocated on a graphics card. The only ones you have are Task Manager in Windows (the GPU view) and, on Linux, the command
watch -n 2 nvidia-smi
where the number is the interval in seconds between updates, so this reruns nvidia-smi every 2 seconds. I set it to 1.
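Rather than watching the screen, you could also have nvidia-smi log the figures to a CSV file and pick out the peak afterwards. Here's a minimal sketch, assuming one NVIDIA card per machine. `log_gpu`, `peak_mem` and `gpu_log.csv` are hypothetical names made up for illustration; `--query-gpu`, `--format` and `-l` are standard nvidia-smi options.

```shell
# Hypothetical helper: sample used memory, total memory and power draw
# once per second (-l 1) until you press Ctrl+C. With noheader,nounits
# each line is something like "3120, 4024, 139.00" (MiB, MiB, W).
log_gpu() {
  nvidia-smi --query-gpu=memory.used,memory.total,power.draw \
             --format=csv,noheader,nounits -l 1 > "${1:-gpu_log.csv}"
}

# Hypothetical helper: read that CSV on stdin and print the highest
# memory.used value (first column) seen during the render.
peak_mem() {
  awk -F',' 'NR == 1 || $1 + 0 > max { max = $1 + 0 } END { print max }'
}
```

Start `log_gpu` in one terminal, render, press Ctrl+C when the frame is done, then run `peak_mem < gpu_log.csv`. On a machine with several cards nvidia-smi writes one line per card per sample, so `peak_mem` would report the highest figure across all of them.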
The 'VRAM' figure in Blender's status bar is pretty useless, because graphics card memory only fills up when you actually render. The render window shows a 'Peak Mem' figure, but that's system memory, so it isn't useful either.
Task Manager on Windows blocks screenshots, so these are photos from my phone. The first image is Blender open but not rendering.
The second image is with Blender rendering.
This is an 8GB card, and as you can see, a 4GB card would struggle here.
My GTX970 4GB is in a Linux box, so I can't use Task Manager.
Using my nvidia-smi command while not rendering, I get this:
Not much graphics memory in use.
Next is a scene about as big as the card can render. Graphics memory is up to 3120MB and power is 139W: the card is working hard and the scene renders quickly.
Finally, if I increase the size of the scene, the graphics card is hardly used at all; I assume it falls back to the CPU. GPU memory use has dropped considerably and the card is drawing 80W. That compares with 13W when idle, so it's doing something, but not as much, and the render takes maybe 8 times longer.
So what to do?
After a lot of playing about removing geometry, I finally realised it was the lights that were causing the problem.
For this blog I had a scene that rendered within the 4GB limit, using 3315MB. When I removed 4 point lights, the card used 2928MB: a drop of 387MB, or roughly 100MB per light. The scene that wouldn't render had 36 lights, so if this calculation holds, it's not surprising: that's roughly 3.5GB for lighting alone.
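Here's the per-light arithmetic spelled out as a small shell calculation. It assumes, as the paragraph does, that memory grows linearly with the number of lights; that's an extrapolation from one test, not a measurement. The variable names are just for illustration.

```shell
# Figures from the GTX970 test scene (MB, as reported by nvidia-smi)
with_lights=3315      # scene with the 4 point lights
without_lights=2928   # same scene with them removed
lights_removed=4

# Shell arithmetic is integer-only: 387 / 4 = 96.75, truncated to 96
per_light=$(( (with_lights - without_lights) / lights_removed ))

# Assumption: memory grows linearly with the number of lights
lights_in_big_scene=36
lighting_total=$(( lights_in_big_scene * per_light ))

echo "about ${per_light}MB per light"
echo "${lights_in_big_scene} lights: about ${lighting_total}MB"
```

That prints about 96MB per light and 3456MB for 36 lights, so around 3.5GB before any geometry or textures are loaded.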
The highest memory usage I got from the 970 4GB card by switching lights on and off was 3411MB. Total memory on the card reads as 4024MB, so about 600MB was never used; it looks as if there's a safety buffer of 500MB or so. [UPDATE: and I was right. Read this article, which explains why it's really only 3.5GB! https://wccftech.com/nvidia-geforce-gtx-970-memory-issue-fully-explained/ ]
Leaving lights enabled for rendering but setting their power to 0W has the same effect as disabling them. Turning them back up, even to just 100mW, brings the memory problem back.
You can't animate the 'Disable in Renders' button, so animating a lamp's power down to 0W while you're away from it looks like the best option. And if you make linked duplicates of a group of lamps that sit close to each other, you can animate them all in one go.
It's nice to know I don't have to restrict my geometry. I just need to be cleverer about lighting, and my project is back on track.
Hope this was useful. L