
GPU Benchmark for testing Mistika Boutique's performance


Cristobal Bolaños


Hi everyone!

We've created an environment for testing your system's performance capabilities. It contains a 4-second video with 80 layers of color correction applied. From there, you can remove layers until you obtain real-time playback or audio synchronization.

You can find the environment attached to this post. It is a 1920x1080, 24 fps timeline. The video file you have to relink can be found here

This is how the environment looks:

[Screenshot: the benchmark environment timeline with the 80 color correction layers]

 

It would be great if you could share your results with us, describing your system specifications and the maximum playback you obtain. This way we can gather useful information for many users.

My system is the following:

Intel(R) Xeon(R) X5550 CPU @ 2.67 GHz, dual processor
12 GB RAM
64-bit
Quadro GV100 graphics card

These are my results:

- Capable of playing 45 color grade effects with real-time playback.
- Synchronization lost at 36 color grade effects.
- Capable of rendering the 80 layers at 22.5 frames per second.

Not bad!

Thanks a lot in advance for your collaboration.

Have a great day!
Cheers,
Cristóbal

Standard_GPU_Benchmark_For_Testing_Mistika_Boutique_SGO.env


Fantastic. Just what I was looking for as I'm testing a different GPU for my system.

One thing I'm specifically looking for in this test - a way to see if Mistika can take advantage of the full duplex loading of the textures. Do you know if this test will stress this?

Also, a second question: if I have multiple GPUs in the system, is there a way of telling Mistika which GPU to use? I looked in mConfig and didn't see anything. I recall a support article mentioning this, but can't find it right now. I'm thinking of leaving one 2080 Ti in, side by side with the Quadro RTX 6000, so it's easier to compare the two. Related question: can I tell Mistika to use one for the UI and the other for all processing?

I know Boutique is not multi-GPU enabled, which is fine. But can it be told which GPU to use if multiple are present?


My system is the following:

iMac 5K 2017
4.2 GHz i7
40 GB RAM
AMD Radeon Pro 580 8 GB

These are my results:

- Capable of playing 18 color grade effects with real-time playback.
- Synchronisation lost after 18 color grade effects.
- Capable of rendering the 80 layers at 5.7 frames per second.


On 12/11/2019 at 4:22 AM, jan1 said:

 

One thing I'm specifically looking for in this test - a way to see if Mistika can take advantage of the full duplex loading of the textures. Do you know if this test will stress this?

It will use it if available; there is nothing to configure. To test it, do a "complicated" playback (for example, a Comp3D with some layers at a resolution that is near real time on your system but not quite real time; this is easier to evaluate in UHD 60p modes), and make sure disk speed is not the bottleneck. Then write down how many fps you get in each case. Use one GPU and then the other, and in both cases do one playback test to the GUI monitor and another to the SDI output. The GUI playback may be similar for both models (as in that case there is no need to download images from the GPU), while playback to the SDI board should be faster on the Quadro, as it can download images for the SDI output at the same time that the next frames are being uploaded.

Quote

Also, a second question: if I have multiple GPUs in the system, is there a way of telling Mistika which GPU to use? I looked in mConfig and didn't see anything. I recall a support article mentioning this, but can't find it right now. I'm thinking of leaving one 2080 Ti in, side by side with the Quadro RTX 6000, so it's easier to compare the two. Related question: can I tell Mistika to use one for the UI and the other for all processing?

I know Boutique is not multi-GPU enabled, which is fine. But can it be told which GPU to use if multiple are present?

 

Regarding GPU selection for Mistika, an easy way to choose the GPU is to connect the monitor to the one you want. By default, Mistika will use the GPU connected to the display where it is launched.

 

Another way is to select the Boutique icon and use the NVIDIA (or graphics processor) applet in the Control Panel to define which GPU is used for it.

 

Alternatively, right-click the Boutique icon to open the contextual menu; there should be an option to choose the graphics processor to use. But this method requires doing it every time.

 

Regarding the other question, Mistika cannot directly use one GPU for the GUI and another for processing. But you can launch independent render jobs (.rnd files) on the other GPU, either by running the "PathToMistika.exe/mistika.exe -r PathToRNDFile" command in a console running on the other GPU, or by configuring a third-party render manager like Smedge to render jobs on the other GPU (launch Smedge on the other GPU using the techniques above, or check the Smedge documentation for other ways to configure this).
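As a rough sketch only (not an official SGO tool; the paths below are placeholders), a small Python script can queue several .rnd files against that same "mistika.exe -r" command, one after another:

```python
# Rough sketch: queue Mistika render jobs (.rnd files) sequentially using the
# "mistika.exe -r <rnd file>" command mentioned above.
# MISTIKA_EXE and RENDER_DIR are placeholder paths; adjust for your install.
# GPU selection is not handled here; it follows the methods described above
# (the GPU of the display/session you launch from, or the per-application
# graphics processor setting).
import subprocess
from pathlib import Path

MISTIKA_EXE = r"C:\Path\To\Mistika\mistika.exe"   # placeholder path
RENDER_DIR = Path(r"D:\RenderJobs")               # placeholder folder of .rnd files

for rnd_file in sorted(RENDER_DIR.glob("*.rnd")):
    print(f"Rendering {rnd_file.name} ...")
    # run() blocks until each render finishes, so the jobs run one at a time
    subprocess.run([MISTIKA_EXE, "-r", str(rnd_file)], check=True)
```

The same idea is what a render manager like Smedge automates for you across machines and GPUs.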

 


Here are my results:

Real-time playback with all 80 layers, no loss of sync indicated.
I added more layers, up to 100, and it's still real time. GPU load is at 100%.
But I can't hear audio; something is wrong in the routing. I can't hear audio even with the clip on its own.

System:

Puget Systems build, 16-core i9 CPU at 3.1 GHz, boosting to 4 GHz
128 GB RAM
1 Quadro RTX 6000 GPU


I've created a slightly different test timeline in light of the full-duplex copy.

1st media clip: https://arriwebgate.com/directlink/b043cddb9c4658e0/1684269 (Alexa Mini LF RAW F003C106)
2nd media clip: https://arriwebgate.com/directlink/77689a9b6ef6ddc1/995144  (Alexa Mini Arri RAW M002C006)

both from https://www.arri.com/en/learn-help/learn-help-camera-system/camera-sample-footage

 

The timeline has three stacks. They use UHD or larger footage and ACES color. In the first stack I did a combination of keyed color and blur (like Cristobal's example) and also added temporal NR, then stacked these pairs three times.

The other two stacks have groups of the same footage to create additional textures. They're then combined via Comp3D and overlay. The second stack uses a combination of key, track and blur. The last one stacks higher but omits the blur, which I believe is more CPU-focused.

The first stack tests more GPU compute and also memory. It starts using more VRAM until it runs out and then swaps. Interestingly enough, once it starts swapping it never releases the GPU memory. You have to exit and restart Mistika to get the better performance back.

1st stack: 6.8fps until VRAM fills up, then 2.6fps

2nd stack: 6.4fps

3rd stack: 4.8fps

On the second and third stack I start seeing up to 8% copy utilization in task manager which is what I was after for testing bus management.

This is on the system with the Quadro RTX 6000. Later today I'll install both the 2080 TI and Quadro and compare the two against each other.

Obviously an extreme scenario to stress test a system.

 

MistikaTest_1.env


I've created a somewhat similar test in Resolve so I can compare them. They're not identical matches, as it's not easy to set TNR and denoise to the same parameters, and the same goes for blur. But they're broadly very similar.

I've now put the 2080TI back into the system, so I can run Mistika on either one for easier comparison.

Here are the results I'm seeing on the Quadro RTX 6000

Mistika (with AJA I/O)

TNR 6.8fps peak, then 2.6fps after VRAM fills up
8 Node w/ blur at 6.4fps
12 Node w/o blur at 4.8 fps

Resolve (with DeckLink I/O):

TNR 10fps steady
8 Node w/ blur at 21fps
modified 8 Node w/ blur at 13fps

(In the first Resolve test I have a layer mixer with 8 inputs. I believe Resolve realizes it's the same source media, so for the modified test I created 8 separate copies of the clip and layered them on the timeline with overlay mode, not a layer mixer, to approximate what we see in Mistika.)

 

After forcing both Resolve and Mistika to use the RTX 2080TI instead of the Quadro RTX 6000 I did not find any noticeable differences in render speeds. I verified utilization via task manager to make sure the right GPU would be used.

Where does that leave me? With a few questions....

1) It seems that while full-duplex bus management is in theory an advantage, it doesn't appear to be a frequent bottleneck. At least I have not been able to construct a test that demonstrates it.

2) While the two tests are an approximate pair due to tool differences, it does appear Resolve has much more efficient playback as it outperformed Mistika by 2 to 3x.

Of course this is not an exhaustive test, as I was looking for one specific aspect, which hasn't been borne out yet. At present the extra cost of the Quadro card doesn't show up in performance differences, long-term stability aside.

 

To be continued....


Hi Jan,

 

Depending on what you are doing, Mistika will not return graphics memory buffers (nor buffers in RAM). This is on purpose: in order to avoid memory fragmentation, Mistika has its own memory manager; it keeps memory buffers and only reuses them for images of the same size.

 

The side effect is that if your timeline contains images of different sizes or different color spaces it will use more memory, but that is because performance is the priority by default. If you need to reduce memory usage you can reduce the related parameters in mConfig (Max Cache Memory, ring buffer, pipe units, render units, ...). Alternatively, restart the Mistika session from time to time, depending on the content.

Also, please keep an eye on Task Manager for memory usage during playback. Memory use (both GPU and RAM) should only increase between clips with different needs, not between frames of the same clip (in that case we would be talking about a potential memory leak / software bug).
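If it helps, a simple script can log GPU memory during a playback so the shape of the curve is easier to judge afterwards. This is just a sketch, and it assumes an NVIDIA GPU with nvidia-smi available on the PATH:

```python
# Sketch: sample GPU memory once per second while a clip plays back.
# Memory that keeps climbing within a single clip (instead of stepping up only
# between clips with different needs) would match the potential-leak case above.
# Assumes an NVIDIA GPU and that nvidia-smi is available on the PATH.
import subprocess
import time

def gpu_mem_used_mib(gpu_index=0):
    """Used GPU memory in MiB for the given GPU, queried through nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}",
         "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())

if __name__ == "__main__":
    # Start the playback in Mistika, then let this run for the clip duration.
    for _ in range(60):
        print(f"{time.strftime('%H:%M:%S')}  {gpu_mem_used_mib()} MiB")
        time.sleep(1)
```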

Please let us know the differences after testing with the other board. Your tests are very interesting.

 

 


I finally did find a difference with the Quadro card. It's actually not so much in the heavy-duty benchmark tests, but with just plain playback of footage. I think the heavy-duty benchmark tests get so bogged down in processing that copy bandwidth is not the bottleneck.

But taking just the plain 8K 60fps RED clip (the shark), I get a distinct 10fps advantage in playback rate on the Quadro card compared to the 2080. I should caveat that this is with the new CUDA RED debayer, so the GPU does the heavy lifting for the RED footage, which presumably adds extra I/O overhead on the GPU.

Putting this clip on an 8K 60fps timeline, and configuring output via AJA to UHD 60fps, I get the following results:

(without I/O)

RTX 6000: 53-60fps for RED clip as is
RTX 6000: 54-60fps for RED transcoded into 8K 60fps ProRes HQ

2080TI: 42-43fps for RED clip as is
2080TI: 36-45fps for RED clip transcoded

(with AJA I/O)

RTX 6000 - same numbers
2080TI: 46-48fps
2080TI: 25fps for RED clip transcoded; this is a curious result, not sure why yet

So the Quadro card can get a leg up if you have super high res / super fast footage and are doing RED decode. 

 


On 12/12/2019 at 7:13 AM, Javier Moreno said:

Also, please keep an eye on Task Manager for memory usage during playback. Memory use (both GPU and RAM) should only increase between clips with different needs, not between frames of the same clip (in that case we would be talking about a potential memory leak / software bug).

That makes sense about not returning memory and using internal memory management. Keep in mind that this is totally fine on a dedicated system, but could pose problems in the case of Boutique, where other apps may also be running. That means I'll have to quit Boutique instead of switching to Premiere or other software that may make extensive use of the GPU. At times as I conform I do keep multiple applications open.

In this scenario there may be in fact a memory leak or software bug. 

This is running the first clip in my new test. It starts playing at 7fps as you see the VRAM start ramping up. Once it hits the maximum at 24GB and flattens out, it continues to run, but playback drops to 2-3fps (all within a single clip). Presumably Mistika cannot find any new memory and has to start swapping memory out. Maybe a case where buffers do not get re-used properly. Render speed within the same clip should remain constant, absent anything changing, right?

[Screenshot: Task Manager graph of GPU memory climbing steadily to the 24 GB maximum during playback]


In all those examples the bottleneck is the effect processing, not the upload and download of images to/from the GPU. So full duplex cannot help there, because at such low fps the PCIe bus is underutilized anyway. To take advantage of it you would need to build an example that is close to real time, for example a simpler stack that can deliver speeds close to UHD 60p. That scenario is where having the Quadro can make the difference between real time and non-real time.

Also, if your storage is fast enough I would recommend using uncompressed images like the Mistika js format, EXR raw, or DPX. This avoids dependence on third-party codecs like ARRI in a benchmark that is supposed to be about GPUs.

Apart from this, one difference that should already be noticeable in your example is the advantage of the double amount of GPU memory in the Quadro, as you said you are using all of it.

 

Even with that, the GeForce has a much better performance/price ratio and it is more than enough in many cases; there is no doubt about it.

 


On 12/12/2019 at 7:32 AM, Javier Moreno said:

Also, if your storage is fast enough I would recommend using uncompressed images like the Mistika js format, EXR raw, or DPX. This avoids dependence on third-party codecs like ARRI in a benchmark that is supposed to be about GPUs.

I will experiment with that. My main drive is a 1TB M.2 SSD and I'm finding 2.4GB/s transfer speeds.
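As a quick back-of-the-envelope check on whether that drive could feed uncompressed playback (my own rough math, assuming 10-bit RGB DPX at 4 bytes per pixel):

```python
# Rough estimate only: uncompressed UHD 60p data rate vs. the ~2.4 GB/s the
# drive measures. Assumes 10-bit RGB DPX, which packs each pixel into 4 bytes.
width, height, fps = 3840, 2160, 60
bytes_per_pixel = 4                          # 10-bit RGB packed into 32 bits
frame_bytes = width * height * bytes_per_pixel
rate_gb_per_s = frame_bytes * fps / 1e9      # decimal GB/s, like drive specs
print(f"Frame size: {frame_bytes / 1e6:.1f} MB")
print(f"UHD 60p uncompressed: {rate_gb_per_s:.2f} GB/s (drive measures ~2.4 GB/s)")
```

That works out to roughly 2 GB/s for UHD 60p, which the drive should just about sustain; uncompressed 8K would clearly need faster storage.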

On 12/12/2019 at 7:32 AM, Javier Moreno said:

Even with that, the GeForce has a much better performance/price ratio and it is more than enough in many cases; there is no doubt about it.

I'm in agreement on that. The difference exists, but it requires extreme conditions to surface. Given the price difference, it is a luxury, not a necessity, to run with the Quadro card. I may keep it for the extra memory more than anything. If I were to start from scratch, the Titan RTX is probably the sweet spot, with 24GB of VRAM at a better price.


On 12/12/2019 at 12:30 PM, jan1 said:

That means I'll have to quit Boutique instead of switching to Premiere or other software that may make extensive use of the GPU. At times as I conform I do keep multiple applications open.

In a case like this you should reduce the memory-related parameters (Max Cache Memory, ring buffer, pipe units, render units, encode threads). That will free a lot of memory for other applications.

 

On 12/12/2019 at 12:30 PM, jan1 said:

In this scenario there may be in fact a memory leak or software bug. 

 

[Screenshot: Task Manager graph of GPU memory climbing steadily to the 24 GB maximum during playback]

I totally agree; that memory curve is typical of memory leaks. Otherwise it should look like a staircase. What is the exact date of your Boutique version?

The first thing I would recommend is to render the source images to another format and test again, as this will show whether the memory leak comes from the decoder or from one of the effects (in that case you would need to remove the effects one by one to see which one is causing it).

Please, if you don't mind, open a support case about the memory leak so the developers can investigate it; that way it is much more efficient for them.

 


On 12/12/2019 at 7:45 AM, Javier Moreno said:

I totally agree; that memory curve is typical of memory leaks. Otherwise it should look like a staircase. What is the exact date of your Boutique version?

This is the latest release, version 8.10.0. I will file a ticket and provide the devs with the details.

Thanks for all your insight and help in testing this.


On 12/12/2019 at 12:40 PM, jan1 said:

 The difference exists, but it requires extreme conditions to surface. 

Well, it also depends on each client. The one case where the Quadro is mostly preferred is client-attended sessions, as in those cases it is very desirable to have as much real-time playback performance and stability as possible.


On 12/12/2019 at 12:20 PM, jan1 said:

But taking just the plain 8K 60fps RED clip (the shark), I get a distinct 10fps advantage in playback rate on the Quadro card compared to the 2080. 

That's an interesting test, thanks for the feedback.

BTW, for that particular format a key parameter is the Codecs-GPU frames setting. In general it should be 1 for the GeForce and 2 for the Quadro (you could also try 3, but I am afraid it could be a waste of memory).
