Today's task is to use swap chains to avoid the expensive BitBlt.
I am using this MSDN article as a guide.
The original code retains these objects:
ComPtr<ID2D1DeviceContext> m_target;
ComPtr<IDXGISwapChain1> m_swapChain;
ComPtr<ID2D1Factory1> m_factory;
I have renamed these to:
m_target : pD2DC
m_swapChain : pSwapChain
m_factory : pD2DFactory
I originally dropped the ComPtr<> wrappers, but then decided they do provide some value: they automatically call Release() when they go out of scope.
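To make the Release() point concrete, here is a minimal portable sketch. FakeUnknown and ScopedCom are hypothetical stand-ins invented for this illustration; they are not the real IUnknown or ComPtr<>, and they only mimic the reference-counting behavior described above.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM object: counts references the way
// AddRef()/Release() do and records when the count reaches zero.
struct FakeUnknown {
    int refs = 1;            // a freshly created object starts with one reference
    bool destroyed = false;  // a real object would free itself at this point
    void AddRef() { ++refs; }
    void Release() { if (--refs == 0) destroyed = true; }
};

// Hypothetical, heavily simplified ComPtr-like wrapper (no copy, no move).
template <typename T>
class ScopedCom {
public:
    explicit ScopedCom(T* p = nullptr) : p_(p) {}
    ScopedCom(const ScopedCom&) = delete;
    ScopedCom& operator=(const ScopedCom&) = delete;
    ~ScopedCom() { if (p_) p_->Release(); }  // the automatic Release()
    T* operator->() const { return p_; }
    T* Get() const { return p_; }
private:
    T* p_;
};

// Returns true if the object had been released once the wrapper's scope ended.
bool ReleasedOnScopeExit() {
    FakeUnknown obj;
    {
        ScopedCom<FakeUnknown> ptr(&obj);  // wrapper takes over the initial reference
        assert(!obj.destroyed);            // still alive inside the scope
    }                                      // ~ScopedCom runs here and calls Release()
    return obj.destroyed;
}
```

Without the wrapper, every early return in the setup code would need its own matching Release() call.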
The switch to SwapChains requires pervasive changes to the original code, which drew to an "offscreen image" and then painted that image to the screen. The offscreen code would derive a compatible DC (hdcImg) and bitmap (hbmImg) from the onscreen DC, draw everything using hdcImg, then BitBlt from hdcImg to hdcDst during the WM_PAINT message. The draw phase was quick; the expensive operation was copying the pixels from hbmImg to the onscreen device. I am not sure why this was so slow. It seems the system should be smart enough to do this entirely inside the GPU, but the fact that the BitBlt took 14ms suggests it was using the CPU to copy the pixels. Using D2D and SwapChains should keep everything inside the GPU, and the final "paint" operation should involve nothing more than changing a GPU pointer, which is nearly instantaneous.
The flow of the rendering code using hdcImg was:
CreateCompatibleDC > CreateCompatibleBitmap
Update Data
Redraw hdcImg
Invalidate hWnd
WM_PAINT
BitBlt(Pnt.hdc,hdcImg)
The flow using SwapChains will be:
Create ID2D1DeviceContext
Create SwapChain (with 2 buffers)
Run {
    Update Data
    BeginDraw
    Redraw
    EndDraw
    Present (Swap buffers)
}
The WM_PAINT message is no longer used. The display is updated whenever there is a Data change, which triggers a redraw to the back buffer. The back buffer is presented as soon as the redraw completes.
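The "swap instead of copy" idea behind this flow can be sketched in portable C++. TwoBufferChain and RunFrame are hypothetical names for this illustration only. This is not the DXGI API, and a real swap chain may treat the old front buffer's contents differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a swap chain with 2 buffers.
class TwoBufferChain {
public:
    TwoBufferChain(int width, int height) {
        assert(width > 0 && height > 0);
        for (auto& buffer : buffers_)
            buffer.assign(static_cast<std::size_t>(width) * height, 0u);
    }
    // The hidden buffer that the next frame is drawn into.
    std::vector<std::uint32_t>& Back() { return buffers_[1 - front_]; }
    // The buffer currently "on screen".
    const std::vector<std::uint32_t>& Front() const { return buffers_[front_]; }
    // Presenting only flips an index; no pixels are copied.
    void Present() { front_ = 1 - front_; }
private:
    std::array<std::vector<std::uint32_t>, 2> buffers_;
    int front_ = 0;
};

// One pass of the Run loop: redraw into the back buffer, then present it.
void RunFrame(TwoBufferChain& chain, std::uint32_t color) {
    for (auto& pixel : chain.Back()) pixel = color;  // BeginDraw / Redraw / EndDraw
    chain.Present();                                 // Present (swap buffers)
}
```

After RunFrame() returns, Front() refers to the very same memory that was just drawn into, which is the whole point: the cost of presenting does not depend on the number of pixels.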
The Rocket project now compiles and builds using the SwapChains code.
The call to pDxFactory->CreateSwapChainForHwnd() fails with the exquisitely unhelpful error "bad parameter".
I tried to install the Direct2D Debug Layer, but the download link on the Microsoft site is broken. Fortunately, someone was kind enough to provide direct links.
I was not able to link to the debug DLL, and in any case it only covers Direct2D. The failing call is CreateSwapChainForHwnd(), which belongs to DXGI on the Direct3D side.
Found it. I needed to clear the SwapProp struct first.
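A minimal sketch of what "clearing the struct first" means. SwapPropSketch is a hypothetical stand-in with a few fields loosely modelled on a swap chain description; it is not the real DXGI struct, and the field list is an assumption made for illustration. The idea is simply that every field you do not set explicitly should be zero rather than stack garbage.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the swap chain description struct.
struct SwapPropSketch {
    unsigned Width;
    unsigned Height;
    unsigned Format;
    unsigned SampleCount;
    unsigned BufferUsage;
    unsigned BufferCount;
    unsigned Flags;
};

// C++ style: "= {}" value-initializes the struct, so every field starts at zero.
SwapPropSketch MakeSwapProp() {
    SwapPropSketch prop = {};
    prop.SampleCount = 1;   // only the fields we care about are filled in
    prop.BufferCount = 2;   // 2 buffers, as in the flow above
    assert(prop.Flags == 0);  // untouched fields are guaranteed to be zero
    return prop;
}

// C style: the same effect for a struct that was declared without "= {}".
void ClearSwapProp(SwapPropSketch& prop) {
    std::memset(&prop, 0, sizeof(prop));
}
```

Declaring the struct as a plain local without either form leaves the unset fields indeterminate, which would explain an error that comes and goes between builds.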
All the initialization seemed to succeed, but nothing is visible.
Closer... The screen is now being painted once, but the WM_TIMER message doesn't seem to fire. This turned out to be an optimizer obfuscation.
OK. I now have the screen updating. The Draw times are a bit disappointing, about 25ms. This is about the same time required to perform the BitBlt, so it seems I took a long trip to nowhere. I need to do a bit more evaluation to be sure.
It appears the SwapChains method is achieving 4fps with 101 sprites, or 6fps when running natively on Pogo. Very disappointing.
My original WM_PAINT method also runs at 4fps, with far less code, and simpler code at that.
So the question is, how does Doom3 achieve 60fps while painting a whole lot more pixels?
I think my SwapChains code is not actually using the GPU. Yes it is, it just isn't any faster than my Paint approach. Or, conversely, my Paint method is just as fast as SwapChains without a lot of the complexity.
OK, this was a display error. The WM_PAINT code is actually achieving 40.8fps (60fps on Pogo) and the SwapChains code is running at 41fps (60fps on Pogo). This is better, but there is still not enough of a difference between the two to justify all the trouble required to use SwapChains.
I created a "tight loop" version that spawns a thread that calls GameUpdate() and ImgPaint() in a tight loop. It maxes out at 60fps on Pogo.
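Roughly, the tight-loop version has the shape sketched below. GameUpdate() and ImgPaint() here are empty placeholders standing in for the real functions, and TightLoop(), MeasureFps() and g_spriteX are hypothetical names. The sketch uses std::thread so it stays portable; the actual project may well create its thread differently.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

double g_spriteX = 0.0;                    // hypothetical game state

void GameUpdate() { g_spriteX += 0.01; }   // placeholder for the real update
void ImgPaint() {}                         // placeholder for the real draw + present

// Calls GameUpdate() and ImgPaint() back to back until asked to stop.
long TightLoop(const std::atomic<bool>& stop) {
    long frames = 0;
    while (!stop.load()) {
        GameUpdate();
        ImgPaint();
        ++frames;
    }
    return frames;
}

// Spawns the loop on a worker thread, lets it run for `window`, returns frames per second.
double MeasureFps(std::chrono::milliseconds window) {
    assert(window.count() > 0);
    std::atomic<bool> stop{false};
    long frames = 0;
    const auto start = std::chrono::steady_clock::now();
    std::thread worker([&] { frames = TightLoop(stop); });
    std::this_thread::sleep_for(window);
    stop = true;
    worker.join();  // after join() it is safe to read `frames`
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return static_cast<double>(frames) / elapsed.count();
}
```

With empty placeholders the number this reports is meaningless; it only becomes a frame rate once ImgPaint() really draws and presents, at which point any wait inside the present call sets the ceiling.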
The upper limit seems to be 60fps. I see 60fps when there are 420 sprites or 1.
There is a definite performance penalty to running in the VM, where the max frame rate is around 40fps.
If I set the SyncInterval argument to 0 in the call to pSwapChain->Present() I can achieve an FPS as high as 120 in the VM, although it fluctuates quite a bit. The Paint method seems to be capped at 60fps; I am assuming there is an implicit SyncInterval wait.
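In the real API, SyncInterval is the first argument of Present(): 0 presents without waiting, 1 waits for the next vertical blank. The effect on the frame rate can be imitated in portable code. VsyncSketch and CountFrames are hypothetical names, the 60 Hz refresh rate is an assumption, and the vertical blank is faked with a sleep, so this is only a model of the behavior.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical display model that "ticks" at a fixed refresh rate.
class VsyncSketch {
public:
    explicit VsyncSketch(double hz)
        : period_(std::chrono::duration_cast<Clock::duration>(
              std::chrono::duration<double>(1.0 / hz))),
          next_(Clock::now() + period_) {}

    // Mimics Present(syncInterval, 0): 0 returns at once, 1 waits for the next tick.
    void Present(unsigned syncInterval) {
        if (syncInterval == 0) return;
        std::this_thread::sleep_until(next_);
        const auto now = Clock::now();
        while (next_ <= now) next_ += period_;  // skip any ticks we missed
    }

private:
    Clock::duration period_;
    Clock::time_point next_;
};

// Counts how many frames get presented inside `window`.
long CountFrames(unsigned syncInterval, std::chrono::milliseconds window) {
    assert(window.count() > 0);
    VsyncSketch display(60.0);  // assumed 60 Hz refresh
    long frames = 0;
    const auto end = Clock::now() + window;
    while (Clock::now() < end) {
        display.Present(syncInterval);
        ++frames;
    }
    return frames;
}
```

With syncInterval set to 1 the count can never exceed the number of ticks in the window, no matter how cheap the frame is, which matches the 60fps ceiling seen above. With 0 the loop runs as fast as the work allows.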
Running natively on Pogo, the SwapNoSync version hits a staggering 1800fps with a sustained 1300fps! This is with 100 sprites; the FPS drops steadily as the number of sprites grows. With 300 sprites I see 700fps.
Conversely, this should mean I can have many more sprites if I turn SyncInterval back on. I can run 1000 sprites at a steady 60fps, and the action is much smoother.
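As a back-of-the-envelope check, the two unsynced measurements (100 sprites at a sustained 1300fps, 300 sprites at 700fps) can be fitted to a straight line: frame time = fixed cost + sprites x per-sprite cost. The linear model is an assumption, and FrameCostModel, FitTwoPoints and MaxSpritesAt are hypothetical names used only for this estimate.

```cpp
#include <cassert>

// Assumed model: seconds per frame = fixedSec + sprites * perSpriteSec.
struct FrameCostModel {
    double fixedSec;
    double perSpriteSec;
};

// Fits the model through two (sprite count, fps) measurements.
FrameCostModel FitTwoPoints(double sprites1, double fps1,
                            double sprites2, double fps2) {
    assert(sprites1 != sprites2 && fps1 > 0.0 && fps2 > 0.0);
    const double t1 = 1.0 / fps1;  // seconds per frame at the first point
    const double t2 = 1.0 / fps2;  // seconds per frame at the second point
    const double perSprite = (t2 - t1) / (sprites2 - sprites1);
    return {t1 - perSprite * sprites1, perSprite};
}

// Largest sprite count whose predicted frame time still fits the target rate.
double MaxSpritesAt(const FrameCostModel& model, double targetFps) {
    return (1.0 / targetFps - model.fixedSec) / model.perSpriteSec;
}
```

Calling MaxSpritesAt(FitTwoPoints(100, 1300, 300, 700), 60) gives roughly 4,900 sprites: about 0.44ms of fixed cost per frame plus about 3.3 microseconds per sprite, against a 16.7ms budget. That is only an extrapolation from two data points, but it agrees with 1000 sprites holding 60fps comfortably.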
The Paint version is also able to hit 60fps with 1000 sprites, 2800 sprites at 58fps! 3200 sprites at 59fps at full screen! 5000 sprites, full screen, 40fps.
I now have 2 versions of Rocket: Paint and SwapChains. The Paint method creates an offscreen bitmap, draws on it, then uses BitBlt to present it in the window. SwapChains uses two buffers, one visible and the other hidden, draws to the hidden ("back") buffer, then swaps the buffers for each frame. With SyncInterval enabled, the performance of the two versions is identical. I prefer the Paint method because the code is simpler, it does not rely on Direct3D, and it is closer to the traditional GDI approach. SwapChains is better only when raw frames per second matters much more than smoothness, which is never. The only other reason to use SwapChains is to have access to some of the Direct3D interfaces, such as the antialiasing functions.
This is the API for MyD2D, SwapChains method. The most significant difference is that the API uses an ID2D1DeviceContext to draw everything.
The code behind MyD2D is here: MyD2D: SwapChains Method