dev.nlited.com


Performance


2017-11-26 22:40:23 chip Page 2071 📢 PUBLIC

Nov 26 2017


I compile and test SimpleGrid inside a VMware virtual machine because it is convenient to carry my entire development machine around on a USB thumb drive, and it also makes debugging device drivers easier. Running inside a VM is just like running natively 99% of the time, but I need to be careful about that odd 1%, especially when working with graphics. The VM has to emulate a physical GPU and there are differences.

I have already run into problems with Direct2D where I need to specify a SOFTWARE device or some effects will crash. I am running into some strange behavior with Direct3D and Direct2D interop which may be caused by the VM.

Graphics performance in the VM is dramatically worse than running natively, even when compared to the mid- to low-end performance of the integrated Intel graphics on my system. I took some time to compare the performance of SimpleGrid running inside the VM against running it natively.

My System

I am running on a Skull Canyon NUC.

CPU 2.6GHz Skylake Core i7-6770HQ quad-core (boost up to 3.5GHz)
GPU Integrated Iris Pro 580 with 128MB eDRAM
RAM 32GB DDR4 2133MHz

Prep Work

I need to create some scaffolding before I can easily measure the performance with different parameters.

Command Line

I need to specify the number of vertices in the grid from the command line, which means adding some minimal text parsing.


Main.cpp command line:

    /*************************************************************************/
    /** Command line                                                        **/
    /*************************************************************************/
    #include <math.h>   // pow()
    #include <wchar.h>  // _wcsicmp()

    // Buffer capacity in characters, excluding the terminating null.
    #define STRSIZE(S) (sizeof(S)/sizeof(S[0]) - 1)
    static const WCHAR *SkipWhitespace(const WCHAR *Str);
    static WCHAR *GetWord(const WCHAR *&Str, WCHAR *pDst, UINT DstSz);
    static int GetInt(const WCHAR *&Str);
    static double GetFloat(const WCHAR *&Str);
    static void ParseOptShort(const WCHAR *&Str);
    static void ParseOptLong(const WCHAR *&Str);
    static void ParseArg(const WCHAR *&Str);
    static void ParseArgVertex(const WCHAR *Str);
    enum ARG_TYPE { ARG_NONE, ARG_VERTEX };
    static ARG_TYPE ArgType= ARG_NONE;

    void MainParseCmdLine(void) {
        const WCHAR *pCmd= GetCommandLineW();
        GetWord(pCmd,0,0);  //Skip over argv[0]
        while(*pCmd) {
            if(*pCmd=='-') {
                pCmd++;
                if(*pCmd=='-') {
                    pCmd++;
                    ParseOptLong(pCmd);
                } else {
                    ParseOptShort(pCmd);
                }
            } else {
                ParseArg(pCmd);
            }
        }
    }

    // Short options are consumed but not used yet.
    void ParseOptShort(const WCHAR *&Str) {
        WCHAR text[40];
        GetWord(Str,text,STRSIZE(text));
    }

    // --vertex: the next plain argument is the vertex count.
    void ParseOptLong(const WCHAR *&Str) {
        WCHAR text[40];
        GetWord(Str,text,STRSIZE(text));
        if(_wcsicmp(L"vertex",text)==0) {
            ArgType= ARG_VERTEX;
        }
    }

    void ParseArg(const WCHAR *&Str) {
        WCHAR text[260];
        GetWord(Str,text,STRSIZE(text));
        switch(ArgType) {
        case ARG_VERTEX: ParseArgVertex(text); break;
        default: break;
        }
    }

    void ParseArgVertex(const WCHAR *Str) {
        int vertexCt= GetInt(Str);
        // Ignore tiny or negative counts.
        if(vertexCt>10)
            g_game->SetVertexCt((UINT)vertexCt);
        ArgType= ARG_NONE;
    }

    // Skips line breaks as well, so a word that ends on '\r' or '\n'
    // cannot stall the parse loop.
    static const WCHAR *SkipWhitespace(const WCHAR *Str) {
        while(*Str==' ' || *Str=='\t' || *Str=='\r' || *Str=='\n')
            Str++;
        return(Str);
    }

    // Copy the next word (quotes stripped) into pDst and advance Str past it.
    // DstSz is the capacity excluding the terminator (see STRSIZE).
    // pDst may be null when DstSz is 0, which just skips the word.
    static WCHAR *GetWord(const WCHAR *&Str, WCHAR *pDst, UINT DstSz) {
        UINT nDst= 0;
        WCHAR isQuote= 0;
        for(;*Str;Str++) {
            if(!isQuote && (*Str=='\'' || *Str=='\"')) {
                isQuote= *Str;
            } else if(isQuote && *Str==isQuote) {
                isQuote= 0;
            } else if(!isQuote && (*Str==' ' || *Str=='\t' || *Str=='\r' || *Str=='\n')) {
                break;
            } else if(nDst<DstSz) {
                pDst[nDst++]= *Str;
            }
        }
        Str= SkipWhitespace(Str);
        // Always terminate, even when the word filled the buffer.
        if(pDst)
            pDst[nDst]= 0;
        return(pDst);
    }

    static int GetInt(const WCHAR *&Str) {
        int val= 0;
        int sign= 1;
        Str= SkipWhitespace(Str);
        if(*Str=='+') {
            Str++;
        } else if(*Str=='-') {
            sign= -1;
            Str++;
        }
        while(*Str>='0' && *Str<='9') {
            val= val*10 + (*Str++ - '0');
        }
        return(sign*val);  // Apply the sign that was parsed above.
    }

    static double GetFloat(const WCHAR *&Str) {
        bool isNeg= false;
        UINT Integer= 0;
        UINT Fraction= 0;
        int FracDigits= 0;
        int Exponent= 0;
        Str= SkipWhitespace(Str);
        if(*Str=='+') {
            Str++;
        } else if(*Str=='-') {
            isNeg= true;
            Str++;
        }
        // Extract integer part.
        while(*Str>='0' && *Str<='9')
            Integer= Integer*10 + (*Str++ - '0');
        // Extract fractional part
        if(*Str=='.') {
            Str++;
            while(*Str>='0' && *Str<='9') {
                Fraction= Fraction*10 + (*Str++ - '0');
                FracDigits++;
            }
        }
        // Extract exponent
        if(*Str=='E' || *Str=='e') {
            bool isNegExp= false;
            Str++;
            if(*Str=='+') {
                Str++;
            } else if(*Str=='-') {
                isNegExp= true;
                Str++;
            }
            while(*Str>='0' && *Str<='9')
                Exponent= Exponent*10 + (*Str++ - '0');
            Exponent= isNegExp ? -Exponent:Exponent;
        }
        // Assemble integer, fraction, and exponent.
        double val= (double)Integer;
        if(FracDigits)
            val+= (double)Fraction*pow(10.0f,-FracDigits);
        if(Exponent)
            val= val*pow(10.0f,Exponent);
        if(isNeg)
            val= -val;
        Str= SkipWhitespace(Str);
        return(val);
    }

Setting the vertex count in Game.cpp:

Game.cpp:

    void Game::SetVertexCt(UINT vertexCt) {
        tileCt= (UINT)sqrt(vertexCt);
        tileSz= gridSz/(float)tileCt;
    }

I need to collect the performance statistics in the render loop. I want to record the time required for every frame and report the instantaneous frame rate for each frame. Simply averaging the frame count every x seconds loses too much information; it is better to keep a running FIFO of the high-resolution timestamps for each frame and use that to determine the frame rate.

This requires two records, a timestamp FIFO and a frame rate FIFO, both constructed from a deque. From this I can calculate the frame rate by simply taking the number of items in the timestamp FIFO divided by the expiration period. This frame rate is then stored in the frame rate FIFO to be used to draw the graph.

Game frame rates:

    class Game {
    private:
        void RenderTime(void);
        LARGE_INTEGER clkFreq;                  //Ticks per second; must hold the QueryPerformanceFrequency() value
        std::deque<LARGE_INTEGER> vecFrameTS;   //Timestamp FIFO
        std::deque<INT16> vecFrameRate;         //Frame rate FIFO, drawn as the graph
        UINT16 frameRate;                       //Instantaneous frame rate
    };

    // Record the instantaneous frame rate over the last 10 seconds.
    // I want the actual instantaneous frame rate, not an average.
    // vecFrameTS is a FIFO of the high-resolution timestamps for each frame.
    // Drop all timestamps that have expired (older than 10 seconds).
    // The instantaneous frame rate is the number of timestamps left in the
    // FIFO divided by the 10-second window.
    // The frame rate history for roughly the last 200 frames is stored in
    // vecFrameRate.
    void Game::RenderTime(void) {
        LARGE_INTEGER clkTick;
        QueryPerformanceCounter(&clkTick);
        vecFrameTS.push_front(clkTick);
        INT64 clkExpire= clkTick.QuadPart - clkFreq.QuadPart*10;
        while(vecFrameTS.size()>0 && vecFrameTS.back().QuadPart < clkExpire)
            vecFrameTS.pop_back();
        frameRate= (UINT16)(vecFrameTS.size()/10);
        while(vecFrameRate.size() > 200)
            vecFrameRate.pop_back();
        vecFrameRate.push_front(frameRate);
    }
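The same sliding-window idea can be tried outside the render loop. The following is a minimal, portable sketch, not the SimpleGrid code: the class name `FrameRateWindow` and its parameters are hypothetical, and timestamps are passed in as plain tick counts so that it does not depend on QueryPerformanceCounter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical helper: counts frames inside a sliding time window.
// Timestamps are raw tick counts supplied by the caller.
class FrameRateWindow {
public:
    FrameRateWindow(std::int64_t ticksPerSecond, std::int64_t windowSeconds,
                    std::size_t historyMax)
        : windowSeconds_(windowSeconds),
          windowTicks_(ticksPerSecond * windowSeconds),
          historyMax_(historyMax) {
        assert(ticksPerSecond > 0 && windowSeconds > 0 && historyMax > 0);
    }

    // Record one frame at time `now` (in ticks) and return the frame rate.
    unsigned OnFrame(std::int64_t now) {
        stamps_.push_front(now);
        // Drop every timestamp that has fallen out of the window.
        const std::int64_t expire = now - windowTicks_;
        while (!stamps_.empty() && stamps_.back() < expire)
            stamps_.pop_back();
        // Frames still in the window, divided by the window length.
        const unsigned rate = static_cast<unsigned>(
            static_cast<std::int64_t>(stamps_.size()) / windowSeconds_);
        // Keep a bounded history of rates for drawing a graph.
        history_.push_front(rate);
        while (history_.size() > historyMax_)
            history_.pop_back();
        return rate;
    }

    const std::deque<unsigned>& History() const { return history_; }

private:
    std::int64_t windowSeconds_;
    std::int64_t windowTicks_;
    std::size_t historyMax_;
    std::deque<std::int64_t> stamps_;
    std::deque<unsigned> history_;
};
```

One property of this approach is worth knowing: until a full window of timestamps has accumulated, the reported rate ramps up from zero, so the first few seconds of the graph under-report the real frame rate.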

The frame rate information is then presented as both text and a moving graph. I expanded the D3Text class to draw other 2D primitives, including lines, rectangles, and filled rectangles. This approach means I always have a record of both the historic frame rate and the precise time required for each frame.

Game::RenderText():

    void Game::RenderText(void) {
        text.Begin(swapChain.Get());
        text.SetColor(D2D1::ColorF(D2D1::ColorF::White,0.25f));
        text.FillRect(D2D1::Point2F(0,0),D2D1::Point2F(600,64));
        text.SetColor(D2D1::ColorF(D2D1::ColorF::Black));
        text.Write(L"Frame %06llu %ufps %u (%6.2f,%6.2f,%6.2f) %3.1f %3.1f"
            ,frameCt,frameRate,tileCt*tileCt
            ,viewPt.x,viewPt.y,viewPt.z
            ,XMConvertToDegrees(viewVec.y),XMConvertToDegrees(viewVec.z)
        );
        text.Rectangle(D2D1::Point2F(10.0f,40.0f),D2D1::Point2F(210.0f,60.0f));
        D2D1_POINT_2F pt0,pt1;
        pt0= { 10.0f,60.0f-std::min((INT16)120,vecFrameRate[0])*(20.0f/120.0f) };
        text.SetColor(D2D1::ColorF(D2D1::ColorF::Yellow));
        for(UINT n1=1;n1<vecFrameRate.size();n1++) {
            pt1= { 10.0f+n1,60.0f-std::min((INT16)120,vecFrameRate[n1])*(20.0f/120.0f) };
            text.Line(pt0,pt1);
            pt0= pt1;
        }
        text.End();
    }

Direct3D simple grid

SimpleGrid Performance Testing

I am using SimpleGrid to test performance by setting the number of vertices in the grid. I use the command line to specify a target for the total number of vertices, but since the grid is always square the actual number is the largest perfect square that does not exceed the target: int(sqrt(x))^2. This does not include the vertices used to draw the sun.
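The rounding to a square grid can be checked with a few lines of code. This is a small sketch, and the function name `ActualVertexCount` is hypothetical. The requested values in the usage note are assumed round numbers, not values taken from the author's test runs.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: the largest perfect square that does not exceed
// `requested`, i.e. the vertex count a square grid ends up with.
std::uint32_t ActualVertexCount(std::uint32_t requested) {
    std::uint32_t side = static_cast<std::uint32_t>(
        std::sqrt(static_cast<double>(requested)));
    // Correct for any floating-point rounding in sqrt().
    while (static_cast<std::uint64_t>(side) * side > requested)
        --side;
    while ((static_cast<std::uint64_t>(side) + 1) * (side + 1) <= requested)
        ++side;
    const std::uint32_t result = side * side;
    assert(result <= requested);
    return result;
}
```

For example, `ActualVertexCount(500000)` yields 499,849 (a 707 x 707 grid) and `ActualVertexCount(150000)` yields 149,769 (387 x 387), both of which appear as row labels in the results table.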

The relationship between vertices and polygons is roughly two vertices to each polygon. Each polygon is defined by three vertices, but two of the vertices are usually shared with the previous polygon.

VM Performance

The VM seems to be using a maximum refresh rate of ~60Hz while running natively is locked to 30Hz.

Frame Rate (fps)

                 Discrete          DrawTriangleList
    Vertices     VM      Native    VM      Native
    100          64      30        64      -
    40,000       63      30        64      -
    59,536       54      30        64      -
    79,524       45      30        64      -
    99,856       39      30        64      -
    149,769      28      30        62      -
    224,676      20      30        59      -
    349,281      13      30        51      -
    499,849      10      30        49      -
    599,076      8       30        48      -
    649,636      7       29        46      59
    749,956      6       24        FAIL    59
    1,000,000    4       19        FAIL    59
    1,500,000    -       -         FAIL    59
    2,000,000    -       -         FAIL    59
    3,000,000    -       -         FAIL    59
    4,000,000    -       -         FAIL    59
    5,000,000    -       -         FAIL    51
    6,000,000    -       -         FAIL    41
    7,000,000    -       -         FAIL    35
    8,000,000    -       -         FAIL    32
    9,000,000    -       -         FAIL    28

Direct3D simple grid

Wow! Even a mid-class integrated GPU is able to pump out over 20 million shaded vertices per second! I can safely draw half a million vertices per frame and sustain 30fps. The VM is nowhere near this level, topping out at 150K. The good news is that the VM is still able to render a million vertices, just at a much lower frame rate.

Amazing! Using a static vertex buffer, I can create a terrain of 9 million vertices before the native frame rate drops below 30fps! Static vertex buffers let almost all the code run on the GPU, with the CPU used only to update the shape positions and view point.

This gives me a good framework for estimating how many vertices I can manage in a game.
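One rough way to turn a measured row into a budget is to multiply vertices by frames per second to get a throughput, then divide by the target frame rate. The sketch below assumes throughput stays constant as the vertex count changes, which is a simplification: it ignores refresh-rate caps and fixed per-frame overhead. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Vertices processed per second for one measured row (vertices x fps).
std::uint64_t VerticesPerSecond(std::uint64_t vertices, unsigned fps) {
    return vertices * fps;
}

// Rough per-frame vertex budget at a target frame rate, ASSUMING the
// measured throughput stays constant at other vertex counts.
std::uint64_t VertexBudget(std::uint64_t verticesPerSecond, unsigned targetFps) {
    assert(targetFps > 0);
    return verticesPerSecond / targetFps;
}
```

Using the 9,000,000-vertex native run at 28fps, `VerticesPerSecond(9000000, 28)` gives 252,000,000 vertices per second, and `VertexBudget(252000000, 60)` suggests about 4,200,000 vertices per frame at 60fps. That is in line with the measured runs, which hold 59fps at 4,000,000 vertices and drop to 51fps at 5,000,000.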


NOTE: Disabling the compiler's optimizer for the CPU code drops native performance for 500,000 vertices from 30fps to 12fps! This is yet another reason why even debug builds should enable the optimizer.


GPU Queries

I can query performance statistics from the GPU using D3D11_QUERY requests.


