
Performance


Nov 26 2017


I compile and test SimpleGrid inside a VMware virtual machine because it is convenient to carry my entire development machine around on a USB thumb drive, and it also makes debugging device drivers easier. Running inside a VM is just like running natively 99% of the time, but I need to be careful about the odd 1%, especially when working with graphics. The VM has to emulate a physical GPU, and there are differences.

I have already run into problems with Direct2D, where I have to specify a SOFTWARE device or some effects will crash. I am also seeing some strange behavior with Direct3D/Direct2D interop that may be caused by the VM.

Graphics performance in the VM is dramatically worse than running natively, even compared to the mid- to low-end performance of the integrated Intel graphics on my system. I took some time to compare the performance of SimpleGrid running inside the VM against running it natively.

My System

I am running on a Skull Canyon NUC.

CPU:  2.6GHz Skylake Core i7-6770HQ quad-core (boost up to 3.5GHz)
GPU:  Integrated Iris Pro 580 with 128MB eDRAM
RAM:  32GB DDR4 2133MHz

Prep Work

I need to create some scaffolding before I can easily measure the performance with different parameters.

Command Line

I need to specify the number of vertices in the grid from the command line, which means adding some minimal text parsing.


Main.cpp command line:

#include <Windows.h>   //GetCommandLineW()
#include <cmath>       //pow()

/*************************************************************************/
/**                           Command line                              **/
/*************************************************************************/
#define STRSIZE(S) (sizeof(S)/sizeof(S[0]) - 1)

static const WCHAR *SkipWhitespace(const WCHAR *Str);
static WCHAR *GetWord(const WCHAR *&Str, WCHAR *pDst, UINT DstSz);
static int GetInt(const WCHAR *&Str);
static double GetFloat(const WCHAR *&Str);
static void ParseOptShort(const WCHAR *&Str);
static void ParseOptLong(const WCHAR *&Str);
static void ParseArg(const WCHAR *&Str);
static void ParseArgVertex(const WCHAR *Str);

enum ARG_TYPE { ARG_NONE, ARG_VERTEX };
static ARG_TYPE ArgType= ARG_NONE;

void MainParseCmdLine(void) {
    const WCHAR *pCmd= GetCommandLineW();
    GetWord(pCmd,0,0); //Skip over argv[0]
    while(*pCmd) {
        if(*pCmd=='-') {
            pCmd++;
            if(*pCmd=='-') {
                pCmd++;
                ParseOptLong(pCmd);
            } else {
                ParseOptShort(pCmd);
            }
        } else {
            ParseArg(pCmd);
        }
    }
}

void ParseOptShort(const WCHAR *&Str) {
    WCHAR text[40];
    GetWord(Str,text,STRSIZE(text));
}

void ParseOptLong(const WCHAR *&Str) {
    WCHAR text[40];
    GetWord(Str,text,STRSIZE(text));
    if(_wcsicmp(L"vertex",text)==0) {
        ArgType= ARG_VERTEX;
    }
}

void ParseArg(const WCHAR *&Str) {
    WCHAR text[260];
    GetWord(Str,text,STRSIZE(text));
    switch(ArgType) {
    case ARG_VERTEX: ParseArgVertex(text); break;
    }
}

void ParseArgVertex(const WCHAR *Str) {
    UINT vertexCt= GetInt(Str);
    if(vertexCt>10)
        g_game->SetVertexCt(vertexCt);
    ArgType= ARG_NONE;
}

static const WCHAR *SkipWhitespace(const WCHAR *Str) {
    while(*Str==' ' || *Str=='\t')
        Str++;
    return(Str);
}

static WCHAR *GetWord(const WCHAR *&Str, WCHAR *pDst, UINT DstSz) {
    UINT nDst= 0;
    WCHAR isQuote= 0;
    for(;*Str;Str++) {
        if(!isQuote && (*Str=='\'' || *Str=='\"')) {
            isQuote= *Str;
        } else if(isQuote && *Str==isQuote) {
            isQuote= 0;
        } else if(!isQuote && (*Str==' ' || *Str=='\t' || *Str=='\r' || *Str=='\n')) {
            break;
        } else if(nDst<DstSz) {
            pDst[nDst++]= *Str;
        }
    }
    Str= SkipWhitespace(Str);
    if(nDst<DstSz)
        pDst[nDst]= 0;
    return(pDst);
}

static int GetInt(const WCHAR *&Str) {
    int val= 0;
    int sign= 1;
    Str= SkipWhitespace(Str);
    if(*Str=='+') {
        Str++;
    } else if(*Str=='-') {
        sign= -1;
        Str++;
    }
    while(*Str>='0' && *Str<='9') {
        val= val*10 + (*Str++ - '0');
    }
    return(sign*val);
}

static double GetFloat(const WCHAR *&Str) {
    bool isNeg= false;
    UINT Integer= 0;
    UINT Fraction= 0;
    int FracDigits= 0;
    int Exponent= 0;
    Str= SkipWhitespace(Str);
    if(*Str=='+') {
        Str++;
    } else if(*Str=='-') {
        isNeg= true;
        Str++;
    }
    // Extract integer part.
    while(*Str>='0' && *Str<='9')
        Integer= Integer*10 + (*Str++ - '0');
    // Extract fractional part.
    if(*Str=='.') {
        Str++;
        while(*Str>='0' && *Str<='9') {
            Fraction= Fraction*10 + (*Str++ - '0');
            FracDigits++;
        }
    }
    // Extract exponent.
    if(*Str=='E' || *Str=='e') {
        bool isNegExp= false;
        Str++;
        if(*Str=='+') {
            Str++;
        } else if(*Str=='-') {
            isNegExp= true;
            Str++;
        }
        while(*Str>='0' && *Str<='9')
            Exponent= Exponent*10 + (*Str++ - '0');
        Exponent= isNegExp ? -Exponent:Exponent;
    }
    // Assemble integer, fraction, and exponent.
    double val= (double)Integer;
    if(FracDigits)
        val+= (double)Fraction*pow(10.0,-FracDigits);
    if(Exponent)
        val= val*pow(10.0,Exponent);
    if(isNeg)
        val= -val;
    Str= SkipWhitespace(Str);
    return(val);
}
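With the parser in place, the vertex target is passed as a long option followed by its value. Assuming the executable is named SimpleGrid.exe, the invocation looks something like:

SimpleGrid.exe --vertex 500000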

Setting the vertex count in Game.cpp:

Game.cpp:

void Game::SetVertexCt(UINT vertexCt) {
    tileCt= (UINT)sqrt(vertexCt);
    tileSz= gridSz/(float)tileCt;
}

I need to collect the performance statistics in the render loop. I want to record the time required for every frame and report the instantaneous frame rate for each frame. A simple average of the frame count over every x seconds loses too much information; it is better to keep a running FIFO of the high-resolution timestamps for each frame and use that to determine the frame rate.

This requires two records, a timestamp FIFO and a frame-rate FIFO, both built on a std::deque. The frame rate is simply the number of entries in the timestamp FIFO divided by the expiration period (10 seconds). Each new frame rate is then pushed onto the frame-rate FIFO to be used to draw the graph.

Game frame rates:

class Game {
private:
    void RenderTime(void);
    LARGE_INTEGER clkFreq;
    std::deque<LARGE_INTEGER> vecFrameTS;
    std::deque<INT16> vecFrameRate;
    UINT16 frameRate; //Instantaneous frame rate
};

// Record the instantaneous frame rate for the last 10 seconds.
// I want the actual instantaneous frame rate, not an average.
// vecFrameTS is a FIFO of the high-resolution timestamps for each frame.
// Drop all timestamps that have expired (older than 10 seconds).
// The instantaneous frame rate is the number of timestamps in the FIFO
// divided by the 10-second window.
// The frame rate history for the last 200 frames is stored in vecFrameRate.
void Game::RenderTime(void) {
    LARGE_INTEGER clkTick;
    QueryPerformanceCounter(&clkTick);
    vecFrameTS.push_front(clkTick);
    INT64 clkExpire= clkTick.QuadPart - clkFreq.QuadPart*10;
    while(vecFrameTS.size()>0 && vecFrameTS.back().QuadPart < clkExpire)
        vecFrameTS.pop_back();
    frameRate= (UINT16)(vecFrameTS.size()/10);
    while(vecFrameRate.size() > 200)
        vecFrameRate.pop_back();
    vecFrameRate.push_front(frameRate);
}

The frame rate information is then presented as both text and a moving graph. I expanded the D3Text class to draw other 2D primitives, including lines, rectangles, and filled rectangles. This approach means I always have a record of both the historic frame rate and the precise time required for each frame.

Game::RenderText():

void Game::RenderText(void) {
    text.Begin(swapChain.Get());
    text.SetColor(D2D1::ColorF(D2D1::ColorF::White,0.25f));
    text.FillRect(D2D1::Point2F(0,0),D2D1::Point2F(600,64));
    text.SetColor(D2D1::ColorF(D2D1::ColorF::Black));
    text.Write(L"Frame %06llu %ufps %u (%6.2f,%6.2f,%6.2f) %3.1f %3.1f"
        ,frameCt,frameRate,tileCt*tileCt
        ,viewPt.x,viewPt.y,viewPt.z
        ,XMConvertToDegrees(viewVec.y),XMConvertToDegrees(viewVec.z)
        );
    text.Rectangle(D2D1::Point2F(10.0f,40.0f),D2D1::Point2F(210.0f,60.0f));
    D2D1_POINT_2F pt0,pt1;
    pt0= { 10.0f,60.0f-std::min((INT16)120,vecFrameRate[0])*(20.0f/120.0f) };
    text.SetColor(D2D1::ColorF(D2D1::ColorF::Yellow));
    for(UINT n1=1;n1<vecFrameRate.size();n1++) {
        pt1= { 10.0f+n1,60.0f-std::min((INT16)120,vecFrameRate[n1])*(20.0f/120.0f) };
        text.Line(pt0,pt1);
        pt0= pt1;
    }
    text.End();
}
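The Line(), Rectangle(), FillRect(), and SetColor() calls above are thin wrappers over the Direct2D device context. The following is only a sketch of how such wrappers might look, not SimpleGrid's actual D3Text code; the member names m_d2dContext and m_brush are my assumptions:

// Hypothetical sketch of the D3Text 2D-primitive wrappers over Direct2D.
#include <d2d1_1.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

class D3Text {
    ComPtr<ID2D1DeviceContext>   m_d2dContext;  // assumed member
    ComPtr<ID2D1SolidColorBrush> m_brush;       // assumed member
public:
    // Change the color used by all subsequent primitives.
    void SetColor(const D2D1_COLOR_F &color)           { m_brush->SetColor(color); }
    // One-pixel-wide line between two points.
    void Line(D2D1_POINT_2F p0, D2D1_POINT_2F p1)      { m_d2dContext->DrawLine(p0, p1, m_brush.Get(), 1.0f); }
    // Outlined rectangle from opposite corners.
    void Rectangle(D2D1_POINT_2F p0, D2D1_POINT_2F p1) { m_d2dContext->DrawRectangle(D2D1::RectF(p0.x, p0.y, p1.x, p1.y), m_brush.Get(), 1.0f); }
    // Filled rectangle from opposite corners.
    void FillRect(D2D1_POINT_2F p0, D2D1_POINT_2F p1)  { m_d2dContext->FillRectangle(D2D1::RectF(p0.x, p0.y, p1.x, p1.y), m_brush.Get()); }
};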

(Screenshot: Direct3D simple grid)

SimpleGrid Performance Testing

I am using SimpleGrid to test performance by setting the number of vertices in the grid. The command line specifies a target for the total number of vertices, but since the grid is always square the actual count is the largest square that fits: int(sqrt(x))^2. For example, a target of 500,000 yields 707^2 = 499,849 vertices, which is why that number appears in the table below. This count does not include the vertices used to draw the sun.

The relationship between vertices and polygons in the grid is roughly two polygons for each vertex. Each polygon is defined by three vertices, but nearly all of those vertices are shared with neighboring polygons, so an NxN grid of vertices produces about 2*N^2 triangles; a 1000x1000 grid, for example, is one million vertices and just under two million triangles.

VM Performance

The VM seems to be using a maximum refresh rate of ~60Hz, while running natively it is locked to 30Hz.
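Those caps look like vsync at work. I have not dug into exactly how SimpleGrid presents each frame, but in DXGI the sync interval passed to Present() is what pins the frame rate to the refresh rate or a fraction of it; this is illustrative only, not SimpleGrid's code:

// Illustrative only: how the DXGI sync interval caps the frame rate.
swapChain->Present(1, 0);   // wait for 1 vblank:  capped at the refresh rate (~60fps)
swapChain->Present(2, 0);   // wait for 2 vblanks: capped at half the refresh rate (~30fps)
swapChain->Present(0, 0);   // no vsync: run as fast as the GPU allows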

Frame Rate (fps)

                    Discrete            DrawTriangleList
Vertices          VM      Native        VM      Native
100               64        30          64        -
40,000            63        30          64        -
59,536            54        30          64        -
79,524            45        30          64        -
99,856            39        30          64        -
149,769           28        30          62        -
224,676           20        30          59        -
349,281           13        30          51        -
499,849           10        30          49        -
599,076            8        30          48        -
649,636            7        29          46        59
749,956            6        24          FAIL      59
1,000,000          4        19          FAIL      59
1,500,000          -         -          FAIL      59
2,000,000          -         -          FAIL      59
3,000,000          -         -          FAIL      59
4,000,000          -         -          FAIL      59
5,000,000          -         -          FAIL      51
6,000,000          -         -          FAIL      41
7,000,000          -         -          FAIL      35
8,000,000          -         -          FAIL      32
9,000,000          -         -          FAIL      28

(Screenshot: Direct3D simple grid)

Wow! Even a mid-class integrated GPU is able to pump out over 20 million shaded vertices per second! I can safely draw half a million vertices per frame and sustain 30fps. The VM is nowhere near this level, topping out at around 150K vertices. The good news is that the VM can still render a million vertices, just at a much lower frame rate.

Amazing! Using a static vertex buffer, I can create a terrain of 9 million vertices before the native frame rate drops below 30fps! Static vertex buffers let almost all the work run on the GPU, with the CPU needed only to update the shape positions and the view point.
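This is not SimpleGrid's actual code, but a minimal sketch of the static-vertex-buffer pattern: the buffer is created once with immutable usage, and each frame only binds and draws it. The Vertex layout and function names here are stand-ins:

#include <d3d11.h>
#include <wrl/client.h>
#include <vector>
using Microsoft::WRL::ComPtr;

struct Vertex { float x,y,z; float nx,ny,nz; };   // stand-in vertex layout

// Create the buffer once; the CPU never touches the vertex data again.
ComPtr<ID3D11Buffer> CreateStaticVertexBuffer(ID3D11Device *device, const std::vector<Vertex> &vertices) {
    D3D11_BUFFER_DESC desc= {};
    desc.ByteWidth= (UINT)(vertices.size()*sizeof(Vertex));
    desc.Usage= D3D11_USAGE_IMMUTABLE;           // GPU-only after creation
    desc.BindFlags= D3D11_BIND_VERTEX_BUFFER;
    D3D11_SUBRESOURCE_DATA init= {};
    init.pSysMem= vertices.data();               // filled once, at creation
    ComPtr<ID3D11Buffer> buf;
    device->CreateBuffer(&desc, &init, &buf);
    return buf;
}

// Per frame: only bind and draw; all vertex processing stays on the GPU.
void DrawStaticGrid(ID3D11DeviceContext *context, ID3D11Buffer *buf, UINT vertexCt) {
    UINT stride= sizeof(Vertex), offset= 0;
    context->IASetVertexBuffers(0, 1, &buf, &stride, &offset);
    context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    context->Draw(vertexCt, 0);
}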

This gives me a good framework for estimating how many vertices I can manage in a game.


NOTE: Disabling the CPU optimizer drops native performance for 500,000 vertices from 30fps to 12fps! This is yet another reason why even debug builds should enable the optimizer.
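In Visual Studio that just means leaving /O2 enabled in the Debug configuration. When a specific function really needs to be stepped through un-optimized, the optimizer can be turned off locally instead of globally:

// MSVC: disable optimization only around the code being debugged.
#pragma optimize("", off)
// ...functions to step through un-optimized...
#pragma optimize("", on)    // restore the command-line /O settings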


GPU Queries

I can query performance statistics from the GPU using D3D11_QUERY requests.
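A D3D11_QUERY_PIPELINE_STATISTICS query, for example, wraps a frame and reports how many vertices and primitives the GPU actually processed. This is only a sketch of the usual Begin/End/GetData pattern, not code wired into SimpleGrid yet; device and context stand in for SimpleGrid's own objects:

// Sketch: count the vertices/primitives the GPU processed in one frame.
D3D11_QUERY_DESC qd= {};
qd.Query= D3D11_QUERY_PIPELINE_STATISTICS;
ComPtr<ID3D11Query> query;
device->CreateQuery(&qd, &query);

context->Begin(query.Get());
// ...render the frame...
context->End(query.Get());

// The result is not available immediately; poll until the GPU catches up.
D3D11_QUERY_DATA_PIPELINE_STATISTICS stats= {};
while(context->GetData(query.Get(), &stats, sizeof(stats), 0) == S_FALSE)
    ;   // spin here, or check again on a later frame to avoid stalling
// stats.IAVertices, stats.IAPrimitives, stats.VSInvocations, etc.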


