I no longer need pbmSrc1 and pbmSrc2. The balls will be drawn directly
onto pbmImg. I need a new composite bitmap that will serve as the fog layer,
which will be the same size as pbmImg.
Fog layer:
class PxlShader {
ID2D1Bitmap1 *pbmImg; // Final offscreen image (write-only)
ID2D1Bitmap1 *pbmFog; // Composite Fog bitmap
int PxlShader::DrawCreate(void) {
int Err= ERR_OK;
D2D1_SIZE_U szWnd= { (UINT32)RWID(rWnd), (UINT32)RHGT(rWnd) };
} else if(!pdcDraw && !SUCCEEDED(WinErr= pD2Device->CreateDeviceContext(DCOptions,&pdcDraw))) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create Draw context.");
} else if(!pbmImg && IsErr(Err= CreateBitmapBase(pdcDraw,szWnd,pbmImg))) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create base bitmap.");
} else if(!pbmFog && IsErr(Err= CreateBitmapComposite(pdcDraw,szWnd,pbmFog))) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create fog bitmap.");
Draw Sequence
Drawing the scene is now complex enough to break up into steps:
Reset the scene: Clear pbmImg to black, pbmFog to gray.
Draw the grid: This gives me a fixed visual frame of reference.
Draw the bouncing balls: The balls are drawn to pbmImg.
Draw the fog: Obscure pbmImg by painting the fog bitmap onto it.
I don't have a working fog effect yet, so the line that paints the
fog is currently commented out. This lets me verify that the balls are
bouncing properly.
Draw Sequence:
int PxlShader::DrawUpdate(void) {
int Err= ERR_OK;
if(!pdcDraw || !pbmImg || !pbmFog) {
Err= Warn(ERR_NOT_CREATED,"PxlShader:DrawUpdate: No resources.");
} else {
if(!SUCCEEDED(WinErr= pdcDraw->EndDraw())) {
Err= Error(ERR_DIRECTX,"PxlShader:DrawUpdate: EndDraw() failed. [%X]",WinErr);
DoReset= true;
int PxlShader::DrawClear(void) {
int Err= ERR_OK;
int PxlShader::DrawGrid(void) {
int Err= ERR_OK;
D2Brush brGrid(pdcDraw,ColorF(ColorF::Aquamarine));
for(int n1=0;n1<rWnd.right;n1+= 50)
for(int n1=0;n1<rWnd.bottom;n1+=50)
int PxlShader::DrawBalls(void) {
int Err= ERR_OK;
for(UINT n1=0;n1<BALL_MAX;n1++) {
int PxlShader::DrawFog(void) {
int Err= ERR_OK;
My balls are a-bouncing. Time to figure out how this Fog of War (FoW) Effect
is going to work.
The Fog Effect
My strategy is to clear pbmFog to solid gray at the beginning of
each draw sequence, then every time I draw an object (a bouncing ball)
I will clear a corresponding hole in the fog. When I paint the fog
bitmap onto pbmImg as the final step of the draw sequence, the holes
should let me see the area around the bouncing balls and obscure
everything else.
The holes in the fog are created by reducing the alpha channel, leaving
the RGB channels unchanged. This should create a gauzy fog that becomes
clearer (more transparent) with repeated clearings.
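To make the "gauzy fog" idea concrete, here is a minimal CPU model of it, assuming one alpha value per pixel and a clearing that removes a fixed fraction of whatever fog remains. FogAlpha and ClearCircle are hypothetical names, not part of PxlShader, and the brute-force loop over the whole image is only for clarity.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU model of the fog layer: one alpha value per pixel.
// RGB is never touched; the fog color is implied.
struct FogAlpha {
    int width, height;
    std::vector<float> alpha; // 1.0 = opaque fog, 0.0 = fully clear

    FogAlpha(int w, int h)
        : width(w), height(h), alpha(static_cast<std::size_t>(w) * h, 1.0f) {}

    // Reduce alpha inside a circle. 'strength' is the fraction of the
    // remaining fog removed by one clearing (0..1), so repeated clearings
    // of the same pixel keep making it more transparent.
    void ClearCircle(float cx, float cy, float radius, float strength) {
        for (int y = 0; y < height; ++y) {          // brute force: every pixel
            for (int x = 0; x < width; ++x) {
                float dx = x + 0.5f - cx, dy = y + 0.5f - cy; // pixel center
                if (dx * dx + dy * dy <= radius * radius)
                    alpha[static_cast<std::size_t>(y) * width + x] *= (1.0f - strength);
            }
        }
    }

    float At(int x, int y) const { return alpha[static_cast<std::size_t>(y) * width + x]; }
};
```

Two clearings at strength 0.5 leave a quarter of the fog in the overlap, which is exactly the accumulating behavior described above.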
The pixel operation I want to perform is to decimate the alpha
channel of pbmFog within a circle. I'm not sure whether a pixel shader can
use the same texture as both an input and an output. The fog effect will
be easy if I can do something like this:
Then I can do everything in the shader with a single source.
Let's find out...
Fog Effect V1
Clone MyEffect into FogEffect.cpp. FogEffect will have only one input,
so I can strip out most of the MapRects code.
Ball::Draw() will now return a point and PxlShader will clear the fog
around that point.
class Ball {
int Draw(ID2D1DeviceContext *pdcDst, ID2D1Bitmap1 *pbmImg, POINT2D &ptClear);
class PxlShader {
int CreateFog(void);
int ClearFog(POINT2D ptCenter, float Radius);
ID2D1Bitmap1 *pbmFog; // Composite Fog bitmap
ID2D1Effect *pFog; // Fog effect
int Ball::Draw(ID2D1DeviceContext *pdcDst, ID2D1Bitmap1 *pbmImg, POINT2D &ptClear) {
int Err= ERR_OK;
D2D1_ELLIPSE dot= { ptNow, Radius,Radius };
ptClear= ptNow;
int PxlShader::DrawBalls(void) {
int Err= ERR_OK;
for(UINT n1=0;n1<BALL_MAX;n1++) {
POINT2D ptClear;
if(Balls[n1].Draw(pdcDraw,pbmImg,ptClear)>0) {
int PxlShader::ClearFog(POINT2D ptCenter, float Radius) {
int Err= ERR_OK;
if(pFog && pbmFog) {
D2D1_VECTOR_2F vecCenter= { ptCenter.x,ptCenter.y };
int PxlShader::DrawFog(void) {
int Err= ERR_OK;
After about two hours, I have my answer: NO. The draw fails with 0x88990025: "Cannot draw with a bitmap that is currently bound as the target bitmap."
A crude work-around is to create a copy of pbmFog, draw the effect
onto the copy, then replace the original with the copy.
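A common alternative to copying back (a swapped-in technique, not what I did here) is "ping-pong" buffering: keep two bitmaps, read from one while drawing into the other, then swap their roles so nothing is ever copied back. In Direct2D terms that would presumably mean alternating which bitmap is passed to SetTarget() and which is the effect input. The sketch below models the pattern with plain CPU vectors; PingPong and Pass are hypothetical names.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of ping-pong buffering: never read and write the same buffer in
// one pass. Read from 'front', write to 'back', then swap the two.
class PingPong {
public:
    PingPong(std::size_t n, float init) : front(n, init), back(n, init) {}

    // Apply a per-pixel operation that reads the current state and writes the next.
    template <typename Op>
    void Pass(Op op) {
        for (std::size_t i = 0; i < front.size(); ++i)
            back[i] = op(front[i], i);
        std::swap(front, back); // O(1): exchanges storage, no pixel copy
    }

    const std::vector<float>& Current() const { return front; }

private:
    std::vector<float> front, back;
};
```

Because the result of each pass simply becomes the next input, the "replace the original" step, and the alpha-blend problem that comes with it, disappears.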
The problem with this approach is that when I try to replace the
original using pdcDraw->DrawBitmap(pbmFog2), the holes are not copied
because the alpha is zero! I need a version of DrawBitmap() that is a
verbatim copy, not an alpha blend.
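The reason the holes vanish is the blend math itself. Drawing a bitmap normally uses source-over compositing on premultiplied pixels, result = src + dst * (1 - src.a), so a fully transparent hole pixel (0,0,0,0) leaves the destination unchanged. A source-copy simply replaces the destination. The sketch below models both on one pixel; Px, SourceOver and SourceCopy are hypothetical names. Direct2D 1.1 also has ID2D1DeviceContext::SetPrimitiveBlend() with D2D1_PRIMITIVE_BLEND_COPY, which may be worth trying for DrawBitmap(); I have not verified it against this code.

```cpp
// Premultiplied RGBA pixel, components in 0..1.
struct Px { float r, g, b, a; };

// Source-over (the usual blend when drawing a bitmap):
// result = src + dst * (1 - src.a)
Px SourceOver(Px src, Px dst) {
    float k = 1.0f - src.a;
    return { src.r + dst.r * k, src.g + dst.g * k, src.b + dst.b * k, src.a + dst.a * k };
}

// Source-copy: the destination is replaced by the source, alpha included.
Px SourceCopy(Px src, Px /*dst*/) { return src; }
```

A hole composited with SourceOver() onto solid fog comes back as solid fog; with SourceCopy() it comes back as a hole.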
I tried pdcDraw->DrawImage(D2D1_COMPOSITE_MODE_SOURCE_COPY) without success.
A minor, overlooked mistake sent me off on a wild goose chase for
almost two days. The black ring around the fog bubbles was caused by a
misconfigured gradient -- the outer edge of the gradient was being
drawn as opaque black when it should have been transparent red.
Everything below this point is based on that error and is
wrong. My "accumulation" theory was an attempt to explain that
black ring, but was based on faulty assumptions and eventually turned
out to be utter hogwash. I am preserving it because figuring out how
to manipulate the pixels on the cpu might be useful at some point. To
skip to the actual solution, go to FoW Redux - Return of the
The other problem is that I need pbmFog to accumulate holes.
This is a problem because the actual pixel updates do not occur until
EndDraw(), which means all the ClearFog() operations until then are
copying from the original solid pbmFog.
I am beginning to wonder if I need to keep pbmFog in main memory
and use the cpu to clear the holes. That would be disappointing!
SUCCESS! It took about five hours of experimenting, but I
finally have a solution. The PxlShader project really helped.
I kept running into the same roadblock: I wanted to modify and copy
the alpha channel, not use it as a blend operator. This turned out to
be the crux of why Fog was being so elusive. Then I had an epiphany:
My pixel shader is performing the blend operation -- there is no
reason why I had to use the alpha channel as the multiplier, I
could use any channel I wanted! If I used blue to indicate
transparency in the fog, all I needed was to have a final operation
that would composite the fog and the intermediate image using blue
instead of alpha. Then all my "can't copy alpha" problems disappear,
since I am now using blue.
So the draw sequence looks like this:
Create pbmImg as the base bitmap.
Create pbmField as a composite bitmap.
Create pbmFog as a composite bitmap.
Clear pbmField and pbmFog to solid black.
Clear pbmImg to the solid fog color (DarkGray).
Draw the grid onto pbmField.
For each bouncing ball:
Draw the object onto pbmField.
Draw a corresponding SOLID BLUE circle onto pbmFog.
Draw the fog layer:
Set Fog input 0 to pbmFog
Set Fog input 1 to pbmField
Set pdcDraw->Target(pbmImg)
Run the pixel shader: pdcDraw->DrawImage(pFog)
My pixel shader blends pbmFog and pbmField onto pbmImg by
copying pbmFog.blue into pbmImg.alpha, then setting
the pbmImg.rgb color using the formula
Pixel.rgb= FogColor.rgb*Pixel.a + pbmField.rgb*pbmFog.blue;
This effectively sets the output pixel to the pbmField pixel only
where pbmFog is blue.
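Here is a per-pixel sketch of the net effect, assuming (as in the draw sequence above) that pbmImg was pre-filled with the fog color and that the shader's premultiplied output is composited onto it with source-over. Under those assumptions the visible result is field * blue + fog * (1 - blue). Color3 and BlendFog are hypothetical names.

```cpp
struct Color3 { float r, g, b; };

// 'mask' is pbmFog.blue: 0 = full fog, 1 = field fully visible.
Color3 BlendFog(Color3 field, Color3 fog, float mask) {
    // Shader output (premultiplied): field.rgb * mask with alpha = mask.
    // Source-over onto the fog-colored pbmImg then adds fog * (1 - mask).
    float k = 1.0f - mask;
    return { field.r * mask + fog.r * k,
             field.g * mask + fog.g * k,
             field.b * mask + fog.b * k };
}
```

At mask 0 the pixel is pure fog, at mask 1 it is pure field, and a half-cleared pixel is an even mix of the two.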
And now I have my Fog of War:
The solution was not what I expected. I suspect there is a standard
Direct2D color matrix operation that could be used to move blue into
alpha, which would enable FoW without requiring a custom pixel shader.
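For the record, I believe the built-in effect in question is the Color Matrix effect (CLSID_D2D1ColorMatrix), whose D2D1_COLORMATRIX_PROP_COLOR_MATRIX property is a D2D1_MATRIX_5X4_F applied as [r g b a 1] * M. The sketch below checks the math for a "blue into alpha" matrix without Direct2D, using stand-in types (Rgba, Matrix5x4 and ApplyColorMatrix are hypothetical). In D2D1_MATRIX_5X4_F terms the interesting entries would be _34 = 1 and _44 = 0. The effect also has an alpha-mode property controlling premultiplication, which should be checked before relying on this.

```cpp
#include <array>

// Row-vector convention: out = [r g b a 1] * M, with M 5 rows x 4 columns.
using Rgba = std::array<float, 4>;
using Matrix5x4 = std::array<std::array<float, 4>, 5>;

Rgba ApplyColorMatrix(const Rgba& in, const Matrix5x4& m) {
    const float v[5] = { in[0], in[1], in[2], in[3], 1.0f };
    Rgba out{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 5; ++row)
            out[col] += v[row] * m[row][col];
    return out;
}

// Keep r, g, b unchanged and replace alpha with blue.
Matrix5x4 BlueToAlpha() {
    return {{ {1, 0, 0, 0},    // r -> r
              {0, 1, 0, 0},    // g -> g
              {0, 0, 1, 1},    // b -> b and b -> alpha
              {0, 0, 0, 0},    // old alpha discarded
              {0, 0, 0, 0} }}; // no constant offset
}
```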
But having pixel shaders in my toolbox makes it easier and I can control
every aspect of the operation.
This is the PxlShader code to create the fog resources: pbmField,
pbmFog, and pFog. Everything is drawn with pdcDraw.
PxlShader Create:
class PxlShader {
ID2D1DeviceContext *pdcDraw; // The workhorse, used to draw everything.
ID2D1Bitmap1 *pbmImg; // Final offscreen image (write-only)
ID2D1Bitmap1 *pbmField; // Composite Field bitmap
ID2D1Bitmap1 *pbmFog; // Composite Fog bitmap
ID2D1Effect *pFog; // Fog effect
int PxlShader::DrawCreate(void) {
int Err= ERR_OK;
D2D1_SIZE_U szWnd= { (UINT32)RWID(rWnd), (UINT32)RHGT(rWnd) };
if(!pDXGIDevice && IsErr(DrawCreateDX())) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create DXGI device.");
} else if(!pSwapChain && IsErr(Err= DrawCreateSwapChain())) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create SwapChain.");
} else if(!pDXGISurface && !SUCCEEDED(WinErr= pSwapChain->GetBuffer(0,IID_PPV_ARGS(&pDXGISurface)))) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to retrieve DXGI surface. [%X]",WinErr);
} else if(!pD2Factory && IsErr(Err= DrawCreateD2Factory())) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create Direct2D factory.");
} else if(!pD2Device && !SUCCEEDED(WinErr= pD2Factory->CreateDevice(pDXGIDevice,&pD2Device))) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create Direct2D device.");
} else if(!pdcDraw && !SUCCEEDED(WinErr= pD2Device->CreateDeviceContext(DCOptions,&pdcDraw))) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create Draw context.");
} else if(!pbmImg && IsErr(Err= CreateBitmapBase(pdcDraw,szWnd,pbmImg))) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create base bitmap.");
} else if(!pbmField && IsErr(Err= CreateBitmapComposite(pdcDraw,szWnd,pbmField))) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create Field bitmap.");
} else if(!pbmFog && IsErr(Err= CreateBitmapComposite(pdcDraw,szWnd,pbmFog))) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create fog bitmap.");
} else if(!pFog && IsErr(Err= CreateFog(pdcDraw))) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create fog effect.");
int PxlShader::CreateFog(ID2D1DeviceContext *pdcDst) {
int Err= ERR_OK;
if(IsErr(Err= FogEffectRegister(pD2Factory))) {
Err= Warn(ERR_DIRECTX,"PxlShader:CreateFog: Unable to register FogEffect.");
} else if(!SUCCEEDED(WinErr= pdcDst->CreateEffect(CLSID_FogEffect,&pFog))) {
Err= Warn(ERR_DIRECTX,"PxlShader:CreateFog: Unable to create FogEffect. [%X]",WinErr);
} else {
Print(PRINT_INFO,"PxlShader:CreateFog: OK");
void PxlShader::ReleaseEverything(void) {
DoReset= false;
This is the draw code. Note there is only a single BeginDraw() and EndDraw().
DrawClear() resets the bitmaps. pbmField and pbmFog are cleared to
solid black. pbmImg is cleared to solid fog. The final operation to
draw the fog layer actually draws the field everywhere the fog has been
cleared, leaving the original fog color everywhere else.
int PxlShader::DrawClear(void) {
int Err= ERR_OK;
The grid and bouncing balls (essentially everything) are drawn to
pbmField. I cannot draw to pbmImg because I will need to copy the
field image later.
int PxlShader::DrawGrid(void) {
int Err= ERR_OK;
D2Brush brGrid(pdcDraw,ColorF(ColorF::Aquamarine));
for(int n1=0;n1<rWnd.right;n1+= 50)
for(int n1=0;n1<rWnd.bottom;n1+=50)
int PxlShader::DrawBalls(void) {
int Err= ERR_OK;
for(UINT n1=0;n1<BALL_MAX;n1++) {
POINT2D ptClear;
if(Balls[n1].Draw(pdcDraw,pbmField,ptClear)>0) {
The fog is cleared by drawing a solid blue circle into pbmFog
at the same position the ball occupies in pbmField.
int PxlShader::ClearFog(POINT2D ptCenter, float Radius) {
int Err= ERR_OK;
Then I combine pbmFog and pbmField into pbmImg in DrawFog(). The pixel
shader is really just a simple alpha blend, except that it is using the
blue channel from Input0 as the multiplier.
int PxlShader::DrawFog(void) {
int Err= ERR_OK;
This is exciting, but I am really back to where I was before with hard edges,
which I was already able to do without a shader. What happens if I draw the
fog bubble with a radial gradient?
BUMMER! I can see the overlaps. :(
UPDATE: The overlaps are caused by a badly configured gradient.
I used Blue:1 - Black:1 when I should have used Blue:1 - Blue:0.
Not realizing this mistake sent me down the cpu rabbit hole for
two days. All the blather about "accumulating" that follows is
completely wrong.
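Here is why the Blue:1 - Black:1 gradient showed its overlaps, modeled on the blue and alpha channels only. With an opaque black outer stop, the edge of a new bubble is opaque and darker, so source-over paints a dark ring over an earlier bubble. With a transparent blue outer stop, the edge fades in alpha instead, and the earlier bubble shows through. Straight versus premultiplied gradient interpolation gives the same answer for both stop pairs, since one pair keeps alpha constant and the other keeps the color constant. BluePx, GradientAt and Over are hypothetical names.

```cpp
// Premultiplied pixel reduced to the two channels that matter: blue and alpha.
struct BluePx { float b, a; };

// Interpolate two straight-alpha stops at t (0 = center, 1 = edge),
// returning a premultiplied pixel.
BluePx GradientAt(float b0, float a0, float b1, float a1, float t) {
    float b = b0 + (b1 - b0) * t;
    float a = a0 + (a1 - a0) * t;
    return { b * a, a };
}

// Source-over: result = src + dst * (1 - src.a)
BluePx Over(BluePx src, BluePx dst) {
    float k = 1.0f - src.a;
    return { src.b + dst.b * k, src.a + dst.a * k };
}
```

Drawing the edge of a second bubble (t = 0.75) over the center of a first, fully blue bubble: Blue:1 - Black:1 drops the blue to 0.25 (the ring), while Blue:1 - Blue:0 keeps it at 1.0. Over plain black fog, both fade to the same 0.25.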
The problem I am bumping into is that I want my fog bubbles to
accumulate. The fog should become more transparent when the
same pixel is cleared more than once. But accumulation implies
time and multiple write operations. The whole point of the GPU
is to batch the operations into a single massively parallel
operation, where everything happens instantaneously -- at the same
moment. Everything about the GPU is designed to make accumulation difficult.
It may be that the fog layer simply has to happen on the cpu, and
that would be a very slow and tedious operation. If the fog bubbles are
always the same size, though, I could build a lookup table to avoid all
the math: draw a reference gradient fade circle once, export the
pixels to the cpu, then use that as the lookup table.
... and four hours later, this is what Fog of War is supposed
to look like:
This is how I did it...
Create a Reference Alpha Map
The fog will be updated on the cpu, which means I need to figure
out how to avoid doing a lot of math. No circle calculations, no
radius calculations, and no alpha scaling. I can avoid all this math
by drawing a gradient fade circle once and using it as a
reference. So my CreateFog() function becomes a lot more complicated.
First I need to add some persistent stuff to PxlShader.
PxlShader FogPixels:
class PxlShader {
ID2D1Effect *pFog; // Fog effect
UINT DefogRadius; // Must be the same for all objects.
UINT szDefogMap; // Size (bytes) of pDefogMap[]
UINT32 *pDefogMap; // Defogging map
UINT szFogPixels; // Size (bytes) of pFogPixels[]
UINT32 *pFogPixels; // Fog pixels
My FogEffect is not being wasted, I still use it for the final
blend of the fog and field into pbmImg. I like using blue as my fog
alpha channel and I would need to perform a final blend in any case,
so it is not adding any overhead. And I have a strong hunch FogEffect
will grow in the future.
DefogRadius is now a constant. If I want to handle different defog
radii, I will need to create a separate pDefogMap for each one.
szDefogMap is the size (in bytes) of pDefogMap[].
pDefogMap[] is the "defog" image extracted to main memory where
the cpu can read it. More about this later.
szFogPixels is the size (in bytes) of pFogPixels[]. pFogPixels[]
is a buffer in main memory that will be used to create the final
fog overlay bitmap. It needs to match the size of pbmImg, although
it could be scaled. I am only interested in a single channel, so it
could be reduced from UINT32 to BYTE pixels but this would make
the final fog bitmap creation more complicated. And these days memory
is much cheaper than cpu cycles.
DrawCreate() still calls CreateFog(), but I no longer create pbmFog.
I do need to allocate pFogPixels here.
cpu DrawCreate():
int PxlShader::DrawCreate(void) {
int Err= ERR_OK;
D2D1_SIZE_U szWnd= { (UINT32)RWID(rWnd), (UINT32)RHGT(rWnd) };
if(!pDXGIDevice && IsErr(DrawCreateDX())) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create DXGI device.");
} else if(!pSwapChain && IsErr(Err= DrawCreateSwapChain())) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create SwapChain.");
} else if(!pDXGISurface && !SUCCEEDED(WinErr= pSwapChain->GetBuffer(0,IID_PPV_ARGS(&pDXGISurface)))) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to retrieve DXGI surface. [%X]",WinErr);
} else if(!pD2Factory && IsErr(Err= DrawCreateD2Factory())) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create Direct2D factory.");
} else if(!pD2Device && !SUCCEEDED(WinErr= pD2Factory->CreateDevice(pDXGIDevice,&pD2Device))) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create Direct2D device.");
} else if(!pdcDraw && !SUCCEEDED(WinErr= pD2Device->CreateDeviceContext(DCOptions,&pdcDraw))) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create Draw context.");
} else if(!pbmImg && IsErr(Err= CreateBitmapBase(pdcDraw,szWnd,pbmImg))) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create base bitmap.");
} else if(!pbmField && IsErr(Err= CreateBitmapComposite(pdcDraw,szWnd,pbmField))) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create Field bitmap.");
} else if(!pFog && IsErr(Err= CreateFog(pdcDraw))) {
Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create fog effect.");
} else if(!pFogPixels && IsErr(CreateFogPixels())) {
Err= Warn(ERR_NO_MEM,"PxlShader:DrawCreate: Unable to allocate FogPixels.");
CreateFog() now creates FogEffect and the reference pDefogMap[] pixels.
cpu CreateFog():
int PxlShader::CreateFog(ID2D1DeviceContext *pdcDst) {
int Err= ERR_OK;
if(IsErr(Err= FogEffectRegister(pD2Factory))) {
Err= Warn(ERR_DIRECTX,"PxlShader:CreateFog: Unable to register FogEffect.");
} else if(!SUCCEEDED(WinErr= pdcDst->CreateEffect(CLSID_FogEffect,&pFog))) {
Err= Warn(ERR_DIRECTX,"PxlShader:CreateFog: Unable to create FogEffect. [%X]",WinErr);
} else {
Print(PRINT_INFO,"PxlShader:CreateFog: CreateEffect OK");
// Now I need to create my cpu defogging map.
ID2D1Bitmap1 *pbmDefog= 0;
if(IsErr(Err= CreateBitmapComposite(pdcDst,SizeU(DefogRadius*2,DefogRadius*2),pbmDefog))) {
Err= Warn(ERR_DIRECTX,"PxlShader:CreateFog: Unable to create Defog reference bitmap.");
} else {
// Draw a blue radial gradient circle.
FLOAT Radius= (FLOAT)DefogRadius;
ID2D1RadialGradientBrush *brFill= 0;
ID2D1GradientStopCollection *pStops= 0;
D2D1_GRADIENT_STOP Stops[2]= { { 0,ColorF(ColorF::Blue) },{ 1.0f,ColorF(ColorF::Black) } };
if(!SUCCEEDED(WinErr= pdcDst->EndDraw())) {
Err= Warn(ERR_DIRECTX,"PxlShader:CreateFog: EndDraw() failed. [%X]",WinErr);
} else if(IsErr(BitmapGetPixels(pdcDst,pbmDefog,szDefogMap,pDefogMap))) {
Err= Warn(Err,"PxlShader:CreateFog: Unable to extract DefogMap.");
The reference "defog" map is the shape of my defogged area
rendered in blue. The blue channel will eventually be used as the
alpha channel by the FogEffect pixel shader to blend the pbmField
pixels into the final pbmImg bitmap. Why not use alpha directly?
Because it is a pain in the ass to copy from one image to another, and
using blue lets me draw the defog reference image and actually see it.
Plus, the final blend has to happen anyway so using FogEffect does not
add any overhead and will probably prove useful down the road.
So I use all the nice Direct2D stuff to draw a fancy reference
defog image, then extract the pixels to main memory. Now the value of
the BLUE component of the pixels has all the math baked into it. The
pbmDefog bitmap is no longer needed and is thrown away.
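To show what "all the math baked into the blue component" amounts to, here is a sketch that computes an equivalent map directly on the cpu instead of rendering it with Direct2D (a swapped-in approach; anti-aliasing will differ slightly from the rendered version). It assumes a B8G8R8A8 pixel read as a little-endian UINT32, so blue is the low byte, and a linear Blue:1 - Blue:0 fade. MakeDefogMap is a hypothetical name.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for pDefogMap[]: a (2R x 2R) square of 32-bit pixels
// where only the low byte (blue) is used. Blue falls linearly from near 0xFF
// at the center to 0 at the radius; pixels outside the circle stay 0.
std::vector<std::uint32_t> MakeDefogMap(unsigned radius) {
    const unsigned size = radius * 2;
    std::vector<std::uint32_t> map(static_cast<std::size_t>(size) * size, 0);
    for (unsigned y = 0; y < size; ++y) {
        for (unsigned x = 0; x < size; ++x) {
            // Normalized distance from this pixel's center to the circle center.
            float dx = (x + 0.5f) - radius;
            float dy = (y + 0.5f) - radius;
            float d = std::sqrt(dx * dx + dy * dy) / radius;
            if (d < 1.0f)
                map[static_cast<std::size_t>(y) * size + x] =
                    static_cast<std::uint32_t>(std::lround(255.0f * (1.0f - d)));
        }
    }
    return map;
}
```

Whichever way the map is produced, ClearFog() only ever reads these bytes; no circle or distance math happens per frame.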
This is the code for BitmapGetPixels(), which was copied from BugsLib.
DrawUpdate() remains the same, still a single BeginDraw() and
EndDraw -- a good thing.
DrawClear() now uses memset() to clear the fog overlay. There
is a very strong reason to use 0x00000000 as the initial value
for the fog: This lets me clear the very large buffer, which
needs to happen every frame, using memset() and not a for loop.
The cpu is very fast at memset().
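The reason zero matters: memset() writes a single byte value everywhere, so it can only produce 32-bit pixels whose four bytes are identical. 0x00000000 works; a fog value like 0xFF404040 does not, and would need a per-element fill instead. The helper names below are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Fast path: all-zero pixels can be produced by memset().
void ClearFogPixels(std::vector<std::uint32_t>& px) {
    if (!px.empty())
        std::memset(px.data(), 0, px.size() * sizeof(std::uint32_t));
}

// Any other clear value needs a per-element fill (typically a loop,
// which an optimizer may or may not vectorize).
void FillFogPixels(std::vector<std::uint32_t>& px, std::uint32_t value) {
    std::fill(px.begin(), px.end(), value);
}
```

For example, memset() with the byte 0x40 produces 0x40404040 in every pixel, not 0xFF404040.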
cpu DrawClear():
int PxlShader::DrawClear(void) {
int Err= ERR_OK;
ClearFog() happens entirely on the cpu now, so it needs to
be highly optimized. It is essentially a rectangular bitblt
with an accumulator; for each pixel, it adds the corresponding
blue channel to the blue channel of pFogPixels[], clamping
to 0xFF. So pFogPixels[] is acting like a giant accumulator
for all the ClearFog() calls for each frame. And because the
gradient and circular math is already baked into the blue
channel values, all I have to do is loop through and add.
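A sketch of that accumulator, assuming blue is the low byte of each UINT32 and the defog map is a square centered on the ball. It clips the map rectangle against the fog buffer once, outside the loops, so the inner loop does only the add and clamp. ClearFogCpu is a hypothetical name and signature, not the actual PxlShader::ClearFog().

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Add the blue byte of a (size x size) defog map into the fog buffer,
// centered at (cx, cy), saturating blue at 0xFF. Other bytes are preserved.
void ClearFogCpu(std::vector<std::uint32_t>& fog, int fogW, int fogH,
                 const std::vector<std::uint32_t>& map, int size, int cx, int cy) {
    const int left = cx - size / 2, top = cy - size / 2;
    // Clip once: only the part of the map that lands inside the buffer.
    const int x0 = std::max(0, -left), x1 = std::min(size, fogW - left);
    const int y0 = std::max(0, -top),  y1 = std::min(size, fogH - top);
    for (int y = y0; y < y1; ++y) {
        const std::size_t row = static_cast<std::size_t>(top + y) * fogW;
        for (int x = x0; x < x1; ++x) {
            std::uint32_t& d = fog[row + (left + x)];
            std::uint32_t s = map[static_cast<std::size_t>(y) * size + x] & 0xFFu;
            std::uint32_t sum = (d & 0xFFu) + s;
            d = (d & ~0xFFu) | (sum > 0xFFu ? 0xFFu : sum); // clamp blue
        }
    }
}
```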
UPDATE: I could just do the add and let the blue overflow into
green. Then rely on FogShader to treat any green value as saturated
blue. This would make the ClearFog() inner x0 loop both faster and
deterministic (better cpu caching). I think that inner if can be
optimized away as well.
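A sketch of the overflow variant, assuming blue in bits 0-7 and green in bits 8-15. The inner operation becomes a plain add with no compare, and a cpu-side mirror of the shader test shows how the result is read back. Both helper names are hypothetical. One caveat: green is only 8 bits, so a pixel that accumulates a total of 0x10000 or more (roughly 257 full-strength clears in one frame) would carry into red and read as uncleared.

```cpp
#include <cstdint>

// Branch-free accumulate: when blue exceeds 0xFF the carry lands in green.
inline void AddDefog(std::uint32_t& fogPx, std::uint32_t mapPx) {
    fogPx += mapPx & 0xFFu;
}

// CPU mirror of the FogShader test: effective alpha in 0..1.
inline float FogAlphaFromPixel(std::uint32_t px) {
    if ((px >> 8) & 0xFFu) return 1.0f;             // any green: blue overflowed
    return static_cast<float>(px & 0xFFu) / 255.0f; // otherwise blue is the alpha
}
```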
The final draw step is to convert pFogPixels[] into the pbmFog
bitmap on the gpu and let the FogEffect pixel shader copy pbmField
into pbmImg, using pbmFog as the alpha channel. pbmFog is created
and destroyed every frame. Note that CreateBitmapComposite() was
changed to accept an optional pPixels argument, which points to the
source pixels in main memory.
cpu DrawFog():
int PxlShader::DrawFog(void) {
int Err= ERR_OK;
ID2D1Bitmap1 *pbmFog= 0;
if(IsErr(CreateBitmapComposite(pdcDraw,SizeU(RWID(rWnd),RHGT(rWnd)),pbmFog,pFogPixels))) {
Err= Warn(Err,"PxlShader:DrawFog: Unable to create pbmFog.");
} else {
My FogShader now needs to check whether the blue has overflowed into green.
cpu FogShader:
/** FogShader.hlsl: Fog of War **/
/** (C)2022 nlited systems, cmd **/
#define D2D_INPUT_COUNT 2
#include "d2d1effecthelpers.hlsli"
// Input0 is an alpha-map, BLUE is the alpha blend channel.
// Input1 is the field image, copied to the output depending on the Input0.blue value.
D2D_PS_ENTRY(main) {
float4 AlphaMap= D2DGetInput(0);
float4 Pixel= D2DGetInput(1);
    if(AlphaMap.g) {
        // Any g means b overflowed.
        Pixel.a= 1.0;         // Alpha is saturated, rgb from Input1.
    } else if(AlphaMap.b) {
        Pixel.a= AlphaMap.b;  // Copy alpha from Input0, rgb from Input1.
    } else {
        Pixel= 0;             // Alpha blend is 0, output is transparent black.
    }
    Pixel.rgb*= Pixel.a;      // Apply the alpha channel (premultiply).
    return Pixel;
}
So there it is. I finally have a Fog of War solution that works
and looks right. I am disappointed that it is not a purely gpu-based
solution, but at least now I have something working. I will need to
rig PxlShader up to my profiler library and see just how much cpu time
that big memset(), ClearFog(), and DrawFog() are taking. Very interested
in knowing.
This little voyage of discovery took about 9 hours.
Under the Scope
A first look at the performance numbers. Keep in mind I am running
in VMware, the optimizer is currently disabled, PxlShader is running
in a small window, there are only 10 objects, and this is the first
At first glance, ClearFog() looks pretty good. DrawFog() is the
long pole. The entire DrawUpdate() cycle takes 1425us, of which 1000us
is spent in the call to CreateBitmap(). Each call to ClearFog() takes
only 15us, so even though a deep dive into optimizing the hell out of
that function is tempting it would not move the needle on cpu usage.
The big memset() happens at the beginning of DrawUpdate() before the
first call to ClearFog(), which is at most 32us.
This first glance tells me it takes only about 30us to clear pbmFog
and 1000us (more than 30X) to transfer the pixels and create the
bitmap. This is both good and bad news. The good news: Implementing
fog as a hybrid cpu/gpu operation is feasible, I could theoretically
clear almost 1000 fog circles in 30ms. The bad news: Calling
CreateBitmap() inside the DrawUpdate() cycle eats up a full 1ms. This
is a fixed per-frame cost that scales with the size of the window,
not the number of objects.
Running natively gives some confusing results.
The big memset() is taking about 80us. The individual ClearFog()
calls are about 114us, and DrawFog() takes 380us to create the
bitmap. This is confusing because the cpu operations seem to take longer. I
took another look at the vmware trace and it looks like ClearFog()
was averaging about 50us, while natively the average is about
100us. I have no idea why vmware would be running faster. This
does not look like a cpu cycle scaling problem; both traces show
MsgTimer() averaging 31ms. Maybe a core priority or scheduling thing?
It is strange.
Setting the cpu times aside, it is not surprising that the
CreateBitmap() transfer runs faster natively.
I ran PxlShader natively with a very large window (3000x2000). The
memset() grew to about 3ms and CreateBitmap() to 4.67ms. So between
the two of them, that is about 25% of my 30ms frame budget.
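The frame-budget arithmetic behind that estimate, using the measured native numbers for the 3000x2000 window:

```latex
\frac{3\,\mathrm{ms} + 4.67\,\mathrm{ms}}{30\,\mathrm{ms}} = \frac{7.67}{30} \approx 0.256 \approx 25.6\%
```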
Ideally, the memset() would occur in a separate thread (on a different
core) after the CreateBitmap() completes. Then it could happen while the
cpu is waiting for the SwapChain Present(). There is no way to hide the
CreateBitmap(), it needs to happen after the last ClearFog() and before
the final blend in DrawFog().
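A minimal sketch of moving the memset() off the critical path, assuming a single fog buffer: start the clear on a worker thread right after the pixels have been handed to CreateBitmap(), and block only when the next frame's first ClearFog() needs the buffer. FogClearer, StartClear and Acquire are hypothetical names; std::async is used here for brevity instead of a dedicated thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <future>
#include <vector>

class FogClearer {
public:
    explicit FogClearer(std::size_t n) : pixels(n, 0u) {}

    // Call right after the fog pixels have been copied into the GPU bitmap.
    void StartClear() {
        Wait(); // never run two clears at once
        pending = std::async(std::launch::async, [this] {
            if (!pixels.empty())
                std::memset(pixels.data(), 0, pixels.size() * sizeof(std::uint32_t));
        });
    }

    // Call before the next frame's first ClearFog(); returns the cleared buffer.
    std::vector<std::uint32_t>& Acquire() {
        Wait();
        return pixels;
    }

private:
    void Wait() { if (pending.valid()) pending.get(); }

    std::vector<std::uint32_t> pixels;
    std::future<void> pending; // declared last: destroyed first, waits for the worker
};
```

Acquire() is the only synchronization point, so if the clear finishes while the cpu is waiting on Present(), it costs nothing on the main thread.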
I enabled the optimizer and ran PxlShader natively: memset 43us,
ClearFog 18us, CreateBitmap 205us. This is why the optimizer should
always be enabled, FFS.
I changed the fog to black, field to white, and fixed a minor
bug in the shader.
