nlited

The first step is to adapt the PxlShader app to something better suited to testing Fog of War.

Bouncing Balls

Ball: class Ball { public: Ball(void); ~Ball(void); int Update(float Elapsed, const RECT &Limit); int Draw(ID2D1DeviceContext *pdcDst, ID2D1Bitmap1 *pbmImg); private: float RandomVel(bool IsNeg=false); //Data POINT2D ptNow; POINT2D ptPrv; POINT2D Vel; UINT32 Clr; float Radius; }; /*************************************************************************/ /** Bouncing Ball **/ /*************************************************************************/ Ball::Ball(void) { ptNow= ptPrv= { Random()*100,Random()*100 }; Vel= { RandomVel(rand() & 1),RandomVel(rand() & 1) }; Clr= RGB(rand() & 0xFF,rand() & 0xFF,rand() & 0xFF); Radius= 10.0f; } Ball::~Ball(void) { } float Ball::RandomVel(bool IsNeg) { float vel= Random()*0.1f + 0.01f; return(IsNeg ? -vel : +vel); } int Ball::Update(float Elapsed, const RECT &Limit) { int Err= ERR_OK; ptPrv= ptNow; ptNow.x+= Vel.x*Elapsed; ptNow.y+= Vel.y*Elapsed; if(ptNow.x < Limit.left) { Vel.x= RandomVel(); ptNow.x= Limit.left + Vel.x; } else if(ptNow.x >= Limit.right) { Vel.x= -RandomVel(); ptNow.x= Limit.right + Vel.x; } if(ptNow.y < Limit.top) { Vel.y= RandomVel(); ptNow.y= Limit.top + Vel.y; } else if(ptNow.y >= Limit.bottom) { Vel.y= -RandomVel(); ptNow.y= Limit.bottom + Vel.y; } return(Err); } int Ball::Draw(ID2D1DeviceContext *pdcDst, ID2D1Bitmap1 *pbmImg) { int Err= ERR_OK; D2D1_ELLIPSE dot= { ptNow, Radius,Radius }; pdcDst->SetTarget(pbmImg); pdcDst->FillEllipse(dot,D2Brush(pdcDst,ColorF(Clr))); return(Err); }

PxlShader and MsgTimer() become a bit simpler now that the bouncing ball logic has been extracted.

MsgTimer():
    #define BALL_MAX 10

    class PxlShader {
      ...
    private:
      ...
      //Data
      Ball Balls[BALL_MAX]; // Bouncing balls
    };

    int PxlShader::MsgTimer(void) {
      UINT64 msTick= GetTickCount64();
      float msElapsed= msPrev ? (float)(msTick-msPrev) : 0;
      msPrev= msTick;
      Time+= (float)msElapsed/1000.0f;
      for(UINT n1=0;n1<BALL_MAX;n1++)
        Balls[n1].Update(msElapsed,rWnd);
      InvalidateRect(hWnd,0,0);
      return(1);
    }

Fog Layer

I no longer need pbmSrc1 and pbmSrc2. The balls will be drawn directly onto pbmImg. I need a new composite bitmap that will serve as the fog layer, which will be the same size as pbmImg.

Fog layer: class PxlShader { ... private: ... //Data ID2D1Bitmap1 *pbmImg; // Final offscreen image (write-only) ID2D1Bitmap1 *pbmFog; // Composite Fog bitmap }; int PxlShader::DrawCreate(void) { int Err= ERR_OK; HRESULT WinErr; D2D1_SIZE_U szWnd= { (UINT32)RWID(rWnd), (UINT32)RHGT(rWnd) }; ... } else if(!pdcDraw && !SUCCEEDED(WinErr= pD2Device->CreateDeviceContext(DCOptions,&pdcDraw))) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create Draw context."); } else if(!pbmImg && IsErr(Err= CreateBitmapBase(pdcDraw,szWnd,pbmImg))) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create base bitmap."); } else if(!pbmFog && IsErr(Err= CreateBitmapComposite(pdcDraw,szWnd,pbmFog))) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create fog bitmap."); } return(Err); }

Draw Sequence

I don't have a working fog effect yet, so the line that paints the fog is currently commented out. This lets me verify that the balls are bouncing properly.

Draw Sequence: int PxlShader::DrawUpdate(void) { int Err= ERR_OK; HRESULT WinErr; if(!pdcDraw || !pbmImg || !pbmFog) { Err= Warn(ERR_NOT_CREATED,"PxlShader:DrawUpdate: No resources."); } else { pdcDraw->BeginDraw(); DrawClear(); DrawGrid(); DrawBalls(); DrawFog(); if(!SUCCEEDED(WinErr= pdcDraw->EndDraw())) { Err= Error(ERR_DIRECTX,"PxlShader:DrawUpdate: EndDraw() failed. [%X]",WinErr); DoReset= true; } } return(Err); } int PxlShader::DrawClear(void) { int Err= ERR_OK; pdcDraw->SetTransform(Matrix3x2F::Identity()); pdcDraw->SetTarget(pbmFog); pdcDraw->Clear(ColorF(ColorF::DarkGray)); pdcDraw->SetTarget(pbmImg); pdcDraw->Clear(ColorF(ColorF::Black)); return(Err); } int PxlShader::DrawGrid(void) { int Err= ERR_OK; pdcDraw->SetTarget(pbmImg); D2Brush brGrid(pdcDraw,ColorF(ColorF::Aquamarine)); for(int n1=0;n1<rWnd.right;n1+= 50) pdcDraw->DrawLine(Point2F((float)n1,0),Point2F((float)n1,(float)rWnd.bottom),brGrid); for(int n1=0;n1<rWnd.bottom;n1+=50) pdcDraw->DrawLine(Point2F(0,(float)n1),Point2F((float)rWnd.right,(float)n1),brGrid); return(Err); } int PxlShader::DrawBalls(void) { int Err= ERR_OK; pdcDraw->SetTarget(pbmImg); for(UINT n1=0;n1<BALL_MAX;n1++) { Balls[n1].Draw(pdcDraw,pbmImg); } return(Err); } int PxlShader::DrawFog(void) { int Err= ERR_OK; pdcDraw->SetTarget(pbmImg); //pdcDraw->DrawBitmap(pbmFog); return(Err); }

My balls are a-bouncing. Time to figure out how this Fog of War (FoW) Effect is going to work.

The Fog Effect

My strategy is to clear pbmFog to solid gray at the beginning of each draw sequence, then every time I draw an object (a bouncing ball) I will clear a corresponding hole in the fog. When I paint the fog bitmap onto pbmImg as the final step of the draw sequence, the holes should let me see the area around the bouncing balls and obscure everything else.

The holes in the fog are created by reducing the alpha channel, leaving the RGB channels unchanged. This should create a gauzy fog that becomes clearer (more transparent) with repeated clearings.

The pixel operation I want to perform is to decimate the alpha channel of pbmFog within a circle. I'm not sure if a pixel shader can use the same texture as both an input and an output. The fog effect will be easy if I can do something like this:

    pdcDraw->SetTarget(pbmFog);
    pFogEffect->SetValueByName(L"ptCenter",ptObj);
    pFogEffect->SetValueByName(L"Radius",Radius);
    pFogEffect->SetInput(0,pbmFog);
    pdcDraw->DrawImage(pFogEffect);

Then I can do everything in the shader with a single source.

Fog Effect V1

Clone MyEffect into FogEffect.cpp. FogEffect will have only one input, so I can strip out most of the MapRects code.

FogEffect.cpp: /*************************************************************************/ /** FogEffect.cpp: Fog of War Effect **/ /** (C)2022 nlited systems, cmd **/ /*************************************************************************/ #include <Windows.h> #include <d3d11_1.h> #include <d2d1.h> #include <d2d1_1.h> #include <d2d1helper.h> #include <d2d1effectauthor.h> #include <d2d1effecthelpers.h> #include <d3dcompiler.h> #include "Globals.h" #include "ChipLib.h" #pragma comment(lib,"D3D11.lib") #pragma comment(lib,"D2D1.lib") #pragma comment(lib,"DXGI.lib") #pragma comment(lib,"d3dcompiler.lib") #pragma message(__FILE__": Optimizer disabled.") #pragma optimize("",off) // {0831F65D-D59B-41BB-B072-43E5355DE603} const GUID CLSID_FogEffect= { 0x831f65d, 0xd59b, 0x41bb, { 0xb0, 0x72, 0x43, 0xe5, 0x35, 0x5d, 0xe6, 0x3 } }; // {F7AB0F51-CD65-4515-AB20-6C2F8BE687BA} static const GUID GUID_FogShader= { 0xf7ab0f51, 0xcd65, 0x4515, { 0xab, 0x20, 0x6c, 0x2f, 0x8b, 0xe6, 0x87, 0xba } }; class FogEffect: public ID2D1EffectImpl, public ID2D1DrawTransform { public: //EffectImpl static HRESULT Register(_In_ ID2D1Factory1 *pFactory); static HRESULT __stdcall Create(_Outptr_ IUnknown **ppEffect); IFACEMETHODIMP QueryInterface(REFIID riid, void **ppInterface); IFACEMETHODIMP_(ULONG) AddRef(void) { return(++ctReference); }; IFACEMETHODIMP_(ULONG) Release(void); IFACEMETHODIMP Initialize(_In_ ID2D1EffectContext *pCtx, _In_ ID2D1TransformGraph *pGraph); IFACEMETHODIMP PrepareForRender(D2D1_CHANGE_TYPE Type); IFACEMETHODIMP SetGraph(ID2D1TransformGraph *pGraph); //DrawTransform IFACEMETHODIMP SetDrawInfo(ID2D1DrawInfo *pDraw); IFACEMETHODIMP MapOutputRectToInputRects(const D2D1_RECT_L *prOut, D2D1_RECT_L *prIn, UINT32 ctIn) const; IFACEMETHODIMP MapInputRectsToOutputRect(const D2D1_RECT_L *prIn, const D2D1_RECT_L *prInOpaque, UINT32 ctIn, D2D1_RECT_L *prOut, D2D1_RECT_L *prOutOpaque); IFACEMETHODIMP MapInvalidRect(UINT32 nIn, D2D1_RECT_L rInInvalid, D2D1_RECT_L *prOut) const; 
IFACEMETHODIMP_(UINT32) GetInputCount(void) const; //Added HRESULT SetCenter(const D2D_VECTOR_2F ptCenter) { Constants.ptCenter= ptCenter; return(S_OK); }; D2D_VECTOR_2F GetCenter(void) const { return(Constants.ptCenter); }; HRESULT SetRadius(FLOAT Radius) { Constants.Radius= Radius; return(S_OK); }; FLOAT GetRadius(void) const { return(Constants.Radius); }; private: FogEffect(void); ~FogEffect(void); // Data DWORD Signature; LONG ctReference; ID2D1EffectContext *pCtx; ID2D1DrawInfo *pDraw; D2D1_RECT_L rIn; D2D1_RECT_L rOut; struct Constants_s { D2D_VECTOR_2F ptCenter; FLOAT Radius; } Constants; }; /*************************************************************************/ /** Public interface **/ /*************************************************************************/ int FogEffectRegister(ID2D1Factory1 *pD2Factory) { return(FogEffect::Register(pD2Factory)); } /*************************************************************************/ /** Private code **/ /*************************************************************************/ FogEffect::FogEffect(void) { Signature= SIGNATURE_FOGEFFECT; ctReference= 1; Constants.ptCenter= { 0,0 }; Constants.Radius= 0; } FogEffect::~FogEffect(void) { Signature|= SIGNATURE_INVALID; } ULONG FogEffect::Release(void) { if(--ctReference > 0) return(ctReference); delete this; return(0); } HRESULT FogEffect::QueryInterface(REFIID riid, void **ppInterface) { HRESULT WinErr= S_OK; void *pInterface= 0; if(riid==__uuidof(ID2D1EffectImpl)) { pInterface= reinterpret_cast<ID2D1EffectImpl*>(this); } else if(riid==__uuidof(ID2D1DrawTransform)) { pInterface= static_cast<ID2D1DrawTransform*>(this); } else if(riid==__uuidof(ID2D1Transform)) { pInterface= static_cast<ID2D1Transform*>(this); } else if(riid==__uuidof(ID2D1TransformNode)) { pInterface= static_cast<ID2D1TransformNode*>(this); } else if(riid==__uuidof(ID2D1ComputeTransform)) { Print(PRINT_DEBUG,"FogEffect:QueryInterface: I am not a compute transform."); WinErr= E_NOINTERFACE; } else 
if(riid==__uuidof(ID2D1SourceTransform)) { Print(PRINT_DEBUG,"FogEffect:QueryInterface: I am not a source transform."); WinErr= E_NOINTERFACE; } else if(riid==__uuidof(IUnknown)) { pInterface= this; } else { WinErr= E_NOINTERFACE; } if(ppInterface) { *ppInterface= pInterface; if(pInterface) AddRef(); } return(WinErr); } HRESULT FogEffect::Register(ID2D1Factory1 *pFactory) { HRESULT WinErr= S_OK; static const PCWSTR pszXml = L"<?xml version='1.0'?>\r\n" L"<Effect>\r\n" L" \r\n" L" <Property name='DisplayName' type='string' value='FogOfWar'/>\r\n" L" <Property name='Author' type='string' value='nlited systems'/>\r\n" L" <Property name='Category' type='string' value='Experimental'/>\r\n" L" <Property name='Description' type='string' value='Obscuring fog'/>\r\n" L" <Inputs minimum='0' maximum='1'>\r\n" // Source must be specified. L" <Input name='Source1'/>\r\n" L" </Inputs>\r\n" L" \r\n" L" <Property name='ptCenter' type='vector2'>\r\n" L" <Property name='DisplayName' type='string' value='ptCenter'/>\r\n" // L" <Property name='Default' type='vector2' value='{0,0}'/>\r\n" L" </Property>\r\n" L" <Property name='Radius' type='float'>\r\n" L" <Property name='DisplayName' type='string' value='Radius'/>\r\n" L" <Property name='Default' type='float' value='10.0'/>\r\n" L" <Property name='Min' type='float' value='0'/>\r\n" L" <Property name='Max' type='float' value='100.0'/>\r\n" L" </Property>\r\n" L"</Effect>\r\n" ; static const D2D1_PROPERTY_BINDING Bindings[]= { D2D1_VALUE_TYPE_BINDING(L"ptCenter",&SetCenter,&GetCenter) ,D2D1_VALUE_TYPE_BINDING(L"Radius",&SetRadius,&GetRadius) }; if(!SUCCEEDED(WinErr= pFactory->RegisterEffectFromString(CLSID_FogEffect,pszXml,Bindings,ARRAYSIZE(Bindings),Create))) { Error(ERR_DIRECTX,"FogEffect:Register: RegisterEffectFromString() failed. 
[%X]",WinErr); } return(WinErr); } HRESULT __stdcall FogEffect::Create(IUnknown **ppEffect) { HRESULT WinErr= S_OK; *ppEffect= static_cast<ID2D1EffectImpl*>(new FogEffect); if(!*ppEffect) { WinErr= E_OUTOFMEMORY; } return(WinErr); } HRESULT FogEffect::Initialize(ID2D1EffectContext *_pCtx, ID2D1TransformGraph *pGraph) { HRESULT WinErr= S_OK; ID3DBlob *pCode= 0; ID3DBlob *pError= 0; pCtx= _pCtx; if(!SUCCEEDED(WinErr= D3DReadFileToBlob(L"FogShader.cso",&pCode))) { Warn(ERR_FILE_READ,"FogEffect:Initialize: Unable to read shader. [%X]",WinErr); } else if(!SUCCEEDED(WinErr= pCtx->LoadPixelShader(GUID_FogShader,(BYTE*)pCode->GetBufferPointer(),(UINT32)pCode->GetBufferSize()))) { Warn(ERR_DIRECTX,"FogEffect:Initialize: Unable to create pixel shader. [%X]",WinErr); } else if(!SUCCEEDED(WinErr= pGraph->SetSingleTransformNode(this))) { Warn(ERR_DIRECTX,"FogEffect:Initialize: Unable to set transform node. [%X]",WinErr); } else { Print(PRINT_INFO,"FogEffect:Initialize: OK."); } SafeRelease(pCode); SafeRelease(pError); return(WinErr); } // This is a single-transform, single-node graph, SetGraph() should never be called. HRESULT FogEffect::SetGraph(ID2D1TransformGraph *pGraph) { Warn(ERR_DIRECTX,"FogEffect:SetGraph: Should not be called."); return(E_NOTIMPL); } HRESULT FogEffect::PrepareForRender(D2D1_CHANGE_TYPE Type) { HRESULT WinErr= S_OK; pDraw->SetPixelShaderConstantBuffer((BYTE*)&Constants,sizeof(Constants)); return(WinErr); } // ID2D1DrawTransform HRESULT FogEffect::SetDrawInfo(ID2D1DrawInfo *_pDraw) { HRESULT WinErr= S_OK; pDraw= _pDraw; if(!SUCCEEDED(WinErr= pDraw->SetPixelShader(GUID_FogShader))) { Warn(ERR_DIRECTX,"FogEffect:SetDrawInfo: SetPixelShader() failed. [%X]",WinErr); } return(WinErr); } IFACEMETHODIMP FogEffect::MapInputRectsToOutputRect(const D2D1_RECT_L *prIn, const D2D1_RECT_L *prInOpaque, UINT32 ctIn, D2D1_RECT_L *prOut, D2D1_RECT_L *prOutOpaque) { HRESULT WinErr= S_OK; if(ctIn!=1) { WinErr= Warn(E_INVALIDARG,"FogEffect:MapInToOut: Only 1 input allowed. 
ctIn=%d",ctIn); } else { rOut= rIn= *prOut= prIn[0]; Zero(*prOutOpaque); } return(WinErr); } HRESULT FogEffect::MapOutputRectToInputRects(const D2D1_RECT_L *prOut, _Out_writes_(ctIn) D2D1_RECT_L *prIn, UINT32 ctIn) const { HRESULT WinErr= S_OK; if(ctIn!=1) { WinErr= Warn(E_INVALIDARG,"FogEffect:MapOutToIn: Only 1 input allowed. ctIn=%d",ctIn); } else { prIn[0]= *prOut; } return(WinErr); } IFACEMETHODIMP FogEffect::MapInvalidRect(UINT32 nIn, D2D1_RECT_L rInInvalid, D2D1_RECT_L *prOutInvalid) const { HRESULT WinErr= S_OK; // Set entire output to invalid *prOutInvalid= rOut; return(WinErr); } IFACEMETHODIMP_(UINT32) FogEffect::GetInputCount(void) const { return(1); } //EOF: FOGEFFECT.CPP

FogShader.hlsl:
    /*************************************************************************/
    /** FogShader.hlsl: Fog of War                                          **/
    /** (C)2022 nlited systems, cmd                                         **/
    /*************************************************************************/
    #define D2D_INPUT_COUNT 1   // FogEffect V1 has a single input.
    #include "d2d1effecthelpers.hlsli"

    cbuffer constants: register(b0) {
      float2 ptCenter;
      float Radius;
    };

    D2D_PS_ENTRY(main) {
      float4 color= D2DGetInput(0);
      float2 ptPxl= D2DGetInputCoordinate(0).xy;
      float dist= distance(ptCenter,ptPxl);
      if(dist <= Radius) {
        if(color.a < 0.10) {
          color.a= 0;       // Nearly clear: remove the fog entirely.
        } else {
          color.a*= 0.60;   // Thin the fog a little more on each pass.
        }
      }
      return color;
    }
    //EOF: FOGSHADER.HLSL

Ball::Draw() will now return a point and PxlShader will clear the fog around that point.

ClearFog: class Ball { public: int Draw(ID2D1DeviceContext *pdcDst, ID2D1Bitmap1 *pbmImg, POINT2D &ptClear); }; class PxlShader { private: int CreateFog(void); int ClearFog(POINT2D ptCenter, float Radius); //Data ID2D1Bitmap1 *pbmFog; // Composite Fog bitmap ID2D1Effect *pFog; // Fog effect }; int Ball::Draw(ID2D1DeviceContext *pdcDst, ID2D1Bitmap1 *pbmImg, POINT2D &ptClear) { int Err= ERR_OK; D2D1_ELLIPSE dot= { ptNow, Radius,Radius }; pdcDst->SetTarget(pbmImg); pdcDst->FillEllipse(dot,D2Brush(pdcDst,ColorF(Clr))); ptClear= ptNow; return(1); } int PxlShader::DrawBalls(void) { int Err= ERR_OK; pdcDraw->SetTarget(pbmImg); for(UINT n1=0;n1<BALL_MAX;n1++) { POINT2D ptClear; if(Balls[n1].Draw(pdcDraw,pbmImg,ptClear)>0) { ClearFog(ptClear,20.0f); } } return(Err); } int PxlShader::ClearFog(POINT2D ptCenter, float Radius) { int Err= ERR_OK; if(pFog && pbmFog) { D2D1_VECTOR_2F vecCenter= { ptCenter.x,ptCenter.y }; pFog->SetInput(0,pbmFog); pFog->SetValueByName(L"ptCenter",vecCenter); pFog->SetValueByName(L"Radius",Radius); pdcDraw->SetTarget(pbmFog); pdcDraw->DrawImage(pFog); } return(Err); } int PxlShader::DrawFog(void) { int Err= ERR_OK; pdcDraw->SetTarget(pbmImg); pdcDraw->DrawBitmap(pbmFog); return(Err); }

After about two hours, I have my answer: NO.
0x88990025: Cannot draw with a bitmap that is currently bound as the target bitmap.

A crude work-around is to create a copy of pbmFog, draw the effect onto the copy, then replace the original with the copy.

The problem with this approach is that when I try to replace the original using pdcDraw->DrawBitmap(pbmFog2), the holes are not copied because the alpha is zero! I need a version of DrawBitmap() that is a verbatim copy, not an alpha blend. I tried pdcDraw->DrawImage(D2D1_COMPOSITE_MODE_SOURCE_COPY) without success.
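The reason the holes vanish can be seen from the blend arithmetic. The sketch below is plain C++, not Direct2D code: `PixelF`, `BlendSourceOver`, and `BlendSourceCopy` are hypothetical names, and it assumes premultiplied alpha and that DrawBitmap() behaves like a standard source-over blend.

```cpp
// Premultiplied-alpha color, each channel in the range 0..1.
struct PixelF { float r,g,b,a; };

// Standard "source over" blend for premultiplied alpha:
//   out = src + dst*(1 - src.a)
PixelF BlendSourceOver(PixelF src, PixelF dst) {
  float k= 1.0f - src.a;
  return { src.r + dst.r*k, src.g + dst.g*k, src.b + dst.b*k, src.a + dst.a*k };
}

// Verbatim copy: the destination is ignored entirely.
PixelF BlendSourceCopy(PixelF src, PixelF /*dst*/) {
  return src;
}
```

A fully transparent source pixel (all zeros) blended over opaque fog returns the fog unchanged, so the hole never lands in the destination. Only the copy variant carries the zero alpha across.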

The other problem is that I need pbmFog to accumulate holes. This is a problem because the actual pixel updates do not occur until EndDraw(), which means all the ClearFog() operations until then are copying from the original solid pbmFog.

I am beginning to wonder if I need to keep pbmFog in main memory and use the cpu to clear the holes. That would be disappointing!

SUCCESS! It took about five hours of experimenting, but I finally have a solution. The PxlShader project really helped.

I kept running into the same roadblock: I wanted to modify and copy the alpha channel, not use it as a blend operator. This turned out to be the crux of why Fog was being so elusive. Then I had an epiphany: My pixel shader is performing the blend operation -- there is no reason why I had to use the alpha channel as the multiplier, I could use any channel I wanted! If I used blue to indicate transparency in the fog, all I needed was to have a final operation that would composite the fog and the intermediate image using blue instead of alpha. Then all my "can't copy alpha" problems disappear, since I am now using blue.
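As a rough CPU-side illustration of compositing with blue instead of alpha, the sketch below blends a field pixel over a fog color using the blue channel of a mask pixel as the weight. It is not the shader itself: `MixChannel` and `CompositeWithBlueMask` are hypothetical names, and pixels are assumed to be packed as 0xAARRGGBB.

```cpp
#include <cstdint>

// Blend one 8-bit channel: weight 0 keeps the fog color, 255 shows the field.
static uint8_t MixChannel(uint8_t fog, uint8_t field, uint8_t weight) {
  return (uint8_t)((fog*(255-weight) + field*weight + 127)/255);
}

// Composite one pixel. Only the BLUE channel of maskPixel is consulted;
// its alpha, red, and green are ignored. The result is always opaque.
uint32_t CompositeWithBlueMask(uint32_t fogColor, uint32_t fieldPixel, uint32_t maskPixel) {
  uint8_t weight= (uint8_t)(maskPixel & 0xFF);
  uint32_t out= 0xFF000000;
  for(int shift=0; shift<24; shift+=8) {
    uint8_t fog= (uint8_t)((fogColor>>shift) & 0xFF);
    uint8_t field= (uint8_t)((fieldPixel>>shift) & 0xFF);
    out|= (uint32_t)MixChannel(fog,field,weight) << shift;
  }
  return out;
}
```

Because the mask is an ordinary color channel, it can be drawn, copied, and inspected like any other image data.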

The solution was not what I expected. I suspect there is a standard Direct2D color matrix operation that could be used to move blue into alpha, which would enable FoW without requiring a custom pixel shader. But having pixel shaders in my toolbox makes it easier and I can control every aspect of the operation.
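For what it is worth, the arithmetic of such a color matrix is easy to sketch on the CPU. The code below is an untested-against-Direct2D illustration only: `ColorF4`, `Matrix5x4`, `ApplyColorMatrix`, and `BlueToAlphaMatrix` are hypothetical names, and it assumes the row-vector convention [R G B A 1] times a 5x4 matrix and ignores premultiplied-alpha handling.

```cpp
struct ColorF4 { float r,g,b,a; };

// 5 rows x 4 columns: rows are the input terms (R, G, B, A, constant 1),
// columns are the output channels (R, G, B, A).
struct Matrix5x4 { float m[5][4]; };

// out[col] = sum over rows of in[row]*M[row][col]
ColorF4 ApplyColorMatrix(const ColorF4 &in, const Matrix5x4 &M) {
  const float src[5]= { in.r, in.g, in.b, in.a, 1.0f };
  float dst[4]= { 0,0,0,0 };
  for(int col=0; col<4; col++)
    for(int row=0; row<5; row++)
      dst[col]+= src[row]*M.m[row][col];
  return { dst[0], dst[1], dst[2], dst[3] };
}

// Keeps RGB unchanged and replaces alpha with the input's blue value.
Matrix5x4 BlueToAlphaMatrix(void) {
  return {{
    { 1,0,0,0 },   // R -> R
    { 0,1,0,0 },   // G -> G
    { 0,0,1,1 },   // B -> B and B -> A
    { 0,0,0,0 },   // Input alpha is discarded.
    { 0,0,0,0 }    // No constant offset.
  }};
}
```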

This is the PxlShader code to create the fog resources: pbmField, pbmFog, and pFog. Everything is drawn with pdcDraw.

PxlShader Create: class PxlShader { private: ... ID2D1DeviceContext *pdcDraw; // The workhorse, used to draw everything. ID2D1Bitmap1 *pbmImg; // Final offscreen image (write-only) ID2D1Bitmap1 *pbmField; // Composite Field bitmap ID2D1Bitmap1 *pbmFog; // Composite Fog bitmap ID2D1Effect *pFog; // Fog effect }; int PxlShader::DrawCreate(void) { int Err= ERR_OK; HRESULT WinErr; D2D1_SIZE_U szWnd= { (UINT32)RWID(rWnd), (UINT32)RHGT(rWnd) }; D2D1_DEVICE_CONTEXT_OPTIONS DCOptions= D2D1_DEVICE_CONTEXT_OPTIONS_NONE; if(DoReset) ReleaseEverything(); if(!pDXGIDevice && IsErr(DrawCreateDX())) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create DXGI device."); } else if(!pSwapChain && IsErr(Err= DrawCreateSwapChain())) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create SwapChain."); } else if(!pDXGISurface && !SUCCEEDED(WinErr= pSwapChain->GetBuffer(0,IID_PPV_ARGS(&pDXGISurface)))) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to retrieve DXGI surface. [%X]",WinErr); } else if(!pD2Factory && IsErr(Err= DrawCreateD2Factory())) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create Direct2D factory."); } else if(!pD2Device && !SUCCEEDED(WinErr= pD2Factory->CreateDevice(pDXGIDevice,&pD2Device))) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create Direct2D device."); } else if(!pdcDraw && !SUCCEEDED(WinErr= pD2Device->CreateDeviceContext(DCOptions,&pdcDraw))) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create Draw context."); } else if(!pbmImg && IsErr(Err= CreateBitmapBase(pdcDraw,szWnd,pbmImg))) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create base bitmap."); } else if(!pbmField && IsErr(Err= CreateBitmapComposite(pdcDraw,szWnd,pbmField))) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create Field bitmap."); } else if(!pbmFog && IsErr(Err= CreateBitmapComposite(pdcDraw,szWnd,pbmFog))) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create fog bitmap."); } else if(!pFog && 
IsErr(Err= CreateFog(pdcDraw))) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create fog effect."); } return(Err); } int PxlShader::CreateFog(ID2D1DeviceContext *pdcDst) { int Err= ERR_OK; HRESULT WinErr; if(IsErr(Err= FogEffectRegister(pD2Factory))) { Err= Warn(ERR_DIRECTX,"PxlShader:CreateFog: Unable to register FogEffect."); } else if(!SUCCEEDED(WinErr= pdcDst->CreateEffect(CLSID_FogEffect,&pFog))) { Err= Warn(ERR_DIRECTX,"PxlShader:CreateFog: Unable to create FogEffect. [%X]",WinErr); } else { Print(PRINT_INFO,"PxlShader:CreateFog: OK"); } return(Err); } void PxlShader::ReleaseEverything(void) { SafeRelease(pFog); SafeRelease(pbmFog); SafeRelease(pbmField); SafeRelease(pbmImg); SafeRelease(pdcDraw); SafeRelease(pD2Device); SafeRelease(pD2Factory); SafeRelease(pDXGISurface); SafeRelease(pSwapChain); SafeRelease(pDXGIFactory); SafeRelease(pDXGIDevice); DoReset= false; }

DrawUpdate():
    int PxlShader::DrawUpdate(void) {
      int Err= ERR_OK;
      HRESULT WinErr;
      if(!pdcDraw || !pbmImg) {
        Err= Warn(ERR_NOT_CREATED,"PxlShader:DrawUpdate: No resources.");
      } else {
        pdcDraw->BeginDraw();
        DrawClear();
        DrawGrid();
        DrawBalls();
        DrawFog();
        if(!SUCCEEDED(WinErr= pdcDraw->EndDraw())) {
          Err= Error(ERR_DIRECTX,"PxlShader:DrawUpdate: EndDraw() failed. [%X]",WinErr);
          DoReset= true;
        }
      }
      return(Err);
    }

DrawClear() resets the bitmaps. pbmField and pbmFog are cleared to solid black, and pbmImg is cleared to the solid fog color. The final operation that "draws the fog layer" actually draws the field wherever the fog has been cleared, leaving the original fog color everywhere else.

DrawClear():
    int PxlShader::DrawClear(void) {
      int Err= ERR_OK;
      pdcDraw->SetTransform(Matrix3x2F::Identity());
      pdcDraw->SetTarget(pbmImg);
      pdcDraw->Clear(ColorF(ColorF::DarkGray));
      pdcDraw->SetTarget(pbmFog);
      pdcDraw->Clear(ColorF(ColorF::Black));
      pdcDraw->SetTarget(pbmField);
      pdcDraw->Clear(ColorF(ColorF::Black));
      return(Err);
    }

The grid and bouncing balls (essentially everything) are drawn to pbmField. I cannot draw to pbmImg because I will need to copy the field image later.

DrawGrid(): int PxlShader::DrawGrid(void) { int Err= ERR_OK; pdcDraw->SetTarget(pbmField); D2Brush brGrid(pdcDraw,ColorF(ColorF::Aquamarine)); for(int n1=0;n1<rWnd.right;n1+= 50) pdcDraw->DrawLine(Point2F((float)n1,0),Point2F((float)n1,(float)rWnd.bottom),brGrid); for(int n1=0;n1<rWnd.bottom;n1+=50) pdcDraw->DrawLine(Point2F(0,(float)n1),Point2F((float)rWnd.right,(float)n1),brGrid); return(Err); } int PxlShader::DrawBalls(void) { int Err= ERR_OK; for(UINT n1=0;n1<BALL_MAX;n1++) { POINT2D ptClear; if(Balls[n1].Draw(pdcDraw,pbmField,ptClear)>0) { ClearFog(ptClear,20.0f); } } return(Err); }

The fog is cleared by drawing a solid blue circle into pbmFog at the same position the object occupies in pbmField.

ClearFog():
    int PxlShader::ClearFog(POINT2D ptCenter, float Radius) {
      int Err= ERR_OK;
      HRESULT WinErr= S_OK;
      pdcDraw->SetTarget(pbmFog);
      pdcDraw->FillEllipse(Ellipse(ptCenter,Radius,Radius),D2Brush(pdcDraw,ColorF(ColorF::Blue)));
      pdcDraw->SetTarget(pbmImg);
      return(Err);
    }

Then I combine pbmFog and pbmField into pbmImg in DrawFog(). The pixel shader is really just a simple alpha blend, except that it is using the blue channel from Input0 as the multiplier.

DrawFog():
    int PxlShader::DrawFog(void) {
      int Err= ERR_OK;
      pFog->SetInput(0,pbmFog);
      pFog->SetInput(1,pbmField);
      pdcDraw->SetTarget(pbmImg);
      pdcDraw->DrawImage(pFog);
      return(Err);
    }

FogShader.hlsl:
    /*************************************************************************/
    /** FogShader.hlsl: Fog of War                                          **/
    /** (C)2022 nlited systems, cmd                                         **/
    /*************************************************************************/
    #define D2D_INPUT_COUNT 2
    #include "d2d1effecthelpers.hlsli"

    D2D_PS_ENTRY(main) {
      float4 AlphaMap= D2DGetInput(0);
      float4 Pixel= D2DGetInput(1);
      if(AlphaMap.b) {
        Pixel.a= AlphaMap.b;
      } else {
        Pixel= 0;
      }
      return Pixel;
    }
    //EOF: FOGSHADER.HLSL

This is exciting, but I am really back to where I was before with hard edges, which I was already able to do without a shader. What happens if I draw the fog bubble with a radial gradient?

The problem I am bumping into is that I want my fog bubbles to accumulate. The fog should become more transparent when the same pixel is cleared more than once. But accumulation implies time and multiple write operations. The whole point of the GPU is to batch the operations into a single massively parallel operation, where everything happens instantaneously -- at the same moment. Everything about the GPU is designed to make accumulation impossible.

It may be that the fog layer simply has to happen on the cpu. And that would be a very slow and tedious operation. Although I could build a lookup table to avoid all the math, if the fog bubbles are always the same size. I would draw a reference gradient fade circle, export the pixels to the cpu, then use that as the lookup table.

Create a Reference Alpha Map

The fog will be updated on the cpu, which means I need to figure out how to avoid doing a lot of math. No circle calculations, no radius calculations, and no alpha scaling. I can avoid all this math by drawing a gradient fade circle once and using it as a reference. So my CreateFog() function becomes a lot more complicated...
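To show what the reference map ends up holding, here is a small sketch that builds an equivalent table directly on the CPU. This swaps in plain math for the Direct2D rendering used in this post, so the result will not match the rendered gradient exactly: `BuildDefogMap` is a hypothetical name, and the linear falloff (with no gamma correction) is an assumption.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Build a (2*radius) x (2*radius) map of 0x00RRGGBB pixels whose blue channel
// fades linearly from near 0xFF at the center to 0 at the rim.
// Pixels outside the circle stay 0. All the circle math happens here, once.
std::vector<uint32_t> BuildDefogMap(unsigned radius) {
  const unsigned size= radius*2;
  std::vector<uint32_t> map(size*size, 0);
  for(unsigned y=0;y<size;y++) {
    for(unsigned x=0;x<size;x++) {
      float dx= (x+0.5f) - radius;   // Distance from the center, measured
      float dy= (y+0.5f) - radius;   // at the middle of each pixel.
      float dist= std::sqrt(dx*dx + dy*dy);
      if(dist < radius) {
        float strength= 1.0f - dist/radius;
        map[y*size+x]= (uint32_t)(strength*255.0f + 0.5f);
      }
    }
  }
  return map;
}
```

Once the table exists, clearing fog around an object only needs table lookups and additions; no square roots or divisions run per frame.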

PxlShader FogPixels:
    class PxlShader {
    private:
      ...
      ID2D1Effect *pFog;     // Fog effect
      UINT DefogRadius;      // Must be the same for all objects.
      UINT szDefogMap;       // Size (bytes) of pDefogMap[]
      UINT32 *pDefogMap;     // Defogging map
      UINT szFogPixels;      // Size (bytes) of pFogPixels[]
      UINT32 *pFogPixels;    // Fog pixels
    };

My FogEffect is not being wasted, I still use it for the final blend of the fog and field into pbmImg. I like using blue as my fog alpha channel and I would need to perform a final blend in any case, so it is not adding any overhead. And I have a strong hunch FogEffect will grow in the future.

DefogRadius is now a constant. If I want to handle different defog radii, I will need to create a separate pDefogMap for each one.

pDefogMap[] is the "defog" image extracted to main memory where the cpu can read it. More about this later.

szFogPixels is the size (in bytes) of pFogPixels[]. pFogPixels[] is a buffer in main memory that will be used to create the final fog overlay bitmap. It needs to match the size of pbmImg, although it could be scaled. I am only interested in a single channel, so it could be reduced from UINT32 to BYTE pixels but this would make the final fog bitmap creation more complicated. And these days memory is much cheaper than cpu cycles.
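The memory trade-off is simple arithmetic, sketched below. `FogBufferBytes` and `BlueOf` are hypothetical helpers, and the 1920x1080 window size in the note is only an example. The blue extraction assumes B8G8R8A8 pixels read as little-endian 32-bit values, where blue is the low byte.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed for a fog buffer covering the whole window.
size_t FogBufferBytes(unsigned width, unsigned height, size_t bytesPerPixel) {
  return (size_t)width*height*bytesPerPixel;
}

// Extract the blue channel from a B8G8R8A8 pixel held in a 32-bit value.
uint8_t BlueOf(uint32_t bgraPixel) {
  return (uint8_t)(bgraPixel & 0xFF);
}
```

For a 1920x1080 window, 32-bit pixels cost 8,294,400 bytes against 2,073,600 bytes for single-byte pixels, which is small enough that keeping the buffer in the bitmap's native format is a reasonable choice.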

DrawCreate() still calls CreateFog(), but I no longer create pbmFog. I do need to allocate pFogPixels here.

cpu DrawCreate(): int PxlShader::DrawCreate(void) { int Err= ERR_OK; HRESULT WinErr; D2D1_SIZE_U szWnd= { (UINT32)RWID(rWnd), (UINT32)RHGT(rWnd) }; D2D1_DEVICE_CONTEXT_OPTIONS DCOptions= D2D1_DEVICE_CONTEXT_OPTIONS_NONE; if(DoReset) ReleaseEverything(); if(!pDXGIDevice && IsErr(DrawCreateDX())) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create DXGI device."); } else if(!pSwapChain && IsErr(Err= DrawCreateSwapChain())) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create SwapChain."); } else if(!pDXGISurface && !SUCCEEDED(WinErr= pSwapChain->GetBuffer(0,IID_PPV_ARGS(&pDXGISurface)))) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to retrieve DXGI surface. [%X]",WinErr); } else if(!pD2Factory && IsErr(Err= DrawCreateD2Factory())) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create Direct2D factory."); } else if(!pD2Device && !SUCCEEDED(WinErr= pD2Factory->CreateDevice(pDXGIDevice,&pD2Device))) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create Direct2D device."); } else if(!pdcDraw && !SUCCEEDED(WinErr= pD2Device->CreateDeviceContext(DCOptions,&pdcDraw))) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create Draw context."); } else if(!pbmImg && IsErr(Err= CreateBitmapBase(pdcDraw,szWnd,pbmImg))) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create base bitmap."); } else if(!pbmField && IsErr(Err= CreateBitmapComposite(pdcDraw,szWnd,pbmField))) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create Field bitmap."); } else if(!pFog && IsErr(Err= CreateFog(pdcDraw))) { Err= Warn(ERR_DIRECTX,"PxlShader:DrawCreate: Unable to create fog effect."); } else if(!pFogPixels && IsErr(CreateFogPixels())) { Err= Warn(ERR_NO_MEM,"PxlShader:DrawCreate: Unable to allocate FogPixels."); } return(Err); }

cpu CreateFog():
    int PxlShader::CreateFog(ID2D1DeviceContext *pdcDst) {
      int Err= ERR_OK;
      HRESULT WinErr;
      if(IsErr(Err= FogEffectRegister(pD2Factory))) {
        Err= Warn(ERR_DIRECTX,"PxlShader:CreateFog: Unable to register FogEffect.");
      } else if(!SUCCEEDED(WinErr= pdcDst->CreateEffect(CLSID_FogEffect,&pFog))) {
        Err= Warn(ERR_DIRECTX,"PxlShader:CreateFog: Unable to create FogEffect. [%X]",WinErr);
      } else {
        Print(PRINT_INFO,"PxlShader:CreateFog: CreateEffect OK");
        // Now I need to create my cpu defogging map.
        ID2D1Bitmap1 *pbmDefog= 0;
        if(IsErr(Err= CreateBitmapComposite(pdcDst,SizeU(DefogRadius*2,DefogRadius*2),pbmDefog))) {
          Err= Warn(ERR_DIRECTX,"PxlShader:CreateFog: Unable to create Defog reference bitmap.");
        } else {
          pdcDst->BeginDraw();
          pdcDst->SetTarget(pbmDefog);
          // Draw a blue radial gradient circle.
          FLOAT Radius= (FLOAT)DefogRadius;
          ID2D1RadialGradientBrush *brFill= 0;
          ID2D1GradientStopCollection *pStops= 0;
          D2D1_GRADIENT_STOP Stops[2]= { { 0,ColorF(ColorF::Blue) },{ 1.0f,ColorF(ColorF::Black) } };
          pdcDst->CreateGradientStopCollection(Stops,2,D2D1_GAMMA_2_2,D2D1_EXTEND_MODE_CLAMP,&pStops);
          pdcDst->CreateRadialGradientBrush(RadialGradientBrushProperties(Point2F(Radius,Radius),Point2F(0,0),Radius,Radius),pStops,&brFill);
          pdcDst->FillEllipse(Ellipse(Point2F(Radius,Radius),Radius,Radius),brFill);
          SafeRelease(brFill);
          SafeRelease(pStops);
          if(!SUCCEEDED(WinErr= pdcDst->EndDraw())) {
            Err= Warn(ERR_DIRECTX,"PxlShader:CreateFog: EndDraw() failed. [%X]",WinErr);
          } else if(IsErr(Err= BitmapGetPixels(pdcDst,pbmDefog,szDefogMap,pDefogMap))) {
            Err= Warn(Err,"PxlShader:CreateFog: Unable to extract DefogMap.");
          }
        }
        SafeRelease(pbmDefog);
      }
      return(Err);
    }

The reference "defog" map is the shape of my defogged area rendered in blue. The blue channel will eventually be used as the alpha channel by the FogEffect pixel shader to blend the pbmField pixels into the final pbmImg bitmap. Why not use alpha directly? Because it is a pain in the ass to copy from one image to another, and using blue lets me draw the defog reference image and actually see it. Plus, the final blend has to happen anyway so using FogEffect does not add any overhead and will probably prove useful down the road.

So I use all the nice Direct2D stuff to draw a fancy reference defog image, then extract the pixels to main memory. Now the value of the BLUE component of the pixels has all the math baked into it. The pbmDefog bitmap is no longer needed and is thrown away.

cpu BitmapGetPixels(): static void CopyPixels(UINT32 *pDst, const UINT32 *pSrc, UINT wid, UINT hgt, UINT stride) { while(hgt--) { memcpy(pDst,pSrc,wid*4); pDst+= wid; pSrc+= stride/4; } } int PxlShader::BitmapGetPixels(ID2D1DeviceContext *pdcSrc, ID2D1Bitmap1 *pbmSrc, UINT &ctBytes, UINT32 *&pPixels) { int Err= ERR_OK; HRESULT WinErr; D2D1_MAPPED_RECT Map; D2D1_SIZE_F szfSrc= pbmSrc->GetSize(); D2D1_RECT_U rSrc= { 0,0,(UINT)szfSrc.width,(UINT)szfSrc.height }; D2D1_SIZE_U szuSrc= { (UINT)RWID(rSrc), (UINT)RHGT(rSrc) }; D2D1_POINT_2U ptDst= { 0,0 }; D2D1_BITMAP_PROPERTIES1 bmProp= D2D1::BitmapProperties1(); ID2D1Bitmap1 *pbmReadable= 0; ctBytes= szuSrc.width*szuSrc.height*4; pPixels= 0; bmProp.pixelFormat= { DXGI_FORMAT_B8G8R8A8_UNORM, D2D1_ALPHA_MODE_PREMULTIPLIED }; bmProp.bitmapOptions= D2D1_BITMAP_OPTIONS_CPU_READ|D2D1_BITMAP_OPTIONS_CANNOT_DRAW; if(!SUCCEEDED(WinErr= pdcSrc->CreateBitmap(szuSrc,0,0,bmProp,&pbmReadable))) { Err= Warn(ERR_DIRECTX,"PxlShader:BitmapGetPixels: CreateBitmap() failed. [%X]",WinErr); } else if(!SUCCEEDED(WinErr= pbmReadable->CopyFromBitmap(&ptDst,pbmSrc,&rSrc))) { Err= Warn(ERR_DIRECTX,"PxlShader:BitmapGetPixels: CopyFromBitmap() failed. [%X]",WinErr); } else if(!SUCCEEDED(WinErr= pbmReadable->Map(D2D1_MAP_OPTIONS_READ,&Map))) { Err= Warn(ERR_DIRECTX,"PxlShader:BitmapGetPixels: Map() failed. [%X]",WinErr); } else { if(!(pPixels= (UINT32*)MemAlloc("GetPixels",ctBytes))) { Err= Warn(ERR_NO_MEM,"PxlShader:BitmapGetPixels: NoMem(%,u)",ctBytes); } else { CopyPixels(pPixels,(UINT32*)Map.bits,szuSrc.width,szuSrc.height,Map.pitch); } pbmReadable->Unmap(); } SafeRelease(pbmReadable); return(Err); }

CreateFogPixels() is a simple memory allocation. It is a function because it also needs to set szFogPixels.

CreateFogPixels(): int PxlShader::CreateFogPixels(void) { int Err= ERR_OK; szFogPixels= RWID(rWnd)*RHGT(rWnd)*4; if(!(pFogPixels= (UINT32*)MemAlloc("FogPixels",szFogPixels))) { Err= Warn(ERR_NO_MEM,"PxlShader:CreateFogPixels: NoMem %ux%u = %,u bytes",RWID(rWnd),RHGT(rWnd),szFogPixels); } return(Err); }

cpu ReleaseEverything(): void PxlShader::ReleaseEverything(void) { SafeRelease(pFog); SafeRelease(pbmField); SafeRelease(pbmImg); SafeRelease(pdcDraw); SafeRelease(pD2Device); SafeRelease(pD2Factory); SafeRelease(pDXGISurface); SafeRelease(pSwapChain); SafeRelease(pDXGIFactory); SafeRelease(pDXGIDevice); MemFree2(pFogPixels); MemFree2(pDefogMap); DoReset= false; }

DrawUpdate() remains the same, still a single BeginDraw() and EndDraw(), which is a good thing.

DrawClear() now uses memset() to clear the fog overlay. There is a very strong reason to use 0x00000000 as the initial value for the fog: This lets me clear the very large buffer, which needs to happen every frame, using memset() and not a for loop. The cpu is very fast at memset().

cpu DrawClear(): int PxlShader::DrawClear(void) { int Err= ERR_OK; pdcDraw->SetTransform(Matrix3x2F::Identity()); pdcDraw->SetTarget(pbmImg); pdcDraw->Clear(ColorF(ColorF::DarkGray)); pdcDraw->SetTarget(pbmField); pdcDraw->Clear(ColorF(ColorF::Black)); memset(pFogPixels,0,szFogPixels); return(Err); }

ClearFog() happens entirely on the cpu now, so it needs to be highly optimized. It is essentially a rectangular bitblt with an accumulator; for each pixel, it adds the corresponding blue channel to the blue channel of pFogPixels[], clamping to 0xFF. So pFogPixels[] is acting like a giant accumulator for all the ClearFog() calls for each frame. And because the gradient and circular math is already baked into the blue channel values, all I have to do is loop through and add.

UPDATE: I could just do the add and let the blue overflow into green. Then rely on FogShader to treat any green value as saturated blue. This would make the ClearFog() inner x0 loop both faster and deterministic (better cpu caching). I think that inner if can be optimized away as well.

cpu ClearFog(): int PxlShader::ClearFog(POINT2D ptCenter, float Radius) { int Err= ERR_OK; UINT nDst,nSrc= 0; RECT rFog= { (int)(ptCenter.x-DefogRadius),(int)(ptCenter.y-DefogRadius),(int)(ptCenter.x+DefogRadius),(int)(ptCenter.y+DefogRadius) }; for(int y0=rFog.top;y0<rFog.bottom;y0++) { if(y0>=0 && y0<rWnd.bottom) { /* include row 0 */ nDst= y0*RWID(rWnd); for(int x0=rFog.left;x0<rFog.right;x0++) { if(x0>=0 && x0<rWnd.right) { /* include column 0 */ UINT32 Defog= pDefogMap[nSrc+(x0-rFog.left)]; pFogPixels[nDst+x0]+= (Defog & 0xFF); } } } nSrc+= DefogRadius*2; } return(Err); } Earlier version with the clamping: int PxlShader::ClearFog(POINT2D ptCenter, float Radius) { int Err= ERR_OK; UINT nDst,nSrc= 0; RECT rFog= { (int)(ptCenter.x-DefogRadius),(int)(ptCenter.y-DefogRadius),(int)(ptCenter.x+DefogRadius),(int)(ptCenter.y+DefogRadius) }; for(int y0=rFog.top;y0<rFog.bottom;y0++) { if(y0>=0 && y0<rWnd.bottom) { /* include row 0 */ nDst= y0*RWID(rWnd); for(int x0=rFog.left;x0<rFog.right;x0++) { if(x0>=0 && x0<rWnd.right) { /* include column 0 */ UINT32 Defog= pDefogMap[nSrc+(x0-rFog.left)] & 0xFF; UINT32 Pixel= pFogPixels[nDst+x0] & 0xFF; Pixel= (Pixel+Defog > 0xFF) ? 0xFF : Pixel+Defog; pFogPixels[nDst+x0]= Pixel; } } } nSrc+= DefogRadius*2; } return(Err); }
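The thought that the inner if can be optimized away can be sketched by clipping the rectangle against the window once, before the loops start, so the inner loop only ever visits valid pixels. This is a standalone sketch of the clamped variant on plain arrays; StampFog() and its parameters are hypothetical names, and it assumes the stamp is a square of stampDim x stampDim pixels:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Add the blue channel of a square stamp into a fog buffer, with the stamp's
// top-left corner at (left,top) in fog coordinates. The rectangle is clipped
// once up front, so the loops need no per-pixel bounds checks.
void StampFog(std::vector<uint32_t> &fog, int fogWid, int fogHgt,
              const std::vector<uint32_t> &stamp, int stampDim,
              int left, int top) {
    assert(fog.size() == static_cast<size_t>(fogWid)*fogHgt);
    assert(stamp.size() == static_cast<size_t>(stampDim)*stampDim);
    const int x0 = std::max(left, 0);
    const int y0 = std::max(top, 0);
    const int x1 = std::min(left + stampDim, fogWid);
    const int y1 = std::min(top + stampDim, fogHgt);
    if(x0 >= x1 || y0 >= y1) return;    // Entirely outside the window.
    for(int y=y0; y<y1; y++) {
        uint32_t *pDst = &fog[static_cast<size_t>(y)*fogWid];
        const uint32_t *pSrc = &stamp[static_cast<size_t>(y-top)*stampDim];
        for(int x=x0; x<x1; x++) {
            // Accumulate blue and clamp to 0xFF.
            uint32_t sum = (pDst[x] & 0xFF) + (pSrc[x-left] & 0xFF);
            pDst[x] = std::min<uint32_t>(sum, 0xFF);
        }
    }
}
```

The same clipping would apply to the non-clamping variant; only the body of the inner loop changes.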

The final draw step is to convert pFogPixels[] into the pbmFog bitmap on the gpu and let the FogEffect pixel shader copy pbmField into pbmImg, using pbmFog as the alpha channel. pbmFog is created and destroyed every frame. Note that CreateBitmapComposite() was changed to accept an optional pPixels argument, which points to the source pixels in main memory.

cpu DrawFog(): int PxlShader::DrawFog(void) { int Err= ERR_OK; ID2D1Bitmap1 *pbmFog= 0; /* Capture the error code so the failure is actually returned. */ if(IsErr(Err= CreateBitmapComposite(pdcDraw,SizeU(RWID(rWnd),RHGT(rWnd)),pbmFog,pFogPixels))) { Err= Warn(Err,"PxlShader:DrawFog: Unable to create pbmFog."); } else { pFog->SetInput(0,pbmFog); pFog->SetInput(1,pbmField); pdcDraw->SetTarget(pbmImg); pdcDraw->DrawImage(pFog); SafeRelease(pbmFog); } return(Err); }

cpu FogShader: /*************************************************************************/ /** FogShader.hlsl: Fog of War **/ /** (C)2022 nlited systems, cmd **/ /*************************************************************************/ #define D2D_INPUT_COUNT 2 #include "d2d1effecthelpers.hlsli" // Input0 is an alpha-map, BLUE is the alpha blend channel. // Input1 is the field image, copied to the output depending on the Input0.blue value. D2D_PS_ENTRY(main) { float4 AlphaMap= D2DGetInput(0); float4 Pixel= D2DGetInput(1); if(AlphaMap.g) { // Any g means b overflowed. Pixel.a= 1.0; // Alpha is saturated, rgb from Input1. } else if(AlphaMap.b) { Pixel.a= AlphaMap.b; // Copy alpha from Input0, rgb from Input1. } else { Pixel= 0; // Alpha blend is 0, output is transparent black. } Pixel.rgb*= Pixel.a; // Apply the alpha channel. return Pixel; } //EOF: FOGSHADER.HLSL
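The overflow-into-green trick is easier to see with actual numbers. The sketch below is a cpu-side model of the shader's alpha decision, assuming the fog pixel is a B8G8R8A8 value held in a little-endian 32-bit integer (blue in bits 0-7, green in bits 8-15). FogAlpha() is a hypothetical helper for illustration; it returns an 8-bit alpha rather than the normalized float the shader sees.

```cpp
#include <cassert>
#include <cstdint>

// Model of the shader's alpha choice for one fog pixel.
uint8_t FogAlpha(uint32_t fogPixel) {
    const uint32_t blue  = fogPixel & 0xFF;
    const uint32_t green = (fogPixel >> 8) & 0xFF;
    if(green) return 0xFF;              // Blue overflowed into green: saturated.
    return static_cast<uint8_t>(blue);  // Otherwise blue is the alpha (0 = transparent).
}
```

For example, two overlapping stamps of 0xC8 sum to 0x190: the blue byte alone reads 0x90, but the carry lands in green, so the pixel is treated as fully defogged. This only holds while the running sum for a pixel stays below 0x10000; beyond that the carry would spill into red.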

So there it is. I finally have a Fog of War solution that works and looks right. I am disappointed that it is not a purely gpu-based solution, but at least now I have something working. I will need to rig PxlShader up to my profiler library and see just how much cpu time that big memset(), ClearFog(), and DrawFog() are taking. Very interested in knowing.

Under the Scope

A first look at the performance numbers. Keep in mind I am running in VMware, the optimizer is currently disabled, PxlShader is running in a small window, there are only 10 objects, and this is the first look.

At first glance, ClearFog() looks pretty good. DrawFog() is the long pole. The entire DrawUpdate() cycle takes 1425us, of which 1000us is spent in the call to CreateBitmap(). Each call to ClearFog() takes only 15us, so even though a deep dive into optimizing the hell out of that function is tempting, it would not move the needle on cpu usage. The big memset() happens at the beginning of DrawUpdate(), before the first call to ClearFog(), and takes at most 32us.

This first glance tells me it takes only about 30us to clear the pFogPixels[] buffer and 1000us (more than 30X) to transfer the pixels and create the bitmap. This is both good and bad news. The good news: Implementing fog as a hybrid cpu/gpu operation is feasible; I could theoretically clear almost 1000 fog circles in 30ms. The bad news: Calling CreateBitmap() inside the DrawUpdate() cycle eats up a full 1ms. This is a fixed cost, paid once per frame, that scales with the size of the window, not the number of objects.

Running natively instead of in VMware, the big memset() takes about 80us, each ClearFog() call takes about 114us, and DrawFog() takes 380us to create the bitmap.

This is confusing because the cpu operations seem to take longer. I took another look at the VMware trace and it looks like ClearFog() was averaging about 50us there, while natively the average is about 100us. I have no idea why VMware would be running faster. This does not look like a cpu cycle scaling problem; both traces show MsgTimer() averaging 31ms. Maybe a core priority or scheduling thing? It is strange.

Setting the cpu times aside, it is not surprising that the CreateBitmap() transfer runs faster natively.

I ran PxlShader natively with a very large window (3000x2000). The memset() grew to about 3ms and the CreateBitmap() to 4.67ms. So between the two of them, that is roughly 25% of my 30ms frame budget.
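To put those numbers in perspective, the arithmetic can be written out. This is only a back-of-the-envelope sketch: it assumes the fog buffer is exactly 3000x2000 pixels at 4 bytes each, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of a 32-bit-per-pixel buffer.
uint64_t FogBytes(uint32_t wid, uint32_t hgt) {
    return static_cast<uint64_t>(wid)*hgt*4;
}

// Throughput in GB/s (1 GB = 1e9 bytes) for moving 'bytes' in 'ms' milliseconds.
double GigabytesPerSecond(uint64_t bytes, double ms) {
    assert(ms > 0.0);
    return static_cast<double>(bytes) / (ms*1e-3) / 1e9;
}

// Share of a frame budget, as a percentage.
double BudgetPercent(double msUsed, double msBudget) {
    assert(msBudget > 0.0);
    return 100.0*msUsed/msBudget;
}
```

Under that assumption the buffer is 24,000,000 bytes, so a 3ms memset() works out to about 8 GB/s and a 4.67ms CreateBitmap() to about 5.1 GB/s. Together, 3ms + 4.67ms is about 25.6% of a 30ms frame.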

Ideally, the memset() would occur in a separate thread (on a different core) after the CreateBitmap() completes. Then it could happen while the cpu is waiting for the SwapChain Present(). There is no way to hide the CreateBitmap(); it needs to happen after the last ClearFog() and before the final blend in DrawFog().
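One way the background memset() might be arranged is with two fog buffers: one is accumulated into and uploaded this frame while the other is being cleared on a worker thread for the next frame. This is a sketch of the idea only, not something PxlShader does; FogBuffers is a hypothetical class, and it uses std::async rather than any particular Windows threading API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <future>
#include <utility>
#include <vector>

// Two fog buffers: 'front' is written and uploaded this frame,
// 'back' is cleared in the background for use in the next frame.
class FogBuffers {
public:
    explicit FogBuffers(size_t ctPixels): front(ctPixels,0), back(ctPixels,0) {
        assert(ctPixels > 0);
    }
    ~FogBuffers() { if(clearing.valid()) clearing.wait(); }

    // The buffer that the ClearFog()-style accumulation writes into.
    std::vector<uint32_t> &Front() { return front; }

    // Call after the upload step has consumed Front(). Swaps in the clean
    // buffer and starts clearing the used one on another thread.
    void EndFrame() {
        if(clearing.valid()) clearing.wait();   // Make sure 'back' is clean.
        std::swap(front, back);
        uint32_t *p = back.data();
        const size_t ctBytes = back.size()*sizeof(uint32_t);
        clearing = std::async(std::launch::async, [p, ctBytes] {
            std::memset(p, 0, ctBytes);
        });
    }
private:
    std::vector<uint32_t> front, back;
    std::future<void> clearing;
};
```

The cost is a second full-size buffer, and EndFrame() can still block if the previous clear has not finished by the time the next frame ends.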

I enabled the optimizer and ran PxlShader natively: memset 43us, ClearFog 18us, CreateBitmap 205us. This is why the optimizer should always be enabled, FFS.

I changed the fog to black, field to white, and fixed a minor bug in the shader.