dev.nlited.com


Public Interfaces and Private Code (PIPC)

2017-02-18 21:07:08 chip

Feb 19 2017

See also: HerbSutter.com

The Problem: Exposing Private Details

A fundamental problem with C++ is that it forces the internal details of an object's implementation to be published in the class declaration. Functions and member data that are intended strictly for internal use must be exposed as "private" members of the public class declaration, and the same goes for every base class in the hierarchy. The data members are required so the compiler knows how much memory to allocate for the final version of the object, which must include the public and private data members (and the virtual function machinery) of the object's entire family tree. The private functions are required because C++ insists that every member be declared inside the class definition.
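To see the size dependency in isolation, here is a minimal sketch with two hypothetical versions of the same class (CounterV1 and CounterV2 are illustrative names, not from the original code). The public functions are identical and the only change is one extra private data member, yet the object grows. Any code that creates these objects on the stack, embeds them in other objects, or allocates them with new therefore has to be recompiled.

```cpp
#include <cstdio>

// Version 1 of a hypothetical class: one private data member.
class CounterV1 {
public:
    int Get() const { return count; }
private:
    int count = 0;
};

// Version 2: the same public interface, plus one purely internal member.
class CounterV2 {
public:
    int Get() const { return count; }
private:
    int count = 0;
    double cache = 0.0;   // never visible to callers, yet it changes sizeof()
};

// Code compiled against version 1 reserves sizeof(CounterV1) bytes per object.
// Mixing it with version 2 code is undefined behavior and can corrupt memory.
static_assert(sizeof(CounterV2) > sizeof(CounterV1), "private members change the object size");

void PrintSizes() {
    std::printf("CounterV1: %zu bytes, CounterV2: %zu bytes\n",
                sizeof(CounterV1), sizeof(CounterV2));
}
```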

As a concrete example, below is a simplified version of a class I wrote to help build MySQL queries. (The code has been formatted for simplicity and brevity.) The queries can be very long blocks of text that need to be assembled in pieces. The DbText class uses an internal DbTextHeap class to manage the text buffers. The DbTextHeap class is used nowhere else in the project, but must be included in the declaration so the references to the DbText class can compile.


DbText.h - Typical declaration:

```cpp
#include <windows.h> //WCHAR, UINT

class DbTextHeap {
public:
    DbTextHeap(void);
    ~DbTextHeap(void);
    const WCHAR *Text(void);
    void Add(WCHAR Chr);
    void Add(const WCHAR *Text);
private:
    __inline bool Grow(UINT ChrCt) { return(ct+ChrCt+1 < sz ? true:Grow2(ChrCt)); };
    bool Grow2(UINT ChrCt);
    UINT sz;      //Allocated size (characters) of pHeap[]
    UINT ct;      //Valid characters in pHeap[]
    WCHAR *pHeap; //Allocated storage
};

class DbText {
public:
    DbText(void);
    ~DbText(void);
    const WCHAR *Text(void);
    const WCHAR *Add(const WCHAR *NewText);
private:
    DbTextHeap Heap;
};
```

The DbTextHeap class is intended to be used only within the context of DbText. It must be included to allow the code to compile, but its presence only complicates life for the programmer and the project.

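This causes big problems in the real world: every file that uses DbText must also see DbTextHeap, and any change to DbTextHeap, even to a private member, forces all of those files to be recompiled.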

Virtual Interfaces

A common approach to defining interfaces is to create an "interface" class in which all the functions are virtual, then derive an "implementation" class that provides the actual functions.

Virtual Function Interface:

```cpp
class IDbText {
public:
    IDbText(void) { }
    virtual ~IDbText(void) { }
    virtual const WCHAR *Text(void) = 0;                   //Pure virtual: supplied by the implementation
    virtual const WCHAR *Add(const WCHAR *NewText) = 0;
};
```

Then somewhere else:

Virtual Function Implementation:

```cpp
class DbTextHeap { ... };

class DbText : public IDbText {
public:
    DbText(void);
    ~DbText(void);
    const WCHAR *Text(void);
    const WCHAR *Add(const WCHAR *NewText);
private:
    DbTextHeap Heap;
};
```

This approach does nothing to solve the problem of exposing the private members of the implementation. I still need to expose the presence of DbTextHeap as a private member of the DbText class, which still needs to be visible in a global header file somewhere. This is because the C++ compiler must know the size of the final derived class in order to know how much memory (stack) to allocate when the object is instantiated. IMHO, this approach is simply an admission that there is a problem, with a "solution" that makes the ComSci professors happy without actually solving the real problem.

Straight-C Interfaces

My traditional approach has been to define the public interface as a set of straight-C functions. These functions take an opaque "handle" which is actually a pointer to a private C++ class. The C interface functions cast the handle to a C++ pointer and use it to invoke the class member functions.

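DbText.h:

```cpp
#ifndef EXTERNC //Assumed definition: C linkage when compiled as C++
#ifdef __cplusplus
#define EXTERNC extern "C"
#else
#define EXTERNC
#endif
#endif

typedef void *HTEXT;
EXTERNC int DbTextCreate(HTEXT *phText);
EXTERNC int DbTextDestroy(HTEXT hText);
EXTERNC const WCHAR *DbText(HTEXT hText);
EXTERNC const WCHAR *DbTextAdd(HTEXT hText, const WCHAR *NewText);
```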

This style lets me keep the entire class declaration private, typically embedded in the cpp code file itself.

DbText.cpp:

```cpp
#include <windows.h> //WCHAR, UINT
#include <stdlib.h>  //malloc, free
#include <string.h>  //memcpy
#include <wchar.h>   //wcslen, wcscpy
#include "DbText.h"

class DbTextHeap {
public:
    DbTextHeap(void);
    ~DbTextHeap(void);
    const WCHAR *Text(void);
    void Add(WCHAR Chr);
    void Add(const WCHAR *Text);
private:
    __inline bool Grow(UINT ChrCt) { return(ct+ChrCt+1 < sz ? true:Grow2(ChrCt)); };
    bool Grow2(UINT ChrCt);
    UINT sz;      //Allocated size (characters) of pHeap[]
    UINT ct;      //Valid characters in pHeap[]
    WCHAR *pHeap; //Allocated storage
};

class DbText {
public:
    DbText(void);
    ~DbText(void);
    const WCHAR *Text(void);
    const WCHAR *Add(const WCHAR *NewText);
private:
    DbTextHeap Heap;
};

//Public interface
//The C function DbText() hides the class name DbText, so the wrappers use an alias.
typedef class DbText CDbText;
int DbTextCreate(HTEXT *phText) { *phText= (HTEXT)new CDbText(); return(ERR_OK); }
int DbTextDestroy(HTEXT hText) { delete (CDbText*)hText; return(ERR_OK); }
const WCHAR *DbText(HTEXT hText) { return(((CDbText*)hText)->Text()); }
const WCHAR *DbTextAdd(HTEXT hText, const WCHAR *NewText) { return(((CDbText*)hText)->Add(NewText)); }

//DbText implementation
DbText::DbText(void) { }
DbText::~DbText(void) { }
const WCHAR *DbText::Text(void) { return(Heap.Text()); }
const WCHAR *DbText::Add(const WCHAR *NewText) { Heap.Add(NewText); return(Heap.Text()); }

//DbTextHeap implementation
DbTextHeap::DbTextHeap(void) {
    ct= 0;
    sz= 100;
    pHeap= (WCHAR*)malloc(sz*sizeof(WCHAR)); //sz is in characters, not bytes
    if(pHeap) pHeap[ct]= 0; else sz= 0;
}
DbTextHeap::~DbTextHeap(void) { free(pHeap); }
const WCHAR *DbTextHeap::Text(void) { return(pHeap); }
void DbTextHeap::Add(const WCHAR *Text) {
    UINT TextSz= (UINT)wcslen(Text);
    if(Grow(TextSz)) {
        wcscpy(&pHeap[ct],Text);
        ct+= TextSz;
    }
}
bool DbTextHeap::Grow2(UINT ChrCt) {
    UINT NewSz= sz+ChrCt+1;
    WCHAR *pNewHeap= (WCHAR*)malloc(NewSz*sizeof(WCHAR));
    if(!pNewHeap) return(false);
    if(pHeap) {
        memcpy(pNewHeap,pHeap,(ct+1)*sizeof(WCHAR));
        free(pHeap);
    }
    sz= NewSz;
    pHeap= pNewHeap;
    return(true);
}
```

100% of the private implementation is now contained in a single file, DbText.cpp. No mention of the DbTextHeap class is found in any header file. This lets me work on the private implementation of the class without ever touching the public declaration unless the public interface changes. This style has worked very well. The disadvantage is that a distinct C interface function must be written for every public class function. (This could be viewed as an *advantage* since it is the perfect place to check the validity of the calling parameters, but there is no disguising the tedium of writing them.) This approach relies on the compiler to optimize away the inefficiency of inserting yet another stack frame whose only purpose is to recast a pointer.

Declaring the interface functions as straight-C requires the function names to be globally unique. The object pointer must always be passed explicitly as the first parameter to every function. These combine to make the function calls sometimes long and tedious.

C Interface Calling Example:

```cpp
#include "DbText.h"

void MyFunction(void) {
    HTEXT hText;
    DbTextCreate(&hText);
    DbTextAdd(hText,L"select *");
    DbTextAdd(hText,L" from parts");
    DbTextAdd(hText,L" join part_manf using (PartID)");
    DbTextAdd(hText,L" join manf using (ManfID)");
    DbQuery(hDb,DbText(hText));
    DbTextDestroy(hText);
}
```

The biggest disadvantage of the C interface approach is that it requires explicit calls to the Create and Destroy functions. The backing C++ object is always allocated from the memory heap and must be explicitly freed. This approach cannot support an automatic local (stack) object that is implicitly destroyed when the frame drops out of scope.

C++ Interface Class

Now I want to have the convenience of automatic object creation and destruction without losing the advantages of a truly private implementation. This requires creating two completely independent classes, the public interface class and the private implementation class. It is impossible to derive the private class from the public without exposing the private members; the compiler must be able to allocate memory for only the public class and be done. Therefore, the public class must consist of only public function declarations plus a single private data member. This data is an opaque pointer to the (syntactically) independent and undefined implementation object.

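DbText.h: IDbText declaration:

```cpp
class IDbText {
public:
    IDbText(void);
    ~IDbText(void);
    IDbText(const IDbText&) = delete;            //A copy would share, and double-delete, pObj
    IDbText &operator=(const IDbText&) = delete;
    const WCHAR *Text(void);
    const WCHAR *Add(const WCHAR *NewText);
private:
    class DbText *pObj;
};
```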

The C++ compiler is perfectly happy with a pointer to a class that is never defined; all it needs to know is how much memory to allocate for the pointer. As long as the private implementation class is defined where the interface functions are compiled (DbText.cpp), everything is fine. The implementation of the interface class consists of thin functions that simply call the implementation class functions through the private object pointer. These interface functions are *very* similar to the straight-C interface, with the advantage that I now have a tiny C++ object that can be declared as an automatic (local, stack) variable with implicit calls to my constructor and destructor. Declaring the private object pointer (pObj) as a pointer to a "class defined elsewhere" lets the debugger resolve the reference at runtime, which is much nicer than an opaque handle.
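The same arrangement is widely known as the "pimpl" idiom, and in modern C++ it is often written with std::unique_ptr instead of a raw pointer. The sketch below illustrates that variant and is not the author's code: TextBuilder and its nested Impl are hypothetical names, std::wstring stands in for DbTextHeap, and the header and .cpp halves are shown in one file. The rule that usually trips people up is that the destructor must be defined where Impl is complete, because std::unique_ptr needs the full type to delete it.

```cpp
#include <memory>
#include <string>

// ---- Public part (would live in the header) ----
class TextBuilder {
public:
    TextBuilder();
    ~TextBuilder();                                   // defined below, where Impl is complete
    TextBuilder(const TextBuilder&) = delete;         // no accidental sharing of Impl
    TextBuilder& operator=(const TextBuilder&) = delete;
    const wchar_t *Add(const wchar_t *newText);
    const wchar_t *Text() const;
private:
    class Impl;                                       // declared only; callers never see its members
    std::unique_ptr<Impl> pImpl;
};

// ---- Private part (would live only in the .cpp file) ----
class TextBuilder::Impl {
public:
    std::wstring buf;                                 // stands in for DbTextHeap
};

TextBuilder::TextBuilder() : pImpl(new Impl()) {}
TextBuilder::~TextBuilder() = default;                // Impl is complete here, so delete works

const wchar_t *TextBuilder::Add(const wchar_t *newText) {
    pImpl->buf += newText;
    return pImpl->buf.c_str();
}

const wchar_t *TextBuilder::Text() const { return pImpl->buf.c_str(); }
```

A caller uses it the same way as IDbText (TextBuilder t; t.Add(L"select *");), and the implementation is released automatically when t goes out of scope.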

DbText.cpp: IDbText Implementation:

```cpp
#include "DbText.h"

class DbTextHeap { ... same as above ... };
class DbText { ... same as above ... };

//IDbText implementation
IDbText::IDbText(void) { pObj= new DbText(); }
IDbText::~IDbText(void) { delete pObj; }
const WCHAR *IDbText::Text(void) { return(pObj->Text()); }
const WCHAR *IDbText::Add(const WCHAR *NewText) { return(pObj->Add(NewText)); }

//DbText implementation
... Same as above ...

//DbTextHeap implementation
... Same as above ...
```

From the caller's perspective, I keep all the advantages of C++: shorter function names (since they are within the scope of the interface class), function overloading by parameter type, optional parameters, and automatic creation and destruction.

Calling Example:

```cpp
#include "DbText.h"

void MyFunction(void) {
    IDbText Tbl; //Not "IDbText Tbl();", which would declare a function
    Tbl.Add(L"select *");
    Tbl.Add(L" from part");
    Tbl.Add(L" join part_manf using (PartID)");
    Tbl.Add(L" join manf using (ManfID)");
    DbQuery(hDb,Tbl.Text());
}
```

The disadvantage of this approach is that I am now making twice as many memory allocations: one for the tiny interface object (a single pointer, 4 bytes in a 32-bit build) and another for the backing implementation object. This may lead to heap fragmentation (although this risk is mostly mitigated by the fact that the two allocations will almost always occur back to back), excessive memory consumption (the heap block header is larger than the interface object itself), and performance barnacles (calling malloc twice as often). In practice, the interface object is more likely to be a local variable on the stack, while the implementation object is always allocated from the heap. Even then, that is one malloc/free pair that would not happen if the entire public/private construct could be allocated as a single entity on the stack.

Writing the C++ interface functions is still a tedious chore that cannot be avoided, although it is mostly a simple copy/paste exercise. The advantages of keeping the implementation truly private are well worth the effort, especially in cases where the implementation involves multiple dependent classes and objects.

Interface with Embedded Object

I can avoid allocating two separate objects by embedding storage for the implementation object inside the interface. I can't include any references to the implementation definition, so I need to declare the object as a fixed number of bytes.

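Interface with embedded object:

```cpp
class IDbText {
public:
    IDbText(void);
    ~IDbText(void);
    IDbText(const IDbText&) = delete;            //A byte copy would leave pObj pointing at the original
    IDbText &operator=(const IDbText&) = delete;
    const WCHAR *Text(void);
    const WCHAR *Add(const WCHAR *NewText);
private:
    class DbText *pObj;
    alignas(8) BYTE Obj[64]; //Storage for DbText, aligned for pointers and 64-bit members
};
```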

When the IDbText object is created, storage for the private DbText object will also be implicitly allocated in IDbText.Obj[]. Now I need to do two things:

  1. Raise an exception or return an error if the size of Obj[] is too small to hold DbText.
  2. Make sure that Obj[] is used to hold DbText rather than allocating additional memory.

The constructor for the interface object can see the definition of the DbText object, so it is easy to compare the number of bytes allocated, sizeof(Obj), against the size of the implementation object, sizeof(*pObj). If there are enough bytes, I use the placement form of operator new rather than the default new. Placement new lets me provide the pre-allocated memory address for a new object: it runs the constructor without allocating anything. Note that I need to add #include <new> to compile.

IDbText constructor:

```cpp
#include <new>
#include <string.h> //memset

IDbText::IDbText(void) {
    pObj= 0;
    memset(Obj,0,sizeof(Obj));
    if(sizeof(Obj) < sizeof(*pObj)) {
        BREAK(ERR_TOO_SMALL,"IDbText: Obj(%d) is too small(%d)",(int)sizeof(Obj),(int)sizeof(*pObj));
    } else {
        pObj= new(Obj) DbText(); //Construct DbText inside Obj[]; no heap allocation
    }
}
```

The BREAK() function should always raise an exception and break into the debugger, since it means the DbText object has outgrown its storage and IDbText.Obj[] needs to be enlarged.

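A nice side effect of this approach is that the memory underneath the DbText object is cleared to zero in all cases, including local stack instances. (Strictly speaking, the C++ standard does not promise that members a constructor leaves uninitialized keep those zeroed bytes after placement new, so members that matter should still be initialized explicitly.)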

Since the DbText object is not a distinct memory allocation, I need to make sure it is never destroyed using delete. This is easy since DbText can only be created or destroyed by IDbText. The IDbText destructor needs to call the DbText destructor without invoking delete. The DbText object will be implicitly freed when the interface object is freed.

IDbText destructor:

```cpp
IDbText::~IDbText(void) {
    if(pObj) pObj->~DbText(); //Run the destructor only; the Obj[] storage goes away with IDbText
}
```
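Here is a self-contained sketch of the embedded-object technique with two substitutions that are not in the original code: C++11 static_assert replaces the run-time BREAK() size check, so an undersized Obj[] fails the build instead of the program, and alignas gives Obj[] an alignment suitable for any fundamental type. The names Iface, Impl and kObjSize are hypothetical.

```cpp
#include <cstddef>   // std::size_t, std::max_align_t
#include <new>       // placement new

class Impl;          // the implementation is only declared here

class Iface {
public:
    Iface();
    ~Iface();
    Iface(const Iface&) = delete;             // a byte copy would leave pObj pointing into the source
    Iface& operator=(const Iface&) = delete;
    void Add(int n);
    int Total() const;
private:
    static const std::size_t kObjSize = 32;   // reserved bytes; must be >= sizeof(Impl)
    Impl *pObj;
    alignas(std::max_align_t) unsigned char Obj[kObjSize];
};

// ---- Private part (would live only in the .cpp file) ----
class Impl {
public:
    Impl() : total(0) {}                      // initialize members explicitly
    int total;
};

Iface::Iface() : pObj(nullptr) {
    // Compile-time replacements for the run-time size check.
    static_assert(sizeof(Impl) <= kObjSize, "Obj[] is too small for Impl");
    static_assert(alignof(Impl) <= alignof(std::max_align_t), "Impl needs stricter alignment");
    pObj = new (Obj) Impl();                  // construct in place: no heap allocation
}

Iface::~Iface() {
    pObj->~Impl();                            // destroy only; Obj[] goes away with Iface
}

void Iface::Add(int n) { pObj->total += n; }
int Iface::Total() const { return pObj->total; }
```

Because Obj[] is part of Iface, an Iface declared as a local variable involves no heap allocation at all.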

Review

This may seem like a lot of extra work, but in practice it isn't that much. The benefits of having a completely isolated private implementation are more than worth the effort.

The abbreviated and simplified public interface class is declared in a global header:

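Public Interface IDbText.h:

```cpp
class IDbText {
public:
    IDbText(void);
    ~IDbText(void);
    IDbText(const IDbText&) = delete;            //A byte copy would leave pObj pointing at the original
    IDbText &operator=(const IDbText&) = delete;
    const WCHAR *Text(void);
    const WCHAR *Add(const WCHAR *NewText);
private:
    class DbText *pObj;
    alignas(8) BYTE Obj[64]; //Storage for DbText, aligned for pointers and 64-bit members
};
```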

The definitions of the interface functions, the private implementation class declarations, and their code are all in the single DbText.cpp file. Note that this is also the only place where DbTextHeap is referenced. Changes can be made to DbText and DbTextHeap without affecting any other code in the project, including code that relies on IDbText. As long as DbText still fits in Obj[], only DbText.cpp needs to be recompiled; the rest of the project only needs to be relinked.

Private Implementation DbText.cpp:

```cpp
#include <new>
#include <stdlib.h> //malloc, free
#include <string.h> //memset, memcpy
#include <wchar.h>  //wcslen, wcscpy
#include "IDbText.h"

class DbTextHeap {
public:
    DbTextHeap(void) : sz(0), ct(0), pHeap(0) { };
    ~DbTextHeap(void);
    WCHAR *Text(void) { return(pHeap); };
    void Add(const WCHAR *Text);
    //Public so DbText::Add() can reserve space before calling Add().
    __inline bool Grow(UINT ChrCt) { return(ct+ChrCt+1 < sz ? true:Grow2(ChrCt)); };
private:
    bool Grow2(UINT ChrCt);
    UINT sz;      //Allocated size (characters) of pHeap[]
    UINT ct;      //Valid characters in pHeap[]
    WCHAR *pHeap; //Allocated storage
};

class DbText {
public:
    DbText(void) { };
    ~DbText(void) { };
    const WCHAR *Text(void) { return(Heap.Text()); };
    const WCHAR *Add(const WCHAR *NewText);
private:
    DbTextHeap Heap;
};

//Public interface
IDbText::IDbText(void) {
    pObj= 0;
    memset(Obj,0,sizeof(Obj));
    if(sizeof(Obj) < sizeof(*pObj)) {
        BREAK(ERR_TOO_SMALL,"IDbText: Obj(%d) is too small(%d)",(int)sizeof(Obj),(int)sizeof(*pObj));
    } else {
        pObj= new(Obj) DbText();
    }
}
IDbText::~IDbText(void) { if(pObj) pObj->~DbText(); pObj= 0; }
const WCHAR *IDbText::Text(void) { return(pObj ? pObj->Text():0); }
const WCHAR *IDbText::Add(const WCHAR *NewText) { return(pObj ? pObj->Add(NewText):0); }

//DbText implementation
//Append to the existing text, reallocating pHeap[] if necessary.
const WCHAR *DbText::Add(const WCHAR *NewText) {
    UINT ct= (UINT)wcslen(NewText);
    if(Heap.Grow(ct)) Heap.Add(NewText);
    return(Heap.Text());
}

//DbTextHeap implementation
DbTextHeap::~DbTextHeap(void) { free(pHeap); }
void DbTextHeap::Add(const WCHAR *NewText) {
    wcscpy(&pHeap[ct],NewText);
    ct= (UINT)wcslen(pHeap);
}
bool DbTextHeap::Grow2(UINT ExtraCt) {
    UINT NewSz= sz+ExtraCt+100;
    WCHAR *pNew= (WCHAR*)malloc(NewSz*sizeof(pHeap[0]));
    if(!pNew) return(false);
    if(pHeap) {
        memcpy(pNew,pHeap,(ct+1)*sizeof(pHeap[0]));
        free(pHeap);
    }
    sz= NewSz;
    pHeap= pNew;
    return(true);
}
```

Example:

```cpp
#include <stdio.h>
#include "IDbText.h"

int main(void) {
    IDbText text;
    text.Add(L"Hello, world!");
    wprintf(L"%ls\n",text.Text());
    return 0;
}
```

Setting Obj[]

The last piece of the puzzle is setting the size of the Obj[] storage in the interface. Since the interface and implementation classes are syntactically unrelated, there is no way to set the size automatically. The compiler only knows of the existence of the implementation class as an object that can be pointed to, nothing about its details or size. I had been relying on a fatal run-time error to tell me the actual size of the implementation class, updating it, recompiling, and running again. This became a nuisance, especially with multiple interfaces inside other interfaces.

A better solution would be to report the size of the implementation class at compile time. But sizeof() is evaluated by the compiler, not the preprocessor, so it cannot be tested in an #if or printed with #pragma message, and the compiler offers no direct way to display its value. The compiler needs to be coerced into coughing up the information through some C++ trickery. The details are better explained in these two articles:

  1. How can I print the result of sizeof() at compile time?
  2. Printing sizeof(T) at compile time

The end result is something like this:

Interface.h:

```cpp
#define MYIMPLEMENTATION_SIZE 16

class MyInterface {
public:
    MyInterface(void);
    ~MyInterface(void);
private:
    class MyImplementation *pObj;
    BYTE Obj[MYIMPLEMENTATION_SIZE];
};
```

Then in my implementation code I add the following two lines of boilerplate code just below the declaration for MyImplementation:

MyImplementation.cpp:

```cpp
class MyImplementation { ... };

//If this generates an error, MYIMPLEMENTATION_SIZE needs to be updated.
template<int N> struct _CheckSize { short operator()() { return((N+0x7FFF)-MYIMPLEMENTATION_SIZE); } };
static void CheckSize(void) { _CheckSize<sizeof(MyImplementation)>()(); }
```

This bit of bizarro code defines a small function object whose operator() returns the constant (sizeof(MyImplementation)+0x7FFF)-MYIMPLEMENTATION_SIZE as a short. If sizeof(MyImplementation) is greater than MYIMPLEMENTATION_SIZE, the constant exceeds 0x7FFF, the maximum value a short can hold, and the compiler issues a truncation warning (C4309). With warnings treated as errors, this produces a cascade of compiler errors and warnings:

```text
1>DeviceDisk.cpp(87): error C2220: warning treated as error - no 'object' file generated
1> DeviceDisk.cpp(87): note: while compiling class template member function 'short _CheckSize<432>::operator ()(void)'
1> DeviceDisk.cpp(88): note: see reference to function template instantiation 'short _CheckSize<432>::operator ()(void)' being compiled
1> DeviceDisk.cpp(88): note: see reference to class template instantiation '_CheckSize<432>' being compiled
1>DeviceDisk.cpp(87): warning C4309: 'return': truncation of constant value
```

Buried in the warning text is the actual size of the implementation class, in this case 432 bytes. I update MYIMPLEMENTATION_SIZE to 432 in Interface.h and recompile. If MYIMPLEMENTATION_SIZE is greater than or equal to sizeof(MyImplementation), there are no warnings or errors.
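To make the arithmetic concrete, assuming a 16-bit short as on MSVC, the snippet below plugs in the 432 bytes reported in the log together with a hypothetical reservation of 16 bytes, and checks both outcomes at compile time. The constant names are illustrative only.

```cpp
#include <climits>   // SHRT_MAX

// 432 is the size reported in the log; 16 is a hypothetical reservation.
constexpr int kActual   = 432;
constexpr int kReserved = 16;

// The value _CheckSize<432>::operator() would try to return as a short.
constexpr int kTooSmall = (kActual + 0x7FFF) - kReserved;
static_assert(kTooSmall == 33183, "432 + 32767 - 16");
static_assert(kTooSmall > SHRT_MAX, "does not fit in a short, so the truncation warning fires");

// After MYIMPLEMENTATION_SIZE is raised to 432, the value is exactly 0x7FFF.
constexpr int kFixed = (kActual + 0x7FFF) - 432;
static_assert(kFixed == SHRT_MAX, "fits in a short: no warning");
```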

The _CheckSize struct is only referenced within the function CheckSize(). Since CheckSize() is never actually called, it should optimize away to nothing (assuming function-level linking is enabled).

The final step is to convert CheckSize into a macro to make it easier to replicate.

CheckSize macro:

```cpp
#define CHECK_OBJ_SIZE(name,size) \
    template<int N> struct name##_CheckSizeT { short operator()() { return((N+0x7FFF)-(size)); } }; \
    static void name##_CheckSize(void) { name##_CheckSizeT<sizeof(name)>()(); }
```

The macro is invoked using the name of the implementation class and the interface Obj[] size.

CHECK_OBJ_SIZE usage:

```cpp
#define MY_OBJ_SIZE 16

class MyInterface {
public:
    MyInterface(void);
    ~MyInterface(void);
private:
    class MyImplementation *pObj;
    BYTE Obj[MY_OBJ_SIZE];
};

class MyImplementation {
public:
    MyImplementation(void);
    ~MyImplementation(void);
};
CHECK_OBJ_SIZE(MyImplementation,MY_OBJ_SIZE);
```

MY_OBJ_SIZE is required because I cannot use sizeof(MyImplementation) in the interface header, where the class is incomplete, and the preprocessor cannot evaluate sizeof() at all. (Which is what started this whole thing in the first place.)
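On a compiler with C++11 support, static_assert offers a less bizarro way to run the same check. This is a swapped-in technique, not the original CHECK_OBJ_SIZE: the failure is an error in its own right instead of a truncation warning that relies on warnings-as-errors, and because the assertion sits inside a class template, compilers usually name the failing instantiation in their notes (for example ObjSizeCheck<24, 16> if the class had grown to 24 bytes), which still reveals the actual size. ObjSizeCheck and CHECK_OBJ_SIZE11 are hypothetical names.

```cpp
#include <cstddef>

// Fails to compile when Actual > Reserved. The compiler's note about the
// ObjSizeCheck<Actual, Reserved> instantiation shows the real size.
template <std::size_t Actual, std::size_t Reserved>
struct ObjSizeCheck {
    static_assert(Actual <= Reserved,
                  "Obj[] is too small; raise the reserved size to the first template argument");
    static const bool ok = true;
};

// Referring to ObjSizeCheck<...>::ok instantiates the template, which runs the
// static_assert. Nothing is emitted into the object file.
#define CHECK_OBJ_SIZE11(name, size) \
    static_assert(ObjSizeCheck<sizeof(name), (size)>::ok, "size check")

// Example usage with the names from the text.
#define MY_OBJ_SIZE 16

class MyImplementation {
public:
    int a;
    int b;
};

CHECK_OBJ_SIZE11(MyImplementation, MY_OBJ_SIZE);
```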



WebV7 (C)2018 nlited