Templates and the C++ syntax tree
In my last few posts I’ve been writing about C++0x templates. Specifically I’ve been talking about how to make template classes easier to read by allowing more-imperative/less-declarative and more-iterative/less-recursive class definitions. I’ve talked about extending the … packing operator and adding code-generating scripting to the compiler.
The C++ compiler already has “scripting” of sorts, in the form of the preprocessor. But this is very limited, a declarative language without recursion (or iteration), at least not without very clever hackery (see the Boost Preprocessor BOOST_PP_.. macros and my earlier post about using BOOST_PP_.. macros).
The usual way of generating C++ code is with templates, which are declarative and allow full recursion. Templates should be all you need, but they are surprisingly difficult to use. This is partly due to syntax, but mainly due to the fact that templates only deal with large syntactic sub-trees (class and function definitions). You cannot use a template if you just want to generate an expression or an argument list. Also templates only have integers and classes (structs) available as data structures (although MPL helps here a little), and that template selection (pattern matching) is primitive (you cannot easily match patterns buried inside structures or lists).
In a previous post I talk about how convenient it would be if we could use the … packing operator to generate member variables for the tuple class.
/* fantasy code -- will not work */
template< typename ... TYPEs >
class
tuple
{
// Member variables - not a legal use of ...
private:
TYPEs ...
values;
/* .. etc .. */
};
It’d be nice if you could write a meta-function using templates to generate the sequence of member-declarations necessary (the BNF non-terminal for the sequence is called member-specification).
/* fantasy code -- will not work */
template< typename ... TYPEs >
struct
generate_member_variables
{
// Generate member declarations like this:
// TYPE0 member_0 ;
// TYPE1 member_1 ;
// TYPE2 member_2 ;
// .. etc ..
typedef
std::grammar::
generate_member_specification_from_types
< 0
, "member_"
, std::grammar::list< TYPEs ... >
>::eval
eval;
};
template< typename ... TYPEs >
class
tuple
{
private:
// The compiler will change the following struct
// into a sequence of member variables.
generate_member_variables< TYPEs ... >::eval;
/* .. etc .. */
};
To make this work you’d need a library of meta-functions (templated structs) and classes to represent parts of the C++ syntax tree. The compiler would also have to recognize these snippets of syntax tree and compile them into the program like they were a normal part of the token stream. (You’d also need to be able to manipulate lists, tokens, literals, identifiers, and strings at compile-time.)
In the above example we assume a meta-function library that lives in namespace std::grammar and can build a struct that the compiler will recognize as a sequence of member variable declarations.
With these changes you could specify member variables something like this.
/* fantasy code -- will not work */
using std::grammar::member_specification;
using std::grammar::member_declarator;
using std::grammar::type_name;
using std::grammar::identifier;
using std::grammar::list;
struct
my_struct
{
// example 1
// generates these member vars:
// int m_id;
// double m_price;
// char* p_name;
codegen typename
member_specification
< list< int, double, char* >
, list< "m_id", "m_price", "p_name" >
>::eval;
// example 2
// generates these member vars:
// my_class1 member_var1;
// my_class2 member_var2;
codegen typename
member_specification
< list
< member_declarator
< type_name< "my_class1" >
, identifier< "member_var1" >
>
, member_declarator
< type_name< "my_class2" >
, identifier< "member_var2" >
>
>
>::eval;
/* .. etc .. */
};
This example introduces the keyword codegen which tells the compiler to expect a special type. A little like typedef, ‘cept not really. When the compiler sees codegen it knows treat the type as a C++ syntax tree and expand it into inline code.
The C++ compiler could be much more code-generation friendly. Defining meta-classes and meta-functions to represent and manipulate parsed C++ code is a good first step, along with adding a way to inline these syntax-tree objects into the code. Boost MPL provides a good of example of how those meta-objects can be defined.
This would let you build small parts of the syntax tree, like parameter lists, expressions, and initialization lists that you could reuse. These in turn would make it easier to generate larger class and function templates.
Variadic tuple<..> — imperative meta programming
Since my last two posts (here and here) are about how … might unpack and generate code if it were very smart, I thought I’d bring up a more difficult case. How do you build templated constructors inside the variadic template tuple class?
Remember, this is all fantasy code. The … operator will not actually be able to do any of this. Also remember this problem can be solved by defining tuple using a recursive first/rest typelist scheme (see the current Boost implementation).
At first glance it might appear that it’s easy to define a constructor. Just do this.
/* fantasy code -- will not work */
template< typename ... TYPEs >
class
tuple
{
// Member variables - not a legal use of ...
private:
TYPEs ...
values;
// Constructor
public:
tuple( TYPEs && ... args)
: values( std::forward( args)) ...
{ }
/* .. etc .. */
};
This is pretty good, but we’d like to make all the arguments optional. So we could do this.
/* fantasy code -- will not work */
template< typename ... TYPEs >
class
tuple
{
// Member variables
private:
TYPEs ...
values;
// Constructor
public:
tuple( TYPEs && ... args = TYPEs( ))
: values( std::forward( args)) ...
{ }
/* .. etc .. */
};
I think the double … expansion in the parameter list works. I’m pretty sure you only specify … immediately before the variable to be packed, and that all the already-packed variables in the expression will be unpacked at the same time. So you don’t specify … again after TYPEs( ).
In any case, this still needs improvement. This requires that all TYPEs have a copy constructor even if they are going to be default constructed. We really want a bunch of constructors, taking zero, one, two, etc arguments.
/* fantasy code -- will not work */
template< typename ... TYPEs >
class
tuple
{
// Member variables
private:
TYPEs ...
values;
// Generate a family of constructors (not legal C++).
// If TYPEs has three types <A, B, C> this
// generates four constructors:
// constructor( )
// constructor( A &&)
// constructor( A &&, B &&)
// constructor( A &&, B &&, C &&)
for... ( size_t
N = 0
; N <= sizeof...( TYPEs )
; ++ N )
{
public:
tuple( TYPEs && unpack...( N ) args)
: values( std::forward( args)) ...
{ }
}
/* .. etc .. */
};
Yikes, that’s not C++, and we’re not in Kansas anymore. In this fantasy code for… is a meta keyword, interpreted by the compiler and producing a series of constructors. And unpack…(N) only unpacks the first N items in the packed variable TYPEs, and not the entire list.
Instead of for… in the above example we could use plain old for and say the compiler should run code that is not in a code-defining context. But we’d still need to quote or emit the tokens inside the loop above. Something like this.
/* fantasy code -- will not work */
template< typename ... TYPEs >
class
tuple
{
/* .. etc .. */
for ( size_t
N = 0
; N <= sizeof...( TYPEs )
; ++ N )
{
emit {
public:
tuple( TYPEs && unpack...( N ) args)
: values( std::forward( args)) ...
{ }
}
}
/* .. etc .. */
};
This implementation is still incomplete however. The constructor methods themselves should be templated so you can construct tuples from values with convertible types. With that in mind, here is the last implementation.
/* fantasy code -- will not work */
template< typename ... TYPEs >
class
tuple
{
// Member variables
private:
TYPEs ...
values;
// Generate a family of constructors.
// If TYPEs has three types <A, B, C> this
// generates four constructors with compatible
// types:
// template< >
// constructor( )
// template< typename AC >
// constructor( AC &&)
// template< typename AC, typename BC >
// constructor( AC &&, BC &&)
// template< typename AC, typename BC, typename CC >
// constructor( AC &&, BC &&, CC &&)
for... ( size_t
N = 0
; N <= sizeof...( TYPEs )
; ++ N )
{
public:
template< typename generate...( N ) ARG_TYPEs >
tuple( ARG_TYPEs && ... args)
: values( std::forward( args)) ...
{ }
}
/* .. etc .. */
};
This introduces the fantasy generate...(N) meta function to generate a template parameter list of N typenames. This also unpacks values and args at the same time (and in parallel) even though values is packed from TYPEs while args is packed from ARG_TYPES. And args usually packs fewer names than values.
This kind of imperative code generation is sometimes easier to create and understand than the declarative templates C++ currently provides. But it also adds more syntax and keywords to C++, which is plenty complicated already. And a declarative solution that isn't as restrictive as C++ templates may be a better than this imperative suggestion. So I'm not advocating any of this stuff. I'm just thinking out loud, writing it down to see what it looks like.
Using BOOST_PP_.. macros to define an array class with constructors
Sometimes when programming you just want an array of values, and you want to be able to construct the individual elements of the array, which isn’t easy with a raw array.
You could use std::tr1::tuple<..>, or even std::pair<..> as your “array”, although it will have to be small (tuples are limited to 10 values). Or you could work with uninitialized bytes and invoke the constructors explicitly with new( pvoid_raw_memory) type( ctor_params) or std::uninitialized_fill_n(..) or allocator<type>.construct(..).
You cannot use boost::array<..> if you want to use the element constructors. boost::array<..> has no constructors (besides the auto-generated copy and default ctors). This design choice was made so boost::array<..> objects can be initialized with the plain-old-data (pod) initializer lists.
You could define a quick class with a bunch of repeated elements that works like an array. Templates and macros make this job easier. This article shows how you can an array-like class with macros.
I’m going to use the Boost Preprocessor BOOST_PP_.. macros to declare repetitious parts of the class.
# include <boost/static_assert.hpp> # include <boost/preprocessor/cat.hpp> # include <boost/preprocessor/repetition/repeat.hpp> # include <boost/preprocessor/repetition/enum.hpp> # include <boost/preprocessor/repetition/enum_binary_params.hpp> # include <boost/preprocessor/facilities/intercept.hpp>
These headers give us the tools to define the macro DEFINE_CLASS_ARRAY(..) which declares a class that acts like a fixed-size array except it has a constructor that takes as many args as the array is big. So an array with 57 elements will have a constructor function with 57 parameters.
# define PRINT_CLASS_ARRAY_INIT_ELEM_( Z, N, D) \
BOOST_PP_CAT( _, N) ( BOOST_PP_CAT( init, N) )
# define PRINT_CLASS_ARRAY_MEMBER_DECL_( Z, N, D) \
element_type BOOST_PP_CAT( _, N) ;
# define DEFINE_CLASS_ARRAY( \
CLASS_NAME, \
ELEM_COUNT, \
ELEM_TYPE, \
PARAM_TYPE, \
DEFAULT_VALUE \
) \
class CLASS_NAME \
{ \
public: \
typedef ELEM_TYPE element_type; \
\
CLASS_NAME \
( BOOST_PP_ENUM_BINARY_PARAMS( ELEM_COUNT \
, PARAM_TYPE init \
, = DEFAULT_VALUE BOOST_PP_INTERCEPT \
) ) \
: BOOST_PP_ENUM( ELEM_COUNT \
, PRINT_CLASS_ARRAY_INIT_ELEM_ \
, ~ \
) \
{ } \
\
public: \
BOOST_PP_REPEAT( ELEM_COUNT \
, PRINT_CLASS_ARRAY_MEMBER_DECL_ \
, ~ \
) \
} /* end of macro DEFINE_CLASS_ARRAY(..) */
Here is how you might use the above macro to define a class that acts like an array of 5 integers.
// Define the class
DEFINE_CLASS_ARRAY(
five_ints_class, 5, int, const int&, 3);
// Same size as an array of 5 ints.
BOOST_STATIC_ASSERT(
sizeof( five_ints_class) == sizeof( int[ 5 ]));
five_ints_class five_ints_0;
five_ints_class five_ints_1( 0);
five_ints_class five_ints_2( 0, 1);
five_ints_class five_ints_3( 0, 1, 2);
five_ints_class five_ints_4( 0, 1, 2, 3);
five_ints_class five_ints_5( 0, 1, 2, 3, 4);
// Too many args:
// five_ints_class five_ints_6( 0, 1, 2, 3, 4, 5);
// Cannot init with pod because it has a ctor:
// five_ints_class from_pod = { 1,2,3,4,5 };
This is just a quick hack and could be improved to (partly) mimic a standard container with additional typedefs, operators, methods, and associated iterators.
These array classes have no virtuals, no supertype, and no members besides the array elements. The resulting class is the same size as a raw array, and you could cautiously cast between the two types.
This defines classes with ctors that have default parameter values, but you could easily leave the defaults off. Or you could define the constructor as a template method and get rid of PARAM_TYPE from the macro. Or you could assume PARAM_TYPE is const ELEM_TYPE&.