Anatomy of Persistence

#include <some_header_files>
    int main(int argc, char *argv[]) {
       sn::some_type_t object;
       write( somewhere, object, ... );
       ...
       for( size_t i=0; i<huge_number; i+=batch_size)
          read( somewhere, object, ...);
    }

take a program
with an object, having some memory layout
and intention to save its state to some device
or retrieve the data

Properties of the Object

can be categorized by

Content: homogeneous vs heterogeneous
Placement in memory: contiguous vs non-contiguous
How much space is used in total
Number of dimensions or rank

Type trait utility?

std::is_compound != heterogeneous
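std::vector<int>().data() points to contiguous memory holding the values; when T = std::string it points to contiguous string objects, but the characters themselves usually live elsewhere on the heap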

a purely trait-based approach requires the type to be known upfront, making it less powerful than detecting the presence of certain methods
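The two observations above can be checked directly. The following is a minimal sketch, not part of the original material: `pod_t`, `stored_in_block` and `check_long_string` are hypothetical names, and the runtime check assumes that a 1000-character string does not fit the small-string buffer of the standard library in use.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

// hypothetical record type, used only for illustration
struct pod_t { int id; double value; };

// std::is_compound is true for every type that is not fundamental,
// so it cannot separate a heterogeneous record from a homogeneous array
static_assert( std::is_compound<pod_t>::value,  "a struct is compound");
static_assert( std::is_compound<int[3]>::value, "so is a homogeneous array");
static_assert( std::is_compound<int*>::value,   "and a plain pointer");
static_assert(!std::is_compound<int>::value,    "fundamental types are not");

// true if the characters of `s` are stored inside the element block of `v`
inline bool stored_in_block(const std::vector<std::string>& v, const std::string& s) {
    const auto lo = reinterpret_cast<std::uintptr_t>(v.data());
    const auto hi = lo + v.size() * sizeof(std::string);
    const auto p  = reinterpret_cast<std::uintptr_t>(s.data());
    return lo <= p && p < hi;
}

// assumption: 1000 characters exceed any small-string buffer
inline void check_long_string() {
    std::vector<std::string> v{ std::string(1000, 'x') };
    assert(!stored_in_block(v, v[0])); // data() gives string objects, not the characters
}
```

The block returned by `data()` is contiguous in both cases; what differs is whether it contains the payload that should end up on disk.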

heterogeneous types such as plain old struct, class, union...

require access to field names and may reside in non-contiguous memory. possible layouts:

Table approach, where each row is a heterogeneous datatype leads to fast indexing by rows.
OTOH accessing a single field will lead to I/O bandwidth loss.

Contiguous memory: 1 block heterogeneous

Column layout provides efficient access by field, at the cost of added implementation complexity: one dataset per column

non-contiguous memory: 3 separate blocks, each homogeneous

 //a vector of pod struct
    struct coo_t {
       size_t row;
       size_t column;
       double value;
    };
    std::vector<coo_t> sparse_matrix;

 // each field of the struct is a vector
    struct csc_t {
       std::vector<size_t> rowind; // row indices
       std::vector<size_t> colptr; // start of new columns
       std::vector<double> values; // nonzero values
    };
    csc_t sparse_matrix;

NOTE: the coordinate (COO) layout is efficient for incremental construction, whereas Armadillo C++ arma::SpMat<T> uses the Compressed Sparse Column (CSC) representation
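To make the relation between the two layouts concrete, here is a minimal sketch that converts the row-oriented coordinate list into the three homogeneous column arrays. The function name `to_csc` and the explicit `n_cols` argument are assumptions made for this illustration; it does not show how Armadillo builds its matrices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct coo_t {                       // one heterogeneous row per nonzero
    std::size_t row;
    std::size_t column;
    double value;
};
struct csc_t {                       // three homogeneous blocks
    std::vector<std::size_t> rowind; // row indices
    std::vector<std::size_t> colptr; // start of each column, n_cols + 1 entries
    std::vector<double> values;      // nonzero values
};

// hypothetical helper: sorts by (column, row), then counts entries per column
inline csc_t to_csc(std::vector<coo_t> entries, std::size_t n_cols) {
    std::sort(entries.begin(), entries.end(), [](const coo_t& a, const coo_t& b) {
        return a.column != b.column ? a.column < b.column : a.row < b.row;
    });
    csc_t out;
    out.colptr.assign(n_cols + 1, 0);
    out.rowind.reserve(entries.size());
    out.values.reserve(entries.size());
    for (const coo_t& e : entries) {
        assert(e.column < n_cols);   // precondition of this sketch
        ++out.colptr[e.column + 1];  // histogram of column sizes
        out.rowind.push_back(e.row);
        out.values.push_back(e.value);
    }
    for (std::size_t c = 0; c < n_cols; ++c)
        out.colptr[c + 1] += out.colptr[c]; // prefix sum turns counts into offsets
    return out;
}
```

Each member of `csc_t` can then be saved as its own homogeneous dataset, while `std::vector<coo_t>` needs a compound type describing the three fields.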

We need an introspection / reflection method to retrieve the field names of C++ class types

C++ mailing list and papers categorized as related to reflection

2019 8 entries

P1390R0 Suggested Reflection TS NB Resolutions Matúš Chochlík, Axel Naumann, and David Sankel
P1390R1 Reflection TS NB comment resolutions: summary and rationale Matúš Chochlík, Axel Naumann, and David Sankel
N4818 Working Draft, C++ Extensions for Reflection David Sankel
P1733R0 User-friendly and Evolution-friendly Reflection: A Compromise David Sankel, Daveed Vandevoorde
P1749R0 Access control for reflection Yehezkel Bernat
P1240R1 Scalable Reflection in C++ Daveed Vandevoorde, Wyatt Childers, Andrew Sutton, Faisal Vali
P1887R0 Typesafe Reflection on attributes Corentin Jabot

2018 13 entries

P0194R5 Static reflection by Matúš Chochlík, Axel Naumann, David Sankel
P0670R2 Static reflection of functions by Matúš Chochlík, Axel Naumann, David Sankel
P0954R0 What do we want to do with reflection? Bjarne Stroustrup
P0993R0 Value-based Reflection Andrew Sutton, Herb Sutter
P0572R2 Static reflection of bit fields Alex Christensen
P0670R3 Function reflection Matúš Chochlík, Axel Naumann, David Sankel
P1240R0 Scalable Reflection in C++ Andrew Sutton, Faisal Vali, Daveed Vandevoorde

2017 14 entries

P0194R3 Static reflection by Matúš Chochlík, Axel Naumann, David Sankel
P0385R2 Static reflection: Rationale, design and evolution by Matúš Chochlík, Axel Naumann, David Sankel
P0578R0 Static Reflection in a Nutshell by Matúš Chochlík, Axel Naumann, David Sankel
P0590R0 A design for static reflection by Andrew Sutton, Herb Sutter
P0598R0 Reflect Through Values Instead of Types by Daveed Vandevoorde

2016 19 entries

P0194R0 Static reflection (revision 4) Matúš Chochlík, Axel Naumann
P0255R0 C++ Static Reflection via template pack expansion Cleiton Santoia Silva, Daniel Auresco
P0256R0 C++ Reflection Light Cleiton Santoia Silva
P0327R0 Product types access Vicente J. Botet Escriba
P0341R0 parameter packs outside of templates Mike Spertus
Static reflection: Rationale, design and evolution Matúš Chochlík, Axel Naumann

introspection/reflection is non-trivial, not yet available as a language feature

LLVM/Clang LibTooling-based static reflection

...
    StatementMatcher h5templateMatcher = callExpr( allOf(
       hasDescendant( declRefExpr( to( varDecl().bind("variableDecl") ) ) ),
       hasDescendant( declRefExpr( to(
          functionDecl( allOf(
            eachOf(
                hasName("h5::write"), hasName("h5::create"), hasName("h5::read"),
                hasName("h5::append"),
                hasName("h5::awrite"), hasName("h5::acreate"), hasName("h5::aread")
            ),
    ... ));

identify the relevant nodes
marked by I/O operators
visit the structure in reverse topological order
emit the templates describing the class with fields and types
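The third step can be illustrated without the Clang machinery. The sketch below assumes a hypothetical dependency graph that maps each record type to the record types embedded in its fields, and assumes the graph is acyclic; `type_graph_t` and `emit_order` are illustrative names, not part of the h5cpp compiler.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// hypothetical graph: type name -> record types used by its fields
using type_graph_t = std::map<std::string, std::vector<std::string>>;

namespace detail {
    // depth-first post-order walk: dependencies are appended before the type itself
    inline void visit(const type_graph_t& graph, const std::string& name,
                      std::set<std::string>& done, std::vector<std::string>& order) {
        if (!done.insert(name).second) return;  // already emitted
        const auto it = graph.find(name);
        if (it != graph.end())
            for (const std::string& dep : it->second)
                visit(graph, dep, done, order);
        order.push_back(name);
    }
}

// order in which descriptors can be emitted so that every embedded type
// is described before the type that contains it
inline std::vector<std::string> emit_order(const type_graph_t& graph, const std::string& root) {
    std::set<std::string> done;
    std::vector<std::string> order;
    detail::visit(graph, root, done, order);
    return order;
}
```

For a record that embeds another record, the embedded type comes out first, which is the order a compound type descriptor needs when it refers to a previously created member type.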

P0993r0 Value-based Reflection, Andrew Sutton, Herb Sutter:

static reflection is a programming facility that exposes read-only data about entities in a translation unit as compile-time values. Static reflection does not require support for runtime compilation since reflection values can be used with existing generative facilities (i.e., templates) or additional generative facilities (i.e., metaprogramming) to produce new code.

dynamic reflection provides information for navigating source-code data structures at runtime. Languages supporting dynamic reflection also tend to make additional facilities available for generating and JIT-compiling new code. Dynamic reflection and code generation are not in the scope of this work.

How about Containers? Let's take a look at N4436 C++ Detection Idiom

It is possible to detect whether a container is STL-like and provides direct access to its contiguous storage, as std::vector<T> does, or alternatively offers iterators for scatter/gather operations

identify if there is direct access to contiguous memory
or iterator for non-contiguous layouts


    template <typename T> using value_type_f = typename T::value_type;
    template <typename T> using data_f = decltype(std::declval <T>().data());
    template <typename T> using size_f = decltype(std::declval <T>().size());
    template <typename T> using begin_f = decltype(std::declval <T>().begin());
    template <typename T> using end_f = decltype(std::declval <T>().end());
    template <typename T> using cbegin_f = decltype(std::declval <T>().cbegin());
    template <typename T> using cend_f = decltype(std::declval <T>().cend());

    template <typename T> using value = compat::detected_or <T, value_type_f, T>;
    template <typename T> using has_value_type = compat::is_detected <value_type_f, T>;
    template <typename T> using has_data = compat::is_detected <data_f, T>;
    template <typename T> using has_direct_access = compat::is_detected <data_f, T>;
    template <typename T> using has_size = compat::is_detected <size_f, T>;
    template <typename T> using has_begin = compat::is_detected <begin_f, T>;
    template <typename T> using has_end = compat::is_detected <end_f, T>;
    template <typename T> using has_cbegin = compat::is_detected <cbegin_f, T>;
    template <typename T> using has_cend = compat::is_detected <cend_f, T>;

    template <typename T> using has_iterator = std::integral_constant <bool, has_begin <T>::value && has_end <T>::value >;
    template <typename T> using has_const_iterator = std::integral_constant <bool, has_cbegin <T>::value && has_cend <T>::value >;

credit: WG21 N4436 C++ Detection Idiom by Walter Brown
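The aliases above rely on `compat::is_detected`, whose definition is not shown. A minimal C++17 stand-in in the spirit of N4436 could look like the sketch below; the `sketch` namespace and the reduced set of aliases are assumptions made for this illustration, not the H5CPP implementation.

```cpp
#include <list>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {
    // primary template: chosen when Op<Args...> is ill-formed
    template <class Void, template <class...> class Op, class... Args>
    struct detector : std::false_type {};
    // specialization: chosen when Op<Args...> names a valid type
    template <template <class...> class Op, class... Args>
    struct detector<std::void_t<Op<Args...>>, Op, Args...> : std::true_type {};

    template <template <class...> class Op, class... Args>
    using is_detected = detector<void, Op, Args...>;

    template <class T> using data_f  = decltype(std::declval<T>().data());
    template <class T> using begin_f = decltype(std::declval<T>().begin());
    template <class T> using end_f   = decltype(std::declval<T>().end());

    template <class T> using has_direct_access = is_detected<data_f, T>;
    template <class T> using has_iterator = std::integral_constant<bool,
        is_detected<begin_f, T>::value && is_detected<end_f, T>::value>;
}

// std::vector exposes its storage, std::list only offers iterators
static_assert( sketch::has_direct_access<std::vector<int>>::value, "vector has data()");
static_assert(!sketch::has_direct_access<std::list<int>>::value,   "list has no data()");
static_assert( sketch::has_iterator<std::list<int>>::value,        "list has begin()/end()");
static_assert(!sketch::has_iterator<int>::value,                   "int is not a container");
```

An I/O routine can branch on these constants: one transfer from `data()` when direct access is detected, an element-by-element gather through iterators otherwise.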

C++ Linear Algebra Systems calling BLAS/LAPACK specialized containers

are a dedicated category, as they all must provide a mechanism to pass data to and receive data from a BLAS call; however, the naming varies from system to system.

The differences can be mitigated with a combination of

type traits
feature detection idiom

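| library | direct access | vector size |
|---------|---------------|-------------|
| arma    | `memptr()`    | `n_elem`    |
| eigen   | `data()`      | `size()`    |
| blaze   | `data()`      | n/a         |
| blitz   | `data()`      | `size()`    |
| itpp    | `_data()`     | `length()`  |
| ublas   | `data().begin()` | n/a      |
| dlib    | `(0,0)`       | `size()`    |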
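The naming differences in the table can be hidden behind one accessor. The sketch below covers only the `memptr()`/`n_elem` and `data()`/`size()` rows and uses a mock type, `arma_like_t`, in place of a real linear algebra library; all names in the `sketch` namespace are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {
    template <class, template <class...> class Op, class... Args>
    struct detector : std::false_type {};
    template <template <class...> class Op, class... Args>
    struct detector<std::void_t<Op<Args...>>, Op, Args...> : std::true_type {};
    template <template <class...> class Op, class... Args>
    using is_detected = detector<void, Op, Args...>;

    template <class T> using memptr_f = decltype(std::declval<T&>().memptr());
    template <class T> using data_f   = decltype(std::declval<T&>().data());
    template <class T> using n_elem_f = decltype(std::declval<T&>().n_elem);
    template <class T> using size_f   = decltype(std::declval<T&>().size());

    template <class T> struct always_false : std::false_type {};

    // pointer to the first element, whatever the accessor is called
    template <class T> auto direct_access(T& obj) {
        if constexpr (is_detected<memptr_f, T>::value) return obj.memptr();
        else if constexpr (is_detected<data_f, T>::value) return obj.data();
        else static_assert(always_false<T>::value, "no known direct access method");
    }
    // number of elements: a data member in one library, a method in another
    template <class T> std::size_t element_count(T& obj) {
        if constexpr (is_detected<n_elem_f, T>::value) return obj.n_elem;
        else if constexpr (is_detected<size_f, T>::value) return obj.size();
        else static_assert(always_false<T>::value, "no known size accessor");
    }

    // mock container following the memptr()/n_elem naming
    struct arma_like_t {
        double buffer[3] = {1.0, 2.0, 3.0};
        std::size_t n_elem = 3;
        double* memptr() { return buffer; }
    };
}
```

Further rows, such as `_data()` and `length()`, would be additional branches following the same pattern.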

H5CPP

LLVM based static reflection tool
C++ templates with CRUD like operators

take a header file with POD struct


typedef unsigned long long int MyUInt;
namespace sn {
	namespace example {
		struct Record {
			MyUInt               field_01;
			char                 field_02;
			double            field_03[3];
			other::Record field_04[4];
		};
	}
}

typedefs are fine
nested namespaces are OK
`char` is mapped to: H5T_NATIVE_CHAR
`double[3]` is mapped to: H5Tarray_create(H5T_NATIVE_DOUBLE,1, ... )
first `other::Record` is parsed: type_hid_t = ...
then the generated type is used: H5Tarray_create(type_hid_t, ...)

write your program

write your C++ program as if `generated.h` were already written
#include "some_header_file.h"
#include <h5cpp/core>
	#include "generated.h"
#include <h5cpp/io>
int main(){
	std::vector<sn::example::Record> stream =
		...
	h5::fd_t fd = h5::create("example.h5",H5F_ACC_TRUNC);
	h5::pt_t pt = h5::create<sn::example::Record>(
		fd, "stream of struct",
		h5::max_dims{H5S_UNLIMITED,7}, h5::chunk{4,7} | h5::gzip{9} );
	...
}

sandwich the not-yet-existing `generated.h` between the h5cpp includes
write the translation unit (TU) as usual
using the POD type with one of the H5CPP CRUD-like operators:
h5::create | h5::write | h5::read | h5::append | h5::acreate | h5::awrite | h5::aread
triggers the `h5cpp` compiler to generate code

A header file with HDF5 Compound Type descriptors:

#ifndef H5CPP_GUARD_ErRrk
#define H5CPP_GUARD_ErRrk
namespace h5{
    template<> hid_t inline register_struct(){
        hsize_t at_00_[] ={7};            hid_t at_00 = H5Tarray_create(H5T_NATIVE_FLOAT,1,at_00_);
        hsize_t at_01_[] ={3};            hid_t at_01 = H5Tarray_create(H5T_NATIVE_DOUBLE,1,at_01_);
        hid_t ct_00 = H5Tcreate(H5T_COMPOUND, sizeof (sn::typecheck::Record));
        H5Tinsert(ct_00, "_char",	HOFFSET(sn::typecheck::Record,_char),H5T_NATIVE_CHAR);
		...
		H5Tclose(at_03); H5Tclose(at_04); H5Tclose(at_05); 
        return ct_02;
    };
}
H5CPP_REGISTER_STRUCT(sn::example::Record);
#endif

random include guards
within namespace
template specialization for h5::operators
compound types are recursively created
calls the template specialization when h5::operator needs it
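The information carried by each `H5Tinsert` line is a field name, a byte offset and a type; `HOFFSET` is the HDF5 macro wrapping `offsetof`. The sketch below records the same facts without calling HDF5. `record_t`, `field_desc_t` and `describe_record` are hypothetical stand-ins, and the size in bytes takes the place of the HDF5 type handle.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// hypothetical record, standing in for the POD struct of the user's header
struct record_t {
    unsigned long long field_01;
    char               field_02;
    double             field_03[3];
};

// hypothetical stand-in for one H5Tinsert call
struct field_desc_t {
    std::string name;    // field name as it would appear in the file
    std::size_t offset;  // what HOFFSET(record_t, field) evaluates to
    std::size_t size;    // placeholder for the HDF5 datatype of the field
};

// the list a code generator would have to produce for record_t
inline std::vector<field_desc_t> describe_record() {
    return {
        {"field_01", offsetof(record_t, field_01), sizeof(record_t::field_01)},
        {"field_02", offsetof(record_t, field_02), sizeof(record_t::field_02)},
        {"field_03", offsetof(record_t, field_03), sizeof(record_t::field_03)},
    };
}
```

Because offsets come from the compiler, padding between fields is accounted for automatically, which is why the generated header can pass `sizeof` of the struct to `H5Tcreate`.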

## templates <small>overview</small>
```cpp
[file]      h5::fd_t h5::open( const std::string& path,  H5F_ACC_RDWR | H5F_ACC_RDONLY [, const h5::fapl_t& fapl] );
[group]     h5::gr_t h5::gopen( const h5::fd_t | h5::gr_t& location, const std::string& path [, const h5::gapl_t& gapl] );
[dataset]   h5::ds_t h5::open( const h5::fd_t | h5::gr_t& location, const std::string& path [, const h5::dapl_t& dapl] );
[attribute] h5::at_t h5::aopen(const h5::ds_t | h5::gr_t& node, const std::string& name [, const h5::acpl_t& acpl] );
```
```
[file]      h5::fd_t h5::create( const std::string& path, H5F_ACC_TRUNC | H5F_ACC_EXCL [, const h5::fcpl_t& fcpl] [, const h5::fapl_t& fapl]);
[group]     h5::gr_t h5::gcreate( const h5::fd_t | const h5::gr_t& location, const std::string& name
            [, const h5::lcpl_t& lcpl] [, const h5::gcpl_t& gcpl] [, const h5::gapl_t& gapl]);
[dataset]   template <typename T> h5::ds_t h5::create<T>( 
    const h5::fd_t | const h5::gr_t& location, const std::string& dataset_path, dataspace
    [, const h5::lcpl_t& lcpl] [, const h5::dcpl_t& dcpl] [, const h5::dapl_t& dapl] );
[attribute] template <typename T> h5::at_t acreate<T>( const h5::ds_t | const h5::gr_t& | const h5::dt_t& node, const std::string& name
    [, const h5::current_dims{...} ] [, const h5::acpl_t& acpl]);
```
```
[dataset] template <typename T> T h5::read( const h5::ds_t& ds
    [, const h5::offset_t& offset]  [, const h5::stride_t& stride] [, const h5::count_t& count]
    [, const h5::dxpl_t& dxpl ] ) const;
template <typename T> h5::err_t h5::read( const h5::ds_t& ds, T& ref 
    [, const h5::offset_t& offset]  [, const h5::stride_t& stride] [, const h5::count_t& count]
    [, const h5::dxpl_t& dxpl ] ) const;
[attribute] template <typename T> T aread( const h5::ds_t& | const h5::gr_t& | const h5::dt_t& node, 
    const std::string& name [, const h5::acpl_t& acpl]) const;
template <typename T> T aread( const h5::at_t& attr [, const h5::acpl_t& acpl]) const;
```
```
[dataset] template <typename T> void h5::write( dataset,  const T& ref
    [,const h5::offset_t& offset] [,const h5::stride_t& stride]  [,const h5::dxpl_t& dxpl] );
template <typename T> void h5::write( dataset, const T* ptr
    [,const hsize_t* offset] [,const hsize_t* stride] ,const hsize_t* count [, const h5::dxpl_t& dxpl ]);

[attribute] template <typename T> void awrite( const h5::ds_t& | const h5::gr_t& | const h5::dt_t& node, 
    const std::string &name, const T& obj  [, const h5::acpl_t& acpl]);
template <typename T> void awrite( const h5::at_t& attr, const T* ptr [, const h5::acpl_t& acpl]);
```

rich set of HDF5 property lists

Comma Separated Values to HDF5

#include "csv.h"
#include "struct.h"
#include <h5cpp/core>      // has handle + type descriptors
	#include "generated.h" // uses type descriptors
#include <h5cpp/io>        // uses generated.h + core 

int main(){
	h5::fd_t fd = h5::create("output.h5",H5F_ACC_TRUNC);
	h5::ds_t ds = h5::create<input_t>(fd,  "simple approach/dataset.csv",
				 h5::max_dims{H5S_UNLIMITED}, h5::chunk{10} | h5::gzip{9} );
	h5::pt_t pt = ds;
	ds["data set"] = "monroe-county-crash-data2003-to-2015.csv";
	ds["csv parser"] = "https://github.com/ben-strasser/fast-cpp-csv-parser";

	constexpr unsigned N_COLS = 5;
	io::CSVReader<N_COLS> in("input.csv"); // the file may hold more columns per row; only these 5 are read
	in.read_header(io::ignore_extra_column, "Master Record Number", "Hour", "Reported_Location","Latitude","Longitude");
	input_t row;                           // buffer to read line by line
	char* ptr;      // indirection, as `read_row` doesn't take array directly
	while(in.read_row(row.MasterRecordNumber, row.Hour, ptr, row.Latitude, row.Longitude)){
		strncpy(row.ReportedLocation, ptr, STR_ARRAY_SIZE); // defined in struct.h
		h5::append(pt, row);
	}
}

CSV header only library by Ben Strasser, and a type definition for the record
h5cpp includes
translation unit, the program
create HDF5 container, and dataset
decorate it with attributes
do I/O operations within a loop
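One detail of the loop deserves care: `strncpy` leaves the destination without a terminator when the source holds `STR_ARRAY_SIZE` or more characters. If the field is meant to be a null-terminated fixed-length string, a small helper can guarantee termination. This is a sketch under that assumption; `copy_fixed` is a hypothetical name and not part of H5CPP or the CSV parser.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// copies at most N-1 characters of `src` into `dst` and always null-terminates
template <std::size_t N>
void copy_fixed(char (&dst)[N], const char* src) {
    static_assert(N > 0, "destination must hold at least the terminator");
    assert(src != nullptr);
    std::strncpy(dst, src, N - 1); // pads with '\0' when src is shorter than N-1
    dst[N - 1] = '\0';             // terminator when src has N-1 characters or more
}
```

In the loop above the call would read `copy_fixed(row.ReportedLocation, ptr);`, provided `ReportedLocation` is declared as a `char` array, so that the array size is deduced instead of passed by hand.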

Attributes:

do the right thing and come with an easy-to-use operator. Here are some examples:

h5::ds_t ds = h5::write(fd,"some dataset with attributes", ... );
ds["att_01"] = 42 ;
ds["att_02"] = {1.,3.,4.,5.};
ds["att_03"] = {'1','3','4','5'};
ds["att_04"] = {"alpha", "beta","gamma","..."};
ds["att_05"] = "const char[N]";
ds["att_06"] = u8"const char[N]áééé";
ds["att_07"] = std::string( "std::string");
ds["att_08"] = record; // pod/compound datatype
ds["att_09"] = vector; // vector of pod/compound type
ds["att_10"] = matrix; // linear algebra object

obtain a handle by h5::create | h5::open | h5::write
rank N objects, even compound types when the h5cpp compiler is used
arrays of various element types
mapped to rank 0 variable length character types
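The `ds["name"] = value` syntax can be obtained with a proxy object: `operator[]` returns a small handle that remembers the name, and its templated `operator=` performs the write. The mock below only demonstrates that mechanism; it stores text in a `std::map` instead of touching HDF5, and nothing in the `mock` namespace reflects how H5CPP is actually implemented.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

namespace mock {
    struct dataset_t;

    // returned by dataset_t::operator[]; remembers which attribute is addressed
    struct attribute_proxy_t {
        dataset_t& owner;
        std::string name;
        template <class T> attribute_proxy_t& operator=(const T& value);
    };

    struct dataset_t {
        std::map<std::string, std::string> attributes; // stands in for attribute storage
        attribute_proxy_t operator[](const std::string& name) { return {*this, name}; }
    };

    // the assignment is where a real library would create and write the attribute
    template <class T>
    attribute_proxy_t& attribute_proxy_t::operator=(const T& value) {
        std::ostringstream out;
        out << value;                       // mock: serialize to text
        owner.attributes[name] = out.str();
        return *this;
    }
}
```

Since the assignment operator is a template, the right-hand side type is known at compile time, which is the point where a library can pick a matching storage type for the attribute.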

## C++ string types to HDF5 type

|C++ type      |         |HDF5 type   |compact | contiguous | chunked |
|--------------|---------|------------|--------|------------|---------|
|`std::string`   |scalar    |VL string   |yes|yes|maybe|
|`std::string[N]`|array    |VL string   |yes|yes|maybe|
|`std::array<std::string,N>`|array |VL string|yes|yes|maybe|
|`const char* var[N]`| array|VL string | yes |yes|maybe|
|`std::vector<std::string>`|hypercube|VL string|yes|yes|yes|
|string literal|scalar |FL string|yes|yes|maybe|
|`char[M]`| scalar| FL string| yes | yes |maybe|
|`char[N][M]` | array | FL string| yes|yes|maybe|
|`char[N][M][..]` | array | FL string| yes|yes|maybe|
|`std::initializer_list<std::string>{}` | array | VL string| yes|yes|maybe|
|`std::initializer_list<char[N]>{}` | array | FL string| yes|yes|maybe|

- FL is null terminated fixed length
- VL is null terminated variable length
- `maybe` you have to define dimensions and chunks, i.e.:<br> `h5::create<char[10]>(h5::fd_t,.., h5::current_dims{..}, h5::chunk{..})`

**observe:** `[unsigned | signed] char` are mapped to `H5T_NATIVE_CHAR`
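The FL/VL column of the table can be expressed as a compile-time classification. The following sketch covers only a subset of the rows (`std::string`, `const char*`, `char[M]` and built-in arrays of those); the `sketch` namespace, `string_kind` and `kind_v` are hypothetical names and do not describe the H5CPP type mapping code.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

namespace sketch {
    enum class string_kind { none, fixed_length, variable_length };

    template <string_kind K> using kind_c = std::integral_constant<string_kind, K>;

    // default: not a string type
    template <class T> struct kind_of : kind_c<string_kind::none> {};
    // owning and pointer-based strings have no size in their type: variable length
    template <> struct kind_of<std::string> : kind_c<string_kind::variable_length> {};
    template <> struct kind_of<const char*> : kind_c<string_kind::variable_length> {};
    // the extent of a char array is part of its type: fixed length
    template <std::size_t M> struct kind_of<char[M]> : kind_c<string_kind::fixed_length> {};
    // any other array inherits the kind of its element type
    template <class T, std::size_t N>
    struct kind_of<T[N]> : kind_of<std::remove_cv_t<T>> {};

    // strips references and cv-qualifiers so that string literals are covered
    template <class T>
    constexpr string_kind kind_v =
        kind_of<std::remove_cv_t<std::remove_reference_t<T>>>::value;
}

static_assert(sketch::kind_v<std::string>     == sketch::string_kind::variable_length, "VL");
static_assert(sketch::kind_v<std::string[4]>  == sketch::string_kind::variable_length, "VL");
static_assert(sketch::kind_v<const char*[4]>  == sketch::string_kind::variable_length, "VL");
static_assert(sketch::kind_v<char[10]>        == sketch::string_kind::fixed_length,    "FL");
static_assert(sketch::kind_v<char[3][10]>     == sketch::string_kind::fixed_length,    "FL");
static_assert(sketch::kind_v<decltype("abc")> == sketch::string_kind::fixed_length,    "FL");
static_assert(sketch::kind_v<int>             == sketch::string_kind::none,            "not a string");
```

The remaining rows, such as `std::vector<std::string>` or `std::initializer_list`, would need further specializations along the same lines.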

## Testing against C++17 compilers

```
steven@honshu:~/projects/h5cpp/tests$ make test-with-compilers
-------------------------------------------------------------------------------- ----------
compiler                                                                         error code
-------------------------------------------------------------------------------- ----------
Intel(R) oneAPI DPC++ Compiler 2021.1-beta03 (2019.10.0.1121)Target: x86_64-unkn  [  OK  ]
icpc (ICC) 19.1.0.166 20191121Copyright (C) 1985-2019 Intel Corporation.  All ri  [ FAIL ]
g++-7 (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0Copyright (C) 2017 Free Software Founda  [  OK  ]
g++-8 (Ubuntu 8.4.0-1ubuntu1~18.04) 8.4.0Copyright (C) 2018 Free Software Founda  [  OK  ]
g++-9 (Ubuntu 9-20190428-1ubuntu1~18.04.york0) 9.0.1 20190428 (prerelease) [gcc-  [  OK  ]
clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)Target: x86_64-pc-linux-gnu  [  OK  ]
clang version 7.1.0-svn353565-1~exp1~20190406090509.61 (branches/release_70)Targ  [  OK  ]
clang version 8.0.0-3~ubuntu18.04.2 (tags/RELEASE_800/final)Target: x86_64-pc-li  [  OK  ]
clang version 9.0.0-2~ubuntu18.04.2 (tags/RELEASE_900/final)Target: x86_64-pc-li  [  OK  ]
clang version 10.0.1-++20200507062652+bab8d1790a3-1~exp1~20200507163249.158 Targ  [  OK  ]
pgc++ 19.10-0 LLVM 64-bit target on x86-64 Linux -tp haswell PGI Compilers and T  [ FAIL ]
-------------------------------------------------------------------------------- ----------
```