CP013: Add further proposed wording and minor interface changes.

* Add structure for proposed wording on the existing interface.
* Have `execution_context::resource()` and `this_system::resource()` return a `const execution_resource &`.
# Abstract
This paper provides an initial meta-framework for the drive toward memory affinity for C++. It follows the direction given at the 2017 SG1 meeting in Toronto that we should look toward defining affinity for C++ before considering inaccessible memory as a solution to the separate-memory problem in support of heterogeneous and distributed computing.
# Motivation
Processor and memory binding, also called 'affinity', can improve the performance of an application for many reasons. Keeping a thread bound to a specific core and local memory region optimizes cache affinity and reduces context switching and unnecessary scheduler activity. Since memory accesses to remote locations incur higher latency and lower bandwidth, control of thread placement to enforce affinity within parallel applications is crucial for keeping all the cores fed and for exploiting the full performance of the memory subsystem on Non-Uniform Memory Architecture (NUMA) systems.
With the affinity interface we propose below, we hope to see significant increases in memory bandwidth, by as much as 2x, as thread count increases (for example, by using the `madvise` system call on Sun systems to implement a next-touch policy that migrates data close to the next executing thread).
The goal is that this would enable scaling up to heterogeneous and distributed computing in the future. Indeed OpenMP [14], whose affinity model one of the authors helped design, plans to integrate that affinity model with OpenMP's heterogeneous model [21].
# Background Research: State of the Art
The problem of effectively partitioning a system’s topology is one that has existed for some time, and there is a range of third-party libraries and standards which provide APIs to solve it. In order to standardise this process for C++ we must carefully examine all of these approaches and identify which we wish to adopt. Below is a list of the libraries and standards this proposal will draw from:
* [Portable Hardware Locality][hwloc]
* [SYCL 1.2][sycl-1-2-1]
Some systems provide additional user control through explicit binding of threads to processors: environment variables consumed by various compilers, system commands (e.g. Linux: `taskset`, `numactl`; Windows: `start /affinity`), or system calls (e.g. Solaris has `pbind()`, Linux has `sched_setaffinity()` and Windows has `SetThreadAffinityMask()`).
## Problem Space
In this paper we describe the problem space of affinity for C++, the various challenges which need to be addressed in defining a partitioning and affinity interface for C++, and some suggested solutions:
There are some additional challenges which we have been investigating but are not addressed in this paper:
* Migrating data from memory allocated in one partition to another
* Defining memory placement algorithms or policies
The `execution_resource` class provides an abstraction over a software or hardware resource capable of memory allocation, execution of lightweight execution agents, or both.
### `execution_resource` constructors
```cpp
execution_resource() = delete;
```
[*Note:* An implementation of `execution_resource` is permitted to provide non-public constructors to allow other objects to construct them. *--end note*]
### `execution_resource` assignment
The `execution_resource` class is not `CopyConstructible` (C++Std [copyconstructible]).
The `execution_context` class provides an abstraction for managing a number of lightweight execution agents executing work on one or more `execution_resource`s.
The first task in allowing C++ applications to leverage memory locality is to provide the ability to query a **system** for its **resource topology** (commonly represented as a tree or graph) and traverse its **execution resources**.
## Execution resource
The capability of querying the underlying **execution resources** of a given **system** is particularly important for supporting affinity control in C++. The current proposal for executors [5] leaves the **execution resource** largely unspecified. This is intentional: **execution resources** will vary greatly between one implementation and another, and it is out of the scope of the current executors proposal to define them.
There is current work on extending the executors proposal to describe a typical interface for an **execution context** [8]. There, a typical **execution context** is defined with an interface for construction and comparison, and for retrieving an **executor**, waiting on submitted work to complete, and querying the underlying **execution resource**.
Extending the executors interface to provide topology information can serve as a basis for providing a unified interface to expose affinity. This interface cannot mandate a specific architectural definition, and must be generic enough that future architectural evolutions can still be expressed.
## Level of abstraction
An important consideration when defining a unified interface for querying the **resource topology** of a **system** is what level of abstraction such an interface should have, and at what granularity the **execution resources** of the topology should be described.
| Straw Poll |
|------------|
| Should the interface for querying a system’s resource topology be completely abstract or should it provide specific components of the hardware architecture? |
## Representation
Nowadays, various APIs and libraries enable this functionality; one of the most commonly used is the Portable Hardware Locality (hwloc) library [9]. Hwloc presents the hardware as a tree, where the root node represents the whole machine and subsequent levels represent different partitions depending on different hardware characteristics. The picture below shows the output of the hwloc visualization tool (lstopo) on a 2-socket Xeon E5300 server. Note that each socket is represented by a package in the graph. Each socket contains its own cache memories, but both share the same NUMA memory region. Note also that different I/O units are visible underneath: placement of these units with respect to memory and threads can be critical to performance. The ability to place threads and/or allocate memory appropriately on the different components of this system is an important part of the process of application development, especially as hardware architectures get more complex. The documentation of lstopo [22] shows more interesting examples of topologies that can be encountered on today's systems.
| Straw Poll |
|------------|
| Should the interface for querying a system’s resource topology support non-hierarchical architectures? |
| *What kind of shape do we want for expressing the topology abstraction?* |
## Extended Execution Resource Interface
Below is a proposed interface for the generalization of the **execution resource**, based on the definition of `thread_execution_resource_t` [8], with some extensions.
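The full synopsis is not reproduced in this excerpt. As a hedged sketch, one possible shape consistent with the members referenced elsewhere in this paper (a deleted default constructor, no copy construction, and `partition_size()` for traversal) might be:

```cpp
// Hypothetical sketch only: a possible shape for the generalized
// execution_resource. The capability queries and partition() accessor are
// illustrative names, not the paper's actual proposed synopsis.
#include <cstddef>
#include <string>
#include <vector>

class execution_resource {
 public:
  execution_resource() = delete;                             // no default construction
  execution_resource(const execution_resource &) = delete;   // not CopyConstructible
  execution_resource &operator=(const execution_resource &) = delete;

  // Traverse the partitions (child resources) of this resource.
  std::size_t partition_size() const noexcept { return partitions_.size(); }
  const execution_resource &partition(std::size_t i) const { return *partitions_[i]; }

  // Capability queries: a resource may run execution agents, allocate
  // memory, or both.
  bool can_place_agents() const noexcept { return canExecute_; }
  bool can_place_memory() const noexcept { return canAllocate_; }

  const std::string &name() const noexcept { return name_; }

 private:
  // Implementations may provide non-public constructors so that other
  // objects (e.g. the system topology) can construct resources.
  execution_resource(std::string name, bool canExecute, bool canAllocate)
      : name_(std::move(name)), canExecute_(canExecute), canAllocate_(canAllocate) {}

  std::string name_;
  bool canExecute_;
  bool canAllocate_;
  std::vector<const execution_resource *> partitions_;
};
```

The capability queries mirror the earlier definition of a resource as capable of memory allocation, execution of agents, or both; the non-public constructor mirrors the note permitting implementations to construct resources internally.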
| Straw Poll |
|------------|
| Should the interface provide a way of creating an execution context from an execution resource? |
| *Is what is defined here a suitable solution?* |
## Importance of topology discovery
For traditional single-CPU systems, the execution resources can be reasoned about using standard constructs such as `std::thread`, `std::this_thread` and thread-local storage. This is because the C++ memory model requires that a system have **at least one thread of execution, some memory and some I/O capabilities**. For these systems, some assumptions about the topology can therefore be made at compile time: for example, developers can always query the available hardware concurrency, as there is always at least one thread, and can always use thread-local storage.
| Straw Poll |
|------------|
| *When do we enable the device discovery process? Can we change the system topology after executors have been created?* |
| *Should we provide an interface for providing a call-back on topology change?* |
## Lifetime considerations
As the execution context would provide a partitioning interface which returns objects describing the components of the system topology of an execution resource, it is important to consider the lifetime of these objects.
### Scaling to heterogeneous and distributed systems
The initial solution should target systems with a single addressable memory region, i.e. systems which do not have discrete non-accessible memory regions such as a discrete GPU or FPGA. However, in the interest of maintaining a unified interface going forward, the initial solution should be designed with the latter in mind and should be scalable to support these systems in the future. In particular, to support heterogeneous systems it is important that the abstraction allows querying the **resource topology** of the **system** in order to perform device discovery.
## Querying the Relative Affinity of Partitions
# Future Work
## Migrating data from memory allocated in one partition to another
In some cases it is important for performance to bind a memory allocation to a memory region for the duration of a task's execution; however, in other cases it is important to be able to migrate the data from one memory region to another. This is outside the scope of this paper, but we would like to investigate it in a future paper.
| Straw Poll |
|------------|
| Should the interface provide a way of migrating data between partitions? |
394
487
395
-
### Defining memory placement algorithms or policies
488
+
## Defining memory placement algorithms or policies
With the ability to place memory with affinity comes the ability to define algorithms or memory policies which describe, at a higher level, how memory is distributed across large systems. Some examples are pinned, first-touch and scatter. This is outside the scope of this paper, but we would like to investigate it in a future paper.