# cpp-taskflow

**Repository Path**: chuck_chen/cpp-taskflow

## Basic Information

- **Project Name**: cpp-taskflow
- **Description**: cpp-taskflow is an open-source C++ parallel task programming library. It is very fast, header-only, and helps you quickly write parallel programs with complex task dependencies.
- **Primary Language**: C/C++
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: https://www.oschina.net/p/cpp-taskflow
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 34
- **Created**: 2020-07-22
- **Last Updated**: 2024-11-22

## README

# Taskflow

[![Codacy Badge](https://api.codacy.com/project/badge/Grade/3bbdc89f9a7a41eaa17559fab8a64cde)](https://app.codacy.com/gh/taskflow/taskflow?utm_source=github.com&utm_medium=referral&utm_content=taskflow/taskflow&utm_campaign=Badge_Grade_Dashboard)
[![Linux Build Status](https://travis-ci.com/taskflow/taskflow.svg?branch=master)](https://travis-ci.com/taskflow/taskflow)
[![Windows Build status](https://ci.appveyor.com/api/projects/status/rbjl16i6c9ahxr16?svg=true)](https://ci.appveyor.com/project/tsung-wei-huang/taskflow)
[![Wiki](image/api-doc.svg)][wiki]
[![TFProf](image/tfprof.svg)](https://taskflow.github.io/tfprof/)
[![Cite](image/cite-arXiv.svg)](https://arxiv.org/abs/2004.10908v2)

Taskflow helps you quickly write parallel task programs in modern C++.

:exclamation: Starting from [v2.5.0](https://github.com/taskflow/taskflow/releases/tag/2.5.0), we have renamed cpp-taskflow to ***taskflow*** to broaden its support and future application scopes. The core codebase remains *unchanged*; you may only need to [change the remote URL](https://help.github.com/en/github/using-git/changing-a-remotes-url) to this new repository. Thank you for the support!

# Why Taskflow?

Taskflow is faster, more expressive, and easier to integrate as a drop-in replacement than many existing task programming frameworks when handling complex parallel workloads.
![](image/performance.png)

Taskflow lets you quickly implement task decomposition strategies that incorporate both regular and irregular compute patterns, together with an efficient *work-stealing* scheduler to optimize your multithreaded performance.

| [Static Tasking](#get-started-with-taskflow) | [Dynamic Tasking](#dynamic-tasking) |
| :------------: | :-------------: |
| ![](image/static_graph.svg) | |

Taskflow supports conditional tasking for you to make rapid control-flow decisions across dependent tasks, implementing cycles and conditions that were otherwise difficult to do with existing tools.

| [Conditional Tasking](#conditional-tasking) |
| :-----------------: |
| ![](image/condition.svg) |

Taskflow is composable. You can create large parallel graphs by composing modular and reusable blocks that are easier to optimize at an individual scope.

| [Taskflow Composition](#composable-tasking) |
| :---------------: |
| ![](image/framework.svg) |

Taskflow supports heterogeneous tasking for you to accelerate a wide range of scientific computing applications by harnessing the power of CPU-GPU collaborative computing.

| [Concurrent CPU-GPU Tasking](#concurrent-cpu-gpu-tasking) |
| :-----------------: |
| ![](image/cudaflow.svg) |

Taskflow provides the visualization and tooling needed to profile Taskflow programs.

| [Taskflow Profiler](https://taskflow.github.io/tfprof) |
| :-----------------: |
| ![](image/tfprof.png) |

We are committed to supporting trustworthy development for both academic and industrial research projects in parallel computing. Check out [Who is Using Taskflow](#who-is-using-taskflow) and what our users say:

+ *"Taskflow is the cleanest Task API I've ever seen." [Damien Hocking @Corelium Inc](http://coreliuminc.com)*
+ *"Taskflow has a very simple and elegant tasking interface. The performance also scales very well." [Glen Fraser][totalgee]*
+ *"Taskflow lets me handle parallel processing in a smart way." [Hayabusa @Learning](https://cpp-learning.com/cpp-taskflow/)*
+ *"Taskflow improves the throughput of our graph engine in just a few hours of coding." [Jean-Michaël @KDAB](https://ossia.io/)*
+ *"Best poster award for open-source parallel programming library." [Cpp Conference 2018][Cpp Conference 2018]*
+ *"Second Prize of Open-source Software Competition." [ACM Multimedia Conference 2019](https://tsung-wei-huang.github.io/img/mm19-ossc-award.jpg)*

See a quick [presentation][Presentation] and visit the [documentation][wiki] to learn more about Taskflow. Technical details can be found in our [arXiv paper](https://arxiv.org/abs/2004.10908v2).

# Table of Contents

* [Get Started with Taskflow](#get-started-with-taskflow)
* [Create a Taskflow Application](#create-a-taskflow-application)
  * [Step 1: Create a Taskflow](#step-1-create-a-taskflow)
  * [Step 2: Define Task Dependencies](#step-2-define-task-dependencies)
  * [Step 3: Execute a Taskflow](#step-3-execute-a-taskflow)
* [Dynamic Tasking](#dynamic-tasking)
* [Conditional Tasking](#conditional-tasking)
* [Composable Tasking](#composable-tasking)
* [Concurrent CPU-GPU Tasking](#concurrent-cpu-gpu-tasking)
  * [Step 1: Create a cudaFlow](#step-1-create-a-cudaflow)
  * [Step 2: Compile and Execute a cudaFlow](#step-2-compile-and-execute-a-cudaflow)
* [Visualize a Taskflow Graph](#visualize-a-taskflow-graph)
* [API Reference](#api-reference)
* [System Requirements](#system-requirements)
* [Compile Unit Tests, Examples, and Benchmarks](#compile-unit-tests-examples-and-benchmarks)
* [Who is Using Taskflow?](#who-is-using-taskflow)

# Get Started with Taskflow

The following example, [simple.cpp](./examples/simple.cpp), shows the basic Taskflow API you need in most applications.
```cpp
#include <taskflow/taskflow.hpp>  // Taskflow is header-only
#include <iostream>

int main(){

  tf::Executor executor;
  tf::Taskflow taskflow;

  auto [A, B, C, D] = taskflow.emplace(
    [] () { std::cout << "TaskA\n"; },  //  task dependency graph
    [] () { std::cout << "TaskB\n"; },  //
    [] () { std::cout << "TaskC\n"; },  //        +---+
    [] () { std::cout << "TaskD\n"; }   //  +---->| B |-----+
  );                                    //  |     +---+     |
                                        // +---+          +-v-+
  A.precede(B);  // A runs before B     // | A |          | D |
  A.precede(C);  // A runs before C     // +---+          +-^-+
  B.precede(D);  // B runs before D     //  |     +---+     |
  C.precede(D);  // C runs before D     //  +---->| C |-----+
                                        //        +---+
  executor.run(taskflow).wait();

  return 0;
}
```

Compile and run the code with the following commands:

```bash
~$ g++ simple.cpp -I path/to/include/taskflow/ -std=c++17 -O2 -lpthread -o simple
~$ ./simple
TaskA
TaskC  <-- concurrent with TaskB
TaskB  <-- concurrent with TaskC
TaskD
```

# Create a Taskflow Application

Taskflow defines a very expressive API for creating task dependency graphs. Most applications are developed through the following three steps:

## Step 1: Create a Taskflow

Create a taskflow object to build a task dependency graph:

```cpp
tf::Taskflow taskflow;
```

A task is a callable object for which [std::invoke][std::invoke] is applicable. Use the method `emplace` to create a task:

```cpp
tf::Task A = taskflow.emplace([](){ std::cout << "Task A\n"; });
```

## Step 2: Define Task Dependencies

You can add dependency links between tasks to enforce that one task runs before or after another.

```cpp
A.precede(B);  // A runs before B.
```

## Step 3: Execute a Taskflow

To execute a taskflow, you need to create an *executor*. An executor manages a set of worker threads to execute a taskflow through an efficient *work-stealing* algorithm.

```cpp
tf::Executor executor;
```

The executor provides a rich set of methods to run a taskflow. You can run a taskflow multiple times, or until a stopping criterion is met.
These methods are non-blocking and return a [std::future][std::future] that lets you query the execution status. The executor is *thread-safe*.

```cpp
executor.run(taskflow);       // runs the taskflow once
executor.run_n(taskflow, 4);  // runs the taskflow four times

// keeps running the taskflow until the predicate returns true
executor.run_until(taskflow, [counter=4]() mutable { return --counter == 0; });
```

You can call `wait_for_all` to block the executor until all associated taskflows complete.

```cpp
executor.wait_for_all();  // block until all associated tasks finish
```

Notice that the executor does not own any taskflow. It is your responsibility to keep a taskflow alive during its execution, or the result is undefined behavior. In most applications, you need only one executor to run multiple taskflows, each representing a specific part of your parallel decomposition.
# Dynamic Tasking

Another powerful feature of Taskflow is *dynamic* tasking. Dynamic tasks are tasks created during the execution of a taskflow. They are spawned by a parent task and grouped together into a *subflow* graph. To create a subflow for dynamic tasking, emplace a callable with one argument of type `tf::Subflow`.

```cpp
// create three regular tasks
tf::Task A = tf.emplace([](){}).name("A");
tf::Task C = tf.emplace([](){}).name("C");
tf::Task D = tf.emplace([](){}).name("D");

// create a subflow graph (dynamic tasking)
tf::Task B = tf.emplace([] (tf::Subflow& subflow) {
  tf::Task B1 = subflow.emplace([](){}).name("B1");
  tf::Task B2 = subflow.emplace([](){}).name("B2");
  tf::Task B3 = subflow.emplace([](){}).name("B3");
  B1.precede(B3);
  B2.precede(B3);
}).name("B");

A.precede(B);  // B runs after A
A.precede(C);  // C runs after A
B.precede(D);  // D runs after B
C.precede(D);  // D runs after C
```

By default, a subflow graph joins its parent node. This ensures a subflow graph finishes before the successors of its parent task. You can disable this behavior by calling `subflow.detach()`. For example, detaching the above subflow results in the following execution flow:

```cpp
// create a "detached" subflow graph (dynamic tasking)
tf::Task B = tf.emplace([] (tf::Subflow& subflow) {
  tf::Task B1 = subflow.emplace([](){}).name("B1");
  tf::Task B2 = subflow.emplace([](){}).name("B2");
  tf::Task B3 = subflow.emplace([](){}).name("B3");
  B1.precede(B3);
  B2.precede(B3);

  // detach the subflow to form a parallel execution line
  subflow.detach();
}).name("B");
```

A subflow can be nested or recursive: you can create another subflow from within the execution of a subflow, and so on.
# Conditional Tasking

Taskflow supports *conditional tasking* for users to implement *general* control flow with cycles and conditionals. A *condition task* evaluates a set of instructions and returns an integer index of the next immediate successor to execute. The index is defined with respect to the order in which its successors were connected.

```cpp
tf::Task init = tf.emplace([](){ }).name("init");
tf::Task stop = tf.emplace([](){ }).name("stop");

// creates a condition task that returns 0 or 1
tf::Task cond = tf.emplace([](){
  std::cout << "flipping a coin\n";
  return rand() % 2;
}).name("cond");

// creates a feedback loop
init.precede(cond);
cond.precede(cond, stop);  // cond--0-->cond, cond--1-->stop

executor.run(tf).wait();
```
# Composable Tasking

A powerful feature of `tf::Taskflow` is composability. You can create multiple task graphs from different parts of your workload and compose them into a larger graph through the `composed_of` method.

```cpp
tf::Taskflow f1, f2;

auto [f1A, f1B] = f1.emplace(
  []() { std::cout << "Task f1A\n"; },
  []() { std::cout << "Task f1B\n"; }
);

auto [f2A, f2B, f2C] = f2.emplace(
  []() { std::cout << "Task f2A\n"; },
  []() { std::cout << "Task f2B\n"; },
  []() { std::cout << "Task f2C\n"; }
);

auto f1_module_task = f2.composed_of(f1);

f1_module_task.succeed(f2A, f2B)
              .precede(f2C);
```

`composed_of` returns a task handle, and you can use `precede` and `succeed` to create dependencies on it. You can compose a taskflow from multiple taskflows, use the result to compose an even larger taskflow, and so on.
# Concurrent CPU-GPU Tasking

Taskflow enables concurrent CPU-GPU tasking by leveraging the [Nvidia CUDA Toolkit][cuda-toolkit]. You can harness the power of CPU-GPU collaborative computing to implement heterogeneous decomposition algorithms.

## Step 1: Create a cudaFlow

A `tf::cudaFlow` is a graph object created at runtime, similar to dynamic tasking. It manages a task node in a taskflow and associates it with a [CUDA Graph][cudaGraph]. To create a cudaFlow, emplace a callable with an argument of type `tf::cudaFlow`.

```cpp
tf::Taskflow taskflow;
tf::Executor executor;

const unsigned N = 1<<20;                     // size of the vector

std::vector<float> hx(N, 1.0f), hy(N, 2.0f);  // x and y vectors at host
float *dx{nullptr}, *dy{nullptr};             // x and y vectors at device

tf::Task allocate_x = taskflow.emplace([&](){ cudaMalloc(&dx, N*sizeof(float)); });
tf::Task allocate_y = taskflow.emplace([&](){ cudaMalloc(&dy, N*sizeof(float)); });

tf::Task cudaflow = taskflow.emplace([&](tf::cudaFlow& cf) {
  tf::cudaTask h2d_x = cf.copy(dx, hx.data(), N);  // host-to-device x data transfer
  tf::cudaTask h2d_y = cf.copy(dy, hy.data(), N);  // host-to-device y data transfer
  tf::cudaTask d2h_x = cf.copy(hx.data(), dx, N);  // device-to-host x data transfer
  tf::cudaTask d2h_y = cf.copy(hy.data(), dy, N);  // device-to-host y data transfer

  // launch saxpy<<<(N+255)/256, 256, 0>>>(N, 2.0f, dx, dy)
  tf::cudaTask kernel = cf.kernel((N+255)/256, 256, 0, saxpy, N, 2.0f, dx, dy);

  kernel.succeed(h2d_x, h2d_y)
        .precede(d2h_x, d2h_y);
});

cudaflow.succeed(allocate_x, allocate_y);  // overlap data allocations

executor.run(taskflow).wait();
```

Assume our kernel implements the canonical saxpy operation (single-precision A·X Plus Y) using the CUDA syntax.
```cpp
// saxpy (single-precision A·X Plus Y) kernel
__global__ void saxpy(int n, float a, float *x, float *y) {
  // get the thread index
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < n) {
    y[i] = a*x[i] + y[i];
  }
}
```

## Step 2: Compile and Execute a cudaFlow

Name your source with the extension `.cu`, say `saxpy.cu`, and compile it through [nvcc][nvcc]:

```bash
~$ nvcc saxpy.cu -I path/to/include/taskflow -O2 -o saxpy
~$ ./saxpy
```

Our source automatically enables cudaFlow for compilers that support CUDA.
# Visualize a Taskflow Graph

You can dump a taskflow through a `std::ostream` in [GraphViz][GraphViz] format using the method `dump`. There are a number of free [GraphViz tools][AwesomeGraphViz] online that you can use to visualize your Taskflow graph.

```cpp
tf::Taskflow taskflow;

tf::Task A = taskflow.emplace([] () {}).name("A");
tf::Task B = taskflow.emplace([] () {}).name("B");
tf::Task C = taskflow.emplace([] () {}).name("C");
tf::Task D = taskflow.emplace([] () {}).name("D");
tf::Task E = taskflow.emplace([] () {}).name("E");

A.precede(B, C, E);
C.precede(D);
B.precede(D, E);

taskflow.dump(std::cout);  // dump the graph in DOT to std::cout
```

When you have tasks that are created at runtime (e.g., subflow, cudaFlow), you need to execute the graph first to spawn these tasks before dumping the entire graph.

```cpp
tf::Executor executor;
tf::Taskflow taskflow;

tf::Task A = taskflow.emplace([](){}).name("A");

// create a subflow of two tasks B1->B2
tf::Task B = taskflow.emplace([] (tf::Subflow& subflow) {
  tf::Task B1 = subflow.emplace([](){}).name("B1");
  tf::Task B2 = subflow.emplace([](){}).name("B2");
  B1.precede(B2);
}).name("B");

A.precede(B);

executor.run(taskflow).wait();  // run the taskflow to spawn subflows
taskflow.dump(std::cout);       // dump the graph including dynamic tasks
```
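For orientation, the five-task graph in the first example above corresponds to DOT text along these lines (a hand-written sketch of the edge structure, not the literal `dump` output, which also carries styling attributes):

```dot
digraph Taskflow {
  A -> B;  // A.precede(B, C, E)
  A -> C;
  A -> E;
  B -> D;  // B.precede(D, E)
  B -> E;
  C -> D;  // C.precede(D)
}
```

Any GraphViz renderer (e.g., the `dot` command-line tool) turns this text into a diagram.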
# API Reference

The official [documentation][wiki] explains the complete list of Taskflow APIs. Here, we highlight commonly used methods.

## Taskflow API

The class `tf::Taskflow` is the main place to create a task dependency graph.

### *emplace/placeholder*

You can use `emplace` to create a task from a target callable.

```cpp
tf::Task task = taskflow.emplace([] () { std::cout << "my task\n"; });
```

When a task cannot be determined beforehand, you can create a placeholder and assign the callable later.

```cpp
tf::Task A = taskflow.emplace([](){});
tf::Task B = taskflow.placeholder();
A.precede(B);
B.work([](){ /* do something */ });
```

### *parallel_for*

The method `parallel_for` creates a subgraph that applies the callable to each item in the given range of a container.

```cpp
auto v = {'A', 'B', 'C', 'D'};
auto [S, T] = taskflow.parallel_for(
  v.begin(),  // iterator to the beginning
  v.end(),    // iterator to the end
  [] (char c) { std::cout << "parallel " << c << '\n'; }
);
// add dependencies via S and T.
```

You can specify a *chunk* size (default one) in the last argument to force a task to include a certain number of items.

```cpp
auto v = {'A', 'B', 'C', 'D'};
auto [S, T] = taskflow.parallel_for(
  v.begin(),  // iterator to the beginning
  v.end(),    // iterator to the end
  [] (char c) { std::cout << "AB and CD run in parallel" << '\n'; },
  2           // at least two items at a time
);
```

In addition to iterator-based construction, `parallel_for` has another overload for index-based loops. The first three arguments of this overload indicate the starting index, the ending index (exclusive), and the step size.
```cpp
// [0, 11) with a step size of 2
auto [S, T] = taskflow.parallel_for(
  0, 11, 2,
  [] (int i) { std::cout << "parallel_for on index " << i << std::endl; },
  2  // at least two items at a time
);
// will print 0, 2, 4, 6, 8, 10 (three partitions: {0, 2}, {4, 6}, {8, 10})
```

## Task API

Each time you create a task, the taskflow object adds a node to the present task dependency graph and returns a *task handle* to you. You can access or modify the attributes of the associated task node.

### *name*

The method `name` lets you assign a human-readable string to a task.

```cpp
A.name("my name is A");
```

### *work*

The method `work` lets you assign a callable to a task.

```cpp
A.work([] () { std::cout << "hello world!"; });
```

### *precede/succeed*

The methods `precede` and `succeed` let you add a preceding/succeeding link between tasks.

```cpp
// A runs before B, C, D, and E
A.precede(B, C, D, E);
```

The method `succeed` is similar to `precede` but operates in the opposite direction.

### *empty/has_work*

A task is empty if it is not associated with any graph node.

```cpp
tf::Task task;  // assert(task.empty());
```

A placeholder task is associated with a graph node but has no work assigned yet.

```cpp
tf::Task task = taskflow.placeholder();  // assert(!task.has_work());
```

## Executor API

The class `tf::Executor` is used for executing one or multiple taskflow objects.

### *run/run_n/run_until*

The run series are *thread-safe* and *non-blocking* calls to execute a taskflow. Issuing multiple runs on the same taskflow will automatically synchronize them into a sequential chain of executions.

```cpp
executor.run(taskflow);                 // runs a graph once
executor.run_n(taskflow, 5);            // runs a graph five times
executor.run_until(taskflow, my_pred);  // keeps running until my_pred becomes true
executor.wait_for_all();                // blocks until all tasks finish
```

The first run finishes before the second run, and the second run finishes before the third run.
# System Requirements

To use the latest [Taskflow](https://github.com/taskflow/taskflow/archive/master.zip), you only need a [C++14][C++14] compiler:

+ GNU C++ Compiler at least v5.0 with -std=c++14
+ Clang C++ Compiler at least v4.0 with -std=c++14
+ Microsoft Visual Studio at least v15.7 (MSVC++ 19.14); see the [vcpkg guide](https://github.com/taskflow/taskflow/issues/143)
+ AppleClang (Xcode) at least v8
+ Nvidia CUDA Toolkit and Compiler ([nvcc][nvcc]) at least v10.0 with -std=c++14

Taskflow works on Linux, Windows, and Mac OS X. See the [C++ compiler support](https://en.cppreference.com/w/cpp/compiler_support) status.
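Since Taskflow is header-only, consuming it from a CMake project only requires an include path, the C++14 feature flag, and a thread library. A minimal sketch, assuming your executable is named `my_app`, your source is `simple.cpp`, and `path/to/taskflow` points at the checked-out repository (all three are placeholders):

```cmake
cmake_minimum_required(VERSION 3.9)
project(my_app LANGUAGES CXX)

add_executable(my_app simple.cpp)

# Taskflow is header-only: point the compiler at its include directory
target_include_directories(my_app PRIVATE path/to/taskflow)
target_compile_features(my_app PRIVATE cxx_std_14)

# the work-stealing executor needs a thread library (e.g., -lpthread)
find_package(Threads REQUIRED)
target_link_libraries(my_app PRIVATE Threads::Threads)
```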
# Compile Unit Tests, Examples, and Benchmarks

Taskflow uses [CMake](https://cmake.org/) to build examples and unit tests. We recommend an out-of-source build.

```bash
~$ cmake --version    # must be 3.9 or higher
~$ mkdir build
~$ cd build
~$ cmake ../
~$ make && make test  # run all unit tests
```

## Examples

The folder `examples/` contains several examples and is a great place to learn how to use Taskflow.

| Example | Description |
| ------- | ----------- |
| [simple.cpp](./examples/simple.cpp) | uses basic task building blocks to create a trivial taskflow graph |
| [visualization.cpp](./examples/visualization.cpp) | inspects a taskflow through the dump method |
| [parallel_for.cpp](./examples/parallel_for.cpp) | parallelizes a for loop with an unbalanced workload |
| [subflow.cpp](./examples/subflow.cpp) | demonstrates how to create a subflow graph that spawns three dynamic tasks |
| [run_variants.cpp](./examples/run_variants.cpp) | shows multiple ways to run a taskflow graph |
| [composition.cpp](./examples/composition.cpp) | demonstrates the composable interface of taskflow |
| [observer.cpp](./examples/observer.cpp) | demonstrates how to monitor thread activities in scheduling and running tasks |
| [condition.cpp](./examples/condition.cpp) | creates a conditional tasking graph with feedback-loop control flow |
| [cuda/saxpy.cu](./examples/cuda/saxpy.cu) | uses cudaFlow to create a saxpy (single-precision A·X Plus Y) task graph |
| [cuda/matmul.cu](./examples/cuda/matmul.cu) | uses cudaFlow to create a matrix multiplication workload and compares it with a CPU baseline |

## Benchmarks

Please visit [benchmarks](benchmarks/benchmarks.md) to learn how to compile the benchmarks.
# Who is Using Taskflow?

Taskflow is being used in both industrial and academic projects to scale up existing workloads that incorporate complex task dependencies.

- [OpenTimer][OpenTimer]: A High-performance Timing Analysis Tool for Very Large Scale Integration (VLSI) Systems
- [DtCraft][DtCraft]: A General-purpose Distributed Programming System using Data-parallel Streams
- [Firestorm][Firestorm]: Fighting Game Engine with Asynchronous Resource Loaders (developed by [ForgeMistress][ForgeMistress])
- [Shiva][Shiva]: An extensible engine via an entity-component system through scripts, DLLs, and header-only (C++)
- [PID Framework][PID Framework]: A Global Development Methodology Supported by a CMake API and Dedicated C++ Projects
- [NovusCore][NovusCore]: An emulating project for World of Warcraft (Wrath of the Lich King 3.3.5a 12340 client build)
- [SA-PCB][SA-PCB]: Annealing-based Printed Circuit Board (PCB) Placement Tool
- [LPMP](https://github.com/LPMP/LPMP): A C++ framework for developing scalable Lagrangian decomposition solvers for discrete optimization problems
- [Heteroflow](https://github.com/Heteroflow/Heteroflow): A Modern C++ Parallel CPU-GPU Task Programming Library
- [OpenPhySyn](https://github.com/The-OpenROAD-Project/OpenPhySyn): A plugin-based physical synthesis optimization kit as part of the OpenROAD flow
- [OSSIA](https://ossia.io/): Open-source Software System for Interactive Applications
- [deal.II](https://github.com/dealii/dealii): A C++ software library supporting the creation of finite element codes
- [PyRepScan](https://github.com/Intsights/PyRepScan): A Git Repository Leaks Scanner Python Library written in C++

[More...](https://github.com/search?q=taskflow&type=Code)
# Contributors

Taskflow is being actively developed and contributed to by [these people](https://github.com/taskflow/taskflow/graphs/contributors). Meanwhile, we appreciate the support from many organizations for our development.

| [][UofU] | [][UIUC] | [][CSL] | [][NSF] | [][DARPA IDEA] |
| :---: | :---: | :---: | :---: | :---: |

# License

Taskflow is licensed under the [MIT License](./LICENSE).

* * *

[Tsung-Wei Huang]: https://tsung-wei-huang.github.io/
[Chun-Xun Lin]: https://github.com/clin99
[Martin Wong]: https://ece.illinois.edu/directory/profile/mdfwong
[Gitter badge]: ./image/gitter_badge.svg
[GitHub releases]: https://github.com/taskflow/taskflow/releases
[GitHub issues]: https://github.com/taskflow/taskflow/issues
[GitHub insights]: https://github.com/taskflow/taskflow/pulse
[GitHub pull requests]: https://github.com/taskflow/taskflow/pulls
[GitHub contributors]: https://github.com/taskflow/taskflow/graphs/contributors
[GraphViz]: https://www.graphviz.org/
[AwesomeGraphViz]: https://dreampuf.github.io/GraphvizOnline/
[OpenMP Tasking]: https://www.openmp.org/spec-html/5.0/openmpsu99.html
[TBB FlowGraph]: https://www.threadingbuildingblocks.org/tutorial-intel-tbb-flow-graph
[OpenTimer]: https://github.com/OpenTimer/OpenTimer
[DtCraft]: https://github.com/tsung-wei-huang/DtCraft
[totalgee]: https://github.com/totalgee
[damienhocking]: https://github.com/damienhocking
[ForgeMistress]: https://github.com/ForgeMistress
[Patrik Huber]: https://github.com/patrikhuber
[DARPA IDEA]: https://www.darpa.mil/news-events/2017-09-13
[KingDuckZ]: https://github.com/KingDuckZ
[NSF]: https://www.nsf.gov/
[UIUC]: https://illinois.edu/
[CSL]: https://csl.illinois.edu/
[UofU]: https://www.utah.edu/
[wiki]: https://taskflow.github.io/taskflow/index.html
[release notes]: https://taskflow.github.io/taskflow/Releases.html
[PayMe]: https://www.paypal.me/twhuang/10
[C++17]: https://en.wikipedia.org/wiki/C%2B%2B17
[C++14]: https://en.wikipedia.org/wiki/C%2B%2B14
[email me]: mailto:twh760812@gmail.com
[Cpp Conference 2018]: https://github.com/CppCon/CppCon2018
[ChromeTracing]: https://www.chromium.org/developers/how-tos/trace-event-profiling-tool
[IPDPS19]: https://tsung-wei-huang.github.io/papers/ipdps19.pdf
[WorkStealing Wiki]: https://en.wikipedia.org/wiki/Work_stealing
[std::invoke]: https://en.cppreference.com/w/cpp/utility/functional/invoke
[std::future]: https://en.cppreference.com/w/cpp/thread/future
[cuda-zone]: https://developer.nvidia.com/cuda-zone
[nvcc]: https://developer.nvidia.com/cuda-llvm-compiler
[cuda-toolkit]: https://developer.nvidia.com/cuda-toolkit
[cudaGraph]: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__GRAPH.html
[Firestorm]: https://github.com/ForgeMistress/Firestorm
[Shiva]: https://shiva.gitbook.io/project/shiva
[PID Framework]: http://pid.lirmm.net/pid-framework/index.html
[NovusCore]: https://github.com/novuscore/NovusCore
[SA-PCB]: https://github.com/choltz95/SA-PCB
[Presentation]: https://taskflow.github.io/
[chrome://tracing]: chrome://tracing