From 81652c983642cc50327f3ec9a98365fb36d515d4 Mon Sep 17 00:00:00 2001
From: cloudflywu
Date: Tue, 10 Mar 2020 10:40:37 +0800
Subject: [PATCH] Add english documents and add links in readme.

---
 Readme.md                           |   3 +-
 doc/en/CPPCodingTalkAboutPointer.md | 266 +++++++++++++++++
 doc/en/DeveloperGuide4Utility.md    | 431 ++++++++++++++++++++++++++++
 3 files changed, 699 insertions(+), 1 deletion(-)
 create mode 100644 doc/en/CPPCodingTalkAboutPointer.md
 create mode 100644 doc/en/DeveloperGuide4Utility.md

diff --git a/Readme.md b/Readme.md
index b69ef3006c..74f5a97f0b 100644
--- a/Readme.md
+++ b/Readme.md
@@ -42,6 +42,7 @@ Open the front end, back end, and compilation optimization of compilers. Support
   - [Naive RC Insertion Principle](doc/en/NaiveRcInsertionDescription.md)
   - [Virtual Table and Interface Table Design](doc/en/VtableItableDescription.md)
   - [Phase Design](doc/en/CompilerPhaseDescription.md)
+  - [Application Manual of Maple General Modules](doc/en/DeveloperGuide4Utility.md)
 
 - [Environment Configuration](doc/en/DevelopmentPreparation.md)
 
@@ -50,7 +51,7 @@ Open the front end, back end, and compilation optimization of compilers. Support
 
 - [Programming Specifications](doc/en/ProgrammingSpecifications.md)
 
-
+- [Discussion of Pointer in C++ Programming](doc/en/CPPCodingTalkAboutPointer.md)
 ## License
 
 - [LICENSE](license/LICENSE)
diff --git a/doc/en/CPPCodingTalkAboutPointer.md b/doc/en/CPPCodingTalkAboutPointer.md
new file mode 100644
index 0000000000..2e822efaa5
--- /dev/null
+++ b/doc/en/CPPCodingTalkAboutPointer.md
@@ -0,0 +1,266 @@
+# Discussion of Pointer in C++ Programming
+
+# Background
+
+The use of pointers in `C/C++` is flexible but carries many safety risks, which places high demands on programmers. This document discusses how to use raw pointers in `C/C++` and finally establishes a coding paradigm for them.
+
+# Raw Pointer vs Reference
+
+## Member access.
+
+When an object member is accessed through a raw pointer, the pointer may be null (its validity can only be guaranteed logically, by the enclosing object or function). Therefore, a pointer must be checked for validity before use, whereas a reference cannot be null and needs no check.
+
+## Container members.
+
+In `C++` semantics, a reference expresses an alias relationship and, in theory, occupies no memory. (In practice, compilers implement references internally as pointers.) Unlike a pointer, a reference is not an object in itself, so a pointer can be a container element, but a reference cannot.
+
+# Raw Pointer vs Smart Pointer
+
+## Destroy a heap object.
+
+```c++
+class Int {
+ public:
+  // ...
+
+ private:
+  int data = 0;
+};
+
+void test(int *in) {
+  Int *tmp = new Int();
+  // ...
+  if (in == nullptr) {
+    goto LABEL;  // Jumps over the delete below, so tmp leaks.
+  }
+  // ...
+
+  delete tmp;
+LABEL:
+  return;
+}
+```
+
+Resource usage (heap objects, stack objects, and file resources) should follow the Resource Acquisition Is Initialization (RAII) principle that **a resource is released in the same scope in which it is acquired**, which minimizes the possibility of resource leaks.
+
+Between the `new` and the `delete` of a raw pointer there is usually a segment of processing logic and sub-function calls, and this intermediate logic may throw exceptions or jump. (The current object cannot restrict the behavior of the intermediate logic; that lies outside the scope managed by `new`.) When an exception or a jump skips the release, the resource leaks, as happens to the `tmp` object in the `test` function in the preceding example.
+
+With a smart pointer, the allocation is rewritten as `auto tmp = std::make_unique<Int>();`. The `delete` behavior is bound to `tmp` when it is constructed and runs whenever the current scope is exited, which prevents the leak.
+
+## Management permission vs use permission.
+
+```c++
+int *Delete(int *in);
+```
+
+Management permission: Destroy and rebuild objects.
+
+Use permission: Access and modify objects.
+
+As shown in the preceding example, when a raw pointer is used to pass parameters, neither the input parameter `in` nor the return value tells whether the **management permission** or only the **use permission** is involved, because a raw pointer may or may not imply a transfer of ownership. The caller needs additional information: Will the `in` parameter be destroyed by the `Delete` function? Does the caller need to destroy the return value?
+
+```c++
+std::unique_ptr<int> Delete(std::unique_ptr<int> &in);
+```
+
+A smart pointer specifies the role of each parameter in an interface. For example, `std::unique_ptr<int> &in` indicates that the `Delete` function has the **use permission**, and the `std::unique_ptr<int>` return value indicates that the `Delete` function transfers ownership to the caller.
+
+# Pointer Normalization
+
+## A destruction method must be immediately bound to an object created by `new`.
+
+Bad example:
+
+```c++
+Object *obj = new Object();
+// ...
+delete obj;
+```
+
+Good example:
+
+```c++
+std::unique_ptr<Object> obj = std::make_unique<Object>();
+```
+
+## A release method must be immediately bound to the acquired resources.
+
+Bad example:
+
+```c++
+std::FILE *file = std::fopen("xxx.txt", "r");
+// ...
+std::fclose(file);
+```
+
+Good example (a general-purpose approach; the best way is to encapsulate `open` in a dedicated application class):
+
+```c++
+#include <cstdio>
+
+template <typename T, typename Func>
+class ResourceGuard {
+ public:
+  ResourceGuard(T *_obj, Func _func) : obj(_obj), func(_func) {}
+  ResourceGuard(const ResourceGuard &) = delete;
+  ResourceGuard &operator=(const ResourceGuard &) = delete;
+
+  // Releases the resource when the guard leaves its scope.
+  ~ResourceGuard() {
+    if (obj != nullptr) {
+      func(obj);
+    }
+  }
+
+ private:
+  T *obj;
+  Func func;
+};
+
+std::FILE *file = std::fopen("xxx.txt", "r");
+auto closeFile = [](std::FILE *f) { std::fclose(f); };
+ResourceGuard<std::FILE, decltype(closeFile)> fileGuard(file, closeFile);
+// ...
+```
+
+## Use a reference instead of a pointer when the value cannot be null.
+
+Bad example:
+
+```c++
+void func1(int *in) {
+  if (in == nullptr) return;
+  // ...
+}
+
+void func2() {
+  int *p = nullptr;
+  // ...
+  if (p != nullptr) {
+    func1(p);
+  }
+}
+```
+
+Good example:
+
+```c++
+void func1(int &in) {
+  // ...
+}
+
+void func2() {
+  int *p = nullptr;
+  // ...
+  if (p != nullptr) {
+    func1(*p);
+  }
+}
+```
+
+## As a container member (without the management permission), use an encapsulated reference container instead of a pointer when the value cannot be null.
+
+Bad example:
+
+```c++
+void func(std::vector<int *> &in) {
+  for (auto *p : in) {
+    if (p == nullptr) {
+      continue;
+    }
+    // ...
+  }
+}
+```
+
+Good example:
+
+```c++
+template <typename T>
+class Ref {
+ public:
+  Ref() = delete;
+  Ref(T &ref) : data(&ref) {}
+
+  // ...
+
+  // Converts back to a reference, so no null check is needed at the use site.
+  operator T &() const noexcept {
+    return *data;
+  }
+
+ private:
+  T *data;
+};
+
+template <typename T>
+using ref_vector = std::vector<Ref<T>>;
+
+void func(ref_vector<int> &in) {
+  for (auto p : in) {
+    int &data = p;
+    // ...
+  }
+}
+```
+
+## As a container member (with the management permission), use a container that manages the lifecycle of its elements instead of a container of raw pointers.
+
+Bad example:
+
+```c++
+std::vector<Object *> data;
+// ...
+for (auto *p : data) {
+  delete p;
+}
+```
+
+Good example:
+
+```c++
+template <typename T>
+class ptr_vector {
+ public:
+  // Deletes the owned elements when the container is destroyed.
+  ~ptr_vector() {
+    for (auto *p : data) {
+      delete p;
+    }
+  }
+
+ private:
+  std::vector<T *> data;
+};
+
+ptr_vector<Object> data;
+// ...
+```
+
+## Explicitly transfer an object management permission and specify an object use permission.
+
+`C++11` adds `move` semantics and deprecates `auto_ptr`. `unique_ptr` is used to transfer ownership explicitly, so that the lifecycle management of stack objects and heap objects can be unified.
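+
+The permission conventions above can be made concrete with a short, self-contained sketch. It is a minimal illustration only, assuming a hypothetical `Widget` type and hypothetical functions `Touch`, `Consume`, and `Make` (none of them belong to `maple`): a `T &` parameter grants only the use permission, a `std::unique_ptr<T>` parameter taken by value receives the management permission, and a returned `std::unique_ptr<T>` hands the management permission to the caller.
+
+```c++
+#include <cassert>
+#include <memory>
+#include <utility>
+
+// Hypothetical type used only for this illustration.
+struct Widget {
+  int value = 0;
+};
+
+// Use permission: the callee may access and modify the object but never destroys it.
+void Touch(Widget &w) {
+  ++w.value;
+}
+
+// Management permission transferred in: the callee owns the object and
+// destroys it when the parameter goes out of scope.
+int Consume(std::unique_ptr<Widget> w) {
+  return w->value;
+}  // The Widget is deleted here.
+
+// Management permission transferred out: the caller becomes the owner.
+std::unique_ptr<Widget> Make(int value) {
+  auto w = std::make_unique<Widget>();
+  w->value = value;
+  return w;  // Moved implicitly; no std::move is needed.
+}
+
+// Usage: ownership flows Make -> caller -> Consume, while Touch only borrows.
+int TransferDemo() {
+  std::unique_ptr<Widget> owner = Make(41);
+  Touch(*owner);                           // Lends the use permission only.
+  int result = Consume(std::move(owner));  // Hands over the management permission.
+  assert(owner == nullptr);                // A moved-from unique_ptr is empty.
+  return result;
+}
+```
+
+Because `Consume` takes the `unique_ptr` by value, the call site has to write `std::move(owner)`, so the transfer of the management permission is visible exactly where it happens.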
+
+Example of stack object transfer:
+
+```c++
+std::vector<int> func() {
+  std::vector<int> data;
+  data.push_back(0);
+  return data;  // Moved implicitly; return std::move(data) would inhibit copy elision.
+}
+```
+
+Example of ambiguous heap object transfer:
+
+```c++
+Object *func() {
+  std::unique_ptr<Object> data = std::make_unique<Object>();
+  Object &rData = ToRef(data);
+  rData.push_back(0);
+  return data.release();  // The signature does not tell the caller that it must delete the result.
+}
+```
+
+Example of clear heap object transfer:
+
+```c++
+std::unique_ptr<Object> func() {
+  std::unique_ptr<Object> data = std::make_unique<Object>();
+  Object &rData = ToRef(data);
+  rData.push_back(0);
+  return data;  // The return type states that ownership moves to the caller.
+}
+```
+
+## Scenarios where pointers should be used.
+
+1. When a third-party library function takes or returns a pointer: before the call, use `unique_ptr.get()` or `unique_ptr.release()` to construct the input arguments; after the call, take over the returned pointer with a `unique_ptr`, or check it for null and convert it to a reference.
+2. As a container member (without the management permission), when a null value is a legitimate state by design in the application scenario. The pointer must then be checked for null and converted to a reference immediately before use. Pointers must not spread further through the code.
+
+# Remarks
+
+`Ref` and `ref_vector` have been developed. `Ref` is implemented as `SafePtr` because `operator.` cannot be overloaded.
+
+`ResourceGuard` and `ptr_vector` are still being developed and are mainly used as examples in this document.
diff --git a/doc/en/DeveloperGuide4Utility.md b/doc/en/DeveloperGuide4Utility.md
new file mode 100644
index 0000000000..d1262234c1
--- /dev/null
+++ b/doc/en/DeveloperGuide4Utility.md
@@ -0,0 +1,431 @@
+# Application Manual of Maple General Modules
+
+# Cast
+
+## `instance_of` and `safe_cast`.
+
+In principle, `RTTI` (that is, `dynamic_cast`) must be disabled when `C++` is used in `maple`. The compiler system is complex, and casting from a base class object to a subclass object through class design would make the object relationships even more complex.
+Therefore, the `maple` code implementation uses the following design:
+
+```c++
+SubClass *sub = nullptr;
+if (base.Type() == SubType) {
+  sub = static_cast<SubClass *>(&base);
+}
+```
+
+A type field binds a class to its subclasses, achieving the same effect as `dynamic_cast`.
+
+However, this method has some disadvantages. First, the binding between `SubType` and `SubClass` is static and determined by the designer, yet every caller has to spell it out, which creates a strong dependency. Second, not all scenarios are as simple as a single type comparison: callers are prone to mistakes in complex scenarios, and later code rectification then turns into shotgun surgery. Therefore, `safe_cast` is provided. The designer registers the casting relationship once, and a caller only needs to call `safe_cast` as it would call `dynamic_cast`.
+
+### Registration method.
+
+Use the `REGISTER_SAFE_CAST` macro to complete the registration. The declaration is as follows:
+
+```c++
+#define REGISTER_SAFE_CAST(type, condition)
+```
+
+`type` is a subclass type (assumed to be `B`), and `condition` is a Boolean expression that matches `B` and all of its subclasses. Example:
+
+```c++
+class A { /* ... */ };
+class B : public A { /* ... */ };
+class C : public B { /* ... */ };
+// kB and kC stand for the Kind() values of B and C.
+REGISTER_SAFE_CAST(B, from.Kind() == kB || from.Kind() == kC);
+REGISTER_SAFE_CAST(C, from.Kind() == kC);
+```
+
+`from` is the parameter name, available inside `condition`, that refers to the object being cast.
+
+*Note:*
+
+*- Registration supports casting both from a subclass to its base class and from a base class to a subclass.*
+
+*- `condition` can be any Boolean expression. However, the designer must ensure that it is consistent with the inheritance relationship, so that casts between classes without an inheritance relationship cannot succeed.*
+
+*- For a complex `condition`, using chains of `kind ==` comparisons to describe an inheritance tree is not recommended; consider optimizing it.
+For example, organize the `kind` values into ranges, or use dedicated bit flags to achieve fast matching.*
+
+### Application scenarios.
+
+1. For scenarios where only one type is matched:
+
+```c++
+SubClass *sub = safe_cast<SubClass>(base);
+if (sub == nullptr) {
+  // TODO
+}
+```
+
+2. For scenarios where multiple types are matched:
+
+```c++
+if (instance_of<SubClass1>(base)) {
+  auto *sub = static_cast<SubClass1 *>(&base);
+  // TODO
+} else if (instance_of<SubClass2>(base)) {
+  auto *sub = static_cast<SubClass2 *>(&base);
+  // TODO
+}
+// ...
+```
+
+*Note:*
+
+*- In scenarios where the type has already been identified correctly, such as inside a `switch case`, use `static_cast` directly.*
+
+*- The return value of `safe_cast` is always a pointer; a null result indicates that the cast failed.*
+
+*- If the input is a pointer, both `instance_of` and `safe_cast` check whether it is null. Therefore, when `base` is known to be non-null, pass a reference instead.*
+
+# Container
+
+## `Ptr`
+
+`Ptr` simulates the behavior of a native pointer but removes support for array operations.
+
+`Ptr` validates data during construction and assignment. A property of the pointed-to object therefore needs to be validated only when the `Ptr` is constructed or assigned, and not again while it is passed around and used, which eliminates the overhead of repeated checks.
+
+```c++
+template <typename T>
+using PtrCheckerType = void (*)(const T *);
+
+template <typename T>
+constexpr void CheckNothing(const T *) {}
+
+template <typename T, PtrCheckerType<T> Check = CheckNothing<T>>
+class Ptr;
+```
+
+As shown in the preceding example, `Ptr` performs no check by default.
+
+```c++
+// CheckXXX is a placeholder for a scenario-specific checker.
+template <typename T>
+using XXXPtr = utils::Ptr<T, CheckXXX<T>>;
+```
+
+In this way, validated pointer types can be defined for different scenarios; these are called safe pointers.
+
+*Note: The preceding behavior is an extension based on the requirements of `safe_ptr`.
+However, `safe_ptr` is more complex than the common `Ptr` scenario, because it concerns the validity of the pointer itself rather than properties of the pointed-to object.*
+
+## `safe_ptr`
+
+For details, see the definition of `Ref` in CPPCodingTalkAboutPointer. Because `operator.` cannot be overloaded in `C++`, a `Ref` class that behaves fully like a reference cannot be built. However, `safe_ptr`, a non-null pointer object equivalent to `Ref`, can be defined.
+
+### Scenario: Container members such as arrays and dictionaries.
+
+For details, see CPPCodingTalkAboutPointer.
+
+To work around the fact that `operator.` cannot be overloaded, the `ToRef` utility in `Utility` will be extended so that a `safe_ptr` can be converted securely into a reference without overhead. It may look as follows:
+
+```c++
+template <typename T>
+constexpr T &ToRef(safe_ptr<T> ptr) {
+  return *ptr;
+}
+```
+
+*Note: For frequently used containers, `ref_xxx` container wrappers will be provided to replace `safe_ptr` in these scenarios.*
+
+### Scenario: Object members.
+
+This scenario was discovered unexpectedly after `safe_ptr` was developed.
+
+The compiler cannot generate default copy assignment and move assignment operators for classes that have reference data members, yet copy and move capabilities are required in many scenarios. The common workaround is to store such members as pointers, but pointer members bring the following problems:
+
+1. Whenever a pointer member is accessed, its validity needs to be checked.
+2. During refactoring and evolution, the defining side can change the member's role in a controlled way, but the code that uses the member may not be notified, which can cause hidden bugs.
+
+If `safe_ptr` replaces the raw pointer and `ToRef` converts it to a reference where it is dereferenced (`ToRef` on a raw pointer incurs extra overhead), the preceding problems can be avoided.
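+
+The object-member argument above can be checked at compile time. The following is a minimal sketch, assuming a hypothetical `NonNullPtr` wrapper and a local `ToRef` overload that stand in for `safe_ptr` and the utility `ToRef` (the real Maple definitions may differ). It shows that a reference member deletes the implicit copy assignment operator, whereas a non-null pointer member keeps the class copy-assignable while the use site still needs no null check.
+
+```c++
+#include <cassert>
+#include <type_traits>
+
+namespace sketch {
+
+// Hypothetical minimal non-null pointer, standing in for safe_ptr.
+// It can only be constructed from a reference, so it never holds null.
+template <typename T>
+class NonNullPtr {
+ public:
+  NonNullPtr() = delete;
+  NonNullPtr(T &ref) : ptr(&ref) {}  // Implicit on purpose, like Ref.
+
+  T &operator*() const noexcept { return *ptr; }
+  T *operator->() const noexcept { return ptr; }
+
+ private:
+  T *ptr;
+};
+
+// Hypothetical stand-in for ToRef(safe_ptr): no null check is needed.
+template <typename T>
+T &ToRef(NonNullPtr<T> p) noexcept {
+  return *p;
+}
+
+// A reference member makes the implicit copy assignment operator deleted.
+struct HolderWithRef {
+  int &value;
+};
+
+// A non-null pointer member keeps the class copy-assignable.
+struct HolderWithSafePtr {
+  NonNullPtr<int> value;
+};
+
+static_assert(!std::is_copy_assignable<HolderWithRef>::value,
+              "reference members block copy assignment");
+static_assert(std::is_copy_assignable<HolderWithSafePtr>::value,
+              "a non-null pointer member keeps copy assignment");
+
+// Usage: assignment rebinds the member, and access needs no null check.
+int RebindDemo() {
+  int a = 1;
+  int b = 2;
+  HolderWithSafePtr first{a};
+  HolderWithSafePtr second{b};
+  first = second;            // first now refers to b.
+  ToRef(first.value) += 40;  // Modifies b through the wrapper.
+  assert(a == 1);            // a is untouched.
+  return b;
+}
+
+}  // namespace sketch
+```
+
+The two `static_assert`s record the trade-off described above at compile time, so a regression (for example, someone turning the member back into a reference) fails the build rather than a later copy.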
+
+*Note: `safe_ptr` is intended for container and object members; use `&` to pass function parameters.*
+
+## `mpl_iterator`
+
+`mpl_iterator` was originally named `iterator`, but that name clashes with the `using iterator` alias inside containers, so the `mpl` prefix was added.
+
+Considering the design of the `ref_xxx` containers and possible future extensions of this coding-talk series, designing an iterator repeatedly for every container is troublesome. Therefore, a unified iterator that keeps evolving is abstracted: with `mpl_iterator_traits`, the iterator of a new container can be implemented quickly with a few configurations.
+
+Based on the iterator requirements of `ref_vector`, the members of `mpl_iterator_traits` are designed as follows:
+
+```c++
+template <typename Iterator>
+struct mpl_iterator_traits {
+  using iterator_category = typename std::iterator_traits<Iterator>::iterator_category;
+  using value_type = typename std::iterator_traits<Iterator>::value_type;
+  using difference_type = typename std::iterator_traits<Iterator>::difference_type;
+  using pointer = typename std::iterator_traits<Iterator>::pointer;
+  using reference = typename std::iterator_traits<Iterator>::reference;
+
+  static reference operator_dereference(Iterator iter) {
+    return *iter;
+  }
+
+  static Iterator operator_arrow(Iterator iter) {
+    return iter;
+  }
+
+  static reference operator_bracket(Iterator iter, difference_type n) {
+    return iter[n];
+  }
+};
+```
+
+## `ref_vector`
+
+For details, see the definition of `ref_vector` in CPPCodingTalkAboutPointer.
+
+Using a pointer array, `std::vector<int *>`:
+
+```c++
+  int a = 0, b = 1;
+
+  std::vector<int *> data;
+  data.push_back(&a);
+  data.push_back(&b);
+  ASSERT_EQ(*data[0], 0);
+  ASSERT_EQ(*data[1], 1);
+```
+
+Refactored as a reference array, `utils::ref_vector<int>`:
+
+```c++
+  int a = 0, b = 1;
+
+  ref_vector<int> data;
+  data.push_back(a);
+  data.push_back(b);
+  ASSERT_EQ(data[0], 0);
+  ASSERT_EQ(data[1], 1);
+```
+
+## `Index`
+
+`Index` is designed to provide static type safety for values of basic types.
+
+For example, `GStrIdx`, `UStrIdx`, and `U16StrIdx` all take part in calculations in service code, and their interfaces are similar. Their underlying type is `uint32`. If all formal parameters were declared as `uint32`, it would be a disaster: callers would have to be extremely careful, and it would still be hard to avoid passing the wrong value. Making the types statically distinct lets the compiler check that they are used correctly, which greatly reduces such errors.
+
+To define a distinct static type, you only need to define a different tag, as shown in the following example:
+
+```c++
+class GStrTag;
+using GStrIdx = utils::Index<GStrTag, uint32>;
+
+class UStrTag;
+using UStrIdx = utils::Index<UStrTag, uint32>;
+
+class U16StrTag;
+using U16StrIdx = utils::Index<U16StrTag, uint32>;
+```
+
+# Generalized Patterns
+
+## `ObjectFactory`
+
+`ObjectFactory` encapsulates an abstract factory and is used to solve the following problems:
+
+1. Eliminate large functions with high cyclomatic complexity, such as long `switch ... case` and `if ... else if ...` chains. Such functions are difficult to maintain and extend and have low flexibility. *(Note: Analyze the semantics and design carefully instead of solving the problem mechanically; for example, use the template method pattern to achieve design isolation rather than mere code isolation.)*
+
+2. A factory inverts the dependency, making it easier to isolate callers from designers and designers from one another.
+
+Example:
+
+```c++
+// Defines a key to mark the product type to be created by the factory.
+enum class ObjectType {
+  kPlus,
+  kMinus
+};
+
+// Defines the interface and interface protocol that all products must implement.
+class Base {
+ public:
+  virtual ~Base() = default;
+  virtual int32_t DoIt(int32_t lhs, int32_t rhs) const = 0;
+};
+
+// Defines a factory that uses ObjectType as the search key. Base is the interface protocol, and int32_t is the parameter type of all product constructors.
+using TestObjectFactory = ObjectFactory<ObjectType, Base, int32_t>;
+
+// The key, interface, and factory must be visible to both the registrant and the caller, so they may be placed in a .h file or gathered in one .cpp file.
+// Products can be registered from different .cpp files, as long as they get registered in the factory.
+// Defines a product.
+class ObjectPlus : public Base {
+ public:
+  // Constructor agreed upon when the factory was defined.
+  explicit ObjectPlus(int32_t base) : base(base) {}
+  ~ObjectPlus() override = default;
+
+  // Implements the interface protocol.
+  int32_t DoIt(int32_t lhs, int32_t rhs) const override {
+    return base + lhs + rhs;
+  }
+
+ private:
+  int32_t base;
+};
+
+// Defines another product, which may be in another .cpp file.
+class ObjectMinus : public Base {
+ public:
+  explicit ObjectMinus(int32_t base) : base(base) {}
+  ~ObjectMinus() override = default;
+
+  int32_t DoIt(int32_t lhs, int32_t rhs) const override {
+    return base + lhs - rhs;
+  }
+
+ private:
+  int32_t base;
+};
+
+// Registers the products. How products are registered depends on how the keys, interfaces, factories, and products are distributed and loaded; make sure that registration actually happens.
+// Here, initializing a static variable ensures registration.
+bool RegisterObject() {
+  RegisterFactoryObject<TestObjectFactory, ObjectType::kPlus, ObjectPlus>();
+  RegisterFactoryObject<TestObjectFactory, ObjectType::kMinus, ObjectMinus>();
+  return true;
+}
+static auto testObjectFactory = RegisterObject();
+
+TEST(TestFactory, ObjectFactory) {
+  // Obtains a product object. The returned value should be checked for null; this is omitted in the example.
+  auto obj = CreateProductObject<TestObjectFactory>(ObjectType::kPlus, 10);
+  ASSERT_EQ(obj->DoIt(1, 2), 13);
+  obj = CreateProductObject<TestObjectFactory>(ObjectType::kMinus, 10);
+  ASSERT_EQ(obj->DoIt(1, 2), 9);
+}
+```
+
+## `FunctionFactory`
+
+Similar to `ObjectFactory`, `FunctionFactory` reduces the complexity of abstract factories. In most scenarios, `FunctionFactory` is easier to write and use.
+
+Example:
+
+```c++
+// Defines a key to mark the product type to be created by the factory.
+enum class FunctionType {
+  kPlus,
+  kMinus
+};
+
+// Defines the interface and factory, using FunctionType as the search key and int32_t(int32_t, int32_t) as the function protocol.
+using TestFunctionFactory = FunctionFactory<FunctionType, int32_t, int32_t, int32_t>;
+
+// Defines a product.
+int32_t Plus(int32_t lhs, int32_t rhs) {
+  return lhs + rhs;
+}
+
+// Defines another product, which may be in another .cpp file.
+int32_t Minus(int32_t lhs, int32_t rhs) {
+  return lhs - rhs;
+}
+
+// Registers the products. How products are registered depends on how the keys, interfaces, factories, and products are distributed and loaded; make sure that registration actually happens.
+// Here, a function-local static variable (initialized only once, singleton style) ensures registration.
+bool RegisterFunction() {
+  RegisterFactoryFunction<TestFunctionFactory>(FunctionType::kPlus, Plus);
+  RegisterFactoryFunction<TestFunctionFactory>(FunctionType::kMinus, Minus);
+  return true;
+}
+void AutoFunctionLoader() {
+  static auto testFunctionFactory = RegisterFunction();
+}
+
+TEST(TestFactory, TestAll) {
+  // Loads the products.
+  AutoFunctionLoader();
+
+  // Obtains a product function. The returned value should be checked for null; this is omitted in the example.
+  auto func = CreateProductFunction<TestFunctionFactory>(FunctionType::kPlus);
+  ASSERT_EQ(func(1, 2), 3);
+  func = CreateProductFunction<TestFunctionFactory>(FunctionType::kMinus);
+  ASSERT_EQ(func(1, 2), -1);
+}
+```
+
+# Utility
+
+## `ToRef`
+
+In new and refactored Ark Compiler code, whenever parameters are passed as pointers, all pointers are expected to be validated first and then passed to the called function as references. In most cases, the called function should bear neither the risk of null pointer parameters nor the overhead of checking them.
+
+The common form is as follows (in the example, `DoIt` and `Run` stand for third-party interfaces that cannot be changed):
+
+```c++
+A *DoIt(B &b);
+void Run(B *b) {
+  CHECK_NULL_FATAL(b);
+  // ...
+  A *a = DoIt(*b);
+  CHECK_NULL_FATAL(a);
+  a->Do();
+}
+```
+
+If pointers are converted to references as soon as they are obtained:
+
+```c++
+A *DoIt(B &b);
+void Run(B *b) {
+  B &bRef = utils::ToRef(b);
+  // ...
+  A &a = utils::ToRef(DoIt(bRef));
+  a.Do();
+}
+```
+
+If `b` is used only once:
+
+```c++
+A *DoIt(B &b);
+void Run(B *b) {
+  // ...
+  A &a = utils::ToRef(DoIt(utils::ToRef(b)));
+  a.Do();
+}
+```
+
+## `bit_field_v` & `lbit_field_v`
+
+Using bits to mark a combination of on/off states is a memory-saving design that can be encoded efficiently. Typically, the enumeration or constant definitions are written as follows:
+
+```c++
+enum BBAttr : uint32 {
+  kBBAttrIsEntry = 0x02,
+  kBBAttrIsExit = 0x04,
+  kBBAttrWontExit = 0x08,
+  kBBAttrIsTry = 0x10,
+  kBBAttrIsTryEnd = 0x20,
+  kBBAttrIsJSCatch = 0x40,
+  kBBAttrIsJSFinally = 0x80,
+  kBBAttrIsCatch = 0x0100,
+  kBBAttrIsJavaFinally = 0x0200,
+  kBBAttrArtificial = 0x0400,
+  kBBAttrIsInLoop = 0x0800,
+  kBBAttrIsInLoopForEA = 0x1000
+};
+```
+
+This design clearly uses bits to record attribute information, but the hexadecimal values hide which bit each attribute occupies, which makes them hard to read and maintain.
+
+Therefore, a clearer design is required.
+
+```c++
+enum BBAttr : uint32 {
+  kBBAttrIsEntry = utils::bit_field_v<1>,
+  kBBAttrIsExit = utils::bit_field_v<2>,
+  kBBAttrWontExit = utils::bit_field_v<3>,
+  kBBAttrIsTry = utils::bit_field_v<4>,
+  kBBAttrIsTryEnd = utils::bit_field_v<5>,
+  kBBAttrIsJSCatch = utils::bit_field_v<6>,
+  kBBAttrIsJSFinally = utils::bit_field_v<7>,
+  kBBAttrIsCatch = utils::bit_field_v<8>,
+  kBBAttrIsJavaFinally = utils::bit_field_v<9>,
+  kBBAttrArtificial = utils::bit_field_v<10>,
+  kBBAttrIsInLoop = utils::bit_field_v<11>,
+  kBBAttrIsInLoopForEA = utils::bit_field_v<12>
+};
+```
+
+`bit_field_v` (`uint32`) and `lbit_field_v` (`uint64`) are supported. `sbit_field_v` (`uint16`) and `bbit_field_v` (`uint8`) will be added in the future as required.
-- 
Gitee