clang9 Adaptation: Phase 1 Summary


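1. Overview

As of November 25, 2021, the sdk/gtest/dsopt modules compile with clang9.

Using the script referenced below, I downloaded every change related to [TR-16607] clang9 cross-compilation toolchain build and verification - Enflame Company JIRA, including both merged changes and changes that are still open:

How to batch-export detailed patches from gerrit - 周荣华_Ronghua - enflame wiki

One caveat: a gerrit query command cannot contain parentheses. Multiple conditions are combined with AND by default, so to get OR semantics the alternative expressions must be joined explicitly with OR.

A quick count shows 3924 lines added and 4164 lines deleted: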

PS D:\code> grep "^+[^+]" .\diffrecord.txt |wc
   3924   24785  152346
PS D:\code> grep "^-[^-]" .\diffrecord.txt |wc
   4164   23159  147430

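Early in the work -Werror was enabled, so a number of the fixes address fairly minor warnings. Because there were far too many warnings, -Werror was later turned off temporarily, and only a specific set of -Werror=... options was kept.

In addition, the code under tops contains a great many implicit conversions from larger to smaller integer types, so -Wno-c++11-narrowing was added later to temporarily suppress the related diagnostics.

2. Finding and fixing problems

Running a full build after every single fix is usually very slow. The methods below extract an individual compile or link command so that a fix can be verified in isolation.

2.1. Getting compile commands from cmake

cmake can emit a compilation database: a "compile_commands.json" file generated in the cmake_build directory (the directory where the cmake command was run, which may be named differently), provided CMAKE_EXPORT_COMPILE_COMMANDS is enabled. It records the working directory and the full command used to turn every .c/.cc/.cpp file into a .o. For example, the compile command for hlir_utils_test.cc can be obtained as follows: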
grep hlir_utils_test.cc compile_commands.json
  "command": "/opt/efb/clang9/bin/clang++  -DLLVM_DISABLE_ABI_BREAKING_CHECKS_ENFORCING -D_GLIBCXX_USE_CXX11_ABI=0 -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/sdksrc/include/_virtual_includes/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/sdksrc/include/_virtual_includes/include/dtu -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/lib/umd/include/_virtual_includes/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/ef_log/include/_virtual_includes/include -I/home/ronghua.zhou/clang1_build/tops/sdk -I/home/ronghua.zhou/clang1_build/tops/sdk/lib -I/home/ronghua.zhou/clang1_build/tops/sdk/lib/cpu_ops -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/llvm-project/llvm/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/llvm-project/mlir/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/org_tensorflow -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/eigen_archive -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/com_google_absl -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/com_google_protobuf/src -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/external/dtu_sdk/bazel-bin -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/external/llvm-project/llvm/utils/unittest/googlemock/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/external/com_googlesource_code_re2 -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/lib 
-I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/org_tensorflow -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/llvm-project/llvm/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/llvm-project/mlir/include -isystem /home/ronghua.zhou/clang1_build/tops/3rdparty/googletest/include -isystem /home/ronghua.zhou/clang1_build/tops/3rdparty/googletest  -O3 -g0 -DNDEBUG -fPIE   -m64 -march=x86-64 -mtune=generic -Werror=array-bounds -Werror=empty-body -Werror=format-extra-args -Werror=incompatible-pointer-types -Werror=array-bounds-pointer-arithmetic -Werror=c++-compat -Werror=shift-count-overflow -Werror=sizeof-pointer-memaccess -Werror=for-loop-analysis -Werror=unused-label -Werror=delete-incomplete -Werror=empty-translation-unit -Werror=unused-local-typedef -Werror=gnu-case-range -Werror=mismatched-new-delete -Werror=infinite-recursion -Werror=unreachable-code -Werror=sometimes-uninitialized -Werror=c++14-binary-literal -Werror=implicit-fallthrough -Werror=constant-logical-operand -Werror=exceptions -fcxx-exceptions -Werror=extra-tokens -Werror=format -Werror=format-security -Werror=header-guard -Werror=literal-conversion -Werror=null-conversion -Werror=pointer-bool-conversion -Werror=shift-overflow -Werror=tautological-constant-out-of-range-compare -Werror=tautological-pointer-compare -Werror=varargs -Wdouble-promotion -Wno-error=extern-c-compat -Wall -Wno-c++11-narrowing -Wextra -fsanitize=address -fno-omit-frame-pointer -std=gnu++14 -std=gnu++14 -o sdk/tests/hlir/cc_tests/CMakeFiles/hlir_utils_test.dir
hlir_utils_test.cc.o -c /home/ronghua.zhou/clang1_build/tops/sdk/tests/hlir/cc_tests/hlir_utils_test.cc",
  "file": "/home/ronghua.zhou/clang1_build/tops/sdk/tests/hlir/cc_tests/hlir_utils_test.cc"
 

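2.2. Getting compile commands from bazel

https://github.com/vincent-picaud/Bazel_and_CompileCommands

The open-source project above suggests adding --experimental_action_listener=//tools/actions:generate_compile_commands_listener to the bazel command to capture compile commands. I tried this several times without success, and ended up using plain ps during the build instead. For example, the compile command for hlir_utils_test.cc can be captured with: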

ps -elf |grep hlir_utils_test.cc

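Alternatively, passing -s to the bazel command also prints the compile commands it subsequently executes.

2.3. Getting link commands

If the link target is known, the ps approach from 2.2 works here too. For example, the command that links libdtu_sdk.so can be captured with:

ps -elf |grep libdtu_sdk.so

If the link target is not known and there are not many link jobs, "ps -elf" can be used to list everything. The listing will show many processes of the form "ld @/tmp/response-xxx.txt". Copy all current /tmp/response* files to another directory and inspect them to work out which target each one links. These files contain the complete link command and arguments, so the link command can be reconstructed from them.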

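3. Classification of the actual changes

3.1. Compiler option changes

3.1.1. Options added

-fcxx-exceptions: dsopt uses exceptions, and exception handling was off by default for clang in this setup, so it has to be enabled.

-Wno-c++11-narrowing: the code under tops contains a great many implicit conversions from larger to smaller integer types. clang treats such narrowing conversions as errors, so the diagnostic is turned off temporarily and should be re-enabled once each component has cleaned up these conversions.

3.1.2. Options removed

-Werror: there are far too many warnings, and eliminating all of them is not realistic for now, so this option is removed temporarily.

3.1.3. Options changed

set (CMAKE_CXX_STANDARD 14): the previous default standard was 17, which conflicts with TensorFlow's default of 14 and with gcc's default of 14, so it was changed to C++14.

-fno-canonical-system-headers: this flag is supported only by gcc, not by clang, so it was changed from being enabled for all compilers to being enabled for gcc only.

3.1.4. Notes on bazel options

bazel compile options come in three kinds: copt, cxxopt, and conlyopt. copt applies to both C and C++, cxxopt applies to C++ only, and conlyopt applies to C only. Using the wrong one produces many warnings.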

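3.1.5. On rerun, cmake sometimes prepends the full path to the toolchain file found on the search path in CMAKE_TOOLCHAIN_FILE, so a direct STREQUAL comparison fails

The fix, made in CMakeLists.txt, is to use MATCHES instead of STREQUAL, so the comparison succeeds whether or not the full path was prepended.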

3.2. Template-related errors

3.2.1. use 'template' keyword to treat 'cast' as a dependent template name

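When a member template is called on an object whose type depends on a template parameter, clang requires the template keyword to state explicitly that the name is a template; otherwise it reports an error. See ISO C++03 14.2/4: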

When the name of a member template specialization appears after . or -> in a postfix-expression, or after nested-name-specifier in a qualified-id, and the postfix-expression or qualified-id explicitly depends on a template-parameter (14.6.2), the member template name must be prefixed by the keyword template. Otherwise the name is assumed to name a non-template.

For example, the SinkTransposeWithScalarBroadcast class in hlir calls the cast method for mlir::RankedTensorType and mlir::ShapedType:

 
diff --git a/sdk/lib/hlir/transforms/TopsInferenceHlirPass/HlirTransposeMoverExt.cc b/sdk/lib/hlir/transforms/TopsInferenceHlirPass/HlirTransposeMoverExt.cc
index c82fa217a21..9952ddbc470 100644
--- a/sdk/lib/hlir/transforms/TopsInferenceHlirPass/HlirTransposeMoverExt.cc
+++ b/sdk/lib/hlir/transforms/TopsInferenceHlirPass/HlirTransposeMoverExt.cc
@@ -237,11 +237,14 @@ struct SinkTransposeWithScalarBroadcast : public mlir::OpRewritePattern {
     }
     llvm::SmallVector4> new_operands(root->getNumOperands(), {});
     for (auto& it : broadcast_ops) {
-      auto transposedTy = getTransposedType(std::get<1>(it)
-                                                ->getResult(0)
-                                                .getType()
-                                                .cast(),
-                                            prePermutation);
+      // fix error:
+      // use 'template' keyword to treat 'cast' as a dependent template name
+      auto transposedTy =
+          getTransposedType(std::get<1>(it)
+                                ->getResult(0)
+                                .getType()
+                                .template cast(),
+                            prePermutation);
       auto new_attr = llvm::cast(std::get<1>(it))
                           .broadcast_dimensionsAttr();
       if (new_attr) {
@@ -251,7 +254,7 @@ struct SinkTransposeWithScalarBroadcast : public mlir::OpRewritePattern {
           new_data[i] = layout[data[i]];
         }
         new_attr = mlir::DenseIntElementsAttr::get(
-            new_attr.getType().cast(),
+            new_attr.getType().template cast(),
             llvm::makeArrayRef(new_data));
       }
       mlir::Operation* transpose_bs_op =
@@ -274,7 +277,7 @@ struct SinkTransposeWithScalarBroadcast : public mlir::OpRewritePattern {
     mlir::Operation* ret_transpose = rewriter.create(
         root->getLoc(), root->getResult(0).getType(), new_root->getResult(0),
         mlir::DenseIntElementsAttr::get(
-            permutation.getType().cast(), layout));
+            permutation.getType().template cast(), layout));
     root->replaceAllUsesWith(ret_transpose);
   }

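Note that the template keyword is not needed when the object's type does not depend on a template parameter, and the same class also contains calls that need no change. For example, the ss object in the same file has the fixed type mlir::Operation*, so calling the cast member template through ss does not require the template prefix: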

 
mlir::Operation* ss = op.getOperation();
auto new_operand_ty = getTransposedType(operand_ty, prePermutation);
auto new_source_ty = getTransposedType(source_ty, prePermutation);
auto new_result_ty = getTransposedType(
    ss->getResult(0).getType().cast(),
    prePermutation);
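
The rule can be illustrated with a minimal sketch. Value and Holder are hypothetical stand-ins invented for this example, not the real MLIR classes; they only mimic the shape of a cast member template being called in a dependent and in a non-dependent context.

```cpp
#include <cassert>

// Hypothetical stand-in for a type that offers a member template,
// similar in shape to a cast<U>() method.
struct Value {
  int raw;
  template <typename U>
  U cast() const {
    return static_cast<U>(raw);
  }
};

// Hypothetical wrapper: the type returned by get() depends on T.
template <typename T>
struct Holder {
  T item;
  T get() const { return item; }
};

// Dependent context: the type of h.get() depends on T, so the compiler
// cannot know that cast names a template. The `template` keyword is required.
template <typename T>
long castDependent(const Holder<T>& h) {
  return h.get().template cast<long>();
}

// Non-dependent context: v has the fixed type Value, so the plain call works
// and no `template` keyword is needed.
inline long castFixed(const Value& v) {
  return v.cast<long>();
}
```

Dropping the template keyword in castDependent makes the parser read `cast < long` as a less-than comparison, which is what the clang error message points at.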

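A related problem exists in factor_profiler_pass.cc of the factor module, where the fix goes the other way: the template keyword had been written before member functions that are not templates, and the patch removes it: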

diff --git a/sdk/lib/factor/codegen/passes/factor_profiler_pass.cc b/sdk/lib/factor/codegen/passes/factor_profiler_pass.cc
index 43419fd305a..ad23a709f20 100644
--- a/sdk/lib/factor/codegen/passes/factor_profiler_pass.cc
+++ b/sdk/lib/factor/codegen/passes/factor_profiler_pass.cc
@@ -55,11 +55,11 @@ mlir::Value getFirstOperand(mlir::Value op) {
  
 template 
 int getSrcCompressed(T op) {
-  return op.template dma_src_compressedAttr().getInt();
+  return op.dma_src_compressedAttr().getInt();
 }
 template 
 int getDstDecompressed(T op) {
-  return op.template dma_dst_decompressAttr().getInt();
+  return op.dma_dst_decompressAttr().getInt();
 }
  
 #define DISABLE_DMA_COMPRESS_ATTR_GETTER(OP) \
@@ -84,11 +84,11 @@ DISABLE_DMA_COMPRESS_ATTR_GETTER(mlir::factor::FactorDeSliceOp)
  
 template 
 int getReverseLr(T op) {
-  return op.template dma_reverse_lrAttr().getInt();
+  return op.dma_reverse_lrAttr().getInt();
 }
 template 
 int getReverseTb(T op) {
-  return op.template dma_reverse_tbAttr().getInt();
+  return op.dma_reverse_tbAttr().getInt();
 }
  
 #define DISABLE_REVERSE_ATTR_GETTER(OP) \
@@ -114,7 +114,7 @@ DISABLE_REVERSE_ATTR_GETTER(mlir::factor::FactorDeSliceOp)
  
 template 
 int getDmaType(T op) {
-  return op.template dma_typeAttr().getInt();
+  return op.dma_typeAttr().getInt();
 }
  
 #define DISABLE_DMA_TYPE_GETTER(OP) \
@@ -142,8 +142,8 @@ std::string formatDmaAttrs(int direction, int src_compressed,
 template 
 void extractDmaMetaInfoTo(T op, dtu_activity_data &data) {
   auto &args = data.args;
-  mlir::Value from = getFirstOperand(op.template from());
-  mlir::Value to = getFirstOperand(op.template to());
+  mlir::Value from = getFirstOperand(op.from());
+  mlir::Value to = getFirstOperand(op.to());
   auto engine_type = getDmaType(op);
   auto direction = op.dma_directionAttr().getInt();

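3.2.2. Ambiguity

When a call matches both function template A and function template B during instantiation, clang reports an ambiguity error while gcc does not.

Take EraseHelp below as an example. The original version declared two prototypes. When several template types have to be expressed through TypeSequence, the compiler cannot tell whether to peel off Last first or Inner first. If the two functions were implemented differently, it is surprising that gcc did not complain; it is unclear whether gcc picks an arbitrary matching prototype or always the first or the last one.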

constexpr static auto EraseHelp(TypeSequence, TypeSequence);

constexpr static auto EraseHelp(TypeSequence, TypeSequence);
diff --git a/sdk/lib/hlir/ir/type_utils.h b/sdk/lib/hlir/ir/type_utils.h
index 3cf2bc7994a..0e645fd1e7e 100644
--- a/sdk/lib/hlir/ir/type_utils.h
+++ b/sdk/lib/hlir/ir/type_utils.h
@@ -157,12 +157,9 @@ struct EraseSeqIf {
     using type = decltype(EraseHelp(LeftSeq(), TypeSequence()));
     return type();
   }
-  template 
-  constexpr static auto EraseHelp(TypeSequence, TypeSequence) {
-    using type = typename std::conditional::value,
-                                           TypeSequence,
-                                           TypeSequence>::type;
-    return type();
+  template 
+  constexpr static auto EraseHelp(TypeSequence, TypeSequence<>) {
+    return TypeSequence();
   }
   using type = decltype(EraseHelp(TypeSequence<>(), TypeSequence()));
 };
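
One way to avoid this kind of ambiguity is to make the candidate patterns mutually exclusive, so that every input matches exactly one of them. The sketch below is not the code from type_utils.h: EraseIfImpl and EraseIf are hypothetical names, and it uses class template partial specializations instead of overloaded functions. It only shows the idea of a terminal case on an empty TypeSequence<> plus a recursive case that always peels one type off the head.

```cpp
#include <cassert>
#include <type_traits>

template <typename... Ts>
struct TypeSequence {};

// Primary template (hypothetical helper): Done accumulates the types that
// are kept, Todo holds the types that still have to be examined.
template <template <typename> class Pred, typename Done, typename Todo>
struct EraseIfImpl;

// Terminal case: matches only when nothing is left to examine.
template <template <typename> class Pred, typename... Done>
struct EraseIfImpl<Pred, TypeSequence<Done...>, TypeSequence<>> {
  using type = TypeSequence<Done...>;
};

// Recursive case: matches only a non-empty Todo and always removes the head,
// so it can never compete with the terminal case.
template <template <typename> class Pred, typename... Done, typename Head,
          typename... Tail>
struct EraseIfImpl<Pred, TypeSequence<Done...>, TypeSequence<Head, Tail...>> {
  // Drop Head when the predicate holds, otherwise append it to Done.
  using next = typename std::conditional<Pred<Head>::value,
                                         TypeSequence<Done...>,
                                         TypeSequence<Done..., Head>>::type;
  using type = typename EraseIfImpl<Pred, next, TypeSequence<Tail...>>::type;
};

// Convenience alias: erase every type in Ts for which Pred<T>::value is true.
template <template <typename> class Pred, typename... Ts>
using EraseIf =
    typename EraseIfImpl<Pred, TypeSequence<>, TypeSequence<Ts...>>::type;
```

Because the two specializations accept disjoint inputs (empty versus non-empty Todo), the result does not depend on how a particular compiler ranks competing candidates.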

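3.3. Type mismatches

3.3.1. Implicit conversion from a larger to a smaller integer type

For example, func_entry in sdk/tests/llir/dataflow1_pingpang_buffer_test.cc is declared as int64_t, but the function it is passed to expects a uint32_t parameter, which triggers an implicit int64_t → uint32_t conversion: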

diff --git a/sdk/tests/llir/dataflow1_pingpang_buffer_test.cc b/sdk/tests/llir/dataflow1_pingpang_buffer_test.cc
index fa824f03d9a..70298b1fb59 100644
--- a/sdk/tests/llir/dataflow1_pingpang_buffer_test.cc
+++ b/sdk/tests/llir/dataflow1_pingpang_buffer_test.cc
@@ -522,7 +522,7 @@ TEST(Pavo2xCDMAPattern1Test, Pavo2xCDMAPattern1WithPingpangTest) {
                              {{0}, {1}, {2}, {3}, {4}, {5}}, 1, 1, 1, -1, -1,
                              output_queues_l1);
  
-    int64_t func_entry = 0;
+    uint32_t func_entry = 0;
     // trigger sip
     for (uint64_t idx = 0; idx < SIP_COUNT; ++idx) {
       std::string sip_name = std::string("sip") + std::to_string(idx);

Other files with similar changes:

sdk/tests/llir/dataflow1_test.cc

sdk/tests/llir/dataflow2_test.cc

sdk/tests/llir/dataflow3_test.cc

sdk/tests/llir/dataflow5_test.cc

sdk/tests/llir/dataflow5_test_1xcdma.cc

sdk/tests/llir/dataflow7_test.cc

sdk/tests/llir/llir2assembler_leo_test.cc

sdk/tests/llir/utils/llir_test_util.cc

sdk/tests/llir/utils/llir_test_util.h
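
Where changing the variable's declared type is not possible, an explicit, range-checked conversion is another option. The following is a minimal sketch under stated assumptions: LaunchConfig, toU32 and makeConfig are hypothetical names invented for this example and do not exist in the sdk code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical aggregate with a 32-bit field, standing in for any API that
// expects uint32_t.
struct LaunchConfig {
  std::uint32_t func_entry;
};

// Range-checked conversion: throws instead of silently truncating.
inline std::uint32_t toU32(std::int64_t v) {
  if (v < 0 || v > std::numeric_limits<std::uint32_t>::max()) {
    throw std::out_of_range("value does not fit in uint32_t");
  }
  return static_cast<std::uint32_t>(v);
}

inline LaunchConfig makeConfig(std::int64_t entry) {
  // Writing LaunchConfig{entry} here would be a narrowing conversion inside
  // braces, which clang rejects unless -Wno-c++11-narrowing is passed.
  return LaunchConfig{toU32(entry)};
}
```

Making the conversion explicit keeps the code compiling once -Wno-c++11-narrowing is removed again, and the range check surfaces values that would otherwise be truncated.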

3.3.2. Implicit conversion from signed to unsigned

Converting -1 to an unsigned integer type:

diff --git a/sdk/lib/hlir/ir/type_utils.h b/sdk/lib/hlir/ir/type_utils.h
index 0e645fd1e7e..f84360269f3 100644
--- a/sdk/lib/hlir/ir/type_utils.h
+++ b/sdk/lib/hlir/ir/type_utils.h
@@ -122,10 +122,9 @@ struct FindIf {
  
 template