Creating the Docker Container

# 1. Map the host USB devices into the container: /dev/bus/usb:/dev/bus/usb
docker run -it --privileged=true -v /dev/bus/usb:/dev/bus/usb -v /home/yanghuan/workspace/aiot_benchmark:/root/workspace --net=host --name=aiot_benchmark --hostname=aiot_benchmark ubuntu:20.04 /bin/bash

# 2. Install the Android udev rules; check cat /etc/group afterwards to confirm the plugdev group exists
apt-get install android-sdk-platform-tools-common


# 3. Add the current user to the plugdev group
usermod -a -G plugdev root
# 4. Exit the container
exit
# 5. Re-enter the container from the host
docker start aiot_benchmark
docker attach aiot_benchmark
# 6. Confirm the group membership, then check the USB connection
id
lsusb
# if lsusb is missing: apt install usbutils


Installing the Android SDK

Installing sdkmanager

Following the steps described in the sdkmanager documentation:

  1. Download the latest "command line tools only" package from the Android Studio downloads page and unzip it.

  2. Move the unzipped cmdline-tools directory into a new directory of your choice, e.g. android_sdk. This new directory is your Android SDK directory.

  3. Inside the unzipped cmdline-tools directory, create a subdirectory named latest.

  4. Move the contents of the original cmdline-tools directory (including the lib directory, the bin directory, NOTICE.txt, and source.properties) into the newly created latest directory. You can now use the command line tools from this location; see the sketch below.
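
    Steps 1-4 boil down to something like the following (the zip file name is illustrative; adjust it to the release you actually downloaded):

    cd /root
    mkdir -p android_sdk/cmdline-tools/latest
    unzip commandlinetools-linux-*.zip   # unpacks a cmdline-tools/ directory
    mv cmdline-tools/* android_sdk/cmdline-tools/latest/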

  5. Add the tools to PATH:

    echo 'export PATH=$PATH:/root/android_sdk/cmdline-tools/latest/bin' >> /root/.bashrc
    source ~/.bashrc
  6. Install OpenJDK 11:

    apt-get install openjdk-11-jdk -y
  7. Install platform-tools and other packages:

    # set a proxy if needed
    export https_proxy="http://172.16.101.180:7890"
    sdkmanager --install "platform-tools" "platforms;android-29" "ndk;25.0.8775105"


  1. Add adb to PATH:

    echo 'export PATH=$PATH:/root/android_sdk/platform-tools' >> /root/.bashrc
    source ~/.bashrc
  2. Connect the device (make sure no other adb server is running); a quick check is shown below.
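
    Only one adb server can own the USB device at a time, so stop any host-side server first (standard adb CLI; the output should list your device serial):

    # on the host: stop any running adb server
    adb kill-server
    # inside the container: start the server and list devices
    adb devices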

CoDL: Technical Introduction and Code Reproduction

1 CoDL (MobiSys 2022) Technical Introduction

CoDL is a parallel DL inference framework for mobile devices, built on Xiaomi's MACE framework. It uses the CPU and GPU of a phone SoC in parallel to accelerate model inference, whereas mainstream inference frameworks still use only one processor at a time.

1.1 Challenges and Solutions

  1. Reducing the data-sharing overhead between heterogeneous processors
  2. Partitioning OPs appropriately across heterogeneous processors

To make full use of the heterogeneous processors for every OP of a model, the paper proposes two techniques to address these challenges.

  1. Hybrid type-friendly data sharing, which lets each processor run inference with its most efficient data type; the paper's experiments show that forcing heterogeneous processors to share a single data type makes inference inefficient. To further reduce sharing overhead, it also applies hybrid-dimension partitioning and operator chains.
  2. Non-linearity and concurrency-aware latency prediction: a lightweight yet accurate latency predictor guides OP partitioning so that the splits are well chosen.

A more detailed look at CoDL's implementation is left for a later post.

2 Code Reproduction

CoDL's code is open-sourced on GitHub at https://github.com/csu-eis/CoDL/

2.1 CoDL Execution Flow

(Figure: CoDL execution flow, taken from the paper)

2.1.1 Offline Stage

In the offline stage, CoDL builds a lightweight latency predictor that guides OP partitioning in the online stage, taking the overhead of data sharing into account.

2.1.2 Online Stage

The online stage has two parts: the operator partitioner and the operator co-executor.

  1. The operator partitioner finds an optimized OP partitioning plan for the input model; OP weights are also pre-allocated to the CPU and GPU so that no conversion is needed at inference time. A rough sketch of the idea follows after this list.
  2. The operator co-executor synchronizes OP execution according to the partitioning plan and uses a different data type on each processor.
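
The latency-guided partitioning can be illustrated in a few lines of Python (a conceptual sketch, not CoDL's actual code; predict_cpu and predict_gpu stand in for the learned latency predictors):

def best_partition(op, predict_cpu, predict_gpu, steps=100):
    """Pick the CPU share of an OP that minimizes co-execution latency.

    predict_cpu(op, r) / predict_gpu(op, r) return the predicted latency
    when that processor handles a fraction r of the OP, including the
    data-sharing overhead.
    """
    best_r, best_lat = 0.0, float("inf")
    for i in range(steps + 1):
        r = i / steps
        # the processors run concurrently, so co-execution latency is the max
        lat = max(predict_cpu(op, r), predict_gpu(op, 1.0 - r))
        if lat < best_lat:
            best_r, best_lat = r, lat
    return best_r, best_lat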

2.2 Accelerating a Model with CoDL

Enough talk; let's get to it.

2.2.1 Building the Docker Environment and the Executables

  1. Build the image:

    git clone https://github.com/csu-eis/CoDL.git
    cd CoDL
    # build the image from the Dockerfile
    docker build -t codl/codl -f ./Dockerfile .
  2. Create the environment:

    # set {workspace} to the absolute path of the workspace on your machine
    sudo docker run -it --name codl-u16 --privileged -v /dev/bus/usb:/dev/bus/usb -v {workspace}:/root/workspace --hostname codl codl/codl:latest /bin/bash

    # e.g.: sudo docker run -it --name codl-u16 --privileged -v /dev/bus/usb:/dev/bus/usb -v /home/yanghuan/Workspace/transsion:/root/workspace --hostname codl codl/codl:latest /bin/bash

    Note that a machine can run only one adb server at a time. Use adb kill-server to stop any other running adb server, so that only the one inside the Docker container is active.

    # check the phone serial number
    adb devices
    # output:
    # root@codl:~/codl-mobile# adb devices
    # List of devices attached
    # 3a9c4f5 device
  3. Build the executables and push them to the phone:

    # make sure you are inside the Docker container
    cd ~/codl-mobile/
    # build the executables
    bash tools/codl/build_executable_files.sh

    (screenshot: successful build output)

    Push them to the phone:

    # make sure pwd is still ~/codl-mobile/
    bash tools/codl/push_executable_files.sh 3a9c4f5
    # verify the push succeeded
    adb -s 3a9c4f5 shell "ls /data/local/tmp/codl"

    (screenshot: successful push)

2.2.2 Building the Latency Predictor

  1. Collect latency data:

    # make sure you are still inside the Docker container
    cd /root/codl-eval-tools/codl-lat-collect-and-eval/
    # note: the script only supports [sdm855, sdm865, sdm888, kirin990]; to test another SoC you need to adapt collect_all_op_latency.sh
    bash tools/collect_all_op_latency.sh sdm865 3a9c4f5

    The collected CSV files:
    |-- t_conv2d_cpu_direct.csv
    |-- t_conv2d_cpu_gemm.csv
    |-- t_conv2d_cpu_winograd.csv
    |-- t_conv2d_cpu_winograd_combined.csv
    |-- t_conv2d_cpu_winograd_gemm.csv
    |-- t_conv2d_gpu_direct.csv
    |-- t_data_sharing.csv
    |-- t_fc_cpu_gemv.csv
    |-- t_fc_gpu_direct.csv
    |-- t_mulayer_conv2d_cpu.csv
    |-- t_mulayer_conv2d_gpu.csv
    |-- t_mulayer_fc_cpu.csv
    |-- t_mulayer_fc_gpu.csv
    |-- t_mulayer_pooling_cpu.csv
    |-- t_mulayer_pooling_gpu.csv
    |-- t_pooling_cpu_direct_max.csv
    `-- t_pooling_gpu_direct_max.csv
  2. Train the latency predictor.

nn-Meter: Prediction Flow Analysis

nn-Meter Prediction Flow

  1. Run the prediction command:

    nn-meter predict --predictor RedmiK30Pro_cpu_tflite27 --predictor-version 1.0 --onnx /root/workspace/nn-Meter/workspace/models/mobilenetv3small_0.onnx

    This command is received by the function nn_meter/utils/nn_meter_cli/predictor.py#apply_latency_predictor_cli, which parses the arguments, mainly --predictor, --predictor-version, and --onnx/--tensorflow.
    def apply_latency_predictor_cli(args):
        # specify model type
        if args.tensorflow:
            input_model, model_type, model_suffix = args.tensorflow, "pb", ".pb"
        elif args.onnx:
            input_model, model_type, model_suffix = args.onnx, "onnx", ".onnx"
        elif args.nn_meter_ir:
            input_model, model_type, model_suffix = args.nn_meter_ir, "nnmeter-ir", ".json"
        elif args.torchvision: # torch model name from torchvision model zoo
            input_model_list, model_type = args.torchvision, "torch"
        ...

        # load predictor
        predictor = load_latency_predictor(args.predictor, args.predictor_version)

        ...
        # predict latency
        result = {}
        for model in input_model_list:
            latency = predictor.predict(model, model_type) # in unit of ms
            result[os.path.basename(model)] = latency
            logging.result(f'[RESULT] predict latency for {os.path.basename(model)}: {latency} ms')

        return result

  2. Load the predictor via load_latency_predictor:
    After step 1 parses --predictor and --predictor-version, the corresponding predictor files are loaded. nn_meter/predictor/nn_meter_predictor.py#load_latency_predictor looks up the predictor and fusion rules from the files cached under the user's data directory; these are either the official defaults or the user's own customized ones. It then returns an nnMeterPredictor object.
    def load_latency_predictor(predictor_name: str, predictor_version: float = None):
        user_data_folder = get_user_data_folder()
        pred_info = load_predictor_config(predictor_name, predictor_version)
        if "download" in pred_info:
            kernel_predictors, fusionrule = loading_to_local(pred_info, os.path.join(user_data_folder, 'predictor'))
        else:
            kernel_predictors, fusionrule = loading_customized_predictor(pred_info)

        return nnMeterPredictor(kernel_predictors, fusionrule)
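
    The same flow can also be driven from the Python API; a minimal sketch (the predictor name is one of the built-ins listed later, and the model path is illustrative):

    from nn_meter import load_latency_predictor

    predictor = load_latency_predictor("cortexA76cpu_tflite21", 1.0)
    latency_ms = predictor.predict("/path/to/model.onnx", "onnx")
    print(f"predicted latency: {latency_ms} ms")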

  3. Predict model latency with predictor.predict:
    With the predictor from step 2, nn_meter/predictor/nn_meter_predictor.py#nnMeterPredictor.predict is called to predict the model's latency.
    1. self.kd.load_graph(graph) first converts the model into a graph and parses it into kernels; this is implemented in nn_meter/kernel_detector/kernel_detector.py#KernelDetector.load_graph. Fusion rules are considered here: if a combination of OPs matches a fusion rule, those OPs are grouped together for prediction.

    2. nn_predict(self.kernel_predictors, self.kd.get_kernels()) feeds the kernels detected in step 3.1 into the kernel predictors one by one (nn_meter/predictor/prediction/predict_by_kernel.py#nn_predict). This mainly extracts each kernel's features, i.e. the parameters of OPs such as conv2d (input/output dimensions, kernel size, and so on), then, layer by layer, picks a predictor by op/kernel name and feeds it the features to predict the latency.

      def predict(
          self, model, model_type, input_shape=(1, 3, 224, 224), apply_nni=False
      ):
          logging.info("Start latency prediction ...")
          if isinstance(model, str):
              graph = model_file_to_graph(model, model_type, input_shape, apply_nni=apply_nni)
          else:
              graph = model_to_graph(model, model_type, input_shape=input_shape, apply_nni=apply_nni)

          # logging.info(graph)
          self.kd.load_graph(graph)

          py = nn_predict(self.kernel_predictors, self.kd.get_kernels()) # in unit of ms
          logging.info(f"Predict latency: {py} ms")
          return py
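
    Conceptually, nn_predict then reduces to extracting a feature vector per kernel and summing the per-kernel predictions; a simplified sketch (the feature names are illustrative, not nn-Meter's exact ones):

      def nn_predict_sketch(kernel_predictors, kernels):
          total_ms = 0.0
          for k in kernels:
              # features such as input resolution, channels, kernel size, stride ...
              features = [k["input_h"], k["input_w"], k["cin"], k["cout"],
                          k.get("ks", 0), k.get("stride", 1)]
              # pick the predictor by kernel name and accumulate its prediction
              total_ms += kernel_predictors[k["op"]].predict([features])[0]
          return total_ms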

nn-Meter: Building a CNN Inference Predictor

1 nn-Meter Build Flow

2 Building a tflite Predictor

2.1 Environment Setup

  1. Follow the README to install nn-Meter:

    git clone https://github.com/microsoft/nn-Meter
    cd nn-Meter
    conda create -n nnmeter_tflite python=3.8
    # as of nn-meter#8006ed6eaa62816c70737c9ff26a7445589bd36e, TensorFlow is supported up to 2.11
    pip install -r docs/requirements/requirements_builder.txt
    # install nn-Meter
    pip install .
  2. Push the tflite benchmark tool to the phone:
    Download the benchmark files from nn-Meter; I chose tflite_benchmark_tools_v2.7.zip.

    # create a few temporary folders on the device for nn-Meter
    adb shell "mkdir -p /mnt/sdcard/tflite_model"
    adb shell "mkdir -p /mnt/sdcard/tflite_kernel"
    # push the benchmark binary to the phone
    adb push benchmark_model_cpu_gpu_v2.7 /data/local/tmp
    # make the benchmark executable
    adb shell chmod +x /data/local/tmp/benchmark_model_cpu_gpu_v2.7
  3. Create a workspace and prepare the backend:

    nn-meter create --tflite-workspace /root/workspace/nn-Meter/workspace/RedmiK30Pro-sd865-tflite2.7cpu

    This generates configs/*.yaml files; only backend_config.yaml really needs editing, the other two can largely stay unchanged.

    • backend_config.yaml sets the directories on the remote phone, the benchmark binary location, and the device address (serial number or IP); these values match step 2 of section 2.1.

      REMOTE_MODEL_DIR: /mnt/sdcard/tflite_bench
      BENCHMARK_MODEL_PATH: /data/local/tmp/benchmark_model_cpu_gpu_v2.7
      DEVICE_SERIAL: '3a9c4f5'
      KERNEL_PATH: /mnt/sdcard/tflite_kernel
    • predictorbuild_config.yaml sets the predictor-related parameters.
    • ruletest_config.yaml sets the parameters for the OP fusion rule tests.

2.2 Testing Fusion Rules

With the environment and parameters configured, we can run Python scripts that automate the OP fusion tests and predictor builds. nn-Meter provides both end-to-end and step-by-step test code.

# reference: https://github.com/microsoft/nn-Meter/blob/main/docs/builder/test_fusion_rules.md#end-to-end-demo
workspace = "/root/workspace/nn-Meter/workspace/RedmiK30Pro-sd865-tflite2.7cpu"

from nn_meter.builder import profile_models, builder_config
builder_config.init(workspace) # initialize builder config with workspace
from nn_meter.builder.backends import connect_backend
from nn_meter.builder.backend_meta.fusion_rule_tester import generate_testcases, detect_fusion_rule

# generate testcases
origin_testcases = generate_testcases()

# connect to backend
backend = connect_backend(backend_name='tflite_cpu')

# run testcases and collect profiling results
profiled_results = profile_models(backend, origin_testcases, mode='ruletest')

# determine fusion rules from profiling results
detected_results = detect_fusion_rule(profiled_results)

After it finishes, the test results appear under {workspace}/fusion_rule_test/.
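
The detected rules are written to fusion_rule_test/results/detected_fusion_rule.json (the file copied in section 2.4). They can be inspected directly; a small sketch (the exact JSON layout may vary between versions):

import json

with open(workspace + "/fusion_rule_test/results/detected_fusion_rule.json") as f:
    rules = json.load(f)
# print each detected rule and what was decided for the op combination
for name, rule in rules.items():
    print(name, rule)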

2.3 Building the Kernel Predictors

# reference: https://github.com/microsoft/nn-Meter/blob/main/docs/builder/build_kernel_latency_predictor.md#end-to-end-demo
workspace = "/root/workspace/nn-Meter/workspace/RedmiK30Pro-sd865-tflite2.7cpu"

from nn_meter.builder import builder_config
builder_config.init(workspace)

# build latency predictor for kernel
from nn_meter.builder import build_latency_predictor
build_latency_predictor(backend="tflite_cpu")

2.4 Building the Model Predictor

Following the docs, put the OP fusion rules from 2.2 and the kernel predictors from 2.3 into one folder, add a yaml config file, and you can register a model latency predictor.

  1. Copy and rename the files:

    # 1. copy the finegrained2.pkl files to the target directory, then rename them
    cp workspace/RedmiK30Pro-sd865-tflite2.7cpu/predictor_build/results/predictors/*finegrained2.pkl /root/workspace/nn-Meter/workspace/predictor/redmik30p_sd865_tflite2.7cpu

    #!/bin/bash
    # iterate over all files in the current directory
    for file in *
    do
        # does the file name end with "_finegrained2.pkl"?
        if [[ $file == *_finegrained2.pkl ]]
        then
            # replace "_finegrained2.pkl" with ".pkl"
            new_name=${file/_finegrained2.pkl/.pkl}
            # rename the file
            echo "$new_name"
            mv "$file" "$new_name"
        fi
    done

    # 2. copy the fusion rules
    cp workspace/RedmiK30Pro-sd865-tflite2.7cpu/fusion_rule_test/results/detected_fusion_rule.json /root/workspace/nn-Meter/workspace/predictor/redmik30p_sd865_tflite2.7cpu/fusion_rules.json

    The resulting directory tree:

    redmik30p_sd865_tflite2.7cpu
    |-- add.pkl
    |-- addrelu.pkl
    |-- avgpool.pkl
    |-- bn.pkl
    |-- bnrelu.pkl
    |-- channelshuffle.pkl
    |-- concat.pkl
    |-- conv-bn-relu.pkl
    |-- dwconv-bn-relu.pkl
    |-- fc.pkl
    |-- fusion_rules.json
    |-- global-avgpool.pkl
    |-- hswish.pkl
    |-- maxpool.pkl
    |-- relu.pkl
    |-- se.pkl
    |-- split.pkl
  2. Write a yaml file that indexes these files:

    name: redmik30p_sd865_tflite2.7cpu
    version: 1.0
    category: cpu
    package_location: /root/workspace/nn-Meter/workspace/predictor/redmik30p_sd865_tflite2.7cpu
    kernel_predictors:
    - conv-bn-relu
    - dwconv-bn-relu
    - fc
    - global-avgpool
    - hswish
    - relu
    - se
    - split
    - add
    - addrelu
    - maxpool
    - avgpool
    - bn
    - bnrelu
    - channelshuffle
    - concat
  3. Register the predictor:

    # register
    nn-meter register --predictor /root/workspace/nn-Meter/workspace/predictor/redmik30p_sd865_tflite2.7cpu.yaml
    # list the registered predictors
    nn-meter --list-predictors

    A successful registration prints:

    (nn-Meter) Successfully register predictor: redmik30p_sd865_tflite2.7cpu
    (nn-Meter) Supported latency predictors:
    (nn-Meter) [Predictor] cortexA76cpu_tflite21: version=1.0
    (nn-Meter) [Predictor] adreno640gpu_tflite21: version=1.0
    (nn-Meter) [Predictor] adreno630gpu_tflite21: version=1.0
    (nn-Meter) [Predictor] myriadvpu_openvino2019r2: version=1.0
    (nn-Meter) [Predictor] redmik30p_sd865_tflite2.7cpu: version=1.0

3 Testing

3.1 Predicted vs. Measured Latency

  1. Export a resnet50 model with the Tensorflow 2 API:

    import tensorflow as tf
    from tensorflow.keras.applications.resnet50 import ResNet50
    from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

    # load the model
    model = ResNet50(weights='imagenet')

    full_model = tf.function(lambda x: model(x))
    shape = [1, 224, 224, 3]  # model.inputs[0].shape
    full_model = full_model.get_concrete_function(
        tf.TensorSpec(shape, model.inputs[0].dtype))

    # Get frozen ConcreteFunction
    frozen_func = convert_variables_to_constants_v2(full_model)
    frozen_func.graph.as_graph_def()

    layers = [op.name for op in frozen_func.graph.get_operations()]
    print("-" * 50)
    print("Frozen model layers: ")
    for layer in layers:
        print(layer)

    print("-" * 50)
    print("Frozen model inputs: ")
    print(frozen_func.inputs)
    print("Frozen model outputs: ")
    print(frozen_func.outputs)

    # Save frozen graph from frozen ConcreteFunction to hard drive
    tf.io.write_graph(graph_or_graph_def=frozen_func.graph,
                      logdir="./frozen_models",
                      name="frozen_graph.pb",
                      as_text=False)

    # convert the model to TensorFlow Lite format and save it as a .tflite file
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()
    with open('resnet50.tflite', 'wb') as f:
        f.write(tflite_model)

  2. Predict with nn-Meter.
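
    Presumably a command along these lines, pointing the predictor registered above at the frozen graph from step 1 (the --tensorflow path is an assumption):

    nn-meter predict --predictor redmik30p_sd865_tflite2.7cpu --predictor-version 1.0 --tensorflow ./frozen_models/frozen_graph.pb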

  3. Run the on-device benchmark:

    /data/local/tmp/benchmark_model_cpu_gpu_v2.7 --num_threads=4 \
    --graph=/mnt/sdcard/tflite_models/resnet50.tflite \
    --warmup_runs=30 \
    --num_runs=50

    STARTING!
    Log parameter values verbosely: [0]
    Min num runs: [50]
    Num threads: [4]
    Min warmup runs: [30]
    Graph: [/mnt/sdcard/tflite_models/resnet50.tflite]
    #threads used for CPU inference: [4]
    Loaded model /mnt/sdcard/tflite_models/resnet50.tflite
    INFO: Initialized TensorFlow Lite runtime.
    INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
    INFO: Replacing 75 node(s) with delegate (TfLiteXNNPackDelegate) node, yielding 1 partitions.
    The input model file size (MB): 102.161
    Initialized session in 98.471ms.
    Running benchmark for at least 30 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
    count=30 first=104448 curr=87126 min=86737 max=104448 avg=88622.5 std=3079

    Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
    count=50 first=87163 curr=89038 min=86939 max=93704 avg=88199.2 std=1353

    Inference timings in us: Init: 98471, First inference: 104448, Warmup (avg): 88622.5, Inference (avg): 88199.2
    Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
    Memory footprint delta from the start of the tool (MB): init=134.562 overall=208.699

3.2 Predicting a Model with Kernels the Predictor Was Not Trained On

Here I picked an SSD model; its operator types are:

Add
BatchNormalization
Cast
Concat
Constant
ConstantOfShape
Conv
Exp
Gather
MaxPool
Mul
NonMaxSuppression
ReduceMin
Relu
Reshape
Shape
Slice
Softmax
Squeeze
Sub
TopK
Transpose
Unsqueeze

Running the prediction with nn-meter predict --predictor redmik30p_sd865_tflite2.7cpu --predictor-version 1.0 --onnx /root/workspace/nn-Meter/workspace/models/ssd-12.onnx gives:

(nn-Meter) Start latency prediction ...
(nn-Meter) Empty shape information with Constant_339
(nn-Meter) Empty shape information with Shape_340
(nn-Meter) Empty shape information with Gather_341
(nn-Meter) Empty shape information with Constant_342
(nn-Meter) Empty shape information with Constant_343
(nn-Meter) Empty shape information with Unsqueeze_344
(nn-Meter) Empty shape information with Unsqueeze_345
(nn-Meter) Empty shape information with Unsqueeze_346
(nn-Meter) Empty shape information with Concat_347
(nn-Meter) Empty shape information with Reshape_348
(nn-Meter) Empty shape information with Constant_350
(nn-Meter) Empty shape information with Shape_351
(nn-Meter) Empty shape information with Gather_352
(nn-Meter) Empty shape information with Constant_353
(nn-Meter) Empty shape information with Constant_354
...
(nn-Meter) Empty shape information with Unsqueeze_scores
Traceback (most recent call last):
File "/root/anaconda3/envs/nnmeter_tflite/bin/nn-meter", line 8, in <module>
sys.exit(nn_meter_cli())
File "/root/anaconda3/envs/nnmeter_tflite/lib/python3.8/site-packages/nn_meter/utils/nn_meter_cli/interface.py", line 266, in nn_meter_cli
args.func(args)
File "/root/anaconda3/envs/nnmeter_tflite/lib/python3.8/site-packages/nn_meter/utils/nn_meter_cli/predictor.py", line 56, in apply_latency_predictor_cli
latency = predictor.predict(model, model_type) # in unit of ms
File "/root/anaconda3/envs/nnmeter_tflite/lib/python3.8/site-packages/nn_meter/predictor/nn_meter_predictor.py", line 111, in predict
self.kd.load_graph(graph)
File "/root/anaconda3/envs/nnmeter_tflite/lib/python3.8/site-packages/nn_meter/kernel_detector/kernel_detector.py", line 19, in load_graph
new_graph = convert_nodes(graph)
File "/root/anaconda3/envs/nnmeter_tflite/lib/python3.8/site-packages/nn_meter/kernel_detector/utils/ir_tools.py", line 14, in convert_nodes
type = node["attr"]["type"]
KeyError: 'type'

The "Empty shape information" warnings are emitted in nn_meter/ir_converter/onnx_converter/converter.py#OnnxConverter.fetch_attrs; they leave the returned attr variable empty, which ultimately causes the error.
nn-Meter needs each OP's input/output shapes to compute latency, and operators like Shape are not conventional compute OPs, hence the failure. The operator types that trigger the error for this model are summarized below:

Add
Cast
Concat
Constant
ConstantOfShape
Exp
Gather
Mul
NonMaxSuppression
ReduceMin
Reshape
Shape
Slice
Softmax
Squeeze
Sub
TopK
Transpose
Unsqueeze

The problem looks somewhat involved: some of these operators, such as Add and Concat, do have trained predictors in nn-Meter yet still fail.

3.3 Issues with nn-Meter

  1. For Tensorflow models, nn-Meter may run into problems during shape inference.
  2. nn-Meter currently supports only the operators commonly used in CNNs.
  3. nn-Meter only supports float and int32 as model data types.

4 References

  1. https://github.com/microsoft/nn-Meter
  2. https://blog.csdn.net/ouening/article/details/104335552

5 Reference Code

  1. Print the OPs of an onnx model:

    import onnx

    model_file = "/root/workspace/nn-Meter/workspace/models/mobilenetv3small_0.onnx"  # path to the ONNX model
    model = onnx.load(model_file)
    op_types = set()
    for node in model.graph.node:
        op_types.add(node.op_type)
    op_types = list(op_types)
    [print(op_type) for op_type in op_types]

Focal Loss in PyTorch

1 Derivation

1.1 Cross Entropy

  1. In information theory, a low-probability outcome $x_i$ of an event $X$ carries more information when it occurs. Let $x_i$ denote an outcome and $P(X=x_i)=p(x_i)$ its probability; the information content is then defined as
    $$
    I(x_i) = \log{\frac{1}{p(x_i)}}
    $$

  2. Entropy is the expectation of the information content over all outcomes, where $N$ is the number of outcomes:
    $$
    H(X) = -\sum_{i=1}^{N}{p(x_i)\log{p(x_i)}}
    $$

  3. Cross entropy gives us a way to express the difference between two probability distributions: the more the distributions of $X$ and $Y$ differ, the more the cross entropy of $X$ relative to $Y$ exceeds the entropy of $Y$:
    $$
    H_{Y}(X) = -\sum_{i=1}^{N}{p(y_i)\log{p(x_i)}}
    $$

  4. The multi-class cross entropy loss: for $N$ samples and $K$ classes, write $I(y_i=k)$ as $y_{i,k}$; this is normally the ground truth, either 1 or 0. A numeric check follows below.
    $$
    l(X)=-\frac{1}{N}\sum_{i=1}^{N}{\sum_{k=1}^{K}{y_{i,k}\log{x_{i,k}}}}
    $$
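
    A quick numeric check of this formula against PyTorch (softmax probabilities stand in for $x_{i,k}$, and the one-hot $y_{i,k}$ picks out the true class):

    import torch
    import torch.nn.functional as F

    logits = torch.randn(4, 3)            # N = 4 samples, K = 3 classes
    labels = torch.tensor([0, 2, 1, 2])   # ground-truth class indices

    # manual: -1/N * sum_i log x_{i, y_i}
    probs = F.softmax(logits, dim=1)
    manual = -probs[torch.arange(4), labels].log().mean()

    print(manual, F.cross_entropy(logits, labels))  # the two values match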

1.2 Balanced Cross Entropy

$$
l(X)=-\frac{1}{N}\sum_{i=1}^{N}{\sum_{k=1}^{K}{\alpha_{k}y_{i,k}\log{x_{i,k}}}}
$$
where $\alpha_{k}$ weights each class according to the sample distribution.

1.3 Focal Loss

If the classes in a dataset are imbalanced, the majority classes dominate the loss, and learning parameters for the minority classes becomes difficult.
$$
l(X)=-\frac{1}{N}\sum_{i=1}^{N}{\sum_{k=1}^{K}{\alpha_{k}(1-x_{i,k})^{\gamma}y_{i,k}\log{x_{i,k}}}}
$$

Both focal loss and balanced cross entropy try to address the training problems caused by class imbalance, but the latter weights the loss by the sample distribution, while the former starts from how hard each sample is to classify, focusing the loss on hard samples.

2 Implementation

import torch
import torch.nn as nn

# PyTorch implementation
class FocalLoss(nn.Module):
    def __init__(self, gamma=2, alpha=1, size_average=True):
        super(FocalLoss, self).__init__()
        self.gamma = gamma
        self.alpha = alpha
        self.size_average = size_average
        self.elipson = 0.000001

    def forward(self, outputs, labels):
        # per-sample cross-entropy loss
        ce_loss = torch.nn.functional.cross_entropy(outputs, labels, reduction='none')
        # invert the log to recover p_t
        pt = torch.exp(-ce_loss)
        # focal weighting, then mean over the batch
        focal_loss = (self.alpha * (1 - pt) ** self.gamma * ce_loss).mean()
        return focal_loss
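
A quick usage example (random logits; in practice outputs come from the network):

criterion = FocalLoss(gamma=2, alpha=1)
outputs = torch.randn(4, 3, requires_grad=True)  # N = 4 samples, K = 3 classes
labels = torch.tensor([0, 2, 1, 2])
loss = criterion(outputs, labels)
loss.backward()  # behaves like any other loss module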

3 References

  1. Loss functions: cross entropy in detail (损失函数:交叉熵详解)
  2. The principle of cross entropy (交叉熵的原理)
  3. Focal loss and its multi-class implementation (Focal loss及多分类任务实现)

Docker Desktop + VNC Server

Using the Prebuilt Image from Docker Hub

docker pull sheephuan/vnc4docker:ubuntu2004-xfce4-20220531

docker run --privileged -itd -p 22001:22 -p 15901:5901 --name ubuntu-xfce4 vnc-ubuntu2004:xcfe4 /usr/sbin/init

docker exec -it ubuntu-xfce4 /bin/bash

# vncserver default password: 123456
vncserver :1 -localhost no


Building from Scratch

Build the Docker container

docker pull ubuntu:focal-20200423

# -v hostdir:dockerdir
# note: started this way, systemctl cannot be used (see the problems section below)
docker run -it -p 22001:22 -p 15902:5901 --name xfce4 ubuntu:focal-20200423 /bin/bash
# attach to the container
docker exec -it xfce4 /bin/bash

Switch the apt sources

# back up the original list
cp /etc/apt/sources.list /etc/apt/sources.list.bak
rm /etc/apt/sources.list
vim /etc/apt/sources.list

Aliyun

https://developer.aliyun.com/mirror/ubuntu

deb http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse

deb http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse

deb http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse

# deb http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse
# deb-src http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse

deb http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse

Tsinghua mirror

https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/

# 20.04 LTS
# source (deb-src) entries are commented out by default to speed up apt update; uncomment if needed
# remember to change https to http
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-updates main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-updates main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-backports main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-backports main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-security main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-security main restricted universe multiverse

# pre-release sources; enabling them is not recommended
# deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-proposed main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-proposed main restricted universe multiverse

USTC mirror

https://mirrors.ustc.edu.cn/help/ubuntu.html?highlight=ubuntu

# source (deb-src) entries are commented out by default; uncomment if needed
deb http://mirrors.ustc.edu.cn/ubuntu/ focal main restricted universe multiverse
# deb-src https://mirrors.ustc.edu.cn/ubuntu/ focal main restricted universe multiverse

deb http://mirrors.ustc.edu.cn/ubuntu/ focal-security main restricted universe multiverse
# deb-src https://mirrors.ustc.edu.cn/ubuntu/ focal-security main restricted universe multiverse

deb http://mirrors.ustc.edu.cn/ubuntu/ focal-updates main restricted universe multiverse
# deb-src https://mirrors.ustc.edu.cn/ubuntu/ focal-updates main restricted universe multiverse

deb http://mirrors.ustc.edu.cn/ubuntu/ focal-backports main restricted universe multiverse
# deb-src https://mirrors.ustc.edu.cn/ubuntu/ focal-backports main restricted universe multiverse

# pre-release sources; enabling them is not recommended
# deb https://mirrors.ustc.edu.cn/ubuntu/ focal-proposed main restricted universe multiverse
# deb-src https://mirrors.ustc.edu.cn/ubuntu/ focal-proposed main restricted universe multiverse

Install the desktop environment: xfce4 + tigervnc

Install:

apt update
apt install xfce4 xfce4-goodies -y
apt install tigervnc-standalone-server -y

Configure VNC

Set the VNC password:

# set the vnc password
vncpasswd
# enter the password twice
# answer n to the third prompt (the view-only password)

Set the desktop that VNC starts:

# write the xstartup file
vim /root/.vnc/xstartup

# with the following contents:
#!/bin/sh
unset SESSION_MANAGER
unset DBUS_SESSION_BUS_ADDRESS
exec startxfce4

# make it executable
chmod u+x /root/.vnc/xstartup

Set the VNC display options:

# in /root/.vnc/config
# vim /root/.vnc/config
geometry=1920x1080
dpi=128

Start VNC:

# -localhost no allows access from other hosts; otherwise only local access is possible
vncserver -localhost no
Set up VNC as a systemd service (I did not do this):

vim /etc/systemd/system/vncserver@.service

[Unit]
Description=Remote desktop service (VNC)
After=syslog.target network.target

[Service]
Type=simple
User=linuxize
PAMName=login
PIDFile=/home/%u/.vnc/%H%i.pid
ExecStartPre=/bin/sh -c '/usr/bin/vncserver -kill :%i > /dev/null 2>&1 || :'
ExecStart=/usr/bin/vncserver :%i -geometry 1920x1080 -alwaysshared -fg
ExecStop=/usr/bin/vncserver -kill :%i

[Install]
WantedBy=multi-user.target

Save the file, then:

# reload the systemd configuration
systemctl daemon-reload
# enable at boot
systemctl enable vncserver@1.service
systemctl start vncserver@1.service
systemctl status vncserver@1.service

Package the image

docker commit xfce4 vnc-ubuntu2004:xcfe4

# upload the image
docker tag vnc-ubuntu2004:xcfe4 sheephuan/vnc4docker:ubuntu2004-xfce4-20220531
docker login
docker push sheephuan/vnc4docker:ubuntu2004-xfce4-20220531

docker save -o vnc-ubuntu2004-xcfe4.tar vnc-ubuntu2004:xcfe4

# load it on a new server
docker load --input vnc-ubuntu2004-xcfe4.tar

TODO: write a Dockerfile for this; a sketch follows.
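
A rough starting point that condenses the manual steps above (an untested sketch; the package list and the password handling would need verification):

FROM ubuntu:focal-20200423

ENV DEBIAN_FRONTEND=noninteractive

# desktop environment and VNC server (mirror switching omitted)
RUN apt update && \
    apt install -y xfce4 xfce4-goodies tigervnc-standalone-server && \
    rm -rf /var/lib/apt/lists/*

# xstartup: launch xfce4 for each VNC session, plus display options
RUN mkdir -p /root/.vnc && \
    printf '#!/bin/sh\nunset SESSION_MANAGER\nunset DBUS_SESSION_BUS_ADDRESS\nexec startxfce4\n' > /root/.vnc/xstartup && \
    chmod u+x /root/.vnc/xstartup && \
    printf 'geometry=1920x1080\ndpi=128\n' > /root/.vnc/config

# set the VNC password non-interactively (123456, as in the prebuilt image)
RUN echo 123456 | vncpasswd -f > /root/.vnc/passwd && \
    chmod 600 /root/.vnc/passwd

EXPOSE 5901
# -fg keeps vncserver in the foreground so the container stays alive
CMD ["vncserver", ":1", "-localhost", "no", "-fg"]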

Problems

System has not been booted with systemd as init system (PID 1). Can't operate.

Start the container with /usr/sbin/init as PID 1 instead:

docker run --privileged -itd -p 22004:22 -p 15904:5901 --name txfce4 vnc-ubuntu2004:xcfe4 /usr/sbin/init

Failed to exec default Terminal Emulator

See https://blog.csdn.net/weixin_42912498/article/details/107162983
