nn-Meter: Prediction Flow Analysis
nn-Meter Prediction Flow
- Entering the prediction command. A command such as:

```shell
nn-meter predict --predictor RedmiK30Pro_cpu_tflite27 --predictor-version 1.0 --onnx /root/workspace/nn-Meter/workspace/models/mobilenetv3small_0.onnx
```

is received by nn_meter/utils/nn_meter_cli/predictor.py#apply_latency_predictor_cli. Inside the function the arguments are parsed, mainly --predictor, --predictor-version, and --onnx/--tensorflow.
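As a self-contained illustration of the flag handling just described, the sketch below uses argparse. Only the flag names come from the nn-meter CLI; the parser setup itself is an assumption, not nn-Meter's actual code:

```python
import argparse

# Hypothetical sketch of the CLI argument parsing described above;
# the flag names follow the nn-meter CLI, the parser setup is assumed.
parser = argparse.ArgumentParser(prog="nn-meter-predict-sketch")
parser.add_argument("--predictor", type=str, required=True)
parser.add_argument("--predictor-version", type=float, default=None)
# exactly one model-input flag is expected
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument("--tensorflow", type=str, help="path to a .pb model")
group.add_argument("--onnx", type=str, help="path to an .onnx model")

args = parser.parse_args([
    "--predictor", "RedmiK30Pro_cpu_tflite27",
    "--predictor-version", "1.0",
    "--onnx", "mobilenetv3small_0.onnx",
])

# mirror the model-type dispatch in apply_latency_predictor_cli
if args.tensorflow:
    input_model, model_type = args.tensorflow, "pb"
elif args.onnx:
    input_model, model_type = args.onnx, "onnx"

print(model_type, input_model)
```

Note that argparse maps `--predictor-version` to the attribute `args.predictor_version`, which matches the `args.predictor_version` access seen in the real source.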
```python
def apply_latency_predictor_cli(args):
    # specify model type
    if args.tensorflow:
        input_model, model_type, model_suffix = args.tensorflow, "pb", ".pb"
    elif args.onnx:
        input_model, model_type, model_suffix = args.onnx, "onnx", ".onnx"
    elif args.nn_meter_ir:
        input_model, model_type, model_suffix = args.nn_meter_ir, "nnmeter-ir", ".json"
    elif args.torchvision:  # torch model name from torchvision model zoo
        input_model_list, model_type = args.torchvision, "torch"
    ...
    # load predictor
    predictor = load_latency_predictor(args.predictor, args.predictor_version)
    ...
    # predict latency
    result = {}
    for model in input_model_list:
        latency = predictor.predict(model, model_type)  # in unit of ms
        result[os.path.basename(model)] = latency
        logging.result(f'[RESULT] predict latency for {os.path.basename(model)}: {latency} ms')
    return result
```
- Loading the predictor
load_latency_predictor
In step 1, after --predictor and --predictor-version are parsed, the corresponding predictor files are loaded by calling nn_meter/predictor/nn_meter_predictor.py#load_latency_predictor. The function locates the predictor and fusion rules via cached file paths under the user data folder; these files are either the official defaults or user-customized ones. It then returns an nnMeterPredictor object.
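A minimal sketch of this default-vs-customized dispatch, where a "download" key in the predictor config marks an official predictor fetched and cached locally. The configs and return values here are stand-ins, not nn-Meter's real data:

```python
# Stand-in predictor configs: a "download" key marks an official predictor
# that would be fetched and cached under the user data folder.
PREDICTOR_CONFIGS = {
    "RedmiK30Pro_cpu_tflite27": {"version": 1.0, "download": "https://example.com/predictor.zip"},
    "my_custom_device": {"version": 1.0, "package_location": "/path/to/local/predictor"},
}

def load_latency_predictor_sketch(name, version=None):
    pred_info = PREDICTOR_CONFIGS[name]
    if "download" in pred_info:
        source = "official-cached"   # would call loading_to_local(...)
    else:
        source = "customized"        # would call loading_customized_predictor(...)
    return {"name": name, "source": source}

print(load_latency_predictor_sketch("RedmiK30Pro_cpu_tflite27"))
print(load_latency_predictor_sketch("my_custom_device"))
```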
```python
def load_latency_predictor(predictor_name: str, predictor_version: float = None):
    user_data_folder = get_user_data_folder()
    pred_info = load_predictor_config(predictor_name, predictor_version)
    if "download" in pred_info:
        kernel_predictors, fusionrule = loading_to_local(pred_info, os.path.join(user_data_folder, 'predictor'))
    else:
        kernel_predictors, fusionrule = loading_customized_predictor(pred_info)
    return nnMeterPredictor(kernel_predictors, fusionrule)
```
- Predicting model latency with the predictor object
predictor.predict
With the predictor obtained in step 2, we call nn_meter/predictor/nn_meter_predictor.py#nnMeterPredictor.predict to predict the model's latency.
self.kd.load_graph(graph): the model is first converted into a graph and parsed into kernels. This step is implemented in nn_meter/kernel_detector/kernel_detector.py#KernelDetector.load_graph, which takes the fusion rules into account: if a combination of ops matches a fusion rule, those ops are grouped together and predicted as one kernel.
nn_predict(self.kernel_predictors, self.kd.get_kernels()): the kernels detected in step 3.1 are fed into the kernel predictors and predicted one by one, in nn_meter/predictor/prediction/predict_by_kernel.py#nn_predict. This mainly extracts each kernel's features, i.e. the parameters of ops such as conv2d (input/output dimensions, kernel size, and so on), and then, kernel by kernel, selects a predictor by op/kernel name and feeds it the features to predict the latency.
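To make the per-kernel flow concrete, here is a toy version of the idea: look up a predictor by kernel name, feed it the kernel's features, and sum the per-kernel latencies. The feature layout and the linear formulas are invented placeholders; the real kernel predictors are trained regression models:

```python
# Toy per-kernel latency prediction. The formulas below are invented
# placeholders, not nn-Meter's trained predictors.
kernel_predictors = {
    "conv-bn-relu": lambda f: 0.01 * f["cin"] * f["cout"] * f["ks"] ** 2 / 1000,
    "fc":           lambda f: 0.02 * f["cin"] * f["cout"] / 1000,
}

kernels = [
    {"name": "conv-bn-relu", "features": {"cin": 3, "cout": 16, "ks": 3}},
    {"name": "fc",           "features": {"cin": 576, "cout": 1000}},
]

def nn_predict_sketch(predictors, kernels):
    total = 0.0
    for k in kernels:
        # select the predictor by kernel name, feed it the features
        total += predictors[k["name"]](k["features"])  # latency in ms
    return total

print(f"{nn_predict_sketch(kernel_predictors, kernels):.3f} ms")
```

The whole-model latency is simply the sum over detected kernels, which is why the kernel-detection step (and its fusion rules) matters so much for accuracy.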
```python
def predict(
    self, model, model_type, input_shape=(1, 3, 224, 224), apply_nni=False
):
    logging.info("Start latency prediction ...")
    if isinstance(model, str):
        graph = model_file_to_graph(model, model_type, input_shape, apply_nni=apply_nni)
    else:
        graph = model_to_graph(model, model_type, input_shape=input_shape, apply_nni=apply_nni)
    # logging.info(graph)
    self.kd.load_graph(graph)
    py = nn_predict(self.kernel_predictors, self.kd.get_kernels())  # in unit of ms
    logging.info(f"Predict latency: {py} ms")
    return py
```
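The fusion-aware kernel detection of step 3.1 can be illustrated with a toy greedy matcher: consecutive ops that match a fusion rule collapse into one kernel, everything else becomes a single-op kernel. The op sequence and the single rule below are made up for illustration:

```python
# Toy kernel detection: greedily merge consecutive ops matching a fusion
# rule into one kernel; unmatched ops become single-op kernels.
# The rule list and op sequence are illustrative only.
FUSION_RULES = [("conv", "bn", "relu")]

def detect_kernels(ops):
    kernels, i = [], 0
    while i < len(ops):
        for rule in FUSION_RULES:
            if tuple(ops[i:i + len(rule)]) == rule:
                kernels.append("-".join(rule))  # fused ops predicted as one unit
                i += len(rule)
                break
        else:
            kernels.append(ops[i])
            i += 1
    return kernels

print(detect_kernels(["conv", "bn", "relu", "pool", "conv", "bn", "relu", "fc"]))
```

Each resulting kernel name ("conv-bn-relu", "pool", ...) is then what the kernel predictors are keyed on in the nn_predict step.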