2020-05-29
We will discuss face recognition and mask detection models in the following paragraphs, detailing how they were built.
Phase 1: Data Collection:
The sample images we collected contain faces against noisy backgrounds. To separate each face from its background, we adopted the MTCNN algorithm, which predicts the probability that an image contains a human face and returns the coordinates of the face within the image.
We extracted the face region of interest using the MTCNN algorithm, and scaled the cropped image to 112×112. The same processing steps were performed on the MS-Celeb-1M images.
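The crop-and-scale step can be sketched as follows. This is a minimal illustration rather than the code we ran: it assumes the detector has already returned a bounding box in the form (x1, y1, x2, y2), the function name is hypothetical, and it uses simple nearest-neighbour sampling in NumPy in place of whatever resizing routine the original script used.

```python
import numpy as np

def crop_and_resize(image, box, size=112):
    """Crop box = (x1, y1, x2, y2) from an H x W x C image and
    resize the crop to size x size with nearest-neighbour sampling."""
    h, w = image.shape[:2]
    # Clamp the box to the image; detectors can return slightly out-of-range values.
    x1, y1 = max(0, int(box[0])), max(0, int(box[1]))
    x2, y2 = min(w, int(box[2])), min(h, int(box[3]))
    if x2 <= x1 or y2 <= y1:
        raise ValueError("empty face box")
    face = image[y1:y2, x1:x2]
    # Map every output row/column back to a source row/column index.
    rows = np.arange(size) * face.shape[0] // size
    cols = np.arange(size) * face.shape[1] // size
    return face[rows][:, cols]

# Usage with a synthetic image and an assumed detector box.
image = np.zeros((200, 300, 3), dtype=np.uint8)
face = crop_and_resize(image, (100, 50, 200, 150))
print(face.shape)  # (112, 112, 3)
```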
The sample images were divided into two subsets for mask detection: a training set (for fitting the model parameters) and a validation set (for verifying the prediction accuracy):
Training set: 1,300 “with mask” images, 1,300 “without mask” images
Validation set: 300 “with mask” images, 300 “without mask” images
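A split of this shape could be produced as sketched below. The post does not say how the images were assigned to each subset, so the random shuffle, the fixed seed, and the file names are all assumptions made for illustration.

```python
import random

def split_samples(paths, n_train, n_val, seed=0):
    """Shuffle the paths reproducibly and cut them into two disjoint subsets."""
    if n_train + n_val > len(paths):
        raise ValueError("not enough samples for the requested split")
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:n_train + n_val]

# Hypothetical file names: 1,600 images per class, split 1,300 / 300.
with_mask = [f"with_mask/{i:04d}.jpg" for i in range(1600)]
train, val = split_samples(with_mask, n_train=1300, n_val=300)
print(len(train), len(val))  # 1300 300
```

Running the same function on the “without mask” images gives the second half of each subset.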
For the face recognition model, we built the training set from MS-Celeb-1M (Guo, 2016) and the validation set from LFW, a public benchmark widely used in industry for measuring the accuracy of face recognition models (Huang, 2007).
Phase 2: Training:
Here we will introduce the model training technologies in terms of neural network, directory structure, data generation, and training scripts.
MobileNet V2, a lightweight convolutional neural network (Sandler, 2018), was applied to mask detection. The network is designed to reduce the computing cost of a model by significantly cutting the number of parameters and arithmetic operations, at the cost of only a slight decrease in prediction accuracy. MobileNet V2 has two types of blocks: a residual block with a stride of 1, and a block with a stride of 2 for downsizing. We set the input image size to 112×112 for the first convolutional layer, which is why we resized the sample images to 112×112 in the data collection phase.
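Much of the parameter saving comes from the depthwise separable convolutions that MobileNet V2 builds on. The arithmetic below compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1×1 pointwise convolution. The kernel size and channel counts are illustrative values, not the dimensions of any specific layer in our model, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k convolution mapping c_in channels to c_out channels."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

standard = standard_conv_params(3, 32, 64)         # 3*3*32*64 = 18432
separable = depthwise_separable_params(3, 32, 64)  # 3*3*32 + 32*64 = 2336
print(standard, separable, round(standard / separable, 1))  # 18432 2336 7.9
```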
Starting from the pre-trained model provided by GitHub user MrCPlusPlus (2018), we changed the number of output nodes to 2 (for the two output classes, “With Mask” and “Without Mask”) and fine-tuned it with a batch size of 32 (32 sample images from the training set per training step). We used the Adam optimizer with an initial learning rate of 0.1, a Softmax loss function, and L2 regularization to tune the network. The validation accuracy plateaued at 99% from epoch 50 onward.
Below is the directory structure of mask_wear_detection:
“./data/” mainly stores the mask image data.
“./output/” stores pre-trained model files and the files obtained after training.
The functions of the Python scripts in the directory are as follows:
data_process.py: To convert the file format from “jpg” to “TFRecord”.
train_nets.py: To train the mask detection model.
eval_ckpt_file.py: To evaluate the mask detection accuracy.
freeze_graph.py: To freeze the model and convert the model format from “.ckpt” to “.pb” for deployment.
inference_predict_pb.py: To perform inference on a single image.
common.py: To define training parameters.
MobileFaceNet.py: MobileNet V2 architecture.
To convert the file format to TFRecord, we should open the script “data_process.py”:
We used the preset “clear_train.txt” and “clear_val.txt” to generate non-mask training and validation sets, and “mask_train.txt” and “mask_val.txt” to generate the mask ones.
At the top of the “train_nets.py” script, we imported the following packages.
The training hyperparameters were defined as follows.
The optimizer was set to Adam by default.
The TFRecord files generated by “data_process.py” hold the training and validation sets. We used a TensorFlow iterator to read one batch of training or validation data from the files at each step.
We adopted the Softmax Loss with L2 regularization to calculate the “total_loss”.
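For readers unfamiliar with this loss, the NumPy sketch below shows how a Softmax (cross-entropy) loss and an L2 penalty combine into a single value. It is not the TensorFlow code from “train_nets.py”: the function names, the weight_decay value, and the plain sum-of-squares convention for the L2 term are assumptions.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between softmax(logits) and integer class labels."""
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def total_loss(logits, labels, weights, weight_decay=5e-4):
    """Softmax loss plus weight_decay times the sum of squared weights."""
    l2_penalty = sum(np.sum(w ** 2) for w in weights)
    return softmax_cross_entropy(logits, labels) + weight_decay * l2_penalty

# Two samples, two classes ("With Mask" / "Without Mask").
logits = np.array([[2.0, 0.5], [0.1, 1.2]])
labels = np.array([0, 1])
weights = [np.ones((4, 2))]  # stand-in for the network's weight tensors
print(total_loss(logits, labels, weights))
```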
Transfer learning was applied to this training. We imported the “pretrained_model”, and loaded the parameters of all layers except the top one as the initialization parameters.
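The selection of which parameters to load can be sketched in plain Python as below. The variable names and the “logits” prefix are hypothetical; the actual names depend on how the network is defined in “MobileFaceNet.py”.

```python
def variables_to_restore(pretrained_names, model_names, top_layer_prefix):
    """Return the pre-trained variable names to load into the new model:
    those that also exist in the model and do not belong to the top layer."""
    model = set(model_names)
    return [
        name for name in pretrained_names
        if name in model and not name.startswith(top_layer_prefix)
    ]

# Hypothetical variable names for illustration only.
pretrained = ["net/conv1/weights", "net/block1/weights", "net/logits/weights"]
model = ["net/conv1/weights", "net/block1/weights", "net/logits/weights"]
print(variables_to_restore(pretrained, model, "net/logits"))
# ['net/conv1/weights', 'net/block1/weights']
```

The top layer is left at its fresh initialization because its shape changes when the number of output nodes changes.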
By running a TensorFlow Session, we trained the parameters on “images_train” (image data of the training set) and “labels_train” (labels of the training set). The training loss (the lower the better) and accuracy (the higher the better) are reported by “total_loss_train” and “acc_train”.
The validation loss (inference_loss) and accuracy (acc_val) were calculated once every 10 training steps (the validate_interval value) to evaluate how well the model generalized.
Whenever the validation accuracy improved, we saved the model in the “.ckpt” format.
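The control flow of this train, validate, and save cycle can be sketched as follows. The three callbacks are placeholders for the TensorFlow Session calls and the checkpoint saver, and all function names here are hypothetical.

```python
def train_with_validation(num_steps, train_step, validate, save, validate_interval=10):
    """Run num_steps training steps, validate every validate_interval steps,
    and save a checkpoint only when the validation accuracy improves."""
    best_acc = float("-inf")
    for step in range(1, num_steps + 1):
        train_step(step)
        if step % validate_interval == 0:
            acc = validate(step)
            if acc > best_acc:
                best_acc = acc
                save(step, acc)
    return best_acc

# Usage with stand-in callbacks and made-up accuracies.
fake_acc = {10: 0.80, 20: 0.75, 30: 0.90}
saved = []
best = train_with_validation(
    num_steps=30,
    train_step=lambda step: None,
    validate=lambda step: fake_acc[step],
    save=lambda step, acc: saved.append(step),
)
print(best, saved)  # 0.9 [10, 30]
```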
For face recognition, we adopted ResNet50, a variant of the Residual Network (He, 2016). The skip connections of a Residual Network fuse features from lower and upper layers, which mitigates vanishing or exploding gradients during training, so that the parameters of the lower layers can be fully trained and tuned. As before, the input image size was set to 112×112 for the first layer of the network.
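The idea of a skip connection can be shown with a toy example. The sketch below uses two fully connected layers in place of the convolutions that ResNet50 actually uses, so it illustrates only the pattern output = F(x) + x, not the real architecture; all names are hypothetical.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Toy residual block: two dense layers form the residual branch F(x),
    and the input x is added back through the skip connection."""
    relu = lambda t: np.maximum(t, 0.0)
    f = relu(x @ w1) @ w2  # residual branch F(x)
    return relu(f + x)     # skip connection adds the input back

# With all-zero weights the residual branch vanishes and the input passes through.
x = np.array([[1.0, -2.0, 3.0]])
zeros = np.zeros((3, 3))
print(residual_block(x, zeros, zeros))
```

Because the input is added directly to the output, the gradient always has a path around the residual branch, which is what keeps the lower layers trainable in deep networks.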
We adopted the ResNet50 architecture provided by GitHub user deepinsight (2020) as the pre-trained model, and fine-tuned it with the following parameters: batch size = 8, epochs = 10, optimizer = Adam, and the learning rates listed below.
Epoch | Learning Rate
1 | 0.1
2 | 0.01
3 | 0.01
4 | 0.01
5 | 0.001
6 | 0.001
7 | 0.0001
8 | 0.0001
9 | 0.00001
10 | 0.00001
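A step schedule like this one can be encoded as a list of (starting epoch, learning rate) pairs, as in the sketch below. The values are taken from the epoch and learning rate figures listed here; the names LR_SCHEDULE and learning_rate are hypothetical and do not come from “train_nets.py”.

```python
# (first epoch at which the rate applies, learning rate)
LR_SCHEDULE = [(1, 0.1), (2, 0.01), (5, 0.001), (7, 0.0001), (9, 0.00001)]

def learning_rate(epoch):
    """Return the learning rate for a 1-based epoch number."""
    if epoch < 1:
        raise ValueError("epoch numbers start at 1")
    rate = LR_SCHEDULE[0][1]
    for start_epoch, value in LR_SCHEDULE:
        if epoch >= start_epoch:
            rate = value
    return rate

for epoch in range(1, 11):
    print(epoch, learning_rate(epoch))
```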
We also applied Softmax Loss with L2 regularization to calculate the total loss. The validation accuracy of the model finally reached 99.8% over the validation set from LFW.
Below is the directory structure of InsightFace_Tensorflow:
“./data/” stores MS-Celeb-1M data (train.rec and train.idx), LFW data (lfw.bin) and TFRecord data.
“./output/” stores pre-trained model files and the files obtained after training.
The functions of the Python scripts in the directory are as follows:
data_process.py: To convert the file format from “train.rec” and “train.idx” to “TFRecord”.
train_nets.py: To train the face recognition model.
eval_ckpt_file.py: To evaluate the recognition accuracy using LFW data.
freeze_graph.py: To freeze the model and convert the model format from “.ckpt” to “.pb” for deployment.
inference_extract_pb.py: To perform inference on a single image.
common.py: To define training parameters.
ResNet50.py: ResNet50 architecture.
To convert the file format to TFRecord, we should open the script “data_process.py”:
We assigned the paths of “train.idx” and “train.rec” to the variables “idx_path” and “bin_path”, respectively, to generate the training set.
At the top of the script “train_nets.py”, we imported the following packages.
The training hyperparameters were defined as follows:
The optimizer was set to Adam by default.
The TFRecord file generated by “data_process.py” holds the training set. We used a TensorFlow iterator to read one batch of training data from the file at each step.
The Softmax Loss function was employed.
Transfer learning was applied to this training. We imported the pretrained_model, and loaded the parameters of all layers except the top one as the initialization parameters. Before that, we had saved the pb file provided by deepinsight (2020) as an npy file.
By running a TensorFlow Session, we trained the parameters on “images_train” (image data of the training set) and “labels_train” (labels of the training set). The training loss (the lower the better) and accuracy (the higher the better) are reported by “total_loss_train” and “acc_train”.
The validation accuracy was calculated by “evaluation.py” once every 10 training steps (the validate_interval value).
Whenever the “np.mean” value (the accuracy indicator) improved, we saved the model in the “.ckpt” format.
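The post does not show the internals of “evaluation.py”, so the sketch below is only one plausible way such an averaged accuracy could be computed on LFW-style image pairs. It assumes that each pair is compared by cosine similarity of the two embeddings, that a fixed threshold decides “same person”, and that the per-fold accuracies are averaged with np.mean; the function names, the threshold, and the fold count are all assumptions.

```python
import numpy as np

def pair_accuracy(emb_a, emb_b, same, threshold):
    """Fraction of pairs whose same/different prediction matches the label."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    similarity = np.sum(a * b, axis=1)  # cosine similarity per pair
    return np.mean((similarity > threshold) == same)

def mean_fold_accuracy(emb_a, emb_b, same, threshold=0.5, n_folds=10):
    """Split the pairs into folds and average the per-fold accuracies."""
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    same = np.asarray(same, dtype=bool)
    if len(same) < n_folds:
        raise ValueError("need at least one pair per fold")
    folds = np.array_split(np.arange(len(same)), n_folds)
    accuracies = [pair_accuracy(emb_a[i], emb_b[i], same[i], threshold) for i in folds]
    return float(np.mean(accuracies))

# Four toy pairs with 2-D embeddings: two matching, two non-matching.
emb_a = [[1, 0], [0, 1], [1, 0], [0, 1]]
emb_b = [[1, 0], [0, 1], [0, 1], [1, 0]]
print(mean_fold_accuracy(emb_a, emb_b, [True, True, False, False], n_folds=2))  # 1.0
```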
It is noteworthy that the MTCNN used here is a pre-trained network provided by GitHub user OAID (2018).
Phase 3: Deployment:
With trained models for mask detection, face detection, and face recognition in hand, we can deploy them to our different applications.
We loaded the abovementioned models and ran inference with the C++ dynamic library provided by GitHub user Neargye (2020), and compiled the AI programs into DLL files so that they can be used by different applications.
For the smart temperature measurement system, we used the FLIR C3 camera for image acquisition. This camera measures the surface temperature of an object by thermal imaging. Although the FLIR C3 SDK is written for C# and runs on Windows 10, we were able to integrate the DLL files with the SDK to deliver a series of features, including image acquisition, face detection, identity verification, mask detection, temperature measurement, and result display.
References
[1] Guo, Yandong, et al. "MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition." European Conference on Computer Vision (2016): 87-102.
[2] Huang, G. B., M. Ramesh, T. Berg, and E. Learned-Miller. "Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments." Technical report (2007).
[3] Sandler, Mark, et al. "MobileNetV2: Inverted Residuals and Linear Bottlenecks." Computer Vision and Pattern Recognition (2018): 4510-4520.
[4] Open source code. Available from https://github.com/MrCPlusPlus/MobileFaceNet_Tensorflow_Pretrain.
[5] He, Kaiming, et al. "Deep Residual Learning for Image Recognition." Computer Vision and Pattern Recognition (2016): 770-778.
[6] Open source code. Available from https://github.com/deepinsight/insightface.
[7] Open source code. Available from https://github.com/OAID/FaceDetection.
[8] Open source code. Available from https://github.com/Neargye/hello_tf_c_api.
Copyright © 中信國際電訊(信息技術)有限公司 CITIC Telecom International CPC Limited