2020-05-29

How to Build AI Face Recognition and Mask Detection Models?

By Henry Zeng, Senior AI Application Specialist, China Entercom

This article describes how we built our face recognition and mask detection models.


Phase 1: Data Collection

The sample images we collected contain faces against noisy backgrounds. To separate each face from its background, we adopted the MTCNN algorithm, which predicts the probability that an image contains a human face and returns the coordinates of the face’s bounding box in the image.

We extracted the face region of interest using the MTCNN algorithm, and scaled the cropped image to 112×112. The same processing steps were performed on the MS-Celeb-1M images.
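The crop-and-scale step can be sketched as follows. This is a minimal illustration rather than the code we ran: the function name `crop_and_resize`, the `(x1, y1, x2, y2)` box layout and the nearest-neighbour resizing are assumptions chosen to keep the example dependent on NumPy only.

```python
import numpy as np

def crop_and_resize(image, box, size=112):
    """Crop a face box from an H x W x C image and resize it to size x size.

    `box` is (x1, y1, x2, y2) in pixels, as a face detector such as MTCNN
    might return it (assumed layout). Nearest-neighbour sampling is used
    for simplicity; an image library would normally interpolate.
    """
    height, width = image.shape[:2]
    # Clamp the box to the image so a partly out-of-frame face still crops.
    x1 = int(max(0, min(box[0], width - 1)))
    y1 = int(max(0, min(box[1], height - 1)))
    x2 = int(max(x1 + 1, min(box[2], width)))
    y2 = int(max(y1 + 1, min(box[3], height)))
    face = image[y1:y2, x1:x2]

    # Map every output pixel back to its nearest source pixel.
    rows = np.arange(size) * face.shape[0] // size
    cols = np.arange(size) * face.shape[1] // size
    return face[rows[:, None], cols[None, :]]

image = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder image
face = crop_and_resize(image, (200, 100, 360, 300))
print(face.shape)  # (112, 112, 3)
```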

The sample images for the mask detection model were divided into two subsets: a training set (for fitting the model parameters) and a validation set (for verifying the prediction accuracy):

Training set: 1,300 “with mask” images, 1,300 “without mask” images

Validation set: 300 “with mask” images, 300 “without mask” images
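A split of this shape could be produced per class as sketched below. The helper `split_samples`, the fixed seed and the file names are hypothetical; the only figures taken from the text are the 1,300 / 300 counts.

```python
import random

def split_samples(paths, n_train, n_val, seed=0):
    """Shuffle one class's image paths and split them into disjoint train/validation lists."""
    if n_train + n_val > len(paths):
        raise ValueError("not enough images for the requested split")
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:n_train + n_val]

# Hypothetical file names standing in for the 1,600 images of one class.
mask_paths = [f"mask_{i:04d}.jpg" for i in range(1600)]
train, val = split_samples(mask_paths, n_train=1300, n_val=300)
print(len(train), len(val))  # 1300 300
```

Splitting each class separately keeps the two classes balanced in both subsets.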

For the face recognition model, we built the training set from MS-Celeb-1M (Guo, 2016) and the validation set from LFW, a public benchmark widely used in industry to measure the accuracy of face recognition models (Huang, 2007).


Phase 2: Training

Here we will introduce the model training technologies in terms of neural network, directory structure, data generation, and training scripts.

MobileNet V2, a lightweight convolutional neural network (Sandler, 2018), was applied to mask detection. The network is designed to reduce computing cost by significantly cutting the number of parameters and arithmetic operations, at the cost of only a slight decrease in prediction accuracy. MobileNet V2 has two types of blocks: a residual block with a stride of 1 and a downsizing block with a stride of 2. We set the input image size of the first convolutional layer to 112×112, which is why we resized the sample images to 112×112 in the data collection phase.
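Much of the parameter saving comes from the depthwise separable convolutions that MobileNet V2 builds on. The short calculation below compares weight counts for one layer; the layer sizes (3×3 kernel, 32 input channels, 64 output channels) are illustrative values, not taken from our model, and biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: one k x k x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out

def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution."""
    depthwise = kernel * kernel * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

standard = standard_conv_params(3, 32, 64)
separable = separable_conv_params(3, 32, 64)
print(standard, separable, round(standard / separable, 1))  # 18432 2336 7.9
```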

Starting from the pre-trained model provided by GitHub user MrCPlusPlus (2018), we changed the number of output nodes to 2 (for the output classes “With Mask” and “Without Mask”) and fine-tuned it with a batch size of 32 (32 sample images from the training set per step). We used the Adam optimizer with an initial learning rate of 0.1, applied the Softmax loss function, and added L2 regularization. The validation accuracy plateaued at 99% from epoch 50 onward.

Below is the directory structure of mask_wear_detection:


“./data/” mainly stores the mask image data.

“./output/” stores pre-trained model files and the files obtained after training.

The functions of the Python scripts in the directory are as follows:

data_process.py: To convert the file format from “jpg” to “TFRecord”.

train_nets.py: To train the mask detection model.

eval_ckpt_file.py: To evaluate the mask detection accuracy.

freeze_graph.py: To freeze the model and convert the model format from “.ckpt” to “.pb” for deployment.

inference_predict_pb.py: To perform inference on a single image.

common.py: To define training parameters.

MobileFaceNet.py: MobileNet V2 architecture.


To convert the file format to TFRecord, we should open the script “data_process.py”:

We used the preset “clear_train.txt” and “clear_val.txt” to generate non-mask training and validation sets, and “mask_train.txt” and “mask_val.txt” to generate the mask ones.
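As a rough sketch of the first half of that conversion, the list files can be turned into labelled samples before they are serialized. The file layout (one image path per line), the label values (1 for mask, 0 for no mask), the example paths and the helper `load_labelled_samples` are all assumptions for illustration; the TFRecord writing itself is left out.

```python
def load_labelled_samples(list_lines, label):
    """Turn the lines of a list file into (image_path, label) pairs, skipping blank lines."""
    return [(line.strip(), label) for line in list_lines if line.strip()]

# In practice these lines would be read from files such as "mask_train.txt"
# and "clear_train.txt"; the paths below are made up.
mask_lines = ["data/mask/0001.jpg\n", "data/mask/0002.jpg\n", "\n"]
clear_lines = ["data/clear/0001.jpg\n"]

samples = load_labelled_samples(mask_lines, label=1) + load_labelled_samples(clear_lines, label=0)
for path, label in samples:
    print(label, path)
```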

At the top of the “train_nets.py” script, we imported the following packages.

The training hyperparameters were defined as follows.

The optimizer was set to Adam by default.

The TFRecord files, generated in “data_process.py”, carry the training and validation sets. We used a TensorFlow iterator to extract a batch of training or validation data from the files at each step.
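The batching behaviour of such an iterator can be pictured with a plain Python generator. This is only an analogy for the TensorFlow iterator, not its API; `iterate_batches` is a hypothetical helper, and the 2,600 samples correspond to the 1,300 + 1,300 training images.

```python
def iterate_batches(samples, batch_size=32):
    """Yield consecutive batches of samples; the last batch may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

batch_sizes = [len(batch) for batch in iterate_batches(list(range(2600)), batch_size=32)]
print(len(batch_sizes), batch_sizes[0], batch_sizes[-1])  # 82 32 8
```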

We adopted the Softmax Loss with L2 regularization to calculate the “total_loss”.
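For a single sample, the combination of the two terms can be written out as below. This is a plain-Python sketch, not the TensorFlow code in “train_nets.py”; the function names and the `weight_decay` value are assumptions, and some implementations scale the L2 term by an extra factor of 0.5.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against the integer class index `label`."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]

def l2_penalty(weights, weight_decay):
    """Weight decay term: weight_decay times the sum of squared weights."""
    return weight_decay * sum(w * w for w in weights)

def total_loss(logits, label, weights, weight_decay=5e-4):
    """Softmax loss plus L2 regularization."""
    return softmax_cross_entropy(logits, label) + l2_penalty(weights, weight_decay)

# Two classes ("With Mask", "Without Mask"); the logits favour class 0.
print(total_loss([2.0, -1.0], label=0, weights=[0.5, -0.3, 0.8]))
```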

Transfer learning was applied to this training. We imported the “pretrained_model”, and loaded the parameters of all layers except the top one as the initialization parameters.
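The selection of which parameters to load can be sketched as a filter over variable names. The names (`conv1/kernel`, `logits/kernel`), the shapes and the helper `restorable_names` are hypothetical; the point is only that the top layer is skipped because its output size changed, while every other layer with a matching shape is restored.

```python
def restorable_names(pretrained_shapes, model_shapes, skip_prefixes=("logits/",)):
    """List the pre-trained variables to load, leaving the top layer freshly initialized."""
    names = []
    for name, shape in pretrained_shapes.items():
        if name.startswith(tuple(skip_prefixes)):
            continue  # top layer: trained from scratch for the new classes
        if model_shapes.get(name) == shape:
            names.append(name)
    return names

pretrained_shapes = {"conv1/kernel": (3, 3, 3, 64), "logits/kernel": (128, 1000)}
model_shapes = {"conv1/kernel": (3, 3, 3, 64), "logits/kernel": (128, 2)}
print(restorable_names(pretrained_shapes, model_shapes))  # ['conv1/kernel']
```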

By running a TensorFlow Session, we trained the parameters on “images_train” (image data of the training set) and “labels_train” (labels of the training set). The loss (the lower the better) and the accuracy (the higher the better) were tracked through the indicators “total_loss_train” and “acc_train”.

The validation loss (inference_loss) and accuracy (acc_val) were calculated every 10 training steps (the value of validate_interval) to evaluate how well the model generalizes.

Whenever the validation accuracy improved, we saved the model in the “.ckpt” format.
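The validate-then-save logic can be summarized as a small loop. The callbacks and the fake accuracy values below are stand-ins, not our training code; only `validate_interval` = 10 and the "save on improvement" rule come from the text.

```python
def train_with_validation(num_steps, train_step, validate, save, validate_interval=10):
    """Train for num_steps, validating every validate_interval steps and saving on improvement."""
    best_accuracy = float("-inf")
    for step in range(1, num_steps + 1):
        train_step(step)
        if step % validate_interval == 0:
            accuracy = validate(step)
            if accuracy > best_accuracy:
                best_accuracy = accuracy
                save(step)  # a real script would write a ".ckpt" checkpoint here
    return best_accuracy

# Stand-in callbacks: fake accuracies instead of a real model.
fake_accuracy = {10: 0.90, 20: 0.95, 30: 0.93, 40: 0.99}
saved_steps = []
best = train_with_validation(
    num_steps=40,
    train_step=lambda step: None,
    validate=lambda step: fake_accuracy[step],
    save=saved_steps.append,
)
print(saved_steps, best)  # [10, 20, 40] 0.99
```

Step 30 is skipped because its accuracy is lower than the best value seen so far.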


For face recognition, we adopted ResNet50, a version of the Residual Network (He, 2016). The skip connections of a Residual Network fuse features from lower and higher layers and give gradients a direct path to the lower layers, which mitigates vanishing or exploding gradients during training, so the parameters of the lower layers can be fully trained and tuned. As with the mask detection model, the input image size of the first layer was set to 112×112.
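The idea of a skip connection can be shown with a deliberately simplified block: the output is the input plus a learned correction, y = x + F(x). The sketch below uses two dense layers with NumPy instead of the convolutions and batch normalization of a real ResNet50 block, so it illustrates the principle only.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Compute y = x + F(x), with F(x) = relu(x @ w1) @ w2."""
    hidden = np.maximum(x @ w1, 0.0)
    return x + hidden @ w2

x = np.array([[1.0, -2.0, 3.0]])
zeros = np.zeros((3, 3))
# With all weights at zero, F(x) vanishes and the block passes x through unchanged.
print(np.array_equal(residual_block(x, zeros, zeros), x))  # True
```

Because the identity path is always present, the signal (and its gradient) reaches the lower layers even when F contributes little.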

We adopted the ResNet50 architecture provided by GitHub user deepinsight (2020) as the pre-trained model, and fine-tuned it with the following parameters: batch size = 8, epochs = 10, optimizer = Adam, and the learning rate for each epoch listed below.

Epoch    Learning Rate
1        0.1
2        0.01
3        0.01
4        0.01
5        0.001
6        0.001
7        0.0001
8        0.0001
9        0.00001
10       0.00001
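The schedule above can be expressed as a simple lookup. The values are those listed in the table; the names `LEARNING_RATES` and `learning_rate_for_epoch` are illustrative.

```python
LEARNING_RATES = [0.1, 0.01, 0.01, 0.01, 0.001, 0.001, 0.0001, 0.0001, 0.00001, 0.00001]

def learning_rate_for_epoch(epoch):
    """Return the learning rate for a 1-based epoch number (1 to 10)."""
    if not 1 <= epoch <= len(LEARNING_RATES):
        raise ValueError(f"epoch must be between 1 and {len(LEARNING_RATES)}")
    return LEARNING_RATES[epoch - 1]

print(learning_rate_for_epoch(1), learning_rate_for_epoch(5), learning_rate_for_epoch(10))
# 0.1 0.001 1e-05
```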

We also applied Softmax Loss with L2 regularization to calculate the total loss. The validation accuracy of the model finally reached 99.8% over the validation set from LFW.


Below is the directory structure of InsightFace_Tensorflow:

“./data/” stores MS-Celeb-1M data (train.rec and train.idx), LFW data (lfw.bin) and TFRecord data.

“./output/” stores pre-trained model files and the files obtained after training.

The functions of the Python scripts in the directory are as follows:

data_process.py: To convert the file format from “train.rec” and “train.idx” to “TFRecord”.

train_nets.py: To train the face recognition model.

eval_ckpt_file.py: To evaluate the recognition accuracy using LFW data.

freeze_graph.py: To freeze the model and convert the model format from “.ckpt” to “.pb” for deployment.

inference_extract_pb.py: To perform inference on a single image.

common.py: To define training parameters.

ResNet50.py: ResNet50 architecture.


To convert the file format to TFRecord, we should open the script “data_process.py”:

We assigned the paths of “train.idx” and “train.rec” to the variables “idx_path” and “bin_path” respectively to generate the training set.

At the top of the script “train_nets.py”, we imported the following packages.

The training hyperparameters were defined as follows:

The optimizer was set to Adam by default.

The TFRecord file, generated in “data_process.py”, carries the training set. We used the TensorFlow iterator to extract the training data from the file each time.

The Softmax Loss function was employed.

Transfer learning was applied to this training. We imported the pretrained_model, and loaded the parameters of all layers except the top one as the initialization parameters. Before that, we had saved the pb file provided by deepinsight (2020) as an npy file.

By running a TensorFlow Session, we trained the parameters on “images_train” (image data of the training set) and “labels_train” (labels of the training set). The loss (the lower the better) and the accuracy (the higher the better) were tracked through the indicators “total_loss_train” and “acc_train”.

The validation accuracy was calculated by “evaluation.py” every 10 training steps (the value of validate_interval).

Whenever the “np.mean” value (the accuracy indicator) improved, we saved the model in the “.ckpt” format.
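An accuracy of this kind is typically the mean of correct same / different decisions over pairs of face embeddings. The sketch below shows one common way to compute it and is not a copy of “evaluation.py”: the distance measure, the threshold of 1.0 and the toy embeddings are assumptions.

```python
import numpy as np

def verification_accuracy(emb_a, emb_b, same_person, threshold):
    """Share of pairs judged correctly when 'same person' means distance < threshold."""
    # L2-normalize so the Euclidean distance depends only on the angle between embeddings.
    emb_a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    emb_b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    distances = np.linalg.norm(emb_a - emb_b, axis=1)
    predicted_same = distances < threshold
    return float(np.mean(predicted_same == same_person))

# Three toy pairs of 2-D embeddings: same, different, same.
emb_a = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
emb_b = np.array([[0.9, 0.1], [0.0, 1.0], [0.1, 1.0]])
same_person = np.array([True, False, True])
print(verification_accuracy(emb_a, emb_b, same_person, threshold=1.0))  # 1.0
```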

It is noteworthy that the MTCNN used here is a pre-trained network provided by GitHub user OAID (2018).


Phase 3: Deployment

With well-trained models for mask detection, face detection and face recognition, we can deploy them in our various applications. We loaded the abovementioned models and ran inference with the C++ dynamic library provided by GitHub user Neargye (2020), and compiled the AI programs into DLL files so that different applications can call them.

For the smart temperature measurement system, we used the FLIR C3 camera for image acquisition. This camera can measure the surface temperature of an object by thermal imaging. Although the FLIR C3 SDK is written in C# and runs on Windows 10, we were able to integrate the DLL files with the SDK to deliver a series of features including image acquisition, face detection, identity verification, mask detection, temperature measurement and result display.


References

[1] Guo, Yandong, et al. "MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition." European Conference on Computer Vision (2016): 87-102.

[2] Huang, Gary B., et al. "Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments." Technical report (2007).

[3] Sandler, Mark, et al. "MobileNetV2: Inverted Residuals and Linear Bottlenecks." Computer Vision and Pattern Recognition (2018): 4510-4520.

[4] Open source code. Available from https://github.com/MrCPlusPlus/MobileFaceNet_Tensorflow_Pretrain.

[5] He, Kaiming, et al. "Deep Residual Learning for Image Recognition." Computer Vision and Pattern Recognition (2016): 770-778.

[6] Open source code. Available from https://github.com/deepinsight/insightface.

[7] Open source code. Available from https://github.com/OAID/FaceDetection.

[8] Open source code. Available from https://github.com/Neargye/hello_tf_c_api.
