As reported in the MIT Technology Review, Facebook's face recognition paper, accepted at CVPR 2014, drew heavy interest.
The system, called DeepFace, achieves nearly human-level recognition accuracy through large-scale training on 4.4 million face images of 4,030 individuals collected by the company, which makes it quite a catchy subject.
It is worth knowing what came before this work in order to understand what followed from it.
How does DeepFace Face Recognition work?
The DeepFace paper carries out three steps on each detected face rectangle:
- Two-dimensional alignment of the face rectangle
- Out-of-plane alignment (frontalization) using a three-dimensional face model
- Feeding the frontalized image into a Deep Neural Network (DNN) to obtain a face representation vector; two face rectangles are judged to belong to the same individual by comparing their representation vectors
The alignment stages in the first half of this pipeline involve no deep learning at all; a rough sketch of the overall flow follows.
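To make the three steps concrete, here is a minimal sketch of the verification pipeline in Python. Every helper name (detect_face, align_2d, frontalize_3d, embed) is a hypothetical placeholder rather than anything from a released DeepFace implementation; only the final comparison of L2-normalized representation vectors reflects one of the similarity measures used in the paper.

```python
import numpy as np

def detect_face(image):
    """Hypothetical placeholder: crop the detected face rectangle."""
    return image  # assume the input already is a face crop

def align_2d(face):
    """Hypothetical placeholder for step 1: 2D alignment from six fiducial points."""
    return face

def frontalize_3d(face):
    """Hypothetical placeholder for step 2: out-of-plane alignment with a 3D model."""
    return face

def embed(face):
    """Hypothetical placeholder for step 3: DNN forward pass -> representation vector."""
    rng = np.random.default_rng(0)
    return rng.standard_normal(4096)  # the paper's F7 layer is 4096-dimensional

def same_person(image_a, image_b, threshold=0.8):
    """Verify two faces by the cosine similarity of their representation vectors."""
    vectors = []
    for image in (image_a, image_b):
        face = frontalize_3d(align_2d(detect_face(image)))
        v = embed(face)
        vectors.append(v / np.linalg.norm(v))   # L2-normalize before comparing
    return float(vectors[0] @ vectors[1]) >= threshold
```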
1. Fiducial Point Detection and Alignment
In the two-dimensional alignment, six fiducial points on the face (the centers of both eyes, the tip of the nose, the corners of the mouth, and the middle of the lower lip) are detected, and the image is translated and scaled so that these points match a reference template.
The fiducial points themselves are detected with a Support Vector Regressor (SVR) trained on Local Binary Pattern (LBP) histogram features.
For the three-dimensional alignment, the set of fiducial points is expanded to 67, and their three-dimensional locations are obtained by matching them to corresponding reference points prepared in advance on a generic 3D face model (the average of a 3D face scan dataset).
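As a concrete illustration of the 2D step, the sketch below fits, by least squares, the similarity transform (scale, rotation, translation) that maps detected fiducial points onto a reference template. The point coordinates are made up for the example, and the paper actually applies this fit iteratively (warp, re-detect, refit), which is omitted here.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform mapping src -> dst.
    Solves for [a, s, tx, ty] in x' = a*x - s*y + tx,  y' = s*x + a*y + ty."""
    n = src.shape[0]
    A = np.zeros((2 * n, 4))
    A[0::2, 0], A[0::2, 1], A[0::2, 2] = src[:, 0], -src[:, 1], 1.0   # x' rows
    A[1::2, 0], A[1::2, 1], A[1::2, 3] = src[:, 1],  src[:, 0], 1.0   # y' rows
    b = dst.reshape(-1)
    (a, s, tx, ty), *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.array([[a, -s, tx],
                     [s,  a, ty]])   # 2x3 warp matrix (e.g. for cv2.warpAffine)

# Made-up example: six detected fiducial points and their template positions.
detected = np.array([[70, 95], [110, 95], [90, 120],
                     [75, 145], [105, 145], [90, 155]], dtype=float)
template = np.array([[60, 80], [100, 80], [80, 110],
                     [65, 135], [95, 135], [80, 145]], dtype=float)
print(fit_similarity(detected, template))
```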
2. Camera Matrix Estimation and Frontalization
From these 2D-3D correspondences, an affine camera matrix that projects points in three-dimensional space onto the camera plane is estimated.
Using this matrix, the positions the fiducial points would take when the face is viewed from the front are determined.
At the same time, triangles with the fiducial points as vertices are constructed, and the pixel values of the original image are warped onto the corresponding triangles of the frontal image.
In this way, a frontal representation of the face can be derived even from an image taken at an oblique angle.
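The core computation in this step is fitting the affine camera. The sketch below estimates a 2x4 affine camera matrix from 3D-2D point correspondences by least squares; the 3D points and the "true" camera are random stand-ins for the generic face model and an unknown pose, and the paper's additional refinements to this fit are omitted.

```python
import numpy as np

def fit_affine_camera(points_3d, points_2d):
    """Least-squares 2x4 affine camera P with [x, y]^T ~= P @ [X, Y, Z, 1]^T."""
    n = points_3d.shape[0]
    homog = np.hstack([points_3d, np.ones((n, 1))])           # (N, 4)
    P_T, *_ = np.linalg.lstsq(homog, points_2d, rcond=None)   # (4, 2)
    return P_T.T                                               # (2, 4)

# Stand-ins: 67 random "model" points and an arbitrary affine camera.
rng = np.random.default_rng(0)
model_points = rng.uniform(-1.0, 1.0, size=(67, 3))
P_true = np.array([[80.0,  5.0, 2.0, 96.0],
                   [ 3.0, 85.0, 1.0, 96.0]])
image_points = np.hstack([model_points, np.ones((67, 1))]) @ P_true.T

P = fit_affine_camera(model_points, image_points)
print(np.round(P, 2))   # recovers P_true; with real detections it is only approximate
```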
3. Deep Neural Network specs
In image and face recognition, it is typical to use convolution/pooling layers, whose shared weights and downsampling absorb positional variation.
DeepFace, however, already aligns its input images, so there is little need for positional invariance.
Weight sharing suits low-level features (edges, textures, gradients, etc.), but higher-level features (eyes, nose, lips, and larger facial parts) look different at different positions, so position-specific filters capture them better.
From the fourth layer onward, DeepFace therefore uses locally connected layers with separate weights at each position. Furthermore, pooling is applied only after the first convolutional layer.
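This description translates into an architecture roughly like the Keras sketch below. It assumes a TensorFlow/Keras version that still ships layers.LocallyConnected2D (the layer was removed from recent Keras releases), and the filter counts, kernel sizes, and the stride on the first locally connected layer are approximations rather than an exact reproduction of the paper's configuration.

```python
from tensorflow.keras import layers, models

def build_deepface_like(num_identities=4030):
    """Rough DeepFace-style network: shared conv early, locally connected later."""
    return models.Sequential([
        layers.Input((152, 152, 3)),                  # frontalized RGB face crop
        layers.Conv2D(32, 11, activation="relu"),     # C1: shared weights
        layers.MaxPooling2D(3, strides=2),            # M2: the only pooling layer
        layers.Conv2D(16, 9, activation="relu"),      # C3: shared weights
        # L4-L6: separate weights at every position, no weight sharing
        # (the stride here is only to keep this sketch's parameter count small).
        layers.LocallyConnected2D(16, 9, strides=2, activation="relu"),
        layers.LocallyConnected2D(16, 7, activation="relu"),
        layers.LocallyConnected2D(16, 5, activation="relu"),
        layers.Flatten(),
        layers.Dense(4096, activation="relu"),        # F7: the face representation
        layers.Dense(num_identities, activation="softmax"),  # F8: identity classes
    ])

model = build_deepface_like()
model.summary()
```

Running model.summary() on this sketch shows that nearly all of the parameters sit in the locally connected and fully connected layers, which is the trade-off that the alignment step makes affordable.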
Pyramid Convolutional Neural Network (CNN)
A related paper from Face++ also became popular. It used a conventional CNN architecture but introduced a new training method.
The network is split into shared and unshared layers, with the shared layers placed on the input side.
First, a network with a single shared layer is trained using small patches of the face image as input.
Only the weights of that shared layer are kept; another shared layer is then stacked on top, and training continues with slightly larger patches.
Repeating this process eventually yields a deep CNN.
This is a kind of greedy layer-wise training, but it is distinctive in being supervised, with an extra (unshared) layer attached on the output side at each stage.
The first stage learns from small patches, and the paper proposes switching the unshared layers behind the shared ones depending on the position of the patch (eyes, nose, mouth, etc.), which raises training cost but improves accuracy.
Pyramid CNN thus focuses on learning identity-related features from different regions and scales of the face; a sketch of the training schedule is shown below.
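Here is a minimal sketch of that schedule, under my reading of the idea: the shared convolutional layers are grown one level at a time, each level is trained on progressively larger face patches with its own temporary supervised head, and only the shared weights carry over to the next level. Patch sizes, layer widths, and the dummy data are illustrative, and the paper's switching of unshared layers by patch position is not reproduced here.

```python
import numpy as np
from tensorflow.keras import layers, models

NUM_IDENTITIES = 100   # illustrative; the real training sets have far more people

def build_level(shared_layers, patch_size):
    """Stack the shared conv layers, then a temporary (unshared) supervised head."""
    model = models.Sequential([layers.Input((patch_size, patch_size, 3))])
    for layer in shared_layers:
        model.add(layer)
    model.add(layers.Flatten())
    model.add(layers.Dense(NUM_IDENTITIES, activation="softmax"))  # discarded later
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model

shared = []   # grows by one conv/pool level per pyramid step
for level, patch_size in enumerate([32, 48, 64]):     # progressively larger patches
    shared.append(layers.Conv2D(16 * (level + 1), 3, activation="relu"))
    shared.append(layers.MaxPooling2D(2))
    model = build_level(shared, patch_size)

    # Dummy arrays standing in for face patches cropped at this scale.
    x = np.random.rand(64, patch_size, patch_size, 3).astype("float32")
    y = np.random.randint(0, NUM_IDENTITIES, size=64)
    model.fit(x, y, epochs=1, verbose=0)
    # The layer objects in `shared` keep their trained weights, so the next level
    # reuses them; only the temporary Dense head is rebuilt from scratch.
```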
Ending Notes
I found the paper quite interesting, and if you liked reading this article, follow me as an author. Until then, keep coding!