oreorite.blogg.se - Rectlabel yolo

The sliding window is handled by the convolutional nature of the RPN, which allows it to scan all regions in parallel (on a GPU).

How fast can the RPN scan that many anchors? Pretty fast, actually. In practice, there are about 200K anchors of different sizes and aspect ratios, and they overlap to cover as much of the image as possible. Which are boxes distributed over the image area, as show on the left. The regions that the RPN scans over are called anchors. The RPN is a lightweight neural network that scans the image in a sliding-window fashion and finds areas that contain objects.

Simplified illustration showing 49 anchor boxes I’ll continue to refer to the backbone feature map as if it’s one feature map, but keep in mind that when using FPN, we’re actually picking one out of several at runtime. We pick which to use dynamically depending on the size of the object. the top layer of the first pyramid), in FPN there is a feature map at each level of the second pyramid. RPN introduces additional complexity: rather than a single backbone feature map in the standard backbone (i.e. Our implementation of Mask RCNN uses a ResNet101 + FPN backbone.Ĭode Tip: The FPN is created in MaskRCNN.build(). By doing so, it allows features at every level to have access to both, lower and higher level features. The Feature Pyramid Network (FPN) was introduced by the same authors of Mask R-CNN as an extension that can better represent objects at multiple scales.įPN improves the standard feature extraction pyramid by adding a second pyramid that takes the high level features from the first pyramid and passes them down to lower layers. While the backbone described above works great, it can be improved upon.