

OK, but what do these sampling modes actually do? I’ll show you in a moment what STRICT_ALIGN_ENDPOINTS_MODE and UPSAMPLE_MODE look like, and when you’d use these in your model.

Pixel Shuffle

An interesting way to upsample data is to use “pixel shuffle”. This can avoid pixel artifacts that may be introduced by other methods, in particular by deconvolution. Let’s say you want to upsample a feature map with dimensions H×W by a factor of 2.
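To make the pixel shuffle idea a bit more concrete, here is a minimal sketch in plain Python. It assumes the channel layout used by PyTorch's PixelShuffle, where a tensor of shape (C·r², H, W) is rearranged into (C, H·r, W·r); that layout is my assumption and is not spelled out above. The function name `pixel_shuffle` and the nested-list representation are just for illustration.

```python
def pixel_shuffle(fmap, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Assumed layout: output pixel (y, x) of output channel c is read from
    input channel c*r*r + (y % r)*r + (x % r) at position (y // r, x // r).
    """
    channels_in = len(fmap)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(fmap[0]), len(fmap[0][0])

    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for y in range(h * r):
            for x in range(w * r):
                src_channel = c * r * r + (y % r) * r + (x % r)
                out[c][y][x] = fmap[src_channel][y // r][x // r]
    return out


# Four channels of size 1x2 become one channel of size 2x4 (factor 2).
feature_map = [
    [[1, 2]],
    [[10, 20]],
    [[100, 200]],
    [[1000, 2000]],
]
shuffled = pixel_shuffle(feature_map, 2)
```

No interpolation happens here: every output pixel is copied from exactly one input value. The extra resolution comes entirely from the extra channels.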

When you use UPSAMPLE_MODE with a ResizeBilinear layer, it gives exactly the same results as the Upsample layer in bilinear mode. That means you can use Upsample if you have an integer scaling factor and ResizeBilinear if you don’t. What about ALIGN_ENDPOINTS_MODE? Well, there is no real difference between STRICT_ALIGN_ENDPOINTS_MODE and ALIGN_ENDPOINTS_MODE, except when the output tensor is just one pixel wide (or tall). In “strict” align mode, this would sample from pixel 0 in the source image, but in ALIGN_ENDPOINTS_MODE it samples from the center pixel. That’s the only place they differ; in all other situations, both modes work the same. We’ll conveniently skip over ROI_ALIGN_MODE as I don’t really know what it’s used for (but I suspect it’s intended for the CropResize layer).
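The one-pixel special case is easier to see with numbers. The sketch below computes the source coordinates that each of the two "align endpoints" modes would sample along one axis. The formulas follow my reading of the SamplingMode comments in the Core ML model specification (evenly spaced points between 0 and X-1, like `numpy.linspace`), so treat them as an assumption rather than the exact implementation. The helper names are made up for this example.

```python
def linspace(start, end, n):
    """Evenly spaced points from start to end inclusive (n >= 1)."""
    if n == 1:
        return [float(start)]
    step = (end - start) / (n - 1)
    return [start + i * step for i in range(n)]


def strict_align_endpoints(src_size, dst_size):
    # Assumed: always sample between the first and last source pixel.
    return linspace(0.0, src_size - 1, dst_size)


def align_endpoints(src_size, dst_size):
    # Assumed: same as strict, except a one-pixel output uses the center.
    if dst_size == 1:
        return [(src_size - 1) / 2]
    return linspace(0.0, src_size - 1, dst_size)


# Output is one pixel wide: the modes disagree.
strict_single = strict_align_endpoints(5, 1)   # pixel 0
center_single = align_endpoints(5, 1)          # center pixel

# Any other output size: the modes agree.
strict_three = strict_align_endpoints(5, 3)
center_three = align_endpoints(5, 3)
```

With a source that is 5 pixels wide, the strict mode picks coordinate 0 for a single output pixel, while ALIGN_ENDPOINTS_MODE picks coordinate 2, the middle of the row.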

ResizeBilinear doesn’t have an option for nearest neighbors, but it does allow you to select the sampling mode that will be used by the bilinear interpolation. One of these modes, UPSAMPLE_MODE, gives the same results as the Upsample layer. This sampling mode is actually very relevant to our investigation. The difference between these sampling modes is in how they determine which pixels to read from the source tensor. The two modes we’re going to look at in this blog post are STRICT_ALIGN_ENDPOINTS_MODE and UPSAMPLE_MODE.
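Here is a rough one-dimensional sketch of how the choice of sampling mode changes which source positions get read, and therefore what the interpolated output looks like. The step sizes are my interpretation of the Core ML specification: (X-1)/(N-1) for STRICT_ALIGN_ENDPOINTS_MODE and X/N for UPSAMPLE_MODE, clamped to the last pixel. This is not code taken from Core ML itself. The functions `sample_grid` and `resize_linear` are hypothetical helpers.

```python
import math


def sample_grid(src_size, dst_size, mode):
    """Source coordinates to read for each output pixel (assumed formulas)."""
    if mode == "STRICT_ALIGN_ENDPOINTS_MODE":
        step = 0.0 if dst_size == 1 else (src_size - 1) / (dst_size - 1)
    elif mode == "UPSAMPLE_MODE":
        step = src_size / dst_size
    else:
        raise ValueError("unsupported mode in this sketch: " + mode)
    # Clamp so we never read past the last source pixel.
    return [min(src_size - 1, i * step) for i in range(dst_size)]


def resize_linear(row, dst_size, mode):
    """Linear interpolation along one axis using the chosen sampling grid."""
    out = []
    for pos in sample_grid(len(row), dst_size, mode):
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(row) - 1)
        frac = pos - lo
        out.append(row[lo] * (1.0 - frac) + row[hi] * frac)
    return out


row = [0.0, 10.0]
upsample_result = resize_linear(row, 4, "UPSAMPLE_MODE")
strict_result = resize_linear(row, 4, "STRICT_ALIGN_ENDPOINTS_MODE")
```

In this sketch, UPSAMPLE_MODE reads positions 0, 0.5, 1, 1 and repeats the last pixel, while the strict mode spreads its samples evenly so that the first and last output pixels line up with the first and last source pixels.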

I’m not going to talk about it in this blog post; this layer works the same way as ResizeBilinear when it comes to resizing. The biggest difference between these layers and an API such as Core Image’s CILanczosScaleTransform or Accelerate’s vImageScale is that they work on feature maps that may have many more channels than the 3 or 4 channels in a regular RGBA image. Other than that, they do pretty much the same thing. This blog post is mostly about upsampling, but convolutional neural networks also have various ways to downsample feature maps. This is typically done using a conv layer with stride 2 or using pooling layers. And of course, ResizeBilinear can also scale down. The Upsample layer doesn’t have many options: it only lets you choose between NN (nearest neighbor) and bilinear interpolation.
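For reference, nearest neighbor upsampling by an integer factor is simple enough to write out by hand. The sketch below works on a feature map stored as a nested list of shape (channels, height, width) and repeats every pixel `scale` times in both directions, independently for each channel. It only illustrates the idea and is not how Core ML implements the Upsample layer; the function name is made up.

```python
def upsample_nearest(feature_map, scale):
    """Repeat each pixel `scale` times horizontally and vertically.

    feature_map is a nested list with shape (channels, height, width);
    the channel count can be anything, not just 3 or 4.
    """
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    out = []
    for channel in feature_map:
        rows = []
        for row in channel:
            wide = [value for value in row for _ in range(scale)]
            # Copy the widened row so the output rows don't share storage.
            rows.extend(list(wide) for _ in range(scale))
        out.append(rows)
    return out


# Two channels of 2x2 become two channels of 4x4.
small = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
]
big = upsample_nearest(small, 2)
```

Each channel is handled separately, which is why the same code works whether the feature map has 3 channels or several hundred.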
