Occlusion output size

Under the 'Visualising CNN' section, in the Occlusion Analysis subsection, the output size is calculated as int(np.ceil((height-occ_size)/occ_stride)), which looks incorrect.

The calculation has to be similar to the output-size calculation of a convolution operation, since we are essentially doing a similar operation (sliding across the image patch-wise and replacing each patch with a single value).

Shouldn't the correct formula be int(np.ceil((height-occ_size)/occ_stride + 1))?
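For a quick numeric check of the two formulas (a minimal sketch; 9, 3 and 3 are just example values):

import numpy as np

height, occ_size, occ_stride = 9, 3, 3

# Standard convolution output size: floor((H - K) / S) + 1
conv_out = (height - occ_size) // occ_stride + 1                # 3

# Formula currently used in the notebook
current = int(np.ceil((height - occ_size) / occ_stride))        # 2

# Formula proposed above
proposed = int(np.ceil((height - occ_size) / occ_stride + 1))   # 3

print(conv_out, current, proposed)  # 3 2 3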

Let me share my understanding of the width and height used inside this occlusion function. I hope it also addresses your point.

1. Regarding your point that the calculation has to be similar to the output size of a convolution operation, since we are essentially doing a similar operation (sliding across the image patch-wise and replacing each patch with a single value):

  • For selecting each cell with a particular height and a particular width, inside the for loop we define each cell this way:

    h_start = h * occ_stride
    w_start = w * occ_stride
    h_end = min(height, h_start + occ_size)
    w_end = min(width, w_start + occ_size)

  • h_start = h * occ_stride moves from one cell to the next along the height. Since occ_stride is taken to be 50, it moves 50 units down the height in each iteration.

  • w_start = w * occ_stride moves from one cell to the next along the width. Since occ_stride is 50, it moves 50 units across the width in each iteration.

  • h_end = min(height, h_start + occ_size) sets the patch height from the location of h_start, using min to make sure the patch does not exceed the image boundary.

  • w_end = min(width, w_start + occ_size) sets the patch width from the location of w_start, again using min to make sure it does not exceed the boundary.

  • Since we are doing this operation on each cell, the output has to mirror it, which amounts to the reverse operation: the output height shrinks by a factor of occ_stride, and the (height - occ_size) term accounts for the size of the window itself, so that we don't lose pixels at the very edge.

2. Regarding your question: shouldn't the correct formula be int(np.ceil((height-occ_size)/occ_stride + 1))?

  • int(np.ceil((height-occ_size)/occ_stride))
    The ceil of a scalar x is the smallest integer i such that i >= x,
    so we don't really need the +1 here.
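(A quick check of what np.ceil does with these values; it only rounds up fractional results, an exact division is left unchanged:)

import numpy as np

print(np.ceil(6 / 3))  # 2.0, already an integer, nothing to round up
print(np.ceil(7 / 3))  # 3.0, since 2.33... is rounded up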

There's a problem with the for loop as well; we'll come to that later.
First, let's address the size of the output. We compute output_height and output_width and then use them to construct the heatmap.
Let's take an example. Tell me: if we have a 9x9 input, an occlusion size of 3 and a stride of 3, what should the output size be?

Please correct me if my interpretation of the code is wrong, but this is how I think the code will work with the values you have taken as an example.

Input image with dimensions 9x9, which means width = 9 and height = 9.
occ_size = 3 and occ_stride = 3.

output_height = int(np.ceil((height - occ_size) / occ_stride))
              = (9 - 3) / 3
              = 2

output_width = int(np.ceil((width - occ_size) / occ_stride))
             = (9 - 3) / 3
             = 2

heatmap = torch.zeros((output_height, output_width))

This creates space for the heatmap and initialises it with zeros:

| 0 0 |
| 0 0 |

for h in range(0, 9):
    for w in range(0, 9):

        # Values for the first iteration (h = 0, w = 0):
        h_start = h * occ_stride                  # 0 * 3 = 0
        w_start = w * occ_stride                  # 0 * 3 = 0
        h_end = min(height, h_start + occ_size)   # min(9, 0 + 3) = 3
        w_end = min(width, w_start + occ_size)    # min(9, 0 + 3) = 3

        if (w_end) >= width or (h_end) >= height:
            continue

        # Deep copy of the image, detached from the computation graph
        input_image = image.clone().detach()
        # Set the selected patch to occ_pixel (0.5). The patch loses its
        # red/green/blue values and appears grey in the occluded image.
        input_image[:, :, w_start:w_end, h_start:h_end] = occ_pixel

        output = model(input_image)
        output = nn.functional.softmax(output, dim=1)
        # Probability of just the label we are interested in. For the dome
        # image from the video: what is the probability that it is still a
        # **dome** after the selected patch has been set to 0.5?
        prob = output.tolist()[0][label]

        heatmap[h, w] = prob

It will work similarly on all the other iterations, and whenever we hit the boundary,

if (w_end) >= width or (h_end) >= height:
    continue

this check will save us from going out of bounds.
However, I think the ranges should be 0 to output_height and 0 to output_width, because for all other values we go out of bounds. It won't really matter here, though, since the boundary check already handles those cases.
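To sanity-check my reading, here is a small runnable sketch (model call omitted; it only records which windows survive the continue) using the same loop bounds and boundary check as above:

height = width = 9
occ_size = occ_stride = 3

windows = []
for h in range(0, height):
    for w in range(0, width):
        h_start = h * occ_stride
        w_start = w * occ_stride
        h_end = min(height, h_start + occ_size)
        w_end = min(width, w_start + occ_size)
        if (w_end) >= width or (h_end) >= height:
            continue
        windows.append((h_start, h_end, w_start, w_end))

print(windows)
# [(0, 3, 0, 3), (0, 3, 3, 6), (3, 6, 0, 3), (3, 6, 3, 6)]
# Only a 2x2 grid of patches: the row and column of patches touching
# the right and bottom edges are skipped by the >= check.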

Your interpretation of the code is perfect, and yes, the ranges could use output_width and output_height to avoid the extra redundant iterations.

But from the explanation of occlusion, shouldn't the output size be 3x3? We need to look at every patch of the 9x9 image, right? So there would be 3 patches along the width and 3 along the height.

Yes, I agree with you. There should be 3 patches, but this code will only produce 2 patches along each axis, each 3 units in height and width. I think for a 9x9 input image with occ_size and occ_stride of 3, the ranges of both for loops should be 0 to output_height + 1 and 0 to output_width + 1.

Also, the heatmap initialised with zeros should have output_height + 1 rows and output_width + 1 columns.

And then, if we remove the equals sign (=) from this line, as done below:

if (w_end) > width or (h_end) > height:
    continue

it will work fine: it will include three patches along the width of the image and repeat three times along the height, eventually covering all the cells of the input image.

Or, instead of making this many changes, how about revisiting the output_size formula and modifying it to int(np.ceil((height-occ_size)/occ_stride + 1))? :stuck_out_tongue:
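As a sanity check for the 9x9 example, here is the same sketch with the three changes proposed above (the +1 formula, the tighter loop ranges, and the strict > comparison):

import numpy as np

height = width = 9
occ_size = occ_stride = 3

output_size = int(np.ceil((height - occ_size) / occ_stride + 1))  # 3

windows = []
for h in range(0, output_size):
    for w in range(0, output_size):
        h_start = h * occ_stride
        w_start = w * occ_stride
        h_end = min(height, h_start + occ_size)
        w_end = min(width, w_start + occ_size)
        if (w_end) > width or (h_end) > height:
            continue
        windows.append((h_start, h_end, w_start, w_end))

print(len(windows))  # 9, a full 3x3 grid of patches covering the image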


No, it seems like I don't have that option now, since I already discarded it with another explanation. :crazy_face:

However, are you sure you want to modify it to int(np.ceil((height-occ_size)/occ_stride + 1))

OR

int(np.ceil((height-occ_size)/occ_stride - 1))?

I think the latter will work, since the ceil of a scalar x is the smallest integer i such that i >= x. :sweat_smile:
And you will still have to remove the equals sign (=) from this line, as done below,
if (w_end) > width or (h_end) > height:
    continue
because we want to include the last iteration; only then will we cover all the cells of the input image.
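For what it's worth, plugging the 9x9 example into both candidates:

import numpy as np

height, occ_size, occ_stride = 9, 3, 3

print(int(np.ceil((height - occ_size) / occ_stride + 1)))  # 3
print(int(np.ceil((height - occ_size) / occ_stride - 1)))  # 1

# Only the +1 version matches the 3 patches per side worked out above.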