Dealing with large image size in pix2pix

I am using a pix2pix model for a project, and for training purposes I am using images of size 256x256.

Now that my model has been trained, I have a test image of size 1024x1024, and resizing it to 256x256 will degrade its quality. Is there any method to run my model on this 1024x1024 image?

I read somewhere that we can slide a 256x256 window over the 1024x1024 image and then stitch the output images together, but I think the borders of the stitched outputs will not be smooth. Any suggestions on how to do it?
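For reference, this is the naive non-overlapping tiling I mean. It is only a sketch, assuming a PyTorch generator `G` (placeholder name) that maps a 1xCx256x256 tensor to an output of the same shape:

```python
import torch

def tile_and_stitch(img, G, patch=256):
    """Naive non-overlapping tiling: run G on each 256x256 tile and paste the
    outputs back. Seams can appear at the tile borders, which is my concern.
    Assumes H and W are divisible by `patch` and G's output matches the
    input shape."""
    _, _, H, W = img.shape               # img: 1 x C x H x W
    out = torch.zeros_like(img)
    with torch.no_grad():
        for y in range(0, H, patch):
            for x in range(0, W, patch):
                tile = img[:, :, y:y+patch, x:x+patch]
                out[:, :, y:y+patch, x:x+patch] = G(tile)
    return out
```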

I don't think stitching outputs is a good idea.
For this, you might want to train a separate upsampling/downsampling network, which you can stack before and after pix2pix.
Any other interpolation method may also help, but if quality is the concern, training this extra network is the better option.

My generator is a U-Net architecture and is fully convolutional, so it should be independent of the input image size. Will it also produce good outputs on larger images at test time?
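A fully convolutional U-Net will at least accept the larger input shape-wise. A quick sanity check, just a sketch assuming a generator `G` trained on 256x256 (placeholder name) and an input size divisible by the network's total downsampling factor:

```python
import torch

# Hypothetical generator G trained on 256x256 inputs.
G.eval()
with torch.no_grad():
    small = torch.randn(1, 3, 256, 256)
    large = torch.randn(1, 3, 1024, 1024)
    print(G(small).shape)  # e.g. torch.Size([1, 3, 256, 256])
    print(G(large).shape)  # e.g. torch.Size([1, 3, 1024, 1024]) -- the shape works,
                           # but quality may still drop, since the receptive field
                           # and feature statistics were learned at 256x256
```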

I have also come across a paper: https://arxiv.org/pdf/2008.08579.pdf

In this paper, the input image size was 512x512, and at test time the author uses 5120x5120 images. He slides a 512x512 window with a stride of 256 and, in the overlap regions, computes an average using a Gaussian weight distribution (section 2.5 of the paper). Here I have some doubts:
1. He mentions that we will get an overlap of 19x19, but I think the overlap will be 256x256.

2. The weighting function he mentions is exp(−d²/(2σ²)), where d is the distance of a pixel in the overlap from the patch centre. Is d the Euclidean distance between two pixels in 2D space? (My reading of it is sketched below.)
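This is how I read that weight: each patch gets a per-pixel weight map that decays with the Euclidean distance from the patch centre. A sketch of my interpretation, where σ is a free parameter I picked arbitrarily:

```python
import torch

def gaussian_weight_map(patch_size=512, sigma=128.0):
    """Per-pixel weight exp(-d^2 / (2*sigma^2)), where d is the Euclidean
    distance of each pixel from the patch centre."""
    ys, xs = torch.meshgrid(torch.arange(patch_size), torch.arange(patch_size),
                            indexing="ij")
    c = (patch_size - 1) / 2.0
    d2 = (ys - c) ** 2 + (xs - c) ** 2          # squared Euclidean distance
    return torch.exp(-d2 / (2.0 * sigma ** 2))  # shape: (patch_size, patch_size)
```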

  1. ((5120 - 512) / 256) + 1 = 19, right?
  2. I might have to look over this weighting function. Can you point it out in the shared paper? A screenshot or equation number would help.

The calculation you mention in point 1 is the formula for the output size after a convolution, but here each 512x512 input window gives a 512x512 output. And with a stride of 256, I think we will get an overlap of 256x256 between two adjacent windows.

Yes, but as the paper says, the output will be 19x19, not 512x512, right?

Oh, I understood this by working through the toy example shown below.

In the figure below, the 2x2 overlap at the centre of all the patches (windows) corresponds to the 19x19 overlap mentioned in the paper.

But what about the other overlaps between two adjacent patch outputs (red-green, green-yellow, yellow-blue, blue-red), each of size 2x2?

Aren't we ignoring these 4 overlaps if we consider only the central (combined) overlap?

So can't we calculate the weight for every overlapping pixel (centre overlap + adjacent overlaps) using the formula they mention?
Suppose a pixel is covered by 2 patches. We compute its weight from the distance of the pixel to each corresponding patch centre, say w1 and w2, and then compute the value at that location as (w1*p1 + w2*p2)/(w1 + w2). In the case of an overlap caused by 4 patches we would have 4 weights.
Is this approach right?
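To make the idea concrete, here is a minimal sliding-window sketch of this weighted accumulation. It is my own interpretation, not the paper's code, and it assumes a PyTorch generator `G` (placeholder name) mapping a 1xCx512x512 tensor to an output of the same shape, with σ chosen arbitrarily:

```python
import torch

def sliding_window_blend(img, G, patch=512, stride=256, sigma=128.0):
    """Run G on overlapping windows and blend the outputs as
    out[pixel] = sum_i(w_i * p_i) / sum_i(w_i) over all patches i covering it."""
    _, c, H, W = img.shape                 # assumes G's output also has c channels
    # Per-pixel weight exp(-d^2 / (2*sigma^2)), d = distance from the patch centre
    ys, xs = torch.meshgrid(torch.arange(patch), torch.arange(patch), indexing="ij")
    centre = (patch - 1) / 2.0
    w = torch.exp(-((ys - centre) ** 2 + (xs - centre) ** 2) / (2 * sigma ** 2))
    w = w[None, None]                      # shape 1 x 1 x patch x patch

    acc = torch.zeros(1, c, H, W)          # weighted sum of patch outputs
    norm = torch.zeros(1, 1, H, W)         # sum of weights per pixel
    with torch.no_grad():
        for y in range(0, H - patch + 1, stride):
            for x in range(0, W - patch + 1, stride):
                out = G(img[:, :, y:y+patch, x:x+patch])
                acc[:, :, y:y+patch, x:x+patch] += w * out
                norm[:, :, y:y+patch, x:x+patch] += w
    return acc / norm                      # every pixel is covered at least once
```

For a 5120x5120 image with patch 512 and stride 256, the loops visit 19 window positions per dimension, and every pixel (whether it lies in a 2-patch or 4-patch overlap) is normalised by the sum of the weights of all patches covering it.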