
Problem with older approaches is finding an effective loss (what to minimize): CNNs trained with Euclidean (L2) distance can minimize it by averaging all plausible outputs, which yields blurry results.
Coming up with a good loss/objective function by hand is hard.
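A quick numeric sanity check (my own illustration, not from the paper) of why L2 favors averaging: when two outputs are equally plausible, the expected squared error is lowest for the blend between them, i.e. the "blurry" prediction.

```python
import numpy as np

# Two equally plausible "true" pixel values (e.g., an edge could land on either side).
candidates = np.array([0.0, 1.0])

def expected_l2(prediction):
    # Expected squared error if either candidate is equally likely.
    return np.mean((candidates - prediction) ** 2)

print(expected_l2(0.0))  # 0.5  -- committing to one sharp mode
print(expected_l2(1.0))  # 0.5
print(expected_l2(0.5))  # 0.25 -- the averaged ("blurry") prediction wins under L2
```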
Approach
- Use GANs: the objective becomes "generate realistic, indistinguishable-from-real results", which is exactly what we want.
- The GAN learns a "structured loss" that penalizes the joint configuration of the output pixels rather than each pixel independently.
Related
- cGANs (conditioned on labels, text, or images) have been used for applications like inpainting, future frame prediction, and style transfer.
Architecture

- Generator uses a U-Net; discriminator uses a convolutional PatchGAN classifier, which penalizes structure at the scale of image patches.
Loss function

Add an L1 loss (less blurry than L2). The discriminator's job remains the same, while the generator gets one more task.
Tasks
1. Fool the discriminator.
2. Be near the ground-truth output (L1 distance).
Without z, the net could still learn a mapping x => y, but a deterministic one, so it would fail to match any distribution other than a delta function; hence Gaussian noise z is given as an input.
Open problem: cGANs that produce highly stochastic output and capture the full entropy of the conditional distribution they model.
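A minimal PyTorch-style sketch of the combined generator objective described above (adversarial term plus a lambda-weighted L1 term); the network classes and the lambda value are assumptions for illustration, not the paper's exact code:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # adversarial loss on discriminator logits
l1 = nn.L1Loss()              # pixel-wise closeness to ground truth
lam = 100.0                   # weight on the L1 term (assumed value)

def generator_loss(D, x, y_real, y_fake):
    # Task 1: fool the discriminator -> D should call the (input, fake) pair "real".
    pred_fake = D(torch.cat([x, y_fake], dim=1))
    adv = bce(pred_fake, torch.ones_like(pred_fake))
    # Task 2: stay near the ground-truth output.
    recon = l1(y_fake, y_real)
    return adv + lam * recon
```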
Generator:
U-Net with skip connections, which let input and output share low-level information directly, e.g. in image colorization the input and output share the locations of prominent edges.
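A toy sketch of the skip-connection idea (depth and channel widths are illustrative, not the paper's exact U-Net): decoder layers concatenate the mirrored encoder features so edge/location detail can pass straight through.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Illustrative encoder-decoder with a single skip connection."""
    def __init__(self):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2))
        self.bottleneck = nn.Sequential(nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU())
        self.up1 = nn.Sequential(nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU())
        # Final decoder layer sees its own features *and* the mirrored encoder features.
        self.up2 = nn.ConvTranspose2d(64 + 64, 3, 4, stride=2, padding=1)

    def forward(self, x):
        d = self.down(x)                       # encoder features (kept for the skip)
        b = self.bottleneck(d)
        u = self.up1(b)
        out = self.up2(torch.cat([u, d], 1))   # skip connection: concatenate encoder features
        return torch.tanh(out)
```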
Discriminator
L1 and L2 produce blurry results on image generation: these losses fail to encourage high-frequency crispness but do capture low-frequency structure accurately. So let the discriminator focus only on high frequencies and rely on the L1 term for low-frequency correctness.
PatchGAN: penalizes structure at the scale of patches by classifying whether each N×N patch is real or fake. It is run convolutionally across the image, and the responses are averaged to give the final output of D.
More formally: D effectively models the image as a Markov random field, assuming independence between pixels farther apart than a patch diameter.
The cGAN makes images more colorful, whereas L1 makes them blurry/grayish when uncertain:
L1, when uncertain about a pixel's value, incentivizes averaging over the plausible values.
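A rough PatchGAN-style discriminator sketch (layer count and channel widths are assumptions; the key point is that the output is a grid of per-patch real/fake logits that gets averaged, rather than a single whole-image score):

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Fully convolutional D: each output logit judges one N x N receptive field."""
    def __init__(self, in_ch=6):  # input image + output image concatenated on channels
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, stride=1, padding=1),  # 1-channel grid of patch logits
        )

    def forward(self, x, y):
        logits = self.net(torch.cat([x, y], dim=1))  # shape: (B, 1, H', W')
        return logits.mean(dim=[2, 3])               # average patch responses -> final D output
```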
Novelty
- cGAN
- U-net for generator
- PatchGAN for discriminator
- L1 loss in addition to the adversarial (cGAN) loss