Interpreting Statistics View

I am running @Birdstream’s super-resolution model from the Show the Community forum post here

(Performance update: 33% training in about 2h50 min)

The IoU (aka Jaccard similarity) is indistinguishable from 1 after 32% training (actually 0.993), but the predicted image is noticeably noisy. The loss over all epochs is small but non-zero. Can anyone shed some light on how the IoU is being calculated here?
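For reference, my mental model of the calculation is the textbook Jaccard/IoU on binarized images - something like the sketch below (I don't know what the tool computes internally; the 0.5 threshold is just my guess). It would also explain why a noisy prediction can still score close to 1:

```python
import numpy as np

def iou(pred, target, threshold=0.5):
    """Textbook Jaccard/IoU on thresholded arrays: |A ∩ B| / |A ∪ B|."""
    pred_bin = pred >= threshold
    target_bin = target >= threshold
    intersection = np.logical_and(pred_bin, target_bin).sum()
    union = np.logical_or(pred_bin, target_bin).sum()
    return intersection / union if union > 0 else 1.0

# A noisy prediction can still score near 1: most pixels end up on the
# same side of the threshold as the target even with visible noise.
target = np.random.rand(256, 256)
noisy_pred = np.clip(target + np.random.normal(0, 0.05, target.shape), 0, 1)
print(iou(noisy_pred, target))  # typically > 0.9 despite the noise
```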

Looking at the global loss, we also see this:

Epoch 0 at the left of “Loss over all epochs” - is that 0-based counting?

And… validation loss = 0 at epoch 1. How is that possible? I believe the loss used is quadratic rather than cross-entropy, and every per-sample term must be non-negative, so no cancellation is possible.
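To spell out that reasoning (assuming the quadratic loss is a plain mean squared error, which is my guess):

```python
import numpy as np

def mse(y_hat, y):
    # Every term (y - y_hat)**2 is non-negative, so the mean can only be
    # exactly zero if every prediction matches its target exactly.
    return np.mean((y - y_hat) ** 2)
```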

All insights welcome!

As for the first question I really can’t say, but for the second I believe you are currently training on epoch 1 (the 2nd epoch) and there is no validation loss calculated yet, so it just displays it as zero :slight_smile:

Yeah, you could be right…

On the one hand: I’ve been here before - it simply hasn’t been calculated yet (for this epoch).

On the other: training and validation are then handled differently, which confuses the easily confused (like me).

I prefer a fill-NaN-with-the-previous-value (forward-fill) approach - then at least it doesn’t mislead quite as much.
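Something like pandas’ forward-fill, i.e. repeat the last known value until a new one arrives (the numbers here are made up):

```python
import numpy as np
import pandas as pd

# Validation loss per epoch; NaN where it hasn't been computed yet.
val_loss = pd.Series([0.42, 0.31, np.nan])

# Forward-fill: plot the last known value instead of a misleading 0.
print(val_loss.ffill().tolist())  # [0.42, 0.31, 0.31]
```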

Jumping in (a little late) here.

The version of the tool that @birdstream uses groups masks and images into the same Image datatype. This means the statistics you are seeing are a better fit for a segmentation problem than an image-generation problem, so IoU is not what you want to measure here, since it calculates the overlap between classes. Standard pixel accuracy would be a better fit.
This will be fixed in an upcoming update where we break Mask out into its own datatype instead of using Image for it.
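A minimal sketch of what I mean by pixel accuracy for an image-to-image target (the one-grey-level tolerance is an assumption on my part, since exact float equality rarely holds):

```python
import numpy as np

def pixel_accuracy(pred, target, tol=1.0 / 255):
    """Fraction of pixels whose predicted value is within `tol` of the target."""
    return float(np.mean(np.abs(pred - target) <= tol))
```

Unlike IoU, this doesn’t first binarize the images into foreground/background classes, so it tracks pixel-level fidelity more directly.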

As for why validation drops to 0 at the start of every new epoch, @birdstream was right there: before you get to the validation phase there are no records of validation for that epoch, so it is counted as 0.
Training is technically handled the same way, but since training is calculated right away in the first iteration you will never see the 0 drop.
We do have a task on our list to make validation carry the previous value instead of defaulting to 0, to make this less confusing.
Another approach (perhaps a preferable one) would be to simply not show the validation curve until there is data for it. Unfortunately that seems to be trickier with the chart library we are using, so for the upcoming task we defaulted to just showing the previous value.

Hope that helps!

Explain that again to me offline and then maybe I can advance my understanding a bit more - I don’t yet see where masks enter this. And: is the choice of statistics made with the model type (and not alterable)?