I have run several experiments to check the PV test. My conclusion is that the PV test is not fundamentally wrong, but in the numerical experiment with very weak forcing and velocities (see *Test of the PV test*), the numerical errors introduced by the PV test (mostly due to unavoidable spatial averaging) are of the same order as the expected changes in potential vorticity (PV). Furthermore, these expected changes are also close to numerical precision: PV is of the order of 1e-8 1/(m.s) and its changes over 1 day are of the order of 1e-14 1/(m.s), which makes it difficult to detect any significant change over 100 days.
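
To put the "close to numerical precision" remark in numbers, the sketch below compares a daily change of 1e-14 with the gap between adjacent floating-point numbers near a PV of 1e-8, in single and in double precision. The precision of the model output and of the PV test is not stated here, so both cases are shown; the inputs are the orders of magnitude quoted above, not model output.

```python
import numpy as np

# Orders of magnitude quoted in the text (not model output), in 1/(m.s)
pv_scale = 1e-8        # typical PV
daily_change = 1e-14   # typical PV change over one day, weakly forced run

# Relative change per day
relative_change = daily_change / pv_scale

# Gap between pv_scale and the next representable number (one "ulp")
ulp32 = float(np.spacing(np.float32(pv_scale)))
ulp64 = float(np.spacing(np.float64(pv_scale)))

# How many representable steps one day's change spans
ulps_per_day_32 = daily_change / ulp32
ulps_per_day_64 = daily_change / ulp64

print(f"relative change per day: {relative_change:.1e}")
print(f"float32: ulp = {ulp32:.2e}, one day = {ulps_per_day_32:.1f} ulps")
print(f"float64: ulp = {ulp64:.2e}, one day = {ulps_per_day_64:.1e} ulps")
```

In single precision, one day's change spans only about ten representable steps of PV, so rounding in the stored fields and in the test's arithmetic can be comparable to it; in double precision the margin is many orders of magnitude larger.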

However, I did make two corrections to the test: the Coriolis parameter it used was not the one used in the experiment, and it now uses the averaged PV output. I reran the test on an experiment (exp2_t) in which the velocities have a realistic amplitude (10-20 cm/s); the result is shown in Fig. 1, and Fig. 2 shows the parcel trajectories.
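
To see why the choice of Coriolis parameter matters, the sketch below assumes a layer (shallow-water) PV q = (f + ζ)/h, which matches the 1/(m.s) units quoted above, with f = 2Ω sin φ and all fields on the same points of a regular Cartesian grid. Under that assumption an error δf shifts q by δf/h, a relative error of δf/(f + ζ). The model's actual PV definition, grid staggering and Coriolis formulation (f-plane, β-plane or spherical) are not given here and may differ; the function names are hypothetical.

```python
import numpy as np

OMEGA = 7.2921e-5  # Earth's rotation rate (rad/s)


def coriolis_parameter(lat_deg):
    """Coriolis parameter f = 2 * Omega * sin(latitude), latitude in degrees."""
    return 2.0 * OMEGA * np.sin(np.deg2rad(lat_deg))


def layer_pv(u, v, h, lat_deg, dx, dy):
    """Hypothetical layer PV q = (f + zeta) / h, in 1/(m.s).

    u, v    : velocities (m/s), 2-D arrays indexed [y, x]
    h       : layer thickness (m), same shape
    lat_deg : latitude of each point (degrees), used for f
    dx, dy  : uniform grid spacing (m)
    """
    # Relative vorticity zeta = dv/dx - du/dy: centred differences in the
    # interior, one-sided differences at the edges
    zeta = np.gradient(v, dx, axis=1) - np.gradient(u, dy, axis=0)
    return (coriolis_parameter(lat_deg) + zeta) / h


# Usage: a fluid at rest, so q reduces to f / h
ny, nx = 5, 4
lat = np.repeat(np.linspace(25.0, 29.0, ny)[:, None], nx, axis=1)
u = np.zeros((ny, nx))
v = np.zeros((ny, nx))
h = np.full((ny, nx), 4000.0)  # hypothetical thickness (m)
q = layer_pv(u, v, h, lat, dx=10e3, dy=10e3)
print(q[:, 0])  # of order 1e-8 1/(m.s), increasing northward
```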

*A priori*, the estimates from the 1-day averaged output (dots in Fig. 1) should be more accurate, but I would consider that their differences from the corresponding estimates from snapshots (dashes) give an upper bound for the numerical error made in the test. In this experiment, PV is again of the order of 1e-8 1/(m.s) and the difference in PV due to computational error is of the order of 1e-11, that is one per mille. For all parcels except the one at 10°E, 27°N, the changes of PV over one day are of the order of 1e-10, that is one per cent: in these cases the test seems reasonable and shows that PV is almost, but not exactly, conserved. For the parcel at 10°E, 27°N, the changes are of the order of 1e-11, the same order as the computational error, which is consistent with the observation that the test does not look reasonable for this parcel.
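
The criterion used above, comparing the daily change of PV with the discrepancy between averaged-output and snapshot estimates, can be written as a signal-to-error ratio per parcel. This is only an illustration: the input numbers are made up to match the orders of magnitude quoted above (the third parcel mimics the one at 10°E, 27°N in magnitude only), and the threshold of 3 is an arbitrary, hypothetical choice.

```python
import numpy as np


def signal_to_error(dpv_avg, dpv_snap):
    """Ratio of the daily PV change to the snapshot-vs-average discrepancy.

    dpv_avg  : daily PV changes estimated from the 1-day averaged output
    dpv_snap : the same changes estimated from snapshots
    The discrepancy |dpv_avg - dpv_snap| stands in for the error of the
    PV test, as proposed in the text (not a proven bound).
    """
    dpv_avg = np.asarray(dpv_avg, dtype=float)
    err = np.abs(dpv_avg - np.asarray(dpv_snap, dtype=float))
    # Avoid division by zero when both estimates agree exactly
    err = np.maximum(err, np.finfo(float).tiny)
    return np.abs(dpv_avg) / err


# Illustrative magnitudes only (1/(m.s)), not values read from Fig. 1
dpv_avg = [1.2e-10, -0.9e-10, 1.5e-11]
dpv_snap = [1.1e-10, -1.0e-10, 0.5e-11]

ratio = signal_to_error(dpv_avg, dpv_snap)
meaningful = ratio > 3.0  # hypothetical threshold
print(ratio.round(1), meaningful)
```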

The remaining question is then: for the parcels where we think the test is meaningful, how much of the difference between the red and blue lines is meaningful (that is, due to the model failing to satisfy the PV equation) and how much is not (that is, due to errors in the PV test itself)? I have one estimate of the error of the PV test (the difference between estimates computed from averaged output and estimates computed from snapshots), but is this estimate conservative or not?
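
One way to frame this question is to write the residual of the test (the difference between the red and blue lines) as r = m + e, where m is the part due to the model not satisfying the PV equation and e is the error of the PV test itself. If |e| <= E, the triangle inequality gives |r| - E <= |m| <= |r| + E. The lower bound holds only if E is conservative, which is exactly what has not been established for the snapshot-versus-average estimate. The decomposition and function name below are assumptions for illustration.

```python
import numpy as np


def violation_bounds(residual, test_error_bound):
    """Bounds on |m| given residual r = m + e and |e| <= test_error_bound.

    Follows from the triangle inequality; the lower bound is only
    guaranteed if test_error_bound really bounds the test error.
    """
    r = np.abs(np.asarray(residual, dtype=float))
    e = np.asarray(test_error_bound, dtype=float)
    lower = np.clip(r - e, 0.0, None)  # |m| cannot be negative
    upper = r + e
    return lower, upper


# Illustrative values (1/(m.s)): a residual well above the error bound,
# and one of the same size as the bound
lower, upper = violation_bounds([5e-11, 1e-11], [1e-11, 1e-11])
print(lower, upper)  # second parcel: lower bound 0, so m may vanish
```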

*A priori*, an experiment with arbitrarily high resolution should reduce not only the numerical errors made when computing the PV test itself but also the numerical errors that violate the PV equation. Do you agree with this fundamental statement? If so, the problem is the following: if insufficient resolution causes numerical errors both in the PV equation and in the PV test, how can we confidently determine which of the two causes the failure of the PV test?
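
If the statement holds, one hypothetical way to probe it, not something done in the tests above, is to rerun the experiment at several resolutions and measure how fast the residual of the PV test decreases. If each error source scales like C * dx**p, the observed order p points to the dominant source, provided the orders of the test's spatial averaging and of the model's numerical scheme are known and different. This assumes the runs are in the asymptotic regime and that parcels and trajectories remain comparable across resolutions, which may well not hold; the residuals below are synthetic.

```python
import numpy as np


def observed_order(residuals, refinement=2.0):
    """Observed convergence order between successive resolutions.

    residuals  : a residual norm of the PV test at grid spacings
                 dx, dx/refinement, dx/refinement**2, ...
    refinement : ratio between successive grid spacings
    Assumes residual ~ C * dx**p (asymptotic regime), so that
    p = log(r_coarse / r_fine) / log(refinement).
    """
    res = np.asarray(residuals, dtype=float)
    return np.log(res[:-1] / res[1:]) / np.log(refinement)


# Synthetic residuals following C * dx**2 at dx = 40, 20, 10 km
dx_km = np.array([40.0, 20.0, 10.0])
print(observed_order(2.5e-14 * dx_km**2))  # second-order behaviour
```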