Readers are referred to the texts cited below, which discuss measures of agreement in detail. If statistical significance is not a useful guide, what magnitude of kappa reflects adequate agreement? Guidelines would be helpful, but factors other than agreement can influence its magnitude, which makes interpretation of a given magnitude problematic. As Sim and Wright noted, two important factors are prevalence (are the codes equiprobable, or do their probabilities vary?) and bias (are the marginal probabilities for the two observers similar or different?). Other things being equal, kappas are higher when codes are equiprobable. Kappas are also higher when codes are distributed asymmetrically by the two observers. In contrast with the effect of probability variations, the effect of bias is greater when kappa is small than when it is large. [11]:261-262

Statistical methods for assessing agreement vary according to the type of variable being studied and the number of observers between whom agreement is sought. These are summarized in Table 2 and discussed below. When two instruments or techniques are used to measure the same variable on a continuous scale, a Bland-Altman plot can be used to estimate agreement. This plot is a scatter of the difference between the two measurements (Y-axis) against the mean of the two measurements (X-axis). It thus offers a graphical display of the bias (the mean difference between the two observers or techniques) together with the 95% limits of agreement, which are given by the formula: mean difference ± 1.96 × SD of the differences.

On the surface, these data may appear amenable to analysis using methods for 2 × 2 tables (if the variable is categorical) or correlation (if it is numerical), which we have discussed earlier in this series.
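The bias and 95% limits of agreement described above can be sketched numerically as follows; the function name and the haemoglobin readings are hypothetical, invented purely for illustration:

```python
from statistics import mean, stdev

def bland_altman_limits(method_a, method_b):
    """Bias and 95% limits of agreement for paired measurements of the
    same quantity by two methods (illustrative helper, not a library API)."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)            # mean difference between the two methods
    sd = stdev(diffs)             # sample SD of the differences
    # Limits of agreement: mean difference +/- 1.96 * SD of differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical haemoglobin readings (g/dL) from two assays
hb_a = [12.1, 13.4, 11.8, 14.2, 13.0, 12.7]
hb_b = [12.3, 13.1, 12.0, 14.0, 13.2, 12.9]
bias, (lo, hi) = bland_altman_limits(hb_a, hb_b)
print(f"bias = {bias:.2f} g/dL, 95% limits of agreement = ({lo:.2f}, {hi:.2f})")
```

In a full Bland-Altman analysis these two limits would be drawn as horizontal lines on the difference-versus-mean scatter plot; roughly 95% of the differences are expected to fall between them.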
[1,2] However, closer examination would reveal that this is not true. In those methods, the two measurements on each individual relate to different variables (e.g., exposure and outcome, or height and weight), whereas in "agreement studies" the two measurements relate to the same variable (e.g., chest radiographs rated by two radiologists, or hemoglobin measured by two methods).

Let us now consider a hypothetical situation in which examiners do exactly this, i.e., assign grades by tossing a coin: heads = pass, tails = fail (Table 1, Situation 2). In that case, one would expect 25% (= 0.50 × 0.50) of the students to receive a "pass" grade from both examiners and another 25% to receive a "fail" grade from both — an overall "expected" agreement rate, for "pass" or "fail", of 50% (= 0.25 + 0.25 = 0.50). Hence, the observed agreement rate (80% in Situation 1) must be interpreted bearing in mind that 50% agreement was expected purely by chance. The examiners could have improved on this by a further 50% (best possible agreement minus chance-expected agreement = 100% − 50% = 50%), but in fact achieved only 30% (observed agreement minus chance-expected agreement = 80% − 50% = 30%).
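The arithmetic above is exactly the construction of Cohen's kappa: the agreement achieved beyond chance, expressed as a fraction of the maximum possible agreement beyond chance. A minimal sketch, using the figures from the text (the function is illustrative, not from any particular library):

```python
def cohens_kappa(observed, expected):
    """Cohen's kappa = (observed - expected) / (1 - expected):
    achieved agreement beyond chance over maximum possible beyond chance."""
    return (observed - expected) / (1.0 - expected)

# Coin-toss grading: each examiner passes a student with probability 0.50,
# so chance agreement = P(both pass) + P(both fail) = 0.25 + 0.25 = 0.50.
p_pass = 0.50
expected = p_pass * p_pass + (1 - p_pass) * (1 - p_pass)

# Situation 1: 80% observed agreement.
kappa = cohens_kappa(0.80, expected)   # (0.80 - 0.50) / (1 - 0.50) = 0.60
print(kappa)
```

So the 30% achieved out of a possible 50% corresponds to a kappa of 0.60, which is how the statistic is usually reported.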
