This is judged by the person using the scale.

The user studies the scale and then provides a subjective judgement of its apparent value, based on criteria such as length, item labels and response modes.

This is a simple, superficial judgement based on the user's opinion.

A tool that appears valid is not necessarily valid!


A qualitative phase that should be carried out at the start of a scale's development. It consists of determining the items that make up the scale through a survey of experts. Patients are now considered experts.


The construct is evaluated both by the scale being tested and by a reference criterion (gold standard). A sufficient number of subjects are evaluated independently by the two tools, and the strength of the statistical relationship between them is then evaluated using a correlation coefficient.

Validity is said to be concurrent (or concomitant) when the two evaluations are carried out simultaneously.

Validity is said to be predictive when the construct is first measured by the scale, then by a reference criterion several weeks or months later.
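To make the correlation step concrete, here is a minimal sketch in plain Python computing the Pearson correlation coefficient between scores from the scale under test and from a gold standard. The function name and the score lists are hypothetical, chosen only for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Covariance term and the two standard-deviation terms
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores from the scale under test and from the gold standard
scale_scores = [12, 15, 9, 20, 17, 11]
gold_standard = [14, 16, 10, 22, 18, 13]
r = pearson_r(scale_scores, gold_standard)
```

For ordinal clinical scores, a rank-based coefficient such as Spearman's rho is often preferred over Pearson's r.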


The aim is to reinforce the validity of a scale measuring a particular construct by testing successive theoretical hypotheses about how its scores should relate to other measures.

Convergent validity: refers to the degree to which the scores of the tested scale are correlated with other variables in the same domain.

Divergent validity: refers to a low correlation between the scores of the scale and theoretical variables from a different evaluation field.

Discriminant validity: refers to the capacity of the scale to obtain significantly different scores between two populations with different characteristics.


The intra-rater reliability of a test relates to the stability of the scores obtained by a rater when he/she carries out the test on two separate occasions. A single rater tests each patient twice (or more) with several days in between each test. The patient's state must remain unchanged during this time.

1) Quantitative measure:
- Intraclass correlation coefficients (ICC)
- Bland and Altman method (agreement between the two test sessions)

2) Qualitative measure: Two coefficients, Kappa or weighted Kappa
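The Kappa coefficient named above can be computed in a few lines of plain Python. This is a minimal sketch of the unweighted (Cohen's) Kappa; the data are hypothetical ratings from a single rater classifying the same 10 patients on two occasions:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: chance-corrected agreement between two sets of
    categorical ratings on the same patients."""
    n = len(ratings_a)
    # Observed proportion of agreement
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from the marginal category frequencies
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical data: one rater, two occasions, 10 patients
first = ["mild", "mild", "severe", "moderate", "mild",
         "severe", "moderate", "mild", "severe", "moderate"]
second = ["mild", "moderate", "severe", "moderate", "mild",
          "severe", "moderate", "mild", "mild", "moderate"]
kappa = cohens_kappa(first, second)
```

The weighted Kappa, used when categories are ordered, additionally penalises disagreements according to how far apart the two categories are.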


The inter-rater reliability of a test describes the stability of the scores obtained when two different raters carry out the same test. Each patient is tested independently at the same moment in time by two (or more) raters.

1) Quantitative measure:
- Intraclass correlation coefficients (ICC)
- Bland and Altman method (agreement between the two raters)

2) Qualitative measure: Two coefficients, Kappa or weighted Kappa
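The Bland and Altman method listed above can be sketched as follows: compute the mean difference (bias) between the two raters and the 95% limits of agreement (bias ± 1.96 × SD of the differences). The rater scores are hypothetical:

```python
import math

def bland_altman_limits(x, y):
    """Bias (mean difference) and 95% limits of agreement between two
    sets of measurements of the same patients."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    bias = sum(diffs) / n
    # Sample standard deviation of the differences
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical scores given by two raters to the same 7 patients
rater_1 = [10, 14, 8, 21, 17, 12, 15]
rater_2 = [11, 13, 9, 20, 18, 12, 14]
bias, lower, upper = bland_altman_limits(rater_1, rater_2)
```

In a Bland-Altman plot, each patient's difference is plotted against the mean of the two measurements, with the bias and the two limits drawn as horizontal lines.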


The test-retest reliability of a test describes the stability of the scores obtained by a patient when he/she is evaluated on two separate occasions. This is similar to intra-rater reliability, except that here the patient evaluates him/herself (for example, on a pain-rating scale).

1) Quantitative measure:
- Intraclass correlation coefficients (ICC)
- Bland and Altman method (agreement between the two evaluations)

2) Qualitative measure: Two coefficients, Kappa or weighted Kappa

Internal consistency:

Cronbach's Alpha coefficient is used to measure the internal consistency of a group of items relating to a single clinical domain: in other words, how the items correlate with each other.

The internal consistency is tested during the development of the scale.
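Cronbach's Alpha can be computed directly from its standard formula, alpha = k/(k-1) × (1 - sum of item variances / variance of total scores). A minimal sketch with a hypothetical 3-item scale answered by 5 patients:

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha for a set of items measuring one clinical domain.
    item_scores[i][j] is the score of patient j on item i."""
    k = len(item_scores)          # number of items
    n = len(item_scores[0])       # number of patients

    def variance(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)

    # Sum of the individual item variances
    sum_item_var = sum(variance(item) for item in item_scores)
    # Variance of each patient's total score across all items
    totals = [sum(item[j] for item in item_scores) for j in range(n)]
    return k / (k - 1) * (1 - sum_item_var / variance(totals))

# Hypothetical 3-item scale answered by 5 patients
items = [
    [2, 4, 3, 5, 1],
    [3, 4, 3, 5, 2],
    [2, 5, 4, 4, 1],
]
alpha = cronbach_alpha(items)
```

Higher values indicate that the items vary together; values around 0.7-0.9 are commonly regarded as acceptable for a unidimensional scale.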

Responsiveness:

A tool is said to be sensitive to change if it can precisely measure increases and decreases in the construct measured.

This is important for tools which are used to evaluate changes following a therapeutic action. The aim is to measure the capacity of the scale to detect small but clinically significant changes.

When an outcome measure is sensitive to change, the score increases as the patient improves, decreases as the patient worsens and does not change if the patient's state remains stable.

Finding a clinical assessment scale - Physical Medicine and Rehabilitation

Medical evaluation scale - Psychometric properties

How to choose the right scale:

(By Pr. Serge Poiraudeau)

Self-reported evaluation scales are used to analyse and quantify concepts such as pain, function, expectations, fears and beliefs, self-esteem, adaptation strategies, anxiety, depression etc.

Such scales therefore measure very different concepts but they must have similar psychometric qualities.

The most important quality is content validity, which ensures that what is evaluated is what matters to the patient.

This psychometric quality, often neglected in the development of evaluation scales from the 1980s to 2000, was brought to the forefront by the Food and Drug Administration, which demands that a qualitative evaluation be carried out with patients before an evaluation scale is developed.

Without this phase, a self-reported scale theoretically cannot be the principal outcome measure in a study.

Once the qualitative analysis has been carried out, the important items are usually selected by a method of expert agreement (Delphi) that includes patients.

When the items have been selected, the construct validity is assessed (verifying that the tool measures the concept it is presumed to measure), along with the reliability and the capacity to detect a significant change (effect size, standardized response mean).
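The two change indices named above can be sketched in a few lines: the effect size divides the mean change by the standard deviation of the baseline scores, while the standardized response mean divides it by the standard deviation of the change scores. The before/after data are hypothetical:

```python
import math

def responsiveness(before, after):
    """Effect size (mean change / SD of baseline scores) and standardized
    response mean (mean change / SD of the change scores)."""
    changes = [a - b for a, b in zip(after, before)]
    mean_change = sum(changes) / len(changes)

    def sd(values):
        m = sum(values) / len(values)
        return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

    return mean_change / sd(before), mean_change / sd(changes)

# Hypothetical scores before and after treatment (lower score = better)
before = [40, 35, 50, 45, 42, 38]
after = [30, 28, 41, 36, 33, 32]
es, srm = responsiveness(before, after)
```

Since lower scores mean improvement here, both indices come out negative; it is their magnitude that reflects how readily the scale detects change.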

These validation phases are long, and few scales, even widely used ones, have been tested very rigorously.

The aim of this website is therefore to propose references to researchers and clinicians regarding the validity of the scales which they wish to use.
The aim is not to find the ideal scale but to choose those for which the psychometric properties have been the best evaluated.

4 quality criteria:

1) Validity:

- Face (more info)

- Content (more info)

- Criterion (more info)

. Concurrent validity

. Predictive validity

- Construct (more info)

. Convergent validity

. Divergent validity

. Discriminant validity

2) Reliability:

- Intra-rater (more info)

- Inter-rater (more info)

- Test-retest (more info)

3) Internal consistency: (more info)

4) Responsiveness : (more info)

Reference:

The psychometric properties described on this website are based on the procedure proposed in the following article:

Fermanian J. [Validation of assessment scales in physical medicine and rehabilitation: how are psychometric properties determined?]. Ann Readapt Med Phys. 2005 Jul;48(6):281-7. Epub 2005 Apr 25.

Generally, these scales measure:

1. Impairment: a symptom (pain, strength, spasticity etc.)
2. Function: the impact of impairment on activities of daily living
    (difficulty walking, climbing stairs, getting dressed etc.)
3. Disability: the impact on social life
    (going to work, shopping, sports activities etc.)
4. Quality of life: the person's overall well-being (happy or unhappy)

For example: I am paraplegic

1. Impairment: lower limb paralysis
2. Function: difficulty going down stairs
3. Disability: I cannot shop independently
4. Quality of life: I am not unhappy because a
    carer does my shopping

Copyright © 2012 Cytisco - Web agency. All rights reserved.