
Dermatology Online Journal

Reliability of self-assessed reading of skin tests: A possible approach in research and clinical practice?

Magnus Falk1, Chris Anderson2
Dermatology Online Journal 16 (2): 4

1. The Research and Development Unit for Local Health Care, County of Östergötland, Sweden.
2. Department of Clinical and Experimental Medicine, Division of Dermatology, Faculty of Health Sciences, Linköping University, Sweden


In the investigation and management of skin disease, various testing protocols are of importance. The extent to which clinical judgments and decisions on therapy are supported by such testing can be limited by a lack of time and resources for performing the tests. In the present study, the possibility of utilizing self-reporting by subjects is investigated. Determination of the irritation threshold for sodium lauryl sulphate (SLS) and of the minimal erythema dose for ultraviolet B (UVB) were chosen as suitable self-reading protocols. Test readings by 26 subjects instructed in “present” or “absent” reporting of test reactions were compared with readings by a trained observer. Absolute agreement was found in 76.9 percent of the SLS reactions and in 85 percent of the UVB reactions. Weighted kappa for the agreement between observations was 0.76 for the SLS reactions and 0.83 for the UVB reactions. We conclude that use of the protocols studied here, and of other test protocols modified to accommodate a binomial assessment outcome (“+” or “-”), could well lead to an increase in the performance of skin testing. This could be a qualitative advantage for diagnosis and management of skin diseases. Additionally, population studies and even prevention initiatives could be facilitated.

I. Introduction

Skin testing has an important role in the diagnosis and management of skin diseases. Testing protocols, however, consume time and resources for both patients and medical staff. Therefore, it is possible that testing is done less often than would be ideal.

The skin tests most commonly used in clinical practice are the Type I allergy test, the epicutaneous patch test, and the minimal erythemal dose (MED) phototest. In skin testing for detection of Type I allergies, reading is done 20 minutes after pricking the skin with the allergen and requires the measurement of the area of the wheal. This is, of necessity, done at the time of the test provocation and there is little advantage in patient reading [1, 2, 3]. The classic epicutaneous patch test, first used over a hundred years ago, involves topical application under occlusion for 48 hours of standardized or patient-specific substances to intact skin in order to detect contact (type IV) allergy. The assessment of patch test reactions is quite complex, since allergic responses need to be differentiated from irritant reactions. Detailed scoring based on the morphology of the reactions [4, 5] is required, which necessitates reporting by a trained observer – usually an experienced dermatologist. In light sensitivity testing (phototesting), the assessment reading is performed 6-24 hours after administration of a known dose of broad band or narrow band light and is much less complicated, being based on the determination of the presence (“+”) or absence (“-”) of a reaction on a provoked skin area, in order to decide the minimal erythema dose (MED) [6, 7, 8, 9, 10]. A fourth type of test is an extension of epicutaneous testing - the assessment of skin irritation susceptibility. The procedure uses increasing concentrations of a known irritant and seeks to establish a threshold for reaction [11]. Existing protocols vary greatly both in time of application and reading but in general, reporting of the detailed morphology of the test reactions is less critical than for allergic reactions [4, 11, 12, 13].

In all four situations described above, the various test procedures are relatively time consuming, requiring the involvement of both nurses and doctors, as well as return or even repeated visits to the clinic. As a result, the number of patients tested tends to be restricted to those with very specific or more serious conditions. A broader mapping of the individual's sensitivity to irritant agents and to UV light would be valuable, not least in patients with different kinds of eczema. It is possible that developments in the concept of innate immunity, which includes both irritant and UV provocation [14, 15], will also increase the interest in and relevance of these tests. One way to save clinical resources and the patient's time would be to make it possible for the patients themselves to perform the final reading of the test and then report the result to the clinic. The information obtained could be of interest in all aspects of the patient's management, from diagnosis to treatment to prevention. A prerequisite for such a procedure would be protocols with simpler, binomial assessment based on “+” or “-”, without any element of detailed scoring. The outcome of the test would be reported to the treating doctor by postcard, email, telephone, or fax. The major issues in this matter are whether the patients’ own “self-reading” of skin tests would be reliable enough to be of clinical value and whether protocols can be modified to accommodate binomial reading. Of the four skin test types itemized above, the MED phototest and the irritancy threshold protocol are the two most suitable for the simpler “+” or “-” assessment paradigm necessary for subject reporting/reading.

The aim of the present study was to compare, in a study situation comparable to a clinical situation, self-reading by untrained subjects of the outcome of a UVB light MED test and a sodium lauryl sulphate (SLS) irritant patch test with reading by a trained observer.

II. Methods

This research was performed as part of a project to further develop phototesting in the continuum of patient care. The data analyzed was collected following standard medical practice for testing at the Department of Dermatology, University Hospital in Linköping.

A. Subjects

The test group consisted of 26 subjects, 13 male and 13 female, all medical students in the preclinical stage of education, between 21 and 32 years of age. They were all fully informed before giving assent to their voluntary participation. Ten of the subjects had a previous or present history of eczema (self-reported), but all skin tests were performed on normal (unaffected) skin. According to Fitzpatrick's classification [16] one subject had skin type I; 11 subjects had skin type II; 13 subjects had skin type III; one subject had skin type IV.

B. Study design

The skin test procedure was performed at the Department of Dermatology in Linköping, following the standard local routines for UVB and patch test provocation to assess irritant threshold. The protocol included a provocation and an assessment (reading) session 25 hours after provocation. Each of the two tests was read and reported independently by the subject prior to the assessment by the trained observer.

C. Provocations

1. SLS patch test

SLS was used in four concentrations: 0.5, 1.5, 3.0, and 6.0 percent. Twenty μl of each solution was applied to filter paper on aluminium discs (8 mm Finn-chambers), which were placed on the ventral surface of the left upper arm of each subject, within an area of 40 x 40 mm. The patch test was removed by the subject after 24 hours.

2. UVB test

The UVB test equipment (Oriel Corporation, Stratford, CT, USA) included a UV-lamp (HBO 200W/2) and the optical filters WG305 and UG5 (Schott Glass Technologies Inc., Duryea, PA, USA). Irradiance of the UVB field was determined with a thermopile detector (Model 2M, Dexter Research Center Inc., Dexter, MI, USA). In the UV light beam produced, UVB was the predominant wavelength. UVA was also present, though in amounts below the biological effects of UVB. The UV beam was transferred through a liquid light guide and projected as a collimated beam onto four separate areas, each 1 cm in diameter, on the ventral surface of the right upper arm of each subject. The provocation times were 3, 4, 6, and 9 seconds, corresponding to doses of 42, 56, 84, and 126 mJ/cm².
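The relation between provocation time and dose can be checked with a short calculation. The sketch below assumes a constant irradiance over the exposure; the value of 14 mW/cm² is not stated in the text but is inferred here from the reported time and dose pairs, and the function name dose_for is purely illustrative.

```python
# Exposure times (s) and doses (mJ/cm^2) as reported in the text.
exposure_s = [3, 4, 6, 9]
dose_mj_cm2 = [42, 56, 84, 126]

# Implied irradiance per step: dose / time (mJ/cm^2 per s equals mW/cm^2).
implied_irradiance = [d / t for d, t in zip(dose_mj_cm2, exposure_s)]


def dose_for(seconds, irradiance_mw_cm2=14.0):
    """Dose in mJ/cm^2 for a given exposure time at constant irradiance.

    The default irradiance is an inferred value, not one reported by the authors.
    """
    return irradiance_mw_cm2 * seconds


print(implied_irradiance)
print([dose_for(t) for t in exposure_s])
```

All four reported pairs give the same implied irradiance, which is consistent with a fixed lamp output and dose steps set by exposure time alone.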

D. Reading the tests

1. Self-reading

Subjects were instructed to report the tests by counting the number of visible skin reactions on each test area (i.e., left and right upper arm), after 25 hours (one hour after removal of patches). A skin reaction was defined in the written instruction as any visible change in skin morphology. The results were written down by the subjects on a postcard and then sealed in an envelope (for blinding purposes against the trained investigator); the envelope was not opened until the data evaluation phase of the study.
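Because subjects report only a count of visible reactions, a threshold can be derived from that count if reactions are assumed to appear at the strongest provocations first. The sketch below makes that monotonic assumption explicit; the helper threshold_from_count is hypothetical and is not part of the study protocol.

```python
# Provocation levels as given in the Provocations section.
SLS_CONCENTRATIONS_PCT = [0.5, 1.5, 3.0, 6.0]
UVB_DOSES_MJ_CM2 = [42, 56, 84, 126]


def threshold_from_count(levels, n_reactions):
    """Return the lowest level assumed to have reacted, or None if none did.

    Assumption (not from the paper): a subject reporting n reactions
    reacted to the n strongest provocation levels.
    """
    if not 0 <= n_reactions <= len(levels):
        raise ValueError("reaction count out of range")
    if n_reactions == 0:
        return None
    ordered = sorted(levels)
    return ordered[len(ordered) - n_reactions]


# Example: two visible UVB reactions imply a threshold at the second highest dose.
print(threshold_from_count(UVB_DOSES_MJ_CM2, 2))
```

A count of zero returns None rather than a dose, since the threshold then lies above the highest level tested.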

2. Investigator reading

Immediately after self-reading, both tests were examined by one of the authors (MF). Apart from counting the number of visible skin reactions, the investigator also scored the reactions in detail as follows:

SLS patch test reactions were scored on a 5-grade scale for each of the following parameters: erythema, edema, scaling, papules, vesicles, and crusts. The criteria for grading were: “-” designated a negative reaction; (+) was “barely perceptible”; and 1+ to 3+ were positive reactions of increasing intensity.

The UVB test reactions were scored as “-” = negative, (+) = barely perceptible erythema, 1+ = erythema with a clear border, 2+ = erythema and edema, 3+ = erythema, edema and papules, 4+ = erythema, edema and vesicles.

E. Statistical analysis

The agreement between investigator and self-reading, according to the number of positive reactions on each subject, was estimated in percent, but also by calculation of weighted kappa. The statistical benefit of kappa is that it takes into account the agreement that could occur by chance. The following guidelines for interpretation of the kappa value are usually used: 0-0.20 = poor agreement, 0.21-0.40 = fair agreement, 0.41-0.60 = moderate agreement, 0.61-0.80 = substantial agreement, 0.81-1.00 = almost perfect agreement [17].
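To illustrate how a weighted kappa is obtained from paired counts of positive reactions, a minimal sketch follows. The text does not state which weighting scheme was used, so linear weights are an assumption here, and the two rating lists are made-up illustrative values, not study data.

```python
def weighted_kappa(rater_a, rater_b, n_categories, weights="linear"):
    """Cohen's weighted kappa for two raters using integer categories 0..k-1.

    Disagreement weights are |i - j| / (k - 1) for "linear" and the
    square of that for "quadratic".
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("rating lists must be non-empty and of equal length")
    k = n_categories
    # Observed joint proportions.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    power = 1 if weights == "linear" else 2
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power
            num += w * observed[i][j]
            den += w * row[i] * col[j]
    if den == 0.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - num / den


# Hypothetical counts of positive reactions (0 to 4) per subject.
observer = [0, 1, 2, 3, 4, 2, 3, 4]
subject = [0, 1, 2, 3, 3, 2, 2, 4]
print(weighted_kappa(observer, subject, n_categories=5))
```

With five ordered categories (0 to 4 reactions), a disagreement of one reaction is penalized less than a disagreement of several, which is why a weighted rather than an unweighted kappa suits count data of this kind.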

III. Results

Both UVB and SLS patch tests were well tolerated by all subjects. A majority described mild itching from the patch test but all subjects could complete the test. No discomfort was reported from the UVB-test.

A. SLS patch test

All subjects showed reactions to the highest SLS concentration as can be seen in Table 1. With decreasing SLS concentration there was, as expected, a decrease in the proportion of subjects reacting. In those subjects presenting only 1 or 2 reactions (i.e., to the two highest SLS concentrations) there was an absolute agreement between self-reading and trained observer reading. In the case of subjects with 3 or 4 reactions, there seemed to be a slight under-reporting by subjects, but in no case was the difference between subject and trained observer reading more than one reaction. For the whole material there was a 76.9 percent absolute agreement between trained observer and subject self-reading. The weighted kappa value was 0.76 (i.e., “substantial agreement”), with the 95 percent CI 0.58-0.94.

In order to illuminate possible underlying reasons for differences between trained observer and self-reading, the details of the trained observer scoring are also presented in Table 1. The predominant morphologic feature was erythema, which was present in all reactions but four. The four reactions without erythema occurred in different subjects and for all four concentrations of SLS. In regard to the number of positive reactions that were reported, there were 6 cases of under-reporting at self-reading (grey area in Table 1). In 4 of these cases the reaction was scored by the trained observer as “(+)”, i.e., “barely perceptible.”

B. UVB test

The outcome of the UVB test is presented in Table 2, including trained observer scoring of the reactions. The two lowest UVB doses were not sufficient to cause erythema in any of the subjects. The highest UVB dose (126 mJ/cm²) caused reactions in 17 subjects, and 10 of these also responded to the second highest UVB dose (84 mJ/cm²). The remaining 8 subjects presented no reaction to any of the given UVB doses. Absolute agreement between investigator and self-reading was found in 85 percent of cases. The weighted kappa value was 0.83 (i.e., “almost perfect agreement”), with the 95 percent CI 0.67-0.99. As seen for the patch test, a slight under-reporting by the subjects was present (marked in grey in Table 2).
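The agreement percentages in this section can be cross-checked against the subject numbers. The sketch below uses the 26 subjects and the 6 under-reported SLS cases given in the text. For UVB the number of concordant subjects is not stated, so it is inferred here as the only whole number out of 26 that rounds to 85 percent; that inference is an assumption, not a reported figure.

```python
N_SUBJECTS = 26
SLS_UNDER_REPORTED = 6  # cases of under-reporting stated for the SLS test


def percent(k, n=N_SUBJECTS):
    """Percentage of k out of n subjects."""
    return 100 * k / n


# SLS: subjects whose count matched the trained observer's count.
sls_concordant = N_SUBJECTS - SLS_UNDER_REPORTED
sls_agreement = round(percent(sls_concordant), 1)

# UVB: whole-number concordant counts compatible with a reported 85 percent.
uvb_candidates = [k for k in range(N_SUBJECTS + 1) if round(percent(k)) == 85]

print(sls_agreement)
print(uvb_candidates)
```

Under this reading, the UVB figure would correspond to 4 discordant subjects, compared with 6 for the SLS test.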

IV. Discussion

The relatively good agreement between trained observer assessment and subject self-reading seen in the present study is encouraging for the concept of a broader use of subject self-reading. The study demonstrates two examples of skin testing protocols in which patients themselves would be able to reliably perform the reading and reporting of the test result. For both test types, agreement with control readings by a trained observer was substantial, although there was a tendency for the subjects to under-report reactions that were weaker. This was, in many cases, probably due to the actual morphology of the under-reported reactions (irregular erythema, sometimes annular, as well as other findings such as scaling and edema). Especially in the case of the SLS patch test, there was a marked proportion of the under-reported reactions that consisted of “barely perceptible” findings. This occurred more commonly for weaker reactions, which were difficult for even a trained observer to detect, and perhaps not of decisive importance for the estimation of skin sensitivity to the irritant. For the UVB-test, the better agreement between subject and trained observer reading, compared to the SLS patch test, was probably due to more homogenous reactions in general. Erythema was present in all cases, mostly with a homogenous distribution over the whole of the illuminated area.

The previous literature on self-assessment of skin testing is limited. In several studies of tuberculin tests, results vary from inadequate agreement levels [18, 19] to consistently reliable [20, 21, 22], in part probably explained by differences in subject material or instructions. The tuberculin test has the same binomial “+” or “-” characteristics, which these studies also note to be an important attribute for self-assessment protocols.

Just as important as the binomial endpoints are various aspects of the test-reading situation. In particular, the reading instruction to the observer can markedly affect the outcome of the test. An example is the finding by Lock-Andersen et al. [23] that better inter-observer agreement was obtained when the MED was defined as the lowest light dose inducing a “barely perceptible erythema” rather than “erythema with a well-defined border.” In the present study the information given to the subjects was to report “any” reaction that was visible without further qualifying characteristics. The same principle was applied in the SLS patch test; any visible change in skin morphology was defined for the self-reader as a positive reaction to be reported.

The present study design was intended to mimic a possible situation in clinical practice. Whilst various test reading intervals could have been chosen, the 24 hour reading concept simplifies the procedure in regard to instructions and compliance by the subject. The instructions stated “Read this test at X o'clock tomorrow and mail the result on this postcard.” However, despite the relatively simple test reading procedure and instructions, an issue to comment on is the choice of study population, consisting of medical students. Admittedly, they were all in the preclinical stage of education and had not yet been trained in examining skin lesions, but due to age and educational level they are unlikely to be representative of a normal patient population. This circumstance may have affected the outcome. This is an important limitation of the study; therefore, an extended study, based on more typical patients and investigating individual assessment of a patient’s ability to participate in this type of testing, would be valuable.

Expert reading by a trained observer, usually a dermatologist, is considered the “gold standard” in the assessment of clinical skin tests. Although this may consist of a more advanced scoring system – as in the normal case of an epicutaneous patch test – the difference between investigator reading and self-reading reflects an inter-observer variability that is known also to be present between trained observers. The level of agreement found in this study should thus be compared to the levels found in studies on inter-observer agreement among trained observers. In a study on patch test reactions Bruze et al. [24] found an 82 percent agreement between 5 dermatologists when deciding on positive or negative reactions. In the more detailed scoring of the reactions, deviation varied depending on morphological feature, from 11.7 percent for erythema and for infiltration, to as much as 30.8 percent for papules, but the overall agreement was good. In the previously cited study by Lock-Andersen et al. [23], inter-observer variability among 8 dermatologists on phototest reading showed considerable differences in estimation of MED. Taking such results into consideration, the observer agreement found in the present paper suggests that patient-performed reading of a relatively uncomplicated skin test protocol based on binomial reporting of outcome is to a surprising extent comparable to that of the trained observer. This ability of subjects represents a resource that could be used in patient care. Testing with patient self-reading could well provide more concrete data on the patient's skin sensitivity than patient “recall” as to skin attributes [25, 26], in regard to UV sensitivity or skin irritation. An abnormal result could indicate a need for further, more detailed investigation, such as more extensive light testing or a patch test series and laboratory investigation.

Also of interest are advances in teledermatology, for which a limited number of studies have looked at the ability of patients to take their own digital photographs to be used as a complementary tool in the clinical assessment and monitoring of different dermatological conditions such as psoriasis [27, 28, 29]. Furthermore, in rural population areas that are far from medical centers, various forms of self-assessment strategies might be valuable. Patient-performed test reading is not meant to replace expert reading and is of little value in the absence of an expert clinical evaluation. However, it may serve as a useful complement in the clinical situation. It is important to remember that in traditional clinical situations, phototesting and patch testing are usually performed on the back or on the buttocks, areas out of reach for the patient to perform self-reading.

As well as being desirable in the investigation of patients with dermatological problems, a broadened use of phototesting might also have a place in epidemiological studies on variability in a normal population, patient educational situations, or prevention initiatives, in which the patient’s self-reading of a test to show his/her individual sensitivity can reinforce preventative information. In order to reduce inter-observer variability and to maintain sufficient reliability, future development of skin test protocols for self-reading by subjects should, to the extent that it is possible, aim at provocations generating homogenous reaction fields, sharp borders, and the possibility of a binomial “+” or “-” classification. In a separate publication we will report the clearly superior ability of naked-eye readings of UVB reactions to discern sharp erythema borders in comparison to indistinct borders [30]. That a simplified test reading assessment can improve observer agreement has also been shown in the area of teledermatology, an approach that, like self-reading, has the purpose of broadening the use of skin tests. In a study on allergy patch test reading from photographic images of test reactions, Ivens et al. demonstrated markedly better agreement when test reading criteria were reduced from six to three categories [31].

Examples of protocols, not studied in this publication, which might be candidates for self-reading are the 24 hour reading of Type I allergy tests and the “dilution series” in epicutaneous patch testing. In the case of Type I testing, a “late reading” by the patient the day after the test might uncover possible “late phase responses,” which might indicate combined allergy (of more relevance to eczema than a standard Type I response). At the moment no accepted clinical protocols exist for this. The dilution series is usually performed after the standard reading of the epicutaneous test, in which the morphological assessment of an individual reaction is doubtful. The essence of the dilution series is that an irritant reaction usually disappears with dilution of the provoking substance. If reading shows only one or two reactions of the normally applied four concentrations, allergy can probably be excluded. A patient might well be able to self-read such a test. Adoption of such a protocol would result in further investigation of large numbers of doubtful reactions, which are at present left not fully investigated.

V. Conclusion

In conclusion, patient-performed reading in the two skin test protocols chosen for this study seems to be a fairly reliable method for collection of test data, giving an outcome comparable to that of a trained observer. Self-reading of test protocols with a result communicated by post, email, telephone, or fax can obviate the need for return visits and thus could increase the frequency with which dermatological patients undergo such testing. Although further investigation in this matter is needed, changes in test protocols over the whole range of skin testing to focus on binomial reporting outcomes could facilitate improved characterization of our patients within the physical and financial constraints of the current health care setting.


1. Meinert R, Fischer T, Kuehr J. Influence on skin prick test criteria on estimation of prevalence and incidence of allergic sensitization in children. Allergy 1994; 49: 526-532. [PubMed]

2. Chinn S, Jarvis D, Luczynska CM, Lai E, Burney PGJ. Measuring atopy in a multi-centre epidemiological study. Eur J Epidemiol 1996; 12: 155-162. [PubMed]

3. McCann WA, Ownby DR. The reproducibility of allergy skin test scoring and interpretation by board-certified/board-eligible allergists. Ann Allergy Asthma Immunol 2002; 89: 368-371. [PubMed]

4. Mowad CM. Patch testing: pitfalls and performance. Curr Opin Allergy Clin Immunol 2006; 6: 340-344. [PubMed]

5. Bruynzeel DP, Andersen KE, Camarasa JG, Lachapelle JM, Menné T, White IR. The European standard series. European Environmental and Contact Dermatitis Research Group (EECDRG). Contact Dermatitis 1995; 33(3): 145-148. [PubMed]

6. Diffey BL, Farr PM. Quantitative aspects of ultraviolet erythema. Clin Phys Physiol Meas 1991; 12: 311-325. [PubMed]

7. Diffey BL. The consistency of studies of ultraviolet erythema in normal human skin. Phys Med Biol 1982; 27: 715-720. [PubMed]

8. Kim J, Lim H. Evaluation of the photosensitive patient. Semin Cutan Med Surg 1999; 4: 253-256. [PubMed]

9. Roelandts R. The diagnosis of photosensitivity. Arch Dermatol 2000; 136: 1152-1157. [PubMed]

10. Gordon PM, Saunders PJ, Diffey BL, Farr PM. Phototesting prior to narrowband ultraviolet B phototherapy. Br J Dermatol 1998; 139: 811-814. [PubMed]

11. Basketter DA, Chamberlain M, Griffiths HA, Rowson M, Whittle E. The classification of skin irritants by human patch test. Food Chem Toxicol 1997; 35: 845-852. [PubMed]

12. Cowley NC, Farr PM. A dose response study of irritant reactions to sodium lauryl sulphate in patients with seborrhoeic dermatitis and atopic eczema. Acta Derm Venereol 1992; 72: 432-435. [PubMed]

13. Tupker RA, Willis C, Berardesca E, Lee CH, Fartasch M, Agner T, Serup J. Guidelines on sodium lauryl sulfate (SLS) exposure tests. A report from the Standardisation Group of the European Society of Contact Dermatitis. Contact Dermatitis 1997; 37: 53-69. [PubMed]

14. Kim J, Modlin RL. Innate immunity and the skin. In: Fitzpatrick's Dermatology in General Medicine. Editors: Freedberg IM, Eisen AZ, Wolff K, et al. McGraw-Hill Inc, 6th ed 2003. Pages: 247-252.

15. Mariathasan S, Monack DM. Inflammasome adaptors and sensors: intracellular regulators of infection and inflammation. Nature Rev Immunol 2007; 7: 31-40. [PubMed]

16. Fitzpatrick TB. The validity and practicality of sun reactive skin types I through VI. Arch Dermatol 1988; 124: 869-871. [PubMed]

17. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. British Med J 1992; 304: 1491-1494. [PubMed]

18. Howard TP, Solomon DA. Reading the tuberculin skin test. Who, when, and how? Arch Intern Med 1988; 148: 2457-2459. [PubMed]

19. Gourevitch MN, Teeter R, Schoenbaum EE, Klein RS. Self-assessment of tuberculin skin test reactions by drug users with or at risk for human immunodeficiency virus infection. Int J Tuberc Lung Dis 1999; 3: 321-325. [PubMed]

20. Navin J, Kaplan J, Desilvo E. Self-reading of PPD skin tests. J Am Coll Health 1994; 43: 37-38. [PubMed]

21. Risser NL, Belcher DW, Bushyhead JB, Sullivan BM. The accuracy of tuberculin skin tests: self assessment by adult outpatients. Public Health Rep 1985; 100: 439-445. [PubMed]

22. Prezant MD, Kelly MD, Karwa MD, Kavanagh K. Self-assessment of tuberculin skin test reactions by New York City firefighters: Reliability and cost-effectiveness in an occupational health care setting. Ann Intern Med 1996; 125: 280-283. [PubMed]

23. Lock-Andersen J, Wulf HC. Threshold level for measurement of UV sensitivity: reproducibility of phototest. Photodermatol Photoimmunol Photomed 1996; 12: 154-161. [PubMed]

24. Bruze M, Isaksson M, Edman B, Björkner B, Fregert S, Möller H. A study on expert reading of patch test reactions: inter-individual accordance. Contact Dermatitis 1995; 32: 331-337. [PubMed]

25. Boldeman C, Dal H, Kristjansson S, Lindelöf B. Is self-assessment of skin type a valid method for adolescents? J Am Acad Dermatol 2004;50(3): 447-449. [PubMed]

26. Rampen FH, Fleuren BA, de Boo TM, Lemmens WA. Unreliability of self-reported burning tendency and tanning ability. Arch Dermatol 1988; 124: 885-888. [PubMed]

27. Schreier G, Hayn D, Kastner P, Koller S, Salmhofer W, Hofmann-Wellenhof R. A mobile-phone based teledermatology system to support self-management of patients suffering from psoriasis. Conf Proc IEEE Eng Med Biol Soc 2008; 2008: 5338-5341. [PubMed]

28. Qureshi AA, Brandling-Bennett HA, Giberti S, McClure D, Halpern EF, Kvedar JC. Evaluation of digital skin images submitted by patients who received practical training or an online tutorial. J Telemed Telecare 2006;12(2):79-82. [PubMed]

29. Ebner C, Wurm MT, Binder B, Kettler H, Lozzi GP, et al. Mobile teledermatology: a feasibility study of 58 subjects using mobile phones. J Telemed Telecare 2008; 14: 2–7. [PubMed]

30. Falk M, Ilias A, Anderson C. Inter-observer variability in reading of phototest reactions with sharply or diffusely delineated borders. Skin Res Technol 2008; 14(4): 397-402. [PubMed]

31. Ivens U, Serup J, O'goshi K. Allergy patch test reading from photographic images: disagreement on ICDRG grading but agreement on simplified tripartite reading. Skin Res Technol 2007;13:110-113. [PubMed]

© 2010 Dermatology Online Journal