Diagnostic Testing 101.1: The Importance of Sensitivity, Specificity and Diagnostic Test Accuracy

To have striven, to have made an effort, to have been true to certain ideals — this alone is worth the struggle. We are here to add what we can to, not to get what we can from, life. – William Osler


Diagnostic Medicine

Diagnostic medicine is the process of identifying the condition or disease that a patient has, and ruling out the conditions or diseases the patient does not have, through assessment of the patient’s signs, symptoms, and the results of various diagnostic tests.

Diagnostic Test Accuracy

Diagnostic test accuracy is simply the ability of the test to discriminate among alternative states of health (Zweig and Campbell, 1993).

If a test’s results do not differ between alternative states of health, then the test has negligible accuracy; if the results do not overlap between states of health, then the test has perfect accuracy. Most tests’ accuracies fall between these two extremes.

The intrinsic accuracy of a test is measured by comparing the test results to the “true condition status.”

“True condition status” refers to one of two mutually exclusive states: either a condition is present or it is absent.

We determine true condition status by means of a “gold standard,” a source of information completely different from the test under evaluation that tells us the true condition status of the patient.

Say we want to develop a new rapid test for detecting strep throat. Strep throat is caused by Streptococcus bacteria. Although it is more common in children and adolescents, it can occur in people of all ages. Strep throat is one of many possible causes of sore throat and pharyngitis. It is contagious and can cause complications such as rheumatic fever and scarlet fever. Treatment with antibiotics can shorten the course of the disease and reduce the risk of complications.


A throat culture is obtained by swabbing the patient’s throat with a cotton swab. The sample is then sent to the lab, where it is cultured. If strep is present, it will grow on the culture. The bacteria either grow on the culture or they don’t. A throat culture is the “gold standard” for diagnosing strep throat. The problem is that the result may take two days to come back.


Sensitivity and Specificity

The two most important measures of diagnostic test accuracy are sensitivity and specificity.     

The probability that a test will be positive in someone with the condition =  Sensitivity

The probability that a test will be negative in someone without the condition = Specificity

For diagnosing strep throat we want our test to be as close as possible to the gold standard in terms of both sensitivity and specificity.

Sensitivity and specificity can be illustrated by a table with two rows and two columns. In this simple decision matrix, the rows summarize the data according to the true condition status of the patients and the columns summarize the test results. This table is called a “count table” because it indicates the numbers of patients in various categories. The total numbers of patients with and without the condition are, respectively, n1 and n0; the total numbers of patients with the condition who test positive and negative are, respectively, s1 and s0; and the total numbers of patients without the condition who test positive and negative are, respectively, r1 and r0.

The total number of patients in the study group, N, is equal to N = s1 + s0 + r1 + r0, or N = n1 + n0.

The true condition status is symbolized by the variable D, where D = 1 if the condition is present and D = 0 if the condition is absent.

Test results indicating the condition is present are called positive; those indicating the condition is absent are called negative.

Test results are symbolized by the variable T, where T = 1 denotes positive test results and T = 0 denotes negative test results.

Count table:

                              Test positive (T = 1)    Test negative (T = 0)    Total
Condition present (D = 1)     s1                       s0                       n1
Condition absent (D = 0)      r1                       r0                       n0
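As a rough illustration of how a count table is filled in, here is a minimal Python sketch that tallies paired observations of true condition status (D) and test result (T) into the four cells. The function name count_table and the eight example records are made up for illustration; they do not come from any real study.

```python
from collections import Counter


def count_table(records):
    """Tally (D, T) pairs into the cells of the count table.

    records: iterable of (d, t) tuples, where d is the true condition
    status (1 = present, 0 = absent) and t is the test result
    (1 = positive, 0 = negative).
    """
    cells = Counter(records)
    s1, s0 = cells[(1, 1)], cells[(1, 0)]  # condition present: test +, test -
    r1, r0 = cells[(0, 1)], cells[(0, 0)]  # condition absent: test +, test -
    return {
        "s1": s1, "s0": s0, "r1": r1, "r0": r0,
        "n1": s1 + s0,            # everyone with the condition
        "n0": r1 + r0,            # everyone without the condition
        "N": s1 + s0 + r1 + r0,   # whole study group
    }


# Hypothetical data: (throat culture result, rapid test result) for 8 patients.
records = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (0, 0), (1, 1)]
table = count_table(records)
print(table)
```

In this made-up sample, three of the four culture-positive patients test positive and three of the four culture-negative patients test negative.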

The sensitivity (Se) of a test is its ability to detect the condition when it is present.

We write sensitivity as Se = P(T = 1 | D = 1), which is read:

“sensitivity (Se) is the probability (P) that the test result is positive (T = 1), given that the condition is present (D = 1).”

Among the n1 patients with the condition, s1 test positive; thus, Se = s1/n1.

The specificity (Sp) of a test is its ability to exclude the condition in patients without the condition.

We write specificity as Sp = P(T = 0 | D = 0), which is read:

“specificity (Sp) is the probability (P) that the test result is negative (T = 0), given that the condition is absent (D = 0).”

Among the n0 patients without the condition, r0 test negative; thus, Sp = r0/n0.
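The two ratios can be computed directly from the cells of the count table. The sketch below is only an illustration: the function names are my own, and the counts (100 patients with the condition, 100 without) are hypothetical, not results for any actual strep test.

```python
def sensitivity(s1, s0):
    """Se = s1 / n1: share of patients WITH the condition who test positive."""
    n1 = s1 + s0
    if n1 == 0:
        raise ValueError("no patients with the condition")
    return s1 / n1


def specificity(r1, r0):
    """Sp = r0 / n0: share of patients WITHOUT the condition who test negative."""
    n0 = r1 + r0
    if n0 == 0:
        raise ValueError("no patients without the condition")
    return r0 / n0


# Hypothetical counts, for illustration only.
se = sensitivity(s1=90, s0=10)   # 90 of 100 patients with the condition test positive
sp = specificity(r1=5, r0=95)    # 95 of 100 patients without the condition test negative
print(f"Se = {se:.2f}, Sp = {sp:.2f}")
```

Note that each measure uses only one row of the table: sensitivity ignores patients without the condition, and specificity ignores patients with it.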

False Negative and False Positive Tests

There are consequences associated with all test results.

False Negative Tests:   If a test falsely indicates the absence of a condition in someone who truly has it then treatment can be delayed or not provided.

The consequences of a false negative strep test depend on what we do with the result. Serious consequences can arise if we use our new strep test as the sole basis for subsequent decision making. Putting complete trust in the negative test result would mean that a patient with strep receives no antibiotic treatment, which can lead to continued illness, spread of the disease, and complications that would not have occurred had antibiotics been provided. The patient could potentially develop rheumatic or scarlet fever.

If the new test is negative but a culture was also taken, the false result could delay treatment by a couple of days, but treatment is nevertheless provided. The consequences are likely to be minimal. It is highly unlikely that the patient would develop rheumatic or scarlet fever because, although a little later, they are still treated with the proper antibiotics.

False Positive Tests: If a test falsely indicates the presence of a condition in someone who does not truly have it, then unnecessary tests and treatments can occur. Incorrect treatment and false labeling of patients can also occur.

In the case of a false positive strep test, a patient may undergo a course of antibiotics they do not need. Although the patient may suffer side effects from the antibiotics, the severity and duration of these consequences are usually minimal.
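To get a feel for how often each kind of error would occur, one can work out the expected number of false negatives and false positives in a group of patients. The sketch below assumes a prevalence, sensitivity and specificity that are purely hypothetical; the function name expected_errors is also my own.

```python
def expected_errors(n_patients, prevalence, se, sp):
    """Expected false negatives and false positives in a tested group.

    prevalence: fraction of the group that truly has the condition.
    se, sp: sensitivity and specificity of the test (between 0 and 1).
    """
    n1 = n_patients * prevalence       # expected number with the condition
    n0 = n_patients - n1               # expected number without the condition
    false_negatives = n1 * (1 - se)    # have the condition, test negative
    false_positives = n0 * (1 - sp)    # do not have the condition, test positive
    return false_negatives, false_positives


# Hypothetical scenario: 1000 sore-throat patients, 20% truly have strep,
# and the test has Se = 0.90 and Sp = 0.95.
fn, fp = expected_errors(1000, prevalence=0.20, se=0.90, sp=0.95)
print(f"Expected false negatives: {fn:.0f}")
print(f"Expected false positives: {fp:.0f}")
```

Under these assumed numbers, about 20 patients with strep would be missed and about 40 patients without strep would be told they have it. The counts change with prevalence even when sensitivity and specificity stay the same.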


The importance of diagnostic accuracy in testing is directly proportional to the test’s potential to cause patient consequences and harm.

Diagnostic Medicine uses a patient’s signs, symptoms and the results of various diagnostic tests to arrive at a diagnosis.

In diagnosing strep throat, a good clinician will take a number of variables into account in considering the differential diagnosis, and will base testing and treatment on the preponderance of information supporting or opposing the diagnosis.

For strep throat, the best approach is to use the new test in addition to a throat culture, history, and careful physical exam, and to base the decision to prescribe antibiotics on clinical acumen applied to the overall picture. The test can be considered a piece of the puzzle but does not define it. The risk from a false positive or false negative is therefore minimal, as the test is just one data point.

Diagnostic accuracy is necessary if a test is being used as the basis for further tests and treatment. If a test is being used as the sole basis for further tests and treatment, it needs to be accurate. If the results of a test can cause significant patient harm or death, then it needs to be either 100% accurate or combined with other highly accurate tests to confirm the diagnosis.
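One way to see why confirmation with a second test helps is to compute the accuracy of the combined rule "call the patient positive only if both tests are positive." The sketch below rests on a strong assumption that I am adding for illustration: the two tests err independently of each other given the patient's true condition status. The function name and all input values are hypothetical.

```python
def confirm_positive_with_second_test(se1, sp1, se2, sp2):
    """Combined Se and Sp when a positive requires BOTH tests to be positive.

    Assumes the two tests are conditionally independent given the true
    condition status; real tests often are not, so treat this as a sketch.
    """
    se = se1 * se2                    # both tests must detect a true case
    sp = 1 - (1 - sp1) * (1 - sp2)    # a false positive needs both tests to err
    return se, sp


# Hypothetical screening test followed by a hypothetical confirmatory test.
se, sp = confirm_positive_with_second_test(se1=0.90, sp1=0.95, se2=0.95, sp2=0.98)
print(f"Combined Se = {se:.3f}, combined Sp = {sp:.3f}")
```

Under these assumptions the combined rule is more specific than either test alone, at the cost of some sensitivity, which is the trade one accepts when a false positive is the more harmful error.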

The specificity of a test is particularly important, as a false positive can result in unneeded interventions and treatment. Stand-alone tests used in diagnosis and treatment need to be both sensitive and specific. How much diagnostic accuracy a test requires depends on the consequences of its false-negative and false-positive results.

Diagnostic Research Methodology

Research to discover the accuracy of a diagnostic test should be straightforward: administer the test to a group of people and see if it works.

The test being tested is the “index test”. Results of the index test are compared with the results of a “gold standard” reference test.

The research question is, “How accurately do index test results predict the (true, gold standard) reference test results?”

Diagnostic test accuracy studies require a sample of subjects who have been given the test under evaluation, some form of scoring of the test’s findings, and a reference or “gold standard” to which the test findings are compared. Examples of gold standards include autopsy reports, surgery findings and pathology results from biopsies.

The gold standard for a patient’s true disease status may not always be available. A brain biopsy could be considered a gold standard for diagnosing Alzheimer’s disease but is neither practical nor humane.

The Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool is a set of fourteen questions that investigate the methodologic quality of scientific studies that quantify diagnostic test performance.


The questions identify research methodologies known to bias the accuracy estimates that studies report.

Multiple factors need to be considered in evaluating the diagnostic accuracy of a test, including diagnostic validation and verification: is the test testing what it is supposed to be testing for, and are we doing it correctly?

Establishing the diagnostic accuracy of a test necessitates a reference standard. The reference standard can be the best available method for establishing the presence or absence of a condition (such as the throat culture for strep throat) or a combination of methods (imaging, neuropsychological testing, clinical exam, etc., in Alzheimer’s disease).

Any test that is going to be used as a basis for decisions that impact other human beings needs to be validated before it is introduced on the market. The literature needs to be reviewed critically, and trials must be designed to produce objective evidence that the test measures what it purports to measure and that its methodology is correct. Verification that samples are being collected, handled, stored, transported and processed correctly is requisite.

Cutoff levels, cross-reactivity and myriad other issues need to be worked out prior to bringing a diagnostic test to market.
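The choice of cutoff level matters because, for a test that reports a numeric value, moving the cutoff trades sensitivity against specificity. The sketch below assumes a test where higher values suggest the condition and a result counts as positive when it is at or above the cutoff. The measurements are invented for illustration and do not represent any real assay.

```python
def se_sp_at_cutoff(values_with, values_without, cutoff):
    """Se and Sp when a result is called positive if value >= cutoff.

    values_with: test values from patients who truly have the condition.
    values_without: test values from patients who truly do not.
    """
    s1 = sum(v >= cutoff for v in values_with)     # true positives
    r0 = sum(v < cutoff for v in values_without)   # true negatives
    return s1 / len(values_with), r0 / len(values_without)


# Invented measurements; note that the two groups overlap.
with_condition = [4.1, 5.0, 5.6, 6.2, 7.3]
without_condition = [1.2, 2.5, 3.3, 4.4, 5.1]

for cutoff in (3.0, 4.5, 6.0):
    se, sp = se_sp_at_cutoff(with_condition, without_condition, cutoff)
    print(f"cutoff {cutoff}: Se = {se:.1f}, Sp = {sp:.1f}")
```

With these made-up values, a low cutoff catches every true case but mislabels many healthy patients, while a high cutoff does the reverse. Because the two groups overlap, no single cutoff gives perfect accuracy.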


The reliability, validity and accuracy of drug test results need to be known prior to using a test. Specificity and sensitivity must be known prior to using a test on any population.

This should go without saying as to do anything else would be irresponsible and careless.


Evidence-based medicine, systematic reviews, and guidelines in interventional pain management: part 7: systematic reviews and meta-analyses of diagnostic accuracy studies. Pain Physician 2009, 12(6):929-963.

Jaeschke R, Guyatt G, Lijmer J: Diagnostic tests. In Users’ guides to the medical literature: a manual for evidence-based clinical practice. Edited by Guyatt G, Rennie D. AMA Press; 2002:121-140.

Lundh A, Gøtzsche PC: Recommendations by Cochrane review groups for assessment of the risk of bias in studies. BMC Med Res Methodol 2008, 8:22. doi:10.1186/1471-2288-8-22

Streiner DL: Diagnosing tests: using and misusing diagnostic and screening tests. J Pers Assess 2003, 81(3):209-219.

Whiting P, Rutjes AW, Reitsma JB, Bossuyt PM, Kleijnen J: The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 2003, 3:25. http://www.biomedcentral.com/1471-2288/3/25


GCP, good clinical practice; GCLP, good clinical laboratory practice; GLP, good laboratory practice; STARD, standards for reporting of diagnostic accuracy. See Section III, 2.13  From Nature Reviews Microbiology 4,S20–S32(1 December 2006) | doi:10.1038/nrmicro1570


