Diagnostics: provocation tests

Reviewed: 01-07-2024

Clinical question

What is the role of provocation tests in diagnosing cervical radicular syndrome (CRS)?

Recommendation

Consider performing, as part of the physical examination, a combination of the two provocation tests below to make the diagnosis of CRS more likely:

Spurling’s test,
Combined Upper Limb Neural Tension tests for the median, radial and ulnar nerves.

Considerations

Advantages and disadvantages of the intervention and quality of the evidence

In clinical practice, the diagnosis of CRS is based on a combination of the patient’s clinical presentation (including history), the physical examination and, where necessary, diagnostic imaging (Thoomes, 2017; Sleijser-Koehorst, 2021). This module evaluates only the various provocation tests that can be performed during the physical examination.

To determine the role of provocation tests in the diagnostic work-up of patients with cervical radicular syndrome, the literature was searched for the diagnostic accuracy of provocation tests. One systematic review was found (Thoomes, 2017). It was summarized, and studies published afterwards were added to this summary (Grondin, 2021; Park, 2017; Sleijser-Koehorst, 2021). The certainty of the evidence for the critical outcome sensitivity of four combined ULNT’s was moderate. In all other cases (upper limb neural tension tests, arm squeeze test, Spurling’s test, traction, shoulder abduction and the Neck tornado test), the certainty of the evidence for the critical outcomes sensitivity and negative predictive value is low to very low. This is due to methodological limitations (risk of bias) and wide confidence intervals, often combined with small study populations (imprecision) and (clinical) heterogeneity. The certainty of the evidence for the outcomes specificity and positive predictive value (PPV) of Spurling’s test and the combined ULNT’s is low, but the pooled data do suggest high specificity and PPV (>0.80). Therefore, no strong recommendations can be formulated on the basis of the literature alone, partly because there is as yet no evidence for or against the diagnostic value of the clinical neurological examination and no complications of performing the provocation tests have been described.

Values and preferences of patients (and, where applicable, their caregivers)

The main goal is to increase the likelihood of the diagnosis of CRS so that an appropriate treatment strategy can be proposed. One possible advantage of performing the provocation tests to confirm the diagnosis of CRS is the reproduction of patient-specific symptoms. The fact that the clinician is able to provoke the symptoms (which are familiar to the patient) can give the patient confidence in the diagnosis and thereby in the proposed treatment plan. In both primary and secondary care, the tests can be performed directly after history taking and require no follow-up consultation or extra time investment from the patient. Apart from a brief worsening of symptoms during the provocation test, no disadvantages to the patient of performing the provocation tests are known.

Costs (resource use)

The working group is not aware of any cost-effectiveness studies on performing these diagnostic tests. Performing the provocation tests as recommended by the working group requires no relevant additional costs.

Acceptability, feasibility and implementation

No research has been done on the acceptability and feasibility of the provocation tests in the diagnosis of CRS. Many medical specialists may be less familiar or unfamiliar with these provocation tests; if desired, they could obtain additional training. In principle, every physiotherapist is trained to perform these provocation tests.

Rationale of the recommendation: weighing of arguments for and against the diagnostic procedure

Given the low to very low certainty of the evidence for the diagnostic value of the individual provocation tests, the working group recommends focusing mainly on history and imaging for the diagnosis of CRS. If the clinician sees added value in additional tests, the recommendation is to apply a cluster of provocation tests to establish how likely the presence of a cervical radicular syndrome is. The following can be used:

Spurling’s test,
Combined Upper Limb Neural Tension tests for the median, radial and ulnar nerves.

The ‘A’ variant of Spurling’s test was chosen to reduce the chance of a false-positive result from reproduction of somatic referred pain, as could be provoked in other variants with, for example, a position of lateral flexion combined with extension and rotation towards the affected side. The ‘A’ variant of Spurling’s test is the variant in which, after lateral flexion of the head towards the affected side, axial compression is slowly added, followed (if necessary) by some cervical extension. Reproduction of patient-specific symptoms constitutes a positive test result.

The diagnostic value of the neurological examination of reflexes, muscle strength and sensation is unknown. Only retrospective research has been done, in operated CRS patients (Thoomes, 2017). The neurological examination remains the standard in the consulting room and can be supplemented with specific nerve root tension tests as described in this module. If the nerve root is substantially compressed, motor and/or sensory hypofunction will be observable.

Evidence base

Background

In clinical practice, the diagnosis of CRS is based on a combination of the patient’s clinical presentation (including history), the physical examination and, where necessary, diagnostic imaging (Thoomes, 2017; Sleijser-Koehorst, 2021). Various provocation tests can be performed during the physical examination, but the diagnostic accuracy of these tests is unknown. This module evaluates the diagnostic accuracy of provocation tests for demonstrating or excluding a cervical radicular syndrome.

Conclusions

1. Upper limb Neural tension tests (ULNT’s)

1.1 Four combined Upper limb Neural tension tests (ULNT’s)

1.1.1 Sensitivity

Low GRADE

The evidence suggests that the sensitivity of one positive ULNT out of a cluster of four combined ULNT’s is likely high (>0.80) for diagnosing cervical radiculopathy.

Source: Apelby-Albrecht, 2013; Grondin, 2021

1.1.2 Specificity, PPV, NPV

Very low GRADE

The evidence is very uncertain about the specificity, PPV and NPV of one positive ULNT out of a cluster of four combined ULNTs for diagnosing cervical radiculopathy.

Source: Apelby-Albrecht, 2013; Grondin, 2021

1.2 ULNT1 median alone

1.2.1 Sensitivity, specificity, PPV, NPV

Very low GRADE

The evidence is very uncertain about the sensitivity, specificity, PPV and NPV of ULNT1 alone for diagnosing cervical radiculopathy.

Source: Apelby-Albrecht, 2013; Grondin, 2021; Sleijser-Koehorst, 2021

2. Arm squeeze test

Low GRADE

The evidence suggests that the diagnostic accuracy (sensitivity, specificity, PPV and NPV) is high (>0.80) for the arm squeeze test.

Source: Gumina, 2013

3. Spurling’s test

3.1 Sensitivity, specificity, PPV, NPV

Very low GRADE

The evidence is very uncertain about the sensitivity, specificity, PPV and NPV of Spurling’s test for diagnosing cervical radiculopathy.

Source: Park, 2017; Shabat, 2012; Shah, 2004; Sleijser-Koehorst, 2021; Viikari-Juntura, 1989

4. Traction

Sensitivity, specificity, negative predictive value, positive predictive value

Very low GRADE

The evidence is very uncertain about the sensitivity, specificity, PPV and NPV of Traction for diagnosing cervical radiculopathy.

Source: Viikari-Juntura, 1989

5. Shoulder abduction test

5.1 Sensitivity, NPV

Low GRADE

The evidence suggests that the sensitivity of the shoulder abduction test for diagnosing cervical radiculopathy is low (<0.60).

Source: Sleijser-Koehorst, 2021; Viikari-Juntura, 1989

Low GRADE

The evidence suggests that the negative predictive value of the shoulder abduction test for diagnosing cervical radiculopathy is moderate (>0.60, <0.80).

Source: Sleijser-Koehorst, 2021; Viikari-Juntura, 1989
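The conclusions describe accuracy values with the qualitative bands low (<0.60), moderate (>0.60, <0.80) and high (>0.80). The following is a minimal sketch of that mapping. The function name `accuracy_band` is hypothetical, and the handling of values that fall exactly on 0.60 or 0.80 is an assumption, since the text does not say which band they belong to.

```python
def accuracy_band(value: float) -> str:
    """Map a proportion (0..1) to the qualitative band used in the conclusions."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must be a proportion between 0 and 1")
    if value > 0.80:
        return "high"
    # Assumption: exact boundary values (0.60 and 0.80) are placed in the
    # middle band; the source text leaves this undefined.
    if value >= 0.60:
        return "moderate"
    return "low"


# Values reported for the shoulder abduction test by Sleijser-Koehorst (2021).
print(accuracy_band(0.50))  # sensitivity
print(accuracy_band(0.61))  # negative predictive value
```

With these inputs the sketch returns "low" for the sensitivity and "moderate" for the NPV, in line with the wording of the two conclusions above.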

5.2 Specificity, PPV

Very low GRADE

The evidence is very uncertain about the specificity and PPV of the shoulder abduction test for diagnosing cervical radiculopathy.

Source: Sleijser-Koehorst, 2021; Viikari-Juntura, 1989

6. Neck tornado test (Choi’s test)

Very low GRADE

The evidence is very uncertain about the sensitivity, specificity, PPV and NPV of the Neck tornado test (Choi’s test) in diagnosing cervical radiculopathy.

Source: Park, 2017

Summary of literature

Description of studies

Thoomes (2017) performed a systematic review on the diagnostic accuracy of tests for diagnosing cervical radiculopathy performed during a physical examination. Diagnostic accuracy outcomes were compared with a reference standard of imaging or surgical findings. The electronic databases CENTRAL, PubMed (including MEDLINE), Embase, CINAHL, Web of Science and Google Scholar were searched from inception up to March 2016. Criteria for inclusion of studies were: 1) patients who were over 18 years of age, 2) patients suspected of cervical radiculopathy from nerve root compression due to cervical disc herniation/degenerative spondylotic changes, 3) reporting diagnostic accuracy of a physical examination test, carried out in a primary or secondary care setting, and 4) presenting results from full reports. The Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) was used to assess the risk of bias on the following domains: patient selection, index test, reference test, and flow and timing. All studies were judged to have a high or unclear risk of bias in at least one domain. The authors of the systematic review declared no competing interests.

After publication of the systematic review by Thoomes (2017), three other diagnostic accuracy studies were published that matched the predefined PICO (Grondin, 2021; Park, 2017; Sleijser-Koehorst, 2021). These studies are summarized below.

The prospective cohort study by Grondin (2021) tested diagnostic accuracy of single and combined Upper limb neurodynamic tests (ULNTs) and included patients with a diagnostic uncertainty between September 2017 and September 2019. The study was carried out in accordance with the Standards for Reporting Diagnostic accuracy studies (STARD) guidelines. Criteria for inclusion of patients were: 1) between 18 and 65 years of age, 2) reporting arm pain with or without neck pain (for at least 3 months), 3) self-reported pain score between 30 mm and 80 mm on a 100 mm visual analogue scale (VAS) for the previous 24 hours, and 4) a self-reported score of >20% on the Neck Disability Index (NDI). Patients were excluded in case of: 1) inability to understand French, 2) significant neck trauma at time of study, 3) a history of neck or arm surgery, 4) presence of one of the following conditions: cardiovascular/psychiatric/neoplastic/neurological/(extra)pyramidal pathology, cervical myelopathy, diabetes, pregnancy, fibromyalgia or an inflammatory joint condition/arthritis.

The reference test was performed by a single neurosurgeon with at least 15 years of experience and consisted of a clinical diagnosis (history and presence of radicular pain/symptoms of cervical radiculopathy), confirmed by MRI. ULNTs were carried out approximately 1 hour after the reference standard by a single physiotherapist with 10 years of experience in neck pain management and with advanced certification for orthopedic assessment. After screening 109 individuals, 85 patients were included in the study, and no missing data were reported. The authors declared no competing interests.

The retrospective study by Park (2017) tested the diagnostic accuracy of the Spurling test and the Neck tornado test (Choi’s test) and for this purpose reviewed records of 135 patients who were referred to the pain clinic between September 2014 and August 2015. Criteria of inclusion of patients were: 1) presence of neck pain and 2) availability of a cervical spine MRI. Exclusion criteria were: 1) a history of cervical spine surgery, 2) a previous nerve block for cervical radiculopathy, 3) pregnancy and 4) inflammatory disease such as rheumatoid arthritis.

The reference test was performed by a pain clinician with at least 10 years of experience, who confirmed cervical radiculopathy on the basis of symptoms and MRI. The Spurling test and the Neck tornado test were performed at an unknown time interval before the reference test. Records of 135 patients were reviewed and no missing data were reported. However, the report lacked a detailed patient flow. The authors declared no competing interests.

The prospective cohort study by Sleijser-Koehorst (2021) tested the diagnostic accuracy of the Spurling test, Upper Limb Neurodynamic test and the Shoulder abduction relief test. Criteria of inclusion of patients were: 1) at least 18 years old, 2) ability to understand the Dutch language, 3) Patients were excluded in case they: 1) reported serious cervical pathology (malignancies, (rheumatoid) arthritis, myelopathy or fractures), 2) suffered neurological conditions, diabetes mellitus, complex regional pain syndrome, polyneuropathy or 4) had a history of spinal surgery.

The reference test was performed by a neurosurgeon based on clinical presentation and an MRI scan confirming nerve root compression or irritation at a relevant segmental level. The physical tests were performed by an experienced physiotherapist, prior to the reference standard. Missing data were reported for the Spurling (n= 1), ULNT1 (n= 4) and the Shoulder abduction relief test (n= 3). The authors declared no competing interests.

Characteristics of the included studies are described in Table 1.

Table 1. Description of included studies

Study	Characteristics: setting	Characteristics: population	Diagnostics: index test	Diagnostics: cut-off value	Diagnostics: reference test (cut-off)	Study design
Thoomes (2017)
Apelby-Albrecht (2013)	Center for Spinal surgery Country: Sweden Prevalence: 0.69 (95% CI 0.54 to 0.81)	Mean age: NR Female (%): NR Duration of pain: NR	ULNT1 (median), ULNT2a (median), ULNT2b (radial) and ULNT3 (ulnar)	Increase/decrease in symptoms combined with structural differentiation	1: Clinical examination, medical history and; 2: MRI-scan and; 3 history	Diagnostic cohort study
Gumina (2013)	Shoulder Clinical Office and Orthopedic Spine Ambulatory Country: Italy Prevalence: 0.20 (95% CI 0.18 to 0.22)	Mean age: NR Female (%): NR Duration of pain: NR	Arm squeeze test	Higher score (≥ 3 points) on pressure on the middle third of the upper arm compared with the other two areas	1: Clinical examination and; 2: MRI-scan and; 3 history	Cohort study resembling a case control-design
Shabat (2012)	Spine Surgery Unit Country: Israel Prevalence: 0.68 (95% CI 0.71 to 0.75)	Mean age: NR Female (%): NR Duration of pain: NR	Spurling (Ext+Rot+Ax compression)	Increase of symptoms	Complete physical examination and MRI/CT imaging	Cohort study
Shah (2004)	Neurosurgical Unit Country: India Prevalence: 0.86 (95% CI 0.72 to 0.82)	Mean age: NR Female (%): NR Duration of pain: NR	Spurling (Ext+LF+Ax pressure)	Increase of symptoms	T-2 weighted axial MRI	Prospective cohort study
Viikari-Juntura (1989)	Neurosurgery department Country: Finland Prevalence:	Mean age: NR Female (%): NR Duration of pain: NR	Spurling (LF+Rot+Ax compression) Traction	Increase of symptoms	1: conventional neurological examination and; 2: Cervical myelography	Prospective cohort study
Grondin (2021)	Neurosurgery department Country: France Prevalence: 0.317	Mean age (SD): 44 (CR+) and 45 (CR-) Female (%): NR Duration of pain, months (SD): 93 (98) for CR+ and 71 (62) for CR-	ULNT1 (median), ULNT2a (median), ULNT2b (radial) and ULNT3 (ulnar)	Reproduction of a familiar symptomatic complaint combined with structural differentiation	1: diagnosis based on clinical presentation by neurosurgeon and; 2: MRI-scan	Prospective cohort study
Park (2017)	Pain clinic in hospital Country: Korea Prevalence: 0.50 (95% CI 0.41 to 0.58)	Mean age: 53.4 (13.1) Female (%): 57 (42) Duration of pain: NR	Spurling (Ext+Rot+Ax pressure) Neck tornado test (Choi’s test)	Reproduction/increase of radicular pain/tingling	1: diagnosis based on clinical presentation by neurosurgeon and; 2: MRI-scan	Retrospective cohort study
Sleijser-Koehorst (2021)	Multidisciplinary clinic Country: the Netherlands Prevalence: 0.37 (0.27 to 0.48)	Mean age (SD): 49.9 (10.7) Female (%): 65 (48.5) Median duration of pain, weeks (IQR): 26 (13-104)	Index: Spurling (Ext+Rot+LF) Comparators: ULNT1, Shoulder abduction relief test, and cervical distraction test	Index: Reproduction of symptoms Comparators: Reproduction of symptoms and increased/decreased symptoms (ULNT1) or relief of symptoms (Shoulder abduction/cervical distraction)	1: diagnosis based on clinical presentation by neurosurgeon and; 2: MRI-scan	Prospective cohort study
Abbreviations: Ax: axial compression/pressure; CR+: subjects with cervical radiculopathy; CR- subjects without cervical radiculopathy; Ext: extension; LF: lateral flexion; NR: not reported; Rot: rotation; SD: standard deviation; ULNT: Upper Limb Neurodynamic Tests

Results

Diagnostic accuracy is assessed below for the following instruments:

1. Upper limb Neural tension tests (ULNT’s)
1.1 Four combined Upper limb Neural tension tests (ULNT’s); 1.2 ULNT1 median
2. Arm squeeze test
3. Spurling’s test
3.1 Spurling’s test (Ext+ Rot) on “true radicular symptoms”; 3.2 Spurling’s test (Ext+ LF); 3.3 Spurling’s test (LF+ Rot); 3.4 Spurling’s test (Ext+Rot+LF); 3.5 Spurling’s test (Ext+Rot+Ax)
4. Traction
5. Shoulder abduction test
6. Neck tornado test (Choi’s test)

For each instrument sensitivity, specificity, PPV and NPV were reported and summarized below.

1. Upper limb Neural tension tests (ULNT’s)

1.1 Four combined Upper limb Neural tension tests (ULNT’s)

Two studies reported on four combined ULNT’s as a diagnostic for cervical radiculopathy (Apelby-Albrecht, 2013; Grondin, 2021) and compared the outcome with clinical examination and MRI as reference. A positive outcome on one of four ULNTs was needed for a diagnosis of CRS. Results are depicted in Table 2 and Table 3.
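The "one positive out of four" decision rule can be sketched as follows. This is a minimal illustration only: the names `ULNT_NAMES` and `cluster_positive` and the example patient are hypothetical, and the sketch says nothing about how each individual test is performed or scored.

```python
from typing import Mapping

# The four tests that make up the cluster, as listed in Table 1.
ULNT_NAMES = ("ULNT1", "ULNT2a", "ULNT2b", "ULNT3")


def cluster_positive(results: Mapping[str, bool]) -> bool:
    """Return True when at least one of the four ULNTs is positive."""
    missing = [name for name in ULNT_NAMES if name not in results]
    if missing:
        raise ValueError(f"missing test results: {missing}")
    return any(results[name] for name in ULNT_NAMES)


# Hypothetical patient: only the radial nerve test (ULNT2b) is positive.
patient = {"ULNT1": False, "ULNT2a": False, "ULNT2b": True, "ULNT3": False}
print(cluster_positive(patient))
```

An "any positive" rule can only add positives compared with a single test, so it tends to raise sensitivity at the cost of specificity. In Apelby-Albrecht (2013), sensitivity was 0.83 for ULNT1 alone and 0.97 for the cluster, while specificity was 0.75 and 0.69 respectively.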

Table 2 shows the results of Apelby-Albrecht (2013) as summarized in Thoomes (2017). Regarding sensitivity, 1 out of 35 patients (3%) with cervical radiculopathy was falsely identified as not having cervical radiculopathy by using a combination of 4 ULNT’s. Regarding specificity, 5 out of 16 (31%) patients without cervical radiculopathy were falsely identified as having cervical radiculopathy. The PPV was 0.87, meaning that 34 out of 39 patients testing positive on a combination of 4 ULNT’s did indeed have cervical radiculopathy. The NPV was 0.92, meaning that 11 out of 12 patients (92%) testing negative with a combination of 4 ULNT’s did indeed not have cervical radiculopathy.

Table 3 shows the results of Grondin (2021). Since no 2x2 Table was presented by the authors for this outcome, the values of TP, FP, FN and TN are derived from sensitivity, specificity, prevalence and included participants reported in the publication.
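A derivation of this kind can be sketched as below. The source does not describe the exact procedure, so the rounding to whole patients is an assumption, and the function name `reconstruct_2x2` is hypothetical. The inputs are the values reported for Grondin (2021): 85 participants, prevalence 0.317, and the sensitivity (0.96) and specificity (0.47) of the four combined ULNT’s.

```python
def reconstruct_2x2(n: int, prevalence: float,
                    sensitivity: float, specificity: float) -> dict:
    """Rebuild TP, FP, FN and TN counts from summary statistics.

    Assumption: counts are obtained by rounding to the nearest whole patient.
    """
    with_condition = round(n * prevalence)
    without_condition = n - with_condition
    tp = round(sensitivity * with_condition)
    tn = round(specificity * without_condition)
    return {
        "TP": tp,
        "FP": without_condition - tn,
        "FN": with_condition - tp,
        "TN": tn,
    }


counts = reconstruct_2x2(n=85, prevalence=0.317, sensitivity=0.96, specificity=0.47)
print(counts)
```

Under these assumptions the sketch yields 26 true positives, 31 false positives, 1 false negative and 27 true negatives, which agrees with the counts shown in Table 3.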

Regarding sensitivity, 1 out of 27 patients (4%) with cervical radiculopathy was falsely identified as not having cervical radiculopathy by using a combination of 4 ULNT’s. Regarding specificity, 31 of the 58 patients (53%) without cervical radiculopathy were falsely identified as having cervical radiculopathy. The PPV was 0.46, meaning that 31 of the 57 patients (54%) testing positive on a combination of 4 ULNT’s did not actually have cervical radiculopathy. The NPV was 0.96, meaning that 1 out of 28 patients (4%) testing negative with a combination of 4 ULNT’s did actually have cervical radiculopathy.

Table 2: Diagnostic accuracy of ULNT1, ULNT2a, ULNT2b and ULNT3 combined (Apelby-Albrecht, 2013)

	Reference (clinical examination and MRI)
	+	-
combination of 4 ULNT’s +	34 (TP)	5 (FP)	39	PPV: 34/39 = 0.87 (95% CI 0.77 to 0.93)
combination of 4 ULNT’s -	1 (FN)	11 (TN)	12	NPV: 11/12 = 0.92 (95% CI 0.61 to 0.99)
	35	16	51
	Sensitivity: 34/35 = 0.97 (95% CI 0.85 to 1.00)	Specificity: 11/16 = 0.69 (95% CI 0.41 to 0.89)
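The four accuracy measures in these tables all follow from the same 2x2 counts. The sketch below shows the arithmetic using the counts of Table 2. The class name `TwoByTwo` is hypothetical and the code is an illustration, not the software used by the guideline authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of a 2x2 diagnostic table (index test versus reference standard)."""
    tp: int  # index test positive, reference positive
    fp: int  # index test positive, reference negative
    fn: int  # index test negative, reference positive
    tn: int  # index test negative, reference negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# Counts as reported in Table 2 (Apelby-Albrecht, 2013).
table_2 = TwoByTwo(tp=34, fp=5, fn=1, tn=11)
for name in ("sensitivity", "specificity", "ppv", "npv"):
    print(f"{name}: {getattr(table_2, name):.2f}")
```

Sensitivity and specificity are computed down the columns of the table (within patients with and without the condition), whereas PPV and NPV are computed along the rows (within positive and negative test results), which is why the predictive values depend on the prevalence in the study population.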

Table 3: Diagnostic accuracy of ULNT1, ULNT2a, ULNT2b and ULNT3 combined (Grondin, 2021)

	Reference (clinical examination and MRI)
	+	-
combination of 4 ULNT’s +	26 (TP)	31 (FP)	57	PPV: 26/57 = 0.46 (95% CI 0.39 to 0.52)
combination of 4 ULNT’s -	1 (FN)	27 (TN)	28	NPV: 27/28 = 0.96 (95% CI 0.79 to 0.99)
	27	58	85
	Sensitivity: 26/27 = 0.96 (95% CI 0.81 to 1.00)	Specificity: 27/58 = 0.47 (95% CI 0.33 to 0.60)

1.2 ULNT1 median

Three studies reported on ULNT1 median as a diagnostic for cervical radiculopathy (Apelby-Albrecht, 2013; Grondin, 2021; Sleijser-Koehorst, 2021).

Table 4 shows the results of Apelby-Albrecht (2013) as summarized in Thoomes (2017). Regarding sensitivity, 6 out of 35 patients (17%) with cervical radiculopathy were falsely identified as not having cervical radiculopathy by using ULNT1 median alone. Regarding specificity, 4 out of 16 (25%) patients without cervical radiculopathy were falsely identified as having cervical radiculopathy. The PPV was 0.88 meaning that 4 out of 33 patients testing positive on ULNT1 median alone, actually did not have cervical radiculopathy. The NPV was 0.67, translating into 6 out of 18 (33%) testing negative with ULNT1 median alone, actually have cervical radiculopathy.

Table 5 shows the results of Grondin (2021). Since no 2x2 Table was presented by the authors for this outcome, the values of TP, FP, FN and TN are derived from sensitivity, specificity, prevalence and included participants reported in the publication.

Regarding sensitivity, 11 out of 27 patients (41%) with cervical radiculopathy were falsely identified as not having cervical radiculopathy by using ULNT1 median alone. Regarding specificity, 14 out of 58 patients without cervical radiculopathy were falsely identified as having cervical radiculopathy. The PPV was 0.53 meaning that 14 of the 30 patients testing positive on ULNT1 median alone, actually did not have cervical radiculopathy. The NPV was 0.80, translating into 11 out of 55 (20%) testing negative with ULNT1 median alone, actually have cervical radiculopathy.

Table 6 shows the results of Sleijser-Koehorst (2021). Regarding sensitivity, 21 out of 64 patients (33%) with cervical radiculopathy were falsely identified as not having cervical radiculopathy by using ULNT1 median alone. Regarding specificity, 22 out of 66 (33%) patients without cervical radiculopathy were falsely identified as having cervical radiculopathy. The PPV was 0.66, meaning that 22 out of 65 patients testing positive on ULNT1 median alone did not actually have cervical radiculopathy. The NPV was 0.68, meaning that 21 out of 65 patients (32%) testing negative with ULNT1 median alone did actually have cervical radiculopathy.

Table 4: Diagnostic accuracy of ULNT1 median alone (Apelby-Albrecht, 2013)

	Reference (clinical examination and MRI)
	+	-
ULNT1 median +	29 (TP)	4 (FP)	33	PPV: 29/33 = 0.88 (95% CI 0.71 to 0.96)
ULNT1 median -	6 (FN)	12 (TN)	18	NPV: 12/18 = 0.67 (95% CI 0.41 to 0.86)
	35	16	51
	Sensitivity: 29/35 = 0.83 (95% CI 0.66 to 0.93)	Specificity: 12/16 = 0.75 (95% CI 0.48 to 0.93)
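The tables report 95% confidence intervals, but the source does not state which interval method was used. As an illustration of how such an interval can be obtained for a proportion, the sketch below uses the Wilson score interval, which is one common choice and an assumption here. The function name `wilson_interval` is hypothetical, and the result need not match the reported bounds exactly.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / n
    denominator = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denominator
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denominator
    return centre - half_width, centre + half_width


# Sensitivity of ULNT1 median in Table 4: 29 of 35 patients test positive.
low, high = wilson_interval(29, 35)
print(f"{low:.2f} to {high:.2f}")
```

For 29/35 this gives an interval of roughly 0.67 to 0.92, close to but not identical with the 0.66 to 0.93 shown in Table 4, which suggests the tables may have been computed with a different (for example exact) method.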

Table 5: Diagnostic accuracy of ULNT1 median alone (Grondin, 2021)

	Reference (clinical examination and MRI)
	+	-
ULNT1 median +	16 (TP)	14 (FP)	30	PPV: 16/30 = 0.53 (95% CI 0.34 to 0.72)
ULNT1 median -	11 (FN)	44 (TN)	55	NPV: 44/55 = 0.80 (95% CI 0.67 to 0.90)
	27	58	85
	Sensitivity: 16/27 = 0.59 (95% CI 0.39 to 0.78)	Specificity: 44/58 = 0.76 (95% CI 0.63 to 0.86)

Table 6: Diagnostic accuracy of ULNT1 median alone (Sleijser-Koehorst, 2021)

	Reference (clinical examination and MRI)
	+	-
ULNT1 median +	43 (TP)	22 (FP)	65	PPV: 43/65 = 0.66 (95% CI 0.53 to 0.77)
ULNT1 median -	21 (FN)	44 (TN)	65	NPV: 44/65 = 0.68 (95% CI 0.55 to 0.79)
	64	66	130
	Sensitivity: 43/64 = 0.67 (95% CI 0.54 to 0.78)	Specificity: 44/66 = 0.67 (95% CI 0.54 to 0.78)

2. Arm squeeze test

One study reported on the arm squeeze test as a diagnostic for cervical radiculopathy (Gumina, 2013), and compared the outcome with clinical examination and MRI as reference. Results are depicted in Table 7.

Regarding sensitivity, 10 out of 305 patients (3%) with cervical radiculopathy were falsely identified as not having cervical radiculopathy by using the arm squeeze test. Regarding specificity, 43 out of 1262 (3%) patients without cervical radiculopathy were falsely identified as having cervical radiculopathy. The PPV was 0.87 meaning that 43 out of 338 patients testing positive on the arm squeeze test, actually did not have cervical radiculopathy. The NPV was 0.99, translating into 10 out of 1229 (1%) testing negative with the arm squeeze test, actually have cervical radiculopathy.

Table 7: Diagnostic accuracy of the arm squeeze test (Gumina, 2013)

	Reference (clinical examination and MRI)
	+	-
Arm squeeze test +	295 (TP)	43 (FP)	338	PPV: 295/338 = 0.87 (95% CI 0.83 to 0.91)
Arm squeeze test -	10 (FN)	1219 (TN)	1229	NPV: 1219/1229 = 0.99 (95% CI 0.98 to 0.99)
	305	1262	1567
	Sensitivity: 295/305 = 0.97 (95% CI 0.93 to 0.98)	Specificity: 1219/1262 = 0.97 (95% CI 0.95 to 0.98)
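The percentages quoted in the narrative are the complements of the four accuracy measures, expressed per 2x2 table. A minimal sketch, using the counts of Table 7, is shown below. The function name `misclassification_percentages` and the dictionary keys are hypothetical, and rounding to whole percentages is an assumption based on how the text presents them.

```python
def misclassification_percentages(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Percentages behind the four narrative statements for one 2x2 table."""
    def pct(part: int, whole: int) -> int:
        return round(100 * part / whole)

    return {
        # Patients with the condition who test negative (1 - sensitivity).
        "missed_among_with_condition": pct(fn, tp + fn),
        # Patients without the condition who test positive (1 - specificity).
        "false_alarm_among_without_condition": pct(fp, fp + tn),
        # Positive tests in patients without the condition (1 - PPV).
        "without_condition_among_positives": pct(fp, tp + fp),
        # Negative tests in patients with the condition (1 - NPV).
        "with_condition_among_negatives": pct(fn, fn + tn),
    }


# Counts as reported in Table 7 (Gumina, 2013).
result = misclassification_percentages(tp=295, fp=43, fn=10, tn=1219)
for label, value in result.items():
    print(f"{label}: {value}%")
```

For Table 7 this gives 3%, 3%, 13% and 1%, matching a sensitivity and specificity of 0.97, a PPV of 0.87 and an NPV of 0.99.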

3. Spurling’s test

Five studies reported on Spurling’s test as a diagnostic for cervical radiculopathy (Park, 2017; Shabat, 2012; Shah, 2004; Sleijser-Koehorst, 2021; Viikari-Juntura, 1989). A variety of movements preceding Spurling’s test was reported; results are depicted in paragraphs 3.1 to 3.5.

In summary, sensitivity ranged from 0.38 (95% CI 0.22 to 0.56) in Viikari-Juntura (1989) to 0.98 (95% CI 0.92 to 0.99) in Shabat (2012), and specificity ranged from 0.84 (95% CI 0.72 to 0.91) in Sleijser-Koehorst (2021) to 1.00 (95% CI 0.56 to 1.00) in Shah (2004).

PPV ranged from 0.78 (95% CI 0.63 to 0.88) in Sleijser-Koehorst (2021) to 1.00 (95% CI 0.85 to 1.00) in Shah (2004), and NPV ranged from 0.32 (95% CI 0.15 to 0.55) in Shah (2004) to 0.94 (95% CI 0.83 to 0.99) in Shabat (2012).
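The ranges above can be checked against the per-study counts with a short sketch. The counts are those reported in Tables 8 to 12; the names `SPURLING_STUDIES`, `MEASURES` and `measure_range` are hypothetical.

```python
# (TP, FP, FN, TN) as reported in Tables 8 to 12.
SPURLING_STUDIES = {
    "Shabat (2012)": (115, 6, 3, 49),
    "Shah (2004)": (28, 0, 15, 7),
    "Viikari-Juntura (1989)": (12, 3, 20, 51),
    "Sleijser-Koehorst (2021)": (38, 11, 27, 57),
    "Park (2017)": (37, 1, 30, 67),
}

MEASURES = {
    "sensitivity": lambda tp, fp, fn, tn: tp / (tp + fn),
    "specificity": lambda tp, fp, fn, tn: tn / (tn + fp),
    "ppv": lambda tp, fp, fn, tn: tp / (tp + fp),
    "npv": lambda tp, fp, fn, tn: tn / (tn + fn),
}


def measure_range(measure: str):
    """Return ((study, lowest value), (study, highest value)) for one measure."""
    values = {
        study: MEASURES[measure](*counts)
        for study, counts in SPURLING_STUDIES.items()
    }
    lowest = min(values, key=values.get)
    highest = max(values, key=values.get)
    return (lowest, values[lowest]), (highest, values[highest])


for measure in MEASURES:
    (low_study, low), (high_study, high) = measure_range(measure)
    print(f"{measure}: {low:.2f} in {low_study} to {high:.2f} in {high_study}")
```

The studies identified as lowest and highest for each measure agree with the summary above. Because the five studies used different head positions and reference standards, the ranges describe spread across heterogeneous studies rather than a pooled estimate.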

3.1 Spurling’s test (Ext+ Rot) on “true radicular symptoms”

Shabat (2012) reported on Spurling’s test using cervical extension combined with ipsilateral rotation. See Table 8 below.

Regarding sensitivity, 3 out of 118 patients (2%) with cervical radiculopathy were falsely identified as not having cervical radiculopathy by using Spurling’s test (Ext+ Rot). Regarding specificity, 6 out of 55 (11%) patients without cervical radiculopathy were falsely identified as having cervical radiculopathy. The PPV was 0.95, meaning that 6 out of 121 (5%) participants testing positive on Spurling’s test (Ext+ Rot) did not actually have cervical radiculopathy. The NPV was 0.94, meaning that 3 out of 52 participants (6%) testing negative with Spurling’s test (Ext+ Rot) did actually have cervical radiculopathy.

Table 8: Diagnostic accuracy of Spurling’s test (Shabat, 2012)

	Reference (MRI/CT)
	+	-
Spurling’s test (Ext+ Rot) +	115 (TP)	6 (FP)	121	PPV: 115/121 = 0.95 (95% CI 0.89 to 0.98)
Spurling’s test (Ext+ Rot) -	3 (FN)	49 (TN)	52	NPV: 49/52 = 0.94 (95% CI 0.83 to 0.99)
	118	55	173
	Sensitivity: 115/118 = 0.98 (95% CI 0.92 to 0.99)	Specificity: 49/55 = 0.89 (95% CI 0.77 to 0.96)

3.2 Spurling’s test (Ext+ LF)

Shah (2004) reported on Spurling’s test using cervical extension combined with ipsilateral lateral flexion. Results are depicted in Table 9 below.

Regarding sensitivity, 15 out of 43 patients (35%) with cervical radiculopathy were falsely identified as not having cervical radiculopathy by using Spurling’s test (Ext+LF). Regarding specificity, 0 out of 7 (0%) patients without cervical radiculopathy were falsely identified as having cervical radiculopathy. The PPV was 1.00, meaning that all of the 28 participants testing positive on Spurling’s test (Ext+LF) did actually have cervical radiculopathy. The NPV was 0.32, meaning that 15 out of 22 participants (68%) testing negative with Spurling’s test (Ext+LF) did actually have cervical radiculopathy.

Table 9: Diagnostic accuracy of Spurling’s test (Shah, 2004)

	Reference (MRI/operation)
	+	-
Spurling’s test (Ext+ LF) +	28 (TP)	0 (FP)	28	PPV: 28/28 = 1.00 (95% CI 0.85 to 1.00)
Spurling’s test (Ext+ LF) -	15 (FN)	7 (TN)	22	NPV: 7/22 = 0.32 (95% CI 0.15 to 0.55)
	43	7	50
	Sensitivity: 28/43 = 0.65 (95% CI 0.49 to 0.79)	Specificity: 7/7 = 1.00 (95% CI 0.56 to 1.00)

3.3 Spurling’s test (LF+ Rot)

Viikari-Juntura (1989) reported on Spurling’s test using ipsilateral lateral flexion and rotation.

Table 10 shows the results of Viikari-Juntura (1989) as presented by Thoomes (2017). Regarding sensitivity, 20 out of 32 patients (62%) with cervical radiculopathy were falsely identified as not having cervical radiculopathy by using Spurling’s test (LF+Rot). Regarding specificity, 3 out of 54 (6%) patients without cervical radiculopathy were falsely identified as having cervical radiculopathy. The PPV was 0.86, meaning that 3 out of 15 (14%) participants testing positive on Spurling’s test (LF+Rot) did not actually have cervical radiculopathy. The NPV was 0.80, meaning that 20 out of 71 participants (20%) testing negative with Spurling’s test (LF+Rot) did actually have cervical radiculopathy.

Table 10: Diagnostic accuracy of Spurling’s test (Viikari-Juntura, 1989)

	Reference (MRI/operation)
	+	-
Spurling’s test (LF+ Rot) +	12 (TP)	3 (FP)	15	PPV: 12/15 = 0.86 (95% CI 0.56 to 0.98)
Spurling’s test (LF+ Rot) -	20 (FN)	51 (TN)	71	NPV: 51/71 = 0.80 (95% CI 0.51 to 0.95)
	32	54	86
	Sensitivity: 12/32 = 0.38 (95% CI 0.22 to 0.56)	Specificity: 51/54 = 0.94 (95% CI 0.83 to 0.99)

3.4 Spurling’s test (Ext+Rot+LF)

Table 11 shows the results of Sleijser-Koehorst (2021). Regarding sensitivity, 27 out of 65 patients (41%) with cervical radiculopathy were falsely identified as not having cervical radiculopathy by using Spurling’s test. Regarding specificity, 11 out of 68 (16%) patients without cervical radiculopathy were falsely identified as having cervical radiculopathy. The PPV was 0.78, meaning that 11 out of 49 (22%) participants testing positive on Spurling’s test did not actually have cervical radiculopathy. The NPV was 0.68, meaning that 27 out of 84 participants (32%) testing negative with Spurling’s test did actually have cervical radiculopathy.

Table 11: Diagnostic accuracy of Spurling’s test (Sleijser-Koehorst, 2021)

	Reference (MRI and clinical presentation)
	+	-
Spurling’s test (Ext+Rot+LF) +	38 (TP)	11 (FP)	49	PPV: 38/49 = 0.78 (95% CI 0.63 to 0.88)
Spurling’s test (Ext+Rot+LF) -	27 (FN)	57 (TN)	84	NPV: 57/84 = 0.68 (95% CI 0.57 to 0.78)
	65	68	133
	Sensitivity: 38/65 = 0.59 (95% CI 0.46 to 0.70)	Specificity: 57/68 = 0.84 (95% CI 0.72 to 0.91)

3.5 Spurling’s test (Ext+Rot+Ax)

Park (2017) reported on Spurling’s test using extension, rotation and downward pressure on the head.

Table 12 shows the results of Park (2017). Regarding sensitivity, 30 out of 67 patients (45%) with cervical radiculopathy were falsely identified as not having cervical radiculopathy by using Spurling’s test. Regarding specificity, 1 out of 68 (1%) patients without cervical radiculopathy were falsely identified as having cervical radiculopathy. The PPV was 0.97, meaning that 1 out of 38 (3%) participants testing positive on Spurling’s test, actually did not have cervical radiculopathy. The NPV was 0.69, translating into 30 out of 97 participants (31%) testing negative with Spurling’s test, actually have cervical radiculopathy.

Table 12: diagnostic accuracy of Spurlings test (Park, 2017)

	Reference (MRI)
	+	-
Spurling’s test (Ext+Rot+Ax) +	37 (TP)	1 (FP)	38	PPV: 37/38 = 0.97 (95% CI 0.86 to 1.00)
Spurling’s test (Ext+Rot+Ax) -	30 (FN)	67 (TN)	97	NPV: 67/97 = 0.69 (95% CI 0.59 to 0.78)
	67	68	135
	Sensitivity: 37/67 = 0.55 (95% CI 0.43 to 0.67)	Specificity: 67/68 = 0.99 (95% CI 0.92 to 1.00)
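
The four point estimates in Table 12 follow directly from the cell counts of the 2×2 table. The following is a minimal Python sketch, assuming the standard definitions of sensitivity, specificity, PPV and NPV; the function name `accuracy_measures` is illustrative and does not come from the included studies. Confidence intervals are left out because the text does not state which interval method was used.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Return the four point estimates for a 2x2 diagnostic accuracy table."""
    return {
        "sensitivity": tp / (tp + fn),  # positives among patients with the condition
        "specificity": tn / (tn + fp),  # negatives among patients without the condition
        "ppv": tp / (tp + fp),          # patients with the condition among test positives
        "npv": tn / (tn + fn),          # patients without the condition among test negatives
    }

# Cell counts taken from Table 12 (Park, 2017)
park = accuracy_measures(tp=37, fp=1, fn=30, tn=67)
for name, value in park.items():
    print(f"{name}: {value:.2f}")
```

The misclassification fractions quoted in the text are the complements of these values, for example 1 minus sensitivity = 30/67 for the patients with cervical radiculopathy who were missed.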

4. Traction

One study reported on traction as a diagnostic test for cervical radiculopathy (Viikari-Juntura, 1989) and compared the outcome with a myelogram as reference. In total, 24 participants received traction as clinical test. Results are depicted in Table 13.

Regarding sensitivity, 10 out of 15 patients (67%) with cervical radiculopathy were falsely identified as not having cervical radiculopathy by using traction. Regarding specificity, 1 out of 33 (3%) patients without cervical radiculopathy were falsely identified as having cervical radiculopathy. The PPV was 0.83, meaning that 1 out of 6 (17%) participants testing positive on traction actually did not have cervical radiculopathy. The NPV was 0.76, meaning that 10 out of 42 participants (24%) testing negative on traction actually had cervical radiculopathy.

Table 13: diagnostic accuracy of traction (Viikari-Juntura, 1989)

	Reference (Myelogram)
	+	-
Traction +	5 (TP)	1 (FP)	6	PPV: 5/6 = 0.83 (95% CI 0.37 to 0.99)
Traction -	10 (FN)	32 (TN)	42	NPV: 32/42 = 0.76 (95% CI 0.60 to 0.87)
	15	33	48
	Sensitivity: 5/15 = 0.33 (95% CI 0.13 to 0.52)	Specificity: 32/33 = 0.97 (95% CI 0.37 to 0.99)

5. Shoulder abduction test

Two studies reported on the shoulder abduction relief test (Viikari-Juntura, 1989; Sleijser-Koehorst, 2021).

Table 14 shows the results of Sleijser-Koehorst (2021). Regarding sensitivity, 32 out of 64 patients (50%) with cervical radiculopathy were falsely identified as not having cervical radiculopathy by using the shoulder abduction test. Regarding specificity, 17 out of 67 (25%) patients without cervical radiculopathy were falsely identified as having cervical radiculopathy. The PPV was 0.65, meaning that 17 out of 49 (35%) participants testing positive on the shoulder abduction test actually did not have cervical radiculopathy. The NPV was 0.61, meaning that 32 out of 82 participants (39%) testing negative on the shoulder abduction test actually had cervical radiculopathy.

Table 15 shows the results of Viikari-Juntura (1989) as presented by Thoomes (2017). Regarding sensitivity, 8 out of 15 patients (53%) with cervical radiculopathy were falsely identified as not having cervical radiculopathy by using the shoulder abduction test. Regarding specificity, 2 out of 13 (15%) patients without cervical radiculopathy were falsely identified as having cervical radiculopathy. The PPV was 0.78, meaning that 2 out of 9 (22%) participants testing positive on the shoulder abduction test actually did not have cervical radiculopathy. The NPV was 0.58, meaning that 8 out of 19 participants (42%) testing negative on the shoulder abduction test actually had cervical radiculopathy.

Table 14: diagnostic accuracy of the shoulder abduction test (Sleijser-Koehorst, 2021)

	Reference (MRI and clinical presentation)
	+	-
Shoulder abduction +	32 (TP)	17 (FP)	49	PPV: 32/49 = 0.65 (95% CI 0.50 to 0.78)
Shoulder abduction -	32 (FN)	50 (TN)	82	NPV: 50/82 = 0.61 (95% CI 0.50 to 0.72)
	64	67	131
	Sensitivity: 32/64 = 0.50 (95% CI 0.37 to 0.63)	Specificity: 50/67 = 0.75 (95% CI 0.62 to 0.84)

Table 15: diagnostic accuracy of the shoulder abduction test (Viikari-Juntura, 1989)

	Reference (Myelogram)
	+	-
Shoulder abduction +	7 (TP)	2 (FP)	9	PPV: 7/9 = 0.78 (95% CI 0.40 to 0.96)
Shoulder abduction -	8 (FN)	11 (TN)	19	NPV: 11/19 = 0.58 (95% CI 0.34 to 0.79)
	15	13	28
	Sensitivity: 7/15 = 0.47 (95% CI 0.22 to 0.73)	Specificity: 11/13 = 0.85 (95% CI 0.54 to 0.97)

6. Neck tornado test (Choi’s test)

One study reported on the neck tornado test (NTT) as a diagnostic test for cervical radiculopathy (Park, 2017).

Table 16 shows the results of Park (2017). Regarding sensitivity, 10 out of 67 patients (15%) with cervical radiculopathy were falsely identified as not having cervical radiculopathy by using the NTT. Regarding specificity, 9 out of 68 (13%) patients without cervical radiculopathy were falsely identified as having cervical radiculopathy. The PPV was 0.86, meaning that 9 out of 66 (14%) participants testing positive on the NTT actually did not have cervical radiculopathy. The NPV was 0.86, meaning that 10 out of 69 participants (14%) testing negative on the NTT actually had cervical radiculopathy.

Table 16: diagnostic accuracy of the Neck tornado test (Choi’s test) (Park, 2017)

	Reference (MRI)
	+	-
NTT +	57 (TP)	9 (FP)	66	PPV: 57/66 = 0.86 (95% CI 0.76 to 0.94)
NTT -	10 (FN)	59 (TN)	69	NPV: 59/69 = 0.86 (95% CI 0.75 to 0.93)
	67	68	135
	Sensitivity: 57/67 = 0.85 (95% CI 0.74 to 0.93)	Specificity: 59/68 = 0.87 (95% CI 0.76 to 0.94)

Level of evidence of the literature

1. Upper limb Neural tension tests (ULNT’s)

1.1 Four combined Upper limb Neural tension tests (ULNT’s)

1.1.1 The level of evidence regarding the outcome measure sensitivity started as “high” and was downgraded by two levels to “low”. Because the impact of the inappropriate interval between reference and index test (Apelby-Albrecht, 2013) on the risk of bias was judged to be limited, and the study by Grondin (2021) was of high quality, the evidence was not downgraded for risk of bias. However, the number of included patients was low (Apelby-Albrecht, 2013) (-1, imprecision) and the prevalence reported in the two studies was not consistent (-1, inconsistency).

1.1.2 The level of evidence regarding the outcome measures specificity and PPV started as “high” and was downgraded by four levels to “very low”. Because the impact of the inappropriate interval between reference and index test (Apelby-Albrecht, 2013) on the risk of bias was judged to be limited, and the study by Grondin (2021) was of high quality, the evidence was not downgraded for risk of bias. It was downgraded because of a low number of included patients and confidence intervals crossing the borders of clinical relevance (Apelby-Albrecht, 2013) (-2, imprecision), and because of strong inconsistency, with non-overlapping 95% CIs and inconsistent reported prevalence between the two studies (-2, inconsistency).

1.1.3 The level of evidence regarding the outcome measure negative predictive value started as “high” and was downgraded by three levels to “very low” because of a low number of included patients and confidence intervals crossing the borders of clinical relevance in both studies (-2, imprecision), and because the prevalence reported in the two studies was not consistent (-1, inconsistency).

1.2 ULNT1 median alone

1.2.1 The level of evidence regarding the outcome measures sensitivity, PPV and NPV started as high and was downgraded by four levels to very low because of inconsistency (-1, inconsistency) and confidence intervals crossing the borders of clinical relevance (Apelby-Albrecht, 2013; Grondin, 2021; Sleijser-Koehorst, 2021) (-2, imprecision), and the reported prevalence of Apelby-Albrecht (2013) and Grondin (2021) was not consistent (-1, inconsistency). Since the impact on the risk of bias of the inappropriate time between reference and index test (Apelby-Albrecht, 2013) was estimated not too high, and the high quality of the study by Grondin (2021) and Sleijser-Koehorst (2021), there was no downgrading for risk of bias.

1.2.2 The level of evidence regarding the outcome measure specificity started as high and was downgraded by three levels to very low because of a low number of included patients and confidence intervals crossing the borders of clinical relevance (Apelby-Albrecht, 2013; Grondin, 2021; Sleijser-Koehorst, 2021) (-2, imprecision), and because the reported prevalence of Apelby-Albrecht (2013) and Grondin (2021) was not consistent (-1, inconsistency). Because the impact of the inappropriate interval between reference and index test (Apelby-Albrecht, 2013) on the risk of bias was judged to be limited, and the studies by Grondin (2021) and Sleijser-Koehorst (2021) were of high quality, the evidence was not downgraded for risk of bias.

1.2.3 The level of evidence regarding the outcome measures negative predictive value and positive predictive value started as high and was downgraded by three levels to very low because of a low number of included patients and confidence intervals crossing the borders of clinical relevance in both studies (-2, imprecision), and the reported prevalence of Apelby-Albrecht (2013) and Grondin (2021) was not consistent (-1, inconsistency).

2. Arm squeeze test

The level of evidence regarding the outcome measures sensitivity, specificity, negative predictive value and positive predictive value started at high and was downgraded by two levels to “low” because the sample had a case-control character (-2, risk of bias).

3. Spurling’s test

3.1 The level of evidence regarding the outcome measure sensitivity started as high and was downgraded by four levels to very low because of questionable overall risk of bias in Shabat (2012), the use of different reference tests (Shabat, 2012; Shah, 2004; Viikari-Juntura, 1989) and retrospective inclusion in Park (2017) (-2, risk of bias), strong inconsistency without overlapping 95% CIs between Viikari-Juntura (1989) and Shabat (2012) together with inconsistent reported prevalences (-2, inconsistency), and broad confidence intervals (Shah, 2004; Sleijser-Koehorst, 2021) (-1, imprecision).

3.2 The level of evidence regarding the outcome measure specificity started as high and was downgraded by three levels to very low because of questionable overall risk of bias in Shabat (2012), the use of different reference tests (Shabat, 2012; Shah, 2004; Viikari-Juntura, 1989) and retrospective inclusion in Park (2017) (-2, risk of bias), confidence intervals crossing the borders of clinical relevance (Sleijser-Koehorst, 2021), and inconsistent reported prevalences (-1, inconsistency).

3.3 The level of evidence regarding the outcome measure positive predictive value started as high and was downgraded by four levels to very low because of questionable overall risk of bias in Shabat (2012) and the use of different reference tests (Shabat, 2012; Shah, 2004; Viikari-Juntura, 1989) (-2, risk of bias), broad confidence intervals crossing the borders of clinical relevance (Viikari-Juntura, 1989; Sleijser-Koehorst, 2021) (-1, imprecision), and inconsistent reported prevalences (-1, inconsistency).

3.4 The level of evidence regarding the outcome measure negative predictive value started as high and was downgraded by four levels to very low because of questionable overall risk of bias in Shabat (2012), the use of different reference tests (Shabat, 2012; Shah, 2004; Viikari-Juntura, 1989) and retrospective inclusion in Park (2017) (-2, risk of bias), and strong inconsistency between all studies, without overlapping 95% CIs between Shah (2004) and Shabat (2012) and with inconsistent reported prevalences (-2, inconsistency).

4. Traction

4.1 Sensitivity

The level of evidence regarding the outcome measure sensitivity started as high and was downgraded by three levels to very low because of inappropriate exclusion criteria, and not all included patients received the same reference standard or index test, with no study of higher quality to compensate (-2, risk of bias) and a low number of included patients (-1, imprecision).

4.2 Specificity, PPV and NPV

The level of evidence regarding the outcome measures specificity, PPV and NPV started as high and was downgraded by three levels to very low because of inappropriate exclusion criteria, and not all included patients received the same reference standard or index test with no study of higher quality to compensate (-2, risk of bias) and crossing borders of clinical relevance (-1, imprecision).

5. Shoulder abduction test

5.1 Sensitivity, NPV

The level of evidence regarding the outcome measures sensitivity and negative predictive value started as high and was downgraded by two levels to low because of inappropriate exclusion criteria, and not all included patients received the same reference standard or index test (Viikari-Juntura, 1989) (-1, risk of bias) and crossing borders of clinical relevance (-1, imprecision).

5.2 Specificity, PPV

The level of evidence regarding the outcome measures specificity and positive predictive value started as high and was downgraded by three levels to very low because of inappropriate exclusion criteria, and not all included patients received the same reference standard or index test (Viikari-Juntura, 1989) (-1, risk of bias), conflicting results (-1, inconsistency) and crossing borders of clinical relevance (-1, imprecision).

6. Neck tornado test (Choi’s test)

The level of evidence regarding the outcome measures sensitivity, specificity, negative predictive value and positive predictive value started as high and was downgraded by three levels to very low because risk of selection bias could not be ruled out due to the retrospective design of Park (2017), with no study of higher quality to compensate (-2, risk of bias) and crossing borders of clinical relevance (-1, imprecision).
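
The downgrading arithmetic used throughout this section can be sketched as follows. This is a minimal illustration, assuming the four-level GRADE scale (high, moderate, low, very low) in which a rating cannot drop below “very low”; the names `LEVELS` and `downgrade` are hypothetical and not part of the guideline.

```python
LEVELS = ["high", "moderate", "low", "very low"]

def downgrade(deductions, start="high"):
    """Apply per-domain deductions to a starting level, with 'very low' as the floor."""
    total = sum(deductions.values())
    index = min(LEVELS.index(start) + total, len(LEVELS) - 1)
    return LEVELS[index]

# Shoulder abduction test, sensitivity and NPV (section 5.1): two levels
print(downgrade({"risk of bias": 1, "imprecision": 1}))

# Traction, sensitivity (section 4.1): three levels
print(downgrade({"risk of bias": 2, "imprecision": 1}))
```

Because of the floor, deductions that sum to more than three levels still result in “very low”.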

Search and selection

A systematic review of the literature was performed to answer the following question: What is the diagnostic accuracy of diagnostic tests during physical examination for identifying cervical radiculopathy?

P: Patients who were suspected of having cervical radiculopathy

I: Diagnostic physical tests during physical examination for identifying cervical radiculopathy

C: Not applicable

R: (1) Diagnostic imaging magnetic resonance imaging (MRI) or computed tomography (CT) myelography, or (2) findings during surgery

O: Sensitivity, positive predictive value, specificity, negative predictive value

Timing and setting: Diagnostic trajectory in secondary care

Relevant outcome measures

The guideline development group considered (high) sensitivity and (high) negative predictive value as critical outcome measures for decision making; and (high) specificity and (high) positive predictive value as important outcome measures for decision making.

The working group defined values for sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) ≥ 0.80 as high, 0.60–0.79 as moderate and < 0.60 as low, in line with the cut-off values presented by Sleijser-Koehorst (2021).
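
These cut-offs can be expressed as a small helper. The following Python sketch is illustrative only; the function name `classify` is hypothetical, and treating values between 0.79 and 0.80 as moderate is an assumption, since the text defines the categories to two decimals.

```python
def classify(value):
    """Label an accuracy estimate using the working group's cut-off values."""
    if value >= 0.80:
        return "high"
    if value >= 0.60:
        return "moderate"  # assumption: unrounded values just below 0.80 fall here
    return "low"

# Point estimates for Spurling's test as reported in Table 12 (Park, 2017)
for name, value in {"sensitivity": 0.55, "specificity": 0.99, "PPV": 0.97, "NPV": 0.69}.items():
    print(f"{name}: {value:.2f} ({classify(value)})")
```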

Search and select (Methods)

The databases Medline (via OVID) and Embase (via Embase.com) were searched with relevant search terms until March 23. The detailed search strategy is depicted under the tab Methods. The systematic literature search resulted in 366 hits. Studies were selected based on the following criteria:

Systematic review (searched in at least two databases, and detailed search strategy, risk of bias assessment and results of individual studies available), randomized controlled trial or observational study comparing a diagnostic test during physical examination with a reference test ((1) diagnostic imaging: magnetic resonance imaging (MRI) or computed tomography (CT) myelography, or (2) findings during surgery) resulting in diagnostic accuracy measures;
Patients aged ≥ 18 years;
Full-text English or Dutch language publication;
Studies including ≥ 20 patients (ten in each study arm); and
Studies according to PICRO and setting.

Initially, seven studies were selected based on title and abstract screening. After reading the full text, three studies were excluded (see the Table with reasons for exclusion under the tab Methods) and four studies were included.

Results

Four studies were included in the analysis of the literature. Important study characteristics and results are summarized in the evidence Tables. The assessment of the risk of bias is summarized in the risk of bias Tables.

References

Apelby-Albrecht M, Andersson L, Kleiva IW, Kvåle K, Skillgate E, Josephson A. Concordance of upper limb neurodynamic tests with medical examination and magnetic resonance imaging in patients with cervical radiculopathy: a diagnostic cohort study. J Manipulative Physiol Ther. 2013 Nov-Dec;36(9):626-32. doi: 10.1016/j.jmpt.2013.07.007. Epub 2013 Oct 23. PMID: 24161389.
Grondin F, Cook C, Hall T, Maillard O, Perdrix Y, Freppel S. Diagnostic accuracy of upper limb neurodynamic tests in the diagnosis of cervical radiculopathy. Musculoskelet Sci Pract. 2021 Oct;55:102427. doi: 10.1016/j.msksp.2021.102427. Epub 2021 Jul 8. PMID: 34298491.
Gumina S, Carbone S, Albino P, Gurzi M, Postacchini F. Arm Squeeze Test: a new clinical test to distinguish neck from shoulder pain. Eur Spine J. 2013 Jul;22(7):1558-63. doi: 10.1007/s00586-013-2788-3. Epub 2013 Apr 21. PMID: 23604976; PMCID: PMC3698345.
Park J, Park WY, Hong S, An J, Koh JC, Lee YW, Kim YC, Choi JB. Diagnostic Accuracy of the Neck Tornado Test as a New Screening Test in Cervical Radiculopathy. Int J Med Sci. 2017 Jun 23;14(7):662-667. doi: 10.7150/ijms.19110. PMID: 28824298; PMCID: PMC5562117.
Shabat S, Leitner Y, David R, Folman Y. The correlation between Spurling test and imaging studies in detecting cervical radiculopathy. J Neuroimaging. 2012 Oct;22(4):375-8. doi: 10.1111/j.1552-6569.2011.00644.x. Epub 2011 Sep 1. PMID: 21883627.
Shah KC, Rajshekhar V. Reliability of diagnosis of soft cervical disc prolapse using Spurling's test. Br J Neurosurg. 2004 Oct;18(5):480-3. doi: 10.1080/02688690400012350. PMID: 15799149.
Sleijser-Koehorst MLS, Coppieters MW, Epping R, Rooker S, Verhagen AP, Scholten-Peeters GGM. Diagnostic accuracy of patient interview items and clinical tests for cervical radiculopathy. Physiotherapy. 2021 Jun;111:74-82. doi: 10.1016/j.physio.2020.07.007. Epub 2020 Jul 28. PMID: 33309074.
Thoomes EJ, van Geest S, van der Windt DA, Falla D, Verhagen AP, Koes BW, Thoomes-de Graaf M, Kuijper B, Scholten-Peeters WGM, Vleggeert-Lankamp CL. Value of physical tests in diagnosing cervical radiculopathy: a systematic review. Spine J. 2018 Jan;18(1):179-189. doi: 10.1016/j.spinee.2017.08.241. Epub 2017 Aug 31. PMID: 28838857.
Viikari-Juntura E, Porras M, Laasonen EM. Validity of clinical tests in the diagnosis of root compression in cervical disc disease. Spine (Phila Pa 1976). 1989 Mar;14(3):253-7. doi: 10.1097/00007632-198903000-00003. PMID: 2711240.

Evidence tables

Risk of bias assessment diagnostic accuracy studies (QUADAS II, 2011)

Study reference	Patient selection	Index test	Reference standard	Flow and timing	Comments with respect to applicability
Grondin, 2021	Was a consecutive or random sample of patients enrolled? Yes Was a case-control design avoided? Yes Did the study avoid inappropriate exclusions? Yes	Were the index test results interpreted without knowledge of the results of the reference standard? Yes If a threshold was used, was it pre-specified? Yes	Is the reference standard likely to correctly classify the target condition? Yes Were the reference standard results interpreted without knowledge of the results of the index test? Yes	Was there an appropriate interval between index test(s) and reference standard? Yes Did all patients receive a reference standard? Yes Did patients receive the same reference standard? Yes Were all patients included in the analysis? Yes	Are there concerns that the included patients do not match the review question? No Are there concerns that the index test, its conduct, or interpretation differ from the review question? No Are there concerns that the target condition as defined by the reference standard does not match the review question? No
CONCLUSION: Could the selection of patients have introduced bias? RISK: LOW	CONCLUSION: Could the conduct or interpretation of the index test have introduced bias? RISK: LOW	CONCLUSION: Could the reference standard, its conduct, or its interpretation have introduced bias? RISK: LOW	CONCLUSION Could the patient flow have introduced bias? RISK: LOW
Park, 2017	Was a consecutive or random sample of patients enrolled? No Was a case-control design avoided? Yes Did the study avoid inappropriate exclusions? Unclear	Were the index test results interpreted without knowledge of the results of the reference standard? Unclear If a threshold was used, was it pre-specified? Unclear	Is the reference standard likely to correctly classify the target condition? Unclear Were the reference standard results interpreted without knowledge of the results of the index test? Unclear	Was there an appropriate interval between index test(s) and reference standard? Unclear Did all patients receive a reference standard? Yes Did patients receive the same reference standard? Yes Were all patients included in the analysis? Yes	Are there concerns that the included patients do not match the review question? Yes, because of the retrospective design, selection bias cannot be ruled out. Are there concerns that the index test, its conduct, or interpretation differ from the review question? Yes/No/Unclear Are there concerns that the target condition as defined by the reference standard does not match the review question? No
CONCLUSION: Could the selection of patients have introduced bias? RISK: HIGH	CONCLUSION: Could the conduct or interpretation of the index test have introduced bias? RISK: UNCLEAR	CONCLUSION: Could the reference standard, its conduct, or its interpretation have introduced bias? RISK: UNCLEAR	CONCLUSION Could the patient flow have introduced bias? RISK: LOW
Sleijser-Koehorst, 2021	Was a consecutive or random sample of patients enrolled? Yes Was a case-control design avoided? Yes Did the study avoid inappropriate exclusions? Yes	Were the index test results interpreted without knowledge of the results of the reference standard? Yes If a threshold was used, was it pre-specified? Yes (see abstracts)	Is the reference standard likely to correctly classify the target condition? Yes Were the reference standard results interpreted without knowledge of the results of the index test? Yes	Was there an appropriate interval between index test(s) and reference standard? Yes Did all patients receive a reference standard? Yes Did patients receive the same reference standard? Yes Were all patients included in the analysis? Yes, apart from a minimal amount of missing data	Are there concerns that the included patients do not match the review question? No Are there concerns that the index test, its conduct, or interpretation differ from the review question? No Are there concerns that the target condition as defined by the reference standard does not match the review question? No/Unclear
CONCLUSION: Could the selection of patients have introduced bias? RISK: LOW	CONCLUSION: Could the conduct or interpretation of the index test have introduced bias? RISK: LOW	CONCLUSION: Could the reference standard, its conduct, or its interpretation have introduced bias? RISK: LOW	CONCLUSION Could the patient flow have introduced bias? RISK: LOW

Judgments on risk of bias are dependent on the research question: some items are more likely to introduce bias than others, and may be given more weight in the final conclusion on the overall risk of bias per domain:

Patient selection:

Consecutive or random sample has a low risk to introduce bias.
A case control design is very likely to overestimate accuracy and thus introduce bias.
Inappropriate exclusion is likely to introduce bias.

Index test:

This item is similar to “blinding” in intervention studies. The potential for bias is related to the subjectivity of index test interpretation and the order of testing.
Selecting the test threshold to optimise sensitivity and/or specificity may lead to overoptimistic estimates of test performance and introduce bias.

Reference standard:

When the reference standard is not 100% sensitive and 100% specific, disagreements between the index test and reference standard may be incorrect, which increases the risk of bias.
This item is similar to “blinding” in intervention studies. The potential for bias is related to the subjectivity of reference standard interpretation and the order of testing.

Flow and timing:

If there is a delay or if treatment is started between index test and reference standard, misclassification may occur due to recovery or deterioration of the condition, which increases the risk of bias.
If the results of the index test influence the decision on whether to perform the reference standard or which reference standard is used, estimated diagnostic accuracy may be biased.
All patients who were recruited into the study should be included in the analysis, if not, the risk of bias is increased.

Judgement on applicability:

Patient selection: there may be concerns regarding applicability if patients included in the study differ from those targeted by the review question, in terms of severity of the target condition, demographic features, presence of differential diagnosis or co-morbidity, setting of the study and previous testing protocols.

Index test: if index tests methods differ from those specified in the review question there may be concerns regarding applicability.

Reference standard: the reference standard may be free of bias but the target condition that it defines may differ from the target condition specified in the review question.

Table of excluded studies

Reference	Reason for exclusion
Seliverstova, 2022	Article in Russian (wrong language)
Mizer, 2017	Evaluated subjective history and self-report rather than physical tests (wrong intervention)
Redebrandt, 2022	No sensitivity/specificity/other diagnostic accuracy outcomes were presented (wrong outcome)

Accountability

Authorization date and validity

Last reviewed: 01-07-2024

Last authorized: 01-07-2024

Planned reassessment: 01-07-2027

Initiative and authorization

Initiative:

Nederlandse Vereniging voor Neurochirurgie

Authorized by:

Koninklijk Nederlands Genootschap voor Fysiotherapie
Nederlandse Orthopaedische Vereniging
Nederlandse Vereniging voor Anesthesiologie
Nederlandse Vereniging voor Neurochirurgie
Nederlandse Vereniging voor Neurologie
Ergotherapie Nederland
Vereniging van Oefentherapeuten Cesar en Mensendieck
Nederlandse Vereniging voor Manuele Therapie

Composition of the working group

To develop the guideline modules, a multidisciplinary working group was established in 2022, consisting of representatives of all relevant specialties (see the ‘composition of the working group’) involved in the care of patients with CRS.

WORKING GROUP

Ms. dr. Carmen Vleggeert-Lankamp (chair), neurosurgeon, NVvN
Mr. dr. Ruben Dammers, neurosurgeon, NVvN
Ms. drs. Martine van Bilsen, neurosurgeon, NVvN
Mr. drs. Maarten Liedorp, neurologist, NVN
Ms. drs. Germine Mochel, neurologist, NVN
Ms. dr. Akkie Rood, orthopaedic surgeon, NOV
Mr. dr. Erik Thoomes, physiotherapist and manual therapist, KNGF/NVMT
Mr. prof. dr. Jan Van Zundert, professor of pain medicine, NVA
Mr. Leen Voogt, patient representative (expert by experience), Nederlandse Vereniging van Rugpatiënten ‘de Wervelkolom’

SOUNDING BOARD GROUP

Ms. Elien Nijland, occupational therapist/hand therapist, EN
Ms. Meimei Yau, exercise therapist, VvOCM

With support from:

Ms. dr. Charlotte Michels, advisor, Kennisinstituut van de Federatie Medisch Specialisten
Ms. drs. Beatrix Vogelaar, advisor, Kennisinstituut van de Federatie Medisch Specialisten

Declarations of interest

The Code for the prevention of improper influence due to conflicts of interest (Code ter voorkoming van oneigenlijke beïnvloeding door belangenverstrengeling) was followed. All working group members declared in writing whether, in the past three years, they had had direct financial interests (employment at a commercial company, personal financial interests, research funding) or indirect interests (personal relationships, reputation management). During the development or revision of a module, changes in interests are reported to the chair. The declaration of interests is reconfirmed during the comment phase.

An overview of the interests of the working group members and the judgement on how to deal with any interests is given in the table below. The signed declarations of interest can be requested from the secretariat of the Kennisinstituut van de Federatie Medisch Specialisten.

*Name of working group member*	*Main position*	*Ancillary activities*	*Reported interests*	*Action taken*
*Carmen Vleggeert-Lankamp (chair)*	Neurosurgeon, Leiden Universitair Medisch Centrum, Leiden	Medical manager Neurosurgery, Spaarne Gasthuis, Hoofddorp/Haarlem, seconded from LUMC (paid); chair of the Nederlandse Vereniging voor Neurochirurgie (unpaid); president of the board of the Cervical Spine Research Society Europe (unpaid); member of the ZonMw committee Veelbelovende Zorg (unpaid); member of the supervisory board of Revalidatiecentrum Rijndam (paid); board member of Eurospine, chair of the research committee	None other than principal investigator in projects on the etiology of and outcomes in CRS. Co-supervisor of several PhD trajectories in which the benefit of a cervical disc prosthesis is described as non-existent. Speaker at international conferences.	No action
*Akkie Rood*	Orthopedic surgeon, Sint Maartenskliniek, Nijmegen	Member of NOV, DSS, NvA	None	No action
*Erik Thoomes*	Physiotherapist and manual therapist / practice owner, Fysio-Experts, Hazerswoude	PhD candidate / researcher, University of Birmingham, UK, School of Sport, Exercise and Rehabilitation Sciences, College of Life and Environmental Sciences, Centre of Precision Rehabilitation for Spinal Pain (CPR Spine) (unpaid)	None	No action
*Germine Mochel*	Neurologist, DC klinieken (salaried employment)	Member of the NVN pain working group; member of NHV and VNHC	Employed at DC klinieken, where patients with CRS are diagnosed and treated	No action
*Jan Van Zundert*	Anesthesiologist and pain specialist. Professor of pain medicine, MaastrichtUMC+, Maastricht (0.6 FTE). This position involves organizing clinical practice, developing and supervising research projects, supervising PhD students, and teaching. Head of department, multidisciplinary pain center, Lanaken/Genk, Belgium (0.4 FTE). Organizing the department's clinical services and stimulating clinical research.	None	No funding for projects related to cervical radicular complaints (obtained a PhD on a CRS topic 17 years ago; has not supervised any CRS PhD projects since).	No action
*Leen Voogt*	Patient representative with lived experience of CRS. Chair of the Nederlandse Vereniging van Rugpatiënten 'de Wervelkolom' (NVVR)	Volunteer work for the patient association (unpaid).	None	No action
*Maarten Liedorp*	Salaried neurologist (0.6 FTE), ZBC Kliniek Lange Voorhout, Rijswijk	Member of the parent section of the participation council (MR) of IKC de Piramide (unpaid); board member of the Waterbuurtvereniging (unpaid); member of the NVN pain working group (unpaid); member of the Dutch Spine Society (unpaid); member of the Nederlandse Vereniging voor Neurologie (unpaid)	None	No action
*Martine van Bilsen*	Neurosurgeon, Radboudumc, Nijmegen	None	None	No action
*Ruben Dammers*	Neurosurgeon, ErasmusMC, Rotterdam	None	None	No action
*Name of sounding board group member*	*Main position*	*Ancillary activities*	*Reported interests*	*Action taken*
*Meimei Yau*	Practice owner, Yau Oefentherapeut; Mensendieck exercise therapist, The Hague.	None	Gaining knowledge, exchanging information and expertise with other disciplines, representing exercise therapists. KP register	No action
*Vera Keil*	Radiologist, AmsterdamUMC, Amsterdam. Delegate of the NVvR Neuro section	None	As a radiologist, I naturally have an interest in a strong role for imaging.	No action
*Elien Nijland*	Occupational therapist / hand occupational therapist (27 hours in total) at Treant zorggroep (Bethesda Hoogeveen) and Refaja ziekenhuis (Stadskanaal)	Chair of the Adviesraad Hand-ergotherapie (unpaid)		No action

Patient perspective input

The patient perspective was addressed by including a delegate of the Nederlandse Vereniging van Rugpatiënten ‘de Wervelkolom’ in the working group. The input obtained was taken into account when drafting the clinical questions, choosing the outcome measures, and writing the considerations (see the heading ‘Patient values and preferences’). The draft guideline was also submitted for comment to the Nederlandse Vereniging van Rugpatiënten ‘de Wervelkolom’, and any comments received were reviewed and incorporated.

Qualitative estimate of potential financial consequences under the Wkkgz

In accordance with the Wet kwaliteit, klachten en geschillen zorg (Wkkgz; the Dutch Healthcare Quality, Complaints and Disputes Act), a qualitative estimate was made of whether the recommendations might lead to substantial financial consequences. For this assessment, guideline modules were tested on several domains (see the flowchart on the Richtlijnendatabase).

The qualitative estimate shows that substantial financial consequences are unlikely; see the table below.

Module	Outcome of estimate	Explanation
Diagnostics – newly developed module	No financial consequences	Although the assessment shows that the recommendation(s) are broadly applicable (5,000-40,000 patients), it also shows that the vast majority (±90%) of healthcare providers and professionals already meet the standard and that no new way of delivering care or other organization of care is involved. No financial consequences are therefore expected.

Methods

AGREE

This guideline module was drawn up in accordance with the requirements set out in the report Medisch Specialistische Richtlijnen 2.0 of the advisory committee on guidelines (adviescommissie Richtlijnen) of the Raad Kwaliteit. That report is based on the AGREE II instrument (Appraisal of Guidelines for Research & Evaluation II; Brouwers, 2010).

Bottleneck analysis and clinical questions

During the preparatory phase, the working group inventoried the bottlenecks in the care of patients with CRS. In addition, bottlenecks were submitted via a survey by Ergotherapie Nederland, the Nederlands Huisartsen Genootschap, the Nederlandse Vereniging van Ziekenhuizen, the Nederlandse Vereniging van Revalidatieartsen, the Vereniging van Oefentherapeuten Cesar en Mensendieck, Zorginstituut Nederland, and Zelfstandige Klinieken Nederland. Based on the results of the bottleneck analysis, the working group drafted the clinical questions and then finalized them.

Outcome measures

After formulating the search question belonging to the clinical question, the working group inventoried which outcome measures are relevant to the patient, considering both desirable and undesirable effects. A maximum of eight outcome measures was used. The working group rated these outcome measures by their relative importance for decision-making about recommendations as crucial (critical for decision-making), important (but not crucial), or unimportant. The working group also defined, at least for the crucial outcome measures, which differences it considered clinically (patient) relevant.

Method for the literature summary

A detailed description of the strategy for searching and selecting literature can be found under ‘Search and select’ in the evidence base section. Where possible, data from different studies were pooled in a random-effects model. Review Manager 5.4 was used for the statistical analyses. The assessment of the strength of the scientific evidence is explained below.
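
As an illustration of what pooling in a random-effects model involves, the sketch below implements the DerSimonian-Laird estimator, one common choice for random-effects meta-analysis. It is not taken from the guideline: the function name `pool_random_effects`, the inputs, and the example numbers are hypothetical, and the exact computation performed in Review Manager 5.4 for this module is not described in the text.

```python
import math


def pool_random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling (illustrative sketch).

    effects   -- per-study effect estimates on a common scale
    variances -- per-study within-study variances (same order)
    Returns (pooled_estimate, standard_error, tau_squared).
    """
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # Method-of-moments estimate of the between-study variance tau^2,
    # truncated at zero.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    se = math.sqrt(1.0 / sum_re)
    return pooled, se, tau2


# Hypothetical example: two studies with equal precision but different effects.
pooled, se, tau2 = pool_random_effects([0.0, 1.0], [0.1, 0.1])
print(f"pooled={pooled:.3f} se={se:.3f} tau2={tau2:.3f}")
```

When the studies disagree more than their within-study variances can explain, the estimated between-study variance becomes positive, the weights become more equal, and the confidence interval widens. This is one reason why heterogeneous, small studies tend to yield imprecise pooled estimates.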

Assessing the strength of the scientific evidence

The strength of the scientific evidence was determined using the GRADE method. GRADE stands for ‘Grading of Recommendations Assessment, Development and Evaluation’ (see http://www.gradeworkinggroup.org/). The basic principles of the GRADE method are: naming and prioritizing the clinically (patient) relevant outcome measures, a systematic review per outcome measure, and an assessment of the certainty of evidence per outcome measure based on the eight GRADE domains (domains for downgrading: risk of bias, inconsistency, indirectness, imprecision, and publication bias; domains for upgrading: dose-response relationship, large effect, and plausible residual confounding).

GRADE distinguishes four levels for the quality of the scientific evidence: high, moderate, low, and very low. These levels refer to the degree of certainty about the literature conclusion, in particular the degree of certainty that the literature conclusion adequately supports the recommendation (Schünemann, 2013; Hultcrantz, 2017).

GRADE	Definition
High	there is high certainty that the true effect of treatment lies close to the estimated effect of treatment; it is very unlikely that the literature conclusion will change in a clinically relevant way when results of new large-scale research are added to the literature analysis.
Moderate	there is moderate certainty that the true effect of treatment lies close to the estimated effect of treatment; it is possible that the conclusion will change in a clinically relevant way when results of new large-scale research are added to the literature analysis.
Low	there is low certainty that the true effect of treatment lies close to the estimated effect of treatment; there is a real chance that the conclusion will change in a clinically relevant way when results of new large-scale research are added to the literature analysis.
Very low	there is very low certainty that the true effect of treatment lies close to the estimated effect of treatment; the literature conclusion is very uncertain.

When assessing (grading) the strength of the scientific evidence in guidelines according to the GRADE method, thresholds for clinical decision-making play an important role (Hultcrantz, 2017). These are the thresholds that, if crossed, would give reason to change the recommendation. To determine the thresholds for clinical decision-making, all relevant outcome measures and considerations must be weighed. The thresholds for clinical decision-making are therefore not directly comparable to the Minimal Clinically Important Difference (MCID). In particular, in situations where an intervention has no important harms and the costs are relatively low, the threshold for clinical decision-making regarding the effectiveness of the intervention may lie at a lower value (closer to the null effect) than the MCID (Hultcrantz, 2017).

Considerations (from evidence to recommendation)

To arrive at a recommendation, other aspects besides (the quality of) the scientific evidence are important and are taken into account, such as additional arguments from, for example, biomechanics or physiology, patient values and preferences, costs (resource use), acceptability, feasibility, and implementation. These aspects are systematically listed and assessed (weighed) under the heading ‘Considerations’ and may be (partly) based on expert opinion. A structured format was used, based on the evidence-to-decision framework of the international GRADE Working Group (Alonso-Coello, 2016a; Alonso-Coello, 2016b). This evidence-to-decision framework is an integral part of the GRADE method.

Formulating recommendations

The recommendations answer the clinical question and are based on the available scientific evidence, the most important considerations, and a weighing of the favorable and unfavorable effects of the relevant interventions. The strength of the scientific evidence and the weight the working group assigns to the considerations together determine the strength of the recommendation. In accordance with the GRADE method, low certainty of evidence in the conclusions of the systematic literature analysis does not a priori rule out a strong recommendation, and weak recommendations are also possible when the certainty of evidence is high (Agoritsas, 2017; Neumann, 2016). The strength of the recommendation is always determined by weighing all relevant arguments together. For each recommendation, the working group has recorded how it arrived at the direction and strength of that recommendation.

The GRADE method distinguishes between strong and weak (or conditional) recommendations. The strength of a recommendation refers to the degree of certainty that the benefits of the intervention outweigh the harms (or vice versa) across the whole spectrum of patients for whom the recommendation is intended. The strength of a recommendation has clear implications for patients, clinicians, and policymakers (see the table below). A recommendation is not a dictate: even a strong recommendation based on high-quality evidence (GRADE rating HIGH) will not always apply under all possible circumstances and for every individual patient.

Implications of strong and weak recommendations for different guideline users
	Strong recommendation	Weak (conditional) recommendation
For patients	Most patients would choose the recommended intervention or approach, and only a small number would not.	A substantial proportion of patients would choose the recommended intervention or approach, but many would not.
For clinicians	Most patients should receive the recommended intervention or approach.	There are several suitable interventions or approaches. The patient must be supported in choosing the intervention or approach that best fits his or her values and preferences.
For policymakers	The recommended intervention or approach can be regarded as standard policy.	Policy-making requires extensive discussion involving many stakeholders. Local differences in policy are more likely.

Organization of care

In the bottleneck analysis and during the development of the guideline module, explicit attention was paid to the organization of care: all aspects that are preconditions for delivering care (such as coordination, communication, (financial) resources, staffing, and infrastructure). Preconditions relevant to answering this specific clinical question are mentioned in the considerations. More general, overarching, or additional aspects of the organization of care are addressed in the module Organization of care.

Comment and authorization phase

The draft guideline module was submitted for comment to the (scientific) associations and (patient) organizations involved. The comments were collected and discussed with the working group. In response to the comments, the draft guideline module was revised and finalized by the working group. The final guideline module was submitted to the participating (scientific) associations and (patient) organizations for authorization and was authorized or approved by them.

Literature

Agoritsas T, Merglen A, Heen AF, Kristiansen A, Neumann I, Brito JP, Brignardello-Petersen R, Alexander PE, Rind DM, Vandvik PO, Guyatt GH. UpToDate adherence to GRADE criteria for strong recommendations: an analytical survey. BMJ Open. 2017 Nov 16;7(11):e018593. doi: 10.1136/bmjopen-2017-018593. PubMed PMID: 29150475; PubMed Central PMCID: PMC5701989.

Alonso-Coello P, Schünemann HJ, Moberg J, Brignardello-Petersen R, Akl EA, Davoli M, Treweek S, Mustafa RA, Rada G, Rosenbaum S, Morelli A, Guyatt GH, Oxman AD; GRADE Working Group. GRADE Evidence to Decision (EtD) frameworks: a systematic and transparent approach to making well informed healthcare choices. 1: Introduction. BMJ. 2016 Jun 28;353:i2016. doi: 10.1136/bmj.i2016. PubMed PMID: 27353417.

Alonso-Coello P, Oxman AD, Moberg J, Brignardello-Petersen R, Akl EA, Davoli M, Treweek S, Mustafa RA, Vandvik PO, Meerpohl J, Guyatt GH, Schünemann HJ; GRADE Working Group. GRADE Evidence to Decision (EtD) frameworks: a systematic and transparent approach to making well informed healthcare choices. 2: Clinical practice guidelines. BMJ. 2016 Jun 30;353:i2089. doi: 10.1136/bmj.i2089. PubMed PMID: 27365494.

Brouwers MC, Kho ME, Browman GP, Burgers JS, Cluzeau F, Feder G, Fervers B, Graham ID, Grimshaw J, Hanna SE, Littlejohns P, Makarski J, Zitzelsberger L; AGREE Next Steps Consortium. AGREE II: advancing guideline development, reporting and evaluation in health care. CMAJ. 2010 Dec 14;182(18):E839-42. doi: 10.1503/cmaj.090449. Epub 2010 Jul 5. Review. PubMed PMID: 20603348; PubMed Central PMCID: PMC3001530.

Hultcrantz M, Rind D, Akl EA, Treweek S, Mustafa RA, Iorio A, Alper BS, Meerpohl JJ, Murad MH, Ansari MT, Katikireddi SV, Östlund P, Tranæus S, Christensen R, Gartlehner G, Brozek J, Izcovich A, Schünemann H, Guyatt G. The GRADE Working Group clarifies the construct of certainty of evidence. J Clin Epidemiol. 2017 Jul;87:4-13. doi: 10.1016/j.jclinepi.2017.05.006. Epub 2017 May 18. PubMed PMID: 28529184; PubMed Central PMCID: PMC6542664.

Medisch Specialistische Richtlijnen 2.0 (2012). Adviescommissie Richtlijnen van de Raad Kwaliteit. http://richtlijnendatabase.nl/over_deze_site/over_richtlijnontwikkeling.html

Neumann I, Santesso N, Akl EA, Rind DM, Vandvik PO, Alonso-Coello P, Agoritsas T, Mustafa RA, Alexander PE, Schünemann H, Guyatt GH. A guide for health professionals to interpret and use recommendations in guidelines developed with the GRADE approach. J Clin Epidemiol. 2016 Apr;72:45-55. doi: 10.1016/j.jclinepi.2015.11.017. Epub 2016 Jan 6. Review. PubMed PMID: 26772609.

NHG, 2018. NHG-Standaard Pijn (M106). Published: June 2018. Last updated: September 2023. Link: https://richtlijnen.nhg.org/standaarden/pijn

NVN, 2020. Richtlijn Lumbosacraal Radiculair Syndroom (LRS). Reviewed: 21-09-2020. Link: https://richtlijnendatabase.nl/richtlijn/lumbosacraal_radiculair_syndroom_lrs/startpagina_-_lrs.html

Radhakrishnan K, Litchy WJ, O'Fallon WM, Kurland LT. Epidemiology of cervical radiculopathy. A population-based study from Rochester, Minnesota, 1976 through 1990. Brain. 1994 Apr;117 ( Pt 2):325-35. doi: 10.1093/brain/117.2.325. PMID: 8186959.

Schünemann H, Brożek J, Guyatt G, et al. GRADE handbook for grading quality of evidence and strength of recommendations. Updated October 2013. The GRADE Working Group, 2013. Available from http://gdt.guidelinedevelopment.org/central_prod/_design/client/handbook/handbook.html.

Search strategy

The search strategies are available on request. Please contact the Richtlijnendatabase.
