Cerebral palsy in children

Initiative: VRA. Number of modules: 70

Evaluation of manual ability

Clinical question

Which questionnaires at the level of ability (what the child can do) and performance (what the child does in daily life) are most suitable for the longitudinal evaluation of arm and hand use related to daily functioning, in addition to the Assisting Hand Assessment (AHA: Small-Kids AHA, School-Kids AHA and Ad-AHA), in children with unilateral CP?

Recommendation

Consider using the CHEQ2.0, in addition to the AHA (in the AHA version appropriate for the child's age), for evaluating arm and hand use related to daily functioning in children with unilateral CP aged 6 to 18 years.

Considerations

Advantages and disadvantages of the intervention and the quality of the evidence

The literature summary only includes studies on the Children’s Hand-use Experience Questionnaire (CHEQ/CHEQ2.0) (Sköld, 2011; Amer, 2016; Ryll, 2019). The CHEQ2.0 is a shortened, online version of the CHEQ. For the other two questionnaires (Hand Use at Home Questionnaire [HUH] and Abilhand-Kids), no relevant data on responsiveness are available. For the CHEQ/CHEQ2.0, the results on validity, reliability and responsiveness have been summarised and the level of evidence has been graded. The literature summary follows the approach of the COSMIN group as closely as possible. This group also provides guidance on how to determine which instrument can be recommended for use in clinical practice (Mokkink, 2018). To this end, the available instruments can be divided into three categories:

  A. Instruments with evidence (any level) for sufficient content validity AND at least low-quality evidence for sufficient internal consistency. These instruments can be recommended for use in clinical practice.
  B. Instruments that cannot be classified in category A or C. These instruments have the potential to be recommended, but further research into their measurement properties is needed.
  C. Instruments with high-quality evidence that one or more measurement properties are insufficient. These instruments should not be recommended for use in clinical practice.

If only instruments in category B are available, the instrument with the best evidence for content validity can be provisionally recommended until more evidence becomes available.

 

Below, the results from the literature summary are assessed against these criteria for content validity and internal consistency.

 

Regarding content validity, the study by Sköld (2011) reported that the activities in the CHEQ2.0 were in most cases performed independently and with both hands. Only quantitative data were available; the level of evidence was moderate.

Regarding internal consistency, there were insufficient data to assess it against the criteria (at least low-quality evidence for structural validity and Cronbach’s alpha ≥ 0.70). It is therefore unclear whether the internal consistency is sufficient. This places the CHEQ in category B.

 

For the other two questionnaires, no evidence on content validity is available. It also seems unlikely that for either of these questionnaires there is high-quality evidence that one or more measurement properties are insufficient. These questionnaires therefore also fall into category B. The CHEQ2.0, however, has the best evidence for content validity and could therefore be provisionally recommended. Regarding responsiveness, the CHEQ2.0 was the only questionnaire for which results were reported in a format that could be assessed against the COSMIN criteria for good measurement properties and for which conclusions could therefore be drawn. The level of evidence was very low, however, because of a risk of bias and a small study population.

 

The CHEQ2.0 questionnaire now exists in two versions: one for children aged 3 to 8 years, the so-called Mini-CHEQ, and one for children aged 6 to 18 years. The Mini-CHEQ was developed from the same construct as the CHEQ2.0, with the activities adapted for children aged 3 to 8 years. The Mini-CHEQ contains 21 activities; many are the same as in the CHEQ2.0, while others have been omitted or replaced. Because the Mini-CHEQ is entirely new and is, in terms of its clinimetric properties, an untested version of the CHEQ2.0, results are presented as the mean of the raw scores. The validity, reliability and responsiveness of the Mini-CHEQ still need to be investigated.

Using a pie chart, the CHEQ2.0 also gives parents a picture of their child’s independence, not only of the ability to use the hands. This can help when choosing treatment goals, and parents and children may already have thought about these goals because the questionnaire is completed before the visit to the rehabilitation centre.

 

Values and preferences of patients (and, where applicable, their caregivers)

Questionnaires provide insight into how the parents and/or the child or adolescent themselves feel that bimanual activities are performed at home. This is important when evaluating an intervention and increases the involvement and motivation of parents and adolescents during treatment. Completing the CHEQ2.0 does, of course, take some time and effort for parents and adolescents. It is essential to explain that completing the questionnaire helps them understand which bimanual activities are difficult or perhaps cannot be performed independently at all. For parents who cannot read written language, it should be determined who can help them complete the questionnaire. The CHEQ2.0 has now been translated into many languages. Because completing the questionnaire takes time and energy, it is important to always discuss the completed questionnaire during consultations; otherwise, parents, children and adolescents will soon wonder why they still fill it in.

 

Costs (use of resources)

The costs of administering the questionnaires are very limited. The CHEQ2.0 can be completed online at no cost to the user.

The parents or adolescent can export the results as a PDF file, which can be emailed to the treating clinician as an attachment.

 

Acceptability, feasibility and implementation

The CHEQ2.0 can readily be implemented in practice. There are no costs associated with its use, and the results of the completed questionnaire are easy for therapists to review because they are presented clearly in the PDF file. With the CHEQ2.0, interventions can be evaluated in practice in a straightforward way; after all, a questionnaire completed at home by the parents or the adolescent does not take up any of a clinician’s time.

 

Rationale for the recommendation: weighing the arguments for and against the diagnostic procedure

The use of the CHEQ2.0 for children with unilateral CP aged 6 to 18 years is recommended with some caution. The CHEQ2.0 can be completed as an online questionnaire and saved as a PDF, which can then be emailed to the rehabilitation centre or practice where the child is being treated. Because completing the questionnaire takes time and energy, it is important to always discuss the completed questionnaire during consultations; otherwise, parents, children and adolescents will soon wonder why they still fill it in. Caution is needed when interpreting CHEQ2.0 scores to document change over time, because its responsiveness has not been established unequivocally.

  1. The CHEQ2.0 is the only questionnaire for which conclusions can be drawn regarding validity, reliability and responsiveness. The CHEQ2.0 has the potential to be recommended, but further research into its measurement properties is needed.
  2. For younger children, the Mini-CHEQ has been developed from the same construct as the CHEQ2.0.

Although the clinimetric properties of the Mini-CHEQ have not yet been investigated, there is currently no better alternative. To evaluate treatment from a young age onwards, it is preferable to use a questionnaire based on the same construct, as is the case for the Mini-CHEQ relative to the CHEQ2.0 used for older children.

 

Justification

In most centres, improvement and deterioration in arm and hand use in children with CP are assessed by a multidisciplinary hand team. The tests used serve both diagnostic and evaluative purposes and are applied in screening or in the evaluation of interventions. Testing focuses on identifying care needs and goals in the area of arm-hand activities, as well as on the capacity, ability and performance of the arm and hand in daily activities, or on documenting arm-hand functions such as spasticity, strength and sensation.

 

There is no consensus on how many and which tests should be used to evaluate daily functioning with regard to arm-hand use in CP. Adequate core sets of tests and/or questionnaires for the evaluation of CP, matched to age, severity or type of programme, have not been developed. In addition, psychometrically incompletely developed tests and/or questionnaires are still being used for evaluation within all domains of the ICF-CY.

 

Because the greatest gain lies in performing daily activities with more independence in the situations that are relevant to the child, this question on the evaluation of manual ability in the home situation focuses on ability (what a child can do) and performance (what a child does in the daily situation). The AHA is considered the gold standard for measuring bimanual functioning at the level of performance in children with unilateral spastic CP; it is a test in which the assisting hand is assessed. The 2006 guideline already provided recommendations on the AHA, and its reliability, validity and responsiveness have been investigated (Krumlinde-Sundholm, 2007; Urlic, 2009). The present focus, however, is on evaluating bimanual ability in the home situation and therefore on the use of questionnaires, because parents see their child daily and thus have the best view of this.

 

Several questionnaires are often used to map ability and performance. For this guideline, we examined which of these questionnaires are, clinimetrically, best developed for evaluative use in children with CP. With evaluation in mind, it is important to establish the responsiveness of the questionnaires. In addition, the questionnaires are assessed against the two most important COSMIN criteria (content validity and internal consistency). The PICO therefore includes validity and reliability in addition to responsiveness.

Children’s Hand-use Experience Questionnaire (CHEQ) - responsiveness

Very low level of evidence*

There is very little confidence in the evidence for responsiveness of the CHEQ for measuring daily functioning with regard to arm/hand use in children with CP.

 

Source: Ryll (2019)

 

Children’s Hand-use Experience Questionnaire (CHEQ) - validity

Moderate level of evidence*

 

 

There is moderate confidence in the evidence for content validity (items were commonly performed independently and involved the use of both hands) of the CHEQ for measuring daily functioning with regard to arm/hand use in children with CP, upper limb reduction deficiency, or obstetric brachial plexus palsy.

 

Source: Sköld (2011)

Low level of evidence*

 

There is limited confidence in the evidence for construct validity (sufficient fit for most but not all items) of the CHEQ for measuring daily functioning with regard to arm/hand use in children with CP, upper limb reduction deficiency, or obstetric brachial plexus palsy.

 

Source: Sköld (2011)

Moderate level of evidence*

 

There is moderate confidence in the evidence for construct validity (sufficient fit for most but not all items) of the CHEQ for measuring daily functioning with regard to arm/hand use in children with CP.

 

Source: Amer (2016)

-

No evidence was found regarding the criterion validity of the CHEQ for measuring daily functioning with regard to arm/hand use in children with CP.

 

Source: -

 

Children’s Hand-use Experience Questionnaire (CHEQ) - reliability

Low level of evidence*

There is limited confidence in the evidence for test-retest reliability (insufficient reliability of the opening questions, sufficient reliability of the three scales) of the CHEQ for measuring daily functioning with regard to arm/hand use in children with CP.

 

Source: Amer (2016)

-

No evidence was found regarding the internal consistency, inter-rater reliability and intra-rater reliability or measurement error of the CHEQ for measuring daily functioning with regard to arm/hand use in children with CP.

 

Source: -

* The level of evidence was assessed largely in accordance with COSMIN guidance for systematic reviews of PROMs (Mokkink, 2018). See Table 6 in the attachment for the definitions of these quality levels.

Description of studies

Where the CHEQ is mentioned, this also includes its shortened version, the CHEQ2.0.

 

Three studies reported on the measurement properties of the CHEQ (Ryll, 2019; Sköld, 2011; Amer, 2016), including one study that reported on responsiveness (Ryll, 2019).

 

Ryll (2019) assessed responsiveness (validity of change scores) of the CHEQ. The study included 44 children with unilateral CP. Children were eligible if they were between 6 and 18 years old, had unilateral CP with impairment of the upper limb, participated in an intensive bimanual training during a two-week day camp in England, and completed assessments at baseline and immediately after the training.

To assess responsiveness, three analyses were performed. Baseline and post-treatment CHEQ scores were analysed using the Goal Attainment Scale (GAS) as external anchor for change. Spearman rank correlation coefficients were calculated between change scores of the CHEQ scales and the GAS (improved versus not improved). The authors hypothesized that correlations would be between 0.3 and 0.5 for the CHEQ scales ‘grasp efficacy’ and ‘time taken’, and between 0.2 and 0.4 for the CHEQ scale ‘feeling bothered’.

In addition, effect sizes were calculated for each CHEQ scale. The authors hypothesized that effect sizes would be ≥ 0.5 for participants that improved and <0.5 for participants that did not improve according to the GAS.

Furthermore, the area under the curve was calculated to determine the probability of the CHEQ scales to correctly classify children that improved versus children that did not improve.
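As an illustration only, the sketch below (hypothetical data, Python) shows how the three responsiveness analyses described above could be computed for a single CHEQ scale: the Spearman correlation between change scores and the GAS anchor, a within-group effect size, and the area under the ROC curve. It is not the analysis code used by Ryll (2019).

```python
# Illustrative sketch with simulated data; not the original analysis.
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 44                                        # sample size in Ryll (2019)
improved = rng.integers(0, 2, n)              # 1 = improved on the GAS (external anchor)
change = rng.normal(loc=0.8 * improved, scale=1.0, size=n)  # hypothetical CHEQ change scores

# 1) Spearman rank correlation between CHEQ change scores and the GAS anchor
rho, _ = spearmanr(change, improved)

# 2) Within-group effect size of the change scores (standardized response mean:
#    mean change divided by the SD of the change; Ryll also reports Cohen's d)
def srm(x):
    return x.mean() / x.std(ddof=1)

srm_improved = srm(change[improved == 1])
srm_not_improved = srm(change[improved == 0])

# 3) Area under the ROC curve: how well the change score separates improved
#    from non-improved children (COSMIN threshold for responsiveness: >= 0.70)
auc = roc_auc_score(improved, change)

print(f"rho={rho:.2f}  SRM improved={srm_improved:.2f}  "
      f"SRM not improved={srm_not_improved:.2f}  AUC={auc:.2f}")
```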

More information on study characteristics is provided in Table 5 in the attachment. The methodological quality for evaluating responsiveness was assessed to be doubtful (see Table 6.5 in the attachment). 

 

Sköld (2011) described the development of the CHEQ and assessed content validity and construct validity. The study included 86 children and adolescents with unilateral CP (n=31), upper limb reduction deficiency (n=29), or obstetric brachial plexus palsy (n=26). Children were eligible if they were between 6 and 18 years of age and had a diagnosis of unilateral CP, obstetric brachial plexus palsy, or upper limb reduction deficiency (not using a prosthesis). Families were asked to complete the CHEQ, which could be done by either the child or a parent.

To assess content validity, a descriptive analysis was performed on the number of children who performed the 29 activities included in the CHEQ independently, and used both hands while performing these activities. 

To assess structural validity, a Rasch analysis was performed to assess the internal structure (unidimensionality) of each of the three scales: grasp efficacy, time taken, and feeling bothered. Unidimensionality was analysed using principal component analysis.

More information on study characteristics is provided in Table 5 in the attachment. The methodological quality for evaluating content validity (by asking patients about relevance) was assessed to be adequate (see Table 6.1 in the attachment), and the methodological quality for evaluating structural validity (unidimensionality) was assessed to be doubtful (see Table 6.2 in the attachment). 

 

Amer (2016) assessed construct validity (structural validity) and test-retest reliability of the internet-based version of the CHEQ. The study included 242 children with unilateral CP. Data were derived from previous studies in which children completed the CHEQ, from children recruited specifically for this study, and from children who completed the CHEQ for clinical purposes. Data were collected in Australia, Israel, Italy, the Netherlands, Sweden and the UK.

To assess structural validity, a Rasch analysis was performed for each scale (grasp efficacy, time taken, and feeling bothered) separately, based on a rating scale model.

To assess test-retest reliability, 20 children from Sweden completed the CHEQ twice within 7 to 14 days. Kappa analysis was used for the first two questions (performing the activity independently and using the affected hand as support or to grasp) and intraclass correlation coefficients were calculated for the three scales (grasp efficacy, time taken, and feeling bothered).
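As a purely illustrative sketch (hypothetical data; pandas, pingouin and scikit-learn assumed available), the code below shows the two types of test-retest statistics used here: Cohen's kappa for a categorical opening question and an intraclass correlation coefficient for a scale score. It is not the original analysis of Amer (2016).

```python
# Illustrative sketch with simulated test-retest data; not the original analysis.
import numpy as np
import pandas as pd
import pingouin as pg
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(1)
n = 20                                   # retest subsample size in Amer (2016)

# Opening question (e.g. 'performed independently'), answered twice per child
t1 = rng.integers(0, 2, n)
t2 = np.where(rng.random(n) < 0.85, t1, 1 - t1)   # mostly stable answers
kappa = cohen_kappa_score(t1, t2)

# Scale score (e.g. grasp efficacy) at test and retest, in long format for the ICC
score_t1 = rng.normal(50, 10, n)
score_t2 = score_t1 + rng.normal(0, 3, n)
long = pd.DataFrame({
    "child": np.tile(np.arange(n), 2),
    "occasion": ["t1"] * n + ["t2"] * n,
    "score": np.concatenate([score_t1, score_t2]),
})
icc = pg.intraclass_corr(data=long, targets="child", raters="occasion", ratings="score")
icc2 = icc.loc[icc["Type"] == "ICC2", "ICC"].item()   # two-way random effects, single measure

print(f"kappa={kappa:.2f}, ICC(2,1)={icc2:.2f}  (COSMIN threshold: >= 0.70)")
```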

More information on study characteristics is provided in Table 5 in the attachment. The methodological quality for evaluating construct validity (unidimensionality) was assessed to be adequate (see Table 6.3 in the attachment). The methodological quality for evaluating test-retest reliability was assessed to be doubtful (see Table 6.4 in the attachment). 

 

Results

Results regarding responsiveness, validity and reliability of the CHEQ are summarised below.

 

Responsiveness

Ryll (2019) reported on responsiveness. Associations between the three CHEQ scales and the GAS ranged between 0.34 and 0.38 (grasp efficacy ρ=0.38; time taken ρ=0.34 and feeling bothered ρ=0.37). These correlation coefficients confirmed the authors’ hypotheses.

 

Effect sizes (Cohen’s d and standardized response means) were higher in the group of children that improved on the GAS as compared with the group of children that did not improve on the GAS. The effect sizes in the group of children that improved were 1.01 for grasp efficacy, 0.74 for time taken, and 0.61 for feeling bothered. These results confirmed the authors’ hypotheses that effect sizes would be ≥ 0.5 in this group. The effect sizes in the group of children that did not improve on the GAS were 0.31 for grasp efficacy, 0.09 for time taken, and -0.14 for feeling bothered. These results confirmed the authors’ hypotheses that effect sizes would be < 0.5 in this group.

 

The AUC was 0.69 for grasp efficacy (95%CI 0.52 to 0.87), 0.67 for time taken (95%CI 0.49 to 0.84), and 0.73 for feeling bothered (95%CI 0.56 to 0.91). Only the AUC for the scale feeling bothered met the COSMIN criteria of ≥ 0.70 for good measurement properties.

 

Validity

Sköld (2011) reported on content validity. Descriptive analyses showed that the activities included in the CHEQ were commonly performed independently (range 43.7% to 100%), and involved the use of both hands (range 70.7% to 97.5%).

Sköld (2011) and Amer (2016) reported on construct validity (structural validity). In the study by Sköld (2011), the Rasch analysis showed misfit for four items in the grasp efficacy scale (14%) and two items in the time taken scale (7%). Five items were removed based on this analysis. Out of 29 items, for 26 items infit mean squares were ≥ 0.5 and ≤ 1.5 and met the criteria for good measurement properties. However, for three items (cut up a pancake (or other food that is easy to cut) on a plate, cut meat (or other food that is hard to cut) on a plate, and fasten a necklace (whilst around the neck)), infit mean squares were > 1.5 and did not meet the COSMIN criteria for good measurement properties. Out of 29 items, for 19 items z-standardized values were > -2 and < 2. However, for 10 items, z-standardized values were ≤ -2 or ≥ 2 and did not meet the COSMIN criteria for good measurement properties.

 

In the study by Amer (2016), Rasch analysis showed that out of 28 items, for 26 items infit mean squares were ≥ 0.5 and ≤ 1.5 and met the criteria for good measurement properties. However, for two items (handle playing cards and tie shoelaces), infit mean squares were > 1.5 and did not meet the COSMIN criteria for good measurement properties. Out of 28 items, for 13 items z-standardized values were > -2 and < 2. However, for 15 items, z-standardized values were ≤ -2 or ≥ 2 and did not meet the COSMIN criteria for good measurement properties.
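For illustration, the following sketch (hypothetical item names and values) checks Rasch item-fit statistics against the thresholds used above; per the COSMIN criteria in Table 2 of the attachment, item fit is rated sufficient when the infit mean square lies between 0.5 and 1.5 or the z-standardised value lies between -2 and 2.

```python
# Illustrative sketch: COSMIN item-fit check for Rasch statistics (hypothetical values).
items = {
    # item: (infit mean square, infit z-standardised value)
    "open a jar":        (0.92, -0.4),
    "tie shoelaces":     (1.62,  2.6),
    "cut meat on plate": (1.71,  3.1),
}

for name, (mnsq, zstd) in items.items():
    fit_mnsq = 0.5 <= mnsq <= 1.5          # COSMIN: infit/outfit MNSQ >= 0.5 and <= 1.5
    fit_zstd = -2 < zstd < 2               # COSMIN: z-standardised values > -2 and < 2
    verdict = "sufficient" if (fit_mnsq or fit_zstd) else "misfit"
    print(f"{name:20s} MNSQ={mnsq:.2f} z={zstd:+.1f} -> {verdict}")
```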

 

None of the studies reported on criterion validity of the CHEQ.

 

Reliability

Amer (2016) reported on test-retest reliability. Kappa analysis showed fair to good agreement for the two opening questions; both values were < 0.70 and therefore did not meet the COSMIN criteria for good measurement properties. The average kappa was 0.63 for the question on performing the activity independently and 0.57 for the question on using the affected hand as support or to grasp. Intraclass correlation coefficients were reported for the three scales separately and were all ≥ 0.70, meeting the COSMIN criteria for good measurement properties. ICCs were 0.89 (95%CI 0.73 to 0.96) for grasp efficacy, 0.87 (95%CI 0.69 to 0.94) for time taken, and 0.91 (95%CI 0.79 to 0.96) for feeling bothered.

 

Level of evidence of the literature

Responsiveness

The level of evidence for responsiveness as evaluated in the study by Ryll (2019) was downgraded by two levels from high to low because of very serious risk of bias (only one study of doubtful quality available), and further downgraded to very low because of very serious imprecision (the total sample size was below 50).

 

Validity

The level of evidence for content validity as evaluated in the study by Sköld (2011) was downgraded by one level from high to moderate because of serious risk of bias (only one study of adequate quality available). 

The level of evidence for construct validity (structural validity) as evaluated in the study by Sköld (2011) was downgraded by two levels from high to low because of very serious risk of bias (only one study of doubtful quality available).

The level of evidence for construct validity (structural validity) as evaluated in the study by Amer (2016) was downgraded by one level from high to moderate because of serious risk of bias (only one study of adequate quality available).

The level of evidence for criterion validity could not be assessed as none of the included studies reported on this measurement property.

 

Reliability

The level of evidence for test-retest reliability as evaluated in the study by Amer (2016) was downgraded by two levels from high to low because of very serious risk of bias (only one study of doubtful quality available). The level of evidence for internal consistency, inter-rater reliability and intra-rater reliability or measurement error could not be assessed as none of the included studies reported on this measurement property.

A systematic review of the literature was performed to answer the following question: What are the clinimetric properties of relevant questionnaires (translated in Dutch or developed in Dutch) for evaluating daily functioning with regard to arm/hand use in children with cerebral palsy?

P: Children with cerebral palsy (CP)

I: Ability and performance questionnaires

C: See “I” (comparison of instruments if applicable)

O: Clinimetric properties: responsiveness, reliability, and validity

 

Relevant outcome measures

The working group defined the outcome measures according to the Consensus-based Standards for the selection of health Measurement INstruments (COSMIN) taxonomy (Mokkink, 2010). Validity refers to ‘the degree to which a health-related patient reported outcome instrument measures the construct(s) it purports to measure’. Reliability refers to ‘the degree to which the measurement is free from measurement error’. Responsiveness refers to ‘the ability of a health-related patient reported outcome instrument to detect change over time in the construct to be measured’. See Table 1 in the attachment for definitions of the measurement properties for each of these three domains (e.g. content validity). Following COSMIN guidance, the working group considered content validity and internal consistency as critical outcome measures and all other outcomes regarding validity, reliability and responsiveness as important outcome measures. The outcomes were assessed against the criteria for good measurement outcomes as shown in Table 2 in the attachment (Prinsen, 2016).

 

Search and select (Methods)

The databases Medline (via OVID) and Embase (via Embase.com) were searched with relevant search terms from 2000 until April 13th, 2023. The systematic literature search resulted in 295 hits. An update of the search was performed on 17th November 2023. In this update, the original search was rerun twice: first without applying a filter for study design, and then with a filter for finding studies on measurement properties of measurement instruments. The detailed search strategy is depicted under the tab Methods. The updated search resulted in 331 hits.

 

In total, 55 studies were initially selected based on title and abstract screening. After full-text review, 46 studies were excluded (see the table with reasons for exclusion under the tab Methods) and nine studies were included.

 

Studies were selected based on the following criteria:

  • Systematic reviews (searched in at least two databases, and detailed search strategy, risk of bias assessment and results of individual studies available) or original studies evaluating measurement properties;
  • Children and adolescents aged <18 years;
  • Reporting about at least one ability or performance questionnaire;
  • Full-text English language publication;
  • Published from 2000; and
  • Studies according to the PICO.

Results

Nine studies met the selection criteria.  These studies provided information about the following questionnaires:

  1. CHEQ (Sköld, 2011; Amer, 2016; Ryll, 2019);
  2. HUH (Geerdink, 2017; van der Holst, 2018);
  3. Abilhand-Kids (Gerber, 2016; Bleyenheuft, 2016; de Jong, 2018; Paradis, 2019);

An overview of the measurement properties evaluated in these studies is provided in Table 3 in the attachment. Since the focus of the clinical question is on evaluation of ability and/or performance over time, only studies about questionnaires for which evidence on responsiveness was available are included in the literature summary. This included the CHEQ (evidence on responsiveness provided by Ryll, 2019) and Abilhand-Kids questionnaire (Bleyenheuft, 2016).

 

However, only for the CHEQ data on responsiveness were reported in a format that could be compared with the COSMIN-criteria for good measurement properties (results are in accordance with the hypothesis or area under the curve ≥ 0.70).

 

The data on responsiveness of the Abilhand-Kids questionnaire (Bleyenheuft, 2016) were not reported in a format that could be compared with the COSMIN criteria for good measurement properties. Bleyenheuft (2016) used a global approach (analysis of variance and calculation of effect sizes for changes from baseline to post-training and from post-training to follow-up), a group approach (analysis of effect sizes for younger versus older children, and for MACS levels I/II/III), and an individual approach (analysis of t-values, which indicate whether the change observed on the ABILHAND-kids  between two assessment times for a given child reflects more than the fluctuation of the measuring instrument). No hypotheses about changes in scores were formulated a priori.

 

More information about the characteristics of the CHEQ is provided in Table 4 in the attachment. Important study characteristics and results for the CHEQ are summarized in the evidence table (Table 5 in the attachment).

For the three studies about the CHEQ, the level of evidence assessment was mostly performed in accordance with COSMIN guidance for systematic reviews of PROMs (Mokkink, 2018). This assessment was performed per measurement property per instrument. Four factors were taken into account: (1) risk of bias; (2) inconsistency; (3) imprecision; and (4) indirectness. Following COSMIN guidance, publication bias was not assessed because of a lack of registries for these types of studies. To assess the methodological quality of the three original studies, i.e. assessing the risk of bias for each measurement property, the corresponding COSMIN Risk of Bias criteria were completed. To determine the overall quality of a study, the lowest rating of any standard in the box was taken (i.e. “the worst score counts” principle). The risk of bias assessment for the three original studies is reported in Tables 6.1-6.6 in the attachment.

For these types of studies, the level of evidence starts at ‘high’. The quality of evidence can be downgraded by one, two, or three levels per factor to moderate, low or very low level of evidence. See Table 7 in the attachment for the definitions of these quality levels (Mokkink, 2018).

 

For risk of bias, the level of evidence could be downgraded by three levels: one level in case of a serious risk of bias (multiple studies of doubtful quality or only one study of adequate quality), two levels in case of a very serious risk of bias (multiple studies of inadequate quality or only one study of doubtful quality), or three levels in case of an extremely serious risk of bias (only one study of inadequate quality).

 

For inconsistency, the level of evidence could be downgraded by one or two levels in case of unexplained inconsistency.

 

For imprecision, the level of evidence could be downgraded by one or two levels: one level if the total sample size was 50-100, or two levels if the total sample size was below 50. Downgrading was not performed if the sample size requirement was already included in the risk of bias assessment (for the properties content validity, structural validity, and cross-cultural validity).

 

For indirectness, the level of evidence could be downgraded by one or two levels if studies were performed in another population or another context of use compared with the population and context as defined in the PICO.
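As a minimal illustration of the downgrading logic described above, the sketch below (a hypothetical helper, not part of the COSMIN manual) maps the number of levels downgraded per factor onto the final level of evidence, with 'very low' as the floor.

```python
# Illustrative sketch of the level-of-evidence downgrading described above.
LEVELS = ["high", "moderate", "low", "very low"]

def grade_level(risk_of_bias=0, inconsistency=0, imprecision=0, indirectness=0):
    """Each argument is the number of levels downgraded for that factor."""
    total = risk_of_bias + inconsistency + imprecision + indirectness
    return LEVELS[min(total, len(LEVELS) - 1)]   # the level cannot drop below 'very low'

# Example mirroring the responsiveness assessment of Ryll (2019): very serious
# risk of bias (2 levels) plus very serious imprecision (2 levels) -> 'very low'
print(grade_level(risk_of_bias=2, imprecision=2))
```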

  1. Amer A, Eliasson AC, Peny-Dahlstrand M, Hermansson L. Validity and test-retest reliability of Children's Hand-use Experience Questionnaire in children with unilateral cerebral palsy. Dev Med Child Neurol. 2016 Jul;58(7):743-9. doi: 10.1111/dmcn.12991. Epub 2015 Nov 26. PMID: 26610725.
  2. Bleyenheuft Y, Arnould C, Brandao MB, Bleyenheuft C, Gordon AM. Hand and Arm Bimanual Intensive Therapy Including Lower Extremity (HABIT-ILE) in Children With Unilateral Spastic Cerebral Palsy: A Randomized Trial. Neurorehabil Neural Repair. 2015 Aug;29(7):645-57. doi: 10.1177/1545968314562109. Epub 2014 Dec 19. PMID: 25527487.
  3. Geerdink Y, Aarts P, van der Holst M, Lindeboom R, Van Der Burg J, Steenbergen B, Geurts AC. Development and psychometric properties of the Hand-Use-at-Home questionnaire to assess amount of affected hand-use in children with unilateral paresis. Dev Med Child Neurol. 2017 Sep;59(9):919-925. doi: 10.1111/dmcn.13449. Epub 2017 May 29. PMID: 28555780.
  4. Gerber, C. N., Labruyère, R., & van Hedel, H. J. (2016). Reliability and Responsiveness of Upper Limb Motor Assessments for Children With Central Neuromotor Disorders: A Systematic Review. Neurorehabilitation and neural repair, 30(1), 19–39. https://doi.org/10.1177/1545968315583723
  5. van der Holst, M., Geerdink, Y., Aarts, P., Steenbeek, D., Pondaag, W., Nelissen, R. G., Geurts, A. C., & Vliet Vlieland, T. P. (2018). Hand-Use-at-Home Questionnaire: validity and reliability in children with neonatal brachial plexus palsy or unilateral cerebral palsy. Clinical rehabilitation, 32(10), 1363–1373. https://doi.org/10.1177/0269215518775156
  6. de Jong LD, van Meeteren A, Emmelot CH, Land NE, Dijkstra PU. Reliability and sources of variation of the ABILHAND-Kids questionnaire in children with cerebral palsy. Disabil Rehabil. 2018 Mar;40(6):684-689. doi: 10.1080/09638288.2016.1272139
  7. Mokkink, L. B., Terwee, C. B., Patrick, D. L., Alonso, J., Stratford, P. W., Knol, D. L., ... & de Vet, H. C. (2010). The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. Journal of clinical epidemiology, 63(7), 737-745.
  8. Mokkink, L. B., Prinsen, C. A., Patrick, D. L., Alonso, J., Bouter, L. M., de Vet, H.C., Terwee C. B. (2018). COSMIN methodology for systematic reviews of patient-reported outcome measures (PROMs). User manual. 78:1. Available at: https://www.cosmin.nl/wp-content/uploads/COSMIN-syst-review-for-PROMs-manual_version-1_feb-2018-1.pdf.
  9. Paradis, J., Dispa, D., De Montpellier, A., Ebner-Karestinos, D., Araneda, R., Saussez, G., Renders, A., Arnould, C., & Bleyenheuft, Y. (2019). Interrater Reliability of Activity Questionnaires After an Intensive Motor-Skill Learning Intervention for Children With Cerebral Palsy. Archives of physical medicine and rehabilitation, 100(9), 1655–1662. https://doi.org/10.1016/j.apmr.2018.12.039
  10. Prinsen CA, Vohra S, Rose MR, Boers M, Tugwell P, Clarke M, Williamson PR, Terwee CB. How to select outcome measurement instruments for outcomes included in a "Core Outcome Set" - a practical guideline. Trials. 2016 Sep 13;17(1):449. doi: 10.1186/s13063-016-1555-2. PMID: 27618914; PMCID: PMC5020549.
  11. Prinsen CAC, Mokkink LB, Bouter LM, Alonso J, Patrick DL, de Vet HCW, Terwee CB. COSMIN guideline for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018 May;27(5):1147-1157. doi: 10.1007/s11136-018-1798-3. Epub 2018 Feb 12. PMID: 29435801; PMCID: PMC5891568.
  12. Ryll UC, Eliasson AC, Bastiaenen CH, Green D. To Explore the Validity of Change Scores of the Children's Hand-use Experience Questionnaire (CHEQ) in Children with Unilateral Cerebral Palsy. Phys Occup Ther Pediatr. 2019;39(2):168-180. doi: 10.1080/01942638.2018.1438554. Epub 2018 Feb 26. PMID: 29482408.
  13. Sköld A, Hermansson LN, Krumlinde-Sundholm L, Eliasson AC. Development and evidence of validity for the Children's Hand-use Experience Questionnaire (CHEQ). Dev Med Child Neurol. 2011 May;53(5):436-42. doi: 10.1111/j.1469-8749.2010.03896.x. Epub 2011 Mar 17. PMID: 21413973.
  14. Streiner DL, Norman G. Health Measurement Scales. A practical guide to their development and use. 4th edition ed. New York: Oxford University Press; 2008.
  15. Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, Bouter LM, de Vet HC. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007 Jan;60(1):34-42. doi: 10.1016/j.jclinepi.2006.03.012. Epub 2006 Aug 24. PMID: 17161752.
  16. Urlic K, Wallen M. The Assisting Hand Assessment is a reliable and valid measure of assessing hand function for children with hemiplegic cerebral palsy and obstetric brachial plexus palsy. Aust Occup Ther J. 2009 Aug;56(4):295-6. doi: 10.1111/j.1440-1630.2009.807_2.x. PMID: 20854531.

Table 1: COSMIN definitions of domains, measurement properties, and aspects of measurement properties (Mokkink, 2010)

Reliability (domain): The degree to which the measurement is free from measurement error.

Reliability (domain, extended definition): The extent to which scores for patients who have not changed are the same for repeated measurement under several conditions; e.g. using different sets of items from the same health-related patient-reported outcomes (HR-PRO) instrument (internal consistency); over time (test-retest); by different persons on the same occasion (inter-rater); or by the same persons (i.e. raters or responders) on different occasions (intra-rater).

  • Internal consistency (measurement property): The degree of the interrelatedness among the items.
  • Reliability (measurement property): The proportion of the total variance in the measurements which is due to ‘true’† differences between patients.
  • Measurement error (measurement property): The systematic and random error of a patient’s score that is not attributed to true changes in the construct to be measured.

Validity (domain): The degree to which an HR-PRO instrument measures the construct(s) it purports to measure.

  • Content validity (measurement property): The degree to which the content of an HR-PRO instrument is an adequate reflection of the construct to be measured.
      • Face validity (aspect): The degree to which (the items of) an HR-PRO instrument indeed looks as though they are an adequate reflection of the construct to be measured.
  • Construct validity (measurement property): The degree to which the scores of an HR-PRO instrument are consistent with hypotheses (for instance with regard to internal relationships, relationships to scores of other instruments, or differences between relevant groups) based on the assumption that the HR-PRO instrument validly measures the construct to be measured.
      • Structural validity (aspect): The degree to which the scores of an HR-PRO instrument are an adequate reflection of the dimensionality of the construct to be measured.
      • Hypotheses testing (aspect): Idem construct validity.
      • Cross-cultural validity (aspect): The degree to which the performance of the items on a translated or culturally adapted HR-PRO instrument are an adequate reflection of the performance of the items of the original version of the HR-PRO instrument.
  • Criterion validity (measurement property): The degree to which the scores of an HR-PRO instrument are an adequate reflection of a ‘gold standard’.

Responsiveness (domain): The ability of an HR-PRO instrument to detect change over time in the construct to be measured.

  • Responsiveness (measurement property): Idem responsiveness.

Interpretability*: The degree to which one can assign qualitative meaning – that is, clinical or commonly understood connotations – to an instrument’s quantitative scores or change in scores.

† The word ‘true’ must be seen in the context of the CTT, which states that any observation is composed of two components – a true score and error associated with the observation. ‘True’ is the average score that would be obtained if the scale were given an infinite number of times. It refers only to the consistency of the score, and not to its accuracy (Streiner, 2008).

* Interpretability is not considered a measurement property, but an important characteristic of a measurement instrument.

 

Abbreviations

HR-PRO: health related-patient reported outcomes

 

Table 2: Criteria for good measurement properties (adapted from Prinsen, 2018; based on Terwee, 2007 and Prinsen, 2016)

Structural validity
  • +: CTT: CFI or TLI or comparable measure > 0.95 OR RMSEA < 0.06 OR SRMR < 0.08. IRT/Rasch: no violation of unidimensionality (CFI or TLI or comparable measure > 0.95 OR RMSEA < 0.06 OR SRMR < 0.08) AND no violation of local independence (residual correlations among the items after controlling for the dominant factor < 0.20 OR Q3’s < 0.37) AND no violation of monotonicity (adequate looking graphs OR item scalability > 0.30) AND adequate model fit (IRT: χ² > 0.01; Rasch: infit and outfit mean squares ≥ 0.5 and ≤ 1.5 OR Z-standardised values > -2 and < 2).
  • ?: CTT: not all information for ‘+’ reported; IRT/Rasch: model fit not reported.
  • -: Criteria for ‘+’ not met.

Internal consistency
  • +: At least low evidence for sufficient structural validity AND Cronbach’s alpha(s) ≥ 0.70 for each unidimensional scale or subscale.
  • ?: Criteria for “at least low evidence for sufficient structural validity” not met.
  • -: At least low evidence for sufficient structural validity AND Cronbach’s alpha(s) < 0.70 for each unidimensional scale or subscale.

Reliability
  • +: ICC or weighted Kappa ≥ 0.70.
  • ?: ICC or weighted Kappa not reported.
  • -: ICC or weighted Kappa < 0.70.

Measurement error
  • +: SDC or LoA < MIC.
  • ?: MIC not defined.
  • -: SDC or LoA > MIC.

Hypotheses testing for construct validity
  • +: The result is in accordance with the hypothesis.
  • ?: No hypothesis defined (by the review team).
  • -: The result is not in accordance with the hypothesis.

Cross-cultural validity/measurement invariance
  • +: No important differences found between group factors (such as age, gender, language) in multiple group factor analysis OR no important DIF for group factors (McFadden’s R² < 0.02).
  • ?: No multiple group factor analysis OR DIF analysis performed.
  • -: Important differences between group factors OR DIF was found.

Criterion validity
  • +: Correlation with gold standard ≥ 0.70 or AUC ≥ 0.70.
  • ?: Not all information for ‘+’ reported.
  • -: Correlation with gold standard < 0.70 or AUC < 0.70.

Responsiveness
  • +: The result is in accordance with the hypothesis OR AUC ≥ 0.70.
  • ?: No hypothesis defined (by the review team).
  • -: The result is not in accordance with the hypothesis or AUC < 0.70.

AUC: area under the curve; CFA: confirmatory factor analysis; CFI: comparative fit index; CTT: classical test theory; DIF: differential item functioning; ICC: intraclass correlation coefficient; IRT: item response theory; LoA: limits of agreement; MIC: minimal important change; RMSEA: root mean square error of approximation; SEM: standard error of measurement; SDC: smallest detectable change; SRMR: standardised root mean residuals; TLI: Tucker-Lewis index

“+” = sufficient
“?” = indeterminate
“-“ = insufficient
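For illustration, the small sketch below (hypothetical helper functions, not part of COSMIN) applies the Table 2 rules for reliability and responsiveness; for example, an AUC of 0.67 would be rated insufficient against the 0.70 threshold.

```python
# Illustrative sketch: applying the Table 2 ratings for two measurement properties.
def rate_reliability(icc_or_kappa=None):
    if icc_or_kappa is None:
        return "?"                      # ICC or weighted kappa not reported
    return "+" if icc_or_kappa >= 0.70 else "-"

def rate_responsiveness(hypothesis_confirmed=None, auc=None):
    if hypothesis_confirmed is None and auc is None:
        return "?"                      # no hypothesis defined and no AUC reported
    if hypothesis_confirmed is True or (auc is not None and auc >= 0.70):
        return "+"
    return "-"

print(rate_reliability(0.89))           # '+' (e.g. an ICC of 0.89)
print(rate_responsiveness(auc=0.67))    # '-' (below the 0.70 threshold)
```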

 

Table 3: Overview of psychometric properties evaluated in the included studies

 

Children’s Hand-use Experience Questionnaire (CHEQ)
  • Sköld (2011), original study: validity (content validity and construct validity); n=86, including 31 children with CP
  • Amer (2016), original study: validity (construct validity) and reliability (test-retest reliability); n=242
  • Ryll (2019), original study: responsiveness; n=44

Hand Use at Home Questionnaire (HUH)a
  • Geerdink (2017), original study: validity (construct validity) and reliability (internal consistency); n=322, including 131 children with CP
  • van der Holst (2018), original study: validity (construct validity) and reliability (test-retest reliability); n=260, including 79 children with CP

Abilhand-Kids
  • Gerber (2016), systematic review: reliability (test-retest reliability); n=113
  • Bleyenheuft (2016), original study: responsiveness b; n=98
  • De Jong (2018), original study: reliability (interrater reliability, test-retest reliability); n=27
  • Paradis (2019), original study: reliability (interrater reliability); n=41

a No data were available on responsiveness. Since the focus of the clinical question is on responsiveness, no studies on the Hand Use at Home Questionnaire (HUH) were included in the literature summary.

b While this study evaluated responsiveness, no data were reported that could be compared against the COSMIN criteria for good measurement properties. Since the focus of the clinical question is on responsiveness, no studies on the Abilhand-Kids questionnaire were included in the literature summary.

 

Table 4: Characteristics of the CHEQ

Instrument (reference to first article): Children’s Hand-use Experience Questionnaire (CHEQ) (Sköld, 2011); www.cheq.se

Construct(s): Children’s perceived quality of performance when using the affected hand in performing bimanual activities: (1) perceived efficacy of the grasp; (2) time taken to perform the activity; (3) experience of feeling bothered by impaired hand function in the activity.

Target population: Children/adolescents aged 6 to 18 years with unilateral hand impairment caused by unilateral CP, upper limb reduction deficiency, or obstetric brachial plexus palsy.

Mode of administration: Self-report by the child from about 12 years of age, together with a parent or guardian before this age or later if needed, or by parents or guardians as proxy. Approximate time to answer the CHEQ is 30 minutes.

(Sub)scale(s) (number of items): 29 activities (e.g. ‘pick money out of a purse or wallet’).

Response options and range of scores/scoring:

Question 1: ‘Is this something you usually do independently?’
  1. yes
  2. no, I get help/avoid doing it
  3. not applicable

If yes:

Question 2: ‘Do you use one hand or both hands together?’
  1. one hand
  2. both hands, with the involved hand supporting without holding
  3. both hands, with the involved hand holding the object

Question 3: After these opening questions, which serve to describe whether and how the activity is performed, the respondent’s experience of the performance is rated on three separate four-category scales with verbal anchors at each end, constituting three dimensions of hand use:
  1. grasp efficacy, indicating how effective the grasp is perceived to be, where 1 is ‘ineffective’ and 4 is ‘effective’;
  2. time taken, indicating the time taken to perform the activity compared with peers, where 1 is ‘considerably longer’ and 4 is ‘equally long’;
  3. feeling bothered, indicating whether the child feels irritated, sad, or uncomfortable when doing the activity, where 1 is ‘it bothers me a lot’ and 4 is ‘it does not bother me at all’.

Language: Available in 16 languages, including Dutch.

 

Table 5: Evidence table on characteristics and results of studies on measurement properties

Columns: study; study characteristics; patient characteristics; measurement instrument (I); measurement instrument (C; gold standard); follow-up/interpretability; measurement properties; comments.

Children’s Hand-use Experience Questionnaire (CHEQ)

Sköld, 2011

Instrument assessed:

Children’s Hand-use Experience Questionnaire (CHEQ)

 

Setting and Country:

Sweden

 

Funding and conflicts of interest:

This study was supported by grants from the Norrbacka-Eugeniastiftelsen, Stiftelsen Sunnerdahl Handikappfond, Sällskapet Barnavård, Groschinsky Memorial Fund, Eva and Oscar Ahréns Memorial Fund, The Swedish National Association for Disabled Children and Young People (RBU), The Swedish Research Council, The Centre for Health Care Science at Karolinska Institutet, and the Research Committee, Örebro County Council, Sweden. No information about potential conflicts of interest was provided.

 

 

Inclusion criteria:

Inclusion criteria for the study were diagnosis of unilateral CP, obstetric brachial plexus palsy, or upper limb reduction deficiency (not using prosthesis) and age 6 to 18 years.

 

Exclusion criteria:

-

 

Sample size:

Total: 86

CP: 31

Obstetric brachial plexus palsy: 26

Upper limb reduction deficiency: 29

 

Age in years (mean (SD):

Total: 12 (3)

CP: 11 (3)

Gender (% female):
Total: 51%

CP: 48%

 

 

 

Name:

CHEQ

 

Version (including language if applicable):

early version of CHEQ using a 10-category rating scale

 



 

n/a

Length of follow-up: n/a

 

Loss to follow-up: n/a

Percentage of missing items/total scores/outcome:

96 families were recruited; 14% of questionnaires were incompletely or incorrectly filled out; the analysis was based on data from 86 respondents.

 

Floor effects (% of sample with the lowest score possible):

Not reported

Ceiling effects (% of sample with the highest score possible):

Not reported

 

Content validity

Definition: activities were by nature bimanual, found relevant to the age group of respondents, and typically performed independently

 

Activities performed independently (range): 43.7% - 100%

 

Activities performed with two hands (range):

70.7%-97.5%

 

Construct validity (structural validity):

definition: item-fit statistics.

 

Adequate model fit

The Rasch analysis showed misfit for four items in the grasp efficacy scale (14%) and two items in the time taken scale (7%). Five items were removed based on this analysis.

 

Infit mean squares (range):

Grasp efficacy: 0.57 to 1.77 (two items ≥ 1.5)

Time taken: 0.56 to 1.97 (one item ≥ 1.5)

Feeling bothered: 0.55 to 1.39

 

Infit Z-standardized values (range):

Grasp efficacy: -2.4 to 2.8 (three items ≤ -2 or ≥ 2)

Time taken: -2.3 to 3.9 (six items ≤ -2 or ≥ 2)

Feeling bothered: -2.5 to 2.0 (four items ≤ -2 or ≥ 2)

Authors’ conclusions

CHEQ can be used to assess children and adolescents with a unilateral hand dysfunction on their experiences of using the affected hand to perform bimanual tasks. In clinical work, CHEQ has the potential to become a useful tool for treatment planning and follow-up.

Amer, 2016

Instrument assessed:

Children’s Hand-use Experience Questionnaire (CHEQ)

 

Setting and Country:

Online survey in six countries: Australia, Israel, Italy, the Netherlands, Sweden, UK

 

Funding and conflicts of interest:

This study has been financially supported by Stiftelsen Frimurarna Barnhuset, Swedish Research Council (grant nos 521-211-2655 and 521-2011-456) and Centre for Rehabilitation Research, Region Örebro County. The authors have stated that they had no interests that could be perceived as posing a conflict or bias.

One of the co-authors is co-developer of the CHEQ.

 

 

Inclusion criteria:

Participants were children with unilateral CP. The participants represent a convenience sample: some children were participants in other studies where CHEQ was included in the data collection, other children were recruited specifically for this study, and the remainder answered the questionnaire for clinical purposes and gave consent to the data being used for research.

 

Exclusion criteria:

-

 

Sample size:

242

 

Age in years (mean (SD; range)):

9 y 10 mo (SD 3y 5mo, range 6-18 y)

Gender (% female):
105 (43%)

 

 

 

Name:

CHEQ

 

Version (including language if applicable):

internet-based version using four-category rating scales

 

 

n/a

Length of follow-up: n/a

 

Loss to follow-up: n/a

Percentage of missing items/total scores/outcome:

Range of children for whom activities were rated ‘not applicable’: 3-76%.

 

Therefore, data were missing for the three scales:

Grasp efficacy: 39%

Time taken: 32%

Feeling bothered: 34%

Floor effects (% of sample with the lowest score possible):

Grasp efficacy: 0.4%

Time taken: 1.2%

Feeling bothered: 0%

Ceiling effects (% of sample with the highest score possible):

Grasp efficacy: 2.5%

Time taken: 2.1%

Feeling bothered: 12.8%

 

Construct validity (structural validity):

definition: rating scale functioning according to Linacre (2004), internal structure according to Fisher (2007) and item-fit statistics.

 

Adequate model fit

 

The initial item-fit analysis showed that the item ‘fasten a necklace’ did not fit in any of the three scales and was removed. Further analyses were based on 28 activities.

 

Further item fit analysis showed acceptable fit for the scales grasp efficacy and time taken, but in the scale feeling bothered, three items were misfit (handle playing cards, cut meat (or other food hard to cut up) on a plate, tie shoelaces).

 

Infit mean squares (range):

Grasp efficacy: 0.76 to 1.38

Time taken: 0.67 to 1.35

Feeling bothered: 0.73 to 1.70 (two items ≥ 1.5)

 

Infit Z-standardized values (range):

Grasp efficacy: -2.6 to 2.9 (six items ≤ -2 or ≥ 2)

Time taken: -2.91 to 2.96 (seven items ≤ -2 or ≥ 2)

Feeling bothered: -2.57 to 4.47 (eight items ≤ -2 or ≥ 2)

 

Reliability (reliability, test-retest):
definition: no definition of test-retest reliability provided. The strength of the agreement between the scores on the first and second completion of the CHEQ was analysed using Kappa analysis on the opening questions.

 

Performing the activity independently:

average K 0.63

 

Using the affected hand as support or to grasp:

average K 0.57

 

Intraclass correlation coefficients were calculated for the three scales.

 

ICC (mean, 95%CI)

 

Grasp efficacy (n=19):

0.89 (0.73-0.96)

 

Time taken (n=20):

0.87 (0.69-0.94)

 

Feeling bothered (n=20):

0.91 (0.79-0.96)

Authors’ conclusions

In conclusion, the results demonstrate the validity of the internet-based CHEQ with a four-category rating scale in children with CP. The test–retest reliability is high, allowing for group level comparisons and near sufficient for individual comparisons. This means that CHEQ provides a unique perspective of children’s experience of using the affected hand in daily activities and can be used as a complement to other tests measuring aspects of capacity and performance. Given that the development of a test is an ongoing process, further studies are needed to investigate the validity of the internet-based version of CHEQ for children with ULRD or OBPP and the recommended improvements to the current version.

 

 

Ryll, 2019

Instrument assessed:

Children’s Hand-use Experience Questionnaire (CHEQ)

 

Setting and Country:

Swedish authors used data from a two-week day camp in England

 

Funding and conflicts of interest:

This research was supported by funding from Stiftelsen Frimurare Barnhuset Stockholm, Guy’s and St Thomas’ Charity and Beit Issie Shapiro and supported by Breathe Arts Health Research. One of the co-authors is co-developer of the CHEQ.

 

 

 

 

Inclusion criteria:

Inclusion criteria for the analysis of responsiveness were children aged 6 to 18 years with unilateral impairment of the upper limb who participated in the intervention with required assessments at baseline and immediately following the two-week hand-arm bimanual intensive therapy program.

 

Exclusion criteria:

-

 

Sample size:

44

 

Age in years (median (IQR)):

9.0 (8-11)

 

Gender (% female):
14 (32%)

 

 

Name:

CHEQ

 

Version (including language if applicable):

internet-based version using four-category rating scales

 

Hypothesis (responsiveness hypothesis testing):

Based on prior investigations of the relationship between perceived bimanual performance measured by CHEQ and the observed bimanual performance measured by the Assisting Hand Assessment (AHA), positive relationships between the change scores of the CHEQ scales and the GAS as anchor were hypothesized to range from r = 0.3–0.5 for the CHEQ scales Grasp efficacy and Time utilization, as both have a common focus on bimanual activity performance. A slightly weaker association (r = 0.2–0.4) was expected for the CHEQ scale Feeling bothered, because we assumed the ability in specific task performance to be less related to feelings about this performance across tasks.

Name:

Goal Attainment Scale (GAS)

 

Version (including language if applicable):

-

 

Construct:

extent to which individuals’ goals are achieved during intervention

 

Hypothesis (responsiveness hypothesis testing): see hypothesis on the left

Length of follow-up:

n/a

 

Loss to follow-up:
n/a

Percentage of missing items/total scores/outcome:

n/a

Floor effects (% of sample with the lowest score possible):
No floor effects were observed for any of the CHEQ scales.

Ceiling effects (% of sample with the highest score possible):

A ceiling effect was found for the CHEQ scale ‘feeling bothered’ at post assessment (15.9%)

Minimally important change/difference:
A change of two scale steps is reported as being meaningful and consistent with goal achievement

Responsiveness:

definition: Responsiveness is considered one aspect of validity as it concerns the validity of change scores, i.e. longitudinal validity. According to the proposed definition by the COSMIN (COnsensus-based Standards for selection of health Measurement INstruments) group, a responsive instrument should measure changes in the construct(s) it intends to measure; if a participant changes on the construct of interest the measurement instrument assessing the same construct should reflect this.

Spearman rank correlation coefficients between GAS and CHEQ

 

Grasp efficacy: ρ=0.38

Time taken: ρ=0.34

Feeling bothered: ρ=0.37

 

Effect sizes (Cohen’s d)

 

Grasp efficacy

Improved: 1.01

Not improved: 0.31

 

Time taken

Improved: 0.74

Not improved: 0.09

 

Feeling bothered

Improved: 0.61

Not improved: -0.14

 

Area under the curve (95% CI)

 

Grasp efficacy: 0.73 (0.56 to 0.91)

Time taken: 0.67 (0.49 to 0.84)

Feeling bothered: 0.73 (0.56 to 0.91)

Authors’ conclusions

Some evidence was shown that the CHEQ scales capture change in bimanual performance, but with limited accuracy for two out of three scales. The GAS can be used as an anchor to measure the construct of perceived bimanual performance when CHEQ items are directly or indirectly used as a pool for goal setting.
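For illustration only (this is not the authors’ analysis script; all data and variable names below are hypothetical): a minimal Python sketch of how responsiveness statistics of the kind reported above can be derived from CHEQ change scores and a dichotomised GAS anchor, using a Spearman rank correlation, a within-group Cohen’s d of the change score, and the area under the ROC curve.

```python
# Illustrative sketch with hypothetical data (not the study's analysis).
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 44  # sample size reported by Ryll (2019)

# Hypothetical change scores (post minus pre) for one CHEQ scale and the GAS anchor.
cheq_change = rng.normal(loc=5, scale=10, size=n)
gas_change = 0.4 * cheq_change + rng.normal(scale=8, size=n)
improved = gas_change >= 10  # hypothetical anchor: goal achieved yes/no

# 1. Spearman rank correlation between CHEQ change scores and the GAS anchor.
rho, p = spearmanr(cheq_change, gas_change)

# 2. Cohen's d of the change score within the 'improved' and 'not improved' groups
#    (one common operationalisation: mean change / SD of the change score).
def cohens_d(change):
    return change.mean() / change.std(ddof=1)

d_improved = cohens_d(cheq_change[improved])
d_not_improved = cohens_d(cheq_change[~improved])

# 3. Area under the ROC curve: how well the change score discriminates
#    improved from not-improved children.
auc = roc_auc_score(improved, cheq_change)

print(f"Spearman rho={rho:.2f}, d improved={d_improved:.2f}, "
      f"d not improved={d_not_improved:.2f}, AUC={auc:.2f}")
```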

 


 

Table 6.1: Risk of bias table for the original study by Sköld (2011) – CHEQ – content validity

Content validity

Author: Sköld (2011)

Instrument: CHEQ

 

very good

adequate

doubtful

inadequate

NA

Asking patients about relevance

 

 

 

 

 

Design requirements

 

 

 

 

 

Was an appropriate method used to ask patients whether each item is relevant for their experience with the condition?

Widely recognized or well justified method used

Only quantitative (survey) method(s) used or assumable that the method was appropriate but not clearly described

Not clear if patients were asked whether each item is relevant or doubtful whether the method was appropriate

Methods used not appropriate or patients not asked about the relevance of all items

 

Was each item tested in an appropriate number of patients?

    For qualitative studies

    For quantitative (survey)

    studies

 

 

 

≥ 7

≥ 50

 

 

 

4-6

≥30

 

 

 

<4 or not clear

<30 or not clear

 

 

Were skilled group moderators/interviewers used?

Skilled group moderators/interviewers used

Group moderators/interviewers had limited experience or were trained specifically for the study

Not clear if group moderators/interviewers were trained or group moderators/interviewers not trained and no experience

 

Not applicable

Were the group meetings or interviews based on an appropriate topic or interview guide?

Appropriate topic or interview guide

Assumable that the topic or interview guide was appropriate, but not clearly described

Not clear if a topic guide was used or doubtful if topic or interview guide was appropriate or no guide

 

Not applicable

Were the group meetings or interviews recorded and transcribed verbatim?

All group meetings or interviews were recorded and transcribed verbatim

Assumable that all group meetings or interviews were recorded and transcribed verbatim, but not clearly described

Not clear if all group meetings or interviews were recorded and transcribed verbatim or recordings not transcribed verbatim or only notes were made during the group meetings/interviews

No recordings and no notes

Not applicable

Analyses

 

 

 

 

 

Was an appropriate approach used to analyse the data?

A widely recognized or well justified approach was used

Assumable that the approach was appropriate, but not clearly described

Not clear what approach was used or doubtful whether the approach was appropriate

Approach not appropriate

 

Were at least two researchers involved in the analysis?

At least two researchers involved in the analysis

Assumable that at least two researchers were involved in the analysis, but not clearly described

Not clear if two researchers were included in the analysis or only one researcher involved in the analysis

 

 

 

Table 6.2: Risk of bias table for the original study by Sköld (2011) – CHEQ – structural validity

Structural validity

Author: Sköld (2011)

Instrument: CHEQ

Does the scale consist of effect indicators, i.e. is it based on a reflective model?1 Yes (each subscale)

Does the study concern unidimensionality or structural validity?2 unidimensionality

 

very good

adequate

doubtful

inadequate

NA

Statistical methods

 

 

 

 

 

For Rasch: does the chosen model fit to the research question?

Chosen model fits well to the research question

Assumable that the chosen model fits well to the research question

Doubtful if the chosen model fits well to the research question

Chosen model does not fit to the research question

Not applicable

Was the sample size included in the analysis adequate?

Rasch model: ≥ 200 subjects

Rasch model: 100-199 subjects

Rasch model: 50-99 subjects

Rasch model: < 50 subjects

 

Were there any other important flaws in the design or statistical methods of the study?

No other important methodological flaws

 

Other minor methodological flaws (e.g. rotation method not described)

Other important methodological flaws (e.g. inappropriate rotation method)

 

1 If the scale is not based on a reflective model, unidimensionality or structural validity is not relevant.

2 In a systematic review, it is helpful to make a distinction between studies where factor analysis is performed on each (sub)scale separately to evaluate whether the (sub)scales are unidimensional (unidimensionality studies) and studies where factor analysis is performed on all items of an instrument to evaluate the (expected) number of subscales in the instrument and the clustering of items within subscales (structural validity studies).

 

Table 6.3: Risk of bias table for the original study by Amer (2016) – CHEQ – structural validity

Structural validity

Author: Amer (2016)

Instrument: CHEQ

Does the scale consist of effect indicators, i.e. is it based on a reflective model?1 Yes (each subscale)

Does the study concern unidimensionality or structural validity?2 unidimensionality

 

very good

adequate

doubtful

inadequate

NA

Statistical methods

 

 

 

 

 

For Rasch: does the chosen model fit to the research question?

Chosen model fits well to the research question

Assumable that the chosen model fits well to the research question

Doubtful if the chosen model fits well to the research question

Chosen model does not fit to the research question

Not applicable

Was the sample size included in the analysis adequate?

Rasch model: ≥ 200 subjects

Rasch model: 100-199 subjects

Rasch model: 50-99 subjects

Rasch model: < 50 subjects

 

Were there any other important flaws in the design or statistical methods of the study?

No other important methodological flaws

 

Other minor methodological flaws (e.g. rotation method not described)

Other important methodological flaws (e.g. inappropriate rotation method)

 

1 If the scale is not based on a reflective model, unidimensionality or structural validity is not relevant.

2 In a systematic review, it is helpful to make a distinction between studies where factor analysis is performed on each (sub)scale separately to evaluate whether the (sub)scales are unidimensional (unidimensionality studies) and studies where factor analysis is performed on all items of an instrument to evaluate the (expected) number of subscales in the instrument and the clustering of items within subscales (structural validity studies).

 

Table 6.4: Risk of bias table for the original study by Amer (2016) – CHEQ – test-retest reliability

Reliability (test-retest)

Author: Amer (2016)

Instrument: CHEQ

 

very good

adequate

doubtful

inadequate

NA

Design requirements

 

 

 

 

 

Were patients stable in the interim period on the construct to be measured?

Evidence provided that patients were stable

Assumable that patients were stable

Unclear if patients were stable

Patients were NOT stable

 

Was the time interval appropriate?

Time interval appropriate

 

Doubtful whether time interval was appropriate or time interval was not stated

Time interval NOT appropriate

 

Were the test conditions similar for the measurements? E.g. type of administration, environment, instructions

Test conditions were similar (evidence provided)

Assumable that test conditions were similar

Unclear if test conditions were similar

Test conditions were NOT similar

 

Statistical methods

 

 

 

 

 

For continuous scores: was an intraclass correlation coefficient (ICC) calculated?

ICC calculated and model or formula of the ICC is described

ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred.

Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred

No ICC or Pearson or Spearman correlations calculated

Not applicable

For dichotomous/nominal/ordinal scores: was kappa calculated?

Kappa calculated

 

 

No kappa calculated

Not applicable

For ordinal scores: was a weighted kappa calculated?

Weighted kappa calculated

 

Unweighted Kappa calculated or not described

 

Not applicable

For ordinal scores: was the weighting scheme described? E.g. linear, quadratic

Weighting scheme described

Weighting scheme NOT described

 

 

Not applicable

Other

 

 

 

 

 

Were there any other important flaws in the design or statistical methods of the study?

No other important methodological flaws

 

Other minor methodological flaws

 

Other important methodological flaws

 

 
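As an illustration of the statistic the checklist above refers to for continuous test–retest scores, the sketch below (assumed for illustration, not taken from Amer (2016)) computes a two-way random-effects, absolute-agreement ICC for single measurements (ICC(2,1), Shrout & Fleiss) from a subjects-by-occasions matrix; the test–retest data are hypothetical.

```python
# Minimal sketch of ICC(2,1) for test-retest reliability (hypothetical data).
import numpy as np

def icc_2_1(scores: np.ndarray) -> float:
    """scores: shape (n_subjects, k_occasions), e.g. CHEQ totals at test and retest."""
    n, k = scores.shape
    grand_mean = scores.mean()
    row_means = scores.mean(axis=1)   # per-subject means
    col_means = scores.mean(axis=0)   # per-occasion means

    # Sums of squares for subjects (rows), occasions (columns) and error.
    ss_rows = k * ((row_means - grand_mean) ** 2).sum()
    ss_cols = n * ((col_means - grand_mean) ** 2).sum()
    ss_total = ((scores - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Two-way random effects, absolute agreement, single measurement.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical test-retest scores for 10 children (two administrations each).
test = np.array([62, 75, 58, 80, 66, 71, 90, 55, 68, 77], dtype=float)
retest = np.array([65, 73, 60, 78, 64, 74, 88, 57, 70, 75], dtype=float)
print(f"ICC(2,1) = {icc_2_1(np.column_stack([test, retest])):.2f}")
```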

Table 6.5: Risk of bias table for the original study by Ryll (2019) – CHEQ – responsiveness

Responsiveness - construct approach [i.e. hypotheses testing; comparison with other outcome measurement instruments (CHEQ compared with GAS)]

Author: Ryll (2019)

Instrument: CHEQ

 

very good

adequate

doubtful

inadequate

NA

Design requirements

 

 

 

 

 

Is it clear what the comparator instrument(s) measure(s)?

Constructs measured by the comparator instrument(s) is clear

 

 

Constructs measured by the comparator instrument(s) is not clear

 

Were the measurement properties of the comparator instrument(s) sufficient?

Sufficient measurement properties of the comparator instrument(s) in a population similar to the study population

Sufficient measurement properties of the comparator instrument(s) but not sure if these apply to the study population

Some information on measurement properties of the comparator instrument(s) in any study population

NO information on the measurement properties of the comparator instrument(s) OR evidence of poor quality of comparator instrument(s)

 

Statistical methods

 

 

 

 

 

Was the statistical method appropriate for the hypotheses to be tested?

Statistical method was appropriate

Assumable that statistical method was appropriate

Statistical method applied NOT optimal

Statistical method applied NOT appropriate

 

Other

 

 

 

 

 

Were there any other important flaws in the design or statistical methods of the study?

No other important methodological flaws

 

Other minor methodological flaws

Other important methodological flaws

 

 

Table 7: Definitions of quality levels (Mokkink, 2018)

Quality level

Definition

High

We are very confident that the true measurement property lies close to that of the estimate* of the measurement property

Moderate

We are moderately confident in the measurement property estimate: the true measurement property is likely to be close to the estimate of the measurement property, but there is a possibility that it is substantially different

Low

Our confidence in the measurement property estimate is limited: the true measurement property may be substantially different from the estimate of the measurement property

Very low

We have very little confidence in the measurement property estimate: the true measurement property is likely to be substantially different from the estimate of the measurement property

* Estimate of the measurement property refers to the pooled or summarized result of the measurement property of a PROM

The COSMIN working group adapted these definitions from the GRADE approach.

 

Table of excluded studies

Reference

Reason for exclusion

Adams, R. J., Lunsford, C. D., Stevenson, R. D., Ellington, A. L., Lichter, M. D., & Patrie, J. T. (2023). Concurrent Validity of Measures of Upper Extremity Function Derived from Videogame-Based Motion Capture for Children with Hemiplegia. Games for health journal, 10.1089/g4h.2022.0160. Advance online publication. https://doi.org/10.1089/g4h.2022.0160

Wrong intervention (Jebsen Taylor Hand Function Test)

Araneda R, Ebner-Karestinos D, Paradis J, Saussez G, Friel KM, Gordon AM, Bleyenheuft Y. Reliability and responsiveness of the Jebsen-Taylor Test of Hand Function and the Box and Block Test for children with cerebral palsy. Dev Med Child Neurol. 2019 Oct;61(10):1182-1188. doi: 10.1111/dmcn.14184. Epub 2019 Feb 14. PMID: 30761528; PMCID: PMC8284844.

Wrong intervention (Jebsen-Taylor Test of Hand Function and the Box and Block Test)

Arnould, C., Penta, M., Renders, A., & Thonnard, J. L. (2004). ABILHAND-Kids: a measure of manual ability in children with cerebral palsy. Neurology, 63(6), 1045–1052. https://doi.org/10.1212/01.wnl.0000138423.77640.37

Included in SR of Gerber (2016)

Arnould, C., Penta, M., & Thonnard, J. L. (2007). Hand impairments and their relationship with manual ability in children with cerebral palsy. Journal of rehabilitation medicine, 39(9), 708–714. https://doi.org/10.2340/16501977-0111

Wrong outcome (motor impairment, sensory impairment, manual ability)

Bourke-Taylor H. Melbourne Assessment of Unilateral Upper Limb Function: construct validity and correlation with the Pediatric Evaluation of Disability Inventory. Dev Med Child Neurol. 2003 Feb;45(2):92-6. PMID: 12578234.

Wrong intervention (Melbourne Assessment)

Burgess, A., Boyd, R. N., Ziviani, J., & Sakzewski, L. (2019). A systematic review of upper limb activity measures for 5- to 18-year-old children with bilateral cerebral palsy. Australian occupational therapy journal, 66(5), 552–567. https://doi.org/10.1111/1440-1630.12600

Descriptive SR: individual study data cannot be retrieved, provides no absolute data about validity/reliability

Carey, Hl, Hay, K., Nelin, M.A., Sowers, B., Lewandowski, D.J., Moore-Clingenpeel, M., Maitre, N.L. (2020). Caregiver perception of hand function in infants with cerebral palsy: psychometric properties of the Infant Motor Activity Log. Developmental Medicine & Child Neurology, 62(11), 1266-1273. https://doi.org/10.1111/dmcn.14644

Wrong intervention (IMAL)

Cusick, A., Vasquez, M., Knowles, L., & Wallen, M. (2005). Effect of rater training on reliability of Melbourne Assessment of Unilateral Upper Limb Function scores. Developmental medicine and child neurology, 47(1), 39–45. https://doi.org/10.1017/s0012162205000071

Wrong intervention (Melbourne Assessment)

Davids JR, Peace LC, Wagner LV, Gidewall MA, Blackhurst DW, Roberson WM. Validation of the Shriners Hospital for Children Upper Extremity Evaluation (SHUEE) for children with hemiplegic cerebral palsy. J Bone Joint Surg Am. 2006 Feb;88(2):326-33. doi: 10.2106/JBJS.E.00298.

Wrong intervention (SHUEE)

Elvrum, A. K., Saether, R., Riphagen, I. I., & Vik, T. (2016). Outcome measures evaluating hand function in children with bilateral cerebral palsy: a systematic review. Developmental medicine and child neurology, 58(7), 662–671. https://doi.org/10.1111/dmcn.13119

Descriptive SR: individual study data cannot be retrieved, provides no absolute data about validity/reliability

Geerdink, Y., Lindeboom, R., de Wolf, S., Steenbergen, B., Geurts, A. C., & Aarts, P. (2014). Assessment of upper limb capacity in children with unilateral cerebral palsy: construct validity of a Rasch-reduced Modified House Classification. Developmental medicine and child neurology, 56(6), 580–586. https://doi.org/10.1111/dmcn.12395

Wrong intervention (Rasch-reduced modified house classification)

Geijen, M., Rameckers, E., Bastiaenen, C., Gordon, A., & Smeets, R. (2020). Construct Validity of a Task-Oriented Bimanual and Unimanual Strength Measurement in Children With Unilateral Cerebral Palsy. Physical therapy, 100(12), 2237–2245. https://doi.org/10.1093/ptj/pzaa173

Wrong intervention (TAAC)

Geijen, M., Bastiaenen, C., Gordon, A., Smeets, R., & Rameckers, E. (2023). Exploring relevant parameters and investigating their reproducibility of task-oriented unimanual strength measurement in children with unilateral cerebral palsy. Disability and rehabilitation, 1–7. Advance online publication. https://doi.org/10.1080/09638288.2023.2178677

Wrong intervention (TAAC)

Gilmore, R., Sakzewski, L., & Boyd, R. (2010). Upper limb activity measures for 5- to 16-year-old children with congenital hemiplegia: a systematic review. Developmental medicine and child neurology, 52(1), 14–21. https://doi.org/10.1111/j.1469-8749.2009.03369.x

Review did not add any studies beyond those already included.

Greaves, S., Imms, C., Dodd, K., & Krumlinde-Sundholm, L. (2010). Assessing bimanual performance in young children with hemiplegic cerebral palsy: a systematic review. Developmental medicine and child neurology, 52(5), 413–421. https://doi.org/10.1111/j.1469-8749.2009.03561.x

Descriptive SR: individual study data cannot be retrieved, provides no absolute data about validity/reliability

Hoare B, Imms C, Randall M, Carey L. Linking cerebral palsy upper limb measures to the International Classification of Functioning, Disability and Health. J Rehabil Med. 2011 Nov;43(11):987-96. doi: 10.2340/16501977-0886. PMID: 22031344.

Wrong outcome (categorising measurement instruments using the ICF domains)

Houwink, A., Geerdink, Y. A., Steenbergen, B., Geurts, A. C., & Aarts, P. B. (2013). Assessment of upper-limb capacity, performance, and developmental disregard in children with cerebral palsy: validity and reliability of the revised Video-Observation Aarts and Aarts module: Determine Developmental Disregard (VOAA-DDD-R). Developmental medicine and child neurology, 55(1), 76–82. https://doi.org/10.1111/j.1469-8749.2012.04442.x

Wrong intervention (VOAA-DDD-R)

Hoyt, C. R., Brown, S. K., Sherman, S. K., Wood-Smith, M., Van, A. N., Ortega, M., Nguyen, A. L., Lang, C. E., Schlaggar, B. L., & Dosenbach, N. U. F. (2020). Using accelerometry for measurement of motor behavior in children: Relationship of real-world movement to standardized evaluation. Research in developmental disabilities, 96, 103546. https://doi.org/10.1016/j.ridd.2019.103546

Wrong intervention (accelerometry)

James, M. A., Bagley, A., Vogler, J. B., 4th, Davids, J. R., & Van Heest, A. E. (2017). Correlation Between Standard Upper Extremity Impairment Measures and Activity-based Function Testing in Upper Extremity Cerebral Palsy. Journal of pediatric orthopedics, 37(2), 102–106. https://doi.org/10.1097/BPO.0000000000000591

Wrong outcome (correlations between impairment and activity measures: active wrist extension and SHUEE)

Jose PS, Radhakrishna VN, Sahoo B, Madhuri V. An Assessment of the Applicability of Shriners Hospital Upper Extremity Evaluation as a Decision-making Tool and Outcome Measure in Upper Limb Cerebral Palsy in Indian Children. Indian J Orthop. 2019 Jan-Feb;53(1):15-19. doi: 10.4103/ortho.IJOrtho_395_16.

Wrong intervention (SHUEE)

Klingels, K., De Cock, P., Desloovere, K., Huenaerts, C., Molenaers, G., Van Nuland, I., Huysmans, A., & Feys, H. (2008). Comparison of the Melbourne Assessment of Unilateral Upper Limb Function and the Quality of Upper Extremity Skills Test in hemiplegic CP. Developmental medicine and child neurology, 50(12), 904–909. https://doi.org/10.1111/j.1469-8749.2008.03123.x

Wrong intervention (Melbourne assessment)

Klingels, K., Jaspers, E., Van de Winckel, A., De Cock, P., Molenaers, G., & Feys, H. (2010). A systematic review of arm activity measures for children with hemiplegic cerebral palsy. Clinical rehabilitation, 24(10), 887–900. https://doi.org/10.1177/0269215510367994

Includes five studies prior to 2000; search not systematic (studies also added manually).

Klingels, K., Demeyere, I., Jaspers, E., De Cock, P., Molenaers, G., Boyd, R., & Feys, H. (2012). Upper limb impairments and their impact on activity measures in children with unilateral cerebral palsy. European journal of paediatric neurology : EJPN : official journal of the European Paediatric Neurology Society, 16(5), 475–484. https://doi.org/10.1016/j.ejpn.2011.12.008

Wrong outcome (PROM, muscle tone, muscle strength, grip strength, sensory assessment); nothing about performance measures

Krumlinde-Sundholm, L. & Eliasson A-C. (2003). Development of the Assisting Hand Assessment: A Rasch-built Measure intended for Children with Unilateral Upper Limb Impairments. Scandinavian Journal of Occupational Therapy, 10:1, 16-6. https://doi.org/10.1080/11038120310004529

Wrong intervention (AHA)

Krumlinde-Sundholm, L., Ek, L., Sicola, E., Sjöstrand, L., Guzzetta, A., Sgandurra, G., Cioni, G., & Eliasson, A. C. (2017). Development of the Hand Assessment for Infants: evidence of internal scale validity. Developmental medicine and child neurology, 59(12), 1276–1283. https://doi.org/10.1111/dmcn.13585

Wrong intervention (HAI), wrong population (at risk of CP)

Lennon N, Church C, Shields T, Kee J, Henley JD, Salazar-Torres JJ, Niiler T, Shrader MW, Ty JM. Can the Shriners Hospital Upper Extremity Evaluation (SHUEE) Detect Change in Dynamic Position and Spontaneous Function of the Upper Limb in People With Hemiplegic Cerebral Palsy? J Pediatr Orthop. 2023 Jul 1;43(6):e471-e475. doi: 10.1097/BPO.0000000000002403. Epub 2023 Mar 22. PMID: 36952245.

Wrong outcome (not focused on capacity, ability or performance)

Öhrvall, A. M., Krumlinde-Sundholm, L., & Eliasson, A. C. (2013). Exploration of the relationship between the Manual Ability Classification System and hand-function measures of capacity and performance. Disability and rehabilitation, 35(11), 913–918. https://doi.org/10.3109/09638288.2012.714051

Wrong outcome (correlations between MACS and Abilhand-kids, and between MACS and Box and Block test)

Park, H., Choi, J. Y., Yi, S. H., Park, E. S., Shim, D., Choi, T. Y., & Rha, D. W. (2021). Relationship between the more-affected upper limb function and daily activity performance in children with cerebral palsy: a cross-sectional study. BMC pediatrics, 21(1), 459. https://doi.org/10.1186/s12887-021-02927-2

Wrong outcome (correlations between Melbourne Assessment and PEDI-CAT)

Pike S, Lannin NA, Cusick A, Wales K, Turner-Stokes L, Ashford S. A systematic review protocol to evaluate the psychometric properties of measures of function within adult neuro-rehabilitation. Syst Rev. 2015 Jun 13;4:86. doi: 10.1186/s13643-015-0076-5.

Wrong study design (systematic review protocol)

Rammer JR, Krzak JJ, Riedel SA, Harris GF. Evaluation of upper extremity movement characteristics during standardized pediatric functional assessment with a Kinect®-based markerless motion analysis system. Annu Int Conf IEEE Eng Med Biol Soc. 2014; 2014: 2525-8. doi: 10.1109/EMBC.2014.6944136.

Wrong intervention (SHUEE, motion analysis system)

Randall M, Carlin JB, Chondros P, Reddihough D. Reliability of the Melbourne assessment of unilateral upper limb function. Dev Med Child Neurol. 2001 Nov;43(11):761-7. doi: 10.1017/s0012162201001396. PMID: 11730151.

Wrong intervention (Melbourne assessment)

Randall, M., Imms, C., & Carey, L. (2008). Establishing validity of a modified Melbourne Assessment for children ages 2 to 4 years. The American journal of occupational therapy : official publication of the American Occupational Therapy Association, 62(4), 373–383. https://doi.org/10.5014/ajot.62.4.373

Wrong intervention (Melbourne assessment)

Randall M, Imms C, Carey LM, Pallant JF. Rasch analysis of The Melbourne Assessment of Unilateral Upper Limb Function. Dev Med Child Neurol. 2014 Jul;56(7):665-72. doi: 10.1111/dmcn.12391. Epub 2014 Feb 5. PMID: 24494925.

Wrong intervention (Melbourne Assessment)

Randall, M., Imms, C., & Carey, L. (2012). Further evidence of validity of the Modified Melbourne Assessment for neurologically impaired children aged 2 to 4 years. Developmental medicine and child neurology, 54(5), 424–428. https://doi.org/10.1111/j.1469-8749.2012.04252.x

Wrong intervention (Melbourne assessment)


de los Reyes-Guzmán A, Dimbwadyo-Terrer I, Trincado-Alonso F, Monasterio-Huelin F, Torricelli D, Gil-Agudo A. Quantitative assessment based on kinematic measures of functional impairments during upper extremity movements: A review. Clin Biomech (Bristol, Avon). 2014 Aug;29(7):719-27. doi: 10.1016/j.clinbiomech.2014.06.013. Epub 2014 Jun 26. PMID: 25017296.

Wrong population (review not focused on CP)

Ryll, U.C., Bastiaenen, C.H., & Eliasson, A.C. Assisting Hand Assessment and Children's Hand-Use Experience Questionnaire -Observed Versus Perceived Bimanual Performance in Children with Unilateral Cerebral Palsy. Physical & occupational therapy in pediatrics - Volume 37, Issue 2, pp. 199-209. doi: 10.1080/01942638.2016.1185498

Wrong outcomes (comparison between AHA and CHEQ, not focused on the measurement properties of the CHEQ)

Sorsdahl, A. B., Moe-Nilssen, R., & Strand, L. I. (2008). Observer reliability of the Gross Motor Performance Measure and the Quality of Upper Extremity Skills Test, based on video recordings. Developmental medicine and child neurology, 50(2), 146–151. https://doi.org/10.1111/j.1469-8749.2007.02023.x

Wrong intervention (QUEST, GMPM)

Spirtos M, O'Mahony P, Malone J. Interrater reliability of the Melbourne Assessment of Unilateral Upper Limb Function for children with hemiplegic cerebral palsy. Am J Occup Ther. 2011 Jul-Aug;65(4):378-83. doi: 10.5014/ajot.2011.001222. PMID: 21834452.

Wrong intervention (Melbourne assessment)

Tedesco AP, Nicolini-Panisson RD, de Jesus A. SHUEE on the evaluation of upper limb in cerebral palsy. Acta Ortop Bras. 2015 Jul-Aug;23(4):219-22. doi: 10.1590/1413-78522015230400967. PMID: 26327806; PMCID: PMC4544533.

Wrong intervention (SHUEE)

Thomé Teixeira da Silva, L. V., Vegas, M., Aquaroni Ricci, N., Cardoso de Sá, C. S., & Alouche, S. R. (2022). Selecting assessment tools to characterize upper limb function of children with cerebral palsy: A mega-review of systematic reviews. Developmental neurorehabilitation, 25(6), 378–391. https://doi.org/10.1080/17518423.2022.2046656

Wrong study design (review of systematic reviews)

Tofani, M., Castelli, E., Sabbadini, M., Berardi, A., Murgia, M., Servadio, A., & Galeoto, G. (2020). Examining Reliability and Validity of the Jebsen-Taylor Hand Function Test Among Children With Cerebral Palsy. Perceptual and motor skills, 127(4), 684–697. https://doi.org/10.1177/0031512520920087

Wrong intervention (JTHFT)

Wagner, L. V., & Davids, J. R. (2012). Assessment tools and classification systems used for the upper extremity in children with cerebral palsy. Clinical orthopaedics and related research, 470(5), 1257–1271. https://doi.org/10.1007/s11999-011-2065-x

Descriptive SR: individual study data cannot be retrieved, provides no absolute data about validity/reliability

Wallen, M., & Stewart, K. (2015). Upper limb function in everyday life of children with cerebral palsy: description and review of parent report measures. Disability and rehabilitation, 37(15), 1353–1361. https://doi.org/10.3109/09638288.2014.963704

Descriptive SR: individual study data cannot be retrieved, provides no absolute data about validity/reliability

Wallen M, Stewart K. Grading and Quantification of Upper Extremity Function in Children with Spasticity. Semin Plast Surg. 2016 Feb;30(1):5-13. doi: 10.1055/s-0035-1571257.

Wrong study design (not a systematic review)

Wang TN, Liang KJ, Liu YC, Shieh JY, Chen HL. Psychometric and Clinimetric Properties of the Melbourne Assessment 2 in Children With Cerebral Palsy. Arch Phys Med Rehabil. 2017 Sep;98(9):1836-1841. doi: 10.1016/j.apmr.2017.01.024. Epub 2017 Feb 28. PMID: 28254639.

Wrong intervention (Melbourne Assessment)

Autorisatiedatum en geldigheid

Laatst beoordeeld  : 09-08-2024

Laatst geautoriseerd  : 09-08-2024

Geplande herbeoordeling  : 01-05-2025

Autorisatie Nederlandse Vereniging van Revalidatieartsen onder voorbehoud van goedkeuring door de ALV.

Initiatief en autorisatie

Initiatief:
  • Nederlandse Vereniging van Revalidatieartsen
Geautoriseerd door:
  • Koninklijk Nederlands Genootschap voor Fysiotherapie
  • Nederlandse Orthopaedische Vereniging
  • Nederlandse Vereniging van Revalidatieartsen
  • Nederlandse Vereniging voor Neurochirurgie
  • Nederlandse Vereniging voor Neurologie
  • Nederlandse Vereniging voor Plastische Chirurgie
  • Ergotherapie Nederland
  • Nederlandse Vereniging voor Kinderfysiotherapie
  • CP-Net
  • CP Nederland

Algemene gegevens

De ontwikkeling/herziening van deze richtlijnmodule werd ondersteund door het Kennisinstituut van de Federatie Medisch Specialisten (www.demedischspecialist.nl/kennisinstituut) en werd gefinancierd uit de Kwaliteitsgelden Medisch Specialisten (SKMS). Patiëntenparticipatie bij deze richtlijn werd medegefinancierd uit de Kwaliteitsgelden Patiënten Consumenten (SKPC) binnen het programma KIDZ.

De financier heeft geen enkele invloed gehad op de inhoud van de richtlijnmodule.

Samenstelling werkgroep

Voor het ontwikkelen van de richtlijnmodule is in 2022 een multidisciplinaire werkgroep ingesteld, bestaande uit vertegenwoordigers en ervaringsdeskundigen van alle relevante specialismen (zie hiervoor de Samenstelling van de werkgroep) die betrokken zijn bij de zorg voor kinderen met cerebrale parese.

 

Werkgroep

  • Prof. dr. A.I. Buizer, (kinder)revalidatiearts, VRA
  • Dr. M.W. Alsem, (kinder)revalidatiearts, VRA
  • Dr. M.J. Nederhand, (kinder)revalidatiearts, VRA
  • Drs. R.A. van Stralen, orthopedisch chirurg, NOV
  • Prof. dr. R.J. Vermeulen, (kinder)neuroloog, NVN
  • Dr. K.M. Slot, (kinder)neurochirurg, NVvN
  • Dr. M.C. Obdeijn, plastisch chirurg, NVPC
  • Dr. E.A.A. Rameckers, onderzoeker en kinderfysiotherapeut, KNGF/NVFK
  • Dr. P.B.M. Aarts, onderzoeker en ergotherapeut, EN
  • Dr. M. Ketelaar, senior onderzoeker, persoonlijke titel
  • Drs. M.G. van Driel-Boerrigter, voorzitter, CP Nederland
  • Ing. E.P.E. Beije, bestuurslid en penningmeester, CP Nederland

Klankbordgroep

  • Dr. C.J.I. Raats, CP-Net
  • Dr. C.A. van Nieuwenhoven, plastisch chirurg
  • Dr. J. Verhof, plastisch chirurg en handchirurg
  • Drs. T. Tempelman, plastisch chirurg

Met ondersteuning van

  • Dr. M. den Ouden – Vierwind, adviseur, Kennisinstituut van de Federatie Medisch Specialisten
  • Drs. F. Ham, adviseur, Kennisinstituut van de Federatie Medisch Specialisten
  • Drs. L. van Wijngaarden, junior adviseur, Kennisinstituut van de Federatie Medisch Specialisten
  • Dr. L. Oostendorp, adviseur, Kennisinstituut van de Federatie Medisch Specialisten

Belangenverklaringen

De Code ter voorkoming van oneigenlijke beïnvloeding door belangenverstrengeling is gevolgd. Alle werkgroepleden hebben schriftelijk verklaard of zij in de laatste drie jaar directe financiële belangen (betrekking bij een commercieel bedrijf, persoonlijke financiële belangen, onderzoeksfinanciering) of indirecte belangen (persoonlijke relaties, reputatiemanagement) hebben gehad. Gedurende de ontwikkeling of herziening van een module worden wijzigingen in belangen aan de voorzitter doorgegeven. De belangenverklaring wordt opnieuw bevestigd tijdens de commentaarfase.

Een overzicht van de belangen van werkgroepleden en het oordeel over het omgaan met eventuele belangen vindt u in onderstaande tabel. De ondertekende belangenverklaringen zijn op te vragen bij het secretariaat van het Kennisinstituut van de Federatie Medisch Specialisten.

 

Lid

Functie

Nevenfuncties

Gemelde belangen

Ondernomen actie

Werkgroep

Prof. dr. A.I. Buizer

 

(Kinder)revalidatiearts
Amsterdam UMC

Voorzitter bestuur Dutch Academy of Childhood Disability (DACD)

* MegaMuscle: observationeel onderzoek naar effecten van interventies op spiereigenschappen bij cerebrale parese. Deels gefinancierd door onderzoeksinstituut Amsterdam Movement Sciences, deels door liefdadigheidsfondsen: Johanna Kinderfonds en Phelps Stichting voor Spastici (niet-commercieel). Betrokken als projectleider.

* Co-auteur van diverse wetenschappelijke artikelen over cerebrale parese, die mogelijk in de richtlijn zullen worden opgenomen.

* Financiering aangevraagd bij BeNeFIT voor een studie naar SDR vs ITB bij niet-ambulante kinderen met CP (nog niet toegekend).

* Betrokkenheid Power2Walk studie.

Geen restricties.

Dr. M.W. Alsem

 

(Kinder)revalidatiearts UMC Utrecht

* Co-editor tijdschrift Child: Care, Health and Development (betaald)
* Lid onderzoeksconsortium TCU: Bestendig op weg naar (t)huis (onbetaald)

* ZonMW: Bestendig op weg naar (t)huis

* Betrokken geweest bij ontwikkeling Kwaliteitsstandaard Psychosociale zorg in de kinderrevalidatie

Geen restricties.

Dr. M.J. Nederhand

 

* (Kinder)revalidatiearts 0.8: Roessingh, Centrum voor Revalidatie
* Senior onderzoeker 0.1: Roessingh Research and Development

Geen.

Geen.

Geen restricties.

Drs. R.A. van Stralen

 

Orthopedisch chirurg in het ErasmusMC.

Geen.

* Financiering toegekend van de for Wishdom foundation (restricted grant) voor 2 studies, getiteld 'Guided growth van het proximale femur bij kinderen met CP'. Betrokken als projectleider samen met Jaap Tolk.

* Toename van eigen expertise op (deel)gebied waar het advies/richtlijn zich op richt.

* Vernieuwde aanpak van eigen organisatie.

* Boegbeeldfunctie bij een patiënten- of beroepsorganisatie.

Geen restricties.

Prof. dr. R.J. Vermeulen

 

* (Kinder)neuroloog bij Academisch ziekenhuis Maastricht.
* Hoogleraar kinderneurologie, Universiteit Maastricht.

Geen.

* Revalidatie NL - Unilaterale cerebrale parese, functionele elektrische stimulatie, verbeteren van lopen. Betrokken als projectleider.

* Stichting Vooruit - Behandeling van dystonie bij kinderen. Betrokken als projectleider.

* Stichting Janivo - Behandeling van dystonie bij kinderen. Geen projectleider.

* Lid van de "general management committee" van de European Academy of Childhood Disability (EACD). Wij hebben richtlijnontwikkeling over de classificatie van visuele stoornissen gesponsord.

* Aanwezig bij gebruikersdag voor professionals die werken met baclofenpompen (2-12-2022), gefinancierd door NVN (zonder invloed van Medtronic op de presentaties); vergoeding voor tijd aan het ziekenhuis.

* Betrokkenheid Power2Walk studie.

Geen restricties.

Dr. K.M. Slot

 

(Kinder)neurochirurg Amsterdam UMC.

Geen.

* Financiering aangevraagd bij BeNeFIT voor een studie naar SDR vs ITB bij niet-ambulante kinderen met CP (nog niet toegekend).

Geen restricties.

Dr. M.C. Obdeijn

 

Plastisch chirurg in het Amsterdam UMC.

* Opleider

* Lid van het Concilium Plastico Chirurgicum (onbetaald).

* Lid van Raad Opleidingen (onbetaald).

Door deel te nemen aan de richtlijncommissie kan mijn reputatie en bekendheid als ervaren CP-chirurg toenemen.

Geen restricties.

Dr. E.A.A. Rameckers

 

Senior onderzoeker en kinderfysiotherapeut, Adelante Revalidatie; Universiteit Hasselt (25%).

* Werkzaam Universiteit Maastricht (betaald).

*  Senior onderzoeker Adelante kenniscentrum (betaald).

* Projectleider bij veel kinderrevalidatieprojecten:

Wij wheelen mee, Power2walk, klaar om te eten, Promis studie.

* Ik ben co-auteur van diverse wetenschappelijke artikelen over cerebrale parese, die mogelijk in de richtlijn zullen worden opgenomen.

Geen restricties.

Dr. P.B.M. Aarts

 

 (Pre-pensioen) als hoofd unit Kinderrevalidatie van de Sint Maartenskliniek.

* Voorzitter bestuur stichting CP net (onbetaald).

* Senior onderzoeker bij 3 nog lopende promotieonderzoeken (onbetaald).

* Scholing (betaald vanuit VOF EDUTIVEAA).

Co-auteur van diverse wetenschappelijke artikelen over arm-hand diagnostiek en behandeling bij kinderen met cerebrale parese, waarvan er mogelijk iets in de richtlijn genoemd wordt.

Geen restricties.

Dr. M. Ketelaar

 

Senior onderzoeker
Kenniscentrum Revalidatiegeneeskunde, UMC Utrecht en De Hoogstraat Revalidatie.

Bestuurslid CP-Net (onbetaald).

* Projectleider diverse studies bij kinderen en jongeren met CP.

* Co-auteur van diverse wetenschappelijke artikelen over CP, die mogelijk in de richtlijn zullen worden opgenomen (gefinancierd door ‘neutrale’ subsidiegevers, zoals ZonMw).

* Lid van Committee Education and Training van de European Academy of Childhood Disability (EACD).

Geen restricties.

Drs. M.G. van Driel-Boerrigter

* Voorzitter van CP Nederland (onbezoldigd)
* Bestuurslid CP-Net (onbezoldigd)

Geen.

Geen.

Geen restricties.

Ing. E.P.E. Beije

 

Penningmeester bij CP Nederland (onbetaald).

Geen.

* Zoon 10 jaar heeft CP.

 

Geen restricties.

Klankbordgroep

Dr. C.J.I. Raats

Projectcoördinator, Stichting CP-Net.

Zelfstandig adviseur/ trainer/ projectleider (ZZP) op het gebied van kwaliteit van zorg en patiëntgerichte zorg voor diverse opdrachtgevers, zoals zorginstellingen, patiëntenorganisaties, beroepsorganisaties, brancheorganisaties, kennisinstituten.

Stichting CP-Net houdt zich o.a. bezig met de implementatie van de richtlijn CP.

Geen restricties.

Inbreng patiëntenperspectief

Er werd aandacht besteed aan het patiëntenperspectief door een afgevaardigde van de patiëntenvereniging CP Nederland in de werkgroep uit te nodigen. De verkregen input is meegenomen bij het opstellen van de uitgangsvragen, de keuze voor de uitkomstmaten en bij het opstellen van de overwegingen (zie kop Waarden en voorkeuren van patiënten). De conceptrichtlijn is tevens voor commentaar voorgelegd aan deelnemers van de patiëntenverenigingen en de eventueel aangeleverde commentaren zijn bekeken en verwerkt.

 

Kwalitatieve raming van mogelijke financiële gevolgen in het kader van de Wkkgz

Bij de richtlijnmodule is conform de Wet kwaliteit, klachten en geschillen zorg (Wkkgz) een kwalitatieve raming uitgevoerd om te beoordelen of de aanbevelingen mogelijk leiden tot substantiële financiële gevolgen. Bij het uitvoeren van deze beoordeling is de richtlijnmodule op verschillende domeinen getoetst (zie het stroomschema op de Richtlijnendatabase).

 

Module

Uitkomst raming

Toelichting

Module Evaluatie handvaardigheid

Geen financiële gevolgen

Uit de toetsing volgt dat de aanbeveling niet breed toepasbaar is (<5.000 patiënten) en daarom naar verwachting geen substantiële financiële gevolgen zal hebben voor de collectieve uitgaven.

Werkwijze

AGREE

Deze richtlijnmodule is opgesteld conform de eisen vermeld in het rapport Medisch Specialistische Richtlijnen 2.0 van de adviescommissie Richtlijnen van de Raad Kwaliteit. Dit rapport is gebaseerd op het AGREE II instrument (Appraisal of Guidelines for Research & Evaluation II; Brouwers, 2010).

 

Knelpuntenanalyse en uitgangsvragen

Tijdens de voorbereidende fase inventariseerde de werkgroep de knelpunten in de zorg voor kinderen met cerebrale parese. De werkgroep beoordeelde de aanbevelingen uit de eerdere richtlijn (VRA, 2018) op noodzaak tot revisie aan de hand van een onderhoudsplan dat in 2021 voor deze richtlijn is opgesteld. Tevens zijn er knelpunten aangedragen door CP Nederland, CP-Net, EN, KNGF/NVFK, NOV, NVD, NVK, NVLF, NVPC, RN en VRA via een invitational conference. Een verslag hiervan is opgenomen onder aanverwante producten.

 

Op basis van de uitkomsten van de knelpuntenanalyse zijn door de werkgroep concept-uitgangsvragen opgesteld en definitief vastgesteld.

 

Uitkomstmaten

Na het opstellen van de zoekvraag behorende bij de uitgangsvraag inventariseerde de werkgroep welke uitkomstmaten voor de patiënt relevant zijn, waarbij zowel naar gewenste als ongewenste effecten werd gekeken. Hierbij werd een maximum van acht uitkomstmaten gehanteerd. De werkgroep waardeerde deze uitkomstmaten volgens hun relatieve belang bij de besluitvorming rondom aanbevelingen, als cruciaal (kritiek voor de besluitvorming), belangrijk (maar niet cruciaal) en onbelangrijk. Tevens definieerde de werkgroep tenminste voor de cruciale uitkomstmaten welke verschillen zij klinisch (patiënt) relevant vonden.

 

Methode literatuursamenvatting

Een uitgebreide beschrijving van de strategie voor zoeken en selecteren van literatuur is te vinden onder ‘Zoeken en selecteren’ onder Onderbouwing. Indien mogelijk werden de data uit verschillende studies gepoold in een random-effects model. Review Manager 5.4 werd gebruikt voor de statistische analyses. De beoordeling van de kracht van het wetenschappelijke bewijs wordt hieronder toegelicht.
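Ter illustratie (uitsluitend een schets; de daadwerkelijke pooling is in Review Manager 5.4 uitgevoerd en de getallen hieronder zijn hypothetisch): een minimale Python-weergave van random-effects pooling volgens de DerSimonian-Laird-methode, zoals gangbaar bij het poolen van effectschattingen uit verschillende studies.

```python
# Schets van random-effects pooling (DerSimonian-Laird) met hypothetische data.
import numpy as np

def pool_random_effects(effects, variances):
    """Pool effectschattingen met hun binnen-studie varianties (DerSimonian-Laird)."""
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)

    # Fixed-effect gewichten en schatting als startpunt.
    w_fixed = 1.0 / variances
    pooled_fixed = np.sum(w_fixed * effects) / np.sum(w_fixed)

    # Heterogeniteit (Q) en tussen-studie variantie (tau^2).
    q = np.sum(w_fixed * (effects - pooled_fixed) ** 2)
    df = len(effects) - 1
    c = np.sum(w_fixed) - np.sum(w_fixed ** 2) / np.sum(w_fixed)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects gewichten, gepoold effect en 95%-betrouwbaarheidsinterval.
    w_random = 1.0 / (variances + tau2)
    pooled = np.sum(w_random * effects) / np.sum(w_random)
    se = np.sqrt(1.0 / np.sum(w_random))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se

# Hypothetisch voorbeeld: drie studies met gemiddeld verschil en variantie.
print(pool_random_effects([0.4, 0.2, 0.6], [0.04, 0.09, 0.05]))
```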

 

Beoordelen van de kracht van het wetenschappelijke bewijs

De kracht van het wetenschappelijke bewijs werd bepaald volgens de GRADE-methode. GRADE staat voor ‘Grading Recommendations Assessment, Development and Evaluation’ (zie http://www.gradeworkinggroup.org/). De basisprincipes van de GRADE-methodiek zijn: het benoemen en prioriteren van de klinisch (patiënt) relevante uitkomstmaten, een systematische review per uitkomstmaat, en een beoordeling van de bewijskracht per uitkomstmaat op basis van de acht GRADE-domeinen (domeinen voor downgraden: risk of bias, inconsistentie, indirectheid, imprecisie, en publicatiebias; domeinen voor upgraden: dosis-effect relatie, groot effect, en residuele plausibele confounding).

GRADE onderscheidt vier gradaties voor de kwaliteit van het wetenschappelijk bewijs: hoog, redelijk, laag en zeer laag. Deze gradaties verwijzen naar de mate van zekerheid die er bestaat over de literatuurconclusie, in het bijzonder de mate van zekerheid dat de literatuurconclusie de aanbeveling adequaat ondersteunt (Schünemann, 2013; Hultcrantz, 2017).

 

Aangezien deze richtlijn een niet-veelvoorkomende, heterogene aandoening betreft, en tevens behandelingen beschrijft die slechts voor een selecte groep van deze populatie van toepassing zijn, hebben de geïncludeerde studies vaak onvoldoende power. Bij de beoordeling van de bewijskracht is er daarom in veel gevallen gedowngraded voor imprecisie (tot GRADE low of very low). Hiermee kunnen literatuurconclusies geen duidelijke richting geven aan de besluitvorming. In het kader van passend bewijs is de werkgroep van mening dat GRADE low in veel gevallen de sterkst haalbare bewijskracht binnen deze richtlijn is.

 

GRADE

Definitie

Hoog

  • er is hoge zekerheid dat het ware effect van behandeling dichtbij het geschatte effect van behandeling ligt;
  • het is zeer onwaarschijnlijk dat de literatuurconclusie klinisch relevant verandert wanneer er resultaten van nieuw grootschalig onderzoek aan de literatuuranalyse worden toegevoegd.

Redelijk

  • er is redelijke zekerheid dat het ware effect van behandeling dichtbij het geschatte effect van behandeling ligt;
  • het is mogelijk dat de conclusie klinisch relevant verandert wanneer er resultaten van nieuw grootschalig onderzoek aan de literatuuranalyse worden toegevoegd.

Laag

  • er is lage zekerheid dat het ware effect van behandeling dichtbij het geschatte effect van behandeling ligt;
  • er is een reële kans dat de conclusie klinisch relevant verandert wanneer er resultaten van nieuw grootschalig onderzoek aan de literatuuranalyse worden toegevoegd.

Zeer laag

  • er is zeer lage zekerheid dat het ware effect van behandeling dichtbij het geschatte effect van behandeling ligt;
  • de literatuurconclusie is zeer onzeker.

 

Bij het beoordelen (graderen) van de kracht van het wetenschappelijk bewijs in richtlijnen volgens de GRADE-methodiek spelen grenzen voor klinische besluitvorming een belangrijke rol (Hultcrantz, 2017). Dit zijn de grenzen die bij overschrijding aanleiding zouden geven tot een aanpassing van de aanbeveling. Om de grenzen voor klinische besluitvorming te bepalen moeten alle relevante uitkomstmaten en overwegingen worden meegewogen. De grenzen voor klinische besluitvorming zijn daarmee niet één op één vergelijkbaar met het minimaal klinisch relevant verschil (Minimal Clinically Important Difference, MCID). Met name in situaties waarin een interventie geen belangrijke nadelen heeft en de kosten relatief laag zijn, kan de grens voor klinische besluitvorming met betrekking tot de effectiviteit van de interventie bij een lagere waarde (dichter bij het nuleffect) liggen dan de MCID (Hultcrantz, 2017).

 

Overwegingen (van bewijs naar aanbeveling)

Om te komen tot een aanbeveling zijn naast (de kwaliteit van) het wetenschappelijke bewijs ook andere aspecten belangrijk en worden meegewogen, zoals aanvullende argumenten uit bijvoorbeeld de biomechanica of fysiologie, waarden en voorkeuren van patiënten, kosten (middelenbeslag), aanvaardbaarheid, haalbaarheid en implementatie. Deze aspecten zijn systematisch vermeld en beoordeeld (gewogen) onder het kopje ‘Overwegingen’ en kunnen (mede) gebaseerd zijn op expert opinion. Hierbij is gebruik gemaakt van een gestructureerd format gebaseerd op het evidence-to-decision framework van de internationale GRADE Working Group (Alonso-Coello, 2016a; Alonso-Coello 2016b). Dit evidence-to-decision framework is een integraal onderdeel van de GRADE methodiek.

 

Formuleren van aanbevelingen

De aanbevelingen geven antwoord op de uitgangsvraag en zijn gebaseerd op het beschikbare wetenschappelijke bewijs en de belangrijkste overwegingen, en een weging van de gunstige en ongunstige effecten van de relevante interventies. De kracht van het wetenschappelijk bewijs en het gewicht dat door de werkgroep wordt toegekend aan de overwegingen, bepalen samen de sterkte van de aanbeveling. Conform de GRADE-methodiek sluit een lage bewijskracht van conclusies in de systematische literatuuranalyse een sterke aanbeveling niet a priori uit, en zijn bij een hoge bewijskracht ook zwakke aanbevelingen mogelijk (Agoritsas, 2017; Neumann, 2016). De sterkte van de aanbeveling wordt altijd bepaald door weging van alle relevante argumenten tezamen. De werkgroep heeft bij elke aanbeveling opgenomen hoe zij tot de richting en sterkte van de aanbeveling zijn gekomen.

In de GRADE-methodiek wordt onderscheid gemaakt tussen sterke en zwakke (of conditionele) aanbevelingen. De sterkte van een aanbeveling verwijst naar de mate van zekerheid dat de voordelen van de interventie opwegen tegen de nadelen (of vice versa), gezien over het hele spectrum van patiënten waarvoor de aanbeveling is bedoeld. De sterkte van een aanbeveling heeft duidelijke implicaties voor patiënten, behandelaars en beleidsmakers (zie onderstaande tabel). Een aanbeveling is geen dictaat, zelfs een sterke aanbeveling gebaseerd op bewijs van hoge kwaliteit (GRADE gradering HOOG) zal niet altijd van toepassing zijn, onder alle mogelijke omstandigheden en voor elke individuele patiënt.

 

Implicaties van sterke en zwakke aanbevelingen voor verschillende richtlijngebruikers

 

 

Sterke aanbeveling

Zwakke (conditionele) aanbeveling

Voor patiënten

De meeste patiënten zouden de aanbevolen interventie of aanpak kiezen en slechts een klein aantal niet.

Een aanzienlijk deel van de patiënten zou de aanbevolen interventie of aanpak kiezen, maar veel patiënten ook niet.

Voor behandelaars

De meeste patiënten zouden de aanbevolen interventie of aanpak moeten ontvangen.

Er zijn meerdere geschikte interventies of aanpakken. De patiënt moet worden ondersteund bij de keuze voor de interventie of aanpak die het beste aansluit bij zijn of haar waarden en voorkeuren.

Voor beleidsmakers

De aanbevolen interventie of aanpak kan worden gezien als standaardbeleid.

Beleidsbepaling vereist uitvoerige discussie met betrokkenheid van veel stakeholders. Er is een grotere kans op lokale beleidsverschillen. 

 

Organisatie van zorg

In de knelpuntenanalyse en bij de ontwikkeling van de richtlijnmodule is expliciet aandacht geweest voor de organisatie van zorg: alle aspecten die randvoorwaardelijk zijn voor het verlenen van zorg (zoals coördinatie, communicatie, (financiële) middelen, mankracht en infrastructuur). Randvoorwaarden die relevant zijn voor het beantwoorden van deze specifieke uitgangsvraag zijn genoemd bij de overwegingen. Meer algemene, overkoepelende, of bijkomende aspecten van de organisatie van zorg worden behandeld in de module Organisatie van zorg.

 

Commentaar- en autorisatiefase

De conceptrichtlijnmodule werd aan de betrokken (wetenschappelijke) verenigingen en (patiënt) organisaties voorgelegd ter commentaar. De commentaren werden verzameld en besproken met de werkgroep. Naar aanleiding van de commentaren werd de conceptrichtlijnmodule aangepast en definitief vastgesteld door de werkgroep. De definitieve richtlijnmodule werd aan de deelnemende (wetenschappelijke) verenigingen en (patiënt) organisaties voorgelegd voor autorisatie en door hen geautoriseerd dan wel geaccordeerd.

 

Literatuur

Agoritsas T, Merglen A, Heen AF, Kristiansen A, Neumann I, Brito JP, Brignardello-Petersen R, Alexander PE, Rind DM, Vandvik PO, Guyatt GH. UpToDate adherence to GRADE criteria for strong recommendations: an analytical survey. BMJ Open. 2017 Nov 16;7(11):e018593. doi: 10.1136/bmjopen-2017-018593. PubMed PMID: 29150475; PubMed Central PMCID: PMC5701989.

 

Alonso-Coello P, Schünemann HJ, Moberg J, Brignardello-Petersen R, Akl EA, Davoli M, Treweek S, Mustafa RA, Rada G, Rosenbaum S, Morelli A, Guyatt GH, Oxman AD; GRADE Working Group. GRADE Evidence to Decision (EtD) frameworks: a systematic and transparent approach to making well informed healthcare choices. 1: Introduction. BMJ. 2016 Jun 28;353:i2016. doi: 10.1136/bmj.i2016. PubMed PMID: 27353417.

 

Alonso-Coello P, Oxman AD, Moberg J, Brignardello-Petersen R, Akl EA, Davoli M, Treweek S, Mustafa RA, Vandvik PO, Meerpohl J, Guyatt GH, Schünemann HJ; GRADE Working Group. GRADE Evidence to Decision (EtD) frameworks: a systematic and transparent approach to making well informed healthcare choices. 2: Clinical practice guidelines. BMJ. 2016 Jun 30;353:i2089. doi: 10.1136/bmj.i2089. PubMed PMID: 27365494.

 

Brouwers MC, Kho ME, Browman GP, Burgers JS, Cluzeau F, Feder G, Fervers B, Graham ID, Grimshaw J, Hanna SE, Littlejohns P, Makarski J, Zitzelsberger L; AGREE Next Steps Consortium. AGREE II: advancing guideline development, reporting and evaluation in health care. CMAJ. 2010 Dec 14;182(18):E839-42. doi: 10.1503/cmaj.090449. Epub 2010 Jul 5. Review. PubMed PMID: 20603348; PubMed Central PMCID: PMC3001530.

 

Hultcrantz M, Rind D, Akl EA, Treweek S, Mustafa RA, Iorio A, Alper BS, Meerpohl JJ, Murad MH, Ansari MT, Katikireddi SV, Östlund P, Tranæus S, Christensen R, Gartlehner G, Brozek J, Izcovich A, Schünemann H, Guyatt G. The GRADE Working Group clarifies the construct of certainty of evidence. J Clin Epidemiol. 2017 Jul;87:4-13. doi: 10.1016/j.jclinepi.2017.05.006. Epub 2017 May 18. PubMed PMID: 28529184; PubMed Central PMCID: PMC6542664.

 

Medisch Specialistische Richtlijnen 2.0 (2012). Adviescommissie Richtlijnen van de Raad Kwaliteit. http://richtlijnendatabase.nl/over_deze_site/over_richtlijnontwikkeling.html

 

Neumann I, Santesso N, Akl EA, Rind DM, Vandvik PO, Alonso-Coello P, Agoritsas T, Mustafa RA, Alexander PE, Schünemann H, Guyatt GH. A guide for health professionals to interpret and use recommendations in guidelines developed with the GRADE approach. J Clin Epidemiol. 2016 Apr;72:45-55. doi: 10.1016/j.jclinepi.2015.11.017. Epub 2016 Jan 6. Review. PubMed PMID: 26772609.

 

Schünemann H, Brożek J, Guyatt G, et al. GRADE handbook for grading quality of evidence and strength of recommendations. Updated October 2013. The GRADE Working Group, 2013. Available from http://gdt.guidelinedevelopment.org/central_prod/_design/client/handbook/handbook.html.

Zoekverantwoording

Zoekacties zijn opvraagbaar. Neem hiervoor contact op met de Richtlijnendatabase.

Volgende:
Behandeling CP gericht op mobiliteit en verzorging