Clinimetrics after amputation

Publication date: 25-11-2020

Assessed for validity: 19-11-2020

Clinical question

Which clinical measurement instruments are suitable for evaluating walking ability and mobility?

Recommendation

Recommendation-1

Use the L-test, the six-minute walk test (6MWT), the Amputee Mobility Predictor with Prosthesis (AMPPRO) and the SIGAM/WAP as a set to evaluate walking ability and mobility in patients after a lower limb amputation.

Recommendation-2

Consider administering the Plus-M or the walking questionnaire in addition to the capacity tests, to also gain insight into self-reported walking capacity/mobility.

Considerations

Benefits and harms of the intervention and the quality of the evidence

The literature summary covers measurement instruments that are used both in the Netherlands and internationally to measure walking ability and/or mobility. For the selected instruments, the validity, reliability, measurement error and responsiveness were examined, and the strength of the evidence for these measurement properties was graded. Notably, no responsiveness data are available for any of the instruments. In addition, no conclusions could be drawn about the validity, reliability, measurement error and responsiveness of the pre-selected questionnaires, because no validated Dutch version of these questionnaires proved to be available. This is a clear knowledge gap.
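To make the link between reliability and measurement error concrete, the following minimal sketch computes a standard error of measurement (SEM) and a minimal detectable change (MDC95) from a reliability coefficient. It assumes the commonly used formulas SEM = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEM; the guideline does not state which formulas the included studies used. The function names and the input values are hypothetical and are not taken from any of the included studies.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM = SD * sqrt(1 - ICC), with ICC a reliability coefficient in [0, 1]."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem: float, z: float = 1.96) -> float:
    """MDC = z * sqrt(2) * SEM; z = 1.96 corresponds to 95% confidence."""
    return z * math.sqrt(2.0) * sem


# Hypothetical illustration values (assumptions, not study results).
sd_distance_m = 100.0  # between-patient SD of 6MWT distance, in metres
icc = 0.96             # test-retest reliability coefficient

sem = standard_error_of_measurement(sd_distance_m, icc)
mdc95 = minimal_detectable_change(sem)
print(f"SEM = {sem:.1f} m, MDC95 = {mdc95:.1f} m")
```

A change in an individual patient smaller than the MDC95 cannot be distinguished from measurement noise. Whether a change larger than the MDC95 matters to the patient is a separate question, which is why the measurement error remains hard to interpret while the minimal clinically important difference is unknown.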

Although the selected instruments are all used to determine the patient's walking ability/mobility, the tests appear to measure different aspects of these constructs. To arrive at a minimal set of instruments, the working group therefore chose to cluster the instruments examined by the aspect of mobility/walking ability they measure, and to select one instrument per aspect. The choices made are explained below.

The Berg Balance Scale (BBS), Four-square step test (FSST), Timed get up and go (TUG) test and the L-test can be classified as instruments that evaluate static and dynamic balance. Adequate balance is a precondition for walking ability and mobility. Based on the results of the included studies and the strength of the evidence for validity, reliability and measurement error, the L-test is preferred over the BBS, FSST and TUG test.

The ten-metre walk test is widely used to determine comfortable walking speed. However, no studies of the measurement properties of this test were found.

The two-minute walk test (2MWT) and the six-minute walk test (6MWT) are commonly used to evaluate the walking speed and endurance of patients with a leg prosthesis. Based on the literature summary, the 6MWT (moderate GRADE for reliability and for the reported measurement error) is preferred over the 2MWT (low strength of evidence for reliability and very low GRADE for measurement error).

The Amputee Mobility Predictor with Prosthesis (AMPPRO) can be regarded as a test battery that quantifies, in a single score, the capacity of the patient with a prosthesis for various aspects of mobility and walking ability. The construct validity of this test is probably good, and the test is probably also reliable.

The SIGAM/WAP is regarded as a relatively easy-to-use instrument for classifying the patient's general walking ability. The literature summary shows that no conclusions can be drawn about the validity, measurement error and responsiveness of this instrument. It is also unclear whether the test is reliable. However, its use is fairly well established in the Netherlands, the test is currently also used as a performance indicator, and no alternative is available at present. Despite the limited evidence, the working group therefore recommends using the SIGAM/WAP.

Questionnaires (such as the Locomotor Capability Index (LCI), Prosthetic Evaluation Questionnaire (PEQ), walking questionnaire or Prosthetic limb users survey of Mobility (Plus-M)) capture comparatively more of the patient's perspective. They focus more on contextual dimensions of walking ability: 'how satisfied are you with activity X with your prosthesis' (freely translated item from one of the questionnaires mentioned). However, no validated Dutch versions are available for the Plus-M, the LCI and the PEQ, so the literature does not allow a preference for any one of these questionnaires to be stated. Of these three, the Plus-M is probably the most widely used internationally. The walking questionnaire is currently used only to a limited extent internationally and in research settings, and was therefore not included in the literature summary. This questionnaire has, however, been validated in Dutch (De Laat, 2012).

Based on the above, the working group proposes administering the L-test, the 6MWT, the AMPPRO and the SIGAM/WAP as a set when there is a need to map the patient's status or to quantify the effect of an intervention on walking ability and/or mobility. To gain more insight into self-reported walking capacity, administering the Plus-M or the walking questionnaire alongside these capacity tests can be considered.

Values and preferences of patients (and, where applicable, their caregivers)

Evaluating walking ability and mobility during treatment gives the patient and the clinician insight into the patient's progress during rehabilitation. The outcomes of the instruments can thus serve a therapeutic purpose: by setting end goals, the patient can work towards improving walking ability and mobility relative to a starting point. In addition, the clinician can use the test results to fine-tune interventions so that they match the patient's needs even better. Because the burden of the selected tests is relatively low, most patients are expected to be positive about their use to evaluate walking ability and/or mobility. This can be reinforced by feeding the outcomes of the instruments back to the patient.

Costs (resource use)

With regard to costs, no notable advantages or objections are seen that affect decision-making. In principle, the selected combination of tests can be administered within one treatment session. Purchasing validated questionnaires may involve costs. However, these costs are very low compared with the other facilities needed to treat patients with an amputation (see the treatment framework Beenamputatie (VRA, 2019)) and with the staff costs of clinicians.

Acceptability to other relevant stakeholders

The field has long wanted to standardise the use of measurement instruments (see also the report of the invitational conference). Administering the L-test, the 6MWT, the AMPPRO and the SIGAM/WAP requires a limited time investment, and the costs of administration are limited. In addition, the SIGAM/WAP is already used as a performance indicator. The working group therefore expects no notable objections from the field on this aspect that would affect decision-making on the choice of instruments.

Feasibility and implementation

The selected capacity tests are already used in the Netherlands. For some treatment teams, however, conforming to the proposal will mean switching instruments. This is not expected to raise major objections.

Rationale / balance between the arguments for and against the intervention

Recommendation-1

Based on the outcomes of the literature summary and the fact that the SIGAM/WAP is already well established, the working group recommends using the L-test, the 6MWT, the AMPPRO and the SIGAM/WAP as a set to evaluate walking ability and mobility.

Recommendation-2

The L-test, 6MWT, AMPPRO and SIGAM/WAP can determine what the patient is able to do, but these measures give little insight into walking capacity/mobility as experienced by the patient. It may therefore be useful to supplement the set with a questionnaire. Unfortunately, the literature summary does not allow one specific questionnaire to be singled out. Because the Plus-M is often used internationally, this questionnaire would be an obvious choice. Another option is the walking questionnaire, which was developed in the Netherlands but is currently still used only to a limited extent.

Evidence base

Objectifying walking function and mobility is important for evaluating and documenting the status of the patient after a lower limb amputation over time, and for evaluating the effect of a treatment or intervention. Within (geriatric) rehabilitation, various measurement instruments are used to assess mobility and walking ability in these patients. These can be questionnaires that map self-reported walking capacity/mobility, and capacity tests (tests that measure what patients are able to do in terms of walking ability/mobility). There is no consensus on which of the questionnaires and/or capacity tests are preferred. However, the field has long wanted to standardise the use of measurement instruments. The aim of this module is therefore to arrive at a core set of measurement instruments for people with a lower limb amputation. In working out this clinical question, the working group used the following definitions: walking ability concerns the quality and quantity of the patient's walking with a prosthesis; mobility concerns the patient's options for getting around with or without a prosthesis. The definitions show that there is some overlap between walking ability and mobility.

Special Interest Group Amputation/Werkgroep Amputatie en Prothesiologie (SIGAM/WAP) for walking ability

GRADE

It is not possible to draw a conclusion about the validity of the SIGAM/WAP in patients after a lower limb amputation. No studies have examined this measurement property.

Low

GRADE

It is unclear whether the SIGAM/WAP could be a reliable measurement instrument in patients after a lower limb amputation.

Sources: (De Laat, 2019; Rommers, 2008)

GRADE

For ordinal outcomes there are no commonly used parameters to quantify measurement error. The measurement error is therefore unknown.

GRADE

It is not possible to draw a conclusion about the responsiveness of the SIGAM/WAP in patients after a lower limb amputation. No studies have examined this measurement property.

Amputee Mobility Predictor with a Prosthesis (AMPPRO) for walking ability

Moderate

GRADE

The (construct) validity of the AMPPRO is probably good in patients after a lower limb amputation.

Source: (Gailey, 2002)

Moderate

GRADE

The AMPPRO is probably a reliable measurement instrument in patients after a lower limb amputation.

Sources: (Gailey, 2002; Resnik, 2011)

Very low

GRADE

We are uncertain about the reported measurement error of the AMPPRO in patients after a lower limb amputation. The measurement error is also difficult to interpret because the minimal clinically important difference for patients is unknown.

Source: (Resnik, 2011)

GRADE

It is not possible to draw a conclusion about the responsiveness of the AMPPRO in patients after a lower limb amputation. No studies have examined this measurement property.

Six-minute walk test (6MWT) for walking ability

Very low

GRADE

It is unclear whether the 6MWT is a valid measurement instrument in patients after a lower limb amputation.

Source: (Lin, 2008)

Moderate

GRADE

The 6MWT is probably a reliable measurement instrument in patients after a lower limb amputation.

Sources: (Cox, 2017; Lin, 2008; Resnik, 2011)

Moderate

GRADE

We have moderate confidence in the reported measurement error of the 6MWT in patients after a lower limb amputation, but it is difficult to interpret because the minimal clinically important difference for patients is unknown.

Sources: (Cox, 2017; Lin, 2008; Resnik, 2011)

GRADE

It is not possible to draw a conclusion about the responsiveness of the 6MWT in patients after a lower limb amputation. No studies have examined this measurement property.

Two-minute walk test (2MWT) for walking ability

Very low

GRADE

It is unclear whether the 2MWT is a valid measurement instrument in patients after a lower limb amputation.

Source: (Brooks, 2001)

Low

GRADE

The 2MWT may be a reliable measurement instrument in patients after a lower limb amputation.

Sources: (Brooks, 2002; Resnik, 2011)

Very low

GRADE

It is unclear whether we can rely on the reported measurement error of the 2MWT in patients after a lower limb amputation. The measurement error is also difficult to interpret because the minimal clinically important difference for patients is unknown.

Source: (Resnik, 2011)

GRADE

It is not possible to draw a conclusion about the responsiveness of the 2MWT in patients after a lower limb amputation. No studies have examined this measurement property.

Ten-metre walk test for walking ability

GRADE

It is not possible to draw a conclusion about the validity, reliability, measurement error and responsiveness of the ten-metre walk test in patients after a lower limb amputation. No studies have examined these measurement properties.

Timed get up and go (TUG) test for walking ability

Low

GRADE

The TUG test may be a valid instrument in patients after a lower limb amputation.

Sources: (Clemens, 2018; Schoppen, 1999)

Moderate

GRADE

The TUG test is probably a reliable measurement instrument in patients after a lower limb amputation.

Sources: (Clemens, 2018; Resnik, 2011; Schoppen, 1999)

Moderate

GRADE

We have moderate confidence in the reported measurement error of the TUG test in patients after a lower limb amputation, but it is difficult to interpret because the minimal clinically important difference for patients is unknown.

Sources: (Clemens, 2018; Resnik, 2011)

GRADE

It is not possible to draw a conclusion about the responsiveness of the TUG test in patients after a lower limb amputation. No studies have examined this measurement property.

L-test for walking ability

Moderate

GRADE

The validity of the L-test is probably good in patients after a lower limb amputation.

Source: (Deathe, 2005)

Low

GRADE

The L-test may be a reliable measurement instrument in patients after a lower limb amputation.

Sources: (Deathe, 2005; Hunter, 2018)

Low

GRADE

We have low confidence in the reported measurement error of the L-test in patients after a lower limb amputation, and it is difficult to interpret because the minimal clinically important difference for patients is unknown.

Sources: (Deathe, 2005; Hunter, 2018)

GRADE

It is not possible to draw a conclusion about the responsiveness of the L-test in patients after a lower limb amputation. No studies have examined this measurement property.

Four-square step test (FSST) for walking ability

GRADE

It is not possible to draw a conclusion about the validity of the FSST in patients after a lower limb amputation. No studies have examined this measurement property.

Very low

GRADE

It is unclear whether the FSST is a reliable measurement instrument in patients after a lower limb amputation.

Source: (Cardoso, 2019)

Very low

GRADE

It is unclear whether we can rely on the reported measurement error of the FSST in patients after a lower limb amputation. The measurement error is also difficult to interpret because the minimal clinically important difference for patients is unknown.

Source: (Cardoso, 2019)

GRADE

It is not possible to draw a conclusion about the responsiveness of the FSST in patients after a lower limb amputation. No studies have examined this measurement property.

Respiratory gas analysis for determining energy expenditure during walking

GRADE

It is not possible to draw a conclusion about the validity, reliability, measurement error and responsiveness of respiratory gas analysis in patients after a lower limb amputation. No studies have examined these measurement properties.

Berg Balance Scale (BBS) for mobility

Very low

GRADE

It is unclear whether the BBS is a valid measurement instrument in patients after a lower limb amputation.

Source: (Major, 2013)

Very low

GRADE

It is unclear whether the BBS is an internally consistent measurement instrument in patients after a lower limb amputation.

Source: (Major, 2013)

Low

GRADE

The BBS may be a reliable measurement instrument in patients after a lower limb amputation when the total score of the instrument is used.

Sources: (Major, 2013; Wong, 2014)

GRADE

It is not possible to draw a conclusion about the measurement error of the BBS in patients after a lower limb amputation. No studies have examined this measurement property.

GRADE

It is not possible to draw a conclusion about the responsiveness of the BBS in patients after a lower limb amputation. No studies have examined this measurement property.

Locomotor Capability Index (LCI) for mobility

GRADE

It is not possible to draw a conclusion about the validity, reliability, measurement error and responsiveness of a Dutch translation of the LCI in patients after a lower limb amputation.

Prosthetic limb users survey of Mobility (Plus-M) for mobility

GRADE

It is not possible to draw a conclusion about the validity, reliability, measurement error and responsiveness of a Dutch translation of the Plus-M in patients after a lower limb amputation.

Prosthetic Profile of the Amputee (PPA) for mobility

GRADE

It is not possible to draw a conclusion about the validity, reliability, measurement error and responsiveness of a Dutch translation of the PPA in patients after a lower limb amputation.

Prosthetic Evaluation Questionnaire (PEQ) for mobility

GRADE

It is not possible to draw a conclusion about the validity, reliability, measurement error and responsiveness of a Dutch translation of the PEQ in patients after a lower limb amputation.

Description of studies: Walking ability

SIGAM/WAP

Rommers (2008) translated the SIGAM into Dutch (SIGAM/WAP) and examined inter-rater reliability in two ways. Twenty participants were tested by two raters, and two patient case descriptions were assessed by 120 raters. The 20 participants were not explicitly described, except that the group included participants with a transtibial amputation (TTA), knee disarticulation (KE) and transfemoral amputation (TFA). Several age groups were also represented among the 20 participants.

De Laat (2019) examined the intra-rater (test-retest) reliability of the SIGAM/WAP in 80 participants. Test-retest reliability was calculated for both stable and non-stable participants. Participants had a mean age of 61 years (SD: 15), and the amputations had either a vascular (n=61) or a non-vascular (n=19) aetiology. Of the 80 participants, 70 had a unilateral amputation and 10 had a bilateral amputation. The amputation level was TFA or KE (n=26), or TTA or Syme (n=44). Test-retest reliability was examined at the end of the inpatient rehabilitation stay, with the retest after three weeks by the same rater. Participants were asked to indicate whether their walking ability had changed over the three weeks.

AMPPRO

Gailey (2002) examined the construct validity (in n=167 participants) and the inter- and intra-rater reliability (in n=24 participants) of the AMPPRO. Participants with a bilateral amputation were excluded from the construct validity study but were allowed to take part in the reliability study. The validity study included 86 men and 81 women. For participants in the reliability study, the amputation level (unilateral TTA (n=10), unilateral TFA (n=8), bilateral (n=6)), sex (10 men, 14 women) and mean age (68.3 years (SD: 17.98)) were described. Gailey (2002) examined construct validity by positing that the AMPPRO should be able to discriminate between the different K-levels. Correlations with the Amputee Activity Survey and the six-minute walk test were also examined. Participants completed the Amputee Activity Survey and were classified into an MFCL level. The AMPPRO was then administered at a self-selected speed, followed by a six-minute walk test. On both the first and the second test day (within 21 days of the first test day), the AMPPRO was scored by two raters.

Resnik (2011) recruited 44 participants with a unilateral lower limb amputation (TFA n=23, KE n=2, TTA n=19) with a mean age of 66 (SD: 13) years. Of the 44 participants, 42 were male and two were female. The intra-rater reliability and the measurement error of the AMP were examined. Although it was not made explicit that the AMPPRO was administered, the procedures described make it highly likely that the AMPPRO was administered to participants wearing their prosthesis. Participants were retested within a week of the first administration. A number of other capacity tests were also administered (including the six-minute walk test and AMP). It was not clear how much (rest) time there was between these capacity tests on the measurement days, but together the capacity tests took approximately 30 to 45 minutes.

Six-minute walk test (6MWT)

Lin (2008) examined the construct validity, test-retest reliability and measurement error of the 6MWT. For the 6MWT, participants were instructed to walk back and forth in a corridor over a distance of 45.72 metres (150 feet). The tests administered for construct validity were the 'Timed Up and Go' (TUG) test and the 'Single leg balance' test. These tests were administered two weeks after the 6MWT. The second and third administrations of the 6MWT (retest) took place on the same day, with 20 minutes of rest between tests. Participants had a mean age of 46 (SD: 14.8) years. Nine men and four women took part, with a mean prosthesis experience of 7.61 (SD: 9.25) years. Nine participants had a traumatic cause of amputation and four had a vascular cause.

Resnik (2011) recruited 44 participants with a unilateral lower limb amputation (TFA n=23, KE n=2, TTA n=19) with a mean age of 66 (SD: 13) years. Of the 44 participants, 42 were male and two were female. The intra-rater reliability and the measurement error of the 6MWT were examined. The 6MWT was administered in a corridor or open space longer than 30.5 metres. It can be assumed that participants were asked to walk back and forth over a distance of approximately 30 metres, although this was not stated explicitly. Participants were retested within a week of the first administration. A number of other capacity tests were also administered. It was not clear how much (rest) time there was between these capacity tests on the measurement days, but together the capacity tests took approximately 30 to 45 minutes.

Cox (2017) examined the intra-rater reliability and measurement error of the 6MWT in people with a lower limb amputation (presumably with a uni- or bilateral TTA or a unilateral TFA). Two configurations of the 6MWT were examined (back and forth over a distance of 20 metres, and rectangular in a six-by-four-metre layout). In the last three days of admission, participants were asked to walk both configurations on two different days. There were 30 minutes of rest between tests on the same day. Eighteen men and seven women (mean age: 63.1 years (SD: 13.8)) took part in the study. The cause of amputation was diabetes (n=9), vascular (n=7), traumatic (n=4) or other (n=5). Participants had a mean rehabilitation time of 32.3 (SD: 11.1) days.

Two-minute walk test (2MWT)

Brooks (2001) examined the construct validity of the 2MWT against the Houghton scale and the "physical functioning" subscale of the SF-36. The 2MWT and the SF-36 were first administered within 48 hours before discharge and again three months after discharge. The Houghton scale was administered only within 48 hours before discharge. Participants (n=290) had a unilateral TTA (n=179), a unilateral TFA (n=60) or a bilateral (n=51) amputation. The mean age was 66.3 (SD: 13.1) years.

Brooks (2002) examined the inter- and intra-rater reliability of the 2MWT in participants with an amputation of one or both lower extremities. In total, 33 participants took part (mean age: 63.6 (SD: 2), days since amputation: 107.8 (SD: 16.1)). Of these, 23 were men and 10 were women. Two raters each scored a test on two consecutive days, so participants underwent four tests in total. There were 30 minutes of rest between the two tests on the same day.

Resnik (2011) recruited 44 participants with a unilateral lower limb amputation (TFA n=23, KE n=2, TTA n=19) with a mean age of 66 (SD: 13) years. Of the 44 participants, 42 were male and two were female. The intra-rater reliability and the measurement error of the two-minute walk test were examined. Participants were retested within a week of the first administration. A number of other capacity tests were also administered (including the Timed Up and Go test and AMP). It was not clear how much (rest) time there was between these capacity tests on the measurement days, but together the capacity tests took approximately 30 to 45 minutes. The 2MWT was administered as part of the 6MWT: the 6MWT was performed, and the distance covered after two minutes was recorded as the 2MWT result.

Ten-metre walk test

No studies could be included that examined (any of) the measurement properties of the ten-metre walk test.

Timed get up and go (TUG) test

Schoppen (1999) examined construct validity and inter- and intra-rater reliability in 32 participants with a unilateral amputation of vascular cause. Twenty-three men and nine women were recruited, with a mean age of 73.5 years (range: 61 to 86) for participants with a TTA and 72.4 years (range: 68 to 81) for participants with a TFA. Construct validity was examined against the Groningen Activity Restriction Scale and the "physical mobility" subscore of the Sickness Impact Profile-68. Inter-rater reliability was examined by administering two tests to the participants on one day, each scored by a different rater. There were five to 10 minutes of rest between the tests. Intra-rater reliability was assessed with a retest two weeks after the first test. The mean TUG time over the two tests (administered by two different raters) was used for the correlation with the Groningen Activity Restriction Scale. It is not entirely clear whether the mean of these two TUG tests was also used for the correlation between the TUG test and the physical mobility scale of the Sickness Impact Profile-68.

Resnik (2011) recruited 44 participants with a unilateral lower limb amputation (TFA n=23, KE n=2, TTA n=19) with a mean age of 66 (SD: 13) years. Of the 44 participants, 42 were male and two were female. The intra-rater reliability and the measurement error of the TUG test were examined. From the description of the procedures, it appears that participants performed the TUG test once at each measurement occasion. Participants were retested within a week of the first administration. A number of other capacity tests were also administered (6MWT, AMP). It was not clear how much (rest) time there was between these capacity tests on the measurement days, but together the capacity tests took approximately 30 to 45 minutes.

Clemens (2018) examined construct validity, test-retest reliability and measurement error in participants with a unilateral amputation of non-vascular cause. Participants (n=118 for validity, n=51 for reliability) had a mean age of 48.1 (SD: 13.7) years. There were 64 men and 54 women with a TTA (n=55) or a TFA (n=63). Construct validity was assessed against the 'Prosthetic Limb Users Survey of Mobility' (Plus-M) and the 'Activities Specific Balance Scale'. The TUG test was also expected to distinguish between participants with a TTA and those with a TFA. The TUG test was administered four times on one day, with one minute of rest between tests. Participants walked twice clockwise and twice anticlockwise. The third and fourth administrations were used to calculate test-retest reliability. The total time was used for the correlation with the Plus-M, but the procedure description did not make clear how this total time was calculated. For known-groups validity (as part of construct validity), the slowest test administration was used.

L-test

Deathe (2005) examined construct validity, inter- and intra-rater reliability and measurement error in participants (n=93) with a unilateral amputation of traumatic (n=56) or vascular (n=37) cause. The mean age of the participants was 66.9 (SD: 14.2) years. Seventy-three men and 20 women were recruited, who had a TTA (n=69) or a TFA (n=24). Construct validity was examined against the ten-metre walk test, the 2MWT, the Activities-specific Balance Confidence scale, the Frenchay Activities Index and the Prosthetic Evaluation Questionnaire-Mobility Subscale (PEQ Mobility Subscale). The L-test was expected to discriminate by TTA/TFA, vascular/non-vascular cause, age, and use/non-use of a walking aid. Participants performed the walk tests and completed the questionnaires. The L-test was performed at two time points on the first day and scored by two different raters (inter-rater reliability). The retest took place two weeks later and was scored by the first rater (intra-rater reliability). Of the 93 participants, 75 did not show up for the retest.

Hunter (2018) examined intra-rater reliability in three patient groups: participants with a TTA of vascular cause (n=20, mean age: 60.36 (SD: 7.84), number of men: 18, months since amputation: 3.49 (SD: 3.61)), with a TTA of non-vascular cause (n=20, mean age: 55.85 (SD: 14.08), number of men: 17, months since amputation: 20.43 (SD: 17.61)), and with a bilateral amputation or TFA (n=20, mean age: 58.21 (SD: 14.88), number of men: 13, months since amputation: 15.55 (SD: 15.43)). One rater scored all tests. There were 14 days between test and retest.

Four-square step test (FSST)

Cardoso (2019) examined test-retest reliability in participants with a unilateral amputation (n=27). The mean age was 51 (SD: 21.2) years, and the cause of amputation was traumatic (n=16), dysvascular (n=3), oncological (n=2), infectious (n=3), congenital (n=1) or other (n=3). The amputation level was TFA (n=5), TTA (n=20), KE (n=1) or hip disarticulation (n=1). Participants performed the FSST twice (with rest between tests if needed) and were asked to return two to four days later for a retest.

Respiratory gas analysis (as a measure of the energy expenditure of walking)

No studies were included that examined a measurement property of respiratory gas analysis.

Beschrijving studies: Mobiliteit

Berg Balance Scale (BBS)

Major (2013) examined construct validity, internal consistency, and inter-rater reliability. Twenty men and ten women were recruited, with a mean age of 54 (SD: 12) years and a mean of 18 (SD: 14) years of experience with a prosthesis. Participants had a unilateral TTA (n=13), unilateral TFA (n=14), bilateral TTA (n=2), or bilateral TTA/TFA (n=1) of dysvascular (n=7), traumatic (n=14), infectious (n=6), or congenital (n=3) cause. Construct validity was examined using the Activities-specific Balance Confidence scale, the PEQ Mobility Subscale, the Frenchay Activities Index, the L-test, and the 2MWT. The BBS was also expected to discriminate between people with/without fear of falling, between TFA/TTA, between dysvascular/non-dysvascular cause, between use/non-use of a walking aid, and between people who had/had not fallen in the past 12 months. The BBS was administered by the first rater. Participants were allowed to rest seated for 20 minutes before performing the test a second time, which was scored by the second rater.

Wong (2014) examined inter- and intra-rater reliability in four men and one woman (mean age: 53 (SD: 15.7), time since amputation: 8.2 years (SD: 7.9)). The participants' trials were recorded on video. Two raters scored these trials. The recordings were later scored by 16 raters (including the two initial raters). Intra-rater reliability was calculated from the scores of the two initial raters, who had already scored the trials during the original recording sessions and scored the video recordings at a later time as a retest. Inter-rater reliability was calculated from the scores of the 16 raters who scored the video recordings.

Locomotor Capability Index (LCI)

The LCI is a subscale of the more extensive PPA. No studies were included that examined a measurement property of a Dutch version of the LCI.

Prosthetic limb users survey of mobility (Plus-M)

No studies were included that examined a measurement property of a Dutch version of the Plus-M.

Prosthetic Profile of the Amputee (PPA)

No studies were included that examined a measurement property of a Dutch version of the PPA.

Prosthetic Evaluation Questionnaire (PEQ)

No studies were included that examined a measurement property of the Dutch version of the PEQ.

Results

The results extracted from the included studies were combined per instrument and per measurement property in Table 1.

Table 1 - Results per instrument per measurement property

Instrument	Measurement property	Author	Result	Risk of bias assessment*	Individual outcome rating**
WALKING ABILITY
SIGAM/WAP	Validity	No studies were included that examined the validity of this instrument.
	Reliability	De Laat 2019	Stable patients from a rehabilitation center: ICC(2,1)=0.79 (95%CI: 0.56-0.90) Stable patients from a hospital rehabilitation ward: ICC(2,1)=0.98 (95%CI: 0.95-0.99) Stable patients (rehabilitation center + rehabilitation ward): ICC(2,1)=0.90 (95%CI: 0.84-0.94)	Inadequate	?
	Reliability	Rommers 2008	Two raters reached 100% agreement for all 20 patients. 118/120 raters reached 100% agreement on 2 patient cases. The two instances without agreement were resolved after discussion, so that in the end there was 100% agreement in 120/120 instances.	Inadequate	?
	Measurement error	No studies were included that examined the measurement error of this instrument.
	Responsiveness	No studies were included that examined the responsiveness of this instrument.
AMPPRO	Construct validity (convergent validity)	Gailey 2002	AMPPRO correlation with 6MWT, Pearson's r: r=0.82 (p<0.0001) AMPPRO correlation with AAS, Pearson's r: r=0.77 (p<0.0001)	Doubtful	?
	Construct validity (discriminative / known-groups validity)	Gailey 2002	Mean AMPPRO score (SD) in K-level groups: K 0-1: 25 (7.37) K 2: 34.65 (6.49) K 3: 40.5 (3.9) K 4: 44.67 (1.75) Significant difference between all groups (p=0.0001)	Very good	+
	Reliability	Gailey 2002	AMPPRO inter-rater reliability, day 1: ICC=0.99 AMPPRO inter-rater reliability, day 2: ICC=0.99 AMPPRO intra-rater reliability, rater 1 (day 1-2): ICC=0.96 AMPPRO intra-rater reliability, rater 2 (day 1-2): ICC=0.98	Adequate	+
	Reliability	Resnik 2011	AMPPRO, intra-rater reliability, ICC (95%CI): ICC(2,1)=0.88 (0.79-0.93)	Adequate	+
	Measurement error	Resnik 2011	AMPPRO, standard error of measurement (SEM), points: SEM=1.5 (i.e. 3.8% of the mean score of the first measurement) AMPPRO, minimal detectable change (MDC) with 90% confidence, points: MDC₉₀=3.4 (i.e. 8.5% of the mean score of the first measurement)	Adequate	?
	Responsiveness	No studies were included that examined the responsiveness of this instrument.
6MWT	Construct validity (convergent validity)	Lin 2008	Correlation between TUG and 6MWT: r=-0.76 (p=0.004) Correlation between the mean score of the single leg balance test and the 6MWT (affected side, eyes open): r=0.63 (p-value not reported) Correlation between the mean score of the single leg balance test and the 6MWT (affected side, eyes closed): r=0.61 (p-value not reported)	Inadequate	+
	Reliability	Lin 2008	ICC of 3 trials: ICC(3,1)=0.97	Adequate	+
		Resnik 2011	6MWT, intra-rater reliability, ICC (95%CI): ICC(2,1)=0.97 (0.95-0.99)	Adequate	+
		Cox 2017	Configuration 1: ICC=0.97 (95%CI: 0.93-0.98) Configuration 2: ICC=0.97 (95%CI: 0.94-0.99)	Adequate	+
	Measurement error	Lin 2008	6MWT Bland-Altman (trial 2 minus trial 1), meters: LoA=-44.6 to 63.5 (i.e. ±9.9% of the mean distance of the first measurement) Mean difference=9.45 (95%CI includes 0) 6MWT Bland-Altman (trial 3 minus trial 2), meters: LoA=-25.8 to 57.8 (i.e. ±7.5% of the mean distance of the first measurement) Mean difference=16 (95%CI does not include 0)	Adequate	?
		Resnik 2011	6MWT, standard error of measurement (SEM), meters: SEM=63.6 (i.e. 19.2% of the mean distance of the first measurement) 6MWT, minimal detectable change (MDC) with 90% confidence, meters: MDC₉₀=147.5 (i.e. 44.4% of the mean distance of the first measurement)	Adequate	?
		Cox 2017	Configuration 1, standard error of measurement (SEM), meters: SEM=12.6 (i.e. 7.2% of the mean distance of the first measurement) Configuration 2, standard error of measurement (SEM), meters: SEM=12.5 (i.e. 7.8% of the mean distance of the first measurement) Configuration 1, minimal detectable change (MDC) with 95% confidence, meters: MDC₉₅=34.8 (i.e. 20.1% of the mean distance of the first measurement)	Adequate	?
	Responsiveness	No studies were included that examined the responsiveness of this instrument.
2MWT	Construct validity (convergent validity)	Brooks 2001	Correlation between 2MWT and SF-36 at discharge: r=0.22 (p=0.008) Correlation between 2MWT and SF-36 at follow-up: r=0.479 (p<0.001) Correlation between 2MWT and Houghton scale at discharge: r=0.493 (p<0.001)	Doubtful	-
	Reliability	Brooks 2002	Inter-rater reliability in inpatients: Day 1: ICC=0.98 Day 2: ICC=0.98 Inter-rater reliability in outpatients: Day 1: ICC=0.98 Day 2: ICC=0.99 Intra-rater reliability in inpatients: Rater 1 (day 1, 2): ICC=0.90 Rater 2 (day 1, 2): ICC=0.94 Intra-rater reliability in outpatients: Rater 1 (day 1, 2): ICC=0.95 Rater 2 (day 1, 2): ICC=0.96	Adequate	+
	Reliability	Resnik 2011	2MWT, intra-rater reliability, ICC (95%CI): ICC(2,1)=0.83 (0.71-0.90)	Doubtful	+
	Measurement error	Resnik 2011	2MWT, standard error of measurement (SEM), meters: SEM=48.5 (i.e. 42.5% of the mean distance of the first measurement) 2MWT, minimal detectable change (MDC) with 90% confidence, meters: MDC₉₀=112.5 (i.e. 98.7% of the mean distance of the first measurement)	Doubtful	?
	Responsiveness	No studies were included that examined the responsiveness of this instrument.
10-meter walk test	No studies were included that examined a measurement property of the 10-meter walk test.
TUG test	Construct validity (convergent validity)	Schoppen 1999	Spearman's r correlation between TUG and GARS: r=0.39 (p=0.03) Spearman's r correlation between TUG and SIP68 total score: r=0.40 (significant, p-value not reported) Spearman's r correlation between TUG and SIP68 subscale "mobility control": r=0.46 (significant, p-value not reported) Spearman's r correlation between TUG and SIP68 subscale "mobility range": r=0.36 (significant, p-value not reported) Spearman's r correlation between TUG and SIP68 subscale "somatic autonomy": r=0.28 (not significant, p-value not reported) Spearman's r correlation between TUG and SIP68 subscale "psychic autonomy": r=0.31 (not significant, p-value not reported) Spearman's r correlation between TUG and SIP68 subscale "social behaviour": r=0.19 (not significant, p-value not reported) Spearman's r correlation between TUG and SIP68 subscale "emotional stability": r=-0.04 (not significant, p-value not reported)	Inadequate	- (GARS) + (SIP68)
	Construct validity (convergent validity)	Clemens 2018	TUG correlation with PLUS-M, Spearman's r: r=-0.56 (p<0.001) TUG correlation with ABC, Spearman's r: r=-0.46 (p<0.01)	Doubtful	+ (PLUS-M) - (ABC)
	Construct validity (discriminative / known-groups validity)	Clemens 2018	TUG (total time), difference between TTA and TFA, seconds (SD): TTA: 10.04 (2.3), range: 6.73-16.8 TFA: 12.77 (5.04), range: 7.83-33.07 Significant difference between groups (p<0.0001)	Adequate	+
	Reliability	Schoppen 1999	Intra-rater reliability, Spearman's r: r=0.93 (p<0.001) The difference between the rater's mean scores was significant (p=0.047) Inter-rater reliability, Spearman's r: r=0.96 (p<0.001) No difference between the raters' mean scores (p=0.31)	Doubtful	+
		Resnik 2011	TUG, intra-rater reliability, ICC (95%CI): ICC(2,1)=0.88 (0.80-0.94)	Adequate	+
		Clemens 2018	TUG (total time), ICC (95%CI): ICC(2,1)=0.98 (0.97-0.99)	Adequate	+
	Measurement error	Resnik 2011	TUG, standard error of measurement (SEM), seconds: SEM=1.6 (i.e. 13% of the mean time of the first measurement) TUG, minimal detectable change (MDC) with 90% confidence, seconds: MDC₉₀=3.6 (i.e. 29.2% of the mean time of the first measurement)	Adequate	?
	Measurement error	Clemens 2018	Standard error of measurement (SEM), seconds: SEM=0.55 (the percentage could not be calculated) Minimal detectable change (MDC) with 90% confidence, seconds: MDC₉₀=1.28 (the percentage could not be calculated)	Adequate	?
	Responsiveness	No studies were included that examined the responsiveness of this instrument.
L-test	Construct validity (convergent validity)	Deathe 2005	L-test correlation with other instruments, Pearson's r: TUG: r=0.93 (p=0.00) 2MWT: r=-0.86 (p=0.00) 10-meter walk test: r=0.97 (p=0.00) ABC: r=-0.48 (p=0.00) FAI: r=-0.54 (p=0.00) PEQ-MS: r=-0.22 (p=0.04)	Very good	+
	Construct validity (discriminative / known-groups validity)	Deathe 2005	L-test score differences between groups, seconds (SD): TTA: 29.6 (12.8) TFA: 41.7 (16.8) Significant difference between groups (p<0.001) Traumatic: 26.4 (7.8) Vascular: 42.0 (17.8) Significant difference between groups (p<0.001) Walking aid used: No: 25.5 (6.4) Yes: 43.3 (17.5) Significant difference between groups (p<0.001) Autowalk: Yes: 30 (12.1) No: 44.5 (17.5) Significant difference between groups (p<0.001) Age: Under 55: 25.4 (6.8) 55 or older: 39.7 (17.1) Significant difference between groups (p<0.001)	Doubtful	+
	Reliability	Deathe 2005	Intra-rater reliability L-test, time point 1 versus 3: ICC(2,1)=0.97 (95%CI: 0.93-0.98) Inter-rater reliability L-test, time point 1 versus 2: ICC(2,2)=0.96 (95%CI: 0.94-0.97)	Adequate	+
	Reliability	Hunter 2018	ICC (95%CI): TTA (vascular): ICC=0.97 (0.89-0.99) TTA (non-vascular): ICC=0.95 (0.80-0.98) Complex: ICC=0.997 (0.993-0.999)	Adequate	+
	Measurement error	Deathe 2005	Standard error of measurement (SEM) at time point 1 versus 2, seconds: SEM=3.0 (i.e. 9.2% of the mean time of the first measurement)	Adequate	?
	Measurement error	Hunter 2018	Standard error of measurement (SEM), seconds: TTA (vascular): 1.15 (i.e. 3.6% of the mean time of the first measurement) TTA (non-vascular): 0.77 (i.e. 3.3% of the mean time of the first measurement) Complex: 1.07 (i.e. 3% of the mean time of the first measurement) Minimal detectable change (MDC) with 95% confidence, seconds: TTA (vascular): 3.19 (i.e. 10.2% of the mean time of the first measurement) TTA (non-vascular): 2.15 (i.e. 9.2% of the mean time of the first measurement) Complex: 2.98 (i.e. 8.2% of the mean time of the first measurement)	Adequate	+ (MCID: 4.5 seconds, from Rushton 2015)
	Responsiveness	No studies were included that examined the responsiveness of this instrument.
FSST	Validity	No studies were included that examined the validity of this instrument.
	Reliability	Cardoso 2019	ICC (95%CI): Best trial with crutches: ICC(3,1)=0.69 (0.4-0.85) Best trial without crutches: ICC(3,1)=0.86 (0.71-0.94) Mean of two trials with crutches: ICC(3,k)=0.81 (0.45-0.93) Mean of two trials without crutches: ICC(3,k)=0.89 (0.72-0.96)	Adequate	- (best trial with crutches) +
	Measurement error	Cardoso 2019	Standard error of measurement (SEM), seconds: Best trial with crutches: SEM=0.52 (i.e. 5.8% of the mean time of the first measurement) Best trial without crutches: SEM=0.41 (i.e. 4.3% of the mean time of the first measurement) Mean of two trials with crutches: SEM=0.14 (i.e. 1.5% of the mean time of the first measurement) Mean of two trials without crutches: SEM=0.13 (i.e. 1.5% of the mean time of the first measurement) Minimal detectable change (MDC) with 90% confidence, seconds (95%CI): Best trial with crutches: MDC₉₀=1.22 (0.83-1.62) (i.e. 13.6% of the mean time of the first measurement) Best trial without crutches: MDC₉₀=0.95 (0.49-1.41) (i.e. 10.1% of the mean time of the first measurement) Mean of two trials with crutches: MDC₉₀=0.34 (0.20-0.48) (i.e. 3.8% of the mean time of the first measurement) Mean of two trials without crutches: MDC₉₀=0.31 (0.15-0.47) (i.e. 3.6% of the mean time of the first measurement)	Adequate	?
	Responsiveness	No studies were included that examined the responsiveness of this instrument.
Breath gas analysis (as a measure of the energy cost of walking)	No studies were included that examined a measurement property of breath gas analysis.
MOBILITY
BBS	Construct validity (convergent validity)	Major 2013	Spearman's r correlation between ABC scale and BBS: r=0.634 (p<0.001) Spearman's r correlation between PEQ-MS and BBS: r=0.584 (p=0.001) Spearman's r correlation between FAI and BBS: r=0.607 (p<0.001) Spearman's r correlation between 2MWT and BBS: r=0.675 (p<0.001) Spearman's r correlation between L-test and BBS: r=-0.802 (p<0.001)	Doubtful	+
	Construct validity (discriminative / known-groups validity)	Major 2013	Score by "self-reported fear of falling", median (IQR): Yes (n=10): 49 (46-52) No (n=20): 53 (50-55) Significant difference between groups (p=0.008) Score by "unilateral amputation level", median (IQR): TTA (n=13): 53 (49-55) TFA (n=14): 52 (49-54) No significant difference between groups (p=0.325) Score by "cause", median (IQR): Dysvascular (n=7): 48 (45-52) Other (n=23): 53 (50-55) No significant difference between groups (p=0.061) Score by "daily use of a mobility aid", median (IQR): Yes (n=12): 49 (45-50) No (n=18): 54 (52-55) Significant difference between groups (p<0.001) Score by "2 or more self-reported falls in the past 12 months", median (IQR): Yes (n=7): 50 (49-43) No (n=22): 53 (49-55) No significant difference between groups (p=0.381)	Doubtful	-
	Internal consistency	Major 2013	Cronbach's alpha for rater 1: α=0.827 Cronbach's alpha for rater 2: α=0.826	Doubtful	+
	Reliability	Major 2013	ICC: ICC(2,1)=0.945	Adequate	+
	Reliability	Wong 2014	Intra-rater reliability (5 patients): ICC(2,k)=0.99 (95%CI: 0.96-1.00) Inter-rater reliability, BBS total score (16 raters): ICC(2,k)=0.99 (95%CI: 0.99-1.00) Inter-rater reliability per BBS item (16 raters), ICC(2,k) (95%CI): Item 1: 0.823 (0.601-0.975) Item 2: 0.996 (0.988-1.00) Item 3: 0.075 (-0.015-0.532) Item 4: 0.721 (0.45-0.957) Item 5: 0.834 (0.62-0.977) Item 6: 0.981 (0.946-0.998) Item 7: 0.902 (0.752-0.987) Item 8: 0.946 (0.853-0.993) Item 9: 0.871 (0.689-0.983) Item 10: 0.969 (0.911-0.996) Item 11: 0.901 (0.749-0.987) Item 12: 0.912 (0.772-0.989) Item 13: 0.867 (0.680-0.982) Item 14: 0.976 (0.930-0.997)	Very good (for total scores, assuming these could be measured on a continuous scale) Inadequate (for individual item reliability, assuming individual items were scored ordinally, in which case a (weighted) kappa should have been calculated)	+
	Measurement error	No studies were included that examined the measurement error of this instrument.
	Responsiveness	No studies were included that examined the responsiveness of this instrument.
LCI	No studies were included that examined a measurement property of the Dutch version of the Locomotor Capability Index.
Plus-M	No studies were included that examined a measurement property of the Dutch version of the Prosthetic Limb Users Survey of Mobility.
PPA	No studies were included that examined a measurement property of the Dutch version of the Prosthetic Profile of the Amputee.
PEQ	No studies were included that examined a measurement property of the Dutch version of the Prosthetic Evaluation Questionnaire.
*Risk of bias assessment based on the COSMIN Risk of Bias tool: lowest score counts. **Any hypotheses that were formulated are listed in the evidence tables (for construct validity / hypothesis testing).
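
The SEM and MDC values in Table 1 are linked by a standard relation, MDC = z × √2 × SEM, with z = 1.645 for 90% confidence and z = 1.96 for 95% confidence. The Python sketch below reproduces two tabulated MDC values from their SEM. It assumes the included studies applied this standard formula; the function name `mdc_from_sem` is illustrative and does not come from the studies. Small deviations for other rows may stem from rounding of the reported SEM.

```python
import math

# z-values for the two confidence levels reported in Table 1
Z_VALUES = {90: 1.645, 95: 1.96}


def mdc_from_sem(sem: float, confidence: int = 90) -> float:
    """Minimal detectable change from the standard error of measurement.

    Uses the standard relation MDC = z * sqrt(2) * SEM (an assumption about
    how the included studies derived their MDC values).
    """
    return Z_VALUES[confidence] * math.sqrt(2) * sem


# Clemens 2018, TUG: SEM = 0.55 s, reported MDC90 = 1.28 s
print(round(mdc_from_sem(0.55, 90), 2))

# Hunter 2018, L-test, TTA (vascular): SEM = 1.15 s, reported MDC95 = 3.19 s
print(round(mdc_from_sem(1.15, 95), 2))
```

A change in an individual patient that is smaller than the MDC cannot be distinguished from measurement error at the chosen confidence level.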

Level of evidence of the literature: Walking ability

SIGAM/WAP

The level of evidence for the measurement property validity could not be assessed, because no studies were selected that examined this measurement property.

The level of evidence for the measurement property reliability was downgraded by two levels because of study limitations (two levels for risk of bias: multiple studies were rated as inadequate).

The level of evidence for the measurement property measurement error could not be assessed, because no studies were selected that examined this measurement property.

The level of evidence for the measurement property responsiveness could not be assessed, because no studies were selected that examined this measurement property.

AMPPRO

The level of evidence for the measurement property validity was downgraded by one level because of study limitations (one level for risk of bias: convergent validity was rated as doubtful, but discriminative validity was rated as very good).

The level of evidence for the measurement property reliability was downgraded by one level because of the small number of patients (one level for imprecision: the number of participants was 68).

The level of evidence for the measurement property measurement error was downgraded by three levels because of study limitations (one level for risk of bias: there was only one study, which was rated as adequate) and the small number of patients (two levels for imprecision: the number of participants was 44).

The level of evidence for the measurement property responsiveness could not be assessed, because no studies were selected that examined this measurement property.

6MWT

The level of evidence for the measurement property validity was downgraded by five levels because of study limitations (three levels for risk of bias: there was only one study, which was rated as inadequate) and the small number of patients (two levels for imprecision: the number of participants was 13).

The level of evidence for the measurement property reliability was downgraded by one level because of the small number of patients (one level for imprecision: the number of participants was 82).

The level of evidence for the measurement property measurement error was downgraded by one level because of the small number of patients (one level for imprecision: the number of participants was 82).

The level of evidence for the measurement property responsiveness could not be assessed, because no studies were selected that examined this measurement property.

2MWT

The level of evidence for the measurement property validity was downgraded by three levels because of study limitations (two levels for risk of bias: there was only one study, which was rated as doubtful) and the small number of patients (one level for imprecision: several analyses were performed, with numbers of participants ranging from 56 to 142).

The level of evidence for the measurement property reliability was downgraded by two levels because of study limitations (one level for risk of bias: there were two studies, rated as adequate and doubtful) and the small number of patients (one level for imprecision: the number of participants was 77).

The level of evidence for the measurement property measurement error was downgraded by four levels because of study limitations (two levels for risk of bias: there was only one study, which was rated as doubtful) and the small number of patients (two levels for imprecision: the number of participants was 44).

The level of evidence for the measurement property responsiveness could not be assessed, because no studies were selected that examined this measurement property.

10-meter walk test

The level of evidence for the measurement properties validity, reliability, measurement error, and responsiveness could not be assessed, because no studies were selected that examined and reported these measurement properties.

TUG test

The level of evidence for the measurement property validity was downgraded by two levels because of study limitations (one level for risk of bias: two studies reported validity, rated as inadequate and doubtful/adequate (doubtful for convergent validity; adequate for discriminative validity)) and inconsistency (one level for inconsistency: the outcome ratings for hypothesis testing contradict each other).

The level of evidence for the measurement property reliability was downgraded by one level because of study limitations (one level for risk of bias: there were two studies rated as doubtful and one rated as adequate).

The level of evidence for the measurement property measurement error was downgraded by one level because of the small number of patients (one level for imprecision: the number of participants was 95).

The level of evidence for the measurement property responsiveness could not be assessed, because no studies were selected that examined this measurement property.

L-test

The level of evidence for the measurement property validity was downgraded by one level because of study limitations (one level for risk of bias: there was one study, rated as very good for convergent validity and as doubtful for discriminative validity).

The level of evidence for the measurement property reliability was downgraded by two levels because of the small number of patients (two levels for imprecision: the number of participants was 47 (Hunter 2018 divided n=60 over three groups and reported the results per group, so each reported result is based on n=20)).

The level of evidence for the measurement property measurement error was downgraded by two levels because of the small number of patients (two levels for imprecision: the number of participants was 47 (Hunter 2018 divided n=60 over three groups and reported the results per group, so each reported result is based on n=20)).

The level of evidence for the measurement property responsiveness could not be assessed, because no studies were selected that examined this measurement property.

FSST

The level of evidence for the measurement property validity could not be assessed, because no studies were selected that examined this measurement property.

The level of evidence for the measurement property reliability was downgraded by three levels because of study limitations (one level for risk of bias: there was only one study, which was rated as adequate) and the small number of patients (two levels for imprecision: the number of participants was 27).

The level of evidence for the measurement property measurement error was downgraded by three levels because of study limitations (one level for risk of bias: there was only one study, which was rated as adequate) and the small number of patients (two levels for imprecision: the number of participants was 27).

The level of evidence for the measurement property responsiveness could not be assessed, because no studies were selected that examined this measurement property.

Breath gas analysis

Level of evidence of the literature: Mobility

BBS

The level of evidence for the measurement property validity was downgraded by four levels because of study limitations (two levels for risk of bias: there was only one study, which was rated as doubtful) and the small number of patients (two levels for imprecision: the number of participants was 30).

The level of evidence for the measurement property internal consistency was downgraded by four levels because of study limitations (two levels for risk of bias: there was only one study, which was rated as doubtful) and the small number of patients (two levels for imprecision: the number of participants was 30).

The level of evidence for the measurement property reliability was downgraded by two levels because of the small number of patients (two levels for imprecision: the number of participants was 35; in Wong 2014, 5 participants were also scored by 16 raters).

The level of evidence for the measurement property measurement error could not be assessed, because no studies were selected that examined this measurement property.

The level of evidence for the measurement property responsiveness could not be assessed, because no studies were selected that examined this measurement property.

LCI, PPA, Plus-M and PEQ

To answer the clinical question, a systematic literature analysis was performed for the following search question:

What are the validity, reliability, and responsiveness of instruments that measure walking ability and/or mobility in patients with a (bilateral) lower-limb amputation at a level from ankle disarticulation up to and including hip disarticulation?

P: patients after a (bilateral) leg amputation at the level of the ankle or higher;

I: walking ability: SIGAM/WAP, Amputee Mobility Predictor with Prosthesis (AMPPRO), six-minute walk test (6MWT), two-minute walk test (2MWT), 10-meter walk test, Timed get up and go (TUG) test, L-test, Four-square step test (FSST), breath gas analysis (as a measure of the energy cost of walking) / Mobility: Berg Balance Scale (BBS), Locomotor Capability Index (LCI), Prosthetic limb users survey of Mobility (Plus-M), Prosthetic Profile of the Amputee (PPA), Prosthetic Evaluation Questionnaire (PEQ);

C: measurement instruments compared with each other (if relevant);

O: validity, reliability, responsiveness.

Relevant instruments

Only the instruments named in the PICO were considered for inclusion in the literature summary. These instruments were chosen because they are used regularly both in the Netherlands and internationally. Studies of questionnaires were only definitively included if the search (see below) showed that a validated Dutch version of the questionnaire was available. An exception was made for the AMPPRO, because the items of this instrument are scored by the care provider. The working group considers the influence of translation on the test outcome to be minimal for this instrument.

Relevant outcome measures

The working group defined the measurement properties according to the taxonomy of the Consensus-based Standards for the selection of health Measurement INstruments (COSMIN) (Mokkink, 2010).

Search and selection (Methods)

On 1 May 2019, the databases Medline (via OVID) and Cinahl were searched with relevant search terms for measurement properties (validity, reliability, responsiveness) of the instruments defined in the PICO. The search strategy is presented under the Verantwoording (justification) tab. The literature search yielded 287 hits. Studies were selected on the basis of the following criteria: the study population consisted of participants after a (bilateral) leg amputation, at least one measurement property of at least one of the specified instruments was examined, and for questionnaires the Dutch version had to have been examined. For 'capacity tests', administration in another language was considered less problematic, so all of these studies were included. On the basis of title and abstract, 48 studies were initially preselected. After review of the full text, 37 studies were excluded (see the exclusion table under the Verantwoording tab). The excluded articles contained two reviews (Condie, 2006; Stevens, 2010). Four of the studies included in these reviews were relevant for answering the PICO. In total, therefore, 15 studies (11 plus four) were definitively selected.

Results

Fifteen studies were included in the literature analysis. The main study characteristics and results are presented in the evidence tables. The assessment of the individual study design (risk of bias) is presented in the risk of bias tables. Risk of bias was assessed with the COSMIN Risk of Bias tool for each measurement property reported in a study. The 'lowest score counts' principle was applied: the lowest score obtained in the risk of bias assessment is the score that applies. The outcomes were then rated per study and per measurement property against quality criteria (see Table 2). To assess any hypotheses about the degree of correlation (as hypothesis testing), the following scale was used: a strong correlation has a correlation coefficient (r) of 1 to 0.8, a reasonable correlation 0.8 to 0.5, a weak correlation 0.5 to 0.3, and no correlation 0.3 to 0.
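
The two rules above, 'lowest score counts' and the correlation scale, can be sketched in Python as follows. This is a minimal illustration, not the working group's actual procedure. The function names are hypothetical, the text does not say to which category the boundary values 0.8, 0.5, and 0.3 belong (they are assigned to the stronger category here), and taking the absolute value of r for negative correlations is likewise an assumption.

```python
# COSMIN risk of bias ratings, ordered from worst to best
RATINGS = ["inadequate", "doubtful", "adequate", "very good"]


def lowest_score_counts(item_ratings):
    """Return the overall rating: the worst rating among the checklist items."""
    return min(item_ratings, key=RATINGS.index)


def correlation_strength(r):
    """Classify a correlation coefficient on the scale used in this module.

    Assumptions: the sign is ignored, and boundary values (0.8, 0.5, 0.3)
    fall in the stronger category.
    """
    size = abs(r)
    if size >= 0.8:
        return "strong"
    if size >= 0.5:
        return "reasonable"
    if size >= 0.3:
        return "weak"
    return "none"


# One doubtful item determines the overall rating of the study
print(lowest_score_counts(["very good", "adequate", "doubtful"]))

# TUG versus 6MWT in Lin 2008 (r = -0.76)
print(correlation_strength(-0.76))
```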

The strength of the evidence was largely assessed as described in the COSMIN manual for systematic reviews of PROMs (Mokkink, 2018). However, each measurement property was graded separately per measurement instrument according to the GRADE approach, instead of producing one overall rating of the results and one overall GRADE rating per instrument. In line with the COSMIN manual, publication bias was not assessed. In grading the strength of the evidence, up to three levels could be downgraded for risk of bias: one level for a serious risk (there are multiple studies of doubtful quality or there is only one study of adequate quality), two levels for a very serious risk (there are multiple studies of inadequate quality or there is only one study of doubtful quality), and three levels for an extremely serious risk (there is only one study of inadequate quality). For inconsistency, one or two levels could be downgraded per measurement property when there was unexplained heterogeneity between the reported results. For imprecision, a maximum of two levels could be downgraded, with the total number of participants determining the downgrade (i.e. one level for n=50-100, two levels for n<50). For indirectness, one or two levels could be downgraded when the participant group of the included study did not fully match the participant group of the guideline module (as defined in the PICO) and/or when the study examined the measurement instrument in a context other than the one intended in this guideline module.
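The downgrading rules above can be illustrated with a small Python sketch. It is not the working group's procedure in code: the function names are hypothetical, the starting level of "high" is an assumption taken from the COSMIN/GRADE convention rather than from this text, and the number of levels per domain has to be judged by the reviewer except for imprecision, where the sample-size rule is stated explicitly.

```python
# Illustrative sketch only; names and the "high" starting level are assumptions.

GRADE_LEVELS = ["very low", "low", "moderate", "high"]


def imprecision_downgrade(total_n: int) -> int:
    """Levels to downgrade for imprecision, based on the pooled sample size."""
    if total_n < 50:
        return 2
    if total_n <= 100:
        return 1
    return 0


def grade_rating(risk_of_bias: int = 0, inconsistency: int = 0,
                 imprecision: int = 0, indirectness: int = 0) -> str:
    """Combine the downgrades of the four domains into a final GRADE level."""
    if not 0 <= risk_of_bias <= 3:
        raise ValueError("risk of bias can be downgraded by 0 to 3 levels")
    for levels in (inconsistency, imprecision, indirectness):
        if not 0 <= levels <= 2:
            raise ValueError("other domains can be downgraded by 0 to 2 levels")
    total = risk_of_bias + inconsistency + imprecision + indirectness
    start = len(GRADE_LEVELS) - 1           # assumption: start at "high"
    return GRADE_LEVELS[max(start - total, 0)]  # cannot go below "very low"
```

Under these assumptions, a single study of adequate quality (one level for risk of bias) with 27 participants (two levels for imprecision) would end at "very low".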

Table 2 - Quality criteria for results (from Prinsen (2018), based on Terwee (2007) and Prinsen (2016))

Measurement property	Rating	Criteria
Structural validity	+	CTT: CFA: CFI or TLI or comparable measure >0.95 OR RMSEA <0.06 OR SRMR <0.08. IRT/Rasch: no violation of unidimensionality (CFI or TLI or comparable measure >0.95 OR RMSEA <0.06 OR SRMR <0.08) AND no violation of local independence (residual correlations among the items after controlling for the dominant factor <0.20 OR Q3's <0.37) AND no violation of monotonicity (adequate looking graphs OR item scalability >0.30) AND adequate model fit (IRT: χ² >0.01; Rasch: infit and outfit mean squares ≥0.5 and ≤1.5 OR Z-standardized values >-2 and <2)
	?	CTT: Not all information for ‘+’ reported. IRT/Rasch: Model fit not reported
	-	Criteria for ‘+’ not met
Internal consistency	+	At least low evidence for sufficient structural validity AND Cronbach's alpha(s) ≥ 0.70 for each unidimensional scale or subscale
	?	Criteria for “At least low evidence for sufficient structural validity” not met
	-	At least low evidence for sufficient structural validity AND Cronbach’s alpha(s) < 0.70 for each unidimensional scale or subscale
Reliability	+	ICC or weighted Kappa ≥ 0.70
	?	ICC or weighted Kappa not reported
	-	ICC or weighted Kappa < 0.70
Measurement error	+	SDC or LoA < MIC
	?	MIC not defined
	-	SDC or LoA > MIC
Hypotheses testing for construct validity	+	The result is in accordance with the hypothesis
	?	No hypothesis defined (by the review team)
	-	The result is not in accordance with the hypothesis
Cross‐cultural validity/measurement invariance	+	No important differences found between group factors (such as age, gender, language) in multiple group factor analysis OR no important DIF for group factors (McFadden's R² < 0.02)
	?	No multiple group factor analysis OR DIF analysis performed
	-	Important differences between group factors OR DIF was found
Criterion validity	+	Correlation with gold standard ≥ 0.70 OR AUC ≥ 0.70
	?	Not all information for ‘+’ reported
	-	Correlation with gold standard < 0.70 OR AUC < 0.70
Responsiveness	+	The result is in accordance with the hypothesis OR AUC ≥ 0.70
	?	No hypothesis defined (by the review team)
	-	The result is not in accordance with the hypothesis OR AUC < 0.70
AUC: area under the curve, CFA: confirmatory factor analysis, CFI: comparative fit index, CTT: classical test theory, DIF: differential item functioning, ICC: intraclass correlation coefficient, IRT: item response theory, LoA: limits of agreement, MIC: minimal important change, RMSEA: Root Mean Square Error of Approximation, SEM: Standard Error of Measurement, SDC: smallest detectable change, SRMR: Standardized Root Mean Residuals, TLI: Tucker‐Lewis Index. “+” = sufficient, “−” = insufficient, “?” = indeterminate.
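As an illustration of how two rows of Table 2 translate into a rating, the sketch below encodes the criteria for reliability and measurement error in Python. The function names are hypothetical, and the handling of an SDC exactly equal to the MIC (rated "-" here) is an assumption, since the table does not specify that case.

```python
# Illustrative sketch only; names and the tie-handling rule are assumptions.
from typing import Optional


def rate_reliability(icc: Optional[float]) -> str:
    """Rate reliability: '+' if ICC or weighted kappa >= 0.70, '?' if not reported."""
    if icc is None:
        return "?"
    return "+" if icc >= 0.70 else "-"


def rate_measurement_error(sdc: float, mic: Optional[float]) -> str:
    """Rate measurement error: '+' if SDC (or LoA) < MIC, '?' if no MIC is defined."""
    if mic is None:
        return "?"
    # Assumption: SDC equal to MIC is rated '-' (the table leaves this open).
    return "+" if sdc < mic else "-"
```

Because none of the included studies reports a minimal important change, every measurement error result in the evidence table would be rated "?" under these criteria, whatever the reported SEM or MDC.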

Brooks, D., Parsons, J., Hunter, J. P., Devlin, M., & Walker, J. (2001). The 2-minute walk test as a measure of functional improvement in persons with lower limb amputation. Archives of physical medicine and rehabilitation, 82(10), 1478-1483.
Brooks, D., Hunter, J. P., Parsons, J., Livsey, E., Quirt, J., & Devlin, M. (2002). Reliability of the two-minute walk test in individuals with transtibial amputation. Archives of physical medicine and rehabilitation, 83(11), 1562-1565.
Cardoso, J. R., Beisheim, E. H., Horne, J. R., & Sions, J. M. (2019). Retracted: Test‐Retest Reliability of Dynamic Balance Performance‐Based Measures Among Adults With a Unilateral Lower‐Limb Amputation. PM&R, 11(3), 243-251.
Clemens, S. M., Gailey, R. S., Bennett, C. L., Pasquina, P. F., Kirk-Sanchez, N. J., & Gaunaurd, I. A. (2018). The Component Timed-Up-and-Go test: the utility and psychometric properties of using a mobile application to determine prosthetic mobility in people with lower limb amputations. Clinical rehabilitation, 32(3), 388-397.
Condie, E., Scott, H., & Treweek, S. (2006). Lower limb prosthetic outcome measures: a review of the literature 1995 to 2005. JPO: Journal of Prosthetics and Orthotics, 18(6), P13-P45.
Cox, P. D., Frengopoulos, C. A., Hunter, S. W., Sealy, C. M., Deathe, A. B., & Payne, M. W. (2017). Impact of course configuration on 6-Minute Walk Test Performance of people with lower extremity amputations. Physiotherapy Canada, 69(3), 197-203.
De Laat, F. A., Roorda, L. D., Geertzen, J. H., & Rommers, C. (2020). Test–retest reliability of the special interest group on amputation medicine/Dutch working group on amputations and prosthetics mobility scale, in persons wearing a prosthesis after a lower-limb amputation. Disability and rehabilitation, 42(12), 1762-1766.
De Laat, F. A., Rommers, G. M., Geertzen, J. H., & Roorda, L. D. (2012). Construct validity and test-retest reliability of the walking questionnaire in people with a lower limb amputation. Archives of physical medicine and rehabilitation, 93(6), 983-989.
Deathe, A. B., & Miller, W. C. (2005). The L test of functional mobility: measurement properties of a modified version of the timed “up & go” test designed for people with lower-limb amputations. Physical therapy, 85(7), 626-635.
Gailey, R. S., Roach, K. E., Applegate, E. B., Cho, B., Cunniffe, B., Licht, S., ... & Nash, M. S. (2002). The amputee mobility predictor: an instrument to assess determinants of the lower-limb amputee's ability to ambulate. Archives of physical medicine and rehabilitation, 83(5), 613-627.
Hunter, S. W., Frengopoulos, C., Holmes, J., Viana, R., & Payne, M. W. (2018). Determining reliability of a dual-task functional mobility protocol for individuals with lower extremity amputation. Archives of physical medicine and rehabilitation, 99(4), 707-712.
Lin, S. J., & Bose, N. H. (2008). Six-minute walk test in persons with transtibial amputation. Archives of physical medicine and rehabilitation, 89(12), 2354-2359.
Major, M. J., Fatone, S., & Roth, E. J. (2013). Validity and reliability of the Berg Balance Scale for community-dwelling persons with lower-limb amputation. Archives of physical medicine and rehabilitation, 94(11), 2194-2202.
Mokkink, L. B., Prinsen, C. A., Patrick, D. L., Alonso, J., Bouter, L. M., de Vet, H. C., & Terwee, C. B. (2018). COSMIN methodology for systematic reviews of patient-reported outcome measures (PROMs). User manual. 78:1. Available from: https://www.cosmin.nl/wp-content/uploads/COSMIN-syst-review-for-PROMs-manual_version-1_feb-2018-1.pdf.
Mokkink, L. B., Terwee, C. B., Patrick, D. L., Alonso, J., Stratford, P. W., Knol, D. L., ... & de Vet, H. C. (2010). The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. Journal of clinical epidemiology, 63(7), 737-745.
Nederlandse Vereniging van Revalidatieartsen (VRA; 2019). Behandelkader beenamputatie. Available from: https://revalidatiegeneeskunde.nl/sites/default/files/attachments/Kwaliteit/Behandelkaders/behandelkader_beenamputatie_-_12_april_2019_def.pdf.
Prinsen, C. A., Mokkink, L. B., Bouter, L. M., Alonso, J., Patrick, D. L., De Vet, H. C., & Terwee, C. B. (2018). COSMIN guideline for systematic reviews of patient-reported outcome measures. Quality of Life Research, 27(5), 1147-1157.
Prinsen, C. A., Vohra, S., Rose, M. R., Boers, M., Tugwell, P., Clarke, M., ... & Terwee, C. B. (2016). How to select outcome measurement instruments for outcomes included in a “Core Outcome Set”–a practical guideline. Trials, 17(1), 449.
Resnik, L., & Borgia, M. (2011). Reliability of outcome measures for people with lower-limb amputations: distinguishing true change from statistical error. Physical therapy, 91(4), 555-565.
Rommers, G. M., Ryall, N. H., Kap, A., De Laat, F., & Van der Linde, H. (2008). The mobility scale for lower limb amputees: the SIGAM/WAP mobility scale. Disability and rehabilitation, 30(15), 1106-1115.
Schoppen, T., Boonstra, A., Groothoff, J. W., de Vries, J., Göeken, L. N., & Eisma, W. H. (1999). The Timed “up and go” test: reliability and validity in persons with unilateral lower limb amputation. Archives of physical medicine and rehabilitation, 80(7), 825-828.
Stevens, P. M. (2010). Clinimetric properties of timed walking events among patient populations commonly encountered in orthotic and prosthetic rehabilitation. JPO: Journal of Prosthetics and Orthotics, 22(1), 62-74.
Terwee, C. B., Bot, S. D., de Boer, M. R., van der Windt, D. A., Knol, D. L., Dekker, J., ... & de Vet, H. C. (2007). Quality criteria were proposed for measurement properties of health status questionnaires. Journal of clinical epidemiology, 60(1), 34-42.
Wong, C. K. (2014). Interrater reliability of the Berg Balance Scale when used by clinicians of various experience levels to assess people with lower limb amputations. Physical therapy, 94(3), 371-378.

Study reference

Study characteristics

Patient characteristics

Measurement properties and procedures

Follow-up

Interpretability of results

Outcome measures and effect size ⁴

Comments

Brooks 2001

Instruments assessed: 2 minute walking test

Setting and country: Hospital, Canada

Sampling method: convenience sample

Funding and conflicts of interest: no commercial party has direct conflict of interest in the results of this study.

Inclusion criteria:

Patients who completed an inpatient amputee rehabilitation programme, unilateral or bilateral transtibial or transfemoral amputation, fitted with a prosthesis, able to walk.

Exclusion criteria:

Not reported.

N total at baseline:

G: n=290

Sample characteristics¹:

Mean age ± SD:

G: 66.3 (13.1), range: 21-94

Sex (male/female), %:

G: 73% M/ 27% F

Amputation type, n:

Transtibial (uni): 179

Transfemoral (uni): 60

Bilateral: 51

Comorbidities, n:

Peripheral vascular disease: 194

Diabetes: 165

Coronary artery disease: 43

Congestive heart failure: 43

Stroke: 32

Arthritis: 12

COPD: 26

Describe the assessed measurement properties and their procedures:

Validity

Construct validity (hypothesis testing):

The hypothesis was that the 2MWT would correlate moderately to physical functioning (SF-36) and the Houghton scale. A second hypothesis was that the 2MWT would be responsive to progress with rehabilitation.

The 2MWT was administered following initial prosthesis fitting, within 48h prior to discharge, and at 3 months of follow-up.

The SF-36 was administered within 48h of admission, within 48h prior to discharge, and at 3-months of follow-up.

Houghton scale was administered only once, within 48h prior to discharge

Incomplete outcome data:

2MWT and SF-36 at discharge

G: 142/290 analyzed (49%)

Reason: Retrospective data analysis

2MWT and SF-36 at follow-up

G: 67/290 analyzed (23.1%)

Reason: Retrospective data analysis

2MWT and Houghton scale at discharge

G: 56/290 analyzed (17.9%)

Reason: Retrospective data analysis

How were missing data handled?

G: participants were included for analysis when they had at least 2 measures of the 2MWT

Length of follow-up (if applicable):

3 months

Loss-to-follow-up (if applicable):

See incomplete data

Was the distribution of the (total) scores in the study sample described?:

2MWT, meters (SD) at baseline: 27.9 (18.1)

2MWT, meters (SD) at discharge: 41.1 (28.5)

2MWT, meters (SD) at follow-up: 69.6 (40.9)

Not reported for SF-36 and Houghton scale.

Percentage of the sample with the lowest score possible:

Not reported

Percentage of the sample with the highest score possible:

Not reported

Minimally important change/difference determined or referred?

Not reported

Outcome measures and effect size (include 95%CI and p-value if available):

Validity

Construct validity (hypothesis testing):

2MWT and SF-36 at discharge

G: r=0.22

Significant correlation (p=0.008)

2MWT and SF-36 at follow-up

G: r=0.479

Significant correlation (p<0.001)

2MWT and Houghton scale at discharge

G: r=0.493

Significant correlation (p<0.001)

Responsiveness as a measurement property is not assessed; Change scores of the 2MWT and the other instruments are not analyzed.

Brooks 2002

Instruments assessed: 2 minute walking test

Setting and country: hospital, Canada

Sampling method:

Funding and conflicts of interest: no commercial party has direct conflict of interest in the results of this study.

Inclusion criteria:

Limb prosthesis, minimum of 2 weeks rehabilitation

Exclusion criteria:

prosthetic modifications planned, medical restrictions preventing participation, cognitive impairment, unable to give consent, poor motivation, unable to participate on two consecutive treatment days.

N total at baseline:

G: n=33

Sample characteristics¹:

Mean age ± SD:

G: 63.6 (2), range: 42-80

Sex (male/female):

G: 23M/ 10F

Comorbidities, n:

Peripheral vascular disease: 20

Diabetes: 11

Osteomyelitis: 1

Sarcoma: 1

Days since amputation (SD):

G: 107.8 (16.1), range: 25-365

Describe the assessed measurement properties and their procedures:

Reliability

Reliability, inter- and intra-rater reliability:

Participants were familiar with the 2MWT or were given a practice test 1 day before actual testing. Raters were blinded for each other’s score. Each subject performed 4 walks (2 on consecutive treatment days). Each rater assessed one test per day. There were 30 minutes between tests on a day to prevent fatigue. Tests were performed in a corridor >40m.

Incomplete outcome data:

Not reported

How were missing data handled?

Not reported.

Length of follow-up (if applicable):

30 minutes of rest in between tests, 2 consecutive testing days

Loss-to-follow-up (if applicable):

Not reported. Participants were excluded when not available for 2 consecutive testing days.

Was the distribution of the (total) scores in the study sample described?:

2MWT for inpatient transtibial amputees (n=27), meters (SD)

Day 1, rater 1: 50.1 (4.6)

Day 1, rater 2: 51.5 (4.2)

Day 2, rater 1: 57 (5)

Day 2, rater 2: 57.5 (5.2)

2MWT for outpatient transtibial amputees (n=6), meters (SD)

Day 1, rater 1: 121.1 (18.6)

Day 1, rater 2: 123.2 (15.5)

Day 2, rater 1: 137.9 (14.7)

Day 2, rater 2: 140.7 (15.9)

Percentage of the sample with the lowest score possible:

Percentage of the sample with the highest score possible:

Minimally important change/difference determined or referred?

Not reported

Outcome measures and effect size (include 95%CI and p-value if available):

Reliability

Interrater reliability for inpatients (n=27)

Day 1: ICC=0.98

Day 2: ICC=0.98

Interrater reliability for outpatients (n=6)

Day 1: ICC=0.98

Day 2: ICC=0.99

Intrarater reliability for inpatients (n=27)

Rater 1 (day 1,2): ICC=0.90

Rater 2 (day 1,2): ICC=0.94

Intrarater reliability for outpatients (n=6)

Rater 1 (day 1,2): ICC=0.95

Rater 2 (day 1,2): ICC=0.96

ICC model not reported

Cardoso 2019

Instruments assessed: Four Square Step Test

Setting and country: USA

Sampling method: Through advertising

Funding and conflicts of interest: none.

Inclusion criteria:

Unilateral lower limb amputation, age >18, current prosthesis use

Exclusion criteria:

Residual limb issue, receiving treatment, condition that could affect safe participation.

N total at baseline:

G: n=27

Sample characteristics¹:

Mean age ± SD:

G: 51 (21.2)

Sex (male/female):

G: 12 M/ 15 F

Type of amputation, n:

Hip disarticulation: 1

Transfemoral: 5

Knee disarticulation: 1

Transtibial: 20

Reason for amputation, n:

Trauma: 16

Dysvascular: 3

Other: 3

Cancer: 2

Infection: 2

Congenital: 1

Describe the assessed measurement properties and their procedures:

Reliability

Reliability (test-retest) and

Measurement error:

FSST was administered. Participants were asked to return 2-4 days after the first test was administered for a second test.

Incomplete outcome data:

Not reported

How were missing data handled?

Not reported

Length of follow-up (if applicable):

Breaks between tests were given when needed. 2-4 days between testing days.

Loss-to-follow-up (if applicable):

Not reported

Was the distribution of the (total) scores in the study sample described?

FSST, seconds (SD)

Best with canes (test 1): 8.95 (2.01)

Best with canes (test 2): 9.13 (3.95)

Best without canes (test 1): 9.45 (4.33)

Best without canes (test 2): 8.28 (2.85)

Average of 2 trials with canes (test 1): 9.06 (1.5)

Average of 2 trials with canes (test 2): 8.31 (1.36)

Average of 2 trials without canes (test 1): 8.59 (1.55)

Average of 2 trials without canes (test 2): 7.98 (1.64)

Percentage of the sample with the lowest score possible:

Percentage of the sample with the highest score possible:

Minimally important change/difference determined or referred?

Not reported

Outcome measures and effect size (include 95%CI and p-value if available):

Reliability

Reliability

FSST, ICC (95%CI)

Best with canes: ICC(3,1)=0.69 (0.4-0.85)

Best without canes: ICC(3,1)=0.86 (0.71-0.94)

Average of 2 trials with canes: ICC(3,k)=0.81 (0.45-0.93)

Average of 2 trials without canes: ICC(3,k)= 0.89 (0.72-0.96)

Measurement error

FSST, Standard Error of Measurement in seconds:

Best with canes: SEM=0.52

Best without canes: SEM=0.41

Average of 2 trials with canes: SEM=0.14

Average of 2 trials without canes: SEM=0.13

FSST, Minimal Detectable Change at 90% confidence in seconds (95%CI):

Best with canes: MDC₉₀=1.22 (0.83-1.62)

Best without canes: MDC₉₀=0.95 (0.49-1.41)

Average of 2 trials with canes: MDC₉₀=0.34 (0.20-0.48)

Average of 2 trials without canes: MDC₉₀=0.31 (0.15-0.47)

ICC model 3,1 and 3,k were used

Clemens 2018

Instruments assessed: TUG

Setting and country: Conference, USA

Sampling method: Convenience

Funding and conflicts of interest: The study was funded by the US Department of Defense/Veterans Affairs Joint Incentive Fund. The authors reported no COIs.

Inclusion criteria:

Age 18-80, non-vascular cause of amputation, unilateral transtibial / transfemoral / knee disarticulation amputation, well-fitting socket

Exclusion criteria:

Open wounds, bilateral amputation, requiring more than std care for independent transfers and ambulation.

N total at baseline:

G: n=118 total (n=51 for reliability, n=118 for validity)

Sample characteristics¹:

Mean age ± SD:

G: 48.1 (13.7)

Sex (male/female):

G: 64M/ 54F

Time since amputation, years (SD):

G: 10.2 (11.7)

Amputation level, n:

Transtibial: 55

Transfemoral: 63

Describe the assessed measurement properties and their procedures:

Validity

Construct validity (hypothesis testing)

Hypotheses: moderate negative correlations between TUG and self report measures of mobility and balance. Transtibial amputees would perform the test in less time than those with transfemoral amputations.

Self-report measures were administered before the performance based measures.

Self-report measures were: Prosthetic Limb Users Survey of Mobility (12-item, higher score is better), and the Activities-specific Balance Confidence Scale (16 activities, score range: 0-100%).

Reliability

Reliability (test-retest) and measurement error

TUG test was performed 4 times: 2 clockwise, 2 counter-clockwise. N=51 were used for the reliability results.

Incomplete outcome data:

Only n=51 were used for test-retest reliability

Reason: 67 other participants did not perform their last 2 trials at a self-selected speed.

How were missing data handled?

Excluded from analyses

Length of follow-up (if applicable):

1 minute rest between trials

Loss-to-follow-up (if applicable):

Was the distribution of the (total) scores in the study sample described?

Not reported for the whole group

Percentage of the sample with the lowest score possible:

Not reported for the whole group

Percentage of the sample with the highest score possible:

Not reported for the whole group

Minimally important change/difference determined or referred?

Not reported

Outcome measures and effect size (include 95%CI and p-value if available):

Validity

Construct validity (hypothesis testing)

TUG correlation with PLUS-M, Spearman’s rho: r= -0.56

Significant correlation (p<0.001)

TUG correlation with ABC, Spearman’s rho: r= -0.46

Significant correlation (p<0.01)

TUG total time, difference between transtibial (TTA, n=55) and transfemoral (TFA, n=63), seconds (SD):

TTA: 10.04 (2.3), range: 6.73-16.8

TFA: 12.77 (5.04), range: 7.83-33.07

Significant difference between groups (p<0.0001)

Reliability

Reliability (test-retest)

TUG total time, ICC (95%CI): ICC(2,1)=0.98 (0.97-0.99)

Unclear over which trials the ICC was calculated.

Measurement error

TUG, Standard Error of Measurement in seconds: SEM=0.55

TUG, Minimal detectable change at 90% confidence in seconds: MDC₉₀=1.28

ICC 2,1 was used

Not reported over which trials the test-retest reliability was calculated.

Cox 2017

Instruments assessed: 6 minute walking test (two course configurations)

Setting and country: Rehabilitation institute, UK

Sampling method: (probably) convenience

Funding and conflicts of interest: authors declared that there were no COIs

Inclusion criteria:

Age >18 years, admitted for initial prosthetic training after their first lower extremity amputation (either unilateral or bilateral transtibial/femoral), no cognitive or language deficit, ability to understand the study, ability to provide consent.

Exclusion criteria:

Unable or unwilling to comply with the study protocol, prosthetic/medical instability that made it unsafe to participate.

N total at baseline:

G: 25

Sample characteristics¹:

Mean age ± SD:

G: 63.1 (13.8)

Sex (male/female):

G: 18M/ 7F

Cause of amputation, n:

Diabetic: 9

Vascular: 7

Trauma: 4

Other: 5

BMI, mean kg/m² (SD):

G: 24.9 (4.3)

Length of stay in rehabilitation, mean days (SD):

G: 32.3 (11.1)

Describe the assessed measurement properties and their procedures:

Reliability

Reliability (intrarater) and measurement error

Two configurations were used.

Configuration 1: a 20 meter path, where participants had to do 180 degree turns (walking up and down).

Configuration 2: a 20 meter rectangular course (6x4x6x4 meter), where participants made 90 degree turns.

30 minutes of rest were given between trials. There were 2 trials per day and 2 testing days (4 trials in total). On both days both configurations were trialled once.

Participants completed the tests in the last 3 days before discharge.

Incomplete outcome data:

N=52 consented to participate. Only n=25 completed all testing on 2 days.

How were missing data handled?

Removed from analysis.

Length of follow-up (if applicable):

2 testing days (between day reliability) within the last 3 days of admission.

Loss-to-follow-up (if applicable):

N=27 did not complete all tests.

Was the distribution of the (total) scores in the study sample described?:

6MWT Configuration 1 (trial 1), mean meters (SD): 173 (70.1)

6MWT Configuration 1 (trial 2), mean meters (SD): 193.0 (75)

6MWT Configuration 2 (trial 1), mean meters (SD): 160.3 (72.5)

6MWT Configuration 2 (trial 2), mean meters (SD): 166.1 (73.5)

Percentage of the sample with the lowest score possible:

G: NA

Percentage of the sample with the highest score possible:

G: NA

Minimally important change/difference determined or referred? (yes/no)

Not reported

Outcome measures and effect size (include 95%CI and p-value if available):

Reliability

Reliability (test retest)

Configuration 1: ICC=0.97 (95%CI: 0.93-0.98)

Configuration 2: ICC=0.97 (95%CI: 0.94-0.99)

Measurement error

Configuration 1, Standard Error of Measurement, meter: SEM=12.6

Configuration 2, Standard Error of Measurement, meter: SEM=12.5

Configuration 1, Minimal detectable change at 95% confidence, meter: MDC₉₅=34.8

Configuration 2, Minimal detectable change at 95% confidence, meter: MDC₉₅=34.7

ICC model not reported

De Laat 2019

Instruments assessed: SIGAM/WAP

Setting and country: outpatient rehabilitation center and hospital department, Netherlands

Sampling method:

Funding and conflicts of interest: Authors declared that there were no COIs

Inclusion criteria:

Lower limb amputation, using prosthesis, age >18, able to understand and fill in questionnaires.

Exclusion criteria:

Not reported.

N total at baseline:

G: n=80

Sample characteristics¹:

Mean age ± SD:

G: 61 (15)

Sex (male/female):

G: 49M/31F

Cause of amputation, n:

Vascular: 61

Non-vascular: 19

Amputation extent, n:

Unilateral: 70

Bilateral: 10

Level of amputation, n:

Transfemoral or knee disarticulation: 26

Transtibial or Syme: 44

Describe the assessed measurement properties and their procedures:

Reliability

Reliability (test-retest, intrarater)

Assessment took place at the end of outpatient rehabilitation. Second assessment took place three weeks afterwards by the same physician or physiotherapist.

Participants also rated their ability to walk, compared to the first time they were assessed. The Global Rating of Change Questionnaire was used for this purpose.

Incomplete outcome data:

How were missing data handled?

Length of follow-up (if applicable):

3 weeks between test days

Loss-to-follow-up (if applicable):

Was the distribution of the (total) scores in the study sample described?:

Only the scores from the first assessment of stable patients were provided.

Percentage of the sample with the lowest score possible:

Not reported

Percentage of the sample with the highest score possible:

Not reported

Minimally important change/difference determined or referred?

Not reported

Outcome measures and effect size (include 95%CI and p-value if available):

Reliability

Reliability (test-retest)

Stable patients (rehabilitation center): ICC(2,1)=0.79 (95%CI: 0.56-0.90)

Stable patients (rehabilitation department hospital): ICC(2,1)=0.98 (95%CI: 0.95-0.99)

Stable patients (rehabilitation center and hospital combined): ICC(2,1)=0.90 (95%CI: 0.84-0.94)

Non-stable patients (rehabilitation center and hospital combined): ICC(2,1)=0.55 (95%CI: 0.24-0.76)

Two-way random effects ICC model (ICC2,1)

Test-retest reliability for non-stable participants is difficult to interpret, since generally you want to measure participants twice without a change in the construct.

Deathe 2005

Instruments assessed: L-test

Setting and country: Regional outpatient clinic, Canada

Sampling method: convenience

Funding and conflicts of interest: Financial support by Parkwood Hospital Foundation. The Canadian Institute of Health Research and the Michael Smith Foundation for Health research supported a postdoc salary of one of the researchers.

Inclusion criteria:

Age >19, unilateral transtibial or transfemoral amputation, amputation related to vascular or traumatic causes, had their prosthesis for a minimum of 6 months.

Exclusion criteria:

Unable to speak or read English or follow verbal instructions, did not complete the necessary scales and walk tests, had a prosthetic or medical problem, claudication in contralateral limb, active heart failure, unstable diabetes, COPD.

N total at baseline:

G: n=102

Sample characteristics for n=93 that completed all tests¹:

Mean age ± SD:

G: 66.9 (14.2)

Sex (male/female):

G: 73M/20F

Amputation level, n:

Transtibial: 69

Transfemoral: 24

Amputation cause, n:

Traumatic: 56

Vascular: 37

Describe the assessed measurement properties and their procedures:

Validity

Construct validity (hypothesis testing)

Hypotheses:

Positive correlation with TUG
Positive correlation with 10 meter WT
Negative correlation with 2 minute WT
Negative correlation with self-report scales: Activities specific balance confidence (ABC), Frenchay Activities Index (FAI), Prosthetic evaluation questionnaire (mobility subscale) (PEQ-MS)
Able to discriminate between transfemoral and transtibial
Able to discriminate between vascular and non-vascular cause of amputation
L-test higher among older individuals
L-test higher among individuals who used a mobility device
L-test higher among those who had to think about stepping

Subjects performed other walking tests (TUG, 10 meter WT, and the 2 minute WT) and completed self-report questionnaires (Activities specific balance confidence, Frenchay Activities Index, Prosthetic evaluation questionnaire (mobility subscale)).

N=93 for validity

Reliability

Reliability (inter-rater reliability) and measurement error

Participants performed three trials of the L-test on two time-points on a day. 2 weeks later, 3 trials of the L-test were performed. The 3rd trial was used for analysis. At least 2 minutes of rest were given between tests.

Intrarater: timepoint 1 versus 3 (n=27)

Interrater: timepoint 1 versus 2 (n=93)

Incomplete outcome data:

N=9 did not complete all of the tests for time points 1 and 2. N=75 did not return for the retest 2 weeks later.

How were missing data handled?

Exclusion from analysis

Length of follow-up (if applicable):

2 weeks between retest.

Loss-to-follow-up (if applicable):

N=75 did not return for the retest 2 weeks later.

Was the distribution of the (total) scores in the study sample described? (yes/no):

L-test for n=93, day 1 (timepoint 1), mean seconds (SD): 32.6 (14.9)

L-test for n=93, day 1 (timepoint 2), mean seconds (SD): 32.9 (16.8)

L-test for n=27 for the retest sample, day 2 (timepoint 3), mean seconds (SD): 29.7 (8)

Other scores were not reported.

Percentage of the sample with the lowest score possible:

Percentage of the sample with the highest score possible:

Minimally important change/difference determined or referred?

Not reported

Outcome measures and effect size (include 95%CI and p-value if available):

Validity

Construct validity (hypothesis testing)

L-test correlation with other measures, Pearson’s r:

TUG: r=0.93 (p=0.00)

2 min WT: r= -0.86 (p=0.00)

10 meter WT: r=0.97 (p=0.00)

ABC: r= -0.48 (p=0.00)

FAI: r= -0.54 (p=0.00)

PEQ-MS: r= -0.22 (p=0.04)

L-test scores differences between groups, seconds (SD):

Transtibial: 29.6 (12.8)

Transfemoral: 41.7 (16.8)

Significant difference between groups (P<0.001)

Traumatic: 26.4 (7.8)

Vascular: 42.0 (17.8)

Significant difference between groups (P<0.001)

Walking aid used:

No: 25.5 (6.4)

Yes: 43.3 (17.5)

Significant difference between groups (P<0.001)

Autowalk

Yes: 30 (12.1)

No: 44.5 (17.5)

Significant difference between groups (P<0.001)

Age:

Under 55: 25.4 (6.8)

55 or over: 39.7 (17.1)

Reliability

Reliability

Intrarater reliability L-test, time point 1 versus 3: ICC(2,1)=0.97 (95%CI: 0.93-0.98)

Interrater reliability L-test, time point 1 versus 2: ICC(2,2)=0.96 (95%CI: 0.94-0.97)

Measurement error

L-test Standard Error of measurement (SEM) at timepoint 1 versus 2, seconds: SEM=3.0

ICC(2,1) (intrarater) and ICC(2,2) (interrater) were used.
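For readers less familiar with the ICC notation used throughout this table, the following is a minimal sketch of how ICC(2,1) (two-way random effects, absolute agreement, single measurement) can be computed from a participants-by-raters matrix. The function name `icc_2_1` and the ratings are illustrative assumptions; they are not data or code from any of the included studies.

```python
from statistics import mean


def icc_2_1(ratings):
    """Two-way random-effects, absolute-agreement, single-measurement ICC.

    `ratings` holds one row per participant and one column per rater
    (or per test session). Hypothetical helper for illustration only.
    """
    n = len(ratings)       # number of participants
    k = len(ratings[0])    # number of raters / sessions
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative ratings: 6 participants scored by 4 raters (not study data).
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(f"ICC(2,1) = {icc_2_1(example_ratings):.2f}")
```

Because ICC(2,1) penalises systematic differences between raters, it is lower than a consistency-type ICC when one rater scores consistently higher or lower than another.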

Gailey 2002

Instruments assessed: AMPPro

Setting and country: hospital, rehabilitation centre, extended-care facilities, patient support groups, USA

Sampling method: convenience

Funding and conflicts of interest:

Inclusion criteria:

Medically stable, able to follow basic instructions, able to perform activities without risk, had appropriately fitted prosthesis, pain free at time of testing.

Exclusion criteria:

Mental deterioration, advanced neurologic disorder, congestive heart failure / angina pectoris / COPD, ulcers or infections, irreducible pronounced knee or hip flexion contractures

Note: Bilateral amputations were included for the reliability study, but excluded for the validity study

N total at baseline:

G: n=191 (n=167 for validity, n=24 for reliability)

Sample characteristics¹:

Mean age ± SD (or median age (range)):

Reliability: 68.3 (17.98)

Validity: -

Sex (male/female):

Reliability: 10M/14F

Validity: 86M/81F

Amputation characteristics (reliability), n

Unilateral transtibial: 10

Unilateral transfemoral: 8

Bilateral: 6

Amputation cause (validity), n:

Disease: 76

Trauma: 61

Tumor: 24

Congenital: 6

Amputation level (validity), n:

Ankle disarticulation: 2

Transtibial: 82

Knee disarticulation: 7

Transfemoral: 67

Hip disarticulation: 7

Transpelvic: 2

Describe the assessed measurement properties and their procedures:

Validity

Construct validity (hypothesis testing)

AMP should discriminate between MFCL levels

AAS was administered orally during the initial interview. Participants were assigned an MFCL level. AMPPro was conducted. Thereafter the 6MWT was performed by the participants.

Reliability

Reliability (test retest: inter- and intrarater)

Participants were asked to perform each of the 20 items of the AMPPro. Subjects performed the test at a self-selected pace to avoid fatigue. The trials on day 1 and day 2 were each rated independently by 2 raters. The trial on day 2 was scheduled within 21 days of the trial on day 1. All participants were considered functionally stable since they had long since finished their rehabilitation.

Incomplete outcome data:

Validity: NA

Reliability: Only n=27 had complete data from 2 testing days (it was stated that n=24 participated in the study)

How were missing data handled?

Excluded from analysis

Length of follow-up (if applicable):

Within 21 days (reliability)

Loss-to-follow-up (if applicable):

Reliability: Only n=27 had complete data from 2 testing days (it was stated that n=24 participated in the study)

Was the distribution of the (total) scores in the study sample described? (yes/no):

Mean AMPPro score (SD) in MFCL groups:

MFCL 0-1: 25 (7.37)

MFCL 2: 34.65 (6.49)

MFCL 3: 40.5 (3.9)

MFCL 4: 44.67 (1.75)

Significant difference between all groups (P=0.0001)

Mean 6MWT walking distance in meters (SD) in MFCL groups:

MFCL 0-1: 49.86 (29.82)

MFCL 2: 189.9 (111.30)

MFCL 3: 298.64 (102.37)

MFCL 4: 419.46 (86.15)

Significant difference between all groups (P=0.0001)

Mean AAS score (SD) in MFCL groups:

MFCL 0-1: -35.5 (25.19)

MFCL 2: -7.51 (27.47)

MFCL 3: 11.28 (20.29)

MFCL 4: 27.77 (14.06)

Significant difference between all groups (P=0.0001)

Percentage of the sample with the lowest score possible:

Not reported

Percentage of the sample with the highest score possible:

Not reported

Outcome measures and effect size (include 95%CI and p-value if available):

Validity

Construct validity (hypothesis testing)

Mean AMPPro score (SD) in MFCL groups:

MFCL 0-1: 25 (7.37)

MFCL 2: 34.65 (6.49)

MFCL 3: 40.5 (3.9)

MFCL 4: 44.67 (1.75)

Significant difference between all groups (P=0.0001)

AMPPro correlation with 6MWT, Pearson’s r: r=0.82 (p<0.0001)

AMPPro correlation with AAS, Pearson’s r: r=0.77 (p<0.0001)

Reliability

Reliability

AMPPro interrater reliability, day 1: ICC=0.99

AMPPro interrater reliability, day 2: ICC=0.99

AMPPro intrarater reliability, rater 1 (day1-day2): ICC=0.96

AMPPro intrarater reliability, rater 2 (day1-day2): ICC=0.98

ICC model not reported

Hunter 2018

Instruments assessed: L-test

Setting and country: outpatient amputee clinic, Canada

Sampling method: probably convenience (sampling took place in an outpatient clinic)

Funding and conflicts of interest: No COIs declared by the authors. Supported by a Frederick Banting and Charles Best Canada Graduate Scholarships development fund from the Faculty of Health Sciences (University of Western Ontario).

Inclusion criteria:

Age 18 or older, functional use of the English language, lower extremity amputation (transtibial (vascular or non-vascular) / complex amputations (transfemoral or bilateral)), using their prosthesis in the community for walking, using the prosthesis for at least 6 months.

Exclusion criteria:

Physical problem that limited ambulation; not having a prosthesis.

N total at baseline:

TTA(vasc): 20

TTA(non-vasc): 20

Complex: 20

Sample characteristics¹:

Mean age ±SD:

TTA(vasc): 60.36 (7.84)

TTA(non-vasc): 55.85 (14.08)

Complex: 58.21 (14.88)

Sex, n male:

TTA(vasc): 18

TTA(non-vasc): 17

Complex: 13

Time since amputation, months (SD):

TTA(vasc): 3.49 (3.61)

TTA(non-vasc): 20.43 (17.61)

Complex: 15.55 (15.43)

Describe the assessed measurement properties and their procedures:

Reliability

Reliability (test-retest: intrarater) and measurement error:

One rater performed all assessments.

Participants completed the L-test. Standardized instructions were given.

Incomplete outcome data:

N=8 (8/68) could not return for a retest. No other incomplete data seems to be reported.

Reason: no available rides, illness, scheduling issues, other.

How were missing data handled?

Excluded from analyses

Length of follow-up (if applicable):

Retest within 14 days.

Loss-to-follow-up (if applicable):

N=8 (8/68) could not return for a retest

Reason: no available rides, illness, scheduling issues, other.

Was the distribution of the (total) scores in the study sample described? (yes/no):

L-test, trial 1, seconds (SD):

TTA(vasc): 31.31 (7.30)

TTA(non-vasc): 23.49 (3.56)

Complex: 36.18 (19.88)

Percentage of the sample with the lowest score possible:

Percentage of the sample with the highest score possible:

Outcome measures and effect size (include 95%CI and p-value if available):

Reliability

Reliability

L-test ICC (95%CI):

TTA(vasc): ICC = 0.97 (0.89-0.99)

TTA(non-vasc): ICC = 0.95 (0.80-0.98)

Complex: ICC = 0.997 (0.993-0.999)

Measurement error

L-test, Standard Error of Measurement, seconds:

TTA(vasc): 1.15

TTA(non-vasc): 0.77

Complex: 1.07

L-test, Smallest Detectable Change at 95% confidence, seconds:

TTA(vasc): 3.19

TTA(non-vasc): 2.15

Complex: 2.98
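The relation between the two measurement-error statistics above can be sketched as follows. This assumes the commonly used convention SDC95 = 1.96 × √2 × SEM; the source table does not state which formula the authors applied, and the function name is a hypothetical helper. Small deviations from the reported values are to be expected because the SEM is given to two decimals only.

```python
import math


def smallest_detectable_change(sem, z=1.96):
    """SDC for a difference between two measurements, given the SEM.

    z=1.96 corresponds to 95% confidence; z=1.645 would give a 90% level.
    The formula is an assumed convention, not taken from the source.
    """
    return z * math.sqrt(2) * sem


# SEM values (seconds) as reported for the L-test in this table.
reported_sem = {"TTA(vasc)": 1.15, "TTA(non-vasc)": 0.77, "Complex": 1.07}

sdc95 = {
    group: round(smallest_detectable_change(sem), 2)
    for group, sem in reported_sem.items()
}
for group, value in sdc95.items():
    print(f"{group}: SDC95 is approximately {value} s")
```

In practice, a change in L-test time smaller than the SDC95 for the relevant subgroup cannot be distinguished from measurement error in an individual patient.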

The single-task L-test is the original L-test.

ICC model unclear (ICC for single measure)

Bland-Altman plots were reported only for the dual-task L-test.

Lin 2008

Instruments assessed: 6-minute walk test (6MWT)

Setting and country: USA

Sampling method: recruitment at support group and local advertising

Funding and conflicts of interest: Supported by the Texas Physical Therapy Foundation

Inclusion criteria:

Transtibial amputation, walking independently, absence of skin breakdown of the residual limb in past 3 months, well-controlled medical conditions

Exclusion criteria:

Use of assistive devices, recent illness, hospital admissions

N total at baseline:

G: 13

Sample characteristics¹:

Mean age ± SD:

G: 46 (14.8)

Sex (male/female):

G: 9M/ 4F

Cause of amputation, n:

Trauma: 9

Vascular: 4

Prosthesis experience, years (SD):

G: 7.61 (9.25)

Describe the assessed measurement properties and their procedures:

Validity

Construct validity (hypothesis testing)

Participants who had a longer one-leg balance time would walk a longer distance in the 6MWT
Participants who had a faster time in the TUG test would walk a longer distance in the 6MWT

Three trials on one day. A second session was scheduled within 2 weeks to perform the TUG and single leg balance test.

Reliability

Reliability (within day test-retest of 3 trials, inter- or intrarater)

Three trials within one day

Incomplete outcome data:

For TUG and single leg balance test:

N=1 drop-out

Reason: one person dropped out during testing.

How were missing data handled?

Excluded from analyses

Length of follow-up (if applicable):

Within day reliability (20 min rest between trials). For validity: within 2 weeks

Loss-to-follow-up (if applicable):

None.

Was the distribution of the (total) scores in the study sample described? (yes/no):

6MWT, meters (SD):

Trial 1: 544.6 (64.5)

Trial 2: 554 (71.4)

Trial 3: 570 (80.1)

TUG, seconds:

Can only be approximated from Figure 3

Single leg balance test, mean seconds (SD):

Eyes open (sound leg): 24.77 (9.82)

Eyes open (prosthetic leg): 2.26 (1.06)

Eyes closed (sound leg): 9.33 (0.42)

Eyes closed (prosthetic leg): 1.10 (0.37)

Percentage of the sample with the lowest score possible:

Not reported

Percentage of the sample with the highest score possible:

Not reported (30 sec was the ceiling for the single leg balance test)

Outcome measures and effect size (include 95%CI and p-value if available):

Validity

Construct validity (hypothesis testing)

Correlation between TUG and 6MWT: r= -0.76 (p=0.004)

Correlation between 6MWT and mean score of the single leg balance test (prosthetic leg, eyes open): r = 0.63 (p-value not reported)

Correlation between 6MWT and mean score of the single leg balance test (prosthetic leg, eyes closed): r = 0.61 (p-value not reported)

Reliability

Reliability (within day)

6MWT, ICC of 3 trials: ICC(3,1) = 0.97

Measurement error

6MWT Bland-Altman (trial 2 minus trial 1), meter:

LoA = -44.6 to 63.5

Mean = 9.45 (95%CI includes 0)

6MWT Bland-Altman (trial 3 minus trial 2), meter:

LoA = -25.8 to 57.8

Mean = 16 (95%CI does not include 0)
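The Bland-Altman figures above can be read with a short sketch. It assumes the limits of agreement (LoA) were computed in the usual way as mean difference ± 1.96 × SD of the differences, which the source table does not state explicitly; the function name is hypothetical. Under that assumption the midpoint of the LoA reproduces the reported mean difference, and the width gives the implied SD of the between-trial differences.

```python
def summarize_limits_of_agreement(lower, upper, z=1.96):
    """Recover mean difference and SD of differences from reported LoA.

    Assumes the limits were computed as mean_diff -/+ z * sd_diff.
    """
    mean_diff = (lower + upper) / 2
    sd_diff = (upper - lower) / (2 * z)
    return mean_diff, sd_diff


# Limits of agreement reported for the 6MWT (meters).
trial2_vs_1 = summarize_limits_of_agreement(-44.6, 63.5)
trial3_vs_2 = summarize_limits_of_agreement(-25.8, 57.8)

for label, (mean_diff, sd_diff) in [
    ("trial 2 minus trial 1", trial2_vs_1),
    ("trial 3 minus trial 2", trial3_vs_2),
]:
    print(f"{label}: mean difference {mean_diff:.2f} m, "
          f"implied SD of differences {sd_diff:.1f} m")
```

A positive mean difference whose confidence interval excludes 0, as between trials 2 and 3, points to a systematic increase in walking distance across trials rather than random error alone.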

ICC(3,1) was used.

Major 2013

Instruments assessed: Berg Balance Scale

Setting and country: research lab

Sampling method: Unclear

Funding and conflicts of interest: Supported by National Institutes of Health, National Institute on Disability and Rehabilitation Research, and the David Rubin Enrichment Fund

Inclusion criteria:

Uni or bilateral amputation proximal to the ankle, used a prosthesis for ambulation, no upper-extremity amputation, residual limb in good condition

Exclusion criteria:

Not reported

N total at baseline:

G: n=30

Sample characteristics¹:

Mean age ± SD:

G: 54 (12)

Sex (male/female):

G: 20M/ 10F

Prosthesis experience, years (SD):

G: 18 (14)

Amputation cause, n:

Dysvascular: 7

Traumatic: 14

Infection: 6

Congenital: 3

Amputation level, n:

Uni transtibial: 13

Uni transfemoral: 14

Bi transtibial: 2

Bi transtibial / transfemoral: 1

Describe the assessed measurement properties and their procedures:

Validity

Construct validity (hypothesis testing)

Persons with fear of falling would perform worse on the BBS
Persons with unilateral transfemoral amputation would perform worse on the BBS than persons with unilateral transtibial amputation
Persons with dysvascular amputation would perform worse on the BBS
Persons using a mobility aid (daily) would perform worse on the BBS
Persons experiencing multiple falls would perform worse on the BBS
BBS would share a positive monotonic relationship with the ABC Scale, PEQ-MS, FAI, and 2MWT and a negative relationship with the L-test

Participants filled in questionnaires (Activities-specific Balance Confidence scale, Prosthesis Evaluation Questionnaire-Mobility Subscale, Frenchay Activities Index) and did performance tests (BBS, L-test, 2MWT). There was 5 minutes of rest between performance tests. The ratings of rater 1 on the BBS were used for validity.

Reliability

Internal consistency and reliability (inter-rater)

BBS was performed by rater 1. Participants received 20 minutes of seated rest. BBS was performed again by rater 2.

Incomplete outcome data:

1 datapoint is missing on self-reported falls

Reason: unclear

How were missing data handled?

Length of follow-up (if applicable):

NA. All tests were performed on 1 day. (5 minutes rest between performance tests, 20 minutes seated rest for reliability testing of BBS)

Loss-to-follow-up (if applicable):

1 datapoint is missing on self-reported falls

Reason: unclear

Was the distribution of the (total) scores in the study sample described? (yes/no):

ABC Scale, mean score (SD): 79 (20)

PEQ-MS mean score (SD): 8 (2)

FAI, mean score (SD): 33 (7)

2MWT, mean meters (SD): 113.2 (39.8)

L-test, mean seconds (SD): 34.5 (19.3)

BBS, mean score (SD): 51 (5)

Percentage of the sample with the lowest score possible:

G: 0% (lowest score in the sample was 33 points on the BBS)

Percentage of the sample with the highest score possible:

G: 10% reached the maximum score on the BBS (56 points on BBS was the ceiling)

Outcome measures and effect size (include 95%CI and p-value if available):

Validity

Construct validity (hypothesis testing)

Rank correlation between ABC scale and BBS: r=0.634 (p<0.001)

Rank correlation between PEQ-MS scale and BBS: r=0.584 (p=0.001)

Rank correlation between FAI and BBS: r=0.607 (p<0.001)

Rank correlation between 2MWT and BBS: r=0.675 (p<0.001)

Rank correlation between L-test and BBS: r= -0.802 (p<0.001)

BBS score for group “self-reported fear of falling”, median (IQR):

Yes (n=10): 49 (46-52)

No (n=20): 53 (50-55)

Significant difference between groups (p=0.008)

BBS score for group “unilateral amputation level”, median (IQR):

Transtibial (n=13): 53 (49-55)

Transfemoral (n=14): 52 (49-54)

No significant difference between groups (p=0.325)

BBS score for group “cause”, median (IQR):

Dysvascular (n=7): 48 (45-52)

Other (n=23): 53 (50-55)

No significant differences between groups (p=0.061)

BBS score for group “daily use of mobility aid”, median (IQR):

Yes (n=12): 49 (45-50)

No (n=18): 54 (52-55)

Significant difference between groups (p<0.001)

BBS score for group “self-reported 2 or more falls in past 12 months”, median (IQR):

Yes (n=7): 50 (49-43)

No (n=22): 53 (49-55)

No significant difference between groups (p=0.381)

Reliability

Internal consistency

BBS, Cronbach’s alpha for rater 1: α = 0.827

BBS, Cronbach’s alpha for rater 2: α = 0.826

Reliability (test-retest, interrater):

BBS (within-day), ICC: ICC(2,1) =0.945

From the manuscript: “It was anticipated that the BBS would share a positive monotonic relationship with the ABC Scale, PEQ-MS, FAI, and 2MWT and a negative relationship with the LTest.”

ICC model 2,1 was used.

Cronbach’s alpha was used for internal consistency
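As a reminder of what the internal-consistency figure represents, here is a minimal sketch of Cronbach's alpha for a participants-by-items score matrix. The function name and the example scores are illustrative assumptions (three items on a 0 to 4 scale, whereas the BBS has 14 such items); they are not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for rows = participants, columns = items.

    Uses sample variances; hypothetical helper for illustration only.
    """
    k = len(scores[0])  # number of items
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Illustrative item scores for 5 participants on 3 items (not study data).
example_scores = [
    [4, 3, 4],
    [2, 2, 3],
    [3, 3, 2],
    [1, 1, 1],
    [4, 4, 4],
]
alpha = cronbach_alpha(example_scores)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Alpha rises when items vary together across participants, so a value computed in a sample with a narrow score range or with several participants at the ceiling should be interpreted with some caution.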

Spearman’s rank correlation was used.

Resnik 2011

Instruments assessed: 2MWT, 6MWT, TUG, AMP(Pro)

Setting and country: Multi-site (medical and non-medical centers), USA

Sampling method: convenience (through information packages, posters, brochures and/or clinical referral)

Funding and conflicts of interest: Supported by Veteran Affairs VA RR&D A3772C and DOD W81XWH-07-1-0689VA, CoI not reported.

Inclusion criteria:

Current prosthesis users (unilateral), limb loss at least 2 years prior, medically stable, using current prosthesis for at least 6 months, not participated in a rehabilitation program for at least 6 months.

Exclusion criteria:

Any condition that prevented prosthetic fitting and use, unable to ambulate independently for 3m, admitted to a hospital in previous 3 months.

N total at baseline:

G: 44

Sample characteristics¹:

Mean age (SD):

G: 66 (13)

Sex (male/female):

G: 42M/2F

Type of unilateral amputation, n (%):

TFA: 23 (52.3%)

KD: 2 (4.5%)

TTA: 19 (43.2%)

Mean weight, lb (SD):

G: 173lb (23)

Describe the assessed measurement properties and their procedures:

Reliability

Reliability (intrarater)

Each therapist rated the first and second tests (intrarater). Retests were performed within 1 week.

The 2MWT was measured during the 6MWT, i.e. the distance covered at 2 minutes during the 6MWT was the participant’s 2MWT score.

Incomplete outcome data:

No incomplete data reported

How were missing data handled?

No missing data reported

Length of follow-up (if applicable):

Retest within 1 week (actual time between test-retest not reported)

Loss-to-follow-up (if applicable):

No loss to follow up reported.

Was the distribution of the (total) scores in the study sample described? (yes/no):

2MWT, mean distance in meters (SD):

Test 1: 114 (36)

Test 2: 121 (37)

6MWT, mean distance in meters (SD):

Test 1: 332 (115)

Test 2: 344 (121)

TUG, mean time in seconds (SD):

Test 1: 12.3 (4.5)

Test 2: 13.0 (5.6)

AMP, mean score (SD):

Test 1: 40 (4)

Test 2: 41 (4)

Percentage of the sample with the lowest score possible:

Not reported for the performance measures

Percentage of the sample with the highest score possible:

Not reported for the performance measures

Outcome measures and effect size (include 95%CI and p-value if available):

Reliability

Reliability

2MWT, intrarater reliability, ICC (95%CI):

ICC(2,1) = 0.83 (0.71-0.90)

6MWT, intrarater reliability, ICC (95%CI):

ICC(2,1) = 0.97 (0.95-0.99)

TUG, intrarater reliability, ICC (95%CI):

ICC(2,1) = 0.88 (0.80-0.94)

AMP, intrarater reliability, ICC (95%CI):

ICC(2,1) = 0.88 (0.79-0.93)

Measurement error

2MWT, Standard error of Measurement (SEM), meters:

SEM = 48.5

6MWT, Standard error of Measurement (SEM), meters:

SEM = 63.6

TUG, Standard error of Measurement (SEM), seconds:

SEM = 1.6

AMP, Standard error of Measurement (SEM), score:

SEM = 1.5

2MWT, Minimal Detectable Change at 90% confidence (MDC), meters:

MDC₉₀ = 112.5

6MWT, Minimal Detectable Change at 90% confidence (MDC), meters:

MDC₉₀ = 147.5

TUG, Minimal Detectable Change at 90% confidence (MDC), seconds:

MDC₉₀ = 3.6

AMP, Minimal Detectable Change at 90% confidence (MDC), points:

MDC₉₀ = 3.4

AMP was conducted, without a description of whether this was the Pro or noPro version. However, it seems from the procedures that the Pro version was tested.

Test-retest reliability analyzed with RM-ANOVA and ICC(2,1)

Rommers 2008

Instruments assessed: SIGAM/WAP Mobility Scale

Setting and country: Rehabilitation centres, Netherlands

Sampling method:

Funding and conflicts of interest: Not Reported

Inclusion criteria:

Exclusion criteria:

N total at baseline:

G: n=20

Sample characteristics¹:

Not reported. N=20 had transtibial, knee disarticulation and transfemoral amputations. Different age groups were included in this sample as well.

Describe the assessed measurement properties and their procedures:

Reliability

Reliability (test retest: interrater)

20 patients were assessed by 2 raters.

2 patient cases were provided to 120 raters.

Incomplete outcome data:

Not reported

How were missing data handled?

Length of follow-up (if applicable):

Loss-to-follow-up (if applicable):

Not reported

Was the distribution of the (total) scores in the study sample described? (yes/no):

Not reported

Percentage of the sample with the lowest score possible:

Not reported

Percentage of the sample with the highest score possible:

Not reported

Outcome measures and effect size (include 95%CI and p-value if available):

Reliability

Reliability

Two raters agreed 100% in all 20 participants assessed

118/120 raters agreed 100% in 2 patient cases. Two disagreements were resolved after discussion, after which 120/120 raters agreed 100% in 2 patient cases.

No statistical calculations were performed for reliability

Schoppen 1999

Instruments assessed: Timed Up and Go test

Setting and country: Home setting, Netherlands

Sampling method: recruited from an orthopaedic workshop

Funding and conflicts of interest: Supported by the Health Science Promotion Program

Inclusion criteria:

Unilateral transtibial or transfemoral amputation, vascular cause of amputation, age>60, able to walk without aids >6 meters.

Exclusion criteria:

Not reported.

N total at baseline:

G: n=32

Sample characteristics¹:

Mean age (range):

Transtibial: 73.5 (61-86)

Transfemoral: 72.4 (68-81)

Sex (male/female):

G: 23M/ 9F

Disease characteristics (e.g. severity/status/ duration):

Describe the assessed measurement properties and their procedures:

Validity

Construct validity (hypothesis testing)

TUG test has a moderate correlation with the Groningen Activity Restriction Scale
There is only a correlation with the physical mobility subscore of the Sickness Impact Profile-68

Reliability

Internal consistency

Reliability (test-retest, inter- and intrarater)

For interrater: The test was performed at 2 different times on the same day with 5-10 minutes in between

For intrarater: The test was performed with 1 rater and with 2 weeks interval between tests.

Incomplete outcome data:

Test-retest (interrater): 1 person did not complete the retest

How were missing data handled?

Excluded from analysis (1 person for interrater reliability)

Length of follow-up (if applicable):

2 weeks for intrarater

5-10 minutes for interrater

Loss-to-follow-up (if applicable):

Test-retest (interrater): 1 person did not complete the retest

Was the distribution of the (total) scores in the study sample described? (yes/no):

TUG, seconds (range): 24.5 (9-102)

Percentage of the sample with the lowest score possible:

Not reported

Percentage of the sample with the highest score possible:

Not reported

Outcome measures and effect size (include 95%CI and p-value if available):

Validity

Construct validity (hypothesis testing)

Spearman’s rank correlation between TUG and GARS: r=0.39 (p=0.03)

Spearman’s rank correlation between TUG and SIP68 total score: r=0.40 (significant, p-value not reported)

Spearman’s rank correlation between TUG and SIP68 subscale “mobility control”: r=0.46 (significant, p-value not reported)

Spearman’s rank correlation between TUG and SIP68 subscale “mobility range”: r=0.36 (significant, p-value not reported)

Spearman’s rank correlation between TUG and SIP68 subscale “Somatic Autonomy”: r=0.28 (not significant, p-value not reported)

Spearman’s rank correlation between TUG and SIP68 subscale “Psychic autonomy”: r=0.31 (not significant, p-value not reported)

Spearman’s rank correlation between TUG and SIP68 subscale “Social behaviour”: r=0.19 (not significant, p-value not reported)

Spearman’s rank correlation between TUG and SIP68 subscale “emotional stability”: r= -0.04 (not significant, p-value not reported)

Reliability

Reliability

Intrarater reliability, Spearman’s rank: r=0.93 (p<0.001)

Difference between mean scores of the rater was significant (p=0.047)

Interrater reliability, Spearman’s rank: r=0.96 (p<0.001)

No difference between the mean scores of two raters (p=0.31)

Spearman’s rho correlation was used for reliability

Wong 2014

Instruments assessed: Berg Balance Scale

Setting and country:

Sampling method:

Funding and conflicts of interest: Supported by National Center for Injury Prevention and Control, Centers for Disease Control and Prevention, to the Center for Injury Epidemiology and Prevention at Columbia University

Inclusion criteria:

Community dwelling, any level or cause of amputation, bilateral or unilateral

Exclusion criteria:

Medical issues affecting balance

N total at baseline:

G: n =5 (5 patients were rated by 16 raters)

Sample characteristics¹:

Mean age ± SD:

G: 53 (15.7)

Sex (male/female):

G: 4M/1F

Time since amputation, years (SD):

G: 8.2 (7.9)

Describe the assessed measurement properties and their procedures:

Reliability

Reliability (test-retest, inter- and intrarater)

Two raters scored the initial performance of the BBS and later the video recording (intrarater). The trial was video recorded and scored by 16 raters (interrater).

Incomplete outcome data:

None reported

How were missing data handled?

None reported

Length of follow-up (if applicable):

NA (1 trial was video recorded and assessed by 16 raters)

Loss-to-follow-up (if applicable):

Was the distribution of the (total) scores in the study sample described? (yes/no):

BBS score:

Participant 1: 4

Participant 2: 42

Participant 3: 53

Participant 4: 56

Participant 5: 56

Percentage of the sample with the lowest score possible:

None.

Percentage of the sample with the highest score possible:

2/5 persons had the maximum of 56 points

Outcome measures and effect size (include 95%CI and p-value if available):

Reliability

Reliability

Interrater reliability, BBS total score (16 raters): ICC(2,k)=0.99 (95%CI: 0.99-1.00)

Interrater reliability, BBS item 1 (16 raters): ICC(2,k)=0.823 (95%CI: 0.601-0.975)

Interrater reliability, BBS item 2 (16 raters): ICC(2,k)=0.996 (95%CI: 0.988-1.00)

Interrater reliability, BBS item 3 (16 raters): ICC(2,k)=0.075 (95%CI: -0.015-0.532)

Interrater reliability, BBS item 4 (16 raters): ICC(2,k)=0.721 (95%CI: 0.45-0.957)

Interrater reliability, BBS item 5 (16 raters): ICC(2,k)=0.834 (95%CI: 0.62-0.977)

Interrater reliability, BBS item 6 (16 raters): ICC(2,k)=0.981 (95%CI: 0.946-0.998)

Interrater reliability, BBS item 7 (16 raters): ICC(2,k)=0.902 (95%CI: 0.752-0.987)

Interrater reliability, BBS item 8 (16 raters): ICC(2,k)=0.946 (95%CI: 0.853-0.993)

Interrater reliability, BBS item 9 (16 raters): ICC(2,k)=0.871 (95%CI: 0.689-0.983)

Interrater reliability, BBS item 10 (16 raters): ICC(2,k)=0.969 (95%CI: 0.911-0.996)

Interrater reliability, BBS item 11 (16 raters): ICC(2,k)=0.901 (95%CI: 0.749-0.987)

Interrater reliability, BBS item 12 (16 raters): ICC(2,k)=0.912 (95%CI: 0.772-0.989)

Interrater reliability, BBS item 13 (16 raters): ICC(2,k)=0.867 (95%CI: 0.680-0.982)

Interrater reliability, BBS item 14 (16 raters): ICC(2,k)=0.976 (95%CI: 0.982)

Intrarater reliability (5 patients): ICC(2,k)=0.99 (95%CI: 0.96-1.00)

ICC model 2,k was used

¹ Mokkink, L. B., Terwee, C. B., Patrick, D. L., Alonso, J., Stratford, P. W., Knol, D. L., Bouter, L. M., … de Vet, H. C. (2010). The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation, 19(4), 539-49.

COSMIN Risk of Bias

Hypothesis testing for construct validity
Author: Brooks 2001
Instrument: 2MWT
	Very Good	Adequate	Doubtful	Inadequate	NA
Convergent validity
Is it clear what the comparator instrument(s) measure(s)?	Constructs measured by the comparator instrument(s) is clear			Constructs measured by the comparator instrument(s) is not clear
Were the measurement properties of the comparator instrument(s) sufficient?	Sufficient measurement properties of the comparator instrument(s) in a population similar to the study population	Sufficient measurement properties of the comparator instrument(s) but not sure if these apply to the study population	Some information on measurement properties of the comparator instrument(s) in any study population	No information on the measurement properties of the comparator instrument(s), OR evidence for insufficient measurement properties of the comparator instrument(s)
Was the statistical method appropriate for the hypotheses to be tested?	Statistical method was appropriate	Assumable that statistical method was appropriate	Statistical method applied NOT optimal	Statistical method applied NOT appropriate
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws (e.g. only data presented on a comparison with an instrument that measures another construct)	Other important methodological flaws

Reliability
Author: Brooks 2002
Instrument: 2MWT
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Evidence provided that patients were stable	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? e.g. type of administration, environment, instructions	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was an intraclass correlation coefficient (ICC) calculated?	ICC calculated and model or formula of the ICC is described	ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred	Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred	No ICC or Pearson or Spearman correlations calculated	Not applicable
For dichotomous/nominal/ordinal scores: Was kappa calculated?	Kappa calculated			No kappa calculated	Not applicable
For ordinal scores: Was a weighted kappa calculated?	Weighted Kappa calculated		Unweighted Kappa calculated or not described		Not applicable
For ordinal scores: Was the weighting scheme described? e.g. linear, quadratic	Weighting scheme described	Weighting scheme NOT described			Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Reliability
Author: Cardoso 2019
Instrument: Four Square Step Test
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Evidence provided that patients were stable	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? e.g. type of administration, environment, instructions	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was an intraclass correlation coefficient (ICC) calculated?	ICC calculated and model or formula of the ICC is described	ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred	Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred	No ICC or Pearson or Spearman correlations calculated	Not applicable
For dichotomous/nominal/ordinal scores: Was kappa calculated?	Kappa calculated			No kappa calculated	Not applicable
For ordinal scores: Was a weighted kappa calculated?	Weighted Kappa calculated		Unweighted Kappa calculated or not described		Not applicable
For ordinal scores: Was the weighting scheme described? e.g. linear, quadratic	Weighting scheme described	Weighting scheme NOT described			Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws
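
The two ordinal-score items above ask whether a weighted kappa was calculated and whether the weighting scheme (linear or quadratic) was described. As a minimal sketch of what that choice changes, the following computes Cohen's weighted kappa for two measurement occasions using disagreement weights. The function name `weighted_kappa` and the example ratings are illustrative assumptions and are not taken from any of the appraised studies.

```python
def weighted_kappa(ratings_1, ratings_2, categories, scheme="linear"):
    """Cohen's weighted kappa for two sets of ordinal ratings.

    `categories` lists the ordinal levels from lowest to highest.
    `scheme` is "linear" or "quadratic" (hypothetical parameter name).
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_1)

    # Observed proportion of ratings in each (occasion 1, occasion 2) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_1, ratings_2):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions give the agreement expected by chance.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the largest gap.
        d = abs(i - j) / (k - 1)
        return d if scheme == "linear" else d * d

    num = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    den = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - num / den


# Illustrative data only: four patients scored on a three-level ordinal scale.
test = [0, 1, 2, 2]
retest = [0, 1, 2, 1]
kappa_linear = weighted_kappa(test, retest, [0, 1, 2], "linear")
kappa_quadratic = weighted_kappa(test, retest, [0, 1, 2], "quadratic")
```

With the same ratings the two schemes give different values, because quadratic weights penalise a one-level disagreement less than linear weights do. This is why the checklist asks for the scheme to be reported.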

Measurement error
Author: Cardoso 2019
Instrument: Four Square Step Test
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Patients were stable (evidence provided)	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? (e.g. type of administration, environment, instructions)	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was the Standard Error of Measurement (SEM), Smallest Detectable Change (SDC) or Limits of Agreement (LoA) calculated?	SEM, SDC, or LoA calculated	Possible to calculate LoA from the data presented		SEM calculated based on Cronbach’s alpha, or on SD from another population	Not applicable
For dichotomous/nominal/ordinal scores: Was the percentage (positive and negative) agreement calculated?	% positive and negative agreement calculated	% agreement calculated		% agreement not calculated	Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws
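
The continuous-score item above asks whether the SEM, SDC or LoA were calculated. The checklist itself does not give formulas, so the following is only a sketch using the commonly cited definitions: SEM = SD × √(1 − ICC), SDC = 1.96 × √2 × SEM, and Bland-Altman limits of agreement as the mean difference ± 1.96 × SD of the differences. The function names and all numbers are illustrative assumptions, not results from the appraised studies.

```python
import math
import statistics


def sem_from_icc(sd, icc):
    """Standard Error of Measurement from the sample SD and a test-retest ICC."""
    return sd * math.sqrt(1.0 - icc)


def sdc_from_sem(sem):
    """Smallest Detectable Change at the individual level (95% confidence)."""
    return 1.96 * math.sqrt(2.0) * sem


def limits_of_agreement(test, retest):
    """Bland-Altman 95% limits of agreement for paired test-retest scores."""
    diffs = [b - a for a, b in zip(test, retest)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the paired differences
    return mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Illustrative values only (for example, seconds on a timed walking test).
sem = sem_from_icc(sd=10.0, icc=0.91)
sdc = sdc_from_sem(sem)
loa_low, loa_high = limits_of_agreement([10, 12, 14, 16], [11, 12, 15, 16])
```

In this sketch the SD and the ICC are assumed to come from the same test-retest sample, in line with the checklist, which marks down an SEM based on Cronbach's alpha or on an SD from another population.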

Hypothesis testing for construct validity
Author: Clemens 2018
Instrument: Timed up and Go test
	Very Good	Adequate	Doubtful	Inadequate	NA
Convergent validity
Is it clear what the comparator instrument(s) measure(s)?	Constructs measured by the comparator instrument(s) is clear			Constructs measured by the comparator instrument(s) is not clear
Were the measurement properties of the comparator instrument(s) sufficient?	Sufficient measurement properties of the comparator instrument(s) in a population similar to the study population	Sufficient measurement properties of the comparator instrument(s) but not sure if these apply to the study population	Some information on measurement properties of the comparator instrument(s) in any study population	No information on the measurement properties of the comparator instrument(s), OR evidence for insufficient measurement properties of the comparator instrument(s)
Was the statistical method appropriate for the hypotheses to be tested?	Statistical method was appropriate	Assumable that statistical method was appropriate	Statistical method applied NOT optimal	Statistical method applied NOT appropriate
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws (e.g. only data presented on a comparison with an instrument that measures another construct)	Other important methodological flaws
Discriminative / known-groups validity
Was an adequate description provided of important characteristics of the subgroups?	Adequate description of the important characteristics of the subgroups	Adequate description of most of the important characteristics of the subgroups	Poor or no description of the important characteristics of the subgroups
Was the statistical method appropriate for the hypothesis to be tested?	Statistical method was appropriate	Assumable that statistical method was appropriate	Statistical method applied NOT optimal	Statistical method applied NOT appropriate
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws (e.g. only data presented on a comparison with an instrument that measures another construct)	Other important methodological flaws

Reliability
Author: Clemens 2018
Instrument: Timed up and Go test
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Evidence provided that patients were stable	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? e.g. type of administration, environment, instructions	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was an intraclass correlation coefficient (ICC) calculated?	ICC calculated and model or formula of the ICC is described	ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred	Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred	No ICC or Pearson or Spearman correlations calculated	Not applicable
For dichotomous/nominal/ordinal scores: Was kappa calculated?	Kappa calculated			No kappa calculated	Not applicable
For ordinal scores: Was a weighted kappa calculated?	Weighted Kappa calculated		Unweighted Kappa calculated or not described		Not applicable
For ordinal scores: Was the weighting scheme described? e.g. linear, quadratic	Weighting scheme described	Weighting scheme NOT described			Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Measurement error
Author: Clemens 2018
Instrument: Timed up and Go test
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Patients were stable (evidence provided)	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? (e.g. type of administration, environment, instructions)	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was the Standard Error of Measurement (SEM), Smallest Detectable Change (SDC) or Limits of Agreement (LoA) calculated?	SEM, SDC, or LoA calculated	Possible to calculate LoA from the data presented		SEM calculated based on Cronbach’s alpha, or on SD from another population	Not applicable
For dichotomous/nominal/ordinal scores: Was the percentage (positive and negative) agreement calculated?	% positive and negative agreement calculated	% agreement calculated		% agreement not calculated	Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Reliability
Author: Cox 2017
Instrument: 6MWT
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Evidence provided that patients were stable	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? e.g. type of administration, environment, instructions	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was an intraclass correlation coefficient (ICC) calculated?	ICC calculated and model or formula of the ICC is described	ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred	Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred	No ICC or Pearson or Spearman correlations calculated	Not applicable
For dichotomous/nominal/ordinal scores: Was kappa calculated?	Kappa calculated			No kappa calculated	Not applicable
For ordinal scores: Was a weighted kappa calculated?	Weighted Kappa calculated		Unweighted Kappa calculated or not described		Not applicable
For ordinal scores: Was the weighting scheme described? e.g. linear, quadratic	Weighting scheme described	Weighting scheme NOT described			Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Measurement error
Author: Cox 2017
Instrument: 6MWT
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Patients were stable (evidence provided)	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? (e.g. type of administration, environment, instructions)	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was the Standard Error of Measurement (SEM), Smallest Detectable Change (SDC) or Limits of Agreement (LoA) calculated?	SEM, SDC, or LoA calculated	Possible to calculate LoA from the data presented		SEM calculated based on Cronbach’s alpha, or on SD from another population	Not applicable
For dichotomous/nominal/ordinal scores: Was the percentage (positive and negative) agreement calculated?	% positive and negative agreement calculated	% agreement calculated		% agreement not calculated	Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Reliability
Author: De Laat 2019
Instrument: SIGAM/WAP
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Evidence provided that patients were stable	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? e.g. type of administration, environment, instructions	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was an intraclass correlation coefficient (ICC) calculated?	ICC calculated and model or formula of the ICC is described	ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred	Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred	No ICC or Pearson or Spearman correlations calculated	Not applicable (ICCs were calculated for the ordinal outcome of the SIGAM/WAP)
For dichotomous/nominal/ordinal scores: Was kappa calculated?	Kappa calculated			No kappa calculated	Not applicable
For ordinal scores: Was a weighted kappa calculated?	Weighted Kappa calculated		Unweighted Kappa calculated or not described		Not applicable
For ordinal scores: Was the weighting scheme described? e.g. linear, quadratic	Weighting scheme described	Weighting scheme NOT described			Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Hypothesis testing for construct validity
Author: Deathe 2005
Instrument: L-test
	Very Good	Adequate	Doubtful	Inadequate	NA
Convergent validity
Is it clear what the comparator instrument(s) measure(s)?	Constructs measured by the comparator instrument(s) is clear			Constructs measured by the comparator instrument(s) is not clear
Were the measurement properties of the comparator instrument(s) sufficient?	Sufficient measurement properties of the comparator instrument(s) in a population similar to the study population	Sufficient measurement properties of the comparator instrument(s) but not sure if these apply to the study population	Some information on measurement properties of the comparator instrument(s) in any study population	No information on the measurement properties of the comparator instrument(s), OR evidence for insufficient measurement properties of the comparator instrument(s)
Was the statistical method appropriate for the hypotheses to be tested?	Statistical method was appropriate	Assumable that statistical method was appropriate	Statistical method applied NOT optimal	Statistical method applied NOT appropriate
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws (e.g. only data presented on a comparison with an instrument that measures another construct)	Other important methodological flaws
Discriminative / known-groups validity
Was an adequate description provided of important characteristics of the subgroups?	Adequate description of the important characteristics of the subgroups	Adequate description of most of the important characteristics of the subgroups	Poor or no description of the important characteristics of the subgroups
Was the statistical method appropriate for the hypothesis to be tested?	Statistical method was appropriate	Assumable that statistical method was appropriate	Statistical method applied NOT optimal	Statistical method applied NOT appropriate
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws (e.g. only data presented on a comparison with an instrument that measures another construct)	Other important methodological flaws

Reliability
Author: Deathe 2005
Instrument: L-test
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Evidence provided that patients were stable	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? e.g. type of administration, environment, instructions	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was an intraclass correlation coefficient (ICC) calculated?	ICC calculated and model or formula of the ICC is described	ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred	Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred	No ICC or Pearson or Spearman correlations calculated	Not applicable
For dichotomous/nominal/ordinal scores: Was kappa calculated?	Kappa calculated			No kappa calculated	Not applicable
For ordinal scores: Was a weighted kappa calculated?	Weighted Kappa calculated		Unweighted Kappa calculated or not described		Not applicable
For ordinal scores: Was the weighting scheme described? e.g. linear, quadratic	Weighting scheme described	Weighting scheme NOT described			Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Measurement error
Author: Deathe 2005
Instrument: L-test
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Patients were stable (evidence provided)	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? (e.g. type of administration, environment, instructions)	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was the Standard Error of Measurement (SEM), Smallest Detectable Change (SDC) or Limits of Agreement (LoA) calculated?	SEM, SDC, or LoA calculated	Possible to calculate LoA from the data presented		SEM calculated based on Cronbach’s alpha, or on SD from another population	Not applicable
For dichotomous/nominal/ordinal scores: Was the percentage (positive and negative) agreement calculated?	% positive and negative agreement calculated	% agreement calculated		% agreement not calculated	Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Hypothesis testing for construct validity
Author: Gailey 2002
Instrument: AMPPro
	Very Good	Adequate	Doubtful	Inadequate	NA
Convergent validity
Is it clear what the comparator instrument(s) measure(s)?	Constructs measured by the comparator instrument(s) is clear			Constructs measured by the comparator instrument(s) is not clear
Were the measurement properties of the comparator instrument(s) sufficient?	Sufficient measurement properties of the comparator instrument(s) in a population similar to the study population	Sufficient measurement properties of the comparator instrument(s) but not sure if these apply to the study population	Some information on measurement properties of the comparator instrument(s) in any study population	No information on the measurement properties of the comparator instrument(s), OR evidence for insufficient measurement properties of the comparator instrument(s)
Was the statistical method appropriate for the hypotheses to be tested?	Statistical method was appropriate	Assumable that statistical method was appropriate	Statistical method applied NOT optimal	Statistical method applied NOT appropriate
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws (e.g. only data presented on a comparison with an instrument that measures another construct)	Other important methodological flaws
Discriminative / known-groups validity
Was an adequate description provided of important characteristics of the subgroups?	Adequate description of the important characteristics of the subgroups	Adequate description of most of the important characteristics of the subgroups	Poor or no description of the important characteristics of the subgroups
Was the statistical method appropriate for the hypothesis to be tested?	Statistical method was appropriate	Assumable that statistical method was appropriate	Statistical method applied NOT optimal	Statistical method applied NOT appropriate
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws (e.g. only data presented on a comparison with an instrument that measures another construct)	Other important methodological flaws

Reliability
Author: Gailey 2002
Instrument: AMPPro
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Evidence provided that patients were stable	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? e.g. type of administration, environment, instructions	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was an intraclass correlation coefficient (ICC) calculated?	ICC calculated and model or formula of the ICC is described	ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred	Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred	No ICC or Pearson or Spearman correlations calculated	Not applicable
For dichotomous/nominal/ordinal scores: Was kappa calculated?	Kappa calculated			No kappa calculated	Not applicable
For ordinal scores: Was a weighted kappa calculated?	Weighted Kappa calculated		Unweighted Kappa calculated or not described		Not applicable
For ordinal scores: Was the weighting scheme described? e.g. linear, quadratic	Weighting scheme described	Weighting scheme NOT described			Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Reliability
Author: Hunter 2018
Instrument: L-test
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Evidence provided that patients were stable	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? e.g. type of administration, environment, instructions	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was an intraclass correlation coefficient (ICC) calculated?	ICC calculated and model or formula of the ICC is described	ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred	Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred	No ICC or Pearson or Spearman correlations calculated	Not applicable
For dichotomous/nominal/ordinal scores: Was kappa calculated?	Kappa calculated			No kappa calculated	Not applicable
For ordinal scores: Was a weighted kappa calculated?	Weighted Kappa calculated		Unweighted Kappa calculated or not described		Not applicable
For ordinal scores: Was the weighting scheme described? e.g. linear, quadratic	Weighting scheme described	Weighting scheme NOT described			Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Measurement error
Author: Hunter 2018
Instrument: L-test
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Patients were stable (evidence provided)	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? (e.g. type of administration, environment, instructions)	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was the Standard Error of Measurement (SEM), Smallest Detectable Change (SDC) or Limits of Agreement (LoA) calculated?	SEM, SDC, or LoA calculated	Possible to calculate LoA from the data presented		SEM calculated based on Cronbach’s alpha, or on SD from another population	Not applicable
For dichotomous/nominal/ordinal scores: Was the percentage (positive and negative) agreement calculated?	% positive and negative agreement calculated	% agreement calculated		% agreement not calculated	Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Hypothesis testing for construct validity
Author: Lin 2008
Instrument: 6MWT
	Very Good	Adequate	Doubtful	Inadequate	NA
Convergent validity
Is it clear what the comparator instrument(s) measure(s)?	Constructs measured by the comparator instrument(s) is clear			Constructs measured by the comparator instrument(s) is not clear
Were the measurement properties of the comparator instrument(s) sufficient?	Sufficient measurement properties of the comparator instrument(s) in a population similar to the study population	Sufficient measurement properties of the comparator instrument(s) but not sure if these apply to the study population	Some information on measurement properties of the comparator instrument(s) in any study population	No information on the measurement properties of the comparator instrument(s), OR evidence for insufficient measurement properties of the comparator instrument(s)
Was the statistical method appropriate for the hypotheses to be tested?	Statistical method was appropriate	Assumable that statistical method was appropriate	Statistical method applied NOT optimal	Statistical method applied NOT appropriate
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws (e.g. only data presented on a comparison with an instrument that measures another construct)	Other important methodological flaws

Reliability
Author: Lin 2008
Instrument: 6MWT
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Evidence provided that patients were stable	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? e.g. type of administration, environment, instructions	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was an intraclass correlation coefficient (ICC) calculated?	ICC calculated and model or formula of the ICC is described	ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred	Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred	No ICC or Pearson or Spearman correlations calculated	Not applicable
For dichotomous/nominal/ordinal scores: Was kappa calculated?	Kappa calculated			No kappa calculated	Not applicable
For ordinal scores: Was a weighted kappa calculated?	Weighted Kappa calculated		Unweighted Kappa calculated or not described		Not applicable
For ordinal scores: Was the weighting scheme described? e.g. linear, quadratic	Weighting scheme described	Weighting scheme NOT described			Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Measurement error
Author: Lin 2008
Instrument: 6MWT
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Patients were stable (evidence provided)	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? (e.g. type of administration, environment, instructions)	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was the Standard Error of Measurement (SEM), Smallest Detectable Change (SDC) or Limits of Agreement (LoA) calculated?	SEM, SDC, or LoA calculated	Possible to calculate LoA from the data presented		SEM calculated based on Cronbach’s alpha, or on SD from another population	Not applicable
For dichotomous/nominal/ordinal scores: Was the percentage (positive and negative) agreement calculated?	% positive and negative agreement calculated	% agreement calculated		% agreement not calculated	Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws
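
The checklist item on SEM, SDC and LoA refers to quantities that are often derived from a test-retest reliability coefficient. The following is a minimal sketch, assuming the commonly used relations SEM = SD × sqrt(1 − ICC), SDC = 1.96 × sqrt(2) × SEM, and LoA = mean difference ± 1.96 × SD of the differences. The function names and the example values are hypothetical and are not taken from Lin 2008 or any other study in this module.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard Error of Measurement from the sample SD and an ICC.

    Assumes SEM = SD * sqrt(1 - ICC); the ICC must lie in [0, 1].
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def sdc_individual(sem: float, z: float = 1.96) -> float:
    """Smallest Detectable Change for an individual patient.

    Assumes SDC = z * sqrt(2) * SEM, with z = 1.96 for 95% confidence.
    """
    return z * math.sqrt(2.0) * sem


def limits_of_agreement(mean_diff: float, sd_diff: float, z: float = 1.96):
    """Bland-Altman limits of agreement: mean difference +/- z * SD of differences."""
    return (mean_diff - z * sd_diff, mean_diff + z * sd_diff)


if __name__ == "__main__":
    # Hypothetical walking-distance example (metres); not study data.
    sem = sem_from_icc(sd=50.0, icc=0.96)
    print(f"SEM = {sem:.1f} m")
    print(f"SDC = {sdc_individual(sem):.1f} m")
    print("LoA =", limits_of_agreement(mean_diff=2.0, sd_diff=10.0))
```

Note that an SEM obtained this way uses the SD of the study sample itself. The checklist rates an SEM based on Cronbach’s alpha, or on an SD from another population, as inadequate.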

Hypothesis testing for construct validity
Author: Major 2013
Instrument: Berg Balance Scale
	Very Good	Adequate	Doubtful	Inadequate	NA
Convergent validity
Is it clear what the comparator instrument(s) measure(s)?	Constructs measured by the comparator instrument(s) is clear			Constructs measured by the comparator instrument(s) is not clear
Were the measurement properties of the comparator instrument(s) sufficient?	Sufficient measurement properties of the comparator instrument(s) in a population similar to the study population	Sufficient measurement properties of the comparator instrument(s) but not sure if these apply to the study population	Some information on measurement properties of the comparator instrument(s) in any study population	No information on the measurement properties of the comparator instrument(s), OR evidence for insufficient measurement properties of the comparator instrument(s)
Was the statistical method appropriate for the hypotheses to be tested?	Statistical method was appropriate	Assumable that statistical method was appropriate	Statistical method applied NOT optimal	Statistical method applied NOT appropriate
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws (e.g. only data presented on a comparison with an instrument that measures another construct)	Other important methodological flaws
Discriminative / known-groups validity
Was an adequate description provided of important characteristics of the subgroups?	Adequate description of the important characteristics of the subgroups	Adequate description of most of the important characteristics of the subgroups	Poor or no description of the important characteristics of the subgroups
Was the statistical method appropriate for the hypothesis to be tested?	Statistical method was appropriate	Assumable that statistical method was appropriate	Statistical method applied NOT optimal	Statistical method applied NOT appropriate
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws (e.g. only data presented on a comparison with an instrument that measures another construct): fear of falling and fall frequency are self-reported	Other important methodological flaws

Internal consistency
Author: Major 2013
Instrument: Berg Balance Scale
	Very Good	Adequate	Doubtful	Inadequate	NA
Was an internal consistency statistic calculated for each unidimensional scale or subscale separately?	Internal consistency statistic calculated for each unidimensional scale or subscale		Unclear whether scale or sub scale is unidimensional	Internal consistency statistic NOT calculated for each unidimensional scale or sub scale
For continuous scores: Was Cronbach’s alpha or omega calculated?	Cronbach’s alpha, or Omega calculated		Only item‐total correlations calculated	No Cronbach’s alpha and no item‐total correlations calculated	Not applicable
For dichotomous scores: Was Cronbach’s alpha or KR‐ 20 calculated?	Cronbach’s alpha or KR‐20 calculated		Only item‐total correlations calculated	No Cronbach’s alpha or KR‐ 20 and no item‐total correlations calculated	Not applicable
For IRT‐based scores: Was standard error of the theta (SE (θ)) or reliability coefficient of estimated latent trait value (index of (subject or item) separation) calculated?	SE(θ) or reliability coefficient calculated			SE(θ) or reliability coefficient NOT calculated	Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Reliability
Author: Major 2013
Instrument: Berg Balance Scale
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Evidence provided that patients were stable	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? e.g. type of administration, environment, instructions	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was an intraclass correlation coefficient (ICC) calculated?	ICC calculated and model or formula of the ICC is described	ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred	Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred	No ICC or Pearson or Spearman correlations calculated	Not applicable
For dichotomous/nominal/ordinal scores: Was kappa calculated?	Kappa calculated			No kappa calculated	Not applicable
For ordinal scores: Was a weighted kappa calculated?	Weighted Kappa calculated		Unweighted Kappa calculated or not described		Not applicable
For ordinal scores: Was the weighting scheme described? e.g. linear, quadratic	Weighting scheme described	Weighting scheme NOT described			Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Reliability
Author: Resnik 2011
Instrument: 2MWT
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Evidence provided that patients were stable	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? e.g. type of administration, environment, instructions	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was an intraclass correlation coefficient (ICC) calculated?	ICC calculated and model or formula of the ICC is described	ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred	Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred	No ICC or Pearson or Spearman correlations calculated	Not applicable
For dichotomous/nominal/ordinal scores: Was kappa calculated?	Kappa calculated			No kappa calculated	Not applicable
For ordinal scores: Was a weighted kappa calculated?	Weighted Kappa calculated		Unweighted Kappa calculated or not described		Not applicable
For ordinal scores: Was the weighting scheme described? e.g. linear, quadratic	Weighting scheme described	Weighting scheme NOT described			Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws (the 2MWT was not conducted separately. Participants were instructed to walk as far as they could in 6 minutes, however the distance covered at 2 minutes was recorded as their 2MWT score)	Other important methodological flaws

Reliability
Author: Resnik 2011
Instrument: 6MWT
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Evidence provided that patients were stable	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? e.g. type of administration, environment, instructions	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was an intraclass correlation coefficient (ICC) calculated?	ICC calculated and model or formula of the ICC is described	ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred	Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred	No ICC or Pearson or Spearman correlations calculated	Not applicable
For dichotomous/nominal/ordinal scores: Was kappa calculated?	Kappa calculated			No kappa calculated	Not applicable
For ordinal scores: Was a weighted kappa calculated?	Weighted Kappa calculated		Unweighted Kappa calculated or not described		Not applicable
For ordinal scores: Was the weighting scheme described? e.g. linear, quadratic	Weighting scheme described	Weighting scheme NOT described			Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Reliability
Author: Resnik 2011
Instrument: Timed up and go test
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Evidence provided that patients were stable	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? e.g. type of administration, environment, instructions	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was an intraclass correlation coefficient (ICC) calculated?	ICC calculated and model or formula of the ICC is described	ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred	Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred	No ICC or Pearson or Spearman correlations calculated	Not applicable
For dichotomous/nominal/ordinal scores: Was kappa calculated?	Kappa calculated			No kappa calculated	Not applicable
For ordinal scores: Was a weighted kappa calculated?	Weighted Kappa calculated		Unweighted Kappa calculated or not described		Not applicable
For ordinal scores: Was the weighting scheme described? e.g. linear, quadratic	Weighting scheme described	Weighting scheme NOT described			Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Reliability
Author: Resnik 2011
Instrument: AMP(Pro)
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Evidence provided that patients were stable	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? e.g. type of administration, environment, instructions	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was an intraclass correlation coefficient (ICC) calculated?	ICC calculated and model or formula of the ICC is described	ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred	Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred	No ICC or Pearson or Spearman correlations calculated	Not applicable
For dichotomous/nominal/ordinal scores: Was kappa calculated?	Kappa calculated			No kappa calculated	Not applicable
For ordinal scores: Was a weighted kappa calculated?	Weighted Kappa calculated		Unweighted Kappa calculated or not described		Not applicable
For ordinal scores: Was the weighting scheme described? e.g. linear, quadratic	Weighting scheme described	Weighting scheme NOT described			Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Measurement error
Author: Resnik 2011
Instrument: 2MWT
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Patients were stable (evidence provided)	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? (e.g. type of administration, environment, instructions)	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was the Standard Error of Measurement (SEM), Smallest Detectable Change (SDC) or Limits of Agreement (LoA) calculated?	SEM, SDC, or LoA calculated	Possible to calculate LoA from the data presented		SEM calculated based on Cronbach’s alpha, or on SD from another population	Not applicable
For dichotomous/nominal/ordinal scores: Was the percentage (positive and negative) agreement calculated?	% positive and negative agreement calculated	% agreement calculated		% agreement not calculated	Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws (the 2MWT was not conducted separately. Participants were instructed to walk as far as they could in 6 minutes, however the distance covered at 2 minutes was recorded as their 2MWT score)	Other important methodological flaws

Measurement error
Author: Resnik 2011
Instrument: 6MWT
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Patients were stable (evidence provided)	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? (e.g. type of administration, environment, instructions)	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was the Standard Error of Measurement (SEM), Smallest Detectable Change (SDC) or Limits of Agreement (LoA) calculated?	SEM, SDC, or LoA calculated	Possible to calculate LoA from the data presented		SEM calculated based on Cronbach’s alpha, or on SD from another population	Not applicable
For dichotomous/nominal/ordinal scores: Was the percentage (positive and negative) agreement calculated?	% positive and negative agreement calculated	% agreement calculated		% agreement not calculated	Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Measurement error
Author: Resnik 2011
Instrument: Timed up and Go
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Patients were stable (evidence provided)	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? (e.g. type of administration, environment, instructions)	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was the Standard Error of Measurement (SEM), Smallest Detectable Change (SDC) or Limits of Agreement (LoA) calculated?	SEM, SDC, or LoA calculated	Possible to calculate LoA from the data presented		SEM calculated based on Cronbach’s alpha, or on SD from another population	Not applicable
For dichotomous/nominal/ordinal scores: Was the percentage (positive and negative) agreement calculated?	% positive and negative agreement calculated	% agreement calculated		% agreement not calculated	Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Measurement error
Author: Resnik 2011
Instrument: AMP(Pro)
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Patients were stable (evidence provided)	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? (e.g. type of administration, environment, instructions)	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was the Standard Error of Measurement (SEM), Smallest Detectable Change (SDC) or Limits of Agreement (LoA) calculated?	SEM, SDC, or LoA calculated	Possible to calculate LoA from the data presented		SEM calculated based on Cronbach’s alpha, or on SD from another population	Not applicable
For dichotomous/nominal/ordinal scores: Was the percentage (positive and negative) agreement calculated?	% positive and negative agreement calculated	% agreement calculated		% agreement not calculated	Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Reliability
Author: Rommers 2008
Instrument: SIGAM/WAP
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Evidence provided that patients were stable	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? e.g. type of administration, environment, instructions	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was an intraclass correlation coefficient (ICC) calculated?	ICC calculated and model or formula of the ICC is described	ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred	Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred	No ICC or Pearson or Spearman correlations calculated	Not applicable
For dichotomous/nominal/ordinal scores: Was kappa calculated?	Kappa calculated			No kappa calculated	Not applicable
For ordinal scores: Was a weighted kappa calculated?	Weighted Kappa calculated		Unweighted Kappa calculated or not described		Not applicable
For ordinal scores: Was the weighting scheme described? e.g. linear, quadratic	Weighting scheme described	Weighting scheme NOT described			Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Hypothesis testing for construct validity
Author: Schoppen 1999
Instrument: Timed up and Go test
	Very Good	Adequate	Doubtful	Inadequate	NA
Convergent validity
Is it clear what the comparator instrument(s) measure(s)?	Constructs measured by the comparator instrument(s) is clear			Constructs measured by the comparator instrument(s) is not clear
Were the measurement properties of the comparator instrument(s) sufficient?	Sufficient measurement properties of the comparator instrument(s) in a population similar to the study population	Sufficient measurement properties of the comparator instrument(s) but not sure if these apply to the study population	Some information on measurement properties of the comparator instrument(s) in any study population	No information on the measurement properties of the comparator instrument(s), OR evidence for insufficient measurement properties of the comparator instrument(s)
Was the statistical method appropriate for the hypotheses to be tested?	Statistical method was appropriate	Assumable that statistical method was appropriate	Statistical method applied NOT optimal	Statistical method applied NOT appropriate
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws (e.g. only data presented on a comparison with an instrument that measures another construct): Constructs of comparator instruments seem not (or remotely) related to the construct of the L-test	Other important methodological flaws

Reliability
Author: Schoppen 1999
Instrument: Timed up and Go test
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Evidence provided that patients were stable	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? e.g. type of administration, environment, instructions	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was an intraclass correlation coefficient (ICC) calculated?	ICC calculated and model or formula of the ICC is described	ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred	Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred	No ICC or Pearson or Spearman correlations calculated	Not applicable
For dichotomous/nominal/ordinal scores: Was kappa calculated?	Kappa calculated			No kappa calculated	Not applicable
For ordinal scores: Was a weighted kappa calculated?	Weighted Kappa calculated		Unweighted Kappa calculated or not described		Not applicable
For ordinal scores: Was the weighting scheme described? e.g. linear, quadratic	Weighting scheme described	Weighting scheme NOT described			Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Reliability
Author: Wong 2014
Instrument: Berg Balance Scale
	Very Good	Adequate	Doubtful	Inadequate	NA
Were patients stable in the interim period on the construct to be measured?	Evidence provided that patients were stable	Assumable that patients were stable	Unclear if patients were stable	Patients were NOT stable
Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate or time interval was not stated	Time interval NOT appropriate
Were the test conditions similar for the measurements? e.g. type of administration, environment, instructions	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear if test conditions were similar	Test conditions were NOT similar
For continuous scores: Was an intraclass correlation coefficient (ICC) calculated?	ICC calculated and model or formula of the ICC is described for the total score of the BBS	ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred	Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred	No ICC or Pearson or Spearman correlations calculated	Not applicable
For dichotomous/nominal/ordinal scores: Was kappa calculated?	Kappa calculated			No kappa calculated for individual items on the BBS	Not applicable
For ordinal scores: Was a weighted kappa calculated?	Weighted Kappa calculated		Unweighted Kappa calculated or not described for individual items on the BBS		Not applicable
For ordinal scores: Was the weighting scheme described? e.g. linear, quadratic	Weighting scheme described	Weighting scheme NOT described for individual items on the BBS			Not applicable
Were there any other important flaws in the design or statistical methods of the study?	No other important methodological flaws		Other minor methodological flaws	Other important methodological flaws

Exclusion table - excluded after reading the full article

Author and year	Reasons for exclusion
Franchignoni 2019	Questionnaire was administered in Italian
Beisheim 2019	No measurement properties reported
Sions 2018	No measurement properties reported
Overgaard 2018	No measurement properties reported
Howard 2018	No measurement properties reported
Moore 2017	Included studies do not meet the PICO
Hafner 2017	Questionnaire was administered in English
Rushton 2015	No measurement properties reported
Franchignoni 2015	The Prosthetic Mobility Questionnaire is not an instrument that we selected (it consists of the PEQ plus 2 new items and is therefore an entirely new instrument). Possibly useful for the considerations.
Kaluf 2014	No measurement properties reported
Gerbais 2014	No measurement properties reported
Raya 2013	Does not meet the PICO (AMP-B is not specified)
Gremeaux 2012	No measurement properties reported
Samitier 2011	Spanish-language article
Franchignoni 2007a	Questionnaire was administered in Italian
Franchignoni 2007b	Questionnaire was administered in Italian
Gauthier-Gagnon 2006	Questionnaire was administered in English and/or French
Franchignoni 2004	Questionnaire was administered in Italian
Ryall 2003	Questionnaire was administered in English
Miller 2001	Questionnaire was administered in English and/or French
Legro 1998	Questionnaire was administered in English
Gauthier-Gagnon 1994	Questionnaire was administered in English and/or French
Karatzios 2019	Questionnaire was administered in French
Repo 2018	Questionnaire was administered in Finnish
Christensen 2017	Questionnaire was administered in Danish
Newton 2016	Does not meet the PICO
MacKenzie 2016	No measurement properties investigated
Joussain 2015	Questionnaire was administered in French
Binay Safer 2015	Questionnaire was administered in Turkish
Salavati 2011	Questionnaire was administered in Persian
Herbert 2011	Conference abstract
Larsson 2009	Questionnaire was administered in Swedish
Ferrand-Ferri 2007	Describes only the translation process, but no measurement properties (of the translation)
Ferriero 2005	Questionnaire was administered in Italian

Assessment date and validity

Publication date: 25-11-2020

Assessed for validity: 19-11-2020

By 2025 at the latest, the board of the Nederlandse Vereniging van Revalidatieartsen will determine whether the modules of this guideline are still up to date. A maintenance plan has been described at module level. When drafting the guideline, the working group estimated for each module the maximum period within which reassessment must take place, and formulated any points of attention that are relevant to a future revision (update). The validity of the guideline lapses sooner if new developments give reason to start a revision process.

The Nederlandse Vereniging van Revalidatieartsen is the coordinating body for this guideline and bears primary responsibility for assessing whether the guideline is up to date. The other scientific societies participating in this guideline, and users of the guideline, share this responsibility and inform the coordinating body of relevant developments within their field.

Module	Coordinating body	Year of authorization	Next assessment of guideline currency	Frequency of currency assessment	Who monitors currency	Relevant factors for changes to the recommendation
Clinimetrics	VRA	2020	2025	Every 5 years	VRA	A translation into Dutch and validation of these Dutch performance tests could be a reason to revise the second recommendation.

Initiative and authorization

Initiative:

Nederlandse Vereniging van Revalidatieartsen

Authorized by:

Ergotherapie Nederland
ISPO Nederland
Korter maar Krachtig
Koninklijk Nederlands Genootschap voor Fysiotherapie
Nederlands Instituut van Psychologen
Nederlandse Orthopaedische Vereniging
Nederlandse Vereniging voor Anesthesiologie
NVOS-Orthobanda
Nederlandse Vereniging voor Heelkunde
Stichting Orthopedische Hulpmiddelenzorg Nederland
Verpleegkundigen en Verzorgenden Nederland
Vereniging van Specialisten Ouderengeneeskunde
Nederlandse Vereniging van Revalidatieartsen

General information

Guideline development was supported by the Kennisinstituut van de Federatie Medisch Specialisten and funded by the Stichting Kwaliteitsgelden Medisch Specialisten (SKMS). The funder had no influence whatsoever on the content of the guideline.

Composition of the working group

To develop the guideline, a multidisciplinary working group and an advisory group (klankbordgroep) were established in 2018, consisting of representatives of all relevant specialties involved in the care of patients who undergo or have undergone an amputation. The working group and advisory group members were mandated by their professional associations to participate. The working group is responsible for the full text of this guideline.

Working group

Prof. dr. J.H.B. Geertzen, rehabilitation physician, Centrum voor Revalidatie Groningen, Universitair Medisch Centrum Groningen, VRA (chair)
Prof. dr. J.S. Rietman, rehabilitation physician, Roessingh, Centrum voor Revalidatie / Universiteit Twente, Enschede, VRA
Mr. B. Fard, MSc, rehabilitation physician, Roessingh, Centrum voor Revalidatie, Enschede, VRA
Prof. dr. P.C. Jutte, orthopedic surgeon, Universitair Medisch Centrum Groningen, NOV
Dr. J.H.C. Daemen, vascular surgeon, Maastricht Universitair Medisch Centrum, Maastricht, NVvH
Drs. ing. D.A.A. Lamprou, vascular surgeon, Nij Smellinghe, Drachten, NVvH
Dr. W. ten Hoope, anesthesiologist, Ziekenhuis Rijnstate Arnhem, NVA
Dr. E.C. Prinsen, clinical health scientist and physiotherapist, Roessingh Research and Development, Enschede, KNGF
Prof. dr. J.H.P. Houdijk, professor of clinical human movement sciences, Centrum voor bewegingswetenschappen, Universitair Medisch Centrum Groningen, Rijksuniversiteit Groningen, ISPO Nederland
Ms. B.M.H. Donders, patient representative, Korter maar Krachtig
Ing. J. Olsman, product group manager prostheses / team leader and orthopedic adviser, OIM Orthopedie, NVOS-Orthobanda/SOHN
Mr. T. Holling, senior orthopedic adviser, Livit Orthopedie, NVOS-Orthobanda/SOHN
Drs. H.P.P.R. de Wever, elderly care physician (specialist ouderengeneeskunde), lead physician in geriatric rehabilitation (kaderarts), Stichting Tante Louise, Bergen op Zoom, Verenso
Dr. E. Schrier, health care psychologist (GZ-psycholoog), Centrum voor Revalidatie Groningen, Universitair Medisch Centrum Groningen, NIP

Reviewers

Drs. G.J. Renzenbrink, director/rehabilitation physician, Rijndam Revalidatie, Rotterdam, RN
Ms. P.S.M. van der Vecht, occupational therapist, Stichting Zorgspectrum, EN
Ms. F.G. de Valk, nurse specialist in vascular surgery, CWZ Nijmegen, V&VN

With support from

Dr. S. Persoon, adviser, Kennisinstituut van de Federatie Medisch Specialisten
Mr. M. Oerbekke, MSc, junior adviser, Kennisinstituut van de Federatie Medisch Specialisten
Dr. W.J. Harmsen, adviser, Kennisinstituut van de Federatie Medisch Specialisten
Dr. L. Viester, adviser, Kennisinstituut van de Federatie Medisch Specialisten
Ms. L. Boerboom, MSc, medical information specialist, Kennisinstituut van de Federatie Medisch Specialisten
Ms. L.H.M. Niesink-Boerboom, MSc, medical information specialist, Kennisinstituut van de Federatie Medisch Specialisten
Ms. M. Wessels, MSc, medical information specialist, Kennisinstituut van de Federatie Medisch Specialisten

Declarations of interest

The KNMG code for preventing improper influence through conflicts of interest was followed. All working group members declared in writing whether, in the past three years, they had had direct financial interests (employment with a commercial company, personal financial interests, research funding) or indirect interests (personal relationships, reputation management, knowledge valorization). An overview of the interests of working group and advisory group members, and the assessment of how any interests were handled, is given in the table below. The signed declarations of interest can be requested from the secretariat of the Kennisinstituut van de Federatie Medisch Specialisten.

Working group member / advisory group member	Position	Additional positions	Declared interests	Action taken
Donders-van Straaten	None	* Care volunteer at hospice Bethlehem in Nijmegen (unpaid) * Ski instructor, association of disabled winter sports participants (unpaid) * Board member of the association Korter maar Krachtig (unpaid)	None	No action required
Geertzen	Head of the Department of Rehabilitation Medicine and head of the Centrum voor Revalidatie, Universitair Medisch Centrum (UMCG)	Board member of Revalidatie Nederland from summer 2018 (unpaid). Until 1-10-2018 vice-chair of the board of the Hooglerarenconvent and board of UMCG (unpaid). Editorial Board of ‘Clinical Rehabilitation’ (unpaid). Founder of the patient association Korter maar Krachtig, now on its advisory council. Chair of the Hoogleraren convent VRA (unpaid). Chair of the scientific task force VRA-RN (unpaid). Chair of the board committee Onderzoek Innovatie en Kwaliteit (BOIK) RN (unpaid)	In my role as professor/PhD supervisor I am involved in various externally and internally funded research projects and grant applications for research projects in the field of leg amputation and prosthetics. This is independent scientific research that will not improperly influence the guideline or make improper use of it. Research results and publications from my research group could be cited and used in drafting the guideline.	No action required
Rietman	Rehabilitation physician, Roessingh centrum voor Revalidatie: 0.4. Professor of Rehabilitation Medicine and Technology, UT: 0.3. Member of the management team, Roessingh Research and Development: 0.3. Employer in all cases: Roessingh centrum voor Revalidatie; the tasks are arranged through secondment.	2016- Adjunct professor Northwestern University Chicago USA (unpaid) 2009- Editorial Board: J Back and Musculoskeletal Rehabilitation (unpaid) 2011- Editorial Board: Disability and Rehabilitation (unpaid) 2014-2019 President of the Netherlands Society of physical and Rehabilitation Medicine (unpaid) 2015- Member European Academy for Rehabilitation Medicine (unpaid) 2016-2018 Scientific Medical director IMDI CoRE SPRINT (unpaid) 2018- Leader IMDI 2.0 program (unpaid) 2016-2020 Supervisory board METC Twente (unpaid)	Co-principal investigator MyLeg, funded by the European Commission (H2020-ICT-25-2017). The aim of the MyLeg project is to develop an osseointegrated, motorized transfemoral prosthesis controlled by muscle activity measured with EMG. No interest in the funder of the research. EUROBENCH, funded by the European Commission (H2020-ICT-27b-2017). The aim of the EUROBENCH project is to develop a benchmarking methodology for wearable robotics. Prostheses are counted as wearable robotics. No interests in the funder. KeepWalking, unfunded research. The aim of this international multicenter study is to evaluate a new endoprosthesis (KeepWalking Femoral Implant) for people with a transfemoral amputation. This is unfunded research, so there is no remuneration for participation. No interest in the funder.	No action required
Prinsen	Until 01-07-2020 Senior researcher, Roessingh Research and Development. From 01-07-2020 Cluster Manager Rehabilitation Technology, Roessingh Research and Development	Chair of the Nederlandse Vereniging van Revalidatie Fysiotherapeuten (attendance fee). Until 01-07-2019 member of the Wetenschappelijk College Fysiotherapie (attendance fee). Board member of ISPO Nederland (unpaid)	I am involved in three research projects focused on prosthetics for people with a lower-extremity amputation. These projects are described under the declared interests of prof.dr. J.S. Rietman.	No action required
Houdijk	Until 1-5-2020: Associate professor, Department of Human Movement Sciences, Faculteit der Gedrags- en Bewegingswetenschappen, Vrije Universiteit Amsterdam (1.0 FTE). From 1-5-2020: Professor of clinical human movement sciences, Centrum voor bewegingswetenschappen, Universitair Medisch Centrum Groningen (UMCG), Rijksuniversiteit Groningen (RUG), 1.0 FTE	Chair, ISPO Nederland (unpaid) until 1-5-2020 Senior researcher, Heliomare R&D, Wijk aan Zee (secondment, 0.2 FTE)	In my role I am involved in various externally funded research projects and grant applications for research projects in the field of leg amputation and prosthetics. This is independent scientific research that will not improperly influence the guideline or make improper use of it. Research results and publications from my research group could be cited and used in drafting the guideline.	No action required
Wever	Elderly care physician, lead physician in geriatric rehabilitation	Teaching geriatric orthopedic rehabilitation in the elderly care physician training program, Radboud Universiteit Nijmegen (fee per teaching hour). Board member of GRZ-kaderartsen (unpaid)	None	No action required
Jutte	Orthopedic surgeon	Consultant for Stryker, paid work with the proceeds going to the Kingma stichting for orthopedic research. Member of the committee for bone tumors (unpaid). Member of the working group on orthopedic infections (unpaid). Member of the NOV training committee (unpaid). Chair of the soft-tissue and bone tumor working group UMCG. Chair of psychosocial care support UMCG CCC. Member of the scientific advisory committee of the Sarcoom patiëntenvereniging	ZonMW LEAK study: the best treatment of wound leakage after hip and knee prosthesis	No action required. Stryker focuses on bio-implants for oncology patients. This topic falls outside the scope of the guideline.
Holling	Orthopedic adviser, Livit Orthopedie. Prosthetist and team coordinator	Lower-extremity prosthetist (11 years at Livit). Team coordinator (coordinating tasks at the Livit branch in Den Haag)	None	No action required (no personal gain)
Olsman	Product group manager prostheses, team leader and orthopedic adviser at OIM Orthopedie, Zwolle location	Secretary, ISPO NL: a number of hours per week in the evening, unpaid. Secretary, NBOT (Nederlandse beroepsvereniging Orthopedisch Technologen): a number of hours per week in the evening, unpaid. Editorial board member, NVOS & Orthobanda, journal Orthopedisch Techniek: a number of hours per month in the evening, small expense allowance. Working group member, Platform prothesezorg arm- en beenprothese Vilans/VWS: a number of hours per week, made available by OIM Orthopedie. External examiner, Fontys Hogeschool, orthopedic technology program: a number of hours per year, made available by OIM Orthopedie.	None	No action required (no personal gain)
Daemen	Vascular surgeon MUMC+	None	None	No action required
Fard	Rehabilitation physician, Roessingh Centrum voor Revalidatie, Enschede; external PhD candidate UMCG, Groningen	PhD research: epidemiology of dysvascular leg amputations	One of the current supervisors of the research is Prof. Dr. J.H.B. Geertzen, who is also chair of the current guideline committee.	No action required
Hoope	Anesthesiologist	None	None	No action required
Lamprou	Vascular surgeon	None	None	No action required
Schrier	Health care psychologist (GZ-psycholoog) UMCG	None	None	No action required
De Valk	Nurse specialist in vascular surgery	None	None	No action required
Vecht	Occupational therapist at Stichting Zorgspectrum	Administrative assistant at Studiotekst B.V. (paid)	None	No action required
Renzenbrink	Member of the Executive Board and rehabilitation physician, Rijndam Revalidatie	Member of the board committee Onderzoek, Innovatie & Kwaliteit, Revalidatie Nederland (unpaid)	None	No action required
Persoon	Adviser, Kennisinstituut van de Federatie Medisch Specialisten	Until October 2018 hospitality appointment at the Department of Rehabilitation, Academisch Medisch Centrum, Amsterdam, in connection with a PhD trajectory. Project: Physical fitness to improve fitness and combat fatigue in patients with multiple myeloma or lymphoma treated with high dose chemotherapy. April 2018 to September 2018: lecturer, Team Technologie, Fontys Paramedische Hogeschool; supervising students during graduation internships. Max. 1 day per week, paid.	None. PhD research was funded by KWF; the funder had no influence on the research outcomes or on current work.	No action required
Harmsen	Adviser, Kennisinstituut van de Federatie Medisch Specialisten	None	None	No action required
Oerbekke	Junior adviser (0.6 FTE), Kennisinstituut van de Federatie Medisch Specialisten. PhD student (0.4 FTE), Cochrane Netherlands, UMCU.	None	None. PhD position is funded by the Kennisinstituut van de Federatie Medisch Specialisten	No action required
Viester	Adviser, Kennisinstituut van de Federatie Medisch Specialisten	None	None	No action required

Patient perspective input

Attention was paid to the patient perspective through the participation of a working group member on behalf of Korter maar Krachtig. The draft guideline was also submitted for comment to Korter maar Krachtig, Harteraad and Patiëntenfederatie Nederland.

Method of development

Evidence based

Implementation

In the various phases of guideline development, account was taken of the implementation of the guideline (module) and the practical feasibility of the recommendations. Explicit attention was paid to factors that may promote or hinder the introduction of the guideline in practice. The implementation tables can be found in the appendices of the individual modules.

Working method

AGREE

This guideline was drawn up in accordance with the requirements set out in the report Medisch Specialistische Richtlijnen 2.0 of the Richtlijnen advisory committee of the Raad Kwaliteit. This report is based on the AGREE II instrument (Appraisal of Guidelines for Research & Evaluation II; Brouwers, 2010), an instrument with broad international acceptance. For a step-by-step description of how an evidence-based guideline is developed, see the step-by-step plan Ontwikkeling van Medisch Specialistische Richtlijnen of the Kennisinstituut van de Federatie Medisch Specialisten.

Bottleneck analysis

During the preparatory phase, the chair of the working group and the adviser identified the bottlenecks. The working group assessed the recommendations from the earlier guideline (Amputatie en prothesiologie onderste extremiteit (2012)) for the need for revision. In addition, bottlenecks were put forward during an invitational conference by the Nederlandse Vereniging van Revalidatieartsen, Harteraad, Verpleegkundigen en Verzorgenden Nederland, Verenso, NVOS-Orthobanda, Inspectie Gezondheidszorg en Jeugd, Ergotherapie Nederland, Revalidatie Nederland and Koninklijk Nederlands Genootschap voor Fysiotherapie. The report of the invitational conference is included under related products. The working group then drew up a long list of bottlenecks and prioritized them on the basis of: (1) clinical relevance, (2) the availability of (new) high-quality evidence, and (3) the expected impact on quality of care, patient safety and (macro) costs.

Clinical questions and outcome measures

Based on the results of the bottleneck analysis, the working group and the adviser drew up draft clinical questions. For the modules ‘Suspensiemethode’ and ‘Klinimetrie’, input was also requested from the Werkgroep Amputatie en Prothesiologie (WAP) of the Nederlandse Vereniging van Revalidatieartsen. The working group then identified, for each clinical question, which outcome measures are relevant to the patient, considering both desirable and undesirable effects. The working group rated these outcome measures according to their relative importance for decision-making on recommendations as crucial (critical for decision-making), important (but not crucial) or unimportant. The working group also defined, at least for the crucial outcome measures, which differences it considered clinically (patient) relevant.

Strategy for searching and selecting literature

For each clinical question, published scientific studies were searched for in (various) electronic databases using specific search terms. In addition, the reference lists of the selected articles were searched for further studies. Initially, studies with the highest level of evidence were sought. The working group members selected the articles found by the search on the basis of predefined selection criteria. The selected articles were used to answer the clinical question. The databases searched, the search strategy and the selection criteria applied can be found in the module for the clinical question concerned.

Quality assessment of individual studies

Individual studies were assessed systematically against predefined methodological quality criteria in order to estimate the risk of biased study results (risk of bias). These assessments can be found in the risk of bias tables.

Summarizing the literature

The relevant research data from all selected articles were presented in evidence tables. The main findings from the literature were described in the summary of the literature.

Assessing the strength of the scientific evidence

A) For intervention questions (questions about therapy or screening)

The strength of the scientific evidence was determined using the GRADE method. GRADE stands for ‘Grading of Recommendations Assessment, Development and Evaluation’ (see http://www.gradeworkinggroup.org/). The basic principles of the GRADE methodology are: naming and prioritizing the clinically (patient) relevant outcome measures, a systematic review per outcome measure, and an assessment of the strength of evidence per outcome measure based on the eight GRADE domains:

Five domains for downgrading:
- risk of bias;
- inconsistency;
- indirectness;
- imprecision;
- publication bias.
Three domains for upgrading:
- dose-response relationship;
- large effect;
- residual plausible confounding.

When assessing the strength of evidence for intervention questions, results from randomized studies start at high and results from (comparative) observational studies start at low; the final strength of evidence is reached after evaluation of the eight GRADE domains.
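
The starting levels and the downgrading and upgrading described above can be sketched in Python. This is only an illustration, not the working group's actual procedure: the one-level-per-step arithmetic, the clamping to a four-level scale, and the names `grade_certainty`, `LEVELS`, `DOWNGRADE_DOMAINS` and `UPGRADE_DOMAINS` are assumptions introduced here. In practice each domain is a matter of judgment rather than a fixed calculation.

```python
# Hypothetical sketch of a GRADE-style rating; names and arithmetic are assumptions.
LEVELS = ["very low", "low", "moderate", "high"]

DOWNGRADE_DOMAINS = {
    "risk of bias", "inconsistency", "indirectness",
    "imprecision", "publication bias",
}
UPGRADE_DOMAINS = {
    "dose-response relationship", "large effect",
    "residual plausible confounding",
}


def grade_certainty(design, downgrades=None, upgrades=None):
    """Return a certainty level for one outcome measure.

    design     -- "randomized" (starts at high) or "observational" (starts at low)
    downgrades -- mapping of downgrade domain -> number of levels to subtract
    upgrades   -- mapping of upgrade domain -> number of levels to add
    """
    downgrades = downgrades or {}
    upgrades = upgrades or {}

    # Reject anything that is not one of the eight domains listed in the text.
    unknown = (set(downgrades) - DOWNGRADE_DOMAINS) | (set(upgrades) - UPGRADE_DOMAINS)
    if unknown:
        raise ValueError(f"unknown GRADE domain(s): {sorted(unknown)}")

    if design == "randomized":
        start = LEVELS.index("high")
    elif design == "observational":
        start = LEVELS.index("low")
    else:
        raise ValueError(f"unknown study design: {design!r}")

    score = start - sum(downgrades.values()) + sum(upgrades.values())
    # Assumption: the result is clamped to the four-level scale.
    return LEVELS[max(0, min(score, len(LEVELS) - 1))]


if __name__ == "__main__":
    print(grade_certainty("randomized", downgrades={"risk of bias": 1, "imprecision": 1}))
    print(grade_certainty("observational", upgrades={"large effect": 1}))
```

The two calls at the bottom rate a randomized body of evidence downgraded one level each for risk of bias and imprecision, and an observational body of evidence upgraded one level for a large effect.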

GRADE distinguishes four levels for the quality of the scientific evidence: high, moderate, low and very low. These levels refer to the degree of certainty about the literature conclusion (Schünemann, 2013).

GRADE	Definition
High	there is high certainty that the true effect of treatment lies close to the estimated effect of treatment as stated in the literature conclusion; it is very unlikely that the literature conclusion will change when results of new large-scale research are added to the literature analysis.
Moderate*	there is moderate certainty that the true effect of treatment lies close to the estimated effect of treatment as stated in the literature conclusion; it is possible that the conclusion will change when results of new large-scale research are added to the literature analysis.
Low	there is low certainty that the true effect of treatment lies close to the estimated effect of treatment as stated in the literature conclusion; there is a real chance that the conclusion will change when results of new large-scale research are added to the literature analysis.
Very low	there is very low certainty that the true effect of treatment lies close to the estimated effect of treatment as stated in the literature conclusion; the literature conclusion is very uncertain.

*In 2017 the Dutch GRADE Network determined that the preferred Dutch wording for the second-highest grade is ‘redelijk’ rather than ‘matig’.

B) For prognostic questions

The strength of the scientific evidence was likewise determined using the GRADE method. The generic GRADE method used applied the basic principles of the GRADE methodology (see also A). For prognostic questions, however, results from studies (regardless of design) start at high by default, after which the final strength of evidence is reached following evaluation of the GRADE domains (see also A). Here too, the levels reflect the degree of certainty about the literature conclusions.

C) For questions about the value of measurement or classification instruments (clinimetrics)

For the Klinimetrie module, a GRADE assessment was carried out for each measurement property of each measurement instrument. The methodology used is derived from the procedure described in the COSMIN manual for systematic reviews of PROMs (Mokkink, 2018) and is described in more detail in the module concerned.

Formulating the conclusions

For each relevant outcome measure, the scientific evidence was summarized in one or more literature conclusions, with the level of evidence determined according to the GRADE methodology.

Considerations (from evidence to recommendation)

To arrive at a recommendation, other aspects besides (the quality of) the scientific evidence are important and are taken into account, such as additional arguments from, for example, biomechanics or physiology, patients' values and preferences, costs (resource use), acceptability, feasibility and implementation. These aspects are systematically listed and assessed (weighed) under the heading ‘Overwegingen’ and may be based (in part) on expert opinion. A structured format was used, based on the evidence-to-decision framework of the international GRADE Working Group (Alonso-Coello, 2016a; Alonso-Coello, 2016b). This evidence-to-decision framework is an integral part of the GRADE methodology.

Formulating recommendations

The recommendations answer the clinical question and are based on the available scientific evidence, the most important considerations, and a weighing of the favorable and unfavorable effects of the relevant interventions. The strength of the scientific evidence and the weight the working group assigns to the considerations together determine the strength of the recommendation. In accordance with the GRADE methodology, a low strength of evidence for conclusions in the systematic literature analysis does not a priori rule out a strong recommendation, and weak recommendations are also possible when the strength of evidence is high. The strength of the recommendation is always determined by weighing all relevant arguments together.

The GRADE methodology distinguishes between strong and weak (or conditional) recommendations. The strength of a recommendation refers to the degree of certainty that the benefits of the intervention outweigh the harms (or vice versa), considered across the full spectrum of patients for whom the recommendation is intended. A recommendation is not a dictate: even a strong recommendation based on high-quality evidence (GRADE rating HIGH) will not always apply under all possible circumstances and for every individual patient.

Preconditions (organization of care)

In the bottleneck analysis and during the development of the guideline, explicit account was taken of the organization of care: all aspects that are preconditions for providing care (such as coordination, communication, (financial) resources, staffing and infrastructure). Preconditions relevant to answering a specific clinical question form part of the considerations for that question. More general, overarching or additional aspects of the organization of care are addressed in the module ‘Organisatie van zorg’.

Indicator development

In view of the current attention to the registration burden, the working group decided not to develop internal quality indicators. There was also no concrete plan to use any indicators in quality policy.

Knowledge gaps

During the development of this guideline, a systematic search was made for research whose results contribute to answering the clinical questions. For each clinical question, the working group considered whether (additional) scientific research is desirable in order to answer it. An overview of the topics for which (additional) scientific research is considered important is described as a recommendation in the appendix Kennislacunes (under related products).

Comment and authorization phase

The draft guideline was submitted to the (scientific) societies and (patient) organizations involved for comment. The comments were collected and discussed with the working group. In response to the comments, the draft guideline was amended and adopted in its final form by the working group. The final guideline was submitted to the participating (scientific) societies and (patient) organizations for authorization and was authorized or approved by them.

References

Alonso-Coello P, Schünemann HJ, Moberg J, et al. GRADE Evidence to Decision (EtD) frameworks: a systematic and transparent approach to making well informed healthcare choices. 1: Introduction. BMJ. 2016;353:i2016.

Alonso-Coello P, Oxman AD, Moberg J, et al. GRADE Evidence to Decision (EtD) frameworks: a systematic and transparent approach to making well informed healthcare choices. 2: Clinical practice guidelines. BMJ. 2016;353:i2089.

Brouwers MC, Kho ME, Browman GP, et al. AGREE Next Steps Consortium. AGREE II: advancing guideline development, reporting and evaluation in health care. CMAJ. 2010;182(18):E839-42. doi: 10.1503/cmaj.090449. Epub 2010 Jul 5. Review. PubMed PMID: 20603348.

Medisch Specialistische Richtlijnen 2.0 (2012). Adviescommissie Richtlijnen van de Raad Kwaliteit. https://www.demedischspecialist.nl/sites/default/files/Medisch%20specialistische%20richtlijnen%202_0%20okt%202012.pdf

Mokkink LB, Prinsen CA, Patrick DL, et al. COSMIN methodology for systematic reviews of patient-reported outcome measures (PROMs). User manual. 2018;78:1. Available from: https://www.cosmin.nl/wp-content/uploads/COSMIN-syst-review-for-PROMs-manual_version-1_feb-2018-1.pdf.

Schünemann H, Brożek J, Guyatt G, et al. GRADE handbook for grading quality of evidence and strength of recommendations. Updated October 2013. The GRADE Working Group, 2013. Available from http://gdt.guidelinedevelopment.org/central_prod/_design/client/handbook/handbook.html.

Schünemann HJ, Oxman AD, Brozek J, et al. Grading quality of evidence and strength of recommendations for diagnostic tests and strategies. BMJ. 2008;336(7653):1106-10. doi: 10.1136/bmj.39500.677199.AE. Erratum in: BMJ. 2008;336(7654). doi: 10.1136/bmj.a139. PubMed PMID: 18483053.

Search accountability

Clinical question: Which clinical measurement instruments are most suitable for evaluating walking ability and mobility?
Database(s): Medline, Cinahl	Date: 01-05-2019
Period: no restriction	Languages: English

Database: Medline (OVID), English

Search terms:

4 (("Amputation"/ or "Amputees"/) and (exp lower extremity/ or (lower extremit* or lower limb* or leg? or transtibial or trans-tibial or knee? or transfemoral or trans-femoral or foot or feet or ankle*).ti,ab,kf.)) or ((prosthe* or amput*) adj4 (lower extremit* or lower limb* or leg or transtibial or knee or transfemoral or foot or ankle*)).ti,ab,kf. or "Amputees"/rh (21847)

5 (SIGAM* or "minute walking test*" or "10 meter walk test" or "ten meter walk test" or "Timed get up and go test" or TUG or L-test or (Four-square adj3 test) or FSST or "Breath gas analysis" or "berg balance scale" or BBS or "Locomotive Capability Index" or LCI or "prosthetic limb users survey of mobility" or Plus-M or "Prosthetic Profile of the Amputee" or PPA or "Prosthetic Evaluation Questionnaire" or PEQ or ((breath or gas) adj3 analysis)).ti,ab,kf. (27005)

6 4 and 5 (154)

7 "Special Interest Group in Amputee Medicine ".ti,ab,kf. (8)

8 6 or 7 (157)

9 limit 8 to english language (156)

Database: Cinahl (Ebsco)

Search terms:

(MH "Amputation+") OR (MH "Amputees") OR (MH "Limb Prosthesis") OR ((prosthe* or amput*) N4 (lower extremit* or lower limb* or leg or transtibial or knee or transfemoral or foot or ankle*)) )

AND TX ( (SIGAM* or "minute walking test*" or "10 meter walk test" or "ten meter walk test" or "Timed get up and go test" or TUG or L-test or (Four-square N3 test) or FSST or "Breath gas analysis" or "berg balance scale" or BBS or "Locomotive Capability Index" or LCI or "prosthetic limb users survey of mobility" or Plus-M or "Prosthetic Profile of the Amputee" or PPA or "Prosthetic Evaluation Questionnaire" or PEQ or ((breath or gas) N3 analysis)) ) (235) > 131 unique

Total: 287
