I’ve been meaning to write this post for a few months, so I’ll warn you – it’s going to be long. Since I realised I’d be using the ROBINS-I (Risk Of Bias In Non-randomised Studies – of Interventions) tool for my systematic review I have been searching for people, blog posts, and snippets of experience via Twitter to tell me how people have found this tool, and I didn’t find much – probably because the tool is so new. I did find a brilliant blog post from the Methods in Evidence Synthesis Salon (University of Bristol), which explains why we need a risk-of-bias tool for non-randomised studies, and gives an overview of the domains of bias that are assessed. I’d recommend reading that post before you carry on reading this one if you’re new to ROBINS-I. The lack of experiential posts got me thinking though – if I was looking for advice/guidance/stories of experience, surely others would appreciate that too? So, given that I couldn’t find a whole lot of detail, here’s my two cents on ROBINS-I.
In this post, I’d like to give an overview of my experiences of using ROBINS-I (I will stress again that these are my experiences, if you’re going to use the tool for wildly different subject matters then my thoughts may not translate, and even if you use the tool for very similar studies you may still disagree – it’s all good), and then some ideas on what I think the tool does really well, and what it could do better.
As a very new systematic reviewer (i.e. this was my first) I had no pre-conceived thoughts or views on ROBINS-I; it was just another thing for me to learn how to do – just like protocol writing, abstract screening and data extraction had been. Saying that, I’ll admit that I was a bit hesitant when I first downloaded the ROBINS-I pdf. I had expected the tool to be 2 or 3 pages at most, but this was 22 pages. I then looked immediately for the guidance document, which was a further 53 pages. To reinforce – that’s a lot of pages to get your head around. Looking further at the tool itself calmed me down a bit; the domains were nicely split and well defined, and the tool itself contains a lot of guidance within it.
Right, using ROBINS-I. As I said, my first impressions were a mix of hesitation and ‘I’m sure this will be fine’, but when I went into the first meeting with my Supervisor to talk ROBINS-I tactics, I went right back to hesitant. He’d never used the tool either, and we were both sat with the guidance document, a print out of the tool, and a huge stack of studies to assess – all covered in scribbles and highlighted areas that were fighting for our attention. My thought process went something like this: ‘Yep, bit nervous about this now – how long is this going to take? Will I ever get all this done before I have to submit my thesis? This is probably going to kill me.’
I’ll say upfront, the whole process of using the ROBINS-I tool to assess risk-of-bias for 103 included studies was not as much of a nightmare as I thought it would be. We (my Supervisor and I) did all of the risk-of-bias assessments together; and when I say together I mean we sat in a room and talked through the entire assessment process. We decided to do the assessments as a pair for a few reasons: 1) neither of us had used the tool before so it was good to talk through each domain, challenge each other and then reach agreement, 2) doing the risk-of-bias assessments individually and then meeting up to discuss discrepancies would have taken at least double the amount of time, and with 103 studies that wasn’t workable, 3) honestly, I was a bit nervous.
The first study we assessed took a relatively long time. I gave my Supervisor an outline of the study, he looked over the completed data extraction form, and we talked through any flaws we could see in study design. After that we went through the ROBINS-I tool domain by domain, making sure to refer to the guidance when we needed clarification. I also made notes throughout this process, which was invaluable when trying to ensure consistency between assessments. For example, if we pulled one study down to moderate risk-of-bias in the ‘classification of interventions’ domain, I’d write down why. That would ensure that the next time we saw the same flaw in a different study, we’d be sure to pull it down to moderate too.
Once we were happy with the first assessment the second took less time, and the third less still.
After about 10 assessments it was clear that the studies we were looking at were falling down in similar places, and I made a sort of crib sheet (example on the right). This was how a typical study came out for us; obviously not all of them did, but it was a good way to build a loose structure. Things sped up after that. We’d arrange to meet for one or two hours at a time every week, and we got through the assessments much quicker than we first anticipated. When they were all done my Supervisor provided baked goods in celebration, I think that helped.
Advice for future users
- Do your risk-of-bias assessments in pairs if possible
- Write everything down. Yep, that’s ‘everything’, underlined and in bold. If you don’t do this you’ll be really angry at yourself later
- Make a loose crib sheet after you’ve got to grips with the assessment process, tweak it until you’re happy, and then apply to the rest of your studies
- Invest in highlighter pens, and lots of them – highlighting specific parts of your documents will ensure you don’t forget where there are flaws in the study design, and you can highlight the tool itself so you can see your usual ‘path’ through it
What ROBINS-I is really good at
The studies that we were looking at were not particularly good quality. We were very open right from protocol stage (screenshot on the left) that the collated data may remain at low or very low quality. That made me panic a bit; what was I going to do with a big pile of poor quality studies?! ROBINS-I provided a way to distinguish between the poor quality studies, and the not-so-poor quality ones. Using the tool helped us to create a quality gradient within the pile, which (thankfully) prevented me from hating the process of writing the review up. I say that now, I’ve only just started writing the results, so there’s still time yet.
The length of the tool wasn’t a big deal for me. As I said earlier, at the beginning of the process it really intimidated me, but the judgements you need to make are guided heavily within the tool itself, even without the separate guidance document. There are lots of ‘if you answered yes to X, go to Y’ instructions, which means you never answer every question within the 22 pages, and the entire process speeds up considerably because you don’t need to keep checking what each question/judgement means in minute detail – the tool holds a lot of information itself.
What changes I think ROBINS-I would benefit from
- It’s longer than it needs to be
Let’s tackle the obvious thing first. The tool is really long, and whilst the guidance contained within it is good and it’s relatively easy to navigate, the process of doing an assessment could be very time-consuming. My review has one outcome that we could apply ROBINS-I to, but for non-randomised clinical studies, especially those that involve multiple outcomes, this is going to take an age.
- What’s the difference between ‘yes’ and ‘probably yes’, ‘no’ and ‘probably no’, and ‘probably yes’ and ‘probably no’?
I know that the judgements you make throughout each and every domain in the tool are subjective, but the nuances between these responses make them even more subjective, which I’m not sure is a good thing. In the older version of the risk-of-bias tool for randomised controlled trials, the responses were simply ‘yes’, ‘no’ and ‘unclear’. That seems like an easier route to ensure consistency between reviewers. As well as that, the ‘probably yes’ and ‘yes’ responses, like ‘probably no’ and ‘no’, tend to result in the same judgement for that domain anyway, so I’m unsure what these subtleties are adding to the judgement itself.
Some clarification on the need for these additional degrees of judgement would be great; if they’re not adding much to the final judgement outcome then they could either be taken out, or at least if people know the finer judgements don’t have a huge impact, they won’t agonise over their decision-making.
- When should you complete the optional question, ‘What is the predicted direction of bias due to selection of participants into the study?’ and how?
This one is a weird one for me – how do you make that judgement, and what is it adding to the process? For me, I don’t think I’d feel comfortable saying that the direction of bias could be characterised as favouring the experimental arm or the comparator. In my (perhaps incorrect – feel free to discuss!) view the fact that the study is at risk-of-bias means just that, it’s too difficult to tell what the direction of that bias is, and it ends up being another gut judgement that you can argue either way.
- Is one ‘serious’ really the same as four ‘serious’ judgements?
This was my main problem with the tool. If an overall risk-of-bias judgement using ROBINS-I comes out at ‘serious’, that means that the study is judged to be at serious risk of bias in at least one domain, but not at critical risk of bias in any domain. Meaning then, that one ‘serious’ domain and four ‘serious’ domains equate to the same overall judgement. When I was thinking about this I decided to look at it in a completely out-of-context example; imagine you’re a child and you get a detention once over the course of an entire school term. If you get a detention five times in the space of one week, is your punishment or judgement by parents/teachers etc. the same? I wouldn’t have thought so. I got detention once (and it really was only once in my entire school career); my parents weren’t very happy, but it wasn’t something that they were particularly worried about. If I’d come home with detention every week though, I’m pretty sure I’d have been grounded. See what I mean?
This is trickier in my case because all of my studies started with a ‘serious’ judgement in the confounding domain, meaning they had no chance of redemption. We knew they were all going to be at serious risk-of-bias due to confounding because of the type of studies they were, so it was the other domains that allowed us to see which studies were truly of poor quality.
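To make the ‘one serious versus four serious’ point a bit more concrete, here’s a rough Python sketch. It assumes the overall judgement is simply the worst rating across the domains (which matches the ‘serious’ rule described above), and it leaves out the ‘no information’ option entirely. The two studies and their ratings are made up for illustration, and `serious_count` is my own hypothetical helper, not part of ROBINS-I.

```python
from collections import Counter

# Ratings ordered from least to most concerning.
# Assumption: "no information" is ignored in this sketch.
LEVELS = ["low", "moderate", "serious", "critical"]


def overall_judgement(domains):
    """Overall rating, taken here to be the worst rating in any domain."""
    return max(domains.values(), key=LEVELS.index)


def serious_count(domains):
    """Hypothetical helper: how many domains were rated 'serious'."""
    return Counter(domains.values())["serious"]


# Two invented studies, both 'serious' for confounding.
study_a = {
    "confounding": "serious",
    "selection of participants": "low",
    "classification of interventions": "moderate",
}
study_b = {
    "confounding": "serious",
    "selection of participants": "serious",
    "classification of interventions": "serious",
}

for name, study in [("A", study_a), ("B", study_b)]:
    print(name, overall_judgement(study), serious_count(study))
```

Both studies come out as ‘serious’ overall, and it’s only the count of ‘serious’ domains that separates them, which is roughly the informal gradient we ended up relying on.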
Have you used the ROBINS-I tool yet? What did you think? I’d really like to hear your thoughts on it, and I’m happy to answer any questions you have on my experiences. When a new tool comes out it’s always a bit tricky to navigate, and I think speaking to others and listening to their thoughts and experiences is invaluable. Leave a comment and let’s get talking.