Selected Development Themes   

Monitoring & Evaluation

Content related to: monitoring and evaluation, including results and impact, the IFAD Results and Impact Measurement System (RIMS), participatory monitoring, indicators, logical frameworks, and ongoing, internal and external evaluation

Discussions


How do we make project monitoring and evaluation more effective?

Posted on 8/31/11.

 

At a basic level, all IFAD-supported projects monitor and report on the progress of project activities and outputs – such as how many people have been trained or how many hectares of land are covered by new irrigation systems. Most projects also report on impact, collecting data on at least the RIMS anchor indicators of child malnutrition, food security and assets from a sample of project participants at baseline, mid-term and completion. These impact indicators aim to provide evidence that the goal of poverty reduction has been achieved. Some projects also collect data on intermediate outcome indicators – the immediate results of project outputs, such as adoption of new technologies, improved irrigation water supply or increased crop production.

 

In the development community – or at least among those involved in monitoring and evaluation – there has been demand for a more rigorous approach to evaluating the effectiveness of international development assistance. The International Initiative on Impact Evaluation (http://www.3ieimpact.org) and others have argued that more convincing evidence is needed to attribute development outcomes to project interventions.

 

Randomised Controlled Trials (RCTs) have emerged as the “gold standard” for measuring the results of a development intervention. This approach, which is also used in trials to test new drugs, involves dividing members of a target population at random into two groups, one of which is given the development intervention and one of which is not. Indicators are measured for both groups, and the difference between them at the end of the trial is the measured impact of the intervention.
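To make this concrete, here is a minimal sketch of the RCT logic just described, in Python. The sample sizes, outcome variable and effect size are all illustrative assumptions, not figures from any real trial.

```python
# Minimal RCT sketch: random assignment, then difference in group means.
import random
import statistics

random.seed(42)

households = list(range(200))
random.shuffle(households)
treatment = households[:100]   # receive the intervention
control = households[100:]     # do not

# Hypothetical end-of-trial outcomes (tonnes/ha); the +0.3 for treated
# households is an assumed average effect, purely for illustration.
def observed_yield(treated):
    return random.gauss(2.0, 0.4) + (0.3 if treated else 0.0)

treat_outcomes = [observed_yield(True) for _ in treatment]
control_outcomes = [observed_yield(False) for _ in control]

# Because assignment was random, the difference in means estimates impact.
impact = statistics.mean(treat_outcomes) - statistics.mean(control_outcomes)
print(f"Estimated impact: {impact:.2f} tonnes/ha")
```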

 

Although RCTs can produce useful insights into what works and what does not, they do not seem to be a tool that can be used to measure the impact of a typical development project. Project participants are not chosen at random from a target population – they may volunteer to join, or projects may recruit the people who best fit the target group, or select the locations that will benefit most from project interventions. Impact surveys for such projects can still use people who have not participated in the project as a comparison or “control” group, but this group, not having been chosen at random from the same population, will not be identical to the project participants. In other words, non-participants may have stayed out because they did not wish to join, may not fit the profile of the target group as well, or may live in locations with different characteristics. However, it is still possible to use such a control group in what is known as a “quasi-experimental design” for an impact survey.

 

The idea behind such a quasi-experimental design is to compare changes that take place in the project group with changes in as good a comparison group as is practically possible. There are various statistical techniques, such as Propensity Score Matching, that can help in selecting a control group that is a near match to the project group, but a practical and straightforward approach is to select a group that fits the profile of the target group as closely as possible (such as small farmers from nearby villages) and then use the “difference of difference” technique to measure project impact – this being the difference between the two groups in the change in indicators from pre-project to post-project dates.
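As a worked illustration of the “difference of difference” arithmetic, here is a short sketch using made-up survey means, not real project data:

```python
# Minimal "difference of difference" calculation. Each figure is an
# assumed group mean for some indicator, e.g. crop yield in tonnes/ha.
project_baseline, project_completion = 1.8, 2.5   # illustrative means
control_baseline, control_completion = 1.9, 2.1   # illustrative means

change_project = project_completion - project_baseline   # 0.7
change_control = control_completion - control_baseline   # 0.2

# Impact is the difference between the two changes: what happened in the
# project group over and above what happened anyway in the control group.
impact = change_project - change_control
print(f"Estimated project impact: {impact:.1f} t/ha")
```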

 

Such an approach requires pre-project (baseline) surveys for both project and control groups, which are often not done (or are otherwise difficult to do). But we can still produce evidence of project impact without data on the “difference of difference”. One way is to measure changes in the project group through baseline and impact (post-project) surveys, and then collect some more qualitative information from a control group for comparison. For example, baseline and impact surveys may show an average yield increase of 25% for 90% of project group farmers, while only 15% of farmers in a control group report having had any significant increase in yield (note that, without a control group baseline, we are not trying to measure the size of the yield increase for the control group).

 

It is also possible to produce better evidence of project impact by using a “results chain”. A project that provided training in agriculture to 90% of its participants, and then reported that the proportion of underweight children fell from 40% to 25%, has not produced much of a case to show that training resulted in better child nutrition. We do not know if the training was effective, and some other factor may have reduced child malnutrition. However, if the project can produce evidence of a results chain, this can make the case that the training did result in reduced child malnutrition. For example, if evidence can be collected to show that the training provided farmers with information they did not obtain from other sources, that this led to the adoption of new technologies, that these in turn increased crop production, and that the extra production was either consumed at home or sold to buy food, so that periods of food shortage were reduced and diets improved, then we can claim that the training did have an impact on child malnutrition.

 

If projects can generate convincing evidence of their results and impact, not only will people, including those outside IFAD and its implementation partners, be persuaded that the project was successful, but useful lessons will be learned for the planning of future interventions. To produce this evidence of results, projects need the capacity and resources to collect, analyse and interpret data that links project implementation to outcomes and impact. I wonder whether projects feel they have the resources they need, and whether the design of new projects should provide additional resources?

Comments
I am thankful to Mr. Mallorie for bringing up this important subject for discussion. In my view there are essentially three reasons why we need to put more rigour into the impact evaluation systems we are putting in place, and hence to incorporate more resources in new project designs as well as in Country Offices. Firstly, national governments are strengthening knowledge sharing and management capacities, and there is demand for evidence-based studies and rigorous, robust impact evaluation at both the state and national levels. Secondly, it is a sound way of advocating policy through systematic knowledge generation, in preparing policy briefs and strengthening pipeline project designs. Lastly, this initiative will help to harmonise our work with the international initiatives of other donors, both multilateral and bilateral, as there has been renewed interest in impact evaluation over the last five years.
The Randomised Controlled Trial method emerged from clinical research as a purely experimental design based on randomisation, and has since been used for impact evaluation. Recent work on RCTs in the rural development and health sectors has shown success, but I would still agree with Mr. Mallorie that it may not always be a suitable methodology. The method is dependent on the context and on baseline descriptors, mainly related to demography, which might have a bearing on the primary outcomes and impacts of the intervention. Since it is based on ‘intent to treat’, the selection of treatment and non-treatment groups has to be done accurately, and design errors might call the randomisation into question (e.g. the literature tells us that if t-tests and chi-square tests on baseline characteristics are conducted at the 0.05 level of significance, and significant differences appear more often than the 1 in 20 expected by chance, one might question whether randomisation was done properly). If the time between intervention and evaluation is short, the regression model may not show many significant effects. RCTs have also been conducted in agricultural trials. In rural development there can be ethical issues in the selection of treatment groups, political and social feasibility issues, and design limitations in terms of ‘spill-over’ and ‘cross-over’ effects, etc. Nevertheless it is still considered a statistically powerful tool and the ‘first best’ methodology.
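As a minimal sketch of that balance check (with simulated data, and assuming SciPy is available), one could run a t-test on each baseline covariate and count how many come out significant:

```python
# Randomisation check: under proper randomisation, roughly 1 in 20
# baseline tests should be significant at the 0.05 level by chance alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_covariates = 20  # e.g. household size, land holding, age of head, ...

significant = 0
for _ in range(n_covariates):
    treat = rng.normal(0, 1, size=150)    # simulated baseline covariate
    control = rng.normal(0, 1, size=150)  # same distribution: true RCT
    _, p = stats.ttest_ind(treat, control)
    if p < 0.05:
        significant += 1

print(f"{significant} of {n_covariates} baseline tests significant at 0.05")
# Many more than ~1 in 20 would cast doubt on the randomisation.
```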
Quasi-experimental methods are also a useful and sound set of tools for the impact evaluation of agricultural and rural development projects. In IFAD M&E we primarily use two quasi-experimental designs. One of these, used rather half-heartedly, is the ‘double difference’ or ‘difference-in-difference’ method, applied in the annual outcome surveys where we use a control or comparison group. But we propose starting the baseline survey two years after outcomes begin to manifest, thereby weakening the first difference (the before-and-after comparison); we then use the control group to compare with the intervention group for the second difference (the with-and-without comparison). The method is ideally suited to assessing attribution and deducing the counterfactual. In this quasi-experimental method, the importance of identifying a proper comparison group cannot be ignored. As Mr. Mallorie mentioned, we use propensity score matching in difference-in-difference designs to identify comparison groups. Thus far, projects have identified comparison groups within the project area (as recommended), and there are projects that have used comparison groups outside the project area as well. Propensity score matching would be applicable in both conditions, and comparison households might have to be dropped if their scores fall outside the range of the intervention group. Perhaps to avoid this slightly technical step – apart from ethical and other considerations – we have opted for ‘reflexive comparison’ (another quasi-experimental method) for the RIMS+ survey. However, I would like others to share their views on this aspect.
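For readers unfamiliar with propensity score matching, here is a minimal sketch of the idea, with simulated data, illustrative variable names, and assuming scikit-learn is available:

```python
# Propensity score matching sketch: model the probability of being in the
# project from observed characteristics, then match each project household
# to the candidate comparison household with the nearest score.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Simulated covariates: [land holding (ha), household size]
X_project = rng.normal([1.0, 5.0], [0.3, 1.5], size=(100, 2))
X_pool = rng.normal([1.4, 4.5], [0.5, 1.8], size=(300, 2))

X = np.vstack([X_project, X_pool])
y = np.array([1] * 100 + [0] * 300)  # 1 = project, 0 = candidate comparison

scores = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]
project_scores, pool_scores = scores[:100], scores[100:]

# Nearest-neighbour match on the propensity score.
matches = [int(np.argmin(np.abs(pool_scores - s))) for s in project_scores]
print("First five matched comparison households:", matches[:5])
```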
I hope this discussion will lead to refining our M&E methodology, and that by the end of this year IFAD will hold a workshop specifically dedicated to the subject of impact evaluation. Accordingly, we should identify and allocate resources in new project designs, including, in my view, specific allocations for programme-level impact evaluation studies by Country Offices. This would help IFAD to support policy dialogue with national governments, and we could align with recent international initiatives in impact evaluation. I look forward to hearing more from my colleagues in IFAD and other members of the network.

Posted on 9/4/11 6:01 AM.

The other aspect that I would like to emphasise, as I have done in another discussion, is that the baseline survey is predominantly used for impact evaluation. Edward mentioned how we could also make evaluation more effective by looking at the results chain. It is important to see how the results chain emerges from the use of inputs to the creation of outputs, with these leading to intermediate outcomes and finally contributing to the project impact. However, baseline surveys have also been seen by a large number of projects as a planning tool. I am not sure if this is the right perspective.

In the new projects (MPOWER and ILSP) we are doing a baseline for an annual outcome survey, not only to track outcomes, but to capture the intermediate outcomes associated with each activity/intervention, specific to the project design. The salient distinction from the past method is that previously we used an annual outcome survey with indicators from the IFAD strategic framework, cutting across all project designs; now we make the annual outcome survey fit the project design, with intermediate indicators selected from the project logframe. We are doing this because a common annual outcome survey questionnaire may not be suitable for all projects. For example, in the Post Tsunami Sustainable Livelihoods Programme (PTSLP) we had to redesign the Annual Outcome questionnaire to match the project components, as the project has a unique set of components. Alternatively, customising the current Annual Outcome survey to fit a specific project design could be another option for making project monitoring and evaluation more effective. Let's discuss this further.

Posted on 9/5/11 4:30 PM in reply to Rafique Shaheel.

On taking a second look at the datasets of the Annual Outcome surveys done in India, there is scope for comparing the variability of the intervention and comparison/control groups by using the F-test. As of now, in a few of our projects we have compared the means of the intervention and control groups using the t-test. For example, in the first round of the Annual Outcome Surveys, the t-test for the women's empowerment indicator showed a significant difference between the intervention and control groups.
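As a minimal sketch of those two tests, with simulated scores for a hypothetical indicator (not the actual India survey data) and assuming SciPy is available:

```python
# Compare intervention vs control: t-test on means, F-test on variances.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
intervention = rng.normal(3.2, 0.8, size=120)  # simulated indicator scores
control = rng.normal(2.9, 1.1, size=120)

# t-test: are the group means significantly different?
t_stat, t_p = stats.ttest_ind(intervention, control, equal_var=False)

# F-test: are the group variances significantly different?
f_stat = np.var(intervention, ddof=1) / np.var(control, ddof=1)
f_p = 2 * min(stats.f.cdf(f_stat, 119, 119), stats.f.sf(f_stat, 119, 119))

print(f"t-test: t = {t_stat:.2f}, p = {t_p:.4f}")
print(f"F-test: F = {f_stat:.2f}, p = {f_p:.4f}")
```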

Posted on 9/10/11 2:19 AM in reply to Rafique Shaheel.

Shaheel

Thanks for your comments and ideas of making M&E surveys more effective.
I agree with your suggestion that we need to customise annual outcome surveys to be able to gather data on outcomes that reflect the results of individual projects. All projects work in different ways and have different outputs – which means the immediate results (outcome indicators) also differ. This is in contrast to indicators of impact at the goal level, where most projects have similar objectives of poverty reduction, and the RIMS anchor indicators of child malnutrition, household assets, food security and housing quality can be applied.

You also make a good point about carrying out statistical analysis to see if the project has resulted in a significant change. For example, is the increase in livestock numbers for project group members significantly different from the increase in livestock numbers for a control group of other households? It may be that there is a difference between the two groups, but that it is too small to be statistically significant (i.e. it may have arisen just by chance). I find there is often not time to carry out such statistical tests, but if the sample size and design are adequate, we can use our own common sense to judge whether changes are significant – although statistical analysis will always be more convincing.

Posted on 9/14/11 9:48 AM in reply to Rafique Shaheel.

