2023 Evaluation Criteria, Methods, and Limitations

This web page provides a detailed description of how we used ACE’s four criteria to conduct our charity evaluations. This mirrors the standard text in the full review of each Recommended Charity published in 2023.

Impact Potential: How promising are the charity’s programs?

With this criterion,¹ we assess the impact potential (IP) of a charity’s programs without considering their specific program achievements. During our assessment, we analyze the groups of animals the charity’s programs target, the countries where they take place, and the intervention types they use. We also examine how the charity allocates expenditures among different animal groups, countries, and interventions. A charity that performs well on this criterion has programs with great potential to reduce animal suffering or improve animal wellbeing. The key aspects that ACE examines when evaluating a charity’s programmatic work are discussed in detail below.

Our Method

ACE characterizes promising programs as those that (i) target high-priority animal groups, (ii) work in high-priority countries, and/or (iii) use interventions that work toward high-priority outcomes. We used a version of the Scale, Tractability, and Neglectedness framework to score the impact potential of charities’ programmatic work in three categories: animal groups, countries, and interventions. Specifically, we calculated:

Animal-relative scores to assess the expected impact of targeting different animal groups relative to each other
Country-relative scores to assess the expected impact of working in different countries relative to each other
Intervention-relative scores to assess the expected impact of using different interventions relative to each other
Synergy scores to assess the expected impact of using specific combinations of animal groups, countries, and interventions together

For each animal group, country, and intervention, we assigned an impact potential score (IP score) on a 1–7 scale. For each animal group and intervention, we also assigned an uncertainty score on a 1–7 scale, which accounts for our team’s confidence in the IP score. We designate our overall impression as “high priority” when the IP score is equal to or greater than the median (of all animal group scores, country scores, and intervention scores, as applicable) and as “high uncertainty” when the uncertainty score is equal to or greater than the median (of all animal group scores, country scores, and intervention scores, as applicable). Scores below the median were categorized as “moderate priority” and “moderate uncertainty.”

We asked charities to estimate the percentage of their annual programmatic expenditures (i.e., non-overhead) allocated toward different categories of animal groups, countries, and interventions. The final IP score for each charity is the average of the four scores mentioned above (the animal-relative score, the country-relative score, the intervention-relative score, and the synergy score) weighted by those percentages. This final IP score represents ACE’s assessment of the impact potential of the charity’s collective programs without considering their specific program achievements. Similarly, we also arrived at a final uncertainty score for each charity.

We designate our overall impression of each charity’s impact potential as “high IP” when the final IP score is equal to or greater than the median (of all 2023 charities under evaluation) and as “high uncertainty” when the final uncertainty score is equal to or greater than the median (of all 2023 charities under evaluation). Scores below the median were categorized as “moderate IP” and “moderate uncertainty.”

Prioritizing animals

We used the Scale, Tractability, and Neglectedness (STN) framework to score 16 animal groups relative to each other. By using this framework, we aim to prioritize groups of animals who are affected on a large magnitude, where there appear to be tractable solutions to improve their situation, and whose situation is relatively neglected in animal advocacy.

For Scale, we used the following proxies:

Global animal population data.Most of the data for farmed animal groups was extracted from the Food and Agriculture Organization of the United Nations’ statistics website (FAOSTAT) and from other sources for other animal groups. We note every source in the scoring spreadsheet.
Welfare range composite score. This score is an average of three scores: Rethink Priorities’ Welfare Range Estimates, which were adjusted for ACE;² an egalitarian score;³ and the average of ACE’s team members’ welfare range scores. We selected one default animal species for scoring each animal group’s welfare range. We made sure that each default animal species we selected for each animal group was one of the eleven species included in Rethink Priorities’ welfare range project.
Typical welfare situation score. This score aims to represent the welfare situation each animal group typically experiences. This score is the average of our Programs team members’ individual scores for the typical welfare situation of each animal group. Factors we considered include the typical space per individual, typical lifespan vs. potential natural lifespan, and the main types of welfare issues.

Our Programs team members individually scored Tractability, defined as “our understanding of the problem and its solutions,” and Neglectedness, defined as “the amount of global funding going toward each animal group.” We then averaged Scale, Neglectedness, and Tractability scores to calculate an IP score for each animal group.

Additionally, we calculated an uncertainty score for each animal group, reflecting the standard deviation of team members’ scores.

For more details on how we currently prioritize animals, see the animal-relative scores spreadsheet.

Prioritizing countries

The countries and regions in which a charity operates can affect the impact potential of their work. In the case of farmed animal organizations,⁴ we used the STN framework to score 196 countries relative to each other.By using this framework, we aim to prioritizecountries with relatively large animal agriculture industries, few other charities engaged in similar work, and where animal advocacy is likely to be feasible and have a lasting impact. Additionally, we consider a country’s Global Influence as a fourth factor in prioritizing countries.

Our method of scoring countries was inspired by Mercy For Animals’ Farmed Animal Opportunity Index (FAOI), using most of their proxies for Scale, Tractability, and Global Influence. We also considered Neglectedness as a factor.

For Scale, we used the following proxies:

Current farmed animal population in the country. We extracted data on farmed animal populations from FAOSTAT.
Projected farmed animal population in 2050. These are based on the FAO’s projections for farmed fishes⁵ and land animals.⁶ We include these projections to account for the importance of advocacy in countries where animal agriculture will likely be widespread in the near future, even if it is relatively limited at present.

For Tractability, we used the following proxies:

Gross National Income per capita. This is the dollar value of a country’s annual income divided by its population, according to data from The World Bank.
Lack of corruption. The Corruption Perception Index by Transparency International ranks 180 countries and territories around the world by their perceived levels of public sector corruption.
Giving Index. The World Giving Index is an annual report published by the Charities Aid Foundation, using data gathered by Gallup. It ranks over 140 countries according to how charitable they are.
Innovation Index.The Global Innovation Index is an annual ranking of countries by their capacity for and success in innovation, published by the World Intellectual Property Organization.

For Global Influence, we used the following proxies:

Elcano Global Presence Index. This index is an annual measurement of the international influence and reach of 150 countries based on three factors: economic presence, military presence, and soft presence.
Meat trade. This is the aggregate number of tonnes of meat imported and exported, according to FAOSTAT’s Food Balances Portal.
Live trade. This is the aggregate number of live animals imported and exported, according to FAOSTAT’s Food Balances Portal.

For Neglectedness, we calculated the ratio of the human population to the number of farmed animal organizations in each country. Specifically, we compared our own data on the number of farmed animal organizations (excluding farmed animal sanctuaries) working in each country to the human population (in millions) of that country.

For more details on how we currently prioritize countries, see the country-relative scores spreadsheet.

Prioritizing interventions

We categorized the interventions animal advocacy charities use into 26 types and the main outcomes they work toward into eight types. Using the STN framework, we calculated different IP scores for 77 combinations of interventions and expected outcomes. For each combination, our Programs team members individually scored:

Scale in the short term, defined as the amount of animal suffering this intervention can relieve in the short (0–5 years) and medium term (5–100 years) relative to other interventions working toward the same outcome
Scale in the long term, defined as how much this intervention can create systemic change.
Tractability, defined as how easy it is for this intervention to cause each outcome.
Neglectedness, defined as the amount of funding going toward this intervention relative to other interventions.

Each team member assigned a percentage weight to the above four factors. We then averaged these scores and percentage weights to calculate an IP score for each combination of interventions and expected outcomes.

In line with our guiding principle of following empirical evidence and logical reasoning, we used existing research to inform our assessments and explain our thinking about the impact potential of different interventions. We compiled research about the effectiveness of each intervention using information from our research newsletter, research library, and research briefs.

Additionally, we calculated an uncertainty score for each combination of interventions and expected outcomes. These scores factor in the standard deviation of team members’ scores, the number of relevant publications on the intervention’s effectiveness, and the quality of those publications. We determined each publication’s quality based on its peer-review status, open-access availability, publication date (ideally 2020 or later), transparency in methodology, and whether it was a meta-analysis, meta-review, or systematic review (if applicable).

For more details on how we currently prioritize interventions, see the intervention-relative scores spreadsheet.

Assessing synergy

This year, we introduced synergy IP scores to account for the unique impact that a combination of factors can create. These scores represent the impact of a specific combination of factors (intervention, animal group, and country) and aim to capture a dimension of charities’ impact that is different from the one reflected in the animal-relative, country-relative, and intervention-relative scores. With the synergy scores we attempt to analyze charities’ work in a way that is, to some extent, sensitive to each charity’s particular context as well as consistent across charities.

We identified all different “animal-country” and “intervention-country” combinations for charities under evaluation in 2023 and produced the following scores:

Animal-country scores (if applicable). Our Programs team members scored the impact potential of helping a specific animal group in a specific country. To inform our scores, we considered the animal population (current and projected for 2050) in each country.
Intervention-country scores (if applicable). Our Programs team members scored the impact potential of using a specific intervention type in a specific country. To inform our scores, we considered specific tractability proxies depending on the intervention type.⁷

We then created a synergy IP score for each synergy combination (i.e., intervention-animal-country) by averaging the corresponding animal-country score and intervention-country score. For more details on how we scored synergy impacts, see the synergy scores spreadsheet.

Limitations of Our Method

In this criterion, we assess the impact potential (IP) of charities’ programs without considering their implementation or achievements. We address this limitation in the Cost Effectiveness criterion, where we analyze charities’ program achievements since 2022.

Due to the lack of available data for STN proxies and our limited capacity to gather data, we were unable to assess the effectiveness of interventions and animal groups based solely on data and without any subjective scoring by ACE team members, as we did for countries. The intervention-relative scores are based on ACE team members’ scores. In contrast, the animal-relative scores use a hybrid approach, incorporating ACE team member scores for tractability, neglectedness, and some scale proxies. For other scale proxies, we used animal population data and Rethink Priorities’ adjusted data on welfare ranges, each with its own limitations.⁸ Because we assigned numerical values to non-numerical data, we advise caution when interpreting results, as they can appear more objective than they are.

Due to limited capacity, we decided to estimate the welfare range of each animal group using only one species. Rethink Priorities recommended a more complex method that involves population-weighted averages of welfare ranges for categories of multiple animal species (e.g., wild animals). Although we did not use Rethink Priorities’ suggested method this year, we may implement a refined version of it in the future if we expect it would significantly improve the accuracy of our estimates.

We based the country-relative scores entirely on data, but unfortunately, not all the data we intended to include was available. Our original plan for this year was to use the amount of farmed animal advocacy funding per human population as a proxy for neglectedness in different countries. However, due to a lack of reliable data on the amount of farmed animal advocacy funding in each country, we instead used the number of farmed animal organizations in the country as a proxy. The data we used for the country-relative scores also has limitations, as they are estimations made by different institutions using various methods. Another limitation of our country-scoring method is that it only applies to farmed animal advocacy. We do not have country scores to evaluate programs that aim to support wild animals or other animal groups. In such cases, we scored countries as “n/a” and excluded the country factor from our analysis.

ACE team members used empirical evidence when available to score interventions and synergy impacts; however, evidence was inconsistent across interventions. For example, we found limited research on the effects of corporate litigation or funding.

We are particularly uncertain about the long-term impact of interventions, which is plausibly what matters most.⁹ The potential number of animals affected increases over time due to an accumulation of generations. Thus, we expect that the long-term impacts of an action will likely affect more animals than the short-term impacts of the same action. This year, we included some considerations of long-term impact in our assessment of each intervention type.However, we remain highly uncertain about the specific long-term effects of each intervention. Due to this uncertainty, our reasoning about the potential impact of each charity (including our diagrams) tends to emphasize short-term effects.

Cost Effectiveness: How much has the charity achieved through their programs?

With this criterion, we assess the effectiveness of a charity’s approach to implementing interventions, their achievements, and the costs associated with those achievements. Charities that perform well on this criterion likely use their available resources in a cost-effective manner. The key aspects that ACE considers when examining cost effectiveness are reviewed in detail below.

Our Method

We conducted our analysis by comparing a charity’s reported expenditures over 12 months to their reported achievements in each intervention category during that time. To simplify the reporting process for charities, we gave them the choice to report achievements for the last full calendar year or their organization’s last full fiscal year.

We asked charities to report up to 10 key achievements per intervention category alongside their estimated expenditures for each achievement.¹⁰

During our evaluation, we verified a subset of the charity’s reported achievements. Specifically, we identified the three intervention categories for which the charity reported the highest expenditures and selected up to three key claims per intervention category to verify.¹¹ We aimed to identify the claims with the highest reported expenditures, ideally by finding at least one independent source to confirm the claim. When we were unable to do so, we directed follow-up questions to the charity to verify their achievements.

We used a Weighted Factor Model¹² (WFM) approach to calculate the charity’s final cost-effectiveness score (see image below) based on their achievement scores. The achievement scores represent a combination of the intervention impact potential (IP) score and the implementation scorefor each key achievement.

For each key achievement, we assigned the respective intervention IP score. For more details on how we calculated intervention IP scores and prioritized interventions, see the Impact Potential criterion.
We also computed the achievement quantity, i.e., how much the charity accomplished per U.S. dollar or per $100,000.¹³ We applied discounts to the achievement quantities if a charity collaborated with other organizations, and in certain other cases (e.g., if a charity influenced funding rather than directly providing it, if they summarized and disseminated research rather than conducting original studies, or if a corporate campaign did not result in any commitments). We then normalized the achievement quantity against other achievements in the same intervention category and with the same unit and mapped it onto a 1–7 scale.
We then used a rubric to score the achievement quality (i.e., how well the charity had implemented the intervention). The rubric included the respective animal-relative IP score for the animal group targeted by the achievement, as well as intervention-specific factors. Where an achievement targeted multiple animal groups, we used the average animal score.¹⁴
We calculated a weighted average of the achievement quantity per dollar/$100k and the achievement quality to arrive at the implementation score.¹⁵
We then multiplied the intervention IP score by the implementation score and normalized and mapped the resulting score onto a 1–7 scale to arrive at the achievement score.¹⁶
The final cost-effectiveness score for each charity is the average of all of their achievement scores, weighted by their expenditures. This score indicates, on a 1–7 scale, how cost effective we think the charity has been during the reported year, with higher scores indicating higher cost effectiveness. Note that charities were benchmarked against other charities under evaluation rather than against all charities, and the standard of effectiveness in the charities we selected to evaluate is likely high. Low cost-effectiveness scores therefore don’t necessarily indicate low cost effectiveness in absolute terms.

Fig. 1: A model depicting the breakdown of a charity’s cost-effectiveness score

We also rated our level of uncertainty in each achievement score based on the intervention uncertainty score, the number of missing scores in the achievement scoring rubric,¹⁷ and whether we were able to successfully verify a charity’s achievements. Uncertainty was scored on a 1–7 scale, with higher scores indicating higher uncertainty. The final uncertainty score for each charity is the average of the achievement uncertainty scores, weighted by the relative achievement expenditures.¹⁸ We consider a score to represent “high uncertainty” when it is equal to or greater than the median of all charities under evaluation.

Limitations of Our Method

ACE used quantitative cost-effectiveness models until 2018. Between 2019 and 2021, we moved away from this approach due to concerns about the usefulness of purely quantitative models. Instead, we employed a qualitative analysis of charities’ cost effectiveness, which consisted of an assessment of whether we had concerns about the way a charity was using its funds. Since 2022, we have again been moving toward an approach that includes more quantitative information. For ACE’s 2023 evaluations, we decided not to use a model that fully quantifies the expected impact on animals for the following reasons:

The charities we evaluate employ a wide range of interventions, and we lack the empirical research for most interventions that would enable us to make informed, quantified estimates of their impact on animals. The resulting uncertainties would be significant, limiting the estimates’ usefulness.¹⁹
Fully quantified cost-effectiveness estimates often ignore important factors that are hard to quantify and are sometimes arbitrary in what they include and exclude.
Interventions are interdependent and their effectiveness is context-sensitive, so there might not be a clear answer as to which intervention is the most effective.
Improving on these limitations may lead to greater complexity, making it harder for the public to understand and critically appraise the model.

To include as much quantifiable information as possible this year, we opted for a Weighted Factor Model²⁰ (WFM) approach, which can be useful when limited hard data is available. This approach allows us to combine objective (hard) factors (e.g., the number of downloads of an educational resource) and subjective (soft) factors (e.g., how evidence-based the content of the resource is) when evaluating a charity’s achievements. Because factors and factor weights are standardized, this approach also ensures consistency in evaluation across charities. However, there are several noteworthy limitations.

Given that most factors and factor weights are set in advance, a WFM can have limited flexibility.²¹ Additionally, because we assign numerical values to non-numerical data, the data can be misinterpreted as more objective than it is. Therefore, the results of a WFM need to be interpreted with caution.

Further, due to limited team capacity, we could not independently verify every achievement charities reported to us. Instead, we followed a protocol where we aimed to verify one to three achievements in each of the top three intervention categories for which charities reported the highest expenditures. In some cases, we were either partially or not able to independently verify the achievements that were reported. In these cases, we sent follow-up questions to the charities to ask them for evidence of the achievement. While we made every effort to independently verify all the achievements we selected using our verification protocol, there were some achievements we were not able to verify. This increased our uncertainty score for these achievements.

To allow for comparison across charities, we evaluate them based on a standardized 12-month time period. However, this limited timeframe may not accurately reflect a charity’s long-term cost effectiveness. Some accomplishments that are reported to us might seem more cost-effective than they actually are if a significant portion of the associated expenses occurred before or after the assessed timeframe. On the other hand, for a charity that focuses on long-term systemic change, the 12-month time period may not capture the full impact of their achievements. Complex issues like policy reform or behavior change often require sustained efforts over multiple years to yield significant results. Therefore, assessing cost effectiveness within this limited timeframe does not fully reflect the long-term potential and cumulative impact of the charity’s work.

Additionally, it’s important to note that charities could report achievements based on the last full calendar year or the last full fiscal year, depending on their financial reporting practices. While this flexibility allowed charities to align their reporting with their financial cycles, it means that we did not evaluate the exact same time period for each charity. Consequently, although there is significant overlap between the charities’ selected reporting periods, some caution should be exercised when directly comparing charities.

Generally, assessing cost effectiveness by considering a charity’s key achievements has inherent limitations. It could bias cost-effectiveness estimates upward because it tends to disregard work that did not result in an achievement. To help mitigate this, we asked charities to report achievements and expenditures that cover at least 90% of their overall program expenditures.

Room For More Funding: How much additional money can the charity effectively use in the next two years?

A recommendation from ACE could lead to a large increase in a charity’s funding. With this criterion, we investigate whether a charity would be able to absorb the funding that a new or renewed recommendation may bring, and the extent to which we believe that their future uses of funding will be as effective as their past work.

Our Method

We begin our room for more funding (RFMF) assessment by inspecting the charity’s revenue and plans for expansion through 2025, assuming that their ACE recommendation status and the amount of ACE-influenced funding they receive will stay the same. We then outline how the charity would likely expand if they were to receive funds beyond their predicted income and use this information to calculate their RFMF.

Plans for Expansion

To estimate charities’ RFMF, we request their financial and staffing records from 2020 onwards and ask them to predict their revenue and staff size through 2025. We ask them to report how their projections are allocated across different interventions, animal groups, and countries. We then assess our overall level of uncertainty in the charity’s projected revenue, expenditures, and hiring plans on a scale of 1–7, with higher scores indicating higher uncertainty. This assessment is based on factors such as sustainability of growth,²² alignment with projections from our previous evaluation (if applicable), and our uncertainty in charities’ ability to find and train planned new hires in the projected time frame based on our understanding of the talent landscape.²³

Our focus is to determine whether additional resources will likely be used for programs with high impact potential or other beneficial organizational changes. The latter may include investments into infrastructure and staff retention, both of which we think are important for sustainable growth.

Unexpected Funding

We ask charities to indicate how they would spend additional, unexpected funding that an ACE recommendation may bring. Because previously recommended charities tend to receive more ACE-influenced funding over time, we also ask those charities to specify how they would use additional funding.

We then assess our level of uncertainty in the effectiveness of the charity’s plans in 2024 and 2025 and estimate their RFMF for those years. To do this, we assign an uncertainty score on a 1–7 scale, with higher scores indicating higher uncertainty, based on questions such as the following for each plan:

How uncertain are we that the charity’s plans will make as effective use of the funding as their previous expenditures?
Is there a percentage threshold of the charity’s proposed plans beyond which the additional funding is not used as effectively?
Are there nonfinancial barriers that may impact the charity’s ability to carry out their plans?

We use these uncertainty scores and the charity’s revenue and expenditure trajectory to define two RFMF dollar estimates that represent the amount beyond the charity’s projected revenue in 2024 and 2025 that we believe they could effectively use. If the charity has plans for a large amount of unexpected funding that are likely to be as effective as their past work, they will receive higher RFMF estimates.

Reserves

We may adjust RFMF based on the status of a charity’s reserves. It is common practice for charities to hold more funds than needed for their current expenses to be able to withstand changes in the business cycle or external shocks that may affect their incoming revenue.²⁴ Such additional funds can also serve as investments in future projects. Thus, it can be effective to provide a charity with additional funding to secure the organization’s stability and/or provide funding for larger projects in the future. Therefore, we increase a charity’s RFMF if they are below their targeted amount of reserves. If a target does not exist, we suggest that charities hold reserves equal to at least one year of expenditures.

Revenue Diversity

The charities we evaluate typically receive revenue from a variety of sources, such as individual donations and grants from foundations.²⁵ A review of the literature on nonprofit finance suggests that revenue diversity may be positively associated with revenue predictability if the sources of income are largely uncorrelated.²⁶ However, there is evidence that revenue diversity may not always be associated with financial stability.²⁷ Therefore, although revenue diversity does not play a direct role in our recommendation decision, we indicate charities’ major sources of income in this criterion for donors interested in financial stability.

Limitations of Our Method

Because we cannot predict exactly how much charities will fundraise in the future and how their plans for expansion will unfold, our estimates are speculative—not definitive. For instance, a charity could lose a major funder or discover a way to use additional funding that they did not anticipate, in which case our estimates would be too low. Conversely, they could fail to hire an employee with the necessary skills or experience to enable an expansion, in which case our estimates would be too high.

Our RFMF estimates are intended to identify the point in time at which we would want to check in with a charity to ensure that they have used their funds effectively and can still absorb additional funding. Therefore, we check in with our recommended charities twice a year leading up to our Recommended Charity Fund distributions to update our sense of their RFMF.

Finally, because we assign numerical values to non-numerical assessments of uncertainty, the data can be misinterpreted as more objective than it is.

Organizational Health: Are there any management issues substantial enough to affect the charity’s effectiveness and stability?

With this criterion,²⁸ we assess whether any aspects of an organization’s leadership or workplace culture pose a risk to its effectiveness or stability, thereby reducing its potential to help animals. Problems with leadership and workplace culture could also negatively affect the reputation of the broader animal advocacy movement, as well as employees’ wellbeing and their willingness to remain in the movement. For example:

Schyns & Schilling (2013) report that poor leadership practices result in counterproductive employee behavior, stress, negative attitudes toward the entire company, lower job satisfaction, and higher intention to quit.
Waldman et al. (2012) report that effective leadership predicts lower turnover and reduced intention to quit.
Wang (2021) reports that organizational commitment among nonprofit employees is positively related to engaged leadership, community engagement effort, the degree of formalization in daily operations, and perceived intangible support for employees.
Gorski et al. (2018) report that all of the activists they interviewed attributed their burnout in part to negative organizational and movement cultures, including a culture of martyrdom, exhaustion/overwork, the taboo of discussing burnout, and financial strain.
A meta-analysis by Harter et al. (2002) indicates that employee satisfaction and engagement are correlated with reduced employee turnover and accidents and increased customer satisfaction, productivity, and profit.

Our Method

We review aspects of organizational health by examining information provided by top leadership staff and by capturing staff perspectives via our engagement survey. We also distribute the survey to volunteers working at least five hours per week.

Poor performance on a single aspect of the organizational health assessment does not automatically lead to an unfavorable overall outcome. Instead, we seek to assess organizational health from multiple perspectives to arrive at the most appropriate conclusion within the time available based on the information we have. This includes our follow-up conversations with the charity’s leadership.

People policies and processes

We ask charities to report their people policies.We provide a checklist of policies that we believe are strong indicators of organizational health and ask charities to indicate which of these they have implemented.Policies are grouped into the following categories:

Compensation
Workplace safety
Organizational design and communication
Performance and hiring assessments
Learning and development
Workplace conditions
Representation, equity, and inclusion

We do not assess which policies from our list are the most important. As we gather aggregated information about the charities we evaluate, we hope to build up a clearer picture over time of which policies are the strongest indicators of organizational health in different contexts. As noted above, we take into account that many charities work in contexts (e.g., geographical regions) where these policies may not be common practice.

A safe and inclusive working environment is likely to deliver significant benefits not only for advocates, but also for the effectiveness and stability of organizations and the broader animal advocacy movement.²⁹ This is why we collect information about policies and activities regarding representation, equity, and inclusion. We use the term “representation” broadly to refer to the diversity of certain social characteristics (called “protected classes” in some countries).³⁰ Additionally, charities should generally have human resources policies against harassment³¹ and discrimination³² and ensure that cases of harassment and discrimination in the workplace³³ are addressed appropriately.

We also ask about the internal accessibility of people policies and processes, i.e., which policies are shared with employees at the organization and in what way. This is because written policies have little use without employees knowing that they exist, understanding them, and believing that the organization enforces them. For example, it is important for employees to understand their entitlement to sick days and how to submit internal reports of harassment and discrimination.

Governance and accountability

We ask charities to report whether they have basic governance policies and processes in place, including: an anti-retaliation policy protecting whistleblowers and those who report grievances, a Conflict of Interest policy, a policy setting out procedures for the storage and destruction of documents, and a process for documenting minutes of board and board committee meetings.

We also consider leadership’s commitment to transparency.³⁴ Firstly, we require organizations selected for evaluation to be transparent with ACE throughout the evaluation process. This is essential for us to be confident that we have the necessary information to carry out a full, well-informed evaluation. Secondly, we consider organizations’ public-facing transparency by asking charities to report what information they have available on their website, such as key staff members and financial information. Although we value such transparency, we recognize that some organizations may be able to have a greater impact by keeping certain information private. For example, organizations and individuals working in some regions or on particular interventions could be harmed by publicizing certain information about their work. We seek to understand where this is the case based on conversations with the charity’s leadership.

Leadership and governance

First, we consider key information about the composition of leadership staff and the board of directors. There appears to be no firm consensus in the literature on the specifics of the relationship between board composition and organizational performance.³⁵ However, BoardSource (a 501(c)(3) organization that provides the most reliable research we have found on nonprofit board leadership) recommends that, if the law permits, the Executive Director (ED) or equivalent should be an “ex officio, non-voting member of the board.”³⁶ In this way, the ED can provide input in board meeting deliberation and decision-making while avoiding perceived conflicts of interest, questions concerning accountability, or blurring the line between oversight and execution. We also ask if there have been any recent transitions in leadership and what measures were taken to ensure this transition happened smoothly.

We ask about the board’s membership and functions, so we can understand to what extent these align with the Compliance Practices set out in BoardSource’s Recommended Governance Practices. For example, we ask how often the board meets, how its performance is evaluated, and what term limits are in place for board members.

Our engagement survey also asks staff to identify the extent to which they feel that leadership competently guides the organization, both as an indicator of leadership competence and staff engagement. The questions we ask are based on cross-cultural research by Culture Amp and Google’s Project Oxygen.

Finally, we have specific considerations for charities that work internationally. In some international organizations, there can be a particular risk of power imbalances between offices based in different countries, and especially between offices based in the Global North and the Global South. These power imbalances—e.g., resulting from differences in the level of autonomy and decision-making power granted to different regional offices, or from inequalities in the level of funding received by Global North and Global South offices—can occur within the same organization or between organizations working together. We think it is important that charities’ leadership create opportunities for all of their subsidiaries to influence decision-making at the international level.³⁷ We ask leadership to elaborate on their approach to international expansion and report the measures they take to address potential power imbalances.

Staff engagement and satisfaction

We solicit staff and volunteer perspectives via our engagement survey. We developed this survey in collaboration with organizational consultants Scarlet Spark. To help ensure that our questions were reliable predictors of organizational health, we based them where possible on recognized frameworks such as the cross-culturally validated Gallup Q12 Employee Engagement Survey, the Maslach Burnout Inventory, Google’s Project Oxygen, and cross-cultural research by Culture Amp.

We require at least 65% of the charity’s staff to respond to the survey to ensure that we have a representative sample of responses. There is no participation threshold for volunteers, recognizing that most organizations do not have a fixed number of volunteers as their participation tends to fluctuate.

If a charity scores particularly low on any aspect of staff engagement, we follow up on these factors with the charity’s leadership to hear their perspective and understand any relevant context. We only share aggregated organizational-level data with leadership and do not share individual survey responses or other confidential information. ACE may recommend that the charity address any outstanding concerns, for example, by:

Conducting a comprehensive staff survey to assess employee engagement, satisfaction, and areas for improvement.
Establishing regular channels for communication and feedback, such as open-door policies, suggestion boxes, or anonymous reporting mechanisms.
Developing professional development opportunities and career advancement pathways for staff.
Seeking external expertise on how to improve staff morale.
If low staff morale is being caused by a specific person, carrying out a performance review with that person and agreeing on the specific ways in which their behavior needs to change, including a timeline by which changes must happen.

Our engagement survey contains questions based on the 12 statements from the Gallup Q12 Employee Engagement Survey, with staff requested to rate each statement on a scale from 1 (No, I strongly disagree) to 5 (Yes, I strongly agree). Where possible, we avoided making adjustments to standard assessments since these questions have been validated with large, cross-cultural samples of participants. However, we made minor amendments to some statements in the original Gallup survey that charities have found unclear in the past. We consider an average engagement score of below 3 (the scale midpoint) to warrant follow-up with the charity’s leadership.

In addition to the engagement questions based on the Gallup Q12 Employee Engagement Survey, we ask questions designed to elicit information about the risk of burnout among staff, the level of psychological safety at the organization, and organizational stability. We designed these questions together with Scarlet Spark, based largely on the frameworks mentioned above (the Gallup Q12 Employee Engagement Survey, the Maslach Burnout Inventory, Google’s Project Oxygen, and cross-cultural research by Culture Amp). We also ask all staff about wage satisfaction since this can serve as an indicator of retention.³⁸

We ask volunteers an alternative set of questions specifically designed to assess volunteer engagement and satisfaction. These questions are similar to the ones in the employee engagement survey but with added dimensions to understand whether volunteers feel they are making a difference, whether their workload is fair, the ease of volunteering for the organization, and their pride in volunteering at the organization.

Harassment and Discrimination

The engagement survey contains a link to an anonymous Whistleblower Form,³⁹ developed with support from legal experts at Animal Defense Partnership, for any staff or volunteers who wish to report issues of harassment and discrimination. In most cases where we decide to take action based on such reports, this consists of sharing relevant non-confidential information with the leadership of the organization in question and hearing their perspective. We do this to improve our understanding of what happened, whether the leadership members were aware, and what measures they took or plan to take, if relevant. We then factor this information into our overall Organizational Health assessment.

Limitations of our method

While we strive to continually improve our assessment of charities’ organizational health, we recognize that several limitations remain.

Firstly, we are currently unable to fully investigate harassment and discrimination claims due to a combination of time constraints, lack of expertise, and the often anonymous nature of the reports that we receive. We recognize that this may cause frustration among charities that we evaluate, especially when we are unable to share specific details about these claims for reasons of confidentiality.

This year, we have sought to improve the channel for people to submit such reports, linking to the more comprehensive Whistleblower Form co-developed with Animal Defense Partnership, rather than asking about harassment and discrimination in the engagement survey directly. We hope this helps ensure that claimants understand the implications of providing such information, improve the comprehensiveness of any such information that we receive, better enable us to follow up with claimants, and better identify the level of detail we are able to share with the leadership of the charity in question. At the same time, we recognize that requiring claimants to fill out a separate, more comprehensive form may reduce the number of reports that we receive.

Secondly, our engagement survey only provides a limited window into a charity’s workplace culture and may not fully represent the broad range of experiences within the organization. In particular, we recognize that surveying staff and volunteers can lead to inaccuracies due to selection bias and also may not reflect employees’ true opinions, as respondents are aware that their answers could influence ACE’s evaluation of their employer. We also recognize that our assessment represents a snapshot at a point in time and may not fully capture ongoing cultural shifts within an organization.

This year, we have included a wider range of questions in the survey and collaborated with the organizational consultant Scarlet Spark to help ensure these questions are likely to be effective predictors of organizational stability and effectiveness. As in previous years, we do not rely solely on the results of the engagement survey to make our assessment. Rather, we assess organizational health from multiple perspectives to arrive at the most appropriate decision within the time available based on all the information we have, including our follow-up conversations with the charity’s leadership.

Thirdly, there is no universally agreed-upon “best practice” for organizational leadership and culture. With a wide range of frameworks, models, and approaches available, it can be challenging to establish a singular standard for evaluation, which may lead to a variety of interpretations and expectations among charities.

As mentioned, this year, we developed our organizational health assessment in collaboration with organizational consultants Scarlet Spark to help ensure we are using the most relevant research. Where possible, we used recognized frameworks such as the cross-culturally validated Gallup Q12 Employee Engagement Survey, the Maslach Burnout Inventory, Google’s Project Oxygen, and cross-cultural research by Culture Amp. As in previous years, we also seek to gather input both from the charity’s leadership and non-leadership staff so that we can understand any issues from multiple perspectives.

Lastly, our assessment may be biased toward certain Western workplace practices. As a U.S.-based organization with staff based predominantly in the U.S. and Western Europe, our understanding of best-practice organizational health is inevitably skewed toward the cultures with which we are most familiar.

We seek to recognize this bias at all stages of the assessment and to continually learn from the charities that we evaluate, rather than imposing a ’one size fits all’ approach onto each charity’s unique situation. For example, we recognize that not all of the policies and processes that we ask charity leadership about will be common or relevant in all countries and situations. Where there are indications that important policies and processes may be lacking, we follow up with the charity to gain a better understanding of the context. Particularly if the charity is based outside of the U.S., we are also eager to learn of additional policies they may have that they find to be important contributors to their effectiveness. In this way, we hope that this exercise can be mutually informative for ACE and for the charities that we evaluate.

This year, we also modified our engagement survey questions to reduce their focus on Western cultures and piloted the questions with charities from different global regions to help ensure this was successful. We will continue to explore how best to improve the applicability of our assessment across all national contexts, using evidence from the countries where our evaluated charities are based.

If you have any questions about ACE’s evaluation methods, please contact our Charity Evaluations Manager, Vincent Mak: vincent.mak@animalcharityevaluators.org.

This criterion was called Programs from 2020 to 2022.We decided to rename it Impact Potential to better reflect its focus on assessing the effectiveness of charities’ programs without considering their implementation. This name is more specific and less confusing internally, especially since we recently changed the name of our research team to the Programs team.
Rethink Priorities adjusted their welfare range estimates for use in ACE’s evaluations. Because ACE compares animal charities with each other rather than with human charities, Rethink Priorities reindexed the ranges to pigs instead of humans—see this page for more information.
An “egalitarian” score is a score of 1 that we assign to each animal group to represent the view that all animal groups have equal welfare range or probability of sentience.
The framework we used to prioritize countries only applies to farmed animal advocacy.We have not developed a framework to prioritize wild animal welfare work because there are very few organizations that work on wild animal welfare, and those we have considered so far are focused on indirect work such as research and academic development, which is less country-specific.
FAO (2022)
FAO (2018)
For example, when scoring the intervention category “apps and other digital resources,” we considered the following tractability proxies: the Global Innovation Index, Education (mean years of schooling), and Internet Penetration rate.
For more information on the limitations of FAOSTAT data, see Šimčikas (2019). For more information on Rethink Priorities welfare ranges project, see Fischer (2020).
For arguments supporting the view that the most important consideration of our present actions should be their long-term impact, see Greaves & MacAskill (2019) and Beckstead (2019).
We asked that reported achievements and associated expenditures amount to at least 90% of a charity’s total program expenditures during the reporting period. We also adjusted achievement expenditures by taking the charity’s reported expenditures and adding a portion of their non-programmatic expenditures (i.e., overhead or administration). This process allowed us to incorporate general organizational running costs into our consideration of cost effectiveness.
We selected key claims from the achievements with the highest expenditures, given that those achievements contribute most to the cost-effectiveness score.
For more information about Weighted Factor Models, see Charity Entrepreneurship (2019).
We standardized this unit to achievements per one U.S. dollar or per $100,000, depending on which was easier to interpret, to allow for comparison across achievements. For example, we calculated how many individuals a social media campaign reached per dollar spent or how many legal actions a charity filed per $100,000 spent. For some intervention categories, the number of achievements was too low to normalize the achievement quantity. In these cases, we used the average of two researchers’ subjective assessment of the quantity on a 1–7 scale.
Two researchers scored each achievement on the rubric and discussed significant disagreements before a second round of revising scores. We averaged the two researchers’ scores for each factor. Where we did not have enough information to score an achievement, we set the corresponding factor weight to zero.
We defaulted to giving achievement quality 75% and achievement quantity 25% weight. In some cases, e.g., if we were particularly uncertain about the achievement quantity, we gave achievement quality a higher weight.
By using a multiplicative method, we avoid giving high scores to achievements that implement promising interventions poorly (i.e., high intervention score but low implementation score). Consider the example where a charity focuses on an intervention like cage-free campaigns, which has the potential to be highly impactful, but fails to achieve any significant commitments. With a weighted average approach, the charity would still receive a relatively high score despite an unsuccessful implementation of their campaigns. However, by using a multiplicative method, the overall score accounts for the interaction between intervention and implementation scores. This means that if the implementation quality is lacking, the overall score will appropriately reflect that.
We encouraged charities to give as much information as possible about each achievement. In order to protect their capacity, we also marked some questions as optional. Where we did not have the relevant information to score an achievement on a factor in the scoring rubric, this increased our uncertainty score for that achievement.
We increased the uncertainty score for charities that reported fewer than 10 achievements to account for the fact that measurement errors and uncertainties have a higher impact on the final score when fewer achievements are averaged.
For interested readers, we compiled a list of existing quantified cost-effectiveness estimates for animal advocacy interventions here.
For more information about Weighted Factor Models, see Charity Entrepreneurship (2019).
Some factors and factor weights were adjusted slightly after receiving the data from charities.
Sustainability of growth is based on historical revenue/expenditures and other context-dependent variables, such as nonfinancial barriers to the scalability of their programs (e.g., time).
Animal Advocacy Careers (2022)
National Council of Nonprofits; Propel Nonprofits; Boland & Freedman (2021)
To be selected for evaluation, we require that a charity has a budget size of at least about $100,000 and faces no country-specific regulatory barriers to receiving money from ACE.
Carroll (2009)
Hung & Hager (2018)
This criterion was called Leadership and Culture from 2020 to 2022. We found that ‘leadership’ was often misunderstood as referring solely to the qualities of individual leaders and that ‘culture’ was understood in very different ways across countries and demographics. With the new name Organizational Health, we intend to highlight the broad focus of this criterion and to clarify that its goal is to identify any significant risks to the organization’s effectiveness and stability.
For example, in a study by Anderson (2020), 49% of paid animal advocates and 28% of unpaid animal advocates reported having experienced discrimination or harassment. Advocates who were members of a minoritized group (i.e., people of color, people with disabilities, and LGBTQ+ people) were significantly more likely to leave the movement as a result of discrimination than non-minoritized advocates.
Examples of such social characteristics include: race, color, ethnicity, religion, sex, gender or gender expression, sexual orientation, pregnancy or parental status, marital status, national origin, citizenship, amnesty, veteran status, political beliefs, age, ability, and genetic information.
ACE defines “harassment” as bullying, intimidation, and other behavior (whether physical, verbal, or nonverbal) that has the effect of upsetting, demeaning, humiliating, intimidating, or threatening an individual. Sexual harassment includes unwelcome sexual advances, requests for sexual favors, and other verbal or physical harassment of a sexual nature.
ACE defines “discrimination” as the unjust or prejudicial treatment of or hostility toward an individual on the basis of certain social characteristics.
ACE defines the “workplace” as any place where work-related activities occur, including physical premises, meetings, conferences, training sessions, transit, social functions, and electronic communication (such as email, chat, text, phone calls, and virtual meetings).
Charity Navigator defines transparency as “an obligation or willingness by a charity to publish and make available critical data about the organization.”
Du Bois et al. (2007)
BoardSource (2016), p. 4
Anheier (2005), p.370. More broadly, a review by Greer et al. (2017) maintains that teams with a high degree of power dispersion (meaning high power concentration vs. balanced distribution) have poorer outcomes and more unproductive conflict.
For example, see Mitchell et al. (2001).
The publicly accessible version of this form can be found via ACE’s Third-Party Whistleblower Policy on our website.