Surveys are one tool that Researchers use to collect information or data about respondents’ attitudes and behaviors. Surveys are useful because they allow Researchers to collect information that is not otherwise easily observable. Designing a survey requires specifying who will be surveyed, when they will be surveyed, how they will be approached, how data will be analyzed, and other procedural issues.
The design of a survey will be determined primarily by the purpose of the study. It is often easiest to think of the purpose of the survey in terms of a list of outcomes that it should measure. Examples include: “How many vegetarians or vegans are we creating with this activity?” or “Do our newsletter subscriptions increase if we include a health argument in our new video?” If there are many outcomes that the survey should measure, it is helpful to rank them in order of priority. The survey should be designed to measure the most important outcomes as accurately as possible, and other outcomes can be measured as space and time allow.
To collect meaningful data, Researchers should survey people who are representative of the group or groups of people they are trying to influence or whose opinion they care about. For practical reasons, it is common to initially make the pool of potential participants those who are easiest to reach, but it is better to use a pool that is representative of everyone we are trying to influence or whose opinion we care about. An important drawback of surveys is that they rely on self-report measures (i.e. measures that rely on respondents to report their own attitudes or behaviors). Self-report measures are subject to a number of limitations, including memory and social desirability biases, that may affect the accuracy of the results. As much as possible, Researchers should find a way to directly observe respondents’ behaviors. For example, Researchers might team up with grocery stores, restaurants, or college dining halls to gain access to objective measures (e.g. sales figures) of consumption. Another potentially promising option could be to work with Nielsen or similar services, but this can be costly and/or logistically complex.
Although surveys are not a perfect data collection tool, they are nonetheless useful. For example, Researchers who want to answer questions about the impact of a pay-per-view intervention might consider conducting an experiment where one group of participants receives a treatment (e.g. watches a video about factory farming) while another group (i.e. a control group) watches a different video about an unrelated topic. The Researchers would use a survey to ask participants from both groups how often they consume animal products, and use statistical analysis to assess whether the video caused a difference between groups. Researchers who want to answer questions about the long-term impacts of this pay-per-view program would consider conducting multiple surveys over time using the same group of participants. This is called longitudinal research. Because very little longitudinal research has been done on vegetarianism and related issues, conducting this sort of study would be especially helpful to the movement as a whole.
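As a rough illustration of the statistical comparison described above, the sketch below runs a two-sided z-test for a difference between two proportions. It assumes a yes/no outcome (for example, whether a respondent reports eating meat the previous day), which is only one of several ways the consumption question could be coded. The function name and the counts are hypothetical and do not come from any real study.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two independent proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a - p_b, z, p_value

# Hypothetical counts: respondents who report eating meat yesterday,
# in the treatment group (a) and the control group (b).
diff, z, p = two_proportion_z_test(successes_a=150, n_a=200,
                                   successes_b=170, n_b=200)
print(f"difference = {diff:.3f}, z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests that a difference this large would be unlikely if the video had no effect. It says nothing about whether the self-reports themselves are accurate.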
A vital part of designing a survey is developing the questionnaire. Here is a checklist of questionnaire development steps:
- Determine which outcomes the questionnaire will measure.
- Determine question content and write the questions.
- Determine question sequence.
- Determine response options and format.
- Write any necessary instructions for completing the questionnaire.
- Write a consent form.
- Review the questionnaire and check for possible sources of survey error.
- Send the questionnaire to colleagues or experts for feedback and make any necessary revisions.
- Consider pilot-testing the questionnaire and make any necessary revisions.
Below is a more thorough discussion of a few key aspects of survey design:
- Analysis Plan
- Question Selection and Order
- Addressing Sources of Error
- Pilot Testing
Many aspects of the procedure used in conducting a study can affect its findings. For example, whether a study uses a survey that is given in person, by telephone, or online is often determined by logistical considerations. All these methods will reach slightly different populations. Respondents may be more frank in an online survey or automated telephone survey, and less frank when a real person is surveying them. For some involved questions, a real surveyor can provide cues and clarifications that allow respondents to provide more accurate answers. Also, the medium of the survey can affect the response rate, as in-person surveys (especially of captive audiences in lines or in classes) have higher response rates than surveys online that respondents can delay or can decline to participate in without disappointing a visible person.1
One benefit of conducting surveys online is that some online survey tools offer smart features. For example, some tools allow you to determine which questions to ask each respondent based on that respondent’s previous answers. You can use these tools to elicit different information from different demographics or to filter certain demographics out of the survey altogether.
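The branching behavior described above is sometimes called skip logic. The sketch below shows the general idea in plain Python; it does not reproduce the interface of any particular survey tool, and the question ids, wording, and screening rule are all hypothetical.

```python
# Hypothetical questionnaire: each question names the next question to ask,
# optionally depending on the answer just given. None ends the survey.
QUESTIONS = {
    "age_group": {
        "text": "What is your age group?",
        # Screen out respondents under 18 entirely.
        "next": lambda answer: None if answer == "under 18" else "ate_meat",
    },
    "ate_meat": {
        "text": "Did you eat any meat yesterday?",
        "next": lambda answer: "meat_servings" if answer == "yes" else "saw_video",
    },
    "meat_servings": {
        "text": "About how many servings of meat did you eat yesterday?",
        "next": lambda answer: "saw_video",
    },
    "saw_video": {
        "text": "Have you seen a video about factory farming?",
        "next": lambda answer: None,
    },
}

def questions_asked(answers, start="age_group"):
    """Return the ordered list of question ids a respondent would see."""
    path = []
    current = start
    while current is not None:
        path.append(current)
        current = QUESTIONS[current]["next"](answers.get(current))
    return path

print(questions_asked({"age_group": "18-24", "ate_meat": "no"}))
```

Here a respondent who did not eat meat skips the servings question, and a respondent under 18 sees only the first question.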
The timing of the survey can also affect the accuracy of responses. People are most accurate when answering questions about the present or the recent past, not the future or the distant past.2 Therefore, a study seeking to understand changes in behavior over the course of a year would ideally survey the same participants at the start and end of the year, to reduce the need for accurate recall of the distant past. If only one survey could be conducted, preferably it would be at the end of the year, since people are better able to recall the past than to predict the future.3
It may be a good idea to provide respondents with a consent form that includes important information about the survey. Researchers working within institutions in the United States may be required to provide such a form in order to receive approval from their institution’s ethics review board. The precise requirements vary from institution to institution, but consent forms generally should include an explanation of the benefits and risks of participating in the survey, an indication that participation is voluntary, and a guarantee of anonymity. When no formal consent form is used, Researchers should still use ethical research practices.
The analysis phase of a study is the phase after data collection, when the results are used to answer the original questions of the study. However, it is useful to consider the analysis plan before collecting any data, because this can help to clarify the requirements for the survey design. There are two main study purposes, and they call for different analysis styles.
An exploratory study seeks to discover completely new information. For instance, if an organization has no idea what aspects of their leaflets are most interesting to readers, they might conduct an exploratory study to get some ideas about how readers perceive the leaflets. In this case, many of the questions asked would be open-ended, and the data analysis would also be open-ended. An analysis procedure might not be specified ahead of time and typically a large number of possible findings would be considered. The result is that any relationships which appeared significant would still have to be considered somewhat tentative until confirmed with further information. Often, qualitative research methods are useful in this situation: free response questions and focus group discussions are hard to tabulate, but allow responses which might not be possible to collect through multiple choice questionnaires.
Instead of seeking to discover completely new information, a study could begin with a hypothesis (perhaps generated in a previous exploratory study) that it seeks to test. In this case, it is important not to run an open-ended data analysis. For instance, if a study intended to show that students who heard a humane education lecture were more likely to go vegetarian than students who didn’t, it would not be appropriate to test the effects separately for every possible major the student could have. Instead, tests should be planned only for groups believed likely to behave differently for specific theoretical reasons. For instance, students in a school of agriculture might be analyzed separately from students in a school of engineering, since they would likely have different amounts of background knowledge of the facts presented. When the number and kind of tests are carefully considered, the results of a study can be taken with more confidence than when an analysis is performed on every possible combination of responses and characteristics.4
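When several planned tests are run, one common way to account for the increased chance of a spurious finding is the Holm-Bonferroni procedure. The sketch below is a minimal version of that procedure, chosen here for illustration; the text does not prescribe a specific correction. The p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Controls the family-wise error rate at `alpha` across all tests.
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens as we move to larger p-values.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

# Hypothetical p-values from four planned subgroup comparisons.
print(holm_bonferroni([0.01, 0.04, 0.03, 0.015]))
```

Without a correction, three of these four results would fall below 0.05. With the correction, only the two smallest p-values are treated as significant.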
Questions should be selected that will address the goals of the study. Where possible, using questions identical to those that have been asked in previous surveys, such as those in our Question Bank, will simplify comparison to those results. However, for some purposes, it will be necessary to modify existing questions or write new ones.
Questions asking respondents directly whether they are vegetarian or vegan should be avoided or treated as bearing on the respondents’ beliefs rather than their behaviors, due to very high rates of misreporting. In some surveys, over half of self-reported vegetarians also reported having eaten meat on at least one of two specific days.5 We believe that measuring dietary change is the most important way to assess veg advocacy programs and so specific dietary questions may need to be privileged over other questions, which may mean asking fewer complementary questions if respondent fatigue is thought to be an issue. A substantial dietary assessment will often be more useful for evaluating program success than a survey of comparable length that touches on a variety of questions.
Questions where respondents choose a single answer from a list are usually the easiest to analyze. Questions where respondents choose all relevant answers can be ambiguous, since there is no means of determining which answer was most relevant. Questions where respondents rank answers in order of importance can provide more information during analysis, though they can be more difficult to analyze. Questions where respondents write in their own answers are harder to analyze and should be used mostly for exploratory purposes.
To avoid indicating a desired response, scale questions should have equally many response options on either side of the neutral answer, if applicable. For instance, the scale Disagree, No Opinion, Agree Somewhat, Strongly Agree is unbalanced and should be corrected to Strongly Disagree, Disagree Somewhat, No Opinion, Agree Somewhat, Strongly Agree.6
The order and mode of delivery of response options can influence responses. Two common response order effects are called “primacy effects” and “recency effects.” Primacy effects occur when respondents are most likely to select response options presented early in a list. For example, respondents who feel rushed while completing a written survey may not read the full list of response options carefully; rather, they may choose the first reasonable option and move to the next question. Recency effects occur when respondents are most likely to select response options presented later in a list. For example, respondents who are read a long list of options in an aural survey may only remember the most recent ones.7 Presenting fewer response options may reduce primacy and recency effects. Response order effects can also be controlled for by randomizing the order of response options, when appropriate.
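Randomizing response order can be sketched in a few lines. The example below shuffles the options for each respondent while keeping catch-all options such as "None of these" at the end. The function name and option labels are hypothetical. Ordered scales (for example, agreement scales) would normally be left in their natural order rather than shuffled.

```python
import random

def randomized_options(options, anchored_last=(), rng=None):
    """Shuffle response options, keeping anchored ones at the end."""
    rng = rng or random.Random()
    shuffled = [o for o in options if o not in anchored_last]
    rng.shuffle(shuffled)
    return shuffled + [o for o in options if o in anchored_last]

# Hypothetical unordered list of reasons a respondent might select.
options = ["Health", "Animal welfare", "Environment", "Cost", "None of these"]
print(randomized_options(options, anchored_last=("None of these",)))
```

Because each respondent sees a different order, any primacy or recency effect is spread across the options instead of consistently favoring the same ones.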
Question order can also affect response rate and responses. In general, neutral questions and very important questions should be placed early in the survey, so that respondents answer them before getting tired. Questions that might influence answers to other questions should be asked later. For instance, a question about whether the respondent has seen a video about factory farming should come after questions about their diet and attitudes towards animals, because being reminded of the video might affect their answers to other questions. Finally, demographic questions should come last, especially potentially sensitive questions about topics like race, education, and household income, so that respondents who don’t want to answer these questions have already answered the other questions. Responses to demographic questions should not be required.
When designing a survey, it is important to keep in mind possible sources of error. In this section, we discuss four kinds of error and how to address them. We focus on reducing social desirability bias, which seems especially likely to lead to error in advocacy research.8
Sampling error results from observing a sample of the population of interest rather than the entire population of interest. Observing the entire population would eliminate sampling error, but it is rarely feasible. Sampling error can be reduced by increasing sample size.
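To give a sense of how sampling error shrinks with sample size, the sketch below computes the approximate margin of error for an estimated proportion. It assumes a simple random sample and uses the normal approximation; the function name and the sample sizes are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Approximate margin of error for a proportion p estimated from n responses."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p * (1 - p) / n)

for n in (100, 400, 1600):
    print(n, round(margin_of_error(0.5, n), 3))
```

Under these assumptions, quadrupling the sample size halves the margin of error, so each additional gain in precision costs more than the last.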
Coverage error results from systematically excluding parts of the population of interest from the sample. For example, if a Researcher attempts to study vegans by approaching customers at vegan restaurants, coverage error might arise from the fact that vegans who only cook at home would be systematically excluded from the survey. Coverage error can be addressed by ensuring that respondents are randomly selected and likely to be representative of the population of interest.
Nonresponse error arises when some respondents fail to provide data. There are two kinds of nonresponse: item nonresponse and unit nonresponse. Item nonresponse error arises when some respondents provide incomplete data. For example, consider a survey on the effects of an undercover investigation video on respondents’ attitudes. Upon viewing the video, some respondents might be upset and refuse to complete the survey. Unit nonresponse error arises when some respondents fail to provide any data at all. For example, consider a survey intended to measure the mean length of time that former vegetarians abstained from eating meat. People who were vegetarian for a very short time might decline to take the survey, which would cause the survey’s sample mean to be higher than the population mean. Nonresponse is particularly problematic if the respondents who fail to provide data differ systematically from those who do provide data. There are several ways to reduce nonresponse error, including limiting the length of the questionnaire, writing questions that are easy to answer, and sending respondents reminders to complete the survey. In-person and telephone interviews may have higher rates of response than online questionnaires.
Measurement error occurs when survey responses or measurements are inaccurate. There are many factors that can contribute to inaccurate responses. For example, respondents may not understand the question, they may not recall the information necessary to answer the question, or they may not be willing to answer the question truthfully. One factor that seems especially likely to contribute to measurement error in advocacy research is social desirability bias.
Social desirability bias is a serious concern with studies conducted by animal advocacy organizations. Survey results may come back with unreasonably high rates of success, due to a combination of respondents incorrectly identifying as vegetarian, higher rates of engagement and response from the participants most interested or moved by an activity (also known as nonresponse error), and responses exaggerating the aspects of behavior that are believed to be pleasing to the surveyor (social desirability bias).9
The most reliable way to control for social desirability bias is to avoid giving the respondent clues about which answers the surveying organization would prefer. Since eating meat and disregarding the welfare of farmed animals are social norms, if the respondent does not know the surveying organization’s agenda, their responses to questions will likely not be much influenced by social desirability bias. (The exception is that respondents may under-report consumption of red meat, especially if they are in groups which are expected to be concerned about their weight or dietary habits.10) This approach can be strengthened by including a control group and considering their responses as a baseline: attitudes and behaviors of participants in a program are attributed to the effects of the program only insofar as they differ from what was reported by the control group.
A survey administered to a captive audience will have a high response rate, making it more representative. However, it may also be more difficult to obscure the purpose of such a survey, leading to biased responses. If respondents to a survey will unavoidably know which answers the surveying organization would prefer, social desirability bias can be addressed by including a set of questions designed to measure which respondents are most likely to be answering in ways that reflect what they believe the surveyor wants to hear, rather than the truth. Several instruments for this purpose have been developed in the psychological literature. A high correlation between scores on the social desirability instrument and responses to other questions on the survey would suggest that answers to those questions may be driven in part by socially desirable responding, rather than by respondents’ true beliefs and behaviors.11 We give a suggested instrument and method for using it in our page on social desirability bias.
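The correlation check described above can be sketched as follows. The example computes a Pearson correlation between each respondent's score on a social desirability instrument and their answer to a substantive question. The scores and responses are hypothetical, and Pearson's r is an assumption made for illustration; the text does not specify which correlation measure to use.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: one entry per respondent.
desirability_scores = [2, 5, 7, 3, 8, 6, 4, 9]
concern_for_animals = [3, 4, 5, 2, 5, 4, 3, 5]  # 1-5 agreement scale

print(round(pearson_r(desirability_scores, concern_for_animals), 2))
```

A value near zero would suggest little relationship. A strongly positive value would be a reason to treat the answers to that question with more caution.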
It may be helpful to conduct a pilot prior to investing large amounts of effort in carrying out an unfamiliar study design. A pilot is a small trial of the study, using the intended protocol on only enough respondents to ensure that instructions are clear and the survey length is reasonable. Especially when a study involves investing significant resources, a pilot can help prevent waste by identifying problems in study design and measurement. A pilot can also help determine how large the study needs to be, by providing estimates of the size of effects the study is trying to detect or of the number of subjects who will eventually respond. If members of the target audience are not available, several people who were not involved in writing the survey should take it and make note of any unclear points and of how long it took them. Cognitive interviewing is one useful approach to pretesting survey questions by examining the comprehension, recall, decisions and judgment, and response processes of respondents as they answer questions.
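Pilot estimates can feed a rough sample size calculation. The sketch below uses a standard normal-approximation formula for comparing two proportions, then inflates the result for an expected response rate. The proportions, response rate, and function names are hypothetical placeholders for whatever a pilot actually suggests.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate respondents needed per group to detect the given difference."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = (p_control * (1 - p_control)
                    + p_treatment * (1 - p_treatment))
    effect = p_control - p_treatment
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / effect ** 2)

def invitations_needed(n_respondents, response_rate):
    """People to approach so that roughly n_respondents actually respond."""
    return ceil(n_respondents / response_rate)

# Hypothetical pilot estimates: 85% vs. 75% report eating meat yesterday,
# and about 40% of people approached complete the survey.
n = n_per_group(0.85, 0.75)
print(n, invitations_needed(n, 0.40))
```

Smaller expected effects raise the required sample size sharply, which is one reason a pilot estimate of effect size is useful before committing resources.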
Transparency within the research community has many benefits. It can help Researchers interpret each other’s findings, replicate past studies, or design new studies. We encourage Researchers to practice transparency in all stages of their research.
Ideally, Researchers should pre-register their procedural and analysis plans before collecting data. These plans should be made publicly available and should provide information about participant recruitment methods, treatment group assignments, outcomes being measured, data collection, data management, and which analyses will be performed. Publishing detailed procedural information can help other Researchers to replicate the study in the future or to design one of their own. For Researchers conducting hypothesis tests, publicly committing to an analysis plan may reduce the temptation to test more hypotheses than originally intended.
After collecting and analyzing data, we encourage Researchers to publish their raw data along with their findings and any statistical programming code that was used. Sharing data and code allows other Researchers to check for errors and perform additional analyses, if desired. It may also reduce the incidence of fraud. Data should be fully de-identified before it is made public: direct identifiers such as respondents’ IP addresses, email addresses, phone numbers, and names, along with any other identifying information, must be removed. Demographic characteristics and similar variables (termed quasi identifiers) may be combined with an additional database to re-identify individuals in a “linkage attack.” It may therefore be necessary to alter or remove the quasi identifiers as well if a risk of re-identification exists. Ideally, data should be presented with intuitive variable names and a codebook explaining each variable in greater detail. There are many ways to share data and code online for free; you may consider uploading them to one of the following repositories: GitHub, The Center for Open Science’s Open Science Framework, and Harvard University’s Dataverse.12
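A minimal de-identification pass might look like the sketch below. It drops direct identifiers and coarsens one quasi identifier (exact age becomes an age band). The field names are hypothetical, and this is only a starting point: it does not by itself guarantee that respondents cannot be re-identified.

```python
# Hypothetical field names for direct identifiers in a survey export.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "ip_address"}

def age_band(age, width=10):
    """Coarsen an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def deidentify(record):
    """Return a copy of the record without direct identifiers or exact age."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in cleaned:
        cleaned["age_band"] = age_band(cleaned.pop("age"))
    return cleaned

raw = {
    "name": "Example Person",
    "email": "person@example.com",
    "ip_address": "192.0.2.1",
    "age": 34,
    "ate_meat_yesterday": "yes",
}
print(deidentify(raw))
```

Whether banding is enough depends on the dataset. Rare combinations of quasi identifiers may still single out individuals and may need to be removed.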
We encourage Researchers to make the results of their analyses public even if the results are not statistically significant. The publication of all results can aid in the interpretation of future studies. A small percentage of studies are likely to yield statistically significant results by chance. If there is no public record of how many studies have been run in total, it is impossible to know how many studies are likely to produce misleading results. If, on the other hand, Researchers announce all study plans and make results available whether they are significant or not, we will better understand the likelihood that significant results are meaningful.13
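The arithmetic behind this point can be made concrete. The sketch below assumes that every study tests an intervention with no real effect, that the studies are independent, and that each uses a significance threshold of 0.05. These are simplifying assumptions for illustration, not claims about any actual body of research.

```python
def expected_false_positives(n_studies, alpha=0.05):
    """Expected number of significant results when no effect is real."""
    return n_studies * alpha

def prob_at_least_one_false_positive(n_studies, alpha=0.05):
    """Chance that at least one of n independent null studies is significant."""
    return 1 - (1 - alpha) ** n_studies

for n in (1, 5, 20):
    print(n, expected_false_positives(n),
          round(prob_at_least_one_false_positive(n), 3))
```

Under these assumptions, if twenty such studies are run and only the significant one is published, readers cannot tell that it is probably a chance finding.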
Conducting survey research can require significant expenditure of resources, so careful design is essential to ensure the study produces useful results. Many resources are available to help survey designers address important concerns. Survey designers should be especially careful to ensure that responses to the most vital questions will be useful. Multiple methods are available to help reduce bias, including using control groups, avoiding cues about which answers the surveying organization would prefer, and choosing a method of administration that increases response rate. Survey designers should consider these and other factors when deciding how to conduct a study that provides the best value to their organization.
For examples of the literature comparing modes of survey administration, see Fricker, S., Galesic, M., Tourangeau, R., & Yan, T. (2005). An experimental comparison of web and telephone surveys. The Public Opinion Quarterly, 69(3), 370-392. and Chang, L., & Krosnick, J. A. (2009). National surveys via RDD telephone interviewing versus the Internet: Comparing sample representativeness and response quality. The Public Opinion Quarterly, 73(4), 641-678. For an example of a case in which a real surveyor has been thought to be crucial to the quality of data collected, see the 24-hour diet recall, described in Thompson, F. E., & Byers, T. (1994). Dietary assessment resource manual. The Journal of Nutrition, 124(11 Suppl), 2245s-2317s. (Some 24-hour diet recalls are now conducted with specialized computer tools.)
While reports of even the very recent past can be inaccurate, problems only grow with time. “[A] recent review of the survey literature found reduced levels of reporting or reduced reporting accuracy for hospital stays, health care visits, medical conditions, dietary intake, smoking, car accidents, hunting and fishing trips, consumer purchases, and home repairs as the length of the retention interval increased.” Tourangeau, Roger. (1999). “Remembering What Happened: Memory Errors and Survey Reports.” The Science of Self-report: Implications for Research and Practice. Stone, Arthur A. et al, eds. Psychology Press. The review cited is Jobe, J. B., Tourangeau, R., & Smith, A. F. (1993). Contributions of survey research to the understanding of memory. Applied Cognitive Psychology, 7(7), 567-584. Because the future is inherently uncertain, predictions are typically less accurate than recollections. See the next footnote for an example.
For instance, in the context of voting, about 10-20% of people incorrectly identify whether they have voted in a recent election. (Tittle, C. R., & Hill, R. J. (1967). The accuracy of self-reported data and prediction of political activity. The Public Opinion Quarterly, 31(1), 103-106.) When polled before an election to ask whether they will vote, a much higher percentage predict their behavior inaccurately: in one study 83% predicted they would vote, but only 43% did so. (Smith, J. K., Gerber, A. S., & Orlich, A. (2003). Self‐Prophecy Effects and Voter Turnout: An Experimental Replication. Political Psychology, 24(3), 593-604.) Since both effects are hypothesized to be caused in part by social desirability bias, errors may be smaller in situations where desirability operates less strongly than with voting.
If multiple tests are conducted, statistical procedures can be used to account for the increased likelihood of finding apparent patterns by chance.
In a thorough national diet study, almost two thirds of self-described vegetarians reported eating significant amounts of meat during one or both 24-hour diet recalls. Haddad, E. H., & Tanzman, J. S. (2003). What do vegetarians in the United States eat? The American Journal of Clinical Nutrition, 78(3), 626S-632S.
For more general tips on survey design and especially question design, see this guide. Although it is aimed primarily at academic programs, many of the suggestions are broadly useful.
De Leeuw, E. D. (2008). Choosing the method of data collection. In E. D. De Leeuw, J. J. Hox, & D. A. Dillman (Eds.), International handbook of survey methodology (pp. 113–135). European Association of Methodology Series. New York: Lawrence Erlbaum Associates.
For a more thorough discussion of how to minimize error when designing a survey, see Lohr, S. L. (2008). Coverage and Sampling. In E. D. De Leeuw, J. J. Hox, & D. A. Dillman (Eds.), International handbook of survey methodology (pp. 97–112). European Association of Methodology Series. New York: Lawrence Erlbaum Associates.
For instance, Justice for Animals conducted a survey of students exposed to humane education lectures and found that 9% of respondents reported having been vegan or vegetarian before the lecture. Since the students had not had a choice about whether to see the lectures or not, the divergence of this number from the rate of vegetarianism (or even self-reported vegetarianism) in the general population was most likely due in large part to a combination of response bias and social desirability bias. In the same survey, 14% of respondents reported having become vegetarian or vegan after the lecture; this rate is probably also affected by social desirability and response biases.
Several socially desirable response scales have been developed and validated with the intent of measuring which respondents answer questions based on what is socially correct and which questions are particularly affected by such response styles. We include one such scale in our suggested questions document. However, some evidence suggests that these scales may instead measure a personality type that is particularly likely to actually behave in a socially desirable way. For this reason, we suggest dealing with social desirability bias through other means if possible. For more on problems with controlling for socially desirable responding through use of a scale, see Barger, S. D. (2002). The Marlowe-Crowne affair: Short forms, psychometric structure, and social desirability. Journal of Personality Assessment, 79(2), 286-305.
GitHub (especially helpful for version control) and the Open Science Framework are good for sharing generally, and Dataverse for sharing data. Registering pre-analysis plans can also be done on the Open Science Framework, AsPredicted, AEA RCT Registry, Registry for International Development Impact Evaluations (3ie), and Experiments in Governance and Politics registry.
For a more thorough discussion of the need for transparency in social science research, see Christensen, G. (2016). Manual of best practices for transparent social science research.