Faunalytics Data Analysis Method – 2015
This is a technical supplement to an archived version of our report on vegetarian recidivism. You can read the most recent report here.
Method Used to Obtain Average Length of Vegetarianism from Faunalytics Data
Preparing the Data
Our goal in analyzing Faunalytics’ data was to obtain the average length of time that people follow a vegetarian diet. However, the way the data were collected did not lend itself to a straightforward analysis. Faunalytics grouped responses to the questions “how long have you been vegetarian?” and “how long were you vegetarian?” into the following categories: 0-3 months, 4-11 months, 1-2 years, 3-5 years, 6-10 years, and over 10 years. They reported the proportion of participants in each category, separately for current and former vegetarians (see Table 1).
The challenge we faced was that, to obtain an average length of vegetarianism, we would need the exact length of time that each of the 1336 respondents had been vegetarian (e.g., 7 months, 1 year, 3.5 years). Since we only had the proportion of people in each category, we simulated the data for each participant.¹
First, we converted the proportion of people in each category to an approximate number of people in each category (see Table 1). Next, within each category, we randomly generated observations under the assumption that they were normally distributed with the mean at the midpoint of the range (see Figure 1). For example, for former vegetarians in the 0-3 month category, we simulated 382 data points, most of them close to 1.5 months (see Table 2).²
Category | Former Vegetarians (proportion)* | Current Vegetarians (proportion)* | Former Vegetarians (#)** | Current Vegetarians (#)** |
---|---|---|---|---|
Total respondents (n) | 1124 | 212 | 1124 | 214.12 |
Up to 3 months | 0.34 | 0.05 | 382.16 | 10.6 |
4-11 months | 0.19 | 0.08 | 213.56 | 16.96 |
1-2 years | 0.18 | 0.07 | 202.32 | 14.84 |
3-5 years | 0.09 | 0.12 | 101.16 | 25.44 |
6-10 years | 0.06 | 0.08 | 67.44 | 16.96 |
More than 10 years | 0.06 | 0.58 | 67.44 | 122.96 |
Don’t know | 0.08 | 0.03 | 89.92 | 6.36 |
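*Proportions reported by Faunalytics (the Total respondents row gives the number of respondents)
**Numbers computed by ACE as proportion × total respondents; because the current-vegetarian proportions sum to 1.01, their computed total is 214.12 rather than 212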
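The count columns in Table 1 are simply proportion × total respondents. The Python sketch below reproduces them. The dictionary and function names are hypothetical, and rounding to whole respondents while dropping “Don’t know” answers is our assumption, chosen because it is consistent with the 1241 simulated observations used for the combined mean.

```python
# Proportions reported by Faunalytics (Table 1). "Don't know" is kept here so
# that the unrounded totals can be checked against the table.
FORMER = {"0-3 months": 0.34, "4-11 months": 0.19, "1-2 years": 0.18,
          "3-5 years": 0.09, "6-10 years": 0.06, "More than 10 years": 0.06,
          "Don't know": 0.08}
CURRENT = {"0-3 months": 0.05, "4-11 months": 0.08, "1-2 years": 0.07,
           "3-5 years": 0.12, "6-10 years": 0.08, "More than 10 years": 0.58,
           "Don't know": 0.03}
N_FORMER, N_CURRENT = 1124, 212


def expected_counts(proportions, n):
    """Approximate number of respondents per category: proportion x total."""
    return {cat: p * n for cat, p in proportions.items()}


def simulated_counts(proportions, n):
    """Assumption: round to whole people and drop "Don't know" (no length)."""
    return {cat: round(c) for cat, c in expected_counts(proportions, n).items()
            if cat != "Don't know"}


former = simulated_counts(FORMER, N_FORMER)
current = simulated_counts(CURRENT, N_CURRENT)
print(former["0-3 months"], sum(former.values()), sum(current.values()))
# prints 382 1033 208
```

The rounded counts for former (1033) and current (208) vegetarians add up to 1241.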
Category | Midpoint of Category used as Mean |
---|---|
0-3 months | 1.5 months |
4-11 months | 7.5 months |
1-2 years | 1.5 years |
3-5 years | 4 years |
6-10 years | 8 years |
More than 10 years | 15.5 years |
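The per-category simulation step can be sketched in Python as follows. This is an illustration, not ACE’s R script: the means come from the table above and the standard deviations from the Appendix (converted to years), while the seed and function name are hypothetical. The draws are not truncated to the category bounds, because the text does not say whether that was done.

```python
import random
import statistics

# (mean, standard deviation) per category, in years. Means are the category
# midpoints; SDs are the Appendix values. "More than 10 years" is simulated
# separately with a right-skewed distribution.
CATEGORY_PARAMS = {
    "0-3 months": (1.5 / 12, 0.75 / 12),
    "4-11 months": (7.5 / 12, 1.75 / 12),
    "1-2 years": (1.5, 0.25),
    "3-5 years": (4.0, 0.5),
    "6-10 years": (8.0, 1.0),
}


def simulate_category(category, n, rng):
    """Draw n normally distributed lengths (in years) for one category."""
    mean, sd = CATEGORY_PARAMS[category]
    return [rng.gauss(mean, sd) for _ in range(n)]


rng = random.Random(2015)  # fixed seed so the sketch is reproducible
draws = simulate_category("0-3 months", 382, rng)
print(len(draws), round(statistics.mean(draws) * 12, 2))  # mean in months
```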
Assuming that the data are normally distributed within each category is a simplification: in reality, the distribution within each category could be quite different. For example, former vegetarians in the 0-3 month category could be clustered around 1 month or around 2 months rather than 1.5 months. Any such difference would make the true averages lower or higher than the ones we reported. The only category for which we changed the shape of the distribution was “more than 10 years,” because realistically those lengths are much more spread out than in the other categories. For that category we simulated a right-skewed distribution, with more data points near the 10-15 year mark and fewer observations at 40, 50, and 60 years (see Figure 2).
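The text does not specify which right-skewed distribution was used for the “more than 10 years” category. As a purely hypothetical example, the Python sketch below uses a log-normal shifted to start at 10 years, with parameters chosen so that the category has the mean from Table 2 (15.5 years) and the standard deviation from the Appendix (12.5 years).

```python
import math
import random

FLOOR = 10.0  # lower bound of the category, in years
MEAN = 15.5   # category mean (Table 2)
SD = 12.5     # category standard deviation (Appendix)


def lognormal_params(mean, sd):
    """Return (mu, sigma) of a log-normal with the given mean and SD."""
    sigma2 = math.log(1 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2
    return mu, math.sqrt(sigma2)


def simulate_over_10(n, rng):
    """Hypothetical right-skewed draws: 10 years plus a log-normal excess."""
    mu, sigma = lognormal_params(MEAN - FLOOR, SD)
    return [FLOOR + rng.lognormvariate(mu, sigma) for _ in range(n)]


rng = random.Random(2015)
draws = simulate_over_10(123, rng)
print(min(draws) > FLOOR, round(max(draws), 1))
```

Shifting by 10 years leaves the standard deviation unchanged, so only the excess over 10 years (mean 5.5 years) has to be matched. With these parameters the median is about 12 years and the long right tail reaches several decades, which matches the shape described above.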
Method to Obtain Former Vegetarian Mean
To obtain a mean length of vegetarianism for former vegetarians, we took the average of all simulated observations for former vegetarians across all categories. We then computed a 95% confidence interval for the mean. Our final estimate was an average of 2.8 years (+/- 0.40 years). See Figure 3 for a histogram of the simulated data for former vegetarians.
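The mean and its 95% confidence interval can be computed as sketched below. The report does not say whether a normal (z) or t interval was used; this Python sketch assumes the normal approximation, mean ± 1.96 × SD / √n, which is reasonable for samples of about a thousand observations. The toy input is illustrative only, not the simulated data.

```python
import math
import statistics


def mean_ci95(values):
    """Sample mean and half-width of a normal-approximation 95% CI."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return mean, 1.96 * se


mean, half_width = mean_ci95([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"{mean:.2f} (+/- {half_width:.2f})")  # prints 3.00 (+/- 1.39)
```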
Method to Obtain Current Vegetarian Mean
An issue with obtaining an average length of vegetarianism for current vegetarians is that they continue being vegetarian after the survey. Using their reported length of vegetarianism alone would therefore give an estimate much lower than the true length. To obtain an estimate closer to the total time these participants will remain vegetarian, we assume that, on average, they were surveyed about halfway through their period of vegetarianism. In a survey drawing randomly from the population, like the Faunalytics study, this is a reasonable assumption.
One problem with this method is that some of our simulated observations are at 50 and 60 years, and those individuals almost certainly will not remain vegetarian for another 50 or 60 years. Statistically, however, there should be a roughly equal number of observations for which doubling underestimates the length of the vegetarian diet, although these are harder to identify (e.g., an observation at 10 years could be near the middle of a 20-year stretch of vegetarianism, near the end of a 10-year one, or near the beginning of a 60-year one). Because observations before and after their midpoint occur in roughly equal numbers, the over- and underestimates tend to cancel, and the error introduced by doubling the mean shrinks as the sample size grows.
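The claim that over- and underestimates cancel can be checked with a small simulation. It assumes, as the text does, that each current vegetarian is surveyed at a uniformly random point within their total period of vegetarianism; the spell lengths below are made up (uniform between 1 and 30 years) and are not the Faunalytics data.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical total lengths of vegetarianism (years) for 100,000 people.
spells = [rng.uniform(1, 30) for _ in range(100_000)]
# Each person is surveyed at a uniformly random point within their spell,
# so the reported (elapsed) length is a random fraction of the total.
reported = [rng.random() * length for length in spells]
doubled = [2 * r for r in reported]

# Individual errors can be large, but on average they roughly cancel.
errors = [d - s for d, s in zip(doubled, spells)]
print(round(statistics.fmean(spells), 2), round(statistics.fmean(doubled), 2))
print(round(max(abs(e) for e in errors), 1))
```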
Based on the assumptions outlined above, we calculated the average reported length for all current vegetarians, which was 14.1 years (+/- 1.5 years), and doubled that number to obtain a final mean of 28.2 years (+/- 3 years). Error ranges are 95% confidence intervals. See Figure 4 for a histogram of the doubled simulated data for current vegetarians.
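Doubling every observation doubles both the sample mean and the sample standard deviation, so the half-width of the 95% confidence interval doubles as well; that is why ±1.5 years becomes ±3 years. Writing x_i for the reported lengths and y_i = 2x_i for the doubled ones (notation introduced here for illustration):

```latex
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} 2x_i = 2\bar{x},
\qquad
1.96\,\frac{s_y}{\sqrt{n}} = 1.96\,\frac{2 s_x}{\sqrt{n}} = 2\left(1.96\,\frac{s_x}{\sqrt{n}}\right),
\qquad
2 \times (14.1 \pm 1.5) = 28.2 \pm 3.0 \text{ years}
```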
Method to Obtain Combined Mean
To obtain an overall average length of vegetarianism, we combined all 1241 simulated observations for former and current vegetarians. This number includes everyone in the Faunalytics survey who reported a length of vegetarianism, that is, all former and current vegetarians except those who answered “Don’t know.” Using the data from former vegetarians and the doubled observations from current vegetarians, we estimated a mean of 7.03 years (+/- 0.80 years, 95% confidence interval). See Figure 5 for a histogram of all of the simulated data for former and current vegetarians.
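Because the combined mean pools all observations, it equals the count-weighted average of the two group means. A quick consistency check in Python, assuming the group sizes obtained by rounding Table 1 (1033 former and 208 current vegetarians, which sum to 1241); the function name is hypothetical:

```python
# Group sizes: rounded Table 1 counts, excluding "Don't know" responses.
N_FORMER, N_CURRENT = 1033, 208
MEAN_FORMER, MEAN_CURRENT_DOUBLED = 2.8, 28.2  # years, as reported above


def pooled_mean(groups):
    """Mean of the pooled sample, given (count, group mean) pairs."""
    total = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total


combined = pooled_mean([(N_FORMER, MEAN_FORMER),
                        (N_CURRENT, MEAN_CURRENT_DOUBLED)])
print(round(combined, 2))  # prints 7.06
```

The result, about 7.06 years, agrees with the reported 7.03 years to within the rounding of the group means. It also shows why the overall mean sits much closer to the former-vegetarian mean: former vegetarians outnumber current ones about five to one.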
Appendix
Category | Standard Deviation |
---|---|
0-3 months | 0.75 months |
4-11 months | 1.75 months |
1-2 years | 0.25 years (3 months) |
3-5 years | 0.5 years (6 months) |
6-10 years | 1 year (12 months) |
More than 10 years | 12.5 years (150 months) |
¹ Contact us if you’d like the complete R script we used for this analysis.
² The standard deviation we chose for each category determined how spread out the simulated data were around the mean. We set each standard deviation to 25% of the width of the category’s range; for example, for the 0-3 month category, 25% of 3 months gives a standard deviation of 0.75 months. The only exception was the “more than 10 years” category, because we expected the data to extend well beyond the mean, up to 40 or 50 years. Please see the Appendix for a table of the standard deviations that we used.
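The 25% rule is easy to express in code. A minimal Python sketch, with hypothetical names, that reproduces the first five rows of the Appendix table (the open-ended “More than 10 years” category is excluded because its standard deviation was set separately):

```python
# Category bounds (lower, upper), each in the category's own unit
# (months for the first two rows, years for the rest).
RANGES = {
    "0-3 months": (0, 3),
    "4-11 months": (4, 11),
    "1-2 years": (1, 2),
    "3-5 years": (3, 5),
    "6-10 years": (6, 10),
}


def quarter_range_sd(lower, upper, fraction=0.25):
    """Standard deviation set to a fixed fraction of the category's width."""
    return fraction * (upper - lower)


for category, (lower, upper) in RANGES.items():
    print(category, quarter_range_sd(lower, upper))
```

One consequence of this choice: with the mean at the midpoint, mean ± 2 SD spans exactly the category range (for 0-3 months, 1.5 ± 1.5 months), so about 95% of the normal draws fall inside their category.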