Investigating Consumer Food Choice Behavior: An Application Combining Sensory Evaluation and Experimental Auctions

R. Karina Gallardo, Yeon A Hong, Marcial Silva Jaimes, and Johanna Flores Orozco

Abstract
We investigate which piece of information collected with sensory evaluation tools has better predictive capacity for willingness to pay: information on preferences for a sensory quality attribute measured with hedonic scales, or information on the perceived intensity of the same attribute measured with intensity scales. We also estimate whether extrinsic or intrinsic quality exerts a similar impact on consumers' willingness to pay. We conducted a sensory evaluation along with experimental auctions using three different apple varieties with college students in Metropolitan Lima, Peru. Findings from this study show that information collected on liking for apple quality attributes has a better explanatory capability for willingness to pay than information on consumers' perceived intensity of the same attributes. The explanatory capability was assessed using measures of goodness-of-fit. We also show that willingness to pay was driven both by the variety-induced intrinsic quality attributes and by the extrinsic cues of the variety. Results add to the existing body of literature aiming to improve the understanding of consumer food choice behavior.


Introduction
Investigating consumers' food choices is a complex task, to the extent that there does not seem to be a consensus across disciplines on the best approach to study it. A branch of marketing studies postulates that food choice behavior follows a structured process that can be described by stages including problem recognition, information search, evaluation of alternatives, purchase decision, product consumption, and post-purchase behavior (Kotler and Keller, 2012; Grunert, 2005). A branch of studies in economics and psychology advocates a different perspective based on simple heuristics; that is, consumers select or eliminate products based on a few salient attributes rather than by using a systematic structured procedure (Rabin, 1998; Simon, 1957). In light of these contrasting perspectives, a popular alternative to improve the understanding of food choice behavior is to combine disciplines, such as sensory science and applied economics. In fact, numerous studies follow such an approach: Lange et al., 2000; Lund et al., 2006; Stefani et al., 2006; Combris et al., 2009; Mueller et al., 2010; Gallardo et al., 2011; Zhang et al., 2010; Dinis et al., 2011; Bi et al., 2011; and Costanigro et al., 2014. In general, these studies seek to analyze the role of intrinsic and extrinsic food quality attributes in consumers' preferences and valuation for a food product. In these studies, consumers are asked to evaluate the external and internal sensory quality attributes and to rate or rank the product and/or its sensory quality attributes as a function of their preferences. Next, consumers participate in experiments that reveal the impact of their preferences on their well-being (Combris et al., 2009). Usually, this impact is measured by hypothetical questions using stated choice scenarios in a questionnaire format and/or by incentive-compatible experimental auctions. The importance of evaluating both external and internal sensory quality attributes stems from the postulate that internal quality attributes cannot be experienced at the time of purchase, so consumers rely on external cues and past experiences with the internal attributes. It is believed that past experiences are poor predictors of repeated purchases if they are not recalled rigorously or if the experiences were not consistent (Bi et al., 2011). These factors, coupled with non-sensory factors such as convenience, societal values, production technology, personal health, and branding, are intended to provide a complete depiction of food choice behavior (Jaeger, 2006).
The information obtained from sensory science is unique in that it measures food sensory characteristics as perceived by humans, in contrast to other sources of information, such as chemical or instrumental measurements, that are also used to characterize food (Navarro da Silva et al., 2013). Sensory science uses sensory evaluation as its primary method of analysis (Tuorila and Monteleone, 2009). Sensory evaluation is "a scientific method that evokes, measures, analyzes, and interprets responses to products as perceived through the senses of sight, smell, touch, taste, and hearing" (Lawless and Heymann, 2010). Moreover, sensory science is considered to be at the intersection of other disciplines, including behavioral sciences, biology, nutrition, and health (Tuorila and Monteleone, 2009).
Among the many areas covered by sensory science, the evaluation of consumer preferences is an important one. Typically, consumer preferences are measured with hedonic scales. The 9-point hedonic scale (from 1 = dislike extremely to 9 = like extremely) is the most internationally accepted and widely used. This scale was developed in 1947 at the Quartermaster Food and Container Institute for the U.S. Armed Forces. With this scale, word descriptors are used along with numbers, which facilitates the interpretation of the mean values of the responses in terms of the degree of like or dislike (Lim, 2011). The scale is easy for both respondents and researchers to use and interpret. However, it has limitations, such as a high vulnerability to ceiling effects due to the small number of available categories and the general tendency of subjects to avoid using the extreme categories (Lim, 2011).
In addition to collecting information on preferences, the scales used in sensory science make it possible to collect information about the chemical stimuli that sensory quality attributes trigger in panelists (Lim, 2011). The rationale for these scales is the idea that there is a direct relationship between perceived intensity and stimulus. Such relationships have long been studied in psychophysics, to the extent that current psychophysical methods can capture the range of perceived intensities from a threshold to a maximum and can compare perceived intensities across individuals with increased accuracy (Bartoshuk, 2000). There are several ways to measure perceived intensity, including 9-point scales (or similar) with word descriptors and magnitude scales that measure the ratio of intensities perceived for the same sensory quality attribute. One main disadvantage of intensity scales is that measurements are subjective and there is "no provision for anchoring the judgments of individual subjects to a common ruler" (Lim, 2011). In other words, there is no means of proving that a rating of "9" means the same to all panelists (Lim, 2011).
Sensory evaluation techniques collect different pieces of information about consumers' perceptions, namely preferences for and perceived intensity of sensory quality attributes, which raises the question of how such information relates to consumers' willingness to pay for a food product. In other words, which piece of information has a better predictive capacity for consumers' willingness to pay: how much each sensory quality attribute is liked, or how intensely each sensory quality attribute is perceived? The primary goal of this study is to answer this question. To achieve this goal, we estimate two sets of regressions: one using liking ratings and the other using perceived intensity ratings in the set of explanatory variables. Then, we test which set of regressions has a greater explanatory capacity using measures of goodness-of-fit. We also compare the coefficients from the two sets of regressions using non-parametric tests. A second goal of this study is to infer whether the variety-induced sensory quality attributes (the intrinsic quality attributes, which can be measured only through sensory evaluation) or the variety itself (the extrinsic quality attributes) exerts a greater impact on the willingness to pay for a food product. Note that the goal of this paper is not to offer recommendations on general consumers' preferences and willingness to pay for apples, but to test the performance of different sensory evaluation scales in explaining willingness to pay and to test which set of quality attributes (extrinsic or intrinsic) exerts a greater impact on the willingness to pay for a food product. We used apples because they are a familiar product to most, if not all, individuals.
To estimate the willingness to pay, we used a Vickrey second price auction. This type of preference elicitation methodology has the advantage of being incentive compatible. This means that bids have real consequences, so participants have an incentive to assess and reveal their preferences as truthfully as possible (Lusk and Shogren, 2007). In the Vickrey second price auction, every participant submits a bid, that is, his/her willingness to pay for the product being auctioned. The participant who submits the highest bid wins the auction; that is, this participant actually buys the product being auctioned. The price the winner pays is the second-highest bid (the selling or market price) (Lusk and Shogren, 2007). The advantage of the Vickrey second price auction over other auction formats is that it is relatively simple to explain to participants, and it creates an endogenous market-clearing price, ensuring that participants are involved in an active market environment and exposed to market feedback (Lusk and Shogren, 2007).
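The second price rule described above can be illustrated with a minimal sketch. The function name, the participant labels, and the bid values are hypothetical, and the tie-breaking rule (first submitted wins) is an assumption made only for illustration; the study does not describe how ties were handled.

```python
def second_price_auction(bids):
    """Return (winner, price) for a sealed-bid second price auction.

    bids: dict mapping a participant label to a numeric bid.
    Ties for the highest bid go to the participant listed first, which is
    an illustrative assumption, not a rule taken from the study.
    """
    if len(bids) < 2:
        raise ValueError("a second price auction needs at least two bids")
    # Rank bids from highest to lowest; sorted() is stable, so ties keep
    # their original order.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]   # highest bidder wins
    price = ranked[1][1]    # and pays the second-highest bid
    return winner, price


# Hypothetical bids in nuevos soles per kilo
example_bids = {"P1": 4.0, "P2": 5.5, "P3": 3.0}
winner, price = second_price_auction(example_bids)
```

In this example P2 wins and pays 4.0, the bid submitted by P1, so the price paid does not depend on the winner's own bid. That separation is what gives bidders an incentive to bid their true value.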

Data collection
The experimental auctions and sensory evaluations were conducted in June 2015 at the facilities of the Universidad Nacional Agraria La Molina in Lima, Peru. One hundred students were recruited two weeks in advance using flyers posted around campus. To participate in the study, individuals had to have eaten apples in the last three months and had to be in charge of the grocery shopping at home. The use of student pools is often questioned. In this case, recruiting college students was more convenient and less costly than recruiting typical household shoppers. Additionally, the purpose of this study was to compare how liking and perceived intensity of attributes affected willingness to pay, not to derive conclusions about general consumer preferences toward a specific product. Nalley et al. (2006) argue that when deriving consumer preferences is not the central motivation of the study, students perform similarly to other groups in economic experiments.
All apple samples were procured from the same local grocery store. The experiment was conducted in two sessions, each hosting 50 participants. In each session, individuals were asked to evaluate the three apple samples visually and by tasting. Each apple sample was identified with the letter D, N, or S. Participants were then asked to respond to a questionnaire rating the intensity of, and their liking for, the visual quality attributes of each sample. Appearance attributes included the perceived presence of external defects and size. After the appearance attributes were evaluated, researchers cut each apple sample given to each participant in half. To objectively assess apple size, participants were asked to measure the transverse diameter of each apple with a ruler and write that number as the response to the size question in the questionnaire. Next, panelists were asked to taste each apple sample. For this, the moderator gave a brief explanation of each quality attribute included in the study, for example, what crispness, firmness, sweetness, and acidity are and how to assess them. Panelists were instructed to rinse their palates with water between samples. Next, the panelists responded to the questionnaire, in which they rated how much they liked the following apple attributes using a 9-point scale (from 1 = dislike extremely to 9 = like extremely): crispness, firmness, sweetness, and acidity. They were also asked to rate the perceived intensity of each of the attributes using a 9-point scale (from 1 = not intense to 9 = extremely intense). When most participants signaled they had finished responding to the questionnaire, they were asked to submit a bid in nuevos soles per kilo in two repeated rounds. The nuevo sol is the Peruvian currency; as of June 18, 2015, $1 was equivalent to 3.16 nuevos soles (Peru, Central Reserve Bank, 2015). Following the second price auction rules, bids were organized in ascending order, and the first- and second-highest bids were identified along with the panelists submitting those bids. Researchers kept records of the winning bids and did not reveal them to participants. To identify the winner of the auction, a binding sample and bid were selected randomly. Once the winning sample and panelist were identified, the winning panelist bought 1 kg of apples and paid the second-highest bid submitted in the session.

Econometric model
Censored bids are common in experimental auctions (Lusk and Shogren, 2007). The results from a censoring test on our bid data indicated that 6% of bid observations were censored (see Figure 1). In addition, likelihood ratio tests were used to test the appropriateness of the Tobit model compared with an OLS model and Cragg's double hurdle model. Test results rejected the OLS and Cragg's double hurdle models in favor of the Tobit model. The results of the likelihood ratio tests justifying the Tobit specification are available from the authors upon request. Coefficient estimates for the Tobit model were obtained by maximizing the Tobit likelihood function (Greene, 2008), denoted LF, in which Bid_i is the bid for panelist i (i = 1, …, 100); X_i is the intensity or liking rating for each quality attribute (e.g., appearance/presence of defects, size, crispness, firmness, sweetness, and acidity) as perceived by panelist i; β is the coefficient estimate of the intensity or liking rating for each quality attribute; UC_i is the indicator variable for the uncensored bid observations; LC_i is the indicator variable for the left-censored bid observations; σ is the square root of the variance of the error terms; ϕ is the standard normal density function; and Φ is the cumulative standard normal distribution function. The censored marginal effects were calculated from the same quantities, namely X_i, β, σ, and Φ.
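For reference, the textbook Tobit likelihood and censored marginal effect can be written with the symbols defined above. This is a sketch of the standard form rather than a reproduction of the printed equations, and it assumes that bids are left-censored at zero, which the text does not state explicitly.

```latex
% Standard Tobit likelihood, assuming left-censoring at zero
LF = \prod_{i=1}^{n}
     \left[ \frac{1}{\sigma}\,
            \phi\!\left( \frac{Bid_i - X_i\beta}{\sigma} \right)
     \right]^{UC_i}
     \left[ 1 - \Phi\!\left( \frac{X_i\beta}{\sigma} \right)
     \right]^{LC_i}

% Marginal effect on the expected censored bid
\frac{\partial E[Bid_i \mid X_i]}{\partial X_i}
  = \beta\,\Phi\!\left( \frac{X_i\beta}{\sigma} \right)
```

Uncensored observations (UC_i = 1) contribute the normal density of the residual, and left-censored observations (LC_i = 1) contribute the probability that the latent bid falls at or below the censoring point.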
The coefficient estimates and marginal effects were calculated in SAS® v.9.2 (SAS Institute, Cary, North Carolina, U.S.).
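To make the estimation target concrete, the following is a minimal Python sketch of a Tobit log-likelihood and the censored marginal effect. It is not the SAS code used in the study. The function names are hypothetical, and censoring at zero is an assumption.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(bids, rows, beta, sigma):
    """Log-likelihood of a Tobit model with bids left-censored at zero
    (the censoring point is an assumption of this sketch).

    bids: list of observed bids; rows: list of regressor lists X_i;
    beta: list of coefficients; sigma: error standard deviation.
    """
    total = 0.0
    for bid, row in zip(bids, rows):
        xb = sum(x * b for x, b in zip(row, beta))
        if bid > 0:
            # Uncensored observation: normal density of the residual
            total += math.log(norm_pdf((bid - xb) / sigma) / sigma)
        else:
            # Left-censored observation: P(latent bid <= 0) = Phi(-xb/sigma)
            total += math.log(norm_cdf(-xb / sigma))
    return total


def censored_marginal_effect(row, beta, sigma):
    """Marginal effect of each regressor on the expected censored bid:
    beta scaled by Phi(X_i * beta / sigma)."""
    xb = sum(x * b for x, b in zip(row, beta))
    scale = norm_cdf(xb / sigma)
    return [b * scale for b in beta]
```

In practice the log-likelihood would be passed to a numerical optimizer to obtain β and σ; the sketch only shows the function being maximized.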
Two regressions were estimated based on the empirical specification (1): one included the liking ratings for the sensory quality attributes, and the other included the perceived intensity ratings for the sensory quality attributes. To determine which of these two models better explained the variation in the bids, we used the following goodness-of-fit criteria: (i) the Akaike Information Criterion (AIC), (ii) the Schwarz Criterion (SC), and (iii) the McFadden likelihood ratio index. These criteria can be used to compare non-nested models. The model with the smallest AIC and SC has the better explanatory power. Further, we conducted the non-parametric Mann-Whitney test to compare the ordinal ranking of the coefficient estimates between the models including liking and intensity as explanatory variables. For example, the rankings would agree if the coefficient estimate for crispness liking were the largest among all coefficients in the model that used liking ratings as explanatory variables and the coefficient estimate for crispness intensity were also the largest among all coefficients in the model that used intensity ratings. To further compare how the liking and intensity coefficient estimates differ, we conducted a pairwise comparison of the marginal effects of the liking and intensity ratings on bids.
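The three fit criteria can be computed from the maximized log-likelihoods, as in the sketch below. The formulas are the usual textbook definitions, and statistical packages may differ in minor conventions. The log-likelihood values, parameter count, and sample size in the example are hypothetical and are not results from this study.

```python
import math


def aic(loglik, k):
    """Akaike Information Criterion for a model with k parameters."""
    return 2 * k - 2 * loglik


def schwarz(loglik, k, n):
    """Schwarz Criterion (also known as BIC) for n observations."""
    return k * math.log(n) - 2 * loglik


def mcfadden(loglik, loglik_null):
    """McFadden likelihood ratio index relative to an intercept-only model."""
    return 1 - loglik / loglik_null


# Hypothetical values for two non-nested models with the same k and n
ll_liking, ll_intensity, ll_null = -250.0, -260.0, -300.0
k, n = 8, 300

liking_fit = (aic(ll_liking, k), schwarz(ll_liking, k, n),
              mcfadden(ll_liking, ll_null))
intensity_fit = (aic(ll_intensity, k), schwarz(ll_intensity, k, n),
                 mcfadden(ll_intensity, ll_null))
```

With these made-up numbers the first model has the lower AIC and SC and the higher McFadden index, so all three criteria would favor it.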
In addition, we investigated whether the variety-induced sensory quality attributes or the varietal differences across samples exerted a greater impact on bids. To accomplish this, we estimated three sets of regressions: (i) a full model including sensory quality attributes and binary indicators for variety, (ii) a restricted model including sensory quality attribute variables without binary indicators for variety, and (iii) a restricted model including only binary indicators for variety. To assess whether the full or the restricted model exhibited a higher explanatory power, we conducted F-tests and likelihood ratio tests.

Results
The summary statistics of the liking and perceived intensity ratings for each sensory quality attribute are presented in Table 1. Recall that to assess external appearance liking, we asked panelists to rate on a 9-point scale how much they liked the external appearance of the apple samples. To assess the perceived "intensity" of the external appearance, we asked panelists how they perceived the extent of the external defects of the apple on a 9-point scale (from 1 = no defects to 9 = abundant defects, would not buy). For size, the liking question asked how much panelists liked the fruit size, and the actual fruit diameter was used as the intensity measure. Pairwise comparisons indicated statistically significant differences between the liking and intensity ratings for each sensory quality attribute. Additionally, we conducted a correlation test to determine whether the liking and intensity ratings were positively correlated. We found a positive correlation between liking and intensity for the attributes of crispness, sweetness, and acidity. For firmness, the correlation coefficient between liking and intensity was negative, and the correlation between fruit size and the liking score for size was not statistically significant (see Table 1).
Comparing the bids submitted for each apple variety tasted, panelists offered the highest bids for the variety 'Royal Gala', followed by 'Delicia' and 'Fuji' (see Table 2). Pairwise comparisons across bids for each apple variety indicate that average bids for 'Delicia' were $0.186 kg⁻¹ lower than those for 'Royal Gala' and $0.103 kg⁻¹ higher than those for 'Fuji'. Bids for the 'Royal Gala' variety were $0.289 kg⁻¹ higher than those for 'Fuji'.
The coefficient estimates for the Tobit model are presented in Table 3. Recall that to infer whether the liking or the intensity ratings better explained the variation in bids, we estimated two regressions: one included liking ratings, and the other included intensity ratings for each sensory quality attribute in the set of explanatory variables. The results from the McFadden likelihood ratio index, the Akaike Information Criterion, and the Schwarz Criterion favored the models that included liking ratings over those that included intensity ratings (see Table 3). This provides interesting cues about the reasoning process followed by the panelists. Willingness to pay is better explained by liking ratings than by intensity ratings, as perceived intensity is not perfectly correlated with liking. In other words, a more intense or higher perceived level of an attribute does not necessarily imply a higher preference and willingness to pay.
The results from the F-tests and likelihood ratio tests comparing the full and restricted models indicate that the full model exhibits a higher explanatory power. The likelihood ratio tests led us to reject the restricted model in favor of the full model when the liking ratings were included as explanatory variables (the likelihood ratio statistic was 6.77; the 95% critical chi-square value with 2 degrees of freedom was 5.99). Similarly, when intensity ratings were included as explanatory variables (the estimated likelihood ratio statistic was 8.35), we rejected the restricted model in favor of the full model. The F-tests gave similar results and also led us to reject the restricted model in favor of the full model. The F-statistic was 3.46 when including the liking ratings and 4.30 when including the intensity ratings, with both values higher than the critical F value of 3.03 (95%, 2 degrees of freedom in the numerator and 288 degrees of freedom in the denominator). We conclude that the full model exhibits a higher explanatory power than the restricted model.
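The likelihood ratio comparison reported above can be checked with a short sketch. It relies on the fact that a chi-square distribution with 2 degrees of freedom has the closed-form upper-tail probability exp(-x/2), so no statistics library is needed. The function names are hypothetical, and the p-values are computed here for illustration only; they are not reported in the study.

```python
import math


def lr_statistic(loglik_restricted, loglik_full):
    """Likelihood ratio statistic for nested models."""
    return 2.0 * (loglik_full - loglik_restricted)


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square variable with 2 df."""
    return math.exp(-x / 2.0)


def chi2_critical_2df(alpha):
    """Critical value of a chi-square variable with 2 df at level alpha."""
    return -2.0 * math.log(alpha)


critical = chi2_critical_2df(0.05)   # about 5.99, matching the text
p_liking = chi2_sf_2df(6.77)         # statistic reported for the liking model
p_intensity = chi2_sf_2df(8.35)      # statistic reported for the intensity model

reject_liking = 6.77 > critical
reject_intensity = 8.35 > critical
```

Both reported statistics exceed the critical value, which agrees with the rejection of the restricted model in both cases.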
In the full model, the liking ratings for crispness and sweetness and the binary variable for the variety 'Royal Gala' had a positive effect on the bids submitted. The model using intensity ratings in the set of explanatory variables gave similar results: intensity ratings for crispness and sweetness and the binary variable for 'Royal Gala' were positive and statistically significant. That the coefficients for quality characteristics and variety were statistically significant indicates that both the variety-induced quality characteristics and the variety itself affect bids. The three apple samples presented to the panelists were three varieties with different external attributes: 'Delicia' is elongated in shape and red in color, 'Royal Gala' is red with cream and yellow stripes, and 'Fuji' is bicolored yellow and red. All three samples were presented with peels; hence, it is possible that panelists recognized these varieties from their external appearance and recalled previous sensory experiences that influenced their preferences and bids. When the binary variables for varieties were excluded (the restricted model), the liking ratings for size, crispness, and sweetness were positive and statistically significant. For the restricted model using the intensity ratings in the set of explanatory variables, only the intensity rating for sweetness was positive and statistically significant.
The implications of these findings are twofold. First, models including the liking ratings outperformed the models including the intensity ratings in the set of explanatory variables. The liking scores for most attributes included in the model (i.e., appearance, crispness, sweetness, firmness, and acidity), except for size, exhibited a statistically significant correlation with the perceived intensity scores. However, liking and intensity did not have a similar predictive capability for willingness to pay, with liking scores showing a higher predictive capacity. The second implication is the importance of the apples' external appearance (extrinsic quality) and the possibility that participants recognized the varieties, recalled past consumption experiences, and showed a stronger preference for the apple they recalled liking the most. If the interest is centered on eliciting willingness to pay for intrinsic sensory quality attributes, it is recommended to present panelists with peeled samples so that they cannot recognize a priori the variety being evaluated, which could otherwise influence their preferences and willingness to pay.
The main conclusions are as follows. Food choice behavior is complex. Combining disciplines such as sensory science and experimental economics is becoming a popular approach to improve the understanding of food choice behavior. In this study, we combined both disciplines to investigate which piece of information exhibits better predictive capacity for willingness to pay: information on preferences measured using hedonic liking scales or information on perceived intensity measured using intensity scales. We also estimated whether extrinsic or intrinsic quality exerts a similar impact on a consumer's willingness to pay. The results from this study show that liking has a better explanatory capability for willingness to pay than perceived intensity. The more the panelists liked a sensory quality attribute, the more they were willing to pay for the food product. This was not the case for perceived intensity; panelists were not necessarily willing to pay a higher price when they perceived an attribute more strongly, despite the correlation between the liking scores and the intensity scores. Another interesting finding is that willingness to pay was driven not only by variety-induced intrinsic sensory quality attributes but also by the extrinsic cues of the actual variety. The findings from this study add to the existing body of literature that aims to improve the understanding of consumers' food choice behavior.

VOLUME 45 Nº1 JANUARY - APRIL 2018

In this study we investigate which information collected using sensory evaluation tools shows better predictive capacity for willingness to pay: information on preferences measured using hedonic liking scales or information on perceived intensity measured using intensity scales. We also estimate whether extrinsic or intrinsic quality exerts a similar impact on the consumer's willingness to pay. We conducted a sensory evaluation study and experimental auctions with three apple varieties, in which students from a university in Lima, Peru, participated. The results of this study show that information collected on the preference for a sensory quality attribute has a better predictive capacity for willingness to pay than information on the perceived intensity of that attribute. In addition, we show that both the intrinsic sensory quality attributes induced by the apple variety and the extrinsic cues of the variety itself have an impact on willingness to pay. The results add to the existing literature that aims to improve the understanding of consumer behavior when purchasing food.

Table 1. Summary statistics for liking and perceived intensity ratings. Correlation between liking and perceived intensity ratings for apples.

Table 2. Summary statistics: average bids for 'Delicia', 'Royal Gala', and 'Fuji' apples. Pairwise comparison of bids across varieties.

Table 3. Coefficient estimates for the willingness to pay for quality characteristics for 'Delicia', 'Royal Gala', and 'Fuji' apples.