Results and Discussion
Hypothesis Testing
Generally, to test the hypotheses, the following steps are necessary:
- Compare mean/median scores
- Check test assumptions
- Conduct hypothesis test
- Calculate effect size
The following sections report all results except the checks of the test assumptions, which are documented in the thesis.
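The steps above can be sketched roughly as follows. This is a minimal illustration, not the thesis code: the source does not name the test or the effect size formula that were used, so a two-sided permutation test and Cohen's d with a pooled standard deviation are swapped in as assumptions. The function names and the toy scores are hypothetical, and the assumption checks are skipped here as well.

```python
import random
from statistics import mean, stdev


def cohens_d(a, b):
    """Cohen's d using a pooled standard deviation (assumed formula)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


def permutation_p_value(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation test on the absolute difference of means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical toy scores, NOT the thesis data.
ir_scores = [0.21, 0.35, 0.18, 0.40, 0.27, 0.30, 0.22, 0.25]
nir_scores = [0.24, 0.31, 0.45, 0.12, 0.29, 0.38, 0.20, 0.33]

# Step 1: compare means. Steps 3 and 4: test and effect size.
print("means:", round(mean(ir_scores), 4), round(mean(nir_scores), 4))
p_value = permutation_p_value(ir_scores, nir_scores)
d = cohens_d(ir_scores, nir_scores)
print("p =", round(p_value, 4))
print("d =", round(d, 3))
```

Because the formula is an assumption, this sketch should not be expected to reproduce the effect sizes reported below from the summary statistics alone.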
Hypothesis 1
Measure | IR | NIR |
---|---|---|
Mean | 0.2612 | 0.2775 |
SD | 0.1249 | 0.1788 |
Due to the contrary tendency as indicated by the mean scores, hypothesis H1 is rejected. Instead, the alternative hypothesis—H1b NIRs are more positive than IRs—is tested.
Measure | Value |
---|---|
p | 0.0103 |
d | 0.047 |
With p < 0.05, the difference between the groups is statistically significant (H1b is confirmed). Still, with a very small effect size of 0.047, the difference is considered to have no practical relevance.
Hypothesis 2
Measure | IR | NIR |
---|---|---|
Mean | 5.2486 | 5.2886 |
SD | 0.2718 | 0.4158 |
Due to the contrary tendency as indicated by the mean scores, hypothesis H2 is rejected. Instead, the alternative hypothesis—H2b NIRs are more complex than IRs—is tested.
Measure | Value |
---|---|
p | 0.0233 |
d | 0.04 |
With p < 0.05, the difference between the groups is statistically significant (H2b is confirmed). Still, with a very small effect size of 0.04, the difference is considered to have no practical relevance.
Hypothesis 3
Measure | IR | NIR |
---|---|---|
Mean | 475.8192 | 327.6587 |
SD | 311.3817 | 280.9906 |
The mean tendency supports the initial hypothesis.
Measure | Value |
---|---|
p | 0.00 |
d | 0.64 |
With p < 0.001, the difference between the groups is statistically highly significant (H3 is confirmed). Moreover, with an effect size of 0.64, the difference is considered to have considerable practical relevance.
Hypothesis 4
Measure | IR | NIR |
---|---|---|
Median | 5 | 4 |
In this case, the median scores are compared because the underlying variable is ordinal. Due to the contrary tendency, hypothesis H4 is rejected. Instead, the alternative hypothesis (H4b: NIRs are less extreme than IRs) is tested.
Measure | Value |
---|---|
p | 0.0001 |
With p < 0.001, the difference between the groups is statistically highly significant. Therefore, the relative frequencies of the 1-star and 5-star categories are compared:
Review Type | ⭐ (%) | ⭐⭐⭐⭐⭐ (%) |
---|---|---|
IR | 0.965 | 52.648 |
NIR | 1.375 | 47.989 |
The relative frequencies show mixed results: while there are indeed fewer 1-star reviews in the IR sample, there are also more 5-star reviews. Thus, hypothesis H4b is rejected.
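The comparison of the extreme categories can be illustrated with a short sketch. The helper names are hypothetical, the toy ratings are invented, and the rule "less extreme means a smaller share in both the 1-star and the 5-star category" is an assumed reading of the reasoning above rather than a definition taken from the thesis. The reported shares are copied from the table and treated as percentages.

```python
from collections import Counter


def relative_frequencies(ratings):
    """Share (in percent) of each star rating from 1 to 5."""
    counts = Counter(ratings)
    total = len(ratings)
    return {star: 100 * counts.get(star, 0) / total for star in range(1, 6)}


def is_less_extreme(candidate, reference, extremes=(1, 5)):
    """True only if the candidate has a smaller share in every extreme category."""
    return all(candidate[star] < reference[star] for star in extremes)


# Hypothetical toy ratings, only to show how the shares are computed.
toy_ratings = [5, 5, 4, 5, 3, 5, 4, 1, 5, 4]
toy_freq = relative_frequencies(toy_ratings)
print("toy 1-star share:", toy_freq[1], "toy 5-star share:", toy_freq[5])

# Shares as reported in the table above (assumed to be percentages).
ir_reported = {1: 0.965, 5: 52.648}
nir_reported = {1: 1.375, 5: 47.989}

h4b_supported = is_less_extreme(nir_reported, ir_reported)
print("H4b supported:", h4b_supported)  # False: NIRs have more 1-star reviews
```

Requiring both categories to point in the same direction is what turns the mixed picture into a rejection of H4b.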
Hypothesis 5
Measure | IR | NIR |
---|---|---|
Mean | 91.7514 | 91.3303 |
SD | 2.0898 | 3.8464 |
The mean tendency supports the initial hypothesis.
Measure | Value |
---|---|
p | 0.498 |
With p > 0.05, the difference between the groups is not statistically significant. Thus, H5 is rejected.
Summary
Hypothesis | Status | Relevant? |
---|---|---|
H1: IRs are more positive than NIRs | Rejected | - |
H1b: NIRs are more positive than IRs | Confirmed | No |
H2: IRs are more complex than NIRs | Rejected | - |
H2b: NIRs are more complex than IRs | Confirmed | No |
H3: IRs are more elaborate than NIRs | Confirmed | Yes |
H4: IRs are less extreme than NIRs | Rejected | - |
H4b: NIRs are less extreme than IRs | Rejected | - |
H5: IRs are more objective than NIRs | Rejected | - |
Note: “Status” reflects the outcome of either a) the comparison of mean/median scores or b) the hypothesis tests. The column “Relevant?” refers to the effect size (if computed) and whether the significant difference is considered to be relevant.
Only one of the five initial hypotheses could be confirmed: H3. The other four hypotheses were rejected. For the two alternative hypotheses H1b and H2b, the differences were statistically significant, but the effect sizes indicate that they have no practical relevance. In the case of H4b, a significant association was found, but with respect to the relative frequencies of the two extreme star rating categories, the hypothesis was rejected.
Finally, the following table summarizes the hypothesis concepts and whether a relevant difference was found:
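The logic behind the “Status” and “Relevant?” columns can be written down as a small decision rule. This is a sketch under assumptions: the function name is hypothetical, the significance level of 0.05 is taken from the text, and the relevance threshold of 0.2 is Cohen's conventional lower bound for a small effect, which the source does not state explicitly. H4b is left out because it was decided by the relative frequencies, not by this rule.

```python
def evaluate(direction_supported, p_value=None, effect_size=None,
             alpha=0.05, min_effect=0.2):
    """Return (status, relevant) in the style of the summary table."""
    # a) The tendency of the mean/median scores must match the hypothesis.
    # b) The test result must be significant at the chosen level.
    if not direction_supported or p_value is None or p_value >= alpha:
        return ("Rejected", "-")
    if effect_size is None:
        return ("Confirmed", "-")
    relevant = "Yes" if abs(effect_size) >= min_effect else "No"
    return ("Confirmed", relevant)


# Values as reported in the sections above.
reported = {
    "H1": dict(direction_supported=False),
    "H1b": dict(direction_supported=True, p_value=0.0103, effect_size=0.047),
    "H2": dict(direction_supported=False),
    "H2b": dict(direction_supported=True, p_value=0.0233, effect_size=0.04),
    # Reported as p < 0.001; the upper bound 0.001 is used as a stand-in.
    "H3": dict(direction_supported=True, p_value=0.001, effect_size=0.64),
    "H4": dict(direction_supported=False),
    "H5": dict(direction_supported=True, p_value=0.498),
}

results = {name: evaluate(**args) for name, args in reported.items()}
for name, (status, relevant) in results.items():
    print(f"{name}: {status}, relevant: {relevant}")
```

With these inputs, the rule yields the same status and relevance entries as the table for every hypothesis it covers.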
Hypothesis Concept | Difference? |
---|---|
Positivity | ❌ |
Complexity | ❌ |
Elaborateness | ✅ |
Extremeness | ❌ |
Objectivity | ❌ |
This is the basis for the following discussion.
Discussion
Hypothesis 1
Book reviewers may not feel obliged to give anything more in return for the free copy than their opinion. Moreover, positive publicity might matter more for other product types than for books; for books, even negative publicity might be valuable. This could explain why IRs and NIRs do not differ with respect to positivity.
Hypotheses 2 & 3
Book reviewers might not be aware of the possible danger of adding a disclosure statement to their reviews; thus, the underlying assumption of a "self-fulfilling prophecy" can be rejected. At the same time, the aforementioned norm of reciprocity could explain the confirmation of H3.
Hypothesis 4
Reviewer motivations, while certainly different for incentivized and non-incentivized reviewers, are not reflected by extreme star ratings.
Hypothesis 5
It can be assumed that book reviewers develop a uniform writing style in their reviews. Therefore, it does not make a difference whether the review is published shortly after the product experience or not.
Conclusion
❓ Do incentivized book reviews show signs of influence if the reviewer received a free book copy?
❗ Incentivization indeed impacts the contents of book reviews, but the only form of impact that has been found is an influence on review elaborateness (in terms of review length). At the same time, book reviews do not differ with respect to positivity, complexity, extremeness, and objectivity.
However, the phenomenon of "influence" needs further investigation because there might be more dimensions than just the five considered in this thesis. Also, a conclusion such as "longer reviews are influenced" would be an oversimplification.
Limitations
The findings are only valid for this study’s product type, genre, language, reviewing platform, and time period.
Further Research Perspectives
- different formalisation and operationalisation of the concepts
- repeat analysis with sentence-based data
- try to avoid misclassification of NIRs
- use different NIR sample
- avoid biases by analyzing intra-reviewer or intra-book differences
- analyze a different genre
- derive hypotheses from book market-specifics etc.
Note on Hypothesis 3: the p-value is not exactly zero; the reported value of 0.00 results from rounding.