Results and Discussion

Hypothesis Testing

Generally, to test the hypotheses, the following steps are necessary:

  1. Compare mean/median scores
  2. Check test assumptions
  3. Conduct hypothesis test
  4. Calculate effect size
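
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the thesis: the data are synthetic, the permutation test is swapped in because this section does not name the tests that were run, and Cohen's d with a pooled standard deviation is only one possible effect size definition. The function names `cohens_d` and `permutation_p_value` are hypothetical helpers.

```python
import random
import statistics
from math import sqrt


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_sd


def permutation_p_value(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.fmean(left) - statistics.fmean(right)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Synthetic stand-in data (NOT the thesis data).
rng = random.Random(42)
ir = [rng.gauss(480, 300) for _ in range(200)]
nir = [rng.gauss(330, 280) for _ in range(200)]

# Step 1: compare mean scores.
mean_ir, mean_nir = statistics.fmean(ir), statistics.fmean(nir)
# Step 2: check test assumptions (omitted in this sketch).
# Step 3: conduct the hypothesis test.
p = permutation_p_value(ir, nir)
# Step 4: calculate the effect size.
d = cohens_d(ir, nir)
print(f"mean IR={mean_ir:.1f}, mean NIR={mean_nir:.1f}, p={p:.4f}, d={d:.2f}")
```

Because of the add-one correction, the permutation p-value can never be reported as exactly zero; with 2000 resamples its smallest possible value is 1/2001.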

In the following, all results are reported except the checks of the test assumptions, which are covered in the thesis.

Hypothesis 1

Measure IR NIR
Mean 0.2612 0.2775
SD 0.1249 0.1788


Because the mean scores point in the opposite direction, hypothesis H1 is rejected. Instead, the alternative hypothesis H1b (NIRs are more positive than IRs) is tested.

Measure Value
p 0.0103
d 0.047


With p < 0.05, the difference between the groups is statistically significant (H1b is confirmed). Still, with a very small effect size of 0.047, the difference is considered to have no practical relevance.

Hypothesis 2

Measure IR NIR
Mean 5.2486 5.2886
SD 0.2718 0.4158


Because the mean scores point in the opposite direction, hypothesis H2 is rejected. Instead, the alternative hypothesis H2b (NIRs are more complex than IRs) is tested.

Measure Value
p 0.0233
d 0.04


With p < 0.05, the difference between the groups is statistically significant (H2b is confirmed). Still, with a very small effect size of 0.04, the difference is considered to have no practical relevance.

Hypothesis 3

Measure IR NIR
Mean 475.8192 327.6587
SD 311.3817 280.9906


The mean tendency supports the initial hypothesis.

Measure Value
p 0.00
d 0.64


With p < 0.001 (see note 1), the difference between the groups is statistically highly significant (H3 is confirmed). Moreover, with an effect size of 0.64, the difference is of considerable practical relevance.

Hypothesis 4

Measure IR NIR
Median 5 4


In this case, the median scores are compared because the underlying variable is ordinal. Because the medians point in the opposite direction, hypothesis H4 is rejected. Instead, the alternative hypothesis H4b (NIRs are less extreme than IRs) is tested.

Measure Value
p 0.0001


With p < 0.001, the difference between the groups is statistically highly significant. Therefore, the relative frequencies of the 1- and 5-star categories are compared:

Review Type ⭐ ⭐⭐⭐⭐⭐
IR 0.965 52.648
NIR 1.375 47.989


The relative frequencies show ambivalent results: while there are indeed fewer 1-star reviews in the IR sample, there are also more 5-star reviews. Thus, hypothesis H4b is rejected.
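
The comparison of the two extreme categories can be sketched as follows. The helper `relative_frequencies` and the sample list `example_ratings` are hypothetical and only show how such shares could be derived from raw star ratings. The final check uses the four values reported in the table above and assumes that "less extreme" requires a lower share in both the 1-star and the 5-star category.

```python
from collections import Counter


def relative_frequencies(ratings):
    """Return the share (in percent) of each star rating from 1 to 5."""
    counts = Counter(ratings)
    total = len(ratings)
    return {stars: 100 * counts.get(stars, 0) / total for stars in range(1, 6)}


# Hypothetical ratings, NOT the thesis data.
example_ratings = [1, 5, 5, 5]
example_freq = relative_frequencies(example_ratings)

# Values as reported in the table above (1-star and 5-star shares).
ir_reported = {1: 0.965, 5: 52.648}
nir_reported = {1: 1.375, 5: 47.989}

# ASSUMPTION: H4b holds only if NIRs have a lower share in BOTH extremes.
nir_less_extreme = (
    nir_reported[1] < ir_reported[1] and nir_reported[5] < ir_reported[5]
)
print(example_freq, nir_less_extreme)
```

With the reported values, the 1-star comparison fails while the 5-star comparison holds, which is the mixed picture described above.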

Hypothesis 5

Measure IR NIR
Mean 91.7514 91.3303
SD 2.0898 3.8464


The mean tendency supports the initial hypothesis.

Measure Value
p 0.498


With p > 0.05, the difference between the groups is not statistically significant. Thus, H5 is rejected.

Summary

Hypothesis Status Relevant?
H1: IRs are more positive than NIRs Rejected -
H1b: NIRs are more positive than IRs Confirmed No
H2: IRs are more complex than NIRs Rejected -
H2b: NIRs are more complex than IRs Confirmed No
H3: IRs are more elaborate than NIRs Confirmed Yes
H4: IRs are less extreme than NIRs Rejected -
H4b: NIRs are less extreme than IRs Rejected -
H5: IRs are more objective than NIRs Rejected -


Note: “Status” reflects the outcome of either a) the comparison of mean / median scores or b) the hypothesis test. The column “Relevant?” refers to the effect size (if computed) and indicates whether the significant difference is considered to be relevant.
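
The logic behind the two columns can be written down as a small decision function. This is a sketch under assumptions: the function name `evaluate` is hypothetical, the significance level of 0.05 is taken from the text, and the relevance threshold of 0.2 (Cohen's conventional lower bound for a small effect) is an assumption, since this section does not state the cut-off that was used. H4b is left out because its status depends on the additional comparison of relative frequencies.

```python
from typing import Optional, Tuple

ALPHA = 0.05          # significance level used in the text
MIN_RELEVANT_D = 0.2  # ASSUMPTION: conventional "small effect" threshold


def evaluate(
    direction_supported: bool,
    p: Optional[float] = None,
    d: Optional[float] = None,
) -> Tuple[str, str]:
    """Return (status, relevant) for one hypothesis."""
    # a) The mean/median comparison contradicts the hypothesis.
    if not direction_supported:
        return "Rejected", "-"
    # b) The hypothesis test is missing or not significant.
    if p is None or p >= ALPHA:
        return "Rejected", "-"
    # Significant, but no effect size was computed.
    if d is None:
        return "Confirmed", "-"
    return "Confirmed", "Yes" if abs(d) >= MIN_RELEVANT_D else "No"


# Inputs taken from the results reported above.
results = {
    "H1": evaluate(False),
    "H1b": evaluate(True, p=0.0103, d=0.047),
    "H2b": evaluate(True, p=0.0233, d=0.04),
    "H3": evaluate(True, p=0.001, d=0.64),  # reported as p < 0.001
    "H5": evaluate(True, p=0.498),
}
for name, (status, relevant) in results.items():
    print(name, status, relevant)
```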

Only one of the five initial hypotheses could be confirmed: H3. The other four were rejected. For the two alternative hypotheses H1b and H2b, the group differences were statistically significant, but the effect sizes indicate that they have no practical relevance. In the case of H4b, a significant association was found, but given the relative frequencies of the 1- and 5-star categories, the hypothesis was rejected.

Finally, the following table summarizes the hypothesis concepts and whether a relevant difference was found:

Hypothesis Concept Difference?
Positivity No
Complexity No
Elaborateness Yes
Extremeness No
Objectivity No


This is the basis for the following discussion.

Discussion

Hypothesis 1
Book reviewers do not feel obliged to give anything more in return for the free copy than their opinion. Moreover, positive publicity might matter more for other product types than for books; for books, even negative publicity might be valuable. This could explain why IRs and NIRs do not differ with respect to positivity.

Hypotheses 2 & 3
Book reviewers might not be aware of the possible danger of adding a disclosure statement to their reviews; thus, the underlying assumption of a “self-fulfilling prophecy” can be rejected. At the same time, the aforementioned norm of reciprocity could explain the confirmation of H3.

Hypothesis 4
Reviewer motivations, while certainly different for incentivized and non-incentivized reviewers, are not reflected by extreme star ratings.

Hypothesis 5
It can be assumed that book reviewers develop a uniform writing style in their reviews. Therefore, it does not make a difference whether the review is published shortly after the product experience or not.

Conclusion

❓ Do incentivized book reviews show signs of influence if the reviewer received a free book copy?

❗ Incentivization indeed impacts the contents of book reviews, but the only form of impact that has been found is an influence on review elaborateness (in terms of review length). At the same time, book reviews do not differ with respect to positivity, complexity, extremeness, and objectivity.

However, the phenomenon of “influence” needs further investigation because there might be more dimensions than the five considered in this thesis. Also, a conclusion such as “longer reviews are influenced” would be an oversimplification.

Limitations
The findings are only valid with respect to this study’s product type, genre, language, reviewing platform, and temporal limitation.

Further Research Perspectives


  1. Note that the p-value is not exactly zero; the reported value of 0.00 results from rounding.