How do I calculate if a test like this is statistically significant?

Azzu@lemm.ee · edit-2 10 months ago

How do I calculate if a test like this is statistically significant?

altairabove@lemmy.world · edit-2 10 months ago

You could use a few different null hypotheses here. One with minimal assumptions would be that both groups come from the same distribution, which is often loosely stated as "the medians are equal". This can be tested using the Mann-Whitney U test. https://en.m.wikipedia.org/wiki/Mann–Whitney_U_test
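As a rough sketch of what that test computes, here is a pure-Python version using the normal approximation with tie and continuity corrections. The function name `mann_whitney_u` and the two rating lists are made up for illustration and are not data from this thread. The approximation is crude for very small samples, so treat it as a way to see the mechanics, not as a substitute for a statistics package.

```python
from math import erfc, sqrt

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (u_x, p_value), where u_x is the U statistic for sample x.
    Assumes both samples are non-empty.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign ranks 1..n, giving tied values the average of their ranks.
    ranks = [0.0] * n
    tie_term = 0  # sum of (t^3 - t) over each group of t tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t**3 - t
        i = j

    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    # Mean and tie-corrected variance of U under the null hypothesis.
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value is identical, so there is no evidence at all
        return u_x, 1.0

    # Continuity-corrected z score and two-sided p-value.
    z = max(abs(u_x - mean_u) - 0.5, 0.0) / sqrt(var_u)
    return u_x, erfc(z / sqrt(2))

# Hypothetical ratings for two items (illustrative only).
ratings_a = [7, 8, 6, 9, 7, 8, 10, 6]
ratings_b = [5, 6, 4, 7, 5, 6, 5, 3]
u, p = mann_whitney_u(ratings_a, ratings_b)
print(f"U = {u}, p = {p:.4f}")
```

If SciPy is available, `scipy.stats.mannwhitneyu(ratings_a, ratings_b, alternative="two-sided")` does the same job and can use an exact method for small samples.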

Azzu@lemm.ee · 10 months ago

This seems like exactly the case here :) I will read up and try this

TauZero@mander.xyz · 10 months ago

Your situation reminded me of the way IMDB sorts movies by rating, even though different movies may receive vastly different numbers of votes. They use something called a credibility formula, which is apparently a Bayesian statistics way of doing it, unlike the frequentist statistics with p-values and null hypotheses that you are looking for atm.
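For reference, the weighted-rating formula that IMDB has published for its top lists pulls each item's average toward the overall average, and pulls harder when the item has few votes. A minimal sketch, where the function name, parameter names, and example numbers are all my own assumptions:

```python
def weighted_rating(item_mean, item_votes, overall_mean, prior_weight):
    """Shrink an item's mean rating toward the overall mean.

    prior_weight acts like a number of "phantom votes" cast at overall_mean;
    choosing it is a judgment call, not something the data dictates.
    """
    v, m = item_votes, prior_weight
    return (v / (v + m)) * item_mean + (m / (v + m)) * overall_mean

# Hypothetical example: a perfect score from 2 votes vs. 8.5 from 200 votes.
few_votes = weighted_rating(10.0, 2, overall_mean=7.0, prior_weight=10)
many_votes = weighted_rating(8.5, 200, overall_mean=7.0, prior_weight=10)
print(few_votes, many_votes)
```

With these made-up numbers the item with 200 votes ends up ranked above the item with two perfect votes, which is the point of the adjustment.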

Jlafs@lemmy.world · 10 months ago

Your null hypothesis is the thing you’re trying to disprove. For example, if I wanted to run a study to assess the effect of adding a certain growth hormone to a cell culture, my null hypothesis would be “there is no effect”. In your case, it would be “there is no difference in how much different things are liked”. From there, you’d run your study and do your statistical analysis. There are different methods for that based on the type of data, number of groups you’re comparing, sample size, etc., and I’m not a statistician so I can’t say which methods are best for what you’re planning.

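When it comes to p-value, to really simplify it, you can think of your p-value as how likely you’d be to see a result at least as extreme as yours if the null hypothesis were true. It’s tempting to read it as the likelihood that your null hypothesis is true, but that’s not what it means. A small p-value just says your data would be surprising if there really were no difference.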
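One way to make the null hypothesis and the p-value concrete is a permutation test: if there really is no difference, the group labels are interchangeable, so you can shuffle them many times and count how often the shuffled difference is at least as large as the one you observed. This is a sketch with a made-up function name and a difference-in-means statistic chosen for simplicity, not a recommendation for this particular dataset.

```python
import random

def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in means."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable

    def mean(xs):
        return sum(xs) / len(xs)

    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the labels are arbitrary, so shuffle them.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)

# Hypothetical ratings (illustrative only).
print(permutation_p_value([7, 8, 6, 9, 7], [5, 6, 4, 7, 5]))
```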

JWBananas@startrek.website · 10 months ago

People are inherently bad at rating things. Why not run a “This or that?” style study instead?

Given a list of items to rate, pair them up randomly. Ask a person which item they like better out of each pair. Run through Final Four type eliminations until you get down to their number one preference.

Run through this process for each person, beginning with different random pairings every time.

Record data on all the choices - not just the final ones. You should be able to get good data like that.

For example, there will probably be a thing that is so disliked that it gets eliminated in the first round more frequently than anything else. The inverse will likely be true of a highly-preferred item. And I am sure you can identify other insights as well.
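The procedure described above could be sketched roughly like this. Everything here is hypothetical: the `run_bracket` name, the `prefers` callback standing in for asking a real person, and the rule that an odd item out gets a bye into the next round (the description above does not say what to do with an odd count).

```python
import random
from collections import Counter

def run_bracket(items, prefers, rng):
    """Run one person's elimination bracket.

    prefers(a, b) must return whichever of a or b the person likes better.
    Returns (favorite, choices), where choices is a list of (winner, loser)
    pairs covering every matchup, not just the final one.
    """
    remaining = list(items)
    rng.shuffle(remaining)  # fresh random pairings for each person
    choices = []
    while len(remaining) > 1:
        next_round = []
        if len(remaining) % 2 == 1:
            # Assumption: the odd item out advances without a matchup.
            next_round.append(remaining.pop())
        for a, b in zip(remaining[0::2], remaining[1::2]):
            winner = prefers(a, b)
            loser = b if winner == a else a
            choices.append((winner, loser))
            next_round.append(winner)
        remaining = next_round
    return remaining[0], choices

# Simulated participant who always prefers the alphabetically earlier item.
rng = random.Random(1)
favorite, choices = run_bracket(["apple", "banana", "cherry", "date"], min, rng)
losses = Counter(loser for _, loser in choices)
print(favorite, losses)
```

Pooling the `choices` lists from all participants gives win and loss counts per item, which is the "data on all the choices" mentioned above.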

Azzu@lemm.ee · 10 months ago

Sounds like a good idea, however my participants don’t have the attention span for that, and I don’t have the resources to do anything more elaborate :) after all, like I said, it’s just a small personal thing :)