When using the "ranksum" function, how can I differentiate between the two ways of getting a low p-value?

Asked by Michal on 6 Aug 2012
Latest activity Commented on by Michal on 9 Aug 2012

Say I have two vectors, A and B. If I get a low p-value, I would like to check whether it indicates that the median of B is higher than that of A, rather than just that the medians differ (that is, either the median of B is higher than the median of A, or vice versa).

For example:

A=[120 10 201 20 30 12 30 10 2 2 3 5 1]
B=[140 400 120 2000 30 40 2000 1000 1000]

I get a p-value of 7.2251e-004.

But if

B=[1 0 0 0 0 0 0 0 0 0 0 0 0]

I also get a low p-value (6.4360e-006).

I would like to get a low p-value only when the median of B is higher than the median of A. Since I have many calculations, I need this to happen automatically in the code instead of checking every pair of vectors by hand. Do you have any idea how to do that?

Thanks, Michal

2 Answers

Answer by Star Strider on 6 Aug 2012
Accepted answer

Both ‘one-sided’ (that one median is greater than or less than the other) and ‘two-sided’ (that the medians are different) alternatives are possible. See item 4 under "Assumptions and formal statement of hypotheses" in the Wikipedia article on the Mann–Whitney U test. There is also an excellent discussion of this on page 3 of "The Wilcoxon Rank-Sum Test".

According to the documentation, ranksum returns the two-sided p-value, so you need to make the appropriate calculation to get the one-sided p-value.
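Here is a minimal sketch of that calculation, in Python rather than MATLAB so that it runs on its own. It assumes the two-sided p-value comes from the normal approximation and that z is signed so that it is positive when the first sample tends to be larger; the function name one_sided_p is hypothetical. For exact small-sample p-values the halving rule is only approximate.

```python
def one_sided_p(p_two_sided, z, alternative="greater"):
    """Turn a two-sided normal-approximation p-value into a one-sided one.

    Assumes z > 0 means the first sample tends to be larger than the second.
    alternative="greater": H1 is that the first sample's median is larger.
    alternative="less":    H1 is that the first sample's median is smaller.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    # Did the data fall on the side the alternative predicts?
    agrees = z > 0 if alternative == "greater" else z < 0
    half = p_two_sided / 2.0
    # On the predicted side the one-sided p is half the two-sided one;
    # on the opposite side it is the complement.
    return half if agrees else 1.0 - half


# Same two-sided p-value, opposite directions of the effect:
print(one_sided_p(0.04, 2.05))   # data agree with "greater": small p
print(one_sided_p(0.04, -2.05))  # data point the other way: p near 1
```

The sign of z carries the direction information that the two-sided p-value throws away, which is why both are needed.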

3 Comments

Michal on 7 Aug 2012

The Wikipedia article didn't exactly help me, but the second reference was very helpful.

If I understand correctly, when I use the command [p,h,stats] = ranksum(A,B), the rank-sum value (the second field of the stats structure) can be interpreted as follows:

High ranksum value = H1 : A > B

Low ranksum value = H1 : A < B

Is that correct? If so, what threshold separates the two cases?

Many thanks!
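On the threshold question: under the null hypothesis, the rank sum of A (n_A values pooled with n_B values) has mean n_A(n_A + n_B + 1)/2. A rank sum above that value leans toward A > B and one below it toward A < B; whether the distance from the mean is large enough to matter is what the p-value measures. The Python sketch below computes A's rank sum, giving tied values their average rank. The helper names are hypothetical, and whether MATLAB's stats.ranksum reports the rank sum of A in every case is an assumption worth checking against your own output.

```python
def midranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_vs_null_mean(a, b):
    """Rank sum of sample a in the pooled data, and its mean under H0."""
    ranks = midranks(list(a) + list(b))
    w = sum(ranks[:len(a)])
    null_mean = len(a) * (len(a) + len(b) + 1) / 2
    return w, null_mean


A = [120, 10, 201, 20, 30, 12, 30, 10, 2, 2, 3, 5, 1]
B = [140, 400, 120, 2000, 30, 40, 2000, 1000, 1000]
w, null_mean = rank_sum_vs_null_mean(A, B)
print(w, null_mean)  # 98.5 149.5
```

For the question's first example, A's rank sum falls well below its null mean, consistent with B tending to be larger.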

Star Strider on 7 Aug 2012

My pleasure!

When I experimented with this a bit, I found that the z-statistic (z-value), the first field of ‘stats’, may be the answer: when A > B the z-statistic is positive, and when A < B it is negative.

According to the documentation, ‘ranksum’ only computes the z-value for large samples, so if your samples aren't large enough, I suggest simply comparing the medians.
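If the samples are too small for the z-value, another option besides comparing medians is an exact one-sided p-value: enumerate every way the pooled ranks could be split between the two samples and count how often B's rank sum is at least as large as the observed one. A Python sketch of that permutation approach follows; the names are hypothetical, ties get average ranks, and because it visits C(n_A + n_B, n_B) splits it is only practical for small samples.

```python
from itertools import combinations


def midranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def exact_p_second_greater(a, b):
    """Exact one-sided p-value for H1: values in b tend to exceed values in a.

    Enumerates every set of pooled positions that b could have occupied,
    so the cost is C(len(a) + len(b), len(b)); keep the samples small.
    """
    pooled = list(a) + list(b)
    ranks = midranks(pooled)
    observed = sum(ranks[len(a):])
    count = extreme = 0
    for positions in combinations(range(len(pooled)), len(b)):
        count += 1
        # Midranks are multiples of 0.5, so these sums are exact in floating point.
        if sum(ranks[p] for p in positions) >= observed:
            extreme += 1
    return extreme / count


print(exact_p_second_greater([1, 2, 3], [4, 5, 6]))  # 1 of 20 splits: 0.05
```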

Michal on 9 Aug 2012

That's great!

Again, many thanks!!

Answer by the cyclist on 6 Aug 2012
Edited by the cyclist on 6 Aug 2012

Use the 'tail' option. A careful read of

>> help ranksum

will explain how.

(In the first draft of my answer, I pointed to "doc ranksum" rather than "help ranksum", but it seems that that documentation doesn't list the 'tail' option. Weird.)
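For releases that have the option, the one-sided call for "B tends to be larger than A" would presumably be p = ranksum(B, A, 'tail', 'right'), with 'right' meaning the first input has the larger median; check help ranksum in your release to confirm. The Python sketch below shows what such a one-sided test computes under the normal approximation, using a tie-corrected variance and a 0.5 continuity correction; the function and variable names are hypothetical, and those correction choices are assumptions, not a transcription of MATLAB's code. It loops over several B vectors, which is the kind of automatic check the question asks for.

```python
import math
from collections import Counter


def midranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_p(x, y, tail="both"):
    """Normal-approximation rank-sum p-value comparing x with y.

    tail="right": H1 is that x tends to be larger than y.
    tail="left":  H1 is that x tends to be smaller than y.
    tail="both":  H1 is that they differ in either direction.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    pooled = list(x) + list(y)
    w = sum(midranks(pooled)[:n1])
    mean = n1 * (n + 1) / 2
    # Tie-corrected variance of the rank sum of x.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        raise ValueError("all values are tied; the test is undefined")
    sd = math.sqrt(var)
    # P(W >= w) and P(W <= w), each with a 0.5 continuity correction.
    upper = 0.5 * math.erfc((w - 0.5 - mean) / sd / math.sqrt(2))
    lower = 0.5 * math.erfc(-(w + 0.5 - mean) / sd / math.sqrt(2))
    if tail == "right":
        return upper
    if tail == "left":
        return lower
    if tail == "both":
        return min(1.0, 2 * min(upper, lower))
    raise ValueError("tail must be 'both', 'right' or 'left'")


A = [120, 10, 201, 20, 30, 12, 30, 10, 2, 2, 3, 5, 1]
pairs = {
    "first B": [140, 400, 120, 2000, 30, 40, 2000, 1000, 1000],
    "second B": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
for name, B in pairs.items():
    # The right-tailed p is low only when B tends to be larger than A.
    print(name, rank_sum_p(B, A, tail="right"), rank_sum_p(B, A, tail="both"))
```

With the question's data, the first B should get a small right-tailed p-value, while the second B, whose values are mostly below A's, should get one close to 1 even though its two-sided p-value is small.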

4 Comments

Michal on 7 Aug 2012

But thanks for trying to help!

the cyclist on 7 Aug 2012

Ah. I am using the prerelease of R2012b. It seems that the 'tail' option is new.

Michal on 9 Aug 2012

I should probably get this release too. Thanks anyway!
