# When using the "ranksum" function, how can I distinguish between the two ways of getting a low p-value?

Asked by Michal on 6 Aug 2012
Latest activity: commented on by Michal on 9 Aug 2012

Say I have two vectors, A and B. If I get a low p-value, I would like to check whether it indicates that the median of B is higher than the median of A, rather than just that the medians differ (that is, either the median of B is higher than the median of A, or vice versa).

For example:

```
A = [120 10 201 20 30 12 30 10 2 2 3 5 1];
B = [140 400 120 2000 30 40 2000 1000 1000];
p = ranksum(A, B)   % two-sided p-value
```

I get a p-value of 7.2251e-004.

But if

```
B = [1 0 0 0 0 0 0 0 0 0 0 0 0];
```

I also get a low p-value (6.4360e-006).

I would like to get a low p-value only when the median of B is higher than the median of A. Since I have many calculations, I need this to happen automatically in the code, rather than checking every pair of vectors by hand. Do you have any idea how to do that?

Thanks, Michal

Answer by Star Strider on 6 Aug 2012

Both ‘one-sided’ (that one median is greater than or less than the other) and ‘two-sided’ (that the medians are different) alternative hypotheses are possible. See item 4 under "Assumptions and formal statement of hypotheses" in the Wikipedia article on the Mann–Whitney U test. There is also an excellent discussion of this on page 3 of The Wilcoxon Rank-Sum Test.

According to the documentation, ranksum returns the two-sided p-value, so you need to convert it to a one-sided p-value yourself.
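A minimal sketch of that conversion, assuming large samples where ranksum's normal approximation applies. The null distribution is then roughly symmetric, so the one-sided p-value is about half the two-sided one when the data point in the hypothesized direction, and about 1 minus half of it otherwise (the continuity correction makes this approximate, not exact). The direction is read from the rank sum of A (`stats.ranksum`) compared with its expected value under H0, nA(nA+nB+1)/2; the variable names are illustrative.

```matlab
% Sketch: one-sided p-value for H1 "median of B > median of A",
% derived from the two-sided p-value that ranksum returns.
% Assumption: large samples, so halving the two-sided p is approximately valid.
A = [120 10 201 20 30 12 30 10 2 2 3 5 1];
B = [140 400 120 2000 30 40 2000 1000 1000];

[pTwo, ~, stats] = ranksum(A, B);

% Expected rank sum of A under H0 (no shift between A and B)
nA = numel(A);
nB = numel(B);
expectedW = nA * (nA + nB + 1) / 2;

if stats.ranksum < expectedW
    % A ranks lower than expected: the data point toward B > A
    pOne = pTwo / 2;
else
    % The data point the other way (or not at all): weak evidence for B > A
    pOne = 1 - pTwo / 2;
end
```

Running this for every pair gives a p-value that is small only when B tends to be higher than A.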

Michal on 7 Aug 2012

The Wikipedia article didn't exactly help me, but the second reference was very helpful.

If I understood correctly, when I use the command [p,h,stats] = ranksum(A,B), the rank-sum value I get (the "ranksum" field of the stats structure) can help me in the following way:

High ranksum value = H1 : A > B

Low ranksum value = H1 : A < B

Is that correct? If so, what threshold should I use to distinguish between the two options?

Many thanks!

Star Strider on 7 Aug 2012

My pleasure!

When I experimented with this a bit, I found that the z-statistic (the ‘zval’ field of ‘stats’) may be the answer: when A > B, the z-statistic is positive, and when A < B, it is negative.
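A quick way to check this sign convention on the two example vectors from the question. The sketch passes the `'method','approximate'` name-value pair so that `stats.zval` is always computed; `zval` comes out positive when the first input (A) tends to rank higher than the second.

```matlab
% Sketch: the sign of stats.zval indicates the direction of the shift.
A  = [120 10 201 20 30 12 30 10 2 2 3 5 1];
B1 = [140 400 120 2000 30 40 2000 1000 1000];   % B mostly higher than A
B2 = [1 0 0 0 0 0 0 0 0 0 0 0 0];               % B mostly lower than A

% Force the normal approximation so that the zval field is filled in
[p1, ~, stats1] = ranksum(A, B1, 'method', 'approximate');
[p2, ~, stats2] = ranksum(A, B2, 'method', 'approximate');

fprintf('B1: p = %.4g, z = %.3f\n', p1, stats1.zval);  % z < 0, so A < B1
fprintf('B2: p = %.4g, z = %.3f\n', p2, stats2.zval);  % z > 0, so A > B2
```

Both pairs give a small two-sided p-value, but only the sign of z tells them apart.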

According to the documentation, ‘ranksum’ only computes the z-value for large samples, so if your samples aren't large enough, I suggest simply comparing the medians.
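A sketch of that fallback, using hypothetical small vectors: read the direction from `stats.zval` when ranksum has computed it, and otherwise compare the medians directly. `isfield` is used because, when the exact method is chosen for small samples, `stats` may not contain a `zval` field.

```matlab
% Sketch: decide whether B is shifted above A, for any sample size.
A = [3 5 1 4];      % hypothetical small samples
B = [9 7 8];

[p, ~, stats] = ranksum(A, B);

if isfield(stats, 'zval')
    % Large-sample case: a negative z means A tends to rank below B
    bIsHigher = stats.zval < 0;
else
    % Small-sample (exact) case: no z-value, so compare medians
    bIsHigher = median(B) > median(A);
end
```

`bIsHigher` can then be combined with `p` to keep only the results that point toward B > A.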

Michal on 9 Aug 2012

That's great!

Again, many thanks!!

Answer by the cyclist on 6 Aug 2012
Edited by the cyclist on 6 Aug 2012

Use the 'tail' option. A careful read of

```
>> help ranksum
```

will explain how.

(In the first draft of my answer, I pointed to "doc ranksum" rather than "help ranksum", but it seems that that documentation doesn't list the 'tail' option. Weird.)
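A sketch of the 'tail' approach applied to many pairs at once, assuming a release that supports the `'tail'` name-value pair (the comments in this thread suggest it was new in R2012b). With `'tail','left'`, the alternative hypothesis is that the median of the first input (A) is less than the median of the second, so the p-value is small only when B is shifted upward. The cell array `Bs` is just an illustrative container for the second vectors.

```matlab
% Sketch: one-sided rank-sum tests for many pairs (needs the 'tail' option).
A  = [120 10 201 20 30 12 30 10 2 2 3 5 1];
Bs = { [140 400 120 2000 30 40 2000 1000 1000], ...   % B higher than A
       [1 0 0 0 0 0 0 0 0 0 0 0 0] };                 % B lower than A

pLeft = zeros(1, numel(Bs));
for k = 1:numel(Bs)
    % 'left': H1 is that the median of A is less than the median of Bs{k}
    pLeft(k) = ranksum(A, Bs{k}, 'tail', 'left');
end

disp(pLeft)   % small for the first pair, close to 1 for the second
```

Using `'tail','right'` instead would reverse the hypothesis (median of A greater than median of B).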

Michal on 7 Aug 2012

But thanks for trying to help!

the cyclist on 7 Aug 2012

Ah. I am using the prerelease of R2012b. It seems that the 'tail' option is new.

Michal on 9 Aug 2012

I should probably get this release too. Thanks anyway!