Maths/Stats question - TTLG Forums

Convict on 21/6/2006 at 13:19

I have a set of data which, when grouped appropriately, approximates the normal distribution. However, when left ungrouped, it does not look much like a normal distribution.

I want to calculate the standard deviation. Is it possible, and if so, how (I'm not asking for the formula)? I.e., do I use the raw data, which doesn't seem to conform to the bell curve much, or is there a way of using the appropriately grouped data for this calculation?

Cheers maths geniuses! :)
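
As a rough illustration of the raw-vs-grouped question, here is a minimal Python sketch using made-up scores (the values, the bin width of 5 and the helper name grouped_sd are all hypothetical, not Convict's actual data). The standard deviation is defined for any data set, bell-shaped or not. Computing it from a frequency table, by treating every value as its bin midpoint, only approximates the raw-data figure (coarser bins tend to inflate it slightly), so when the raw values are available it is simplest to use them directly.

```python
import math
import statistics

# Hypothetical raw scores (illustrative only, not real loot-spawn data).
raw = [3, 7, 8, 12, 13, 14, 15, 15, 16, 18, 19, 21, 22, 27, 30]

# Population standard deviation of the raw, ungrouped values.
raw_sd = statistics.pstdev(raw)

def grouped_sd(data, width):
    """Approximate the SD from a frequency table with bins of `width`.

    Every value is replaced by its bin midpoint, which is all you can do
    when only the grouped (binned) counts are available.
    """
    counts = {}
    for x in data:
        lo = (x // width) * width            # lower edge of the bin holding x
        counts[lo] = counts.get(lo, 0) + 1
    n = sum(counts.values())
    mids = {lo: lo + width / 2 for lo in counts}
    mean = sum(mids[lo] * f for lo, f in counts.items()) / n
    var = sum(f * (mids[lo] - mean) ** 2 for lo, f in counts.items()) / n
    return math.sqrt(var)

print(f"raw SD:     {raw_sd:.2f}")
print(f"grouped SD: {grouped_sd(raw, 5):.2f}")
```

statistics.pstdev divides by n; statistics.stdev gives the n - 1 (sample) version if the scores are treated as a sample.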

Scots Taffer on 21/6/2006 at 14:13

A few questions:

1. Why do you need the data to be normally distributed? Are you looking to calculate confidence intervals?

2. What is the appropriate grouping you speak of? Is this categorical data (e.g. Male, Female and Shemales/Other)?

3. You don't need normally distributed data purely to calculate a standard deviation, so why is this relevant?

Convict on 21/6/2006 at 14:28

Scots, I'm teaching myself statistics, so plz bear with me. :)

I thought that in order to calculate the standard deviation and then be able to look at the distribution and make inferences, such as that 68% of the population (possible scores) will fall between certain scores (-1 to +1 standard deviation, I think this is right?), you need a normal, bell-curve distribution of the scores.

BTW this is for loot spawns on Thievery UT :ebil: Shemales :wot: that freaks me out!
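
The 68% figure comes from the normal curve itself, which a short sketch can confirm. The same sketch also counts how much of a data set actually lands within one SD of its mean, a quick way to see how well the normal rule of thumb fits. The scores and the helper name share_within are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

# Theoretical share of a normal distribution within ±k SD of the mean.
std_normal = NormalDist(mu=0.0, sigma=1.0)
for k in (1, 2, 3):
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"normal, within ±{k} SD: {share:.1%}")

def share_within(data, k=1.0):
    """Fraction of observations inside mean ± k * (population) SD."""
    m, sd = mean(data), pstdev(data)
    return sum(m - k * sd <= x <= m + k * sd for x in data) / len(data)

# Hypothetical raw scores; compare with the ~68% a normal curve predicts.
scores = [3, 7, 8, 12, 13, 14, 15, 15, 16, 18, 19, 21, 22, 27, 30]
print(f"raw scores, within ±1 SD: {share_within(scores):.1%}")
```

If the empirical share is far from 68% (or the ±2 SD share far from about 95%), the normal-curve percentages are a poor guide for that data, even though the SD itself is still perfectly well defined.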

Scots Taffer on 21/6/2006 at 15:09

What you are talking about is a confidence interval based on the Normal distribution (a confidence interval is merely a point estimate (the mean value) plus/minus the z-score for your confidence level multiplied by the standard deviation divided by the square root of the sample size). But the upshot is that it assumes normality (strictly, of the sample mean rather than of the raw scores), and I'm still hazy as to your "grouping".
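
The interval described above (mean ± z · SD / √n) can be sketched as follows. This assumes the normal approximation for the sample mean; for small samples a t-distribution quantile is the usual choice, which Python's standard library does not provide, so it is left out here. Note the √n: this interval expresses uncertainty about the mean, whereas the ±1 SD band in the question describes where individual scores fall. The function name normal_ci and the sample data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def normal_ci(data, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `data`.

    Returns (lower, upper) = mean ± z * s / sqrt(n), where s is the sample
    standard deviation and z is the standard normal quantile for the level.
    """
    n = len(data)
    m = mean(data)
    s = stdev(data)                                  # sample SD, n - 1 denominator
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    half_width = z * s / math.sqrt(n)
    return m - half_width, m + half_width

# Hypothetical sample of eight scores.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
low, high = normal_ci(sample)
print(f"95% CI for the mean: {low:.2f} to {high:.2f}")
```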

Sombras on 21/6/2006 at 15:38

Convict: yeah, sometimes you want a normal distribution to be able to measure effects rationally (if not entirely accurately).

If you have a continuous (or numerically coded) variable, you can z-score (standardize) it. This forces a mean of zero and a standard deviation of one. It doesn't turn the data into a normal distribution (the shape stays exactly the same), but it standardizes effect sizes when you do statistical tests, making them easier to understand and explain. Generally, you will want to z-score if the metric is not intuitive to the reader/audience (e.g. test scores, evaluation points, survey scores).
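
A minimal sketch of z-scoring (standardizing) as described above, using Python's statistics module; the function name z_scores and the data are made up. Each value becomes (x - mean) / SD, so the result has mean 0 and SD 1, but the ordering and relative spacing of the values, and therefore the shape of the distribution, are unchanged.

```python
from statistics import mean, pstdev

def z_scores(data):
    """Standardize: subtract the mean, then divide by the (population) SD."""
    m = mean(data)
    sd = pstdev(data)
    return [(x - m) / sd for x in data]

# Hypothetical, slightly right-skewed scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(scores)
print(z)                    # same ordering and relative gaps as the raw scores
print(mean(z), pstdev(z))   # mean 0, SD 1
```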

Another way (there are probably a BUNCH of other ways I don't know of) is a log(arithmic) transformation. This can pull a right-skewed distribution much closer to normal, though it only makes the data exactly normal if they were log-normal to begin with. This is super common with income, for example, because, as everyone knows, income is not normally distributed across samples or populations (it has a long right tail). That's why you'll see economists using "log dollars" in their statistical analyses. Log transformations are useful for measures whose metrics are considered universally understandable (e.g. the almighty $).
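
To see what a log transformation does to skew, this sketch compares the moment skewness (the average cubed z-score: 0 for symmetric data, positive for a long right tail) before and after taking logs. The "incomes" are hypothetical and deliberately evenly spaced on a log scale, so the logged values come out perfectly symmetric; real income data will typically only get closer to symmetric, not exactly normal.

```python
import math
from statistics import mean, pstdev

def skewness(data):
    """Population (moment) skewness: the average cubed z-score."""
    m = mean(data)
    sd = pstdev(data)
    return sum(((x - m) / sd) ** 3 for x in data) / len(data)

# Hypothetical right-skewed "incomes", evenly spaced on a log scale.
incomes = [1_000 * 10 ** (k / 4) for k in range(9)]
log_incomes = [math.log(x) for x in incomes]

print(f"skewness of raw incomes: {skewness(incomes):.2f}")
print(f"skewness of log incomes: {skewness(log_incomes):.2f}")
```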

Gingerbread Man on 21/6/2006 at 17:20

stats geeks in the hizzy
:D

Raven on 21/6/2006 at 22:13

Yeah, it would help if we knew what the data set was. You could take the standard deviation of either set, but the one which doesn't look like a bell curve will have a massive uncertainty level. What are you doing to the data to make it look like a Gaussian distribution? In fact, where is the collected data from? And, perhaps most importantly, what were you expecting from the data: for it to group around your average in a Gaussian, or does another spread perhaps just describe the data better?

Ulukai on 21/6/2006 at 23:01

Poisson distributions were always my favourite.

Totally needing a poll to find out TTLG's best-loved statistical distribution

Para?noid on 21/6/2006 at 23:07

As a music technology student I'd have to go with the Dirac Delta "function". But Poisson is LOL

Mortal Monkey on 21/6/2006 at 23:52

Good ole' Brownian for me.