3.27 statistics
Every function in this library is available on the statistics module object. For example, if you used import statistics as S, you would write S.median to access median below. If you used include, then you can refer to identifiers without writing S. as a prefix.
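As a brief illustration of the import form described above, the sketch below uses the alias S from the text; any other alias would work the same way. With include statistics instead, the same calls would be written without the S. prefix.

```pyret
import statistics as S

check "qualified access through the S alias":
  # Each function is reached as a field of the module object.
  S.mean([list: 1, 2, 3]) is 2
  S.median([list: 5, 1, 3]) is 3
end
```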
3.27.1 Basic Statistical Values
Calculates the arithmetic mean, also known as the average, of the numbers in l. This is simply the sum of all the values in the list, divided by its length.
check:
  mean([list: ]) raises "Empty List"
  mean([list: 1]) is 1
  mean([list: 2, 2, 4.5, 1.5, 1, 1]) is 2
end
Calculates the median of the numbers in l. This is the “middle-most” value in the list, if the values were sorted. If the list is of even length, returns the average of the two middle-most values.
check:
  median([list: ]) raises "Empty List"
  median([list: 2]) is 2
  median([list: -1, 0, 1, 2, 5]) is 1
end
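The even-length case described above can be illustrated with a small additional example. It assumes the library has been brought in with include; the second test relies on the description's statement that the median is taken as if the values were sorted.

```pyret
include statistics

check "median of an even-length list averages the two middle values":
  # Sorted, the middle values are 2 and 3, whose average is 2.5.
  median([list: 1, 2, 3, 4]) is 2.5
  # Input order should not matter.
  median([list: 4, 1, 3, 2]) is 2.5
end
```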
Calculates the modes of the numbers in l. These are the numbers that appear most often in the list. If no number appears more than once, returns the empty list. The modes will be returned in sorted order.
Computing the mode of a list of values is unambiguous when there is a unique “most common” element. Computer scientists and mathematicians agree that when two values are equally “most common”, they are both considered modes of the list. The natural generalization of this is that when all values occur equally often, they are all modes of the list. However, many high-school textbooks assert that when no element appears more than once, no element should be considered a mode. To avoid confusing high-school students, we adopt the definition they will find in their textbooks.
check:
  modes([list: ]) is [list: ]
  modes([list: 1, 2, 3, 4]) is [list: ]
  modes([list: 1, 2, 3, 1, 4]) is [list: 1]
  modes([list: 1, 2, 1, 2, 2, 1]) is [list: 1, 2]
  modes([list: 1, 2, 2, 1, 2, 1]) is [list: 1, 2]
end
Determines if a list of numbers has any modes, i.e., any repeated values.
check:
  has-mode([list: ]) is false
  has-mode([list: 1, 2, 3, 4]) is false
  has-mode([list: 1, 2, 2, 1, 2, 2]) is true
  has-mode([list: 1, 2, 3, 2]) is true
end
Returns the smallest mode of a list of numbers, if any is present.
check:
  mode-smallest([list: ]) raises "empty"
  mode-smallest([list: 1]) raises "no duplicate values"
  mode-smallest([list: 1, 2, 3, 4, 5]) raises "no duplicate values"
  mode-smallest([list: 1, 1, 2]) is 1
  mode-smallest([list: 1, 2, 1, 2]) is 1
end
Returns the largest mode of a list of numbers, if any is present.
check:
  mode-largest([list: ]) raises "empty"
  mode-largest([list: 1]) raises "no duplicate values"
  mode-largest([list: 1, 2, 3, 4, 5]) raises "no duplicate values"
  mode-largest([list: 1, 1, 2]) is 1
  mode-largest([list: 1, 2, 1, 2]) is 2
end
Returns an arbitrary mode of a list of numbers, if any is present.
check:
  mode-any([list: ]) raises "empty"
  mode-any([list: 1]) raises "no duplicate values"
  mode-any([list: 1, 2, 3, 4, 5]) raises "no duplicate values"
  mode-any([list: 1, 1, 2]) is 1
  mode-any([list: 1, 2, 1, 2]) satisfies lam(m): (m == 1) or (m == 2) end
end
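The mode functions above are closely related, and a short sketch may help show how. It assumes that mode-largest is the name of the function described as returning the largest mode, and it relies on modes returning its result in sorted order. The name data-set is purely illustrative.

```pyret
include statistics

# Illustrative data: 1 and 2 each appear twice, 3 appears once.
data-set = [list: 1, 2, 1, 2, 3]

check "relationships among the mode functions":
  modes(data-set) is [list: 1, 2]
  has-mode(data-set) is true
  # Because modes are sorted, the first is the smallest and the last is the largest.
  mode-smallest(data-set) is modes(data-set).first
  mode-largest(data-set) is modes(data-set).last()
end
```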
Gives the population or uncorrected sample standard deviation of the data set represented by numbers in l.
check:
  stdev([list: ]) raises "list is empty"
  stdev([list: 2]) is 0
  stdev([list: 2, 4, 4, 4, 5, 5, 7, 9]) is 2
end
Gives the corrected sample standard deviation of the data set represented by numbers in l.
check:
  stdev-sample([list: ]) raises "list is empty"
  stdev-sample([list: 2]) raises "division by zero"
  stdev-sample([list: 2, 4, 4, 4, 5, 5, 7, 9]) is-roughly 2.1380899
end
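The difference between the two standard deviations is only the divisor: n for the population form and n - 1 for the corrected sample form. The following is a minimal sketch of the textbook definitions, not the library's actual implementation. The helpers sum-sq-dev, my-stdev, and my-stdev-sample are hypothetical names introduced here, and their error behavior on empty or single-element lists may differ from the library's.

```pyret
include statistics

# Hypothetical helper: sum of squared deviations from the mean.
fun sum-sq-dev(l :: List<Number>) -> Number:
  m = mean(l)
  for fold(acc from 0, x from l):
    acc + num-sqr(x - m)
  end
end

# Population (uncorrected) form: divide by n.
fun my-stdev(l :: List<Number>) -> Number:
  num-sqrt(sum-sq-dev(l) / l.length())
end

# Corrected sample form: divide by n - 1.
fun my-stdev-sample(l :: List<Number>) -> Number:
  num-sqrt(sum-sq-dev(l) / (l.length() - 1))
end

check "sketches agree with the library on the documented example":
  l = [list: 2, 4, 4, 4, 5, 5, 7, 9]
  my-stdev(l) is-roughly stdev(l)
  my-stdev-sample(l) is-roughly stdev-sample(l)
end
```

For the eight-element list above, the sum of squared deviations is 32, so the population form gives the square root of 32/8 and the sample form gives the square root of 32/7.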
3.27.2 Statistical Models
Pyret currently supports two functions for working with simple linear-regression models. Further support will be added over time.
Calculates a linear regression that models a dependent variable as a function of a single independent variable, using ordinary least squares. The result is a predictor function that returns a predicted y-value for a given x-value.
check:
  predictor = linear-regression([list: 0, 1, 2, 3], [list: 3, 2, 1, 0])
  predictor(1) is-roughly 2
  predictor(1.5) is-roughly 1.5
  predictor(1000) is-roughly -997
end
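For readers who want to see what ordinary least squares computes for a single independent variable, here is a minimal sketch using the standard slope and intercept formulas. It is not the library's implementation: my-linear-regression is a hypothetical name, and the sketch assumes the two lists have equal length and that the x-values are not all identical.

```pyret
include statistics
import lists as L

# Hypothetical re-implementation of simple least-squares fitting.
fun my-linear-regression(xs :: List<Number>, ys :: List<Number>) -> (Number -> Number):
  x-bar = mean(xs)
  y-bar = mean(ys)
  # Sum of (x - x-bar) * (y - y-bar) over the paired points.
  sxy = for L.fold2(acc from 0, x from xs, y from ys):
    acc + ((x - x-bar) * (y - y-bar))
  end
  # Sum of squared deviations of the x-values.
  sxx = for L.fold(acc from 0, x from xs):
    acc + num-sqr(x - x-bar)
  end
  slope = sxy / sxx
  intercept = y-bar - (slope * x-bar)
  lam(x): (slope * x) + intercept end
end

check "sketch reproduces the documented predictions":
  predictor = my-linear-regression([list: 0, 1, 2, 3], [list: 3, 2, 1, 0])
  predictor(1) is-roughly 2
  predictor(1000) is-roughly -997
end
```

On the data above the slope works out to -1 and the intercept to 3, which is why the predictor behaves like 3 - x.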
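Calculates the coefficient of determination for a simple linear model, which measures how well a function f (for example, a predictor returned by linear-regression) fits the data given by the lists of x-values and y-values. A result of 1 indicates a perfect fit, and lower values indicate a poorer fit.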
PI = ~3.1415926535

fun f-good(x): 3 - x end
fun f-poor(x): 3 * num-cos((x * PI) / 6) end
fun f-bad(x): 3 end

xs = [list: 0, 1, 2, 3]
ys = [list: 3, 2, 1, 0]

check:
  r-squared(xs, ys, f-good) is-roughly 1
  r-squared(xs, ys, f-poor) is-roughly 0.87846096
  r-squared(xs, ys, f-bad) is-roughly -1.8
end
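The values in the example above are consistent with the usual definition of the coefficient of determination: one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below spells that definition out. Whether the library computes it in exactly this way is an assumption, and my-r-squared is a hypothetical name.

```pyret
include statistics
import lists as L

# Hypothetical sketch: R^2 = 1 - (SS-res / SS-tot).
fun my-r-squared(xs :: List<Number>, ys :: List<Number>, f :: (Number -> Number)) -> Number:
  y-bar = mean(ys)
  # Residual sum of squares: how far f's predictions fall from the observed ys.
  ss-res = for L.fold2(acc from 0, x from xs, y from ys):
    acc + num-sqr(y - f(x))
  end
  # Total sum of squares: how far the observed ys fall from their own mean.
  ss-tot = for L.fold(acc from 0, y from ys):
    acc + num-sqr(y - y-bar)
  end
  1 - (ss-res / ss-tot)
end

check "sketch matches the documented values":
  data-xs = [list: 0, 1, 2, 3]
  data-ys = [list: 3, 2, 1, 0]
  my-r-squared(data-xs, data-ys, lam(x): 3 - x end) is 1
  my-r-squared(data-xs, data-ys, lam(x): 3 end) is -1.8
end
```

The negative result for the constant function arises because its residual sum of squares (14) exceeds the total sum of squares (5), meaning it fits worse than simply predicting the mean of the y-values.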