Macro Library stats

A library of Stats functions. Version 1.30, Nov 20, 2021

nCr

nCr(n,r)
The Choose function

nPr

nPr(n,r)
The Permutations function
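
As a rough illustration of what these two functions compute, here is a minimal Python sketch. The names n_c_r and n_p_r are hypothetical stand-ins, not part of this library.

```python
import math

def n_c_r(n, r):
    """Number of ways to choose r items from n when order does not matter."""
    return math.comb(n, r)

def n_p_r(n, r):
    """Number of ordered arrangements of r items drawn from n."""
    return math.perm(n, r)
```

For example, choosing 2 items from 5 gives 10 combinations but 20 permutations, since each pair can be ordered two ways.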

mean

mean(array,[weights])
Finds the mean of an array of numbers. Optionally you can provide a corresponding array of weights or frequencies to do a weighted mean.

variance

variance(array,[weights])
the (sample) variance of an array of numbers. Optionally you can provide a corresponding array of weights or frequencies to do a weighted variance.

stdev

stdev(array, [weights])
the (sample) standard deviation of an array of numbers. Optionally you can provide a corresponding array of weights or frequencies to do a weighted stdev.
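
The weighted versions of mean, variance, and stdev can be pictured with the Python sketch below. It assumes the weights are frequencies (so the sample size is the sum of the weights); whether the library treats non-integer weights the same way is not stated here, and the function names are hypothetical.

```python
import math

def weighted_mean(values, weights=None):
    """Mean of values, optionally weighted by frequencies."""
    if weights is None:
        weights = [1] * len(values)
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def weighted_sample_variance(values, weights=None):
    """Sample variance, treating each weight as a frequency count (assumption)."""
    if weights is None:
        weights = [1] * len(values)
    m = weighted_mean(values, weights)
    n = sum(weights)  # effective sample size under the frequency assumption
    return sum(w * (x - m) ** 2 for x, w in zip(values, weights)) / (n - 1)

def weighted_sample_stdev(values, weights=None):
    """Square root of the weighted sample variance."""
    return math.sqrt(weighted_sample_variance(values, weights))
```

Under this assumption, values [1, 2, 3] with frequencies [1, 2, 1] give the same results as the raw data [1, 2, 2, 3].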

absmeandev

absmeandev(array)
the absolute mean deviation of an array of numbers

percentile

percentile(array,percentile)
example: percentile($a,30) would find the 30th percentile of the data
Calculates using the p/100*(N) method (e.g. Triola)
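
The p/100*(N) locator rule, as it is commonly taught, can be sketched in Python as follows. This is an illustration of the textbook rule, not the library's code; the name percentile_n and the handling of the 0th and 100th percentiles are assumptions.

```python
import math

def percentile_n(data, p):
    """Percentile using the locator L = p/100 * N on the sorted data."""
    values = sorted(data)
    n = len(values)
    loc = p * n / 100
    if loc <= 0:
        return values[0]       # assumption: clamp to the minimum
    if loc >= n:
        return values[-1]      # assumption: clamp to the maximum
    if loc == int(loc):
        k = int(loc)
        # Whole-number locator: average the L-th and (L+1)-th values.
        return (values[k - 1] + values[k]) / 2
    # Otherwise round the locator up and take that value.
    return values[math.ceil(loc) - 1]
```

With the data 1 through 10, the 30th percentile has locator 3, so the result is the average of the 3rd and 4th values (3.5). The 25th percentile has locator 2.5, which rounds up to the 3rd value (3).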

interppercentile


interppercentile(array, percentile, [mode])
Interpolated percentile. Finds the percentile using an interpolated method.
  mode=1 (def): Matches Excel's PERCENTILE.EXC, JMP, and recommended by NIST
    except that this function will return the lowest/highest value if needed.
  mode=2: Matches Excel's PERCENTILE.INC (and older percentile)
  mode=3: Matches MATLAB's prctile function
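
The three modes differ only in how the fractional rank is computed before linear interpolation. The Python sketch below uses the commonly documented rank formulas for PERCENTILE.EXC, PERCENTILE.INC, and prctile; it is an illustration under that assumption, and interp_percentile is a hypothetical name.

```python
def interp_percentile(data, p, mode=1):
    """Linearly interpolated percentile with three rank conventions."""
    values = sorted(data)
    n = len(values)
    frac = p / 100
    if mode == 1:
        rank = frac * (n + 1)        # PERCENTILE.EXC style
    elif mode == 2:
        rank = frac * (n - 1) + 1    # PERCENTILE.INC style
    else:
        rank = frac * n + 0.5        # prctile style
    # Clamp so out-of-range ranks return the lowest/highest value.
    rank = min(max(rank, 1), n)
    lower = int(rank)                # 1-based index of the value at or below rank
    if lower >= n:
        return values[-1]
    weight = rank - lower
    return values[lower - 1] + weight * (values[lower] - values[lower - 1])
```

For the data 1, 2, 3, 4, 5 and the 25th percentile, the three conventions give 1.5, 2, and 1.75 respectively.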

Nplus1percentile

Nplus1percentile(array,percentile)
example: Nplus1percentile($a,30) would find the 30th percentile of the data
Calculates using the p/100*(N+1) method (e.g. OpenStax).

quartile

quartile(array,quartile)
finds the 0 (min), 1st, 2nd (median), 3rd, or 4th (max) quartile of an
array of numbers. Calculates using percentiles.

TIquartile

TIquartile(array,quartile)
finds the 0 (min), 1st, 2nd (median), 3rd, or 4th (max) quartile of an
array of numbers. Calculates using the TI-84 method.
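
The TI-84 approach is usually described as: split the sorted data at the median, leave the median out of both halves when the count is odd, and take the median of each half. A minimal Python sketch of that description follows; the function names are hypothetical and this is not the library's code.

```python
def median_of(values):
    """Median of a list of numbers."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def ti_quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half]
    # For an odd count, skip the middle value so it is in neither half.
    upper = s[half + 1:] if n % 2 else s[half:]
    return median_of(lower), median_of(s), median_of(upper)
```

For the data 1 through 7, the halves are 1, 2, 3 and 5, 6, 7, so Q1 is 2 and Q3 is 6.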

Excelquartile

Excelquartile(array,quartile)
finds the 0 (min), 1st, 2nd (median), 3rd, or 4th (max) quartile of an
array of numbers. Calculates using the Excel method, matching the older QUARTILE function or the newer QUARTILE.INC function.

Excelquartileexc

Excelquartileexc(array,quartile)
finds the 0 (min), 1st, 2nd (median), 3rd, or 4th (max) quartile of an
array of numbers. Calculates using the Excel method, matching QUARTILE.EXC function.

Nplus1quartile

Nplus1quartile(array,quartile)
finds the 0 (min), 1st, 2nd (median), 3rd, or 4th (max) quartile of an
array of numbers. Calculates using the N+1 method, which is like
percentiles, but calculated using N+1 (OpenStax).

allquartile

allquartile(array,quartile)
finds the 0 (min), 1st, 2nd (median), 3rd, or 4th (max) quartile of an
array of numbers. Uses all the quartile methods, and returns an "or" joined
string of all unique answers.

median

median(array)
returns the median of an array of numbers

modes

modes(array)
Returns the mode or modes of the data array as a comma-separated list. If all the values have the same frequency it returns DNE.
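
The rule described above (report every value tied for the highest frequency, or DNE when all frequencies are equal) might look like this in Python. The name find_modes is hypothetical, and the sketch returns a list rather than a comma-separated string.

```python
from collections import Counter

def find_modes(data):
    """Return the sorted list of modes, or "DNE" if every value is equally frequent."""
    counts = Counter(data)
    top = max(counts.values())
    if all(c == top for c in counts.values()):
        return "DNE"
    return sorted(v for v, c in counts.items() if c == top)
```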

forceonemode

forceonemode(array)
Returns the mode of the data array. If the data does not have one unique mode, the data array will be altered to only have one mode. Because this function can alter the given array, be sure to call it before any functions that use the array.

freqdist

freqdist(array,label,start,classwidth)
display macro. Returns an HTML table that is a frequency distribution of
the data
 array: array of data values
 label: name of data values
 start: first lower class limit
 classwidth: width of the classes

frequency

frequency(array,start,classwidth, [end])
Returns an array of frequencies for the data grouped into classes
 array: array of data values
 start: first lower class limit
 classwidth: width of the classes
 end: end of last class. optional, but recommended to ensure the resulting array includes the last class.
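
To show why supplying end matters, here is a Python sketch of grouping data into classes. It assumes each class is the half-open interval from a lower limit up to (but not including) the next lower limit, and that without end the array stops at the class containing the largest value; both are assumptions, and class_frequencies is a hypothetical name.

```python
import math

def class_frequencies(data, start, classwidth, end=None):
    """Count how many data values fall in each class [lower, lower + classwidth)."""
    if end is None:
        # Assumption: extend just far enough to contain the largest value.
        nclasses = int((max(data) - start) // classwidth) + 1
    else:
        nclasses = math.ceil((end - start) / classwidth)
    counts = [0] * nclasses
    for x in data:
        k = int((x - start) // classwidth)
        if 0 <= k < nclasses:
            counts[k] += 1
    return counts
```

With data 1, 2, 2, 5, 7, 9, start 0 and class width 5, the result has two classes. Passing end=15 adds a third, empty class, so the array length no longer depends on the data.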

countif

countif(array,condition)
Returns count of items in array that meet condition
 array: array of data values
 condition: a condition, using x for data values
Example: countif($a,"x<3 && x>2")

histogram

histogram(array,label,start,classwidth,[labelstart,upper,width,height,showgrid,fill,stroke])
display macro. Creates a histogram from a data set
 array: array of data values
 label: name of data values
 start: first lower class limit
 classwidth: width of the classes
 labelstart (optional): value to start axis labeling at. Defaults to start
 upper (optional): first upper class limit. Defaults to start+classwidth
 width,height (optional): width and height in pixels of graph
 showgrid (optional): the horizontal grid lines; default is true to show; set false to hide
 fill (optional) = the fill color of the bins; default is blue
 stroke (optional) = the line color of the bins; default is black

fdhistogram

fdhistogram(freqarray,label,start,cw,[labelstart,upper,width,height,showgrid,fill,stroke])
display macro. Creates a histogram from frequency array
 freqarray: array of frequencies
 label: name of data values
 start: first lower class limit
 cw: width of the classes
 labelstart (optional): value to start axis labeling at. Defaults to start
 upper (optional): first upper class limit. Defaults to start+classwidth
 width,height (optional): width and height in pixels of graph
 showgrid (optional): the horizontal grid lines; default is true to show; set false to hide
 fill (optional) = the fill color of the bins; default is blue
 stroke (optional) = the line color of the bins; default is black

To fill in only some optional parameters, the 5th argument can instead be an associative array of values to fill in, like ['width'=>500, 'fill'=>'blue']. When used this way, 'vertlabel' can be used to change the vertical axis label.


fdbargraph

fdbargraph(barlabels,freqarray,label,[width,height,options])
barlabels: array of labels for the bars
freqarray: array of frequencies/heights for the bars
label: general label for bars
width,height (optional): width and height for graph
options (optional): array of options:
  options['valuelabels'] = array of value labels, to be placed above bars
  options['showgrid'] = false to hide the horizontal grid lines
  options['vertlabel'] = label for vertical axis. Defaults to none
  options['gap'] = gap (0 ≤ gap < 1) between bars
  options['toplabel'] = label for top of chart
  options['fill'] = fill color of the bars; default is blue
  options['stroke'] = line color of the bars; default is black

piechart

piechart(percents, labels, [width, height])
create a piechart
percents: array of pie percents (should total 100%)
labels: array of labels for each pie piece
uses Google Charts API

normrand

normrand(mu,sigma,n, [rnd, positive, skew])
returns an array of n random numbers that are normally distributed with given
mean mu and standard deviation sigma. Uses the Box-Muller transform.
specify rnd to round to that many digits
set positive to true to not include negative values
set skew to a nonzero value to skew the data
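
For readers unfamiliar with the Box-Muller transform, the Python sketch below shows the basic idea: two uniform random numbers are turned into two independent standard normal values, which are then scaled by sigma and shifted by mu. It omits the rnd, positive, and skew options, and box_muller_sample and its rng parameter are hypothetical.

```python
import math
import random

def box_muller_sample(mu, sigma, n, rng=None):
    """Return n normally distributed values via the Box-Muller transform."""
    rng = rng or random.Random()
    out = []
    while len(out) < n:
        u1 = 1.0 - rng.random()   # shift to (0, 1] so log(u1) is defined
        u2 = rng.random()
        radius = math.sqrt(-2.0 * math.log(u1))
        angle = 2.0 * math.pi * u2
        # Each pair of uniforms yields two independent normal values.
        out.append(mu + sigma * radius * math.cos(angle))
        out.append(mu + sigma * radius * math.sin(angle))
    return out[:n]
```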

expdistrand

expdistrand(mu, n, [rnd])
returns an array of n random numbers that are exponentially distributed
with given mean mu.
specify rnd to round to that many digits (default 3)

stats_randg

stats_randg(shape, n)

returns an array of n random numbers from a gamma distribution

stats_randF

stats_randF(df1, df2, n)

returns an array of n random numbers from an F distribution with df1 and df2 degrees of freedom

stats_randt

stats_randt(mu, sigma, df, n)

returns an array of n random numbers from a t-distribution with given mean mu, standard deviation sigma, and degrees of freedom df.

stats_randchi2

stats_randchi2(df, n)

returns an array of n random numbers from a chi squared distribution with df degrees of freedom

stats_randpoisson

stats_randpoisson(lambda, n)

returns an array of n random numbers from a Poisson distribution with parameter lambda.


boxplot

boxplot(array,axislabel,[options])
draws a boxplot based on the data in array, with given axislabel
and optionally a datalabel (to topleft of boxplot)
array can also be an array of data arrays to do comparative boxplots
options is an array of options:
   "datalabels" = array of data labels for comparative boxplots
   "showvals" = true to show 5 number summary above boxplot
   "showoutliers" = true to put whiskers at values inside 1.5IQR fence and show outliers
   "qmethod" = quartile method: "N", "TI", "Excel" or "Nplus1"
       N: percentile method, using .25*n
       Nplus1: percentile method, using .25*(n+1)
       TI: TI calculator method, a mix of n and nplus1 methods
       Excel: A method based on (n-1), with some linear interpolation
For backwards compatibility, options can also just be an array of datalabels

normalcdf

normalcdf(z,[dec])
calculates the area under the standard normal distribution to the left of the
z-value z, to dec decimals (defaults to 4)
based on someone else's code - can't remember whose!
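
As a cross-check on what normalcdf returns, the standard normal left-tail area can be written in terms of the error function. The Python sketch below uses that identity; it is not the algorithm used by the library, and std_normal_cdf is a hypothetical name.

```python
import math

def std_normal_cdf(z, dec=4):
    """Area under the standard normal curve to the left of z, rounded to dec decimals."""
    return round(0.5 * (1.0 + math.erf(z / math.sqrt(2.0))), dec)
```

For instance, z = 1.96 gives 0.975 and z = -1 gives 0.1587.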

tcdf

tcdf(t,df,[dec])
calculates the area under the t-distribution with "df" degrees of freedom
to the left of the t-value t
based on code from www.math.ucla.edu/~tom/distributions/tDist.html

invnormalcdf

invnormalcdf(p,[dec])
Inverse Normal CDF
finds the z-value with a left-tail area of p, to dec decimals (default 5)
 from Odeh & Evans. 1974. AS 70. Applied Statistics. 23: 96-97

invtcdf

invtcdf(p,df,[dec])
the inverse Student's t-distribution
computes the t-value with a left-tail probability of p, with df degrees of freedom
to dec decimal places (default 4)
 from Algorithm 396: Student's t-quantiles by G.W. Hill Comm. A.C.M., vol.13(10), 619-620, October 1970

linreg

linreg(xarray,yarray)
Computes the linear correlation coefficient, and slope and intercept of
regression line, based on array/list of x-values and array/list of y-values
Returns as array: r,slope,intercept
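
The three returned values follow from the usual least-squares formulas. A minimal Python sketch, with the hypothetical name linreg_sketch and the same r, slope, intercept ordering:

```python
import math

def linreg_sketch(xs, ys):
    """Return [r, slope, intercept] for the least-squares line through the points."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [r, slope, intercept]
```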

expreg

expreg(xarray,yarray)
Computes the exponential correlation coefficient, and base and intercept of
regression exponential, based on array/list of x-values and array/list of y-values
Returns as array: r,base,intercept
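
One common way to fit an exponential is to run a linear regression on the logarithms of the y-values. The Python sketch below takes that approach and assumes the model y = intercept * base^x; whether the library uses exactly this model and method is an assumption, and expreg_sketch is a hypothetical name.

```python
import math

def expreg_sketch(xs, ys):
    """Return [r, base, intercept] for y = intercept * base**x, fitted on ln(y)."""
    logs = [math.log(y) for y in ys]   # requires all y-values to be positive
    n = len(xs)
    mx = sum(xs) / n
    ml = sum(logs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sll = sum((v - ml) ** 2 for v in logs)
    sxl = sum((x - mx) * (v - ml) for x, v in zip(xs, logs))
    r = sxl / math.sqrt(sxx * sll)
    log_slope = sxl / sxx
    base = math.exp(log_slope)
    intercept = math.exp(ml - log_slope * mx)
    return [r, base, intercept]
```

For the points (0, 3), (1, 6), (2, 12), which double at each step, the fit recovers base 2 and intercept 3.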

checklineagainstdata

checklineagainstdata(xarray, yarray, student answer, [variable, alpha])
intended for checking a student answer for fitting a line to data. Determines
if the student answer is within the confidence bounds for the regression equation.
xarray, yarray: list/array of data values
student answer: the $stuanswers[$thisq] which is a line equation like "2x+3"
variable: defaults to "x"
alpha: for confidence bound. defaults to .05
returns array(answer, showanswer) to be used to set $answer and $showanswer

checkdrawnlineagainstdata

checkdrawnlineagainstdata(xarray, yarray, student answer, [grade dots, alpha, grid, snaptogrid])
intended for checking a student answer for drawing a line fit to data. Determines
if the student answer is within the confidence bounds for the regression equation.
xarray, yarray: list/array of data values
student answer from draw: the $stuanswers[$thisq]
grade dots: default false. If true, will grade that dots of xarray,yarray were plotted
alpha: for confidence bound. defaults to .05
grid: If you've modified the grid, include it here
snaptogrid: If you're using snaptogrid, include it here
returns array(answer, showanswer) to be used to set $answer and $showanswer

binomialpdf

binomialpdf(N,p,x)
Computes the probability of x successes out of N trials
where each trial has probability p of success

binomialcdf

binomialcdf(N,p,x)
Computes the probability of <=x successes out of N trials
where each trial has probability p of success
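
The relationship between the two binomial functions is that the cdf is the sum of the pdf from 0 up to x. A short Python sketch with hypothetical names:

```python
import math

def binomial_pdf(n, p, x):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(n, p, x):
    """Probability of at most x successes in n independent trials."""
    return sum(binomial_pdf(n, p, k) for k in range(x + 1))
```

For 10 trials with p = 0.5, the probability of exactly 5 successes is 252/1024, about 0.2461.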

chi2teststat

chi2teststat(m)
Computes the test stat sum((E-O)^2/E) given a matrix of values
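
The formula sum((E-O)^2/E) needs expected counts E. The Python sketch below assumes the matrix holds observed counts for a test of independence, so that each expected count is (row total * column total) / grand total; that reading of "a matrix of values" is an assumption, and chi2_stat is a hypothetical name.

```python
def chi2_stat(observed):
    """Chi-squared statistic for a matrix of observed counts (independence test)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total  # expected count
            stat += (e - o) ** 2 / e
    return stat
```

For the matrix [[10, 20], [20, 10]] every expected count is 15, so each cell contributes 25/15 and the statistic is 100/15, about 6.667.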

chi2cdf

chi2cdf(x,df)
Computes the area to the left of x under the chi-squared distribution
with df degrees of freedom

invchi2cdf

invchi2cdf(p,df)
Computes the x value with left-tail probability p under the
chi-squared distribution with df degrees of freedom

fcdf

fcdf(f,df1,df2)
Returns the area to right of the F-value f for the f-distribution
with df1 and df2 degrees of freedom (technically it's 1-CDF)
Algorithm is accurate to approximately 4-5 decimals

invfcdf

invfcdf(p,df1,df2)
Computes the f-value with probability of p to the right
with degrees of freedom df1 and df2
Algorithm is accurate to approximately 2-4 decimal places
Less accurate for smaller p-values

gamma_cdf

gamma_cdf(x,shape,[scale,offset])
Calculates the gamma cdf

gamma_inv

gamma_inv(p,shape,[scale])
Calculates the inverse gamma cdf

beta_cdf

beta_cdf(x,alpha,beta)
Calculates the beta cdf

beta_inv

beta_inv(p,alpha,beta)
Calculates the inverse beta cdf

mosaicplot

mosaicplot(rowlabels, columnlabels, count matrix, [width, height])
creates a mosaic plot (See http://www.wamap.org/course/showlinkedtextpublic.php?cid=1383&id=82972)
rowlabels: an array of labels for the rows of the display
columnlabels: an array of labels for the columns of the display
count matrix: a 2-dimensional array. $m[1][5] will give the count for
  rowlabel[1] and columnlabel[5]
width and height are optional, default to 300 by 300. Does not include labels

csvdownloadlink

csvdownloadlink([filename],string,array,[string,array]...)

Creates a link that downloads the specified data in CSV format. For each column provide a string header and an array of values. A filename (without the .csv) can optionally be provided as a first argument.


dotplot

dotplot(array,label,[dotspacing,labelspacing,width,height])
Display macro. Creates a dotplot from a data set
array: array of data values
label: title of the dotplot that will be placed below the horizontal axis
dotspacing (default 1): horiz spacing of dots; data will be rounded to nearest value
labelspacing (defaults to dot spacing): spacing of axis labels
width,height (default 300x150): width and height in pixels of graph

 

anova1way

anova1way(arr1, arr2, [arr3, ...])

Function anova1way() performs one-way analysis of variance (ANOVA) on two or more groups and returns the ANOVA table as an array with each row corresponding to Factor A, error (residual), and totals.

Parameters:

 Returns:

ANOVA table as an array in the following format: array([SS_A, df_A, MS_A, F_A, P_A], [SS_E, df_E, MS_E], [SS_T, df_T]), where SS is the sum of squares, df is the degrees of freedom, MS is the mean square, F is the F ratio, and P is the P value. A, E, and T correspond to Factor A, error (residual), and total, respectively. This array can be passed to anova_table() to tabulate the data for display.
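
To make the layout of the returned table concrete, here is a Python sketch of the standard one-way ANOVA arithmetic. It fills in every entry except the P value, which requires an F-distribution CDF and is left out here; anova1way_sketch is a hypothetical name, not this library's function.

```python
def anova1way_sketch(*groups):
    """Return [[SS_A, df_A, MS_A, F_A], [SS_E, df_E, MS_E], [SS_T, df_T]]."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group (Factor A) and within-group (error) sums of squares.
    ss_a = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_e = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_a = len(groups) - 1
    df_e = len(all_values) - len(groups)
    ms_a = ss_a / df_a
    ms_e = ss_e / df_e
    return [[ss_a, df_a, ms_a, ms_a / ms_e],
            [ss_e, df_e, ms_e],
            [ss_a + ss_e, df_a + df_e]]
```

For the groups [1, 2, 3], [2, 3, 4], and [3, 4, 5], the group means are 2, 3, and 4, which gives SS_A = 6 and SS_E = 6 with 2 and 6 degrees of freedom, so F = 3.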

 

anova1way_f

anova1way_f(arr1, arr2, [arr3,...])

Function anova1way_f() performs one-way analysis of variance (ANOVA) on two or more groups and returns F ratio and the corresponding P value as an array.

Parameters:

 Returns:

F ratio and the corresponding P value as an array in the form [F ratio, P value].

 

anova2way

anova2way(arr, [replication = false])

Function anova2way() performs two-way analysis of variance (ANOVA) and returns ANOVA table as an array with each row corresponding to Factor A, Factor B, their interaction (only with replication), error (residual), and totals.

Parameters:

 Returns:

ANOVA table as an array in the following format: [[SS_A, df_A, MS_A, F_A, P_A],[SS_B, df_B, MS_B, F_B, P_B],[SS_I, df_I, MS_I, F_I, P_I],[SS_E,df_E,MS_E],[SS_T,df_T]], where SS is the sum of squares, df is the degrees of freedom, MS is the mean square, F is the F ratio, and P is the P value. A, B, I, E, and T correspond to Factor A, Factor B, their interaction (only with replication), error (residual), and total, respectively. This array can be passed to anova_table() to tabulate the data for display.

 

anova2way_f

anova2way_f(arr, [replication = false])

Function anova2way_f() performs two-way analysis of variance (ANOVA) and returns F ratio and the corresponding P value for Factor A, Factor B and their interaction (if replication is true).

Parameters:

 Returns:

F ratio and the corresponding P value for Factor A, Factor B and their Interaction (if replication is true) as an array in the form array([F_A,P_A],[F_B,P_B],[F_I,P_I]).

 

anova_table

anova_table(arr, [factor = 1, replication = false, roundto = 12, nameA = "Factor A", nameB = "Factor B"])

Function anova_table() returns ANOVA table for both oneway and twoway ANOVA - display only. The output of anova1way() and anova2way() can be used as the input array for this function.

Parameters:

 Returns:

ANOVA table for displaying data.

 

student_t

student_t(arr1, arr2, [equalVar = False, paired = False, roundto = 12])

Function student_t() computes t statistic and corresponding P-value for two-sample student t-test.

Parameters:

 Returns:

t statistic, corresponding P-value (one-tail: the area to the right of the t-value), and degrees of freedom for the two-sample Student t-test, as an array in the form [t, P-value, df].
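
For the default settings (unequal variances, unpaired), a two-sample t-test is usually computed with Welch's formulas. The Python sketch below shows the t statistic and the Welch-Satterthwaite degrees of freedom under that assumption; it omits the P-value, and welch_t_sketch is a hypothetical name rather than this library's function.

```python
import math
import statistics

def welch_t_sketch(arr1, arr2):
    """Return [t, df] for an unpaired two-sample t-test with unequal variances."""
    n1, n2 = len(arr1), len(arr2)
    m1, m2 = statistics.mean(arr1), statistics.mean(arr2)
    # Per-sample variance of the mean: s^2 / n.
    v1 = statistics.variance(arr1) / n1
    v2 = statistics.variance(arr2) / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return [t, df]
```

With arr1 = [1, 2, 3, 4, 5] and arr2 = [2, 4, 6, 8, 10], the sample variances are 2.5 and 10, giving t of about -1.897 with roughly 5.88 degrees of freedom.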