Documentation

How to calculate statistics in a savvy manner 😉.

Using the Website

Uploading Data

The Savvy Stats library is fully functional on this site and can be used for basic calculations. To use it, go to the Upload Files tab and choose one or more files to load into the application. Only CSV and TSV files are currently accepted.

Loading File Data for Use

Once data files have been uploaded into the application (see above), go to the View Files tab and select the file you would like to analyze. Selecting a file places its contents, as a string, in the following object property:

fileData.rawData
Below is an example of the contents of fileData.rawData. All examples in this documentation use this data set. Currently, all JavaScript constructs must be accessed from the browser console (F12 in most browsers) after selecting a file.

CSV File Example (fileData.rawData)

fname, lname, age, sex
john, smith, 45, male
jane, doe, 23, female
bill, cartwell, 45, male
bob, lancaster, 25, male
barbara, silinder, 67, female
candice, treller, 45, female
amy, nolander, 14, female
phil, lassed, 34, male

Basic Library Usage

Using the Savvy Stats Object

Parsing and Analyzing Files

The Savvy Stats object (ss) can be used both to parse CSV or TSV data files and to analyze the parsed data. To parse a file, pass the file data (for example, fileData.rawData) and its delimiter to the object. If no delimiter is passed, it defaults to ",".

ss(file[, delimiter])

e.g.

ss(fileData.rawData, ",")

The ss object has many methods attached to its prototype for analyzing the file data. Below is an example that uses the mean method to calculate the mean age:

ss(fileData.rawData, ",").mean("age")

OR

var file = ss(fileData.rawData, ",");
file.mean("age");

Analyzing Data without a File

The ss object can also analyze data that is not contained in a file. Call the desired method directly on the ss object and pass the data to it as arguments. For example, using the binomdist method:

ss.binomdist(3, 100, 0.3)

Data Format Expectation of the Savvy Stats Object (ss)

The Savvy Stats object (ss) expects data as a CSV- or TSV-formatted string, as shown below: a header row of column names followed by one row per record. Data passed to the ss object is parsed so that it can be used by every statistical method called on the object.

CSV File Example

fname, lname, age, sex
john, smith, 45, male
jane, doe, 23, female

TSV File Example

fname	lname	age	sex
john	smith	45	male
jane	doe	23	female
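The parsing step can be pictured with a minimal standalone sketch. It assumes the simple format shown above (no quoted fields or embedded delimiters) and trims whitespace around every value; the function name parseDelimited is hypothetical and not part of Savvy Stats, whose actual parser may behave differently (for example, it may convert numbers).

```javascript
// Hypothetical helper, not the Savvy Stats API: turns a CSV or TSV string
// into an array of row objects keyed by the header names.
// Assumes no quoted fields and trims whitespace around every value.
function parseDelimited(text, delimiter = ",") {
  const lines = text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== ""); // skip blank lines
  const headers = lines[0].split(delimiter).map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const values = line.split(delimiter).map((v) => v.trim());
    const row = {};
    headers.forEach((header, i) => {
      row[header] = values[i]; // values stay strings, e.g. "45"
    });
    return row;
  });
}

const csv = "fname, lname, age, sex\njohn, smith, 45, male\njane, doe, 23, female";
const rows = parseDelimited(csv);
console.log(rows[1].lname, rows[1].age); // doe 23
```

For a TSV string, the same sketch would be called with "\t" as the delimiter.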

Usage

Almost all methods in the library that act on a data file accept an optional callback that filters the data, so that the calculation is performed on a subset of the rows only. The callback receives an object representing one row of the data, with a property for each column (for example, data.fname or data.age), and must return true to include that row or false to exclude it. Note that the equality operator (==) should be used in callbacks instead of the strict equality (identity) operator (===) unless the data type of a column is known with certainty, because values read from a file may be stored as strings (for example, "45" rather than 45).

Examples

Filters the data based on the column "fname" having the value "john".

function(data) {
  return data.fname == "john";
}

Filters the data based on the column "fname" having the value "jane" and "age" being greater than 20.

function(data) {
  return data.fname == "jane" && data.age > 20;
}

Live example using the mean method to calculate the mean age of people with the last name "smith":

ss(fileData.rawData).mean("age", function(data) {
  return data.lname == "smith";
});
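To illustrate how a filter callback selects a subset of rows, and why == is recommended over ===, here is a standalone sketch that does not use the library. It assumes rows are plain objects whose values are strings, as they would be if read directly from the file; meanOf is a hypothetical helper, not the library's mean method.

```javascript
// Hypothetical rows, shaped as a parser might produce them (values are strings).
const rows = [
  { fname: "john", lname: "smith", age: "45", sex: "male" },
  { fname: "jane", lname: "doe", age: "23", sex: "female" },
  { fname: "bill", lname: "cartwell", age: "45", sex: "male" },
];

// Hypothetical helper: mean of one column over the rows that pass the filter.
function meanOf(rows, column, filter = () => true) {
  const values = rows.filter(filter).map((row) => Number(row[column]));
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// With ==, the string "45" matches the number 45 ...
const loose = rows.filter((data) => data.age == 45).length;
// ... but with ===, no row matches because "45" is not the number 45.
const strict = rows.filter((data) => data.age === 45).length;

console.log(loose, strict); // 2 0
console.log(meanOf(rows, "age", (data) => data.sex == "male")); // 45
```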

Descriptive Statistics

Arguments

ss(file).min(data_column[, filter_callback(data)])

Usage

Calculates the minimum of a data column in a CSV or TSV data file passed into the ss object. An optional callback function can filter the rows of the data set so that the minimum is calculated for the resulting subset only. For example, a callback could be used to calculate the minimum age of people with the last name "smith".

Examples

Calculates the minimum age of the data set.

ss(fileData.rawData).min("age")

Calculates the minimum age of females in the data set.

ss(fileData.rawData).min("age", function(data) {
  return data.sex == "female";
})

Arguments

ss(file).max(data_column[, filter_callback(data)])

Usage

Calculates the maximum of a data column in a CSV or TSV data file passed into the ss object. An optional callback function can filter the rows of the data set so that the maximum is calculated for the resulting subset only. For example, a callback could be used to calculate the maximum age of people with the last name "smith".

Examples

Calculates the maximum age of the data set.

ss(fileData.rawData).max("age")

Calculates the maximum age of females in the data set.

ss(fileData.rawData).max("age", function(data) {
  return data.sex == "female";
})

Arguments

ss(file).range(data_column[, filter_callback(data)])

Usage

Calculates the range (maximum minus minimum) of a data column in a CSV or TSV data file passed into the ss object. An optional callback function can filter the rows of the data set so that the range is calculated for the resulting subset only. For example, a callback could be used to calculate the range of ages of people with the last name "smith".

Examples

Calculates the range of ages of the data set.

ss(fileData.rawData).range("age")

Calculates the range of ages for females in the data set.

ss(fileData.rawData).range("age", function(data) {
  return data.sex == "female";
})
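The relationship between the min, max, and range methods can be sketched without the library: the range is the maximum minus the minimum of the (optionally filtered) values. The rows below hold the age and sex columns of the example data set; the helper name minMaxRange is hypothetical.

```javascript
// Age and sex columns of the example data set.
const people = [
  { age: 45, sex: "male" }, { age: 23, sex: "female" },
  { age: 45, sex: "male" }, { age: 25, sex: "male" },
  { age: 67, sex: "female" }, { age: 45, sex: "female" },
  { age: 14, sex: "female" }, { age: 34, sex: "male" },
];

// Hypothetical helper: minimum, maximum, and range of an array of numbers.
function minMaxRange(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return { min, max, range: max - min };
}

const allAges = people.map((p) => p.age);
const maleAges = people.filter((p) => p.sex == "male").map((p) => p.age);

console.log(minMaxRange(allAges));  // min 14, max 67, range 53
console.log(minMaxRange(maleAges)); // min 25, max 45, range 20
```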

Arguments

ss(file).mean(data_column[, filter_callback(data)])

Usage

Calculates the mean of a data column in a CSV or TSV data file passed into the ss object. An optional callback function can filter the rows of the data set so that the mean is calculated for the resulting subset only. For example, a callback could be used to calculate the mean age of people with the last name "smith".

Examples

Calculates the mean age of the data set.

ss(fileData.rawData).mean("age")

Calculates the mean age of females in the data set.

ss(fileData.rawData).mean("age", function(data) {
  return data.sex == "female";
})

Arguments

ss(file).geomean(data_column[, filter_callback(data)])

Usage

Calculates the geometric mean of a data column in a CSV or TSV data file passed into the ss object. An optional callback function can filter the rows of the data set so that the geometric mean is calculated for the resulting subset only. For example, a callback could be used to calculate the geometric mean age of people with the last name "smith".

Examples

Calculates the geometric mean age of the data set.

ss(fileData.rawData).geomean("age")

Calculates the geometric mean age of females in the data set.

ss(fileData.rawData).geomean("age", function(data) {
  return data.sex == "female";
})
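The geometric mean of n positive values is the n-th root of their product. The standalone sketch below (the function name is hypothetical) computes it as the exponential of the mean of the natural logarithms, which avoids overflow when multiplying many values; it assumes every value is positive.

```javascript
// Hypothetical helper: geometric mean via exp(mean(ln(x))).
// Assumes every value is strictly positive.
function geometricMean(values) {
  const logSum = values.reduce((sum, v) => sum + Math.log(v), 0);
  return Math.exp(logSum / values.length);
}

// Ages from the example data set.
const ages = [45, 23, 45, 25, 67, 45, 14, 34];
console.log(geometricMean(ages).toFixed(2));
```

The geometric mean is never larger than the arithmetic mean of the same positive values (37.25 for these ages).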

Arguments

ss(file).variance(data_column[, filter_callback(data)])

Usage

Calculates the variance of a data column in a CSV or TSV data file passed into the ss object. An optional callback function can filter the rows of the data set so that the variance is calculated for the resulting subset only. For example, a callback could be used to calculate the variance of the ages of people with the last name "smith".

Examples

Calculates the variance in the age of the data set.

ss(fileData.rawData).variance("age")

Calculates the variance in the age of females in the data set.

ss(fileData.rawData).variance("age", function(data) {
  return data.sex == "female";
})
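This documentation does not state whether variance (and therefore stdev) uses the sample formula, which divides by n - 1, or the population formula, which divides by n. The standalone sketch below computes both, so a result from the library can be compared against each; the function names are hypothetical.

```javascript
// Hypothetical helpers: mean, variance, and standard deviation of an array of numbers.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function variance(values, sample = true) {
  const m = mean(values);
  const squaredDeviations = values.reduce((sum, v) => sum + (v - m) ** 2, 0);
  // Sample variance divides by n - 1; population variance divides by n.
  return squaredDeviations / (sample ? values.length - 1 : values.length);
}

// The standard deviation is the square root of the variance.
function stdev(values, sample = true) {
  return Math.sqrt(variance(values, sample));
}

const values = [2, 4, 4, 4, 5, 5, 7, 9];
console.log(variance(values, false), stdev(values, false)); // 4 2
console.log(variance(values, true).toFixed(4)); // 4.5714
```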

Arguments

ss(file).stdev(data_column[, filter_callback(data)])

Usage

Calculates the standard deviation of a data column in a CSV or TSV data file passed into the ss object. An optional callback function can filter the rows of the data set so that the standard deviation is calculated for the resulting subset only. For example, a callback could be used to calculate the standard deviation of the ages of people with the last name "smith".

Examples

Calculates the standard deviation of the age in the data set.

ss(fileData.rawData).stdev("age")

Calculates the standard deviation for the age of females in the data set.

ss(fileData.rawData).stdev("age", function(data) {
  return data.sex == "female";
})

Arguments

ss(file).median(data_column[, filter_callback(data)])

Usage

Calculates the median of a data column in a CSV or TSV data file passed into the ss object. An optional callback function can filter the rows of the data set so that the median is calculated for the resulting subset only. For example, a callback could be used to calculate the median age of people with the last name "smith".

Examples

Calculates the median age of the data set.

ss(fileData.rawData).median("age")

Calculates the median age of males in the data set.

ss(fileData.rawData).median("age", function(data) {
  return data.sex == "male";
})
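The median is the middle value of the sorted data, or the mean of the two middle values when the count is even. A standalone sketch, with a hypothetical function name, assuming the values have already been converted to numbers:

```javascript
// Hypothetical helper: median of an array of numbers.
function median(values) {
  // Sort a copy numerically; the default sort compares values as strings.
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]                          // odd count: the middle value
    : (sorted[mid - 1] + sorted[mid]) / 2; // even count: mean of the two middle values
}

// Ages of the males in the example data set.
console.log(median([45, 45, 25, 34])); // 39.5
```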

Arguments

ss(file).quartile(type, data_column[, filter_callback(data)])

Usage

Calculates an exclusive, non-weighted quartile of a data column in a CSV or TSV data file passed into the ss object. The type argument can be 1, 2, 3, or 4, selecting the 1st, 2nd, 3rd, or 4th quartile (calculated as the 25th, 50th, 75th, or 100th percentile, respectively). An optional callback function can filter the rows of the data set so that the quartile is calculated for the resulting subset only. For example, a callback could be used to calculate a quartile for the age of people with the last name "smith".

Examples

Calculates the first quartile for the age of the data set.

ss(fileData.rawData).quartile(1, "age")

Calculates the third quartile for the age of males in the data set.

ss(fileData.rawData).quartile(3, "age", function(data) {
  return data.sex == "male";
})

Arguments

ss(file).percentile(k, data_column[, filter_callback(data)])

Usage

Calculates an exclusive, non-weighted percentile of a data column in a CSV or TSV data file passed into the ss object. k can be any whole number from 0 to 100, inclusive. The 0th percentile is the minimum of the data set and the 100th percentile is its maximum. An optional callback function can filter the rows of the data set so that the percentile is calculated for the resulting subset only. For example, a callback could be used to calculate a percentile for the age of people with the last name "smith".

Examples

Calculates the tenth percentile for the age of the data set.

ss(fileData.rawData).percentile(10, "age")

Calculates the ninetieth percentile for the age of males in the data set.

ss(fileData.rawData).percentile(90, "age", function(data) {
  return data.sex == "male";
})
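The documentation does not spell out the exact "exclusive, non-weighted" rule, so the standalone sketch below substitutes a different, common definition, the nearest-rank method, purely to illustrate how a percentile is read from sorted data. It matches the stated boundary behavior (the 0th percentile is the minimum and the 100th is the maximum), but its results may differ from the library's for other values of k. The function name is hypothetical.

```javascript
// Hypothetical helper using the nearest-rank method (an assumption; the
// library's "exclusive, non-weighted" method may give different results).
function percentile(k, values) {
  if (!Number.isInteger(k) || k < 0 || k > 100) {
    throw new RangeError("k must be a whole number from 0 to 100");
  }
  const sorted = [...values].sort((a, b) => a - b);
  // Rank of the smallest value with at least k% of the data at or below it.
  const rank = Math.ceil((k / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1]; // k = 0 maps to the minimum
}

// Ages from the example data set.
const ages = [45, 23, 45, 25, 67, 45, 14, 34];
console.log(percentile(0, ages), percentile(50, ages), percentile(100, ages)); // 14 34 67
```

For these ages the nearest-rank method gives 34 as the 50th percentile, while the median (the mean of the two middle values) is 39.5, which shows how much the choice of definition can matter.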

Arguments

ss(file).mode(data_column[, filter_callback(data)])

Usage

Calculates the mode of a data column in a CSV or TSV data file passed into the ss object. An optional callback function can filter the rows of the data set so that the mode is calculated for the resulting subset only. For example, a callback could be used to calculate the mode age of people with the last name "smith". If the data column has no mode, this method returns "DNE". Otherwise, it returns an array of all values that occur with the highest frequency.

Examples

Calculates the mode age of the data set.

ss(fileData.rawData).mode("age")

Calculates the mode age of males in the data set.

ss(fileData.rawData).mode("age", function(data) {
  return data.sex == "male";
})
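The behavior described above can be sketched as a standalone function with a hypothetical name. The documentation does not define exactly when a mode "does not exist"; this sketch assumes that is the case when no value occurs more than once.

```javascript
// Hypothetical helper: returns every value tied for the highest frequency,
// or "DNE" when no value repeats (an assumed definition of "no mode").
function modes(values) {
  const counts = new Map();
  for (const v of values) {
    counts.set(v, (counts.get(v) || 0) + 1);
  }
  const highest = Math.max(...counts.values());
  if (highest === 1) {
    return "DNE";
  }
  return [...counts.entries()]
    .filter(([, count]) => count === highest)
    .map(([value]) => value);
}

console.log(modes([45, 45, 25, 34])); // returns [45] (ages of the males)
console.log(modes([23, 67, 45, 14])); // DNE (ages of the females)
```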

Data Transformations

Arguments

ss(file).logtrans(data_column)

Usage

Log-transforms a data column in a CSV or TSV data file passed into the ss object by taking the natural logarithm (log base e) of each value in the column. Returns an ss object containing the transformed column along with the remaining data, so that further methods can be chained onto the result, as shown in the last example.

Examples

Log-transforms each age in the "age" column of the data set.

ss(fileData.rawData).logtrans("age")

Uses the returned log-transformed data in a standard deviation calculation.

ss(fileData.rawData).logtrans("age").stdev("age");
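A log transformation replaces each value x with ln(x), which is defined only for x > 0. The standalone sketch below (the function name is hypothetical) applies Math.log, JavaScript's natural logarithm, to one column of an array of row objects and returns new rows with the other columns copied unchanged.

```javascript
// Hypothetical helper: natural-log transform of one column in an array of rows.
// Returns new row objects; other columns are copied unchanged.
function logTransform(rows, column) {
  return rows.map((row) => {
    const x = Number(row[column]);
    if (!(x > 0)) {
      throw new RangeError(`ln is undefined for ${row[column]}`);
    }
    return { ...row, [column]: Math.log(x) };
  });
}

const rows = [
  { fname: "john", age: "45" },
  { fname: "amy", age: "14" },
];
const transformed = logTransform(rows, "age");
console.log(transformed[0].fname, transformed[0].age.toFixed(4)); // john 3.8067
```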

Probabilities and Distributions

Arguments

ss.permutations(n, k)

Usage

Calculates the number of permutations of k items chosen from a set of n items, where order matters.

Examples

Calculates the number of permutations of 5 items from a set of 10.

ss.permutations(10, 5)
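The number of permutations is P(n, k) = n! / (n - k)!. A standalone sketch with a hypothetical function name, assuming whole-number inputs; it returns 0 when k is outside 0 to n:

```javascript
// Hypothetical helper: P(n, k) = n! / (n - k)!, computed as the product
// n * (n - 1) * ... * (n - k + 1) to avoid computing large factorials.
function permutations(n, k) {
  if (k < 0 || k > n) return 0;
  let result = 1;
  for (let i = 0; i < k; i++) {
    result *= n - i;
  }
  return result;
}

console.log(permutations(10, 5)); // 30240
```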

Arguments

ss.combinations(n, k)

Usage

Calculates the number of combinations of k items chosen from a set of n items, where order does not matter. In other words, calculates n choose k.

Examples

Calculates the number of combinations of 6 items from a set of 15 (15 choose 6).

ss.combinations(15, 6)
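The number of combinations is C(n, k) = n! / (k! (n - k)!). The standalone sketch below (hypothetical function name) uses the multiplicative form, in which every intermediate result is itself a binomial coefficient and therefore a whole number:

```javascript
// Hypothetical helper: C(n, k) = n! / (k! (n - k)!).
// After step i, result equals C(n - k + i, i), so it stays a whole number.
function combinations(n, k) {
  if (k < 0 || k > n) return 0;
  k = Math.min(k, n - k); // symmetry: C(n, k) = C(n, n - k)
  let result = 1;
  for (let i = 1; i <= k; i++) {
    result = (result * (n - k + i)) / i;
  }
  return result;
}

console.log(combinations(15, 6)); // 5005
```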

Arguments

ss.binomdist(num_successes, num_trials, prob_success [, cumulative = false])

Usage

Calculates the probability of achieving the given number of successes (num_successes) in the given number of trials (num_trials), based on the probability of success on each trial (prob_success), assuming a binomial distribution. The optional argument cumulative can be set to true or false and defaults to false. If cumulative is true, the function returns the cumulative probability of at most num_successes successes.

Examples

Calculates the probability of 5 successes in 10 trials with a probability of success of 0.5.

ss.binomdist(5, 10, 0.5)

Calculates the cumulative probability of at most 3 successes in 12 trials with a probability of success of 0.3.

ss.binomdist(3, 12, 0.3, true);
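The binomial probability of exactly k successes in n trials is C(n, k) p^k (1 - p)^(n - k), and the cumulative version sums this over 0 through k successes. A standalone sketch with hypothetical function names, not the library's implementation:

```javascript
// Hypothetical helper: binomial coefficient C(n, k) for 0 <= k <= n.
function choose(n, k) {
  let result = 1;
  for (let i = 1; i <= k; i++) {
    result = (result * (n - k + i)) / i;
  }
  return result;
}

// Hypothetical helper: P(X = k) for X ~ Binomial(n, p); with cumulative = true, P(X <= k).
function binomialProbability(k, n, p, cumulative = false) {
  const pmf = (j) => choose(n, j) * p ** j * (1 - p) ** (n - j);
  if (!cumulative) return pmf(k);
  let total = 0;
  for (let j = 0; j <= k; j++) total += pmf(j);
  return total;
}

console.log(binomialProbability(5, 10, 0.5)); // 0.24609375
```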

Arguments

ss.poissondist(num_successes, avg_successes [, cumulative = false])

Usage

Calculates the probability of achieving the given number of successes (num_successes) in a time frame, based on the average number of successes in that time frame (avg_successes), assuming a Poisson distribution. The optional argument cumulative can be set to true or false and defaults to false. If cumulative is true, the function returns the cumulative probability of at most num_successes successes.

Examples

Calculates the probability of 5 successes with an average number of successes of 2 in the assumed time frame.

ss.poissondist(5, 2)

Calculates the cumulative probability of at most 12 successes with an average number of successes of 4 in the assumed time frame.

ss.poissondist(12, 4, true);
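The Poisson probability of exactly k successes with average λ is e^(-λ) λ^k / k!. The standalone sketch below (hypothetical function name) builds each term from the previous one, which avoids large factorials, and sums the terms for the cumulative case:

```javascript
// Hypothetical helper: P(X = k) for X ~ Poisson(lambda); with cumulative = true, P(X <= k).
// Terms are built iteratively: P(X = j) = P(X = j - 1) * lambda / j.
function poissonProbability(k, lambda, cumulative = false) {
  let term = Math.exp(-lambda); // P(X = 0)
  let total = term;
  for (let j = 1; j <= k; j++) {
    term *= lambda / j;
    total += term;
  }
  return cumulative ? total : term;
}

console.log(poissonProbability(5, 2).toFixed(5)); // 0.03609
```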

Arguments

ss.normdist(x, mean, stdev [, cumulative = true])

Usage

Calculates the cumulative probability of a value x, P(X < x), in a normal distribution defined by the given mean and standard deviation. The optional parameter cumulative defaults to true; set it to false to calculate the height of the normal distribution curve at x (the value of the probability density function).

Examples

Calculates the cumulative probability of 5 in a normal distribution defined by a mean of 10 and a standard deviation of 2.

ss.normdist(5, 10, 2)

Calculates the height at 4 of the normal distribution curve defined by a mean of 23 and a standard deviation of 7.

ss.normdist(4, 23, 7, false);
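JavaScript has no built-in error function, so a standalone sketch of both modes of normdist needs an approximation. The one below uses the Abramowitz and Stegun approximation 7.1.26 for erf, accurate to about 1.5e-7; the function names are hypothetical and this is not necessarily how the library computes the result.

```javascript
// erf approximation from Abramowitz and Stegun, formula 7.1.26 (|error| < 1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
    t * (-1.453152027 + t * 1.061405429))));
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Hypothetical helper: normal CDF P(X < x) by default, or the density with cumulative = false.
function normalDistribution(x, mean, stdev, cumulative = true) {
  const z = (x - mean) / stdev;
  if (cumulative) {
    return 0.5 * (1 + erf(z / Math.SQRT2));
  }
  return Math.exp(-0.5 * z * z) / (stdev * Math.sqrt(2 * Math.PI));
}

console.log(normalDistribution(5, 10, 2).toFixed(4)); // 0.0062
```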

Arguments

ss.norminv(prob, mean, stdev)

Usage

Calculates the value x for which a normal distribution, defined by the given mean and standard deviation, has the desired cumulative probability prob = P(X < x). In other words, this is the inverse of ss.normdist with cumulative set to true.

Examples

Calculates the value x that will result in a cumulative probability of 0.3 in a normal distribution with a mean of 10 and standard deviation of 2.

ss.norminv(0.3, 10, 2)
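Because the normal CDF increases steadily with x, its inverse can be found by bisection. The standalone sketch below uses that simple technique rather than the faster rational approximations commonly used for this; it reuses the Abramowitz and Stegun erf approximation, the function names are hypothetical, and it assumes 0 < prob < 1.

```javascript
// erf approximation from Abramowitz and Stegun, formula 7.1.26 (|error| < 1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
    t * (-1.453152027 + t * 1.061405429))));
  return sign * (1 - poly * Math.exp(-ax * ax));
}

function normalCdf(x, mean, stdev) {
  return 0.5 * (1 + erf((x - mean) / (stdev * Math.SQRT2)));
}

// Hypothetical helper: inverts the normal CDF by bisection.
// Assumes 0 < prob < 1; searches within mean ± 10 standard deviations.
function normalInverse(prob, mean, stdev) {
  let lo = mean - 10 * stdev;
  let hi = mean + 10 * stdev;
  for (let i = 0; i < 100; i++) {
    const mid = (lo + hi) / 2;
    if (normalCdf(mid, mean, stdev) < prob) {
      lo = mid; // the answer lies above mid
    } else {
      hi = mid; // the answer lies at or below mid
    }
  }
  return (lo + hi) / 2;
}

console.log(normalInverse(0.3, 10, 2).toFixed(2)); // 8.95
```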

Arguments

ss.normbetween(prob, mean, stdev)

Usage

Calculates the value x for which P(-x < X < x) = prob, where X is normally distributed with mean μ (mean) and standard deviation σ (stdev).

Examples

Calculates the value x such that P(-x < X < x) = 0.3 for a normal distribution with a mean of 10 and a standard deviation of 2.

ss.normbetween(0.3, 10, 2)
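Taking the definition above literally, P(-x < X < x) = Φ((x - μ)/σ) - Φ((-x - μ)/σ), which increases from 0 toward 1 as x grows from 0, so x can be found by bisection. The standalone sketch below does exactly that with the Abramowitz and Stegun erf approximation; the function names are hypothetical, it assumes 0 < prob < 1, and it illustrates the stated definition rather than the library's code.

```javascript
// erf approximation from Abramowitz and Stegun, formula 7.1.26 (|error| < 1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
    t * (-1.453152027 + t * 1.061405429))));
  return sign * (1 - poly * Math.exp(-ax * ax));
}

function normalCdf(x, mean, stdev) {
  return 0.5 * (1 + erf((x - mean) / (stdev * Math.SQRT2)));
}

// Hypothetical helper: finds x >= 0 with P(-x < X < x) = prob by bisection.
function normalBetween(prob, mean, stdev) {
  const between = (x) => normalCdf(x, mean, stdev) - normalCdf(-x, mean, stdev);
  let lo = 0;
  let hi = Math.abs(mean) + 10 * stdev; // P(-hi < X < hi) is essentially 1 here
  for (let i = 0; i < 100; i++) {
    const mid = (lo + hi) / 2;
    if (between(mid) < prob) lo = mid;
    else hi = mid;
  }
  return (lo + hi) / 2;
}

console.log(normalBetween(0.95, 0, 1).toFixed(2)); // 1.96
```

For a standard normal distribution (mean 0, standard deviation 1) and prob = 0.95, the sketch gives x ≈ 1.96, the familiar two-sided 95% critical value, which is a quick sanity check of the definition.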