How to calculate statistics in a savvy manner 😉.
The Savvy Stats library is fully functional at this site and can be used for basic calculations. In order to do so, simply go to the Upload Files tab and choose a file or files to load into the application. Only CSV and TSV file formats are currently accepted.
Once data files have been uploaded (see above) into the application, head over to View Files and select the file you would like to analyze. Selecting a file places its contents, as a string, into the following object property for use:
fileData.rawData
Below is an example of the contents of fileData.rawData. All examples in this documentation use this data set as a reference. Currently, all JavaScript constructs must be accessed from the browser console (F12 in most browsers) after selecting a file.
fname, lname, age, sex
john, smith, 45, male
jane, doe, 23, female
bill, cartwell, 45, male
bob, lancaster, 25, male
barbara, silinder, 67, female
candice, treller, 45, female
amy, nolander, 14, female
phil, lassed, 34, male
The savvy stats object can be used to both parse CSV or TSV data files and analyze that file data. In order to parse a file, simply pass the file and data delimiter to the object. The delimiter is assumed to be "," if no delimiter is passed.
ss(file[, delimiter])
e.g.
ss(example_file.csv, ",")
The ss object has many methods attached to its prototype that then allow analysis of the file data. Below is an example using the mean function to calculate a mean age:
ss(example_file.csv, ",").mean("age")
OR
file = ss(example_file.csv, ",");
file.mean("age")
The ss object can also analyze data not contained within a file. Simply call the ss object with the desired function and pass the data directly to the function. Example using the binomdist method below:
ss.binomdist(3, 100, 0.3)
The savvy stats object (ss) expects data to be in a CSV or TSV formatted string as shown below. Data passed to the ss object will then be parsed for all statistical methods that can be called on the object.
fname, lname, age, sex
john, smith, 45, male
jane, doe, 23, female
fname lname age sex
john smith 45 male
jane doe 23 female
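To make the expected input format concrete, here is a minimal sketch of how a delimited string could be turned into row objects keyed by column name. The helper name parseDelimited is hypothetical and is not part of Savvy Stats; the library's actual parser may handle quoting, whitespace, and type conversion differently.

```javascript
// Hypothetical helper (not the library's parser): splits a delimited string
// into an array of row objects keyed by the header names.
function parseDelimited(text, delimiter = ",") {
  // Drop blank lines so a trailing newline does not create an empty row.
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  const headers = lines[0].split(delimiter).map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(delimiter).map((c) => c.trim());
    const row = {};
    headers.forEach((header, i) => {
      row[header] = cells[i];
    });
    return row;
  });
}

const csvText = "fname, lname, age, sex\njohn, smith, 45, male\njane, doe, 23, female";
const tsvText = "fname\tlname\tage\tsex\njohn\tsmith\t45\tmale";

console.log(parseDelimited(csvText)[1].fname);     // "jane"
console.log(parseDelimited(tsvText, "\t")[0].age); // "45" (still a string)
```

Note that in this sketch every cell stays a string, which is one reason a numeric column may not have the type you expect.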
Almost all methods in the library that act on a data file can use a callback to filter the data in order to perform the desired calculation on a subset of the data. All callbacks expect an object as an argument, which is used to indicate the column to be used in the filter. Callbacks are then expected to return true or false depending on whether the criteria for columns' filters are met. Note that equality operators (==) should be used instead of identity operators (===) for callbacks unless the data type of a column is known with absolute certainty.
Filters the data based on the column "fname" having the value "john".
function(data) {
return data.fname == "john";
}
Filters the data based on the column "fname" having the value "jane" and "age" being greater than 20.
function(data) {
return data.fname == "jane" && data.age > 20;
}
Live example using the mean function to calculate the mean age of those having a last name of "smith"
ss(fileData.rawData).mean("age", function(data) {
return data.lname == "smith";
});
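The sketch below illustrates why == is the safer choice in a callback. It assumes that parsed cells may still be strings; the rows array and the filteredMean helper are hypothetical stand-ins, not the library's internals.

```javascript
// Illustrative rows as they might look after parsing, with every cell still
// a string. The library's real internal representation is not documented here.
const exampleRows = [
  { fname: "john", lname: "smith", age: "45", sex: "male" },
  { fname: "jane", lname: "doe", age: "23", sex: "female" },
  { fname: "bill", lname: "cartwell", age: "45", sex: "male" },
];

// Hypothetical stand-in for a method such as mean(column, callback):
// keep the rows the callback accepts, then average the chosen column.
function filteredMean(rows, column, filter = () => true) {
  const values = rows.filter(filter).map((row) => Number(row[column]));
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

console.log(filteredMean(exampleRows, "age", (d) => d.sex == "male")); // 45

// Loose equality coerces "45" to a number; strict equality does not.
console.log(exampleRows.filter((d) => d.age == 45).length);  // 2
console.log(exampleRows.filter((d) => d.age === 45).length); // 0
```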
ss(file).min(data_column, filter_callback(data))
Calculates the minimum of a data column in a CSV or TSV data file passed into the ss object. A callback function can be used to filter the data in the chosen column and calculate a min of the resulting subset. For example, a callback could be used to calculate a minimum age of people with the last name "Smith".
Calculates the minimum age of the data set.
ss(fileData.rawData).min("age")
Calculates the minimum age of females in the data set.
ss(fileData.rawData).min("age", function(data) {
return data.sex == "female";
})
ss(file).max(data_column, filter_callback(data))
Calculates the maximum of a data column in a CSV or TSV data file passed into the ss object. A callback function can be used to filter the data in the chosen column and calculate a max of the resulting subset. For example, a callback could be used to calculate a maximum age of people with the last name "Smith".
Calculates the maximum age of the data set.
ss(fileData.rawData).max("age")
Calculates the maximum age of females in the data set.
ss(fileData.rawData).max("age", function(data) {
return data.sex == "female";
})
ss(file).range(data_column, filter_callback(data))
Calculates the range of a data column in a CSV or TSV data file passed into the ss object. A callback function can be used to filter the data in the chosen column and calculate a range of the resulting subset. For example, a callback could be used to calculate the range of ages for people with the last name "Smith".
Calculates the range of ages of the data set.
ss(fileData.rawData).range("age")
Calculates the range of ages for females in the data set.
ss(fileData.rawData).range("age", function(data) {
return data.sex == "female";
})
ss(file).mean(data_column, filter_callback(data))
Calculates the mean of a data column in a CSV or TSV data file passed into the ss object. A callback function can be used to filter the data in the chosen column and calculate a mean of the resulting subset. For example, a callback could be used to calculate a mean age of people with the last name "Smith".
Calculates the mean age of the data set.
ss(fileData.rawData).mean("age")
Calculates the mean age of females in the data set.
ss(fileData.rawData).mean("age", function(data) {
return data.sex == "female";
})
ss(file).geomean(data_column, filter_callback(data))
Calculates the geometric mean of a data column in a CSV or TSV data file passed into the ss object. A callback function can be used to filter the data in the chosen column and calculate a geometric mean of the resulting subset. For example, a callback could be used to calculate a geometric mean age of people with the last name "Smith".
Calculates the geometric mean age of the data set.
ss(fileData.rawData).geomean("age")
Calculates the geometric mean age of females in the data set.
ss(fileData.rawData).geomean("age", function(data) {
return data.sex == "female";
})
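For reference, the geometric mean of n positive values is the nth root of their product. A minimal sketch, assuming the usual log-sum formulation (the helper name geometricMean is hypothetical and may not match how the library computes it):

```javascript
// Hypothetical helper: geometric mean computed as exp(mean(ln(x))).
// Summing logs avoids overflow from multiplying many values together.
// All values are assumed to be positive.
function geometricMean(values) {
  const sumOfLogs = values.reduce((sum, v) => sum + Math.log(v), 0);
  return Math.exp(sumOfLogs / values.length);
}

// Ages from the example data set.
console.log(geometricMean([45, 23, 45, 25, 67, 45, 14, 34]));
// sqrt(2 * 8) = 4, up to floating-point rounding.
console.log(geometricMean([2, 8]));
```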
ss(file).variance(data_column, filter_callback(data))
Calculates the variance of a data column in a CSV or TSV data file passed into the ss object. A callback function can be used to filter the data in the chosen column and calculate a variance of the resulting subset. For example, a callback could be used to calculate the variance in age of people with the last name "Smith".
Calculates the variance in the age of the data set.
ss(fileData.rawData).variance("age")
Calculates the variance in the age of females in the data set.
ss(fileData.rawData).variance("age", function(data) {
return data.sex == "female";
})
ss(file).stdev(data_column, filter_callback(data))
Calculates the standard deviation of a data column in a CSV or TSV data file passed into the ss object. A callback function can be used to filter the data in the chosen column and calculate a standard deviation of the resulting subset. For example, a callback could be used to calculate the standard deviation in age of people with the last name "Smith".
Calculates the standard deviation of the age in the data set.
ss(fileData.rawData).stdev("age")
Calculates the standard deviation for the age of females in the data set.
ss(fileData.rawData).stdev("age", function(data) {
return data.sex == "female";
})
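This documentation does not state whether variance and stdev use the sample (n - 1) or population (n) denominator. The sketch below shows both so results can be compared by hand. The helper names varianceOf and stdevOf are hypothetical, and defaulting to the sample form is an assumption.

```javascript
// Hypothetical helpers: variance and standard deviation of a numeric array.
// sample = true divides by n - 1 (sample variance); false divides by n.
function varianceOf(values, sample = true) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const squaredDeviations = values.reduce((sum, v) => sum + (v - mean) ** 2, 0);
  return squaredDeviations / (sample ? n - 1 : n);
}

function stdevOf(values, sample = true) {
  return Math.sqrt(varianceOf(values, sample));
}

const demoValues = [2, 4, 4, 4, 5, 5, 7, 9]; // mean is 5
console.log(varianceOf(demoValues, false)); // 4 (population variance)
console.log(stdevOf(demoValues, false));    // 2 (population standard deviation)
console.log(varianceOf(demoValues));        // 32 / 7 (sample variance)
```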
ss(file).median(data_column, filter_callback(data))
Calculates the median of a data column in a CSV or TSV data file passed into the ss object. A callback function can be used to filter the data in the chosen column and calculate a median of the resulting subset. For example, a callback could be used to calculate a median age of people with the last name "Smith".
Calculates the median age of the data set.
ss(fileData.rawData).median("age")
Calculates the median age of males in the data set.
ss(fileData.rawData).median("age", function(data) {
return data.sex == "male";
})
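As a quick check on the examples above, here is a minimal sketch of a median calculation. It assumes the common convention of averaging the two middle values when the count is even; the helper name medianOf is hypothetical.

```javascript
// Hypothetical helper: median of a numeric array.
function medianOf(values) {
  // Copy before sorting so the caller's array is not reordered.
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2 // even count: average the middle pair
    : sorted[mid];                        // odd count: the middle value
}

console.log(medianOf([45, 23, 45, 25, 67, 45, 14, 34])); // 39.5 (all ages)
console.log(medianOf([45, 45, 25, 34]));                 // 39.5 (male ages)
console.log(medianOf([3, 1, 2]));                        // 2
```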
ss(file).quartile(type, data_column, filter_callback(data))
Calculates an exclusive, non-weighted quartile of a data column in a CSV or TSV data file passed into the ss object. The type of quartile is allowed to be 1, 2, 3, or 4 and represents the 1st, 2nd, 3rd, and 4th quartile, respectively (calculated as the 25th, 50th, 75th, and 100th percentile). A callback function can be used to filter the data in the chosen column and calculate a quartile of the resulting subset. For example, a callback could be used to calculate a quartile for the age of people with the last name "Smith".
Calculates the first quartile for the age of the data set.
ss(fileData.rawData).quartile(1, "age")
Calculates the third quartile for the age of males in the data set.
ss(fileData.rawData).quartile(3, "age", function(data) {
return data.sex == "male";
})
ss(file).percentile(k, data_column, filter_callback(data))
Calculates an exclusive, non-weighted percentile of a data column in a CSV or TSV data file passed into the ss object. The kth percentile is allowed to be any whole number between 0 and 100 including the bounds. The 0th percentile is considered to be the data set minimum and the 100th percentile is considered to be the data maximum. A callback function can be used to filter the data in the chosen column and calculate a percentile of the resulting subset. For example, a callback could be used to calculate a percentile for the age of people with the last name "Smith".
Calculates the tenth percentile for the age of the data set.
ss(fileData.rawData).percentile(10, "age")
Calculates the ninetieth percentile for the age of males in the data set.
ss(fileData.rawData).percentile(90, "age", function(data) {
return data.sex == "male";
})
ss(file).mode(data_column, filter_callback(data))
Calculates the mode of a data column in a CSV or TSV data file passed into the ss object. A callback function can be used to filter the data in the chosen column and calculate a mode of the resulting subset. For example, a callback could be used to calculate a mode age of people with the last name "Smith". If a mode for the data column does not exist, this method will return "DNE". Otherwise, this method will return an array of all values occurring equally at the highest rate.
Calculates the mode age of the data set.
ss(fileData.rawData).mode("age")
Calculates the mode age of males in the data set.
ss(fileData.rawData).mode("age", function(data) {
return data.sex == "male";
})
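The return convention described above ("DNE" or an array of tied values) can be sketched as follows. This assumes that a mode "does not exist" when no value occurs more than once; the library may define that case differently, and the helper name modeOf is hypothetical.

```javascript
// Hypothetical helper: returns "DNE" when no value repeats, otherwise an
// array of every value that occurs at the highest frequency.
function modeOf(values) {
  const counts = new Map();
  for (const v of values) {
    counts.set(v, (counts.get(v) || 0) + 1);
  }
  const highest = Math.max(...counts.values());
  if (highest === 1) return "DNE"; // assumption: no repeats means no mode
  return [...counts.keys()].filter((v) => counts.get(v) === highest);
}

console.log(modeOf([45, 23, 45, 25, 67, 45, 14, 34])); // [45]
console.log(modeOf([23, 67, 45, 14]));                 // "DNE"
console.log(modeOf([1, 1, 2, 2, 3]));                  // [1, 2]
```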
ss(file).logtrans(data_column)
Log transforms a data column in a CSV or TSV data file passed into the ss object by taking the natural log (base e) of each value in the data column. Returns the transformed data column and the remaining data in an ss object for use in other calculations (this allows chaining methods, as shown in the last example).
Log transforms each age in the "age" column of the data set.
ss(fileData.rawData).logtrans("age")
Using the returned log transformed data in a standard deviation calculation.
ss(fileData.rawData).logtrans("age").stdev("age");
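Conceptually, the transform replaces each value in one column with its natural log and leaves the other columns alone. A minimal sketch on plain row objects (the helper name logTransformColumn and the row shape are assumptions, not the library's internals):

```javascript
// Hypothetical helper: returns new rows with one column replaced by its
// natural log. Other columns are copied unchanged.
function logTransformColumn(rows, column) {
  return rows.map((row) => ({
    ...row,
    [column]: Math.log(Number(row[column])),
  }));
}

const sampleRows = [
  { fname: "john", age: "45" },
  { fname: "jane", age: "23" },
];
const transformedRows = logTransformColumn(sampleRows, "age");

console.log(transformedRows[0].age);   // Math.log(45), roughly 3.81
console.log(transformedRows[0].fname); // "john"
console.log(sampleRows[0].age);        // "45": the input rows are left untouched
```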
ss.permutations(n, k)
Calculates the number of permutations of k items from a set of n.
Calculates the number of permutations of 5 items from a set of 10.
ss.permutations(10, 5)
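The count of ordered selections is n! / (n - k)!, which reduces to a product of k descending factors. A small sketch for checking results by hand (the helper name permutationCount is hypothetical):

```javascript
// Hypothetical helper: n * (n - 1) * ... * (n - k + 1), i.e. n! / (n - k)!.
function permutationCount(n, k) {
  let result = 1;
  for (let i = 0; i < k; i++) {
    result *= n - i;
  }
  return result;
}

console.log(permutationCount(10, 5)); // 30240 = 10 * 9 * 8 * 7 * 6
```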
ss.combinations(n, k)
Calculates the number of combinations of k items from a set of n. In other words, calculates n choose k.
Calculates the number of combinations of 6 items from a set of 15 (15 choose 6).
ss.combinations(15, 6)
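"n choose k" equals n! / (k! (n - k)!). The sketch below uses the multiplicative form, which avoids computing large factorials; the helper name combinationCount is hypothetical.

```javascript
// Hypothetical helper: builds C(n, k) step by step. After step i the running
// value equals C(n - k + i, i), so every intermediate result is an integer.
function combinationCount(n, k) {
  let result = 1;
  for (let i = 1; i <= k; i++) {
    result = (result * (n - k + i)) / i;
  }
  return result;
}

console.log(combinationCount(15, 6)); // 5005
```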
ss.binomdist(num_successes, num_trials, prob_success [, cumulative = false])
Calculates the probability of achieving the number of successes (num_successes) in the given number of trials (num_trials) based on the provided probability for each success (prob_success), assuming a binomial distribution. Takes an optional argument cumulative, which is set to true or false and is by default set to false. If cumulative is set to true, the function will return the cumulative probability of the indicated number of successes (num_successes).
Calculates the probability of 5 successes in 10 trials with a probability of success of 0.5.
ss.binomdist(5, 10, 0.5)
Calculates a cumulative probability of 3 successes in 12 trials with a probability of success of 0.3
ss.binomdist(3, 12, 0.3, true);
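The binomial probability of exactly k successes in n trials is C(n, k) p^k (1 - p)^(n - k). The sketch below assumes that "cumulative" means P(X ≤ k), the sum of that expression from 0 through k; the helper names are hypothetical and the library's numerical approach may differ.

```javascript
// Hypothetical helper: n choose k via the multiplicative formula.
function choose(n, k) {
  let result = 1;
  for (let i = 1; i <= k; i++) {
    result = (result * (n - k + i)) / i;
  }
  return result;
}

// Hypothetical helper mirroring the argument order of binomdist.
// Assumption: cumulative = true returns P(X <= k).
function binomialProbability(k, n, p, cumulative = false) {
  const pmf = (j) => choose(n, j) * p ** j * (1 - p) ** (n - j);
  if (!cumulative) return pmf(k);
  let total = 0;
  for (let j = 0; j <= k; j++) {
    total += pmf(j);
  }
  return total;
}

console.log(binomialProbability(5, 10, 0.5));       // 0.24609375 (252 / 1024)
console.log(binomialProbability(3, 12, 0.3, true)); // roughly 0.4925
```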
ss.poissondist(num_successes, avg_successes [, cumulative = false])
Calculates the probability of achieving the number of successes (num_successes) in a given time frame based on an average number of successes in the time frame (avg_successes), assuming a Poisson distribution. Takes an optional argument cumulative, which is set to true or false and is by default set to false. If cumulative is set to true, the function will return the cumulative probability of the indicated number of successes (num_successes).
Calculates the probability of 5 successes with an average number of successes of 2 in the assumed time frame.
ss.poissondist(5, 2)
Calculates a cumulative probability of 12 successes with an average number of successes of 4 in the assumed time frame.
ss.poissondist(12, 4, true);
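The Poisson probability of exactly k events with an average of λ is e^(-λ) λ^k / k!. The sketch below assumes that "cumulative" means P(X ≤ k); the helper names are hypothetical.

```javascript
// Hypothetical helper: probability of exactly k events with mean lambda.
function poissonPmf(k, lambda) {
  let factorial = 1;
  for (let i = 2; i <= k; i++) {
    factorial *= i;
  }
  return (Math.exp(-lambda) * lambda ** k) / factorial;
}

// Hypothetical helper mirroring the argument order of poissondist.
// Assumption: cumulative = true returns P(X <= k).
function poissonProbability(k, lambda, cumulative = false) {
  if (!cumulative) return poissonPmf(k, lambda);
  let total = 0;
  for (let i = 0; i <= k; i++) {
    total += poissonPmf(i, lambda);
  }
  return total;
}

console.log(poissonProbability(5, 2));        // roughly 0.0361
console.log(poissonProbability(12, 4, true)); // just under 1
```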
ss.normdist(x, mean, stdev [, cumulative = true])
Calculates the cumulative probability of a value x, or P(X < x), in a normal distribution given a mean and standard deviation that define the distribution. The parameter cumulative is optional and can be set to false in order to calculate the height of the normal distribution curve at the value x (returns the value of the probability density function).
Calculates the cumulative probability of 5 in a normal distribution defined by a mean of 10 and a standard deviation of 2.
ss.normdist(5, 10, 2)
Calculates the height at 4 of the normal distribution curve defined by a mean of 23 and a standard deviation of 7.
ss.normdist(4, 23, 7, false);
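To show what the two modes compute, here is a minimal sketch of the normal density and cumulative probability. JavaScript has no built-in error function, so this uses the Abramowitz and Stegun polynomial approximation (absolute error around 1.5e-7). That choice, and the helper names, are assumptions rather than the library's documented method.

```javascript
// Approximate error function (Abramowitz and Stegun formula 7.1.26).
function erfApprox(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Hypothetical helper: cumulative probability P(X < x).
function normalCdf(x, mean, sd) {
  return 0.5 * (1 + erfApprox((x - mean) / (sd * Math.SQRT2)));
}

// Hypothetical helper: height of the density curve at x.
function normalPdf(x, mean, sd) {
  const z = (x - mean) / sd;
  return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math.PI));
}

console.log(normalCdf(5, 10, 2)); // roughly 0.0062
console.log(normalPdf(4, 23, 7)); // roughly 0.0014
```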
ss.norminv(prob, mean, stdev)
Calculates the value x of a normal distribution that will result in a desired cumulative probability (prob) based on a normal distribution defined by a mean and a standard deviation.
Calculates the value x that will result in a cumulative probability of 0.3 in a normal distribution with a mean of 10 and standard deviation of 2.
ss.norminv(0.3, 10, 2)
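One simple way to invert the cumulative probability is bisection: repeatedly halve an interval until the CDF at its midpoint matches the target. The sketch below is an illustration under that assumption, not the library's algorithm. It uses an approximate error function (Abramowitz and Stegun 7.1.26), searches within ten standard deviations of the mean, and all helper names are hypothetical.

```javascript
// Approximate error function (Abramowitz and Stegun formula 7.1.26).
function erfApprox(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

function normalCdf(x, mean, sd) {
  return 0.5 * (1 + erfApprox((x - mean) / (sd * Math.SQRT2)));
}

// Hypothetical helper: finds x such that P(X < x) is approximately prob.
// Expects 0 < prob < 1.
function inverseNormal(prob, mean, sd) {
  let low = mean - 10 * sd;
  let high = mean + 10 * sd;
  for (let i = 0; i < 100; i++) {
    const mid = (low + high) / 2;
    if (normalCdf(mid, mean, sd) < prob) {
      low = mid;  // target lies to the right of mid
    } else {
      high = mid; // target lies at or to the left of mid
    }
  }
  return (low + high) / 2;
}

console.log(inverseNormal(0.3, 10, 2)); // roughly 8.95
console.log(inverseNormal(0.5, 10, 2)); // roughly 10
```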
ss.normbetween(prob, mean, stdev)
Calculates the value x of a normal distribution for a given probability (prob) such that P(-x < X < x) = prob, where X is normally distributed with a mean μ and a standard deviation σ.
Calculates the value x such that P(-x < X < x) = 0.3 for a normal distribution with a mean of 10 and standard deviation of 2.
ss.normbetween(0.3, 10, 2)