Optimal Data Analysis: A Guidebook with Software for Windows, CD-ROM Edition by Paul R. Yarnold, Robert C. Soltysik (American Psychological Association)
Optimal Data Analysis: A Guidebook With Software for Windows offers the only statistical analysis paradigm that maximizes (weighted) predictive accuracy. This unique book fully explains this paradigm and includes simple-to-use software that empowers a universe of associated analyses. For any specific sample and exploratory or confirmatory hypothesis, optimal data analysis (ODA) identifies the statistical model that yields maximum predictive accuracy, assesses the exact Type I error rate, and evaluates potential cross-generalizability.
No other software can perform the analyses that are possible via ODA. The accompanying software offers the following features, among others:
The book provides intuitive examples drawn from disciplines such as psychology, medicine, biology, political science, accounting, economics, education, statistics, finance, psychiatry, biochemistry, public health, geology, and sports. This theoretical approach and statistical software package will prove its usefulness and flexibility for researchers, practitioners and analysts in all quantitative fields, and will become the new standard for accuracy of models derived in statistical analysis.
ODA, pronounced with a long "O" sound ("oh-dah"), is the short way of referring to the "optimal data analysis" paradigm. This new statistical paradigm is simple to learn, the software is easy to operate, and the findings of ODA analyses have intuitive interpretations. Nonetheless, the paradigm and the software are powerful and rich. This book describes the ODA paradigm and software and demonstrates how to apply ODA in the analysis of data. Everything needed to understand ODA is contained within this book, and all ODA analyses within this book can be accomplished using the accompanying software. Only moments away from jumping directly into the fray, we pause briefly to address four questions frequently asked by beginner and expert alike as they ponder the merits of learning ODA.
ODA is a new statistical paradigm—a quantitative scientific revolution, so to speak. Perhaps the best way to illustrate what this means is by example.
The ordinary least squares (OLS) paradigm maximizes a variance ratio for a given sample, and includes analyses such as the t test, correlation, multiple regression analysis, and multivariate analysis of variance. If one wishes to maximize a variance ratio, then the OLS paradigm is required, because that is what it does. That is, maximizing variance ratios is what the "formulas" that compute t, F, and r actually accomplish for a given sample.
In contrast, the maximum likelihood (ML) paradigm maximizes the value of the likelihood function for a given sample. This paradigm includes analyses such as chi-square, logistic regression analysis, log-linear analysis, and structural equation modeling. If one wishes to maximize the value of the likelihood function, then the ML paradigm is required.
In contrast, ODA maximizes the accuracy of a model. As a simple example, imagine we wished to assess whether two groups of independent observations, Group A and Group B, can be discriminated on the basis of their score on a test. ODA identifies the model that uses the test score in a manner such that it discriminates members of A versus B with theoretical maximum possible accuracy. To understand how this is accomplished, recall that any model can be used to compute each observation's "score" via an equation or "formula." The resulting score is then considered with respect to the decision criteria of the particular procedure, and a prediction is made on the basis of the model. In the present example, in some instances the model (regardless of the methodology by which it was developed) will predict that an observation is from Group A. Other observations will be predicted to be from Group B. Every time the predicted group membership status of an observation is correct (the same as the actual group membership status), a point is scored. An incorrect prediction scores no points. The largest number of points that it is possible to attain for a sample of N observations is, in theory, equal to N, the number of observations in Groups A and B that are classified by the model. This maximum score is only possible if all observations are correctly predicted to be from A or B by the model. The minimum score possible is zero points, in which case all observations are incorrectly predicted to be from A or B by the model.
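The point-scoring idea described above can be illustrated with a few lines of Python. This is only a sketch and not the ODA software: the cutpoint rule, the function names `predict_group` and `count_points`, and the six test scores are all invented for illustration.

```python
from typing import Sequence


def predict_group(score: float, cutpoint: float,
                  low_group: str = "A", high_group: str = "B") -> str:
    """Hypothetical decision rule: scores at or below the cutpoint go to low_group."""
    return low_group if score <= cutpoint else high_group


def count_points(scores: Sequence[float], groups: Sequence[str], cutpoint: float,
                 low_group: str = "A", high_group: str = "B") -> int:
    """Score one point for every observation whose predicted group equals its actual group."""
    return sum(
        1
        for score, actual in zip(scores, groups)
        if predict_group(score, cutpoint, low_group, high_group) == actual
    )


# Made-up sample: six observations, three from each group.
scores = [2, 3, 5, 6, 8, 9]
groups = ["A", "A", "A", "B", "B", "B"]

print(count_points(scores, groups, cutpoint=5.5))            # 6: every prediction is correct (N points)
print(count_points(scores, groups, cutpoint=3.5))            # 5: the score of 5 is wrongly predicted as B
print(count_points(scores, groups, 5.5, "B", "A"))           # 0: the reversed rule gets every prediction wrong
```

The three calls show the range described in the text: the maximum of N points, an intermediate score, and the minimum of zero points.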
By definition, an ODA model achieves maximum possible accuracy for a given sample of data, in the sense that no other model that is based on the test score can achieve a superior number of points. All possible alternative models are (explicitly or implicitly) evaluated to literally prove this, which is one reason why ODA is "computationally intensive." As OLS maximizes a variance ratio for a given sample of data, and as ML maximizes the value of the likelihood function for a given sample of data, ODA maximizes the accuracy of the model for a given sample of data. Of course, if different observations can be weighted by a different number of points, for example if a "natural" weighting metric, such as time, weight, or cost is available, then weighted accuracy may be maximized (or cost minimized), as may be desired by the operator.
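To make the idea of evaluating all alternative models concrete, here is a minimal brute-force sketch for the two-group, single-test-score case. It is an assumption-laden illustration rather than the algorithm used by the ODA software: the name `best_cutpoint_model`, the restriction to cutpoints midway between adjacent distinct scores, the tie-breaking (first best model found is kept), and the sample data are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def best_cutpoint_model(
    scores: Sequence[float],
    groups: Sequence[str],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[float, float, str, str]:
    """Return (points, cutpoint, low_group, high_group) with the most (weighted) points.

    Every cutpoint between adjacent distinct scores is tried in both directions,
    so no rule of this form can earn more points on this sample.
    """
    if weights is None:
        weights = [1.0] * len(scores)  # unweighted: each correct prediction is worth one point
    labels = sorted(set(groups))
    values = sorted(set(scores))
    if len(labels) != 2 or len(values) < 2:
        raise ValueError("need exactly two groups and at least two distinct scores")

    best = None
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        for low, high in ((labels[0], labels[1]), (labels[1], labels[0])):
            points = sum(
                w
                for s, g, w in zip(scores, groups, weights)
                if (low if s <= cut else high) == g
            )
            if best is None or points > best[0]:
                best = (points, cut, low, high)
    return best


# Made-up sample in which the groups overlap, so perfect accuracy is impossible.
scores = [2, 3, 5, 6, 8, 9]
groups = ["A", "A", "B", "A", "B", "B"]

print(best_cutpoint_model(scores, groups))
# (5.0, 4.0, 'A', 'B'): five of six observations are classified correctly

# Hypothetical weights: misclassifying the observation with score 6 costs five points.
print(best_cutpoint_model(scores, groups, weights=[1, 1, 1, 5, 1, 1]))
# (9, 7.0, 'A', 'B'): the cutpoint moves so that the heavily weighted observation is correct
```

In this toy sample, weighting changes which model is optimal, which is the sense in which weighted accuracy, rather than the raw count of correct predictions, can be maximized.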
Every type of analysis (i.e., every specific configuration of data, constraints, and hypotheses) that can be conducted in the OLS and ML paradigms can be conducted in the ODA paradigm. The ODA paradigm can conduct many analyses that one simply cannot do using either OLS or ML paradigms. ODA is much more general, and much more encompassing of different data, constraints, and hypothesis configurations, than are the alternative statistical paradigms. The ODA paradigm is, quite literally, "new and improved." Using this paradigm, and only using this paradigm, is one able to identify maximally accurate models for a given sample.
The ODA paradigm is vastly superior to alternative paradigms. Consider first, conceptual clarity. For every problem analyzed via ODA there is one precise, optimal analysis—a specific given data configuration and hypothesis dictates the exact nature of the ODA model that is appropriate. Using traditional statistics, in most applications several different analyses are feasible—all reflecting some degree of lack of fit between their required underlying distributional assumptions and the actual character of the data. Consider second, ease of interpretation. Every ODA analysis provides the same intuitive goodness-of-fit index: for every ODA analysis an index is computed on which 0 reflects the accuracy expected by chance for the sample, and 100 reflects perfect accuracy. Using traditional statistics, different analyses provide different goodness-of-fit indices that are non-intuitive and that are not directly comparable across procedures. Consider third, maximum accuracy. Every ODA analysis provides a model that guarantees maximum possible accuracy. Using traditional statistics, no analysis provides a model that explicitly guarantees maximum possible accuracy. Consider fourth, valid Type I error. No ODA analysis requires any simplifying assumptions, and p is always valid and accurate—a permutation probability derived via Fisher's randomization method, invariant over any monotonic (i.e., transformed values either always increase or always decrease) transformation of the data. Traditional analyses require simplifying assumptions (e.g., normality), p is only valid if the required assumptions are true for one's data, and p may be inconsistent over transformations of the data.
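The second and fourth points can be sketched in Python for a small two-group sample. Everything here is an illustrative assumption: the text states only that the index runs from 0 (chance) to 100 (perfect), so `accuracy_index` uses one normalization that satisfies those endpoints (mean per-group accuracy, with chance taken as 100 divided by the number of groups) and may not match the book's formula. Likewise, `exact_permutation_p` simply enumerates every reassignment of group labels for a cutpoint rule, which is feasible only for small samples and is not claimed to be how the ODA software computes p.

```python
import math
from itertools import combinations
from typing import Sequence


def accuracy_index(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Assumed normalization: 0 = chance-level mean per-group accuracy, 100 = perfect."""
    labels = sorted(set(actual))
    per_group = []
    for label in labels:
        members = [p for a, p in zip(actual, predicted) if a == label]
        per_group.append(100.0 * sum(1 for p in members if p == label) / len(members))
    mean_accuracy = sum(per_group) / len(per_group)
    chance = 100.0 / len(labels)
    return 100.0 * (mean_accuracy - chance) / (100.0 - chance)


def max_points(scores: Sequence[float], groups: Sequence[str]) -> int:
    """Most correct predictions over all 'score <= threshold' rules, in both directions."""
    labels = sorted(set(groups))
    values = sorted(set(scores))
    best = 0
    for threshold in values[:-1]:
        hits = sum(
            1
            for s, g in zip(scores, groups)
            if (labels[0] if s <= threshold else labels[1]) == g
        )
        # With two groups, the reversed rule is correct exactly where this one is wrong.
        best = max(best, hits, len(scores) - hits)
    return best


def exact_permutation_p(scores: Sequence[float], groups: Sequence[str]) -> float:
    """Share of all label reassignments whose best model does at least as well as observed."""
    labels = sorted(set(groups))
    n = len(scores)
    n_first = sum(1 for g in groups if g == labels[0])
    observed = max_points(scores, groups)
    total = at_least = 0
    for chosen in combinations(range(n), n_first):
        members = set(chosen)
        relabeled = [labels[0] if i in members else labels[1] for i in range(n)]
        total += 1
        if max_points(scores, relabeled) >= observed:
            at_least += 1
    return at_least / total


# Made-up sample: eight observations, perfectly separable by test score.
scores = [2, 3, 4, 5, 7, 8, 9, 11]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(accuracy_index(groups, groups))                                     # 100.0 for perfect prediction
print(accuracy_index(groups, ["A", "A", "B", "B", "A", "A", "B", "B"]))   # 0.0 at chance level
p_raw = exact_permutation_p(scores, groups)
p_log = exact_permutation_p([math.log(s) for s in scores], groups)
print(p_raw, p_raw == p_log)  # 2 of 70 reassignments do as well; p is unchanged by the log transform
```

Because the rule depends only on the rank order of the scores, a monotonic transformation such as the logarithm leaves both the maximum number of points and the permutation probability unchanged in this sketch.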
An obvious advantage of ODA software is availability (the first and currently the only software available that performs ODA). There are many good packages available for performing OLS and/or ML analysis, and many are an order of magnitude more expensive than the ODA book/software. Comparing software across paradigms, the ODA software is superior to software of earlier paradigms for two important reasons. Consider first, ease of learning and teaching. Everything needed to understand the ODA paradigm and analyze data is discussed in this book. Many courses, books, and articles are needed to understand traditional statistics and to correctly operate associated software, requiring years of study. Consider second, ease of use. Most types of ODA analyses require the same basic set of seven programming commands. Using traditional procedures requires learning numerous (hundreds of) system-specific programming commands.