Friday, July 9, 2010

Quantitative Analytical Techniques

Statistics is a branch of mathematics that deals with the collection, analysis, interpretation, and presentation of masses of numerical data. When a measurement, say the average age, is calculated for an entire population, it’s called a parameter. When the same measurement is calculated from a sample of that population, it’s called a statistic. Since people make entire careers out of the study of statistics, the point of this post is to present a bird’s-eye overview and brief description of common terms you’ll hear in conversations about quantitative analysis.
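To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population of ages is randomly generated for illustration only; it is not real data.

```python
import random
import statistics

# Hypothetical population: the ages of 1,000 people (made-up data).
random.seed(42)
population = [random.randint(18, 80) for _ in range(1000)]

# Parameter: the average age calculated over the entire population.
parameter_mean = statistics.mean(population)

# Statistic: the same measurement calculated from a sample of 100 people.
sample = random.sample(population, 100)
statistic_mean = statistics.mean(sample)

print(f"Population mean (parameter): {parameter_mean:.1f}")
print(f"Sample mean (statistic):     {statistic_mean:.1f}")
```

The two numbers will usually be close but not identical, which is why the later sections on significance matter.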

When discussing statistics, researchers usually talk about the data in terms of “variables.” A variable is a characteristic that can take more than one value (age, income, and birthplace are all examples). A variable can be nominal, ordinal, interval, or ratio in its scale. Nominal variables are also referred to as categorical variables because they represent categories of responses. The color of a car would be represented by a categorical value (for example, black, red, or silver). Categorical variables have no set order, meaning that a black car is not necessarily any better than a silver car.

The level of satisfaction with one’s car on a 1 to 10 scale is an example of an ordinal variable (where a 10 is a better score than a 5 and a 5 is better than a 1). Ordinal variables have a clear, set order, but they still represent categories of responses. Interval and ratio variables are numerical variables whose numbers have direct meaning: the distance between any two adjacent values is the same. Ratio variables also have a true zero, so statements like “twice as old” make sense. The age of a car would be a ratio variable because it starts at zero and can be measured precisely and at equal intervals (in hours, years, or decades).

Variables can also be discrete or continuous. Continuous variables, such as time, can take any value within a range and so have an infinite number of possible values, while discrete variables, such as a satisfaction scale, take only separate, countable values (in this case 10).

Descriptive statistics are simple portrayals of what the variables show. They are summaries of the frequency of the different values (like percentages); the central tendency (mean, median, or mode); and the dispersion (like the range and the standard deviation). Cross tabs (short for cross tabulations) are popular for displaying the joint distribution of two or more variables. They are usually presented in a matrix called a contingency table. In a cross tab table, each cell gives the number of respondents who gave a particular combination of responses.
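As a rough illustration of these summaries, the sketch below uses only Python's standard library. The survey responses are invented, and the split into "high" and "low" satisfaction bands is an arbitrary choice made for the example.

```python
import statistics
from collections import Counter

# Hypothetical survey responses: (car color, satisfaction on a 1-10 scale).
responses = [
    ("black", 8), ("red", 6), ("silver", 8), ("black", 9),
    ("red", 4), ("silver", 7), ("black", 8), ("red", 6),
]
scores = [score for _, score in responses]

# Central tendency.
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
mode_score = statistics.mode(scores)

# Dispersion.
score_range = max(scores) - min(scores)
std_dev = statistics.stdev(scores)  # sample standard deviation

def band(score):
    """Collapse the 1-10 scale into two bands (an assumption for this example)."""
    return "high (7-10)" if score >= 7 else "low (1-6)"

# Cross tab: each cell counts respondents with one combination of
# car color and satisfaction band.
crosstab = Counter((color, band(score)) for color, score in responses)

print(f"mean={mean_score}, median={median_score}, mode={mode_score}")
print(f"range={score_range}, standard deviation={std_dev:.2f}")
for color in ("black", "red", "silver"):
    high = crosstab[(color, "high (7-10)")]
    low = crosstab[(color, "low (1-6)")]
    print(f"{color:<8} high={high}  low={low}")
```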

Measures of association summarize the relationship between two variables (correlation and regression, for instance). Two variables are associated when information about one can help us predict information about the other. A variety of techniques to measure association are available, each better suited to different classes of variables. In practice, statisticians often use multivariate analysis, which considers the effects of many variables at once.
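One widely used measure of association for numerical (interval or ratio) variables is the Pearson correlation coefficient, which runs from -1 to 1. The sketch below computes it by hand; the car ages and resale prices are made-up numbers, and `pearson_r` is a helper name chosen for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

# Hypothetical data: age of a car and its resale price.
car_age_years = [1, 2, 3, 5, 8, 10]
resale_price_thousands = [28, 25, 22, 17, 11, 8]

r = pearson_r(car_age_years, resale_price_thousands)
print(f"Correlation between age and resale price: {r:.2f}")
```

A value near -1 here says that knowing a car's age helps a great deal in predicting its price, which is exactly what "associated" means above.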

Tests of statistical significance are used to determine how sure we can feel about the associations found in the data. Could it just be chance? Can we infer that the result can be generalized to the study population? Confidence intervals, chi-square tests, and t-tests are among the most common tools for answering these questions. Each is interpreted against a level of significance: the probability we are willing to accept of saying that there is a difference between two groups when actually there is none.
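Here is a small sketch of a chi-square test on a two-by-two cross tab, assuming a 0.05 level of significance. The counts are invented, and the example uses the plain chi-square statistic without any continuity correction. The critical value 3.841 is the standard cutoff for one degree of freedom at the 0.05 level.

```python
# Hypothetical cross tab: rows are groups, columns are responses.
observed = [
    [30, 20],  # group A: satisfied, not satisfied
    [15, 35],  # group B: satisfied, not satisfied
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell, where the
# expected count is what we would see if group and response were unrelated.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

# Degrees of freedom = (rows - 1) * (columns - 1) = 1 for a 2x2 table.
CRITICAL_VALUE = 3.841  # chi-square cutoff, 1 degree of freedom, 0.05 level
significant = chi_square > CRITICAL_VALUE

print(f"chi-square = {chi_square:.2f}, significant at 0.05: {significant}")
```

If the statistic is above the cutoff, a difference this large would be unlikely to arise by chance alone if the two groups were really the same.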

Measures of association can be used in very sophisticated ways. Conjoint analysis can be used to determine trade-offs customers are willing to make among product or service attributes. In addition to understanding current preferences, this technique allows modeling of how introducing new factors would affect preferences.

Discrete choice analysis models selection of a product or concept with many attributes from a set of products or concepts. In essence, it models how people make decisions in the real world. For example, one could test products with varying combinations of features to assess which combination consumers prefer. As with conjoint analysis, discrete choice analysis allows modeling of how introducing a new product or concept would affect factors such as market share.
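One common way to set this up (an assumption here, since this post does not commit to a particular model) is the multinomial logit model, in which each option gets a "utility" score and its predicted share is proportional to the exponential of that score. In a real study the utilities would be estimated from survey or purchase data; the numbers and product names below are placeholders.

```python
import math

def choice_shares(utilities):
    """Predicted choice shares under a multinomial logit model."""
    weights = {name: math.exp(u) for name, u in utilities.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

# Hypothetical utilities for the products currently on the market.
current_market = {"Product A": 1.0, "Product B": 0.5}

# The same market after a hypothetical new product is introduced.
with_new_entry = {**current_market, "Product C": 0.8}

before = choice_shares(current_market)
after = choice_shares(with_new_entry)

for name in with_new_entry:
    print(f"{name}: {before.get(name, 0.0):.1%} -> {after[name]:.1%}")
```

Comparing the two sets of shares gives a rough picture of how much the new entry would take from each existing product.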

Cluster analysis identifies population segments using groups of variables. This provides information to better understand and communicate with customers, or to help you understand your place in the marketplace. In general, whenever one needs to classify a mass of information into manageable and meaningful groups, cluster analysis is a useful technique.
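There are many clustering methods. As one illustration, the sketch below implements k-means, a popular choice, in plain Python. The customer data (age and annual spend) is invented, the starting centroids are picked by hand, and `k_means` is a helper written for this example rather than a library function.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points, and repeat."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [math.dist(point, c) for c in centroids]
            nearest = distances.index(min(distances))
            clusters[nearest].append(point)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid  # keep an empty cluster's centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (age, annual spend in hundreds of dollars).
customers = [(22, 3), (25, 4), (24, 2), (51, 11), (55, 12), (49, 10)]

# Start from two of the customers as initial guesses for the segment centers.
centroids, segments = k_means(customers, centroids=[customers[0], customers[3]])

for center, members in zip(centroids, segments):
    print(f"segment centered near {center}: {members}")
```

Note that nobody told the code which customers belong together; the segments come out of the data itself.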

Discriminant analysis is used to define which variables best differentiate between predefined groups. The key difference from cluster analysis is that discriminant analysis relies on previously defined groups, whereas cluster analysis uses the data to discover these groups.

Factor analysis finds the underlying construct behind answers to a series of questions. In other words, factor analysis is designed to classify variables. For clients, it simplifies the interpretation of answers to many questions to a few “factors” that seem to drive answers to all questions. It can be used to determine the key factors that drive aspects like satisfaction, image or customer retention. In addition, factor analysis is used when designing surveys. Often complex concepts (like “leadership”) need to be turned into a group of concrete questions in order to query meaningfully.

Regression analysis (linear, non-linear, and logistic) is widely used for forecasting. It estimates the effect of one or more variables on another. The objective of regression analysis is to understand the relationship between several independent (or predictor) variables and a dependent (or criterion) variable. This allows forecasting or estimation of the change in a dependent variable based on the change in an independent variable.
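The simplest case, a straight line fitted by least squares with a single predictor, can be sketched in a few lines of Python. The advertising and sales figures are made up, and `fit_line` is a helper name chosen for this example.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (independent variable) and
# sales (dependent variable), both in thousands of dollars.
ad_spend = [1, 2, 3, 4, 5]
sales = [12, 14, 17, 19, 23]

slope, intercept = fit_line(ad_spend, sales)

# Forecast sales for a spend level not yet observed.
forecast = intercept + slope * 6

print(f"Each extra unit of ad spend adds about {slope:.1f} to sales")
print(f"Forecast sales at a spend of 6: {forecast:.1f}")
```

The slope is the estimated change in the dependent variable for a one-unit change in the independent variable, which is the forecasting use described above.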
