Analysis of variance. Course work: Analysis of variance and multivariate analysis of variance

Variance analysis is a set of statistical methods designed to test hypotheses about the relationship between certain characteristics and studied factors that have no quantitative description, as well as to establish the degree of influence of factors and of their interaction. In the specialized literature it is often called ANOVA (from the English name Analysis of Variance). This method was first developed by R. Fisher in 1925.

Types and criteria of analysis of variance

This method is used to study the relationship between qualitative (nominal) characteristics and a quantitative (continuous) variable. In essence, it tests the hypothesis of equality of the arithmetic means of several samples, so it can be considered a parametric criterion for comparing the centers of several samples at once. If the method is applied to two samples, the results of the analysis of variance are identical to the results of Student's t-test. However, unlike pairwise criteria, this analysis allows the problem to be studied in more detail.

Analysis of variance in statistics is based on the following law: the sum of squared deviations of the combined sample equals the sum of squared within-group deviations plus the sum of squared between-group deviations. The study uses Fisher's F test to establish the significance of the difference between the between-group and within-group variances. Necessary prerequisites for this are normality of distribution and homoscedasticity (equality of variances) of the samples. Analysis of variance may be univariate (one-factor) or multivariate (multifactor). The first considers the dependence of the value under study on one characteristic, the second on several at once, and also makes it possible to identify the relationships between them.
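As a minimal numeric sketch of this decomposition law, the following Python fragment (NumPy only, with made-up data for three factor levels) verifies that the total sum of squares splits exactly into within-group and between-group parts:

    import numpy as np

    # Hypothetical samples for three levels of a factor
    groups = [np.array([3.0, 4.0, 5.0]),
              np.array([6.0, 7.0, 8.0]),
              np.array([5.0, 5.0, 8.0])]

    allvals = np.concatenate(groups)
    grand_mean = allvals.mean()

    ss_total = ((allvals - grand_mean) ** 2).sum()
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

    # The decomposition law: SS_total = SS_within + SS_between
    assert np.isclose(ss_total, ss_within + ss_between)
    print(ss_total, ss_within, ss_between)  # 24.0 10.0 14.0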

Factors

Factors are controlled circumstances that influence the final result. A factor's level, or treatment, is a value that characterizes a specific manifestation of this condition. These values are usually presented on a nominal or ordinal measurement scale. Output values are often measured on quantitative or ordinal scales, and the problem then arises of grouping the output data into series of observations corresponding to approximately equal numerical values. If the number of groups is taken too large, the number of observations in each may be insufficient to obtain reliable results. If it is taken too small, significant features of the influence on the system may be lost. The specific way of grouping the data depends on the amount and nature of variation in the values. The number and size of intervals in univariate analysis are most often determined by the principle of equal intervals or the principle of equal frequencies.

Analysis of variance problems

There are cases, then, when two or more samples must be compared, and it is here that analysis of variance is appropriate. The name of the method indicates that conclusions are drawn from the study of variance components: the overall variation of the indicator is divided into component parts corresponding to the action of each individual factor. Let us consider a number of problems solved by typical analysis of variance.

Example 1

The workshop has a number of automatic machines that produce a specific part. The size of each part is a random variable that depends on the setup of each machine and the random deviations that occur during the manufacturing process of the parts. It is necessary to determine, based on the measurement data of the dimensions of the parts, whether the machines are configured in the same way.

Example 2

During the manufacture of an electrical device, various types of insulating paper are used: capacitor, electrical, etc. The device can be impregnated with various substances: epoxy resin, varnish, ML-2 resin, etc. Voids can be eliminated under vacuum, at elevated pressure, or with heating. Impregnation can be done by immersion in varnish, under a continuous stream of varnish, etc. The electrical apparatus as a whole is filled with a certain compound, of which there are several options. Quality indicators are the electrical strength of the insulation, the overheating temperature of the winding in operating mode, and a number of others. During development of the technological process for manufacturing the devices, it is necessary to determine how each of the listed factors affects the performance of the device.

Example 3

The trolleybus depot serves several trolleybus routes. It operates trolleybuses of various types, and 125 conductors collect fares. The depot management is interested in the following questions: how to compare the economic indicators of each conductor's work (revenue), taking into account the different routes and different types of trolleybuses? How to determine the economic feasibility of running trolleybuses of a certain type on a particular route? How to establish reasonable requirements for the revenue that a conductor brings in on each route with various types of trolleybuses?

The task of choosing a method is to obtain maximum information about the influence of each factor on the final result and to determine the numerical characteristics of such influence and their reliability, at minimal cost and in the shortest possible time. Methods of variance analysis make it possible to solve such problems.

Univariate analysis

The purpose of the study is to assess the magnitude of the influence of a particular factor on the response under analysis. Another purpose of univariate analysis may be to compare two or more factors with each other in order to determine the difference in their impact on the response. If the null hypothesis is rejected, the next step is to quantify the influence and construct confidence intervals for the obtained characteristics. In the case where the null hypothesis cannot be rejected, it is usually accepted and a conclusion is drawn about the nature of the influence.

The Kruskal-Wallis rank test can serve as a nonparametric analogue of one-way analysis of variance. It was developed by the American mathematician William Kruskal and the economist W. Allen Wallis in 1952. The criterion is designed to test the null hypothesis of equality of effects on the studied samples, whose average values are unknown but assumed equal. The number of samples must be more than two.
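A minimal sketch of the Kruskal-Wallis test in Python, assuming SciPy is available; the three samples are hypothetical:

    from scipy import stats

    # Hypothetical samples for three factor gradations
    a = [27, 30, 25, 29]
    b = [31, 33, 35, 32]
    c = [22, 26, 24, 28]

    h, p = stats.kruskal(a, b, c)
    print(f"H = {h:.3f}, p = {p:.4f}")  # small p -> reject the null hypothesis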

The Jonckheere-Terpstra criterion was proposed independently by the Dutch mathematician T. J. Terpstra in 1952 and the British psychologist A. R. Jonckheere in 1954. It is used when it is known in advance that the existing groups of results are ordered by increasing influence of the factor under study, which is measured on an ordinal scale.

Bartlett's M test, proposed by the British statistician Maurice Stevenson Bartlett in 1937, is used to test the null hypothesis of equality of variances of several normal populations from which the studied samples, generally of different sizes, are drawn (each sample must contain at least four observations).

Cochran's G test, introduced by the American William Gemmell Cochran in 1941, is used to test the null hypothesis of equality of variances of normal populations in independent samples of equal size.

The Levene test, proposed by the American mathematician Howard Levene in 1960, is an alternative to Bartlett's test in conditions where there is no confidence that the samples under study follow a normal distribution.

In 1974, the American statisticians Morton B. Brown and Alan B. Forsythe proposed a test (the Brown-Forsythe test) that differs slightly from Levene's test in using the median instead of the mean.
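The variance-homogeneity checks mentioned above are available in SciPy; a brief sketch with hypothetical samples (the Brown-Forsythe variant corresponds to Levene's test computed around the median):

    from scipy import stats

    # Hypothetical samples; the third is visibly more spread out
    a = [4.1, 5.2, 6.3, 5.0, 4.8]
    b = [3.9, 5.1, 5.0, 4.7, 5.3]
    c = [4.5, 9.1, 1.2, 6.8, 3.0]

    print(stats.bartlett(a, b, c))                 # Bartlett's M test
    print(stats.levene(a, b, c, center='mean'))    # Levene's test
    print(stats.levene(a, b, c, center='median'))  # Brown-Forsythe variant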

Two-factor analysis

Two-way analysis of variance is used for related, normally distributed samples. In practice, complex tables of this method are often used, in particular those in which each cell contains a set of data (repeated measurements) corresponding to fixed level values. If the assumptions required for two-way analysis of variance are not met, the nonparametric Friedman rank test (Friedman, Kendall, and Smith) is used, developed by the American economist Milton Friedman in the late 1930s. This test does not depend on the type of distribution.

It is only assumed that the distribution of values is identical and continuous and that the values themselves are independent of each other. When testing the null hypothesis, the output data are presented in the form of a rectangular matrix in which the rows correspond to the levels of factor B and the columns to the levels of factor A. Each cell of the table (block) can be the result of measurements of parameters on one object or on a group of objects with constant values of the levels of both factors. In that case, the corresponding data are presented as the average values of a certain parameter over all measurements or objects of the sample under study. To apply the criterion, it is necessary to move from the direct measurement results to their ranks. Ranking is carried out for each row separately, that is, the values are ordered within each fixed row.
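A minimal sketch of the Friedman test in Python, assuming SciPy; the measurements are hypothetical, with each list holding the values of one condition across the same five subjects (blocks):

    from scipy import stats

    # Each list holds one condition's values across the same five subjects
    cond1 = [7.0, 9.9, 8.5, 5.1, 10.3]
    cond2 = [5.3, 5.7, 4.7, 3.5, 7.7]
    cond3 = [4.9, 7.6, 5.5, 2.8, 8.4]

    stat, p = stats.friedmanchisquare(cond1, cond2, cond3)
    print(f"chi-square = {stat:.3f}, p = {p:.4f}")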

Page's test (L-test), proposed by the American statistician E. B. Page in 1963, is designed to test the null hypothesis against ordered alternatives. For large samples a normal approximation is used: when the corresponding null hypotheses are true, the statistics follow the standard normal distribution. In the case where the rows of the source table contain tied values, average ranks must be used; the accuracy of the conclusions then deteriorates as the number of such ties grows.

Cochran's Q test, proposed by W. Cochran in 1950, is used in cases where groups of homogeneous subjects are exposed to more than two influences and where two feedback options are possible: conditionally negative (0) and conditionally positive (1). The null hypothesis is equality of treatment effects. Two-way analysis of variance makes it possible to establish the existence of treatment effects, but not to determine for which specific columns the effect exists. To solve this problem, Scheffé's method of multiple comparisons for related samples is used.
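Cochran's Q test is implemented, for example, in statsmodels; a short sketch with hypothetical 0/1 feedback data (the exact function location and signature may vary between statsmodels versions):

    import numpy as np
    from statsmodels.stats.contingency_tables import cochrans_q

    # Rows: subjects; columns: treatments; entries: 0/1 feedback
    x = np.array([[1, 1, 0],
                  [1, 1, 0],
                  [1, 0, 0],
                  [1, 1, 1],
                  [0, 1, 0],
                  [1, 1, 0],
                  [1, 0, 0],
                  [1, 1, 0]])

    res = cochrans_q(x)
    print(res.statistic, res.pvalue)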

Multivariate analysis

The problem of multivariate analysis of variance arises when it is necessary to determine the effect of two or more conditions on a certain random variable. The study involves one dependent random variable, measured on an interval or ratio scale, and several independent variables, each expressed on a nominal or ordinal scale. Variance analysis of data is a well-developed branch of mathematical statistics with many variants. The research concept is common to both the single-factor and the multifactor case: the total variance is divided into components corresponding to a certain grouping of the data, and each grouping of the data has its own model. Here we consider only the basic provisions necessary for understanding and practical use of its most widely used variants.

Variance analysis of factors requires a fairly careful attitude to the collection and presentation of the input data, and especially to the interpretation of the results. Unlike a one-factor experiment, whose results can be conditionally arranged in a certain sequence, the results of a two-factor experiment require a more complex presentation. The situation becomes even more complicated when there are three, four, or more circumstances. Because of this, it is quite rare to include more than three (or four) conditions in a model. Examples would be the occurrence of resonance at a certain combination of capacitance and inductance in an electric circuit; the manifestation of a chemical reaction with a certain set of elements from which the system is built; or the occurrence of anomalous effects in complex systems under a certain coincidence of circumstances. The presence of interaction can radically change the model of the system and sometimes lead to a rethinking of the nature of the phenomena the experimenter is dealing with.

Multivariate analysis of variance with repeated experiments

Measurement data can quite often be grouped not by two but by a larger number of factors. Thus, if we consider a variance analysis of the service life of trolleybus tires taking into account two circumstances (the manufacturing plant and the route on which the tires are operated), we can single out as a separate condition the season during which the tires are operated (namely, winter versus summer operation). As a result, we have a three-factor problem.

If there are more conditions, the approach is the same as in two-factor analysis. In all cases one tries to simplify the model. The phenomenon of interaction of two factors does not appear all that often, and triple interaction occurs only in exceptional cases. Include only those interactions for which there is prior information and good reason to take them into account in the model. The process of identifying individual factors and taking them into account is relatively simple, so there is often a temptation to include more circumstances. One should not get carried away with this: the more conditions, the less reliable the model becomes and the greater the likelihood of error. A model that includes a large number of independent variables becomes quite complex to interpret and inconvenient for practical use.

General idea of ​​analysis of variance

Analysis of variance in statistics is a method of studying observational results that depend on various simultaneously acting circumstances, and of assessing their influence. A controlled variable that corresponds to the method of influencing the object of study and takes a certain value over a certain period of time is called a factor. Factors can be qualitative or quantitative. Levels of quantitative conditions take certain values on a numerical scale; examples are temperature, pressing pressure, and amount of substance. Qualitative factors are different substances, different technological methods, devices, fillers. Their levels correspond to a nominal scale.

Qualitative factors can also include the type of packaging material and the storage conditions of the dosage form. It is also rational to include the degree of grinding of raw materials and the fractional composition of granules, which have quantitative significance but are difficult to regulate on a quantitative scale. The number of qualitative factors depends on the type of dosage form, as well as on the physical and technological properties of the medicinal substances. For example, tablets can be obtained from crystalline substances by direct compression; in this case it is enough to select glidants and lubricants.

Examples of qualitative factors for different types of dosage forms

  • Tinctures. Extractant composition, extractor type, raw material preparation method, production method, filtration method.
  • Extracts (liquid, thick, dry). Composition of the extractant, extraction method, type of installation, method of removing the extractant and ballast substances.
  • Tablets. Composition of excipients: fillers, disintegrants, binders, glidants and lubricants. Method of obtaining tablets, type of technological equipment. Type of shell and its components: film formers, pigments, dyes, plasticizers, solvents.
  • Injection solutions. Type of solvent, filtration method, nature of stabilizers and preservatives, sterilization conditions, method of filling ampoules.
  • Suppositories. Composition of the suppository base, method of producing suppositories, fillers, packaging.
  • Ointments. Composition of the base, structural components, method of preparing the ointment, type of equipment, packaging.
  • Capsules. Type of shell material, method of producing capsules, type of plasticizer, preservative, dye.
  • Liniments. Method of preparation, composition, type of equipment, type of emulsifier.
  • Suspensions. Type of solvent, type of stabilizer, dispersion method.

Examples of qualitative factors and their levels studied in the tablet manufacturing process

  • Disintegrants. Potato starch, white clay, a mixture of sodium bicarbonate with citric acid, basic magnesium carbonate.
  • Binder solution. Water, starch paste, sugar syrup, methylcellulose solution, hydroxypropylmethylcellulose solution, polyvinylpyrrolidone solution, polyvinyl alcohol solution.
  • Glidants. Aerosil, starch, talc.
  • Filler. Sugar, glucose, lactose, sodium chloride, calcium phosphate.
  • Lubricant. Stearic acid, polyethylene glycol, paraffin.

Models of variance analysis in the study of the level of state competitiveness

One of the most important criteria for assessing the condition of a state, by which the level of its well-being and socio-economic development is judged, is competitiveness, that is, the set of properties inherent in the national economy that determine the state's ability to compete with other countries. Having determined the place and role of the state in the world market, it is possible to establish a clear strategy for ensuring economic security on an international scale, because this is the key to positive relations between Russia and all players in the world market: investors, creditors, and governments.

To compare the level of competitiveness of states, countries are ranked using complex indices that include various weighted indicators. These indices are based on key factors influencing the economic, political, and other conditions. A set of models for studying state competitiveness involves the use of multivariate statistical analysis methods (in particular, analysis of variance, econometric modeling, and decision-making methods) and includes the following main stages:

  1. Formation of a system of indicators.
  2. Assessment and forecasting of state competitiveness indicators.
  3. Comparison of indicators of the competitiveness of states.

Now let’s look at the content of the models of each of the stages of this complex.

At the first stage, using expert assessment methods, a well-founded set of economic indicators for assessing state competitiveness is formed, taking into account the specifics of its development, based on international ratings and data from statistical departments, reflecting the state of the system as a whole and of its processes. The choice of these indicators is justified by the need to select those that most fully, from a practical point of view, allow one to determine the level of the state, its investment attractiveness, and the possibility of relative localization of existing potential and actual threats.

The main indicators of international rating systems are indices:

  1. Global Competitiveness (GC).
  2. Economic freedom (IES).
  3. Human Development (HDI).
  4. Perceptions of Corruption (CPC).
  5. Internal and external threats (IETH).
  6. International Influence Potential (IPIP).

The second stage provides for the assessment and forecasting of state competitiveness indicators according to international ratings for the 139 countries of the world under study.

The third stage provides for a comparison of the competitiveness conditions of states using methods of correlation and regression analysis.

Using the results of the study, it is possible to determine the nature of the processes as a whole and for individual components of the state's competitiveness, and to test the hypothesis about the influence of factors and their relationships at the appropriate significance level.

The implementation of the proposed set of models will make it possible not only to assess the current level of competitiveness and investment attractiveness of states, but also to analyze management shortcomings, prevent wrong decisions, and prevent the development of a crisis in the state.

Analysis of variance

1. Concept of analysis of variance

Analysis of variance is an analysis of the variability of a trait under the influence of controlled variable factors. In foreign literature it is often referred to as ANOVA (Analysis of Variance).

The task of ANOVA is to isolate variability of different kinds from the overall variability of the trait:

a) variability due to the action of each of the independent variables under study;

b) variability due to the interaction of the independent variables being studied;

c) random variability due to all other unknown variables.

Variability due to the action of the variables under study and their interaction is compared with random variability. An indicator of this relationship is Fisher's F test.

The formula for calculating the F criterion includes estimates of variances, that is, of the distribution parameters of the trait; therefore the F criterion is a parametric criterion.

The more of the trait's variability is due to the variables (factors) under study or their interaction, the higher the empirical values of the criterion.

The null hypothesis in the analysis of variance states that the average values of the studied effective characteristic are the same in all gradations of the factor.

The alternative hypothesis states that the average values of the resulting characteristic in different gradations of the factor under study are different.

Analysis of variance allows us to establish a change in a characteristic, but does not indicate the direction of these changes.

Let's begin our consideration of variance analysis with the simplest case, when we study the action of only one variable (one factor).

2. One-way analysis of variance for unrelated samples

2.1. Purpose of the method

The method of one-factor analysis of variance is used when changes in an effective characteristic are studied under the influence of changing conditions or gradations of a factor. In this version of the method, each gradation of the factor influences a different sample of subjects. There must be at least three gradations of the factor. (There may be two, but in that case nonlinear dependencies cannot be established, and it seems more reasonable to use simpler methods.)

A nonparametric version of this type of analysis is the Kruskal-Wallis H test.

Hypotheses

H0: Differences between factor gradations (different conditions) are no greater than random differences within each group.

H1: Differences between factor gradations (different conditions) are greater than random differences within each group.

2.2. Limitations of One-Way Analysis of Variance for Unrelated Samples

1. One-way analysis of variance requires at least three gradations of the factor and at least two subjects in each gradation.

2. The resulting characteristic must be normally distributed in the sample under study.

True, it is usually not indicated whether we are talking about the distribution of the characteristic in the entire surveyed sample or in that part of it that makes up the dispersion complex.

3. An example of solving a problem using the method of one-way analysis of variance for unrelated samples:

Three different groups of six subjects each were given lists of ten words. The words were presented to the first group at a low speed (1 word per 5 seconds), to the second group at an average speed (1 word per 2 seconds), and to the third group at a high speed (1 word per second). Reproduction performance was predicted to depend on the speed of word presentation. The results are presented in Table 1.

Table 1. Number of words reproduced

Subject No. | Low speed | Average speed | High speed | Total amount

H0: Differences in the volume of word reproduction between groups are no more pronounced than random differences within each group.

H1: Differences in the volume of word reproduction between groups are more pronounced than random differences within each group.

Using the experimental values presented in Table 1, we establish some quantities that will be needed to calculate the F criterion.

The calculation of the main quantities for one-way analysis of variance is presented in Table 2.

Table 2. Calculation of the main quantities for one-way analysis of variance

Table 3. Sequence of operations in one-way analysis of variance for unrelated samples

The designation SS, often found in this and subsequent tables, is an abbreviation of "sum of squares". This abbreviation is most often used in translated sources.

SS_fact means the variability of the characteristic due to the action of the factor under study;

SS_total means the overall variability of the trait;

SS_random means the variability due to unaccounted factors, the "random" or "residual" variability;

MS means the "mean square", the sum of squares divided by its number of degrees of freedom, i.e. the average value of the corresponding SS;

df means the number of degrees of freedom, which, when considering nonparametric criteria, we denoted by the Greek letter ν.

Conclusion: H0 is rejected, H1 is accepted. Differences in the volume of word reproduction between groups are more pronounced than random differences within each group (α = 0.05). Thus, the speed of presentation of words affects the volume of their reproduction.

An example of solving the problem in Excel is presented below:

Initial data:

Using the command: Tools->Data Analysis->One-way ANOVA, we get the following results:
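A similar analysis can be sketched in Python with SciPy; since the values of Table 1 are not reproduced above, the recall scores below are hypothetical stand-ins:

    from scipy import stats

    # Hypothetical recall scores (words out of 10) for the three speeds
    low    = [8, 7, 9, 5, 6, 8]   # 1 word per 5 seconds
    medium = [7, 8, 5, 4, 6, 7]   # 1 word per 2 seconds
    high   = [4, 5, 3, 6, 2, 4]   # 1 word per second

    f, p = stats.f_oneway(low, medium, high)
    print(f"F = {f:.2f}, p = {p:.4f}")  # p < 0.05 -> H0 is rejected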

The techniques discussed above for testing statistical hypotheses about the significance of differences between two means have limited application in practice. This is due to the fact that, in order to identify the effect of all possible conditions and factors on an effective trait, field and laboratory experiments are carried out, as a rule, using not two but a larger number of samples (12-20 or more).

Often researchers compare the means of several samples combined into a single complex. For example, when studying the effect of different types and doses of fertilizers on crop yields, experiments are repeated in different variants. In these cases pairwise comparisons become cumbersome, and statistical analysis of the whole complex requires the use of a special method. This method, developed in mathematical statistics, is called analysis of variance. It was first used by the English statistician R. Fisher in processing the results of agronomic experiments (1938).

Analysis of variance is a method for statistically assessing the reliability of the manifestation of the dependence of an effective characteristic on one or more factors. Using the method of variance analysis, statistical hypotheses are tested regarding averages in several general populations that have a normal distribution.

Analysis of variance is one of the main methods for statistical evaluation of experimental results. It is also increasingly used in the analysis of economic information. Analysis of variance makes it possible to determine to what extent sample indicators of the relationship between the resultant and factor characteristics are sufficient to extend the data obtained from the sample to the general population. The advantage of this method is that it gives fairly reliable conclusions from small samples.

By studying the variation of an effective characteristic under the influence of one or several factors using variance analysis, one can obtain, in addition to general estimates of the significance of dependencies, also an assessment of the differences in the magnitude of the means that are formed at different levels of factors, and the significance of the interaction of factors. Analysis of variance is used to study the dependencies of both quantitative and qualitative characteristics, as well as their combination.

The essence of this method is the statistical study of the probability of the influence of one or more factors, as well as their interaction on the resulting characteristic. According to this, three main tasks are solved using variance analysis: 1) general assessment of the significance of differences between group means; 2) assessing the likelihood of interaction between factors; 3) assessment of the significance of differences between pairs of means. Most often, researchers have to solve such problems when conducting field and zootechnical experiments, when the influence of several factors on an effective trait is studied.

The principle scheme of variance analysis includes establishing the main sources of variation in the effective characteristic and determining the volume of variation (sum of squared deviations) according to the sources of its formation; determining the number of degrees of freedom corresponding to the components of the total variation; calculating dispersions as the ratio of the corresponding volumes of variation to their number of degrees of freedom; analysis of the relationship between variances; assessing the reliability of the difference between the means and drawing conclusions.

This scheme is preserved both in simple models of variance analysis, when data are grouped by one characteristic, and in complex models, when data is grouped by two or more characteristics. However, with an increase in the number of group characteristics, the process of decomposing the total variation according to the sources of its formation becomes more complicated.

In accordance with this scheme, analysis of variance can be represented in the form of five sequential stages:

1) determination and decomposition of the variation;

2) determination of the number of degrees of freedom of variation;

3) calculation of variances and their ratios;

4) analysis of variances and their relationships;

5) assessing the significance of the difference between the means and formulating conclusions for testing the null hypothesis.

The most labor-intensive part of variance analysis is the first stage - determining and decomposing variation according to the sources of its formation. The order of decomposition of the total volume of variation was discussed in detail in Chapter 5.

The basis for solving problems of variance analysis is the law of decomposition (addition) of variation, according to which the total variation (fluctuation) of the resulting attribute is divided into two parts: the variation caused by the action of the factor(s) under study, and the variation caused by the action of random causes; that is: W_total = W_factor + W_residual.

Let us assume that the population under study is divided according to the factor characteristic into several groups, each of which is characterized by its own average value of the resulting characteristic. The variation of these values can be explained by two types of causes: those that act on the effective sign systematically and can be adjusted during the experiment, and those that cannot be adjusted. It is obvious that intergroup (factorial, or systematic) variation depends primarily on the action of the factor under study, while intragroup (residual, or random) variation depends primarily on the action of random factors.

To assess the reliability of differences between group means, it is necessary to determine the intergroup and intragroup variation. If the intergroup (factorial) variation significantly exceeds the intragroup (residual) variation, then the factor influenced the resulting characteristic, significantly changing the values of the group averages. But the question arises: what ratio of intergroup to intragroup variation can be considered sufficient to conclude that the differences between group means are reliable (significant)?

To assess the significance of differences between the means and formulate conclusions for testing the null hypothesis (H0: x̄1 = x̄2 = ... = x̄n) in variance analysis, a kind of standard is used: the F criterion, the distribution law of which was established by R. Fisher. This criterion is the ratio of two variances, the factorial variance generated by the action of the factor under study and the residual variance due to the action of random causes:

F = s²_factor / s²_residual.

The American statistician Snedecor proposed denoting this ratio of variances by the letter F in honor of the inventor of analysis of variance, R. Fisher.

The variances s²_factor and s²_residual are estimates of the population variance. If samples with these variances are drawn from the same general population, where the variation of values was random, then the discrepancy between the values of s²_factor and s²_residual is also random.

If an experiment tests the influence of several factors (A, B, C, etc.) on an effective trait simultaneously, then the variance due to the action of each of them should be compared with the residual variance, that is: F_A = s²_A / s²_residual, F_B = s²_B / s²_residual, and so on.

If the value of the factor dispersion is significantly greater than the residual, then the factor significantly influenced the resulting attribute and vice versa.

In multifactorial experiments, in addition to the variation due to the action of each factor, there is almost always variation due to the interaction of factors (s²_AB, s²_AC, s²_BC, s²_ABC). The essence of the interaction is that the effect of one factor changes significantly at different levels of the second (for example, the effect of soil quality differs at different doses of fertilizer).

The interaction of factors should also be assessed by comparing the corresponding interaction variances with the within-group (residual) variance: for example, F_AB = s²_AB / s²_residual.

When calculating the actual value of the F criterion, the larger of the variances is taken in the numerator, so F ≥ 1. Obviously, the larger the F criterion, the more significant the differences between the variances. If F = 1, the question of assessing the significance of differences in variances is removed.

To determine the limits of random fluctuations in the ratio of variances, R. Fisher developed special F-distribution tables (Appendices 4 and 5). The F criterion is functionally related to the probability and depends on the numbers of degrees of freedom k1 and k2 of the two compared variances. Typically, two tables are used to draw conclusions about the extreme value of the criterion, at significance levels of 0.05 and 0.01. A significance level of 0.05 (or 5%) means that only in 5 cases out of 100 can the F criterion take a value equal to or greater than that indicated in the table. Reducing the significance level from 0.05 to 0.01 leads to an increase in the tabulated value of the criterion for the ratio of two variances arising from the effect of random causes alone.

The value of the criterion also depends on the numbers of degrees of freedom of the two variances being compared. If the number of degrees of freedom tends to infinity (k → ∞), the F ratio for two variances tends to unity.

The tabulated value of the F criterion shows the possible random value of the ratio of two variances at a given significance level and the corresponding numbers of degrees of freedom for each of the variances being compared. The indicated tables show the value of F for samples drawn from the same general population, where the causes of changes in values are only random.

The value of F is found from the tables (Appendices 4 and 5) at the intersection of the corresponding column (the number of degrees of freedom for the greater variance, k1) and row (the number of degrees of freedom for the lesser variance, k2). Thus, if the larger variance (the numerator of F) has k1 = 4 and the smaller variance (the denominator of F) has k2 = 9, then F at the significance level α = 0.05 will be 3.63 (Appendix 4). So, as a result of random causes alone, since the samples are small, the variance of one sample can, at the 5% significance level, exceed the variance of the second sample by a factor of 3.63. When the significance level decreases from 0.05 to 0.01, the tabulated value of the criterion, as noted above, increases. Thus, with the same degrees of freedom k1 = 4 and k2 = 9 and α = 0.01, the tabulated value of the F criterion will be 6.99 (Appendix 5).
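These tabulated values can be checked with any statistical package; for example, a short SciPy sketch reproducing the two critical values quoted above:

    from scipy import stats

    # Upper critical values of the F distribution for k1 = 4, k2 = 9
    print(stats.f.ppf(0.95, dfn=4, dfd=9))  # ~3.63 at alpha = 0.05
    print(stats.f.ppf(0.99, dfn=4, dfd=9))  # ~6.99 at alpha = 0.01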

Let us consider the procedure for determining the number of degrees of freedom in variance analysis. The number of degrees of freedom corresponding to the total sum of squared deviations is decomposed into components in the same way as the sums of squared deviations themselves (W_total = W_between + W_within); that is, the total number of degrees of freedom (k_total) is decomposed into the numbers of degrees of freedom for the intergroup (k1) and intragroup (k2) variations.

Thus, if a sample population consisting of N observations is divided into t groups (the number of experimental variants) and n subgroups (the number of replications), then the numbers of degrees of freedom k will accordingly be:

a) for the total sum of squared deviations (W_total): k_total = N - 1;

b) for the intergroup sum of squared deviations (W_between): k1 = t - 1;

c) for the intragroup sum of squared deviations (W_within): k2 = N - t.

According to the rule for adding variations: k_total = k1 + k2.

For example, if four variants of the experiment were formed (t = 4) in five replications each (n = 5), and the total number of observations is N = t * n = 4 * 5 = 20, then the numbers of degrees of freedom are, respectively, k_total = 20 - 1 = 19, k1 = 4 - 1 = 3, and k2 = 20 - 4 = 16.

Knowing the sums of squared deviations and the numbers of degrees of freedom, we can determine unbiased (corrected) estimates for the three variances:

s²_total = W_total / k_total; s²_between = W_between / k1; s²_within = W_within / k2.
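A small Python sketch of this bookkeeping for the example with t = 4 variants and n = 5 replications (the sums of squares W_between and W_within are assumed to have been computed beforehand):

    # Degrees of freedom for t = 4 variants in n = 5 replications
    t, n = 4, 5
    N = t * n                 # 20 observations in total

    k_total = N - 1           # 19
    k1 = t - 1                # 3  (intergroup)
    k2 = N - t                # 16 (intragroup)
    assert k_total == k1 + k2

    # With the sums of squares known, the variance estimates would be
    # s2_between = W_between / k1 and s2_within = W_within / k2,
    # and the criterion F = s2_between / s2_within.
    print(k_total, k1, k2)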

The null hypothesis H0 is tested using the F criterion in the same way as with Student's t-test. To make a decision on H0, it is necessary to calculate the actual value of the criterion and compare it with the tabulated value F_α for the accepted significance level α and the numbers of degrees of freedom k1 and k2 of the two variances.

If F_fact > F_α, then, in accordance with the accepted significance level, we can conclude that the differences between the sample variances are determined not only by random factors; they are significant. In this case the null hypothesis is rejected, and there is reason to assert that the factor significantly influences the resulting characteristic. If F_fact < F_α, the null hypothesis is accepted, and there is reason to assert that the differences between the compared variances lie within the limits of possible random fluctuations: the effect of the factor on the resulting characteristic is not significant.

The use of a particular variance analysis model depends both on the number of factors being studied and on the method of sampling.

Depending on the number of factors that determine the variation of the resulting characteristic, samples can be formed by one, two, or more factors. Accordingly, analysis of variance is divided into single-factor and multifactor; it is otherwise also called a single-factor or multifactor dispersion complex.

The decomposition scheme of the total variation depends on how the groups are formed. Grouping can be random (observations of one group are not related to observations of the second group) or non-random (observations of the two samples are related to each other by common experimental conditions). Independent and dependent samples are obtained accordingly. Independent samples can be formed with both equal and unequal sizes; the formation of dependent samples assumes equal sizes.

If the groups are formed in a non-random way, then the total volume of variation of the resulting trait includes, along with the factorial (intergroup) and residual variation, the variation of replications, that is: W_total = W_factor + W_replication + W_residual.

In practice, in most cases it is necessary to consider dependent samples, when conditions for groups and subgroups are equalized. Thus, in a field experiment, the entire site is divided into blocks with conditions as uniform as possible. In this case, each variant of the experiment receives an equal opportunity to be represented in all blocks, thereby equalizing conditions for all tested variants of the experiment. This method of constructing an experiment is called the randomized block method. Experiments with animals are carried out similarly.

When processing socio-economic data using variance analysis, it must be kept in mind that, owing to the large number of factors and their interrelation, it is difficult, even with the most careful equalization of conditions, to establish the degree of objective influence of each individual factor on the resulting characteristic. Therefore, the level of residual variation is determined not only by random causes but also by significant factors that were not taken into account when constructing the variance analysis model. As a result, the residual variance as a basis for comparison sometimes becomes inadequate for its purpose: it is clearly overestimated in magnitude and cannot act as a criterion for the significance of the influence of factors. In this regard, when constructing variance analysis models, the problem of selecting the most important factors and equalizing the conditions for the manifestation of each of them becomes relevant. Besides, the use of variance analysis assumes a normal or close-to-normal distribution of the statistical populations under study. If this condition is not met, the estimates obtained in the analysis of variance will be exaggerated.

A person can recognize his abilities only by trying to apply them. (Seneca)

Analysis of variance

Introductory overview

In this section, we will review the basic methods, assumptions, and terminology of ANOVA.

Note that in the English-language literature analysis of variance is usually called Analysis of Variance. Therefore, for brevity, below we will sometimes use the term ANOVA (ANalysis Of VAriance) for ordinary analysis of variance and the term MANOVA for multivariate analysis of variance. In this section we will sequentially review the main ideas of analysis of variance (ANOVA), analysis of covariance (ANCOVA), multivariate analysis of variance (MANOVA), and multivariate analysis of covariance (MANCOVA). After a brief discussion of the merits of contrast analysis and post hoc tests, we will look at the assumptions on which ANOVA methods are based. Towards the end of this section, the advantages of a multivariate approach for repeated measures analysis over the traditional univariate approach are explained.

Key Ideas

Purpose of analysis of variance. The main purpose of analysis of variance is to examine the significance of differences between means. Chapter 8 provides a brief introduction to the study of statistical significance. If you are simply comparing the means of two samples, analysis of variance will give the same result as the ordinary t-test for independent samples (if two independent groups of objects or observations are compared) or the t-test for dependent samples (if two variables are compared on the same set of objects or observations). If you are not familiar with these tests, we recommend that you refer to the introductory overview in Chapter 9.

Where did the name Analysis of Variance come from? It may seem strange that a procedure for comparing means is called analysis of variance. In reality, this is because when we examine the statistical significance of differences between means, we are actually analyzing variances.

Partitioning the sum of squares

For a sample of size n, the sample variance is calculated as the sum of squared deviations from the sample mean divided by n-1 (the sample size minus one). Thus, for a fixed sample size n, the variance is a function of the sum of squares (of deviations), denoted for brevity SS (from the English Sum of Squares). The basis of analysis of variance is the separation (or partitioning) of the variance into parts. Consider the following data set:

Group 1: 2, 3, 1
Group 2: 6, 7, 5

The means of the two groups are significantly different (2 and 6, respectively). The sum of squared deviations inside each group is equal to 2. Adding them up, we get 4. If we now repeat these calculations ignoring group membership, that is, if we calculate SS based on the overall mean of the two samples, we get 28. In other words, the variance (sum of squares) based on within-group variability results in a much smaller value than when it is calculated based on the overall variability (relative to the overall mean). The reason for this is obviously the significant difference between the means, and this difference between the means explains the existing difference between the sums of squares. In fact, if we use the Analysis of Variance module to analyze the given data, the following results are obtained:

As can be seen from the table, the total sum of squares SS = 28 is partitioned into the sum of squares due to within-group variability (2 + 2 = 4; see the second row of the table) and the sum of squares due to the difference between the mean values (28 - (2 + 2) = 24; see the first row of the table).
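This partition, and the resulting F ratio, can be reproduced in a few lines of Python (NumPy and SciPy); the data are the six values given above:

    import numpy as np
    from scipy import stats

    g1 = np.array([2.0, 3.0, 1.0])
    g2 = np.array([6.0, 7.0, 5.0])
    both = np.concatenate([g1, g2])

    ss_within = ((g1 - g1.mean())**2).sum() + ((g2 - g2.mean())**2).sum()  # 4.0
    ss_total  = ((both - both.mean())**2).sum()                            # 28.0
    ss_effect = ss_total - ss_within                                       # 24.0

    ms_effect = ss_effect / 1            # 2 groups -> 1 degree of freedom
    ms_error  = ss_within / (len(both) - 2)
    print(ms_effect / ms_error)          # F = 24.0

    # Cross-check with SciPy's one-way ANOVA
    print(stats.f_oneway(g1, g2))        # F = 24.0, p ~ 0.008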

SS error and SS effect. The within-group variability (SS error) is usually called the error variance. This means that it usually cannot be predicted or explained when an experiment is performed. On the other hand, SS effect (the between-group variability) can be explained by differences between the means of the studied groups. In other words, belonging to a certain group explains the intergroup variability, because we know that these groups have different means.

Significance testing. The basic ideas of statistical significance testing are discussed in the chapter Basic concepts of statistics (Chapter 8). That chapter also explains the reasons why many tests use the ratio of explained to unexplained variance; analysis of variance itself is an example of this use. Significance testing in ANOVA is based on comparing the variance due to between-group variability (called the mean square effect, or MSeffect) with the variance due to within-group variability (called the mean square error, or MSerror). If the null hypothesis (equality of the means in the two populations) is true, one would expect a relatively small difference in the sample means due to purely random variation. Therefore, under the null hypothesis, the within-group variance will practically coincide with the total variance calculated without taking group membership into account. The resulting variances can be compared using the F test, which checks whether the ratio of the variances is significantly greater than 1. In the example discussed above, the F test shows that the difference between the means is statistically significant.

Basic logic of analysis of variance. To summarize, the purpose of analysis of variance is to test the statistical significance of differences between means (for groups or variables). This check is carried out by means of variance analysis, i.e., by dividing the total variance (variation) into parts, one of which is due to random error (that is, within-group variability) and the other is associated with differences between the mean values. The latter variance component is then used to analyze the statistical significance of the difference between the means. If this difference is significant, the null hypothesis is rejected and the alternative hypothesis, that a difference between the means exists, is accepted.

Dependent and independent variables. Variables whose values are determined by measurements during an experiment (for example, a test score) are called dependent variables. Variables that can be controlled in the experiment (for example, teaching methods or other criteria for dividing observations into groups) are called factors, or independent variables. These concepts are described in more detail in the chapter Basic concepts of statistics (Chapter 8).

Multivariate analysis of variance

In the simple example above, you could immediately calculate the t-test for independent samples using the appropriate option of the Basic Statistics and Tables module. The results obtained will naturally coincide with the results of the analysis of variance. However, analysis of variance contains flexible and powerful techniques that can be used for much more complex studies.

Many factors. The world is complex and multidimensional in nature. Situations where a certain phenomenon is completely described by a single variable are extremely rare. For example, if we are trying to learn how to grow large tomatoes, we should consider factors related to the plant's genetic structure, the soil type, light, temperature, etc. Thus, in a typical experiment one has to deal with a large number of factors. The main reason why using analysis of variance is preferable to repeated comparison of two samples at different factor levels using the t-test is that analysis of variance is substantially more effective and, for small samples, more informative.

Factor management. Suppose that in the two-sample analysis example discussed above we add another factor, for example Gender. Let each group consist of 3 men and 3 women. The design of this experiment can be presented in the form of a 2 by 2 table:

            Exp. Group 1   Exp. Group 2
Men              2               6
                 3               7
                 1               5
Average          2               6
Women            4               8
                 5               9
                 3               7
Average          4               8

Before doing the calculations, you can notice that in this example the total variance has at least three sources:

(1) random error (within-group variability),

(2) variability associated with experimental group membership, and

(3) variability due to the gender of the objects of observation.

(Note that there is another possible source of variability, the interaction of factors, which we will discuss later.) What happens if we do not include Gender as a factor in the analysis and calculate the usual t-test? If we calculate the sums of squares ignoring gender (i.e., combining objects of different sexes into one group when calculating the within-group variance, thereby obtaining a sum of squares for each group equal to SS = 10 and a total sum of squares SS = 10 + 10 = 20), then we obtain a larger value of the within-group variance than with the more accurate analysis with additional division into subgroups by gender (in this case the within-group sum of squares equals 2 for each of the four subgroups, and the total within-group sum of squares equals SS = 2 + 2 + 2 + 2 = 8). This difference is due to the fact that the average value for men is less than the average value for women, and this difference in means increases the overall within-group variability when gender is not taken into account. Controlling the error variance increases the sensitivity (power) of the test.

This example shows another advantage of analysis of variance compared to the ordinary two-sample t-test. Analysis of variance allows each factor to be studied while controlling the values of the remaining factors; this is, in fact, the main reason for its greater statistical power (smaller sample sizes are required to obtain meaningful results). For this reason, analysis of variance, even on small samples, gives statistically more significant results than the simple t-test.
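A sketch of this comparison in Python using statsmodels (the data are the twelve values from the table above; the formula-based API is one common way to fit such models):

    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    df = pd.DataFrame({
        "score":  [2, 3, 1, 6, 7, 5, 4, 5, 3, 8, 9, 7],
        "group":  ["g1"] * 3 + ["g2"] * 3 + ["g1"] * 3 + ["g2"] * 3,
        "gender": ["m"] * 6 + ["f"] * 6,
    })

    # Ignoring gender: error SS = 20
    print(anova_lm(smf.ols("score ~ C(group)", df).fit()))

    # Controlling for gender: error SS drops to 8 and F for group grows
    print(anova_lm(smf.ols("score ~ C(group) + C(gender)", df).fit()))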

Interaction Effects

There is another advantage of analysis of variance over the ordinary t-test: it allows interactions between factors to be detected and therefore makes it possible to study more complex models. To illustrate, consider another example.

Main effects and pairwise (two-factor) interactions. Suppose there are two groups of students: psychologically, the students of the first group are determined to complete their assigned tasks and are more purposeful than the students of the second group, which consists of lazier students. Let us randomly split each group in half and give one half of each group a difficult task and the other half an easy one. We then measure how hard the students work on these tasks. The averages for this (fictitious) study are shown in the table:

What conclusion can be drawn from these results? Can we conclude that (1) students work harder on a complex task, and (2) motivated students work harder than lazy ones? Neither of these statements captures the essentially systematic nature of the means shown in the table. Analyzing the results, it would be more correct to say that only motivated students work harder on difficult tasks, while only lazy students work harder on easy tasks. In other words, the character of the students and the difficulty of the task interact in their influence on the effort expended. This is an example of a pairwise interaction between the character of the students and the difficulty of the task. Note that statements 1 and 2 describe main effects.

Higher-order interactions. While pairwise interactions are still relatively easy to explain, higher-order interactions are much more difficult. Let us imagine that in the example considered above another factor, Gender, is introduced, and we obtain the following table of averages:

What conclusions can now be drawn from the results obtained? Plots of means make it easy to interpret complex effects. The analysis of variance module allows these plots to be built with almost one click of the mouse.

The image in the graphs below represents the three-factor interaction being studied.

Looking at the graphs, we can say that for women there is an interaction between character and task difficulty: motivated women work harder on a difficult task than on an easy one. For men the same interaction is reversed. It can be seen that the description of the interaction between factors becomes more involved.

A general way to describe interactions. In general, an interaction between factors is described as a change in one effect under the influence of another. In the example discussed above, the two-factor interaction can be described as a change in the main effect of the factor characterizing task difficulty under the influence of the factor describing the student's character. For the interaction of the three factors in the previous paragraph, we can say that the interaction of two factors (task difficulty and student character) changes under the influence of Gender. If an interaction of four factors is studied, we can say that the interaction of three factors changes under the influence of the fourth factor, that is, there are different types of interactions at different levels of the fourth factor. It turns out that in many areas interactions of five or even more factors are not unusual.

Complex designs

Between-group and within-group designs (repeated measures designs)

When comparing two different groups, the t-test for independent samples (from the Basic Statistics and Tables module) is usually used. When two variables are compared on the same set of objects (observations), the t-test for dependent samples is used. For analysis of variance it is likewise important whether or not the samples are dependent. If there are repeated measurements of the same variables (under different conditions or at different times) on the same objects, one speaks of the presence of a repeated measures factor (also called a within-group factor, since the within-group sum of squares is calculated to assess its significance). If different groups of objects are compared (for example, men and women, or three strains of bacteria), the difference between the groups is described by a between-group factor. The methods for calculating the significance criteria for the two described types of factors differ, but their general logic and interpretation are the same.

Between-group and within-group designs. In many cases the experiment requires including both a between-group factor and a repeated measures factor in the design. For example, the math skills of female and male students are measured (Gender being a between-group factor) at the beginning and at the end of the semester. The two measurements of each student's skills form a within-group factor (a repeated measures factor). The interpretation of main effects and interactions for between-group and repeated measures factors is the same, and both types of factors can obviously interact with each other (for example, women gain skills over the course of the semester, while men lose them).

Incomplete (nested) designs

In many cases the interaction effect can be neglected. This occurs either when it is known that there is no interaction effect in the population, or when implementing a complete factorial design is impossible. For example, suppose the effect of four fuel additives on fuel consumption is being studied; four cars and four drivers are selected. A full factorial experiment requires that each combination of additive, driver, and car appear at least once. This requires at least 4 x 4 x 4 = 64 groups of tests, which is too time-consuming. In addition, there is unlikely to be any interaction between the driver and the fuel additive. Taking this into account, one can use a Latin square design, which contains only 16 groups of tests (the four additives are denoted by the letters A, B, C, and D). One possible square, with rows corresponding to drivers and columns to cars, is:

Driver 1: A B C D
Driver 2: B C D A
Driver 3: C D A B
Driver 4: D A B C

Latin squares are described in most books on experimental design (e.g., Hays, 1988; Lindman, 1974; Milliken and Johnson, 1984; Winer, 1962) and will not be discussed in detail here. Note that Latin squares are incomplete designs in which not all combinations of factor levels are included. For example, driver 1 drives car 1 only with additive A, and driver 3 drives car 1 only with additive C. The levels of the additive factor (A, B, C, and D) are nested in the cells of the car x driver table, like eggs in nests. This mnemonic is useful for understanding the nature of nested designs. The Analysis of Variance module provides simple ways to analyze designs of this type.
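For completeness, a tiny Python sketch that generates a cyclic 4 x 4 Latin square of the kind shown above (rows as drivers, columns as cars):

    # A cyclic 4 x 4 Latin square: rows = drivers, columns = cars,
    # letters = fuel additives; each additive appears exactly once
    # in every row and in every column
    additives = "ABCD"
    square = [[additives[(r + c) % 4] for c in range(4)] for r in range(4)]
    for row in square:
        print(" ".join(row))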

Covariance Analysis

Main idea

The chapter Key Ideas briefly discussed the idea of factor control and how including additional factors reduces the sum of squared errors and increases the statistical power of the design. All of this can be extended to variables with a continuous range of values. When such continuous variables are included as factors in a design, they are called covariates.

Fixed covariates

Suppose we are comparing the math skills of two groups of students who were taught from two different textbooks. Suppose also that intelligence quotient (IQ) data are available for each student. One can assume that IQ is related to math skills and use that information: for each of the two groups of students, the correlation coefficient between IQ and math skills can be computed. Using this correlation coefficient, the proportion of variance in the groups explained by IQ can be separated from the unexplained proportion of variance (see also Basic concepts of statistics (Chapter 8) and Basic statistics and tables (Chapter 9)). The remaining portion of the variance is used in the analysis as error variance. If there is a correlation between IQ and math skills, the error variance can be substantially reduced.

Impact of covariates on the F test. The F test evaluates the statistical significance of the difference between group means by computing the ratio of the between-groups mean square (MS_effect) to the error mean square (MS_error). If MS_error decreases, for example when the IQ factor is taken into account, the value of F increases.
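A small simulation makes this mechanism concrete. The sketch below (Python with statsmodels; all names and numbers are invented for illustration) fits the same group comparison with and without a covariate and prints both ANOVA tables, so the drop in the error mean square and the growth of F can be seen directly:

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 40
group = np.repeat(["book1", "book2"], n // 2)
iq = rng.normal(100, 15, n)
# Skills depend on the textbook and on IQ, plus noise
skills = 2.0 * (group == "book2") + 0.3 * iq + rng.normal(0, 3, n)
df = pd.DataFrame({"group": group, "iq": iq, "skills": skills})

plain = ols("skills ~ C(group)", df).fit()        # no covariate
ancova = ols("skills ~ C(group) + iq", df).fit()  # IQ as a fixed covariate
print(anova_lm(plain))          # large residual mean square, small F for group
print(anova_lm(ancova, typ=2))  # IQ absorbs error variance, F for group grows
```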

Multiple covariates. The reasoning used above for a single covariate (IQ) easily extends to multiple covariates. For example, in addition to IQ, one can include measurements of motivation, spatial reasoning, and so on. Instead of the usual correlation coefficient, the multiple correlation coefficient is then used.

When the value of the F test decreases. Sometimes introducing covariates into an experimental design reduces the significance of the F test. This typically indicates that the covariates are correlated not only with the dependent variable (e.g., math skills) but also with the factors (e.g., the different textbooks). Suppose that IQ is measured at the end of the semester, after almost a year of teaching the two groups of students from two different textbooks. Although the students were assigned to groups randomly, it may turn out that the textbook differences are so great that both IQ and math skills vary greatly between the groups. In this case, the covariates reduce not only the error variance but also the between-groups variance. In other words, after controlling for the differences in IQ between the groups, the differences in math skills are no longer significant. Put differently: after "partialling out" the influence of IQ, the influence of the textbook on the development of mathematical skills is unintentionally excluded as well.

Adjusted means. When a covariate is confounded with the between-groups factor, one should compute adjusted means, i.e., the group means obtained after the covariate's influence has been removed.

Interactions between covariates and factors. Just as interactions between factors can be examined, so can interactions between covariates and factors. Suppose one of the textbooks is especially suitable for smart students, while the second textbook is boring for smart students and too difficult for the less smart ones. The result is a positive correlation between IQ and learning outcome in the first group (smarter students, better results) and a zero or slightly negative correlation in the second group (the smarter the student, the less likely he or she is to acquire mathematical skills from the second textbook). Some studies discuss this situation as an example of a violation of the assumptions of analysis of covariance. However, because the Analysis of variance module uses the most general methods of analysis of covariance, it is possible, in particular, to evaluate the statistical significance of the interaction between factors and covariates.

Variable covariates

While fixed covariates are discussed quite often in textbooks, variable covariates are mentioned much less frequently. Typically, when conducting experiments with repeated measurements, we are interested in differences in measurements of the same quantities at different points in time. Namely, we are interested in the significance of these differences. If covariates are measured simultaneously with measurements of the dependent variables, the correlation between the covariate and the dependent variable can be calculated.

For example, math interest and math skills could be explored at the beginning and end of the semester. It would be interesting to test whether changes in interest in mathematics are correlated with changes in mathematics skills.

The Analysis of variance module in STATISTICA automatically assesses the statistical significance of changes in covariates in those designs where this is possible.

Multivariate designs: multivariate analysis of variance and covariance

Intergroup plans

All of the examples discussed previously included only one dependent variable. When there are several dependent variables at the same time, only the complexity of the calculations increases, but the content and basic principles do not change.

For example, a study compares two different textbooks while measuring students' success in both physics and mathematics. In this case there are two dependent variables, and one needs to find out how the two textbooks influence them simultaneously. To do this, multivariate analysis of variance (MANOVA) can be used. Instead of the univariate F test, a multivariate F test (Wilks' lambda test) is used, based on a comparison of the error covariance matrix and the between-groups covariance matrix.
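In Python this comparison could be run with the MANOVA class from statsmodels, as in the rough sketch below (the grades are invented for illustration; mv_test() reports Wilks' lambda among other multivariate statistics):

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical grades of students taught from two textbooks
df = pd.DataFrame({
    "textbook": ["A"] * 5 + ["B"] * 5,
    "math":    [72, 75, 70, 78, 74, 81, 85, 79, 83, 80],
    "physics": [68, 71, 66, 74, 70, 77, 80, 75, 79, 76],
})
mv = MANOVA.from_formula("math + physics ~ textbook", data=df)
print(mv.mv_test())  # includes Wilks' lambda for the textbook effect
```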

If the dependent variables are correlated with each other, this correlation should be taken into account when computing the significance test. Obviously, if the same measurement is repeated twice, nothing new is obtained. If a correlated measurement is added to an existing one, some new information is gained, but the new variable also contains redundant information, which is reflected in the covariance between the variables.

Interpretation of results. If the overall multivariate test is significant, we can conclude that the corresponding effect (e.g., textbook type) is significant. However, the following questions arise. Does textbook type affect improvement in math skills only, in physics skills only, or in both? In fact, after obtaining a significant multivariate test, the univariate F tests for the individual main effects or interactions are examined. In other words, the dependent variables that contribute to the significance of the multivariate test are examined separately.

Repeated Measures Designs

If students' math and physics skills are measured at the beginning of the semester and at the end, then these are repeated measures. The study of the significance criterion in such plans is a logical development of the one-dimensional case. Note that multivariate analysis of variance techniques are also commonly used to examine the significance of univariate repeated measures factors having more than two levels. The corresponding applications will be discussed later in this part.

Summation of variable values and multivariate analysis of variance

Even experienced users of univariate and multivariate analysis of variance are often puzzled that applying multivariate analysis of variance to, say, three variables and applying univariate analysis of variance to the sum of those three variables (treated as a single variable) produce different results.

The idea behind summing variables is that each variable contains a true component, which is the object of study, as well as random measurement error. Therefore, when the values of the variables are averaged, the measurement error will be closer to 0 across all measurements, and the averaged values will be more reliable. In that case, applying ANOVA to the sum of the variables is a reasonable and powerful technique. However, if the dependent variables are multivariate in nature, summing their values is inappropriate.

For example, let the dependent variables consist of four indicators of success in society. Each indicator characterizes a completely independent aspect of human activity (for example, professional success, success in business, family well-being, etc.). Adding these variables is like adding apples and oranges: the sum would not be an appropriate unidimensional measure. Therefore, such data must be treated as multivariate indicators in multivariate analysis of variance.

Contrast analysis and post hoc tests

Why are separate sets of averages compared?

Typically, hypotheses about experimental data are formulated not simply in terms of main effects or interactions. An example is this hypothesis: a certain textbook improves math skills only in male students, while another textbook is approximately equally effective for both sexes, though still less effective for males. It can be predicted that textbook effectiveness interacts with student gender. However, this prediction also concerns the nature of the interaction: a significant difference between genders is expected for students using one book, and virtually gender-independent results for students using the other book. This type of hypothesis is usually examined using contrast analysis.

Analysis of Contrasts

In short, contrast analysis allows one to evaluate the statistical significance of particular linear combinations of complex effects. Contrast analysis is a principal and indispensable element of any complex ANOVA design. The Analysis of variance module has a wide variety of contrast analysis capabilities that allow you to isolate and analyze any type of comparison of means.

A posteriori comparisons

Sometimes processing an experiment reveals an unexpected effect. Although in most cases a creative researcher will be able to explain any result, such an explanation allows no further analysis or predictive estimates. This problem is one of those solved using a posteriori criteria, that is, criteria that do not rely on a priori hypotheses. To illustrate, consider the following experiment. Suppose there are 100 cards containing numbers from 1 to 10. Putting all these cards in a hat, we randomly draw 5 cards 20 times, and compute the mean (of the numbers written on the cards) for each sample. Can we expect two samples whose means differ significantly? This is very plausible! By selecting the two samples with the maximum and minimum means, one can obtain a difference in means quite unlike the difference between, say, the first two samples. This difference could be explored, for example, using contrast analysis. Without going into details, there are several so-called a posteriori criteria that are based on exactly the first scenario (taking the extreme means of 20 samples), i.e., criteria based on choosing the most different means in order to compare all means in the design. These criteria are applied to avoid declaring an artificial effect obtained purely by chance, i.e., to avoid detecting a significant difference between means when there is none. The Analysis of variance module offers a wide range of such criteria. When unexpected results are encountered in an experiment involving several groups, a posteriori procedures are used to examine the statistical significance of those results.

Type I, II, III, and IV sums of squares

Multivariate regression and analysis of variance

There is a close relationship between multivariate regression and analysis of variance (ANOVA). In both methods a linear model is studied, and in fact almost all experimental designs can be examined using multivariate regression. Consider the following simple between-groups 2 x 2 design.

D.V. A B AxB
3 1 1 1
4 1 1 1
4 1 -1 -1
5 1 -1 -1
6 -1 1 -1
6 -1 1 -1
3 -1 -1 1
2 -1 -1 1

Columns A and B contain codes characterizing the levels of factors A and B; the AxB column contains the product of columns A and B. We can analyze these data using multivariate regression, defining the variable D.V. as the dependent variable and the variables A through AxB as independent variables. The significance tests for the regression coefficients will coincide with the ANOVA significance tests for the main effects of factors A and B and the interaction effect AxB.
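This equivalence is easy to verify. The sketch below (Python with statsmodels) regresses D.V. on the coded columns from the table above; the t-tests of the coefficients for A, B, and AxB correspond to the ANOVA tests of the two main effects and the interaction:

```python
import numpy as np
import statsmodels.api as sm

# The coded 2 x 2 data from the table above
dv = np.array([3, 4, 4, 5, 6, 6, 3, 2], dtype=float)
a  = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
b  = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
ab = a * b  # the interaction column is the product of columns A and B

X = sm.add_constant(np.column_stack([a, b, ab]))
fit = sm.OLS(dv, X).fit()
print(fit.summary())  # coefficient t-tests match the ANOVA effect tests
```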

Unbalanced and balanced plans

When computing the correlation matrix for all variables, such as the data shown above, you will notice that the main effects of factors A and B and the interaction effect AxB are uncorrelated. This property of effects is also called orthogonality: the effects A and B are said to be orthogonal, or independent of each other. If all the effects in a design are orthogonal to each other, as in the example above, the design is said to be balanced.

Balanced designs have a "good property": the calculations for analyzing them are very simple. All calculations reduce to computing the correlations between effects and dependent variables. Since the effects are orthogonal, partial correlations (as in full multivariate regression) need not be computed. However, in real life designs are not always balanced.

Let's consider real data with an unequal number of observations in cells.

            Factor B
Factor A    B1         B2
A1          3          4, 5
A2          6, 6, 7    2

If we code these data as above and compute the correlation matrix for all variables, we find that the design factors are correlated with each other. The factors are no longer orthogonal, and such designs are called unbalanced. Note that in this example the correlation between the factors is entirely due to the unequal frequencies of 1 and -1 in the columns of the data matrix. In other words, experimental designs with unequal cell sizes (more precisely, disproportionate sizes) will be unbalanced, meaning that main effects and interactions are confounded. In this case, the full multivariate regression must be computed to assess the statistical significance of the effects. There are several strategies for doing so.

Type I, II, III, and IV sums of squares

Type I and Type III sums of squares. To examine the significance of each factor in a multivariate model, the partial correlation of each factor can be computed, given that all other factors are already accounted for in the model. Alternatively, factors can be entered into the model in a stepwise manner, controlling for all factors already entered and ignoring all others. In general, this is the difference between Type III and Type I sums of squares (this terminology was introduced in SAS; see, for example, SAS, 1982; detailed discussion can also be found in Searle, 1987, p. 461; Woodward, Bonett, and Brecht, 1990, p. 216; or Milliken and Johnson, 1984, p. 138).

Type II sums of squares. The next, "intermediate," model-building strategy consists of: controlling for all main effects when examining the significance of an individual main effect; controlling for all main effects and all pairwise interactions when examining the significance of an individual pairwise interaction; controlling for all main effects, all pairwise interactions, and all three-factor interactions when examining an individual three-factor interaction; and so on. The sums of squares for effects computed in this way are called Type II sums of squares. Thus, Type II sums of squares control for all effects of the same and lower order while ignoring all higher-order effects.
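For a concrete feel for how the types differ, the sketch below (Python with statsmodels, which implements these conventions) computes Type I, II, and III tables for the unbalanced 2 x 2 data shown earlier; note that Type III requires sum-to-zero coding of the factors:

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Unbalanced 2 x 2 data from the table above (unequal cell counts)
df = pd.DataFrame({
    "A": ["A1", "A1", "A1", "A2", "A2", "A2", "A2"],
    "B": ["B1", "B2", "B2", "B1", "B1", "B1", "B2"],
    "y": [3, 4, 5, 6, 6, 7, 2],
})
model = ols("y ~ C(A) * C(B)", df).fit()
print(anova_lm(model, typ=1))  # Type I: sequential sums of squares
print(anova_lm(model, typ=2))  # Type II: controls same- and lower-order effects

# Type III with sum-to-zero (deviation) coding of the factors
m3 = ols("y ~ C(A, Sum) * C(B, Sum)", df).fit()
print(anova_lm(m3, typ=3))
```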

Type IV sums of squares. Finally, for some special designs with missing cells (incomplete designs), so-called Type IV sums of squares can be computed. This method is discussed later in connection with incomplete designs (designs with missing cells).

Interpretation of the Type I, II, and III sums of squares hypotheses

Type III sums of squares are the easiest to interpret. Recall that Type III sums of squares examine effects after controlling for all other effects. For example, after finding a statistically significant Type III effect for factor A in the Analysis of variance module, we can say that there is a unique significant effect of factor A after all other effects (factors) have been introduced, and interpret it accordingly. In probably 99% of all ANOVA applications, this is the type of test the researcher is interested in. Type III sums of squares are computed in the Analysis of variance module by default, regardless of whether the Regression approach option is selected (the standard approaches adopted in the module are discussed below).

Significant effects obtained with Type I or Type II sums of squares are not so easy to interpret. They are best interpreted in the context of stepwise multivariate regression. If, using Type I sums of squares, the main effect of factor B is significant (after factor A has been entered into the model, but before the A x B interaction is added), we can conclude that there is a significant main effect of factor B, provided there is no interaction between factors A and B. (If, using the Type III criterion, factor B also turns out to be significant, then we can conclude that there is a significant main effect of factor B after all other factors and their interactions have been entered into the model.)

In terms of marginal means, Type I and Type II hypotheses usually do not have a simple interpretation. In these cases, the significance of effects cannot be interpreted by looking only at the marginal means: the means involved relate to a complex hypothesis that combines means and sample sizes. For example, the Type II hypotheses for factor A in the simple 2 x 2 design discussed earlier would be (see Woodward, Bonett, and Brecht, 1990, p. 219):

where n_ij is the number of observations in a cell, μ_ij is the mean value in a cell, and n_.j is the marginal mean.

Without going into detail (for more, see Milliken and Johnson, 1984, Chapter 10), it is clear that these are not simple hypotheses, and in most cases none of them is of particular interest to the researcher. However, there are cases when Type I hypotheses may be of interest.

The default computational approach in the Analysis of variance module

By default, if the Regression approach option is not checked, the Analysis of variance module uses the cell means model. A characteristic of this model is that the sums of squares for the various effects are computed from linear combinations of cell means. In a full factorial experiment this produces sums of squares identical to the Type III sums of squares discussed earlier. However, with the Planned comparisons option (in the ANOVA results window), the user can test a hypothesis about any linear combination of weighted or unweighted cell means. Thus the user can test not only Type III hypotheses but hypotheses of any type (including Type IV). This general approach is especially useful when examining designs with missing cells (so-called incomplete designs).

For full factorial designs, this approach is also useful when one wants to analyze weighted marginal means. For example, suppose that in the simple 2 x 2 design considered earlier we need to compare the marginal means for factor A weighted by the levels of factor B. This is useful when the distribution of observations over the cells was not arranged by the experimenter but arose randomly, and this randomness is reflected in the distribution of the numbers of observations over the levels of factor B.

For example, consider a factor: the age of widows. The sample of respondents is divided into two groups: under 40 and over 40 (factor B). The second factor (factor A) in the design is whether or not the widows received social support from some agency (some widows were randomly selected to receive support; others served as controls). In this case, the age distribution of widows in the sample reflects the actual age distribution of widows in the population. An assessment of the effectiveness of social support for widows of all ages therefore corresponds to the weighted mean over the two age groups (with weights proportional to the numbers of observations in the groups).

Planned comparisons

Note that the sum of the entered contrast coefficients need not equal 0 (zero). Instead, the program automatically adjusts the coefficients so that the corresponding hypotheses are not confounded with the overall mean.

To illustrate, let's go back to the simple 2 x 2 design discussed earlier. Recall that the cell counts of this unbalanced design are 1, 2, 3, and 1. Suppose we want to compare the weighted marginal means for factor A (weighted by the frequencies of the levels of factor B). The contrast coefficients can be entered as:

1 2 -3 -1

Note that these coefficients do not sum to 0. The program will rescale them so that they sum to 0 while preserving their relative values, i.e.:

1/3 2/3 -3/4 -1/4

These contrasts will compare weighted means for factor A.
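One plausible reading of this rescaling rule, consistent with the numbers above, is that positive coefficients are divided by their sum and negative coefficients by the absolute value of their sum; this is an assumption about the module's exact behavior, sketched here in Python:

```python
import numpy as np

def rescale_contrast(c):
    """Rescale contrast coefficients as described above (assumed rule):
    positive entries are divided by their sum, negative entries by the
    absolute value of their sum, so the result sums to 0 while the
    relative sizes within each side are preserved."""
    c = np.asarray(c, dtype=float)
    pos, neg = c[c > 0].sum(), -c[c < 0].sum()
    return np.where(c > 0, c / pos, np.where(c < 0, c / neg, 0.0))

print(rescale_contrast([1, 2, -3, -1]))  # -> [ 0.333  0.667 -0.75  -0.25 ]
```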

Hypotheses about the grand mean. The hypothesis that the unweighted grand mean is 0 can be explored using the coefficients:

1/4 1/4 1/4 1/4

The hypothesis that the weighted grand mean is 0 is tested using:

1/7 2/7 3/7 1/7

In neither case does the program adjust the contrast coefficients.

Analysis of plans with missing cells (incomplete plans)

Factorial designs that contain empty cells (combinations of treatment levels with no observations) are called incomplete. In such designs, some factors are usually not orthogonal and some interactions cannot be computed. Generally speaking, there is no single best method for analyzing such designs.

Regression approach

In some older programs that rely on analyzing ANOVA designs using multivariate regression, factors in incomplete designs are specified by default as usual (as if the design were complete). Multivariate regression analyzes are then performed on these dummy coded factors. Unfortunately, this method produces results that are very difficult, if not impossible, to interpret because it is unclear how each effect contributes to the linear combination of means. Consider the following simple example.

            Factor B
Factor A    B1         B2
A1          3          4, 5
A2          6, 6, 7    Missing

If we perform multivariate regression of the form Dependent Variable = Constant + Factor A + Factor B, then the hypothesis about the significance of factors A and B in terms of linear combinations of means looks like this:

Factor A: Cell A1,B1 = Cell A2,B1

Factor B: Cell A1,B1 = Cell A1,B2

This case is simple. In more complex designs, it is impossible to actually determine what exactly will be examined.

Cell means, the ANOVA approach, and Type IV hypotheses

The approach recommended in the literature, and which seems preferable, is to study meaningful (in terms of the research questions) a priori hypotheses about the means observed in the cells of the design. A detailed discussion of this approach can be found in Dodge (1985), Heiberger (1989), Milliken and Johnson (1984), Searle (1987), or Woodward, Bonett, and Brecht (1990). Sums of squares associated with hypotheses about linear combinations of means in incomplete designs, which examine estimates of only part of the effects, are also called Type IV sums of squares.

Automatic generation of Type IV hypotheses. When multivariate designs have complex patterns of missing cells, it is desirable to define orthogonal (independent) hypotheses whose testing is equivalent to testing main effects or interactions. Algorithmic (computational) strategies (based on the pseudo-inverse of the design matrix) have been developed to generate appropriate weights for such comparisons. Unfortunately, the resulting hypotheses are not uniquely defined: they depend on the order in which the effects were identified and rarely admit a simple interpretation. Therefore it is recommended to study the pattern of missing cells carefully, then formulate the Type IV hypotheses that most meaningfully match the objectives of the study, and then test those hypotheses using the Planned comparisons option in the results window. The easiest way to specify comparisons in this case is to enter a vector of contrasts for all factors together in the Planned comparisons window. After the Planned comparisons dialog is opened, all groups in the current design are shown, and the missing ones are marked.

Missing cells and tests of specific effects

There are several types of designs in which the locations of the missing cells are not random but carefully planned, allowing simple analysis of main effects without affecting other effects. For example, when the required number of cells is not available, Latin square designs are often used to estimate the main effects of several factors with many levels. A 4 x 4 x 4 x 4 factorial design requires 256 cells, whereas a Greco-Latin square allows the main effects to be estimated with only 16 cells in the design (the chapter Experiment planning, Volume IV, contains a detailed description of such designs). Incomplete designs in which the main effects (and some interactions) can be estimated using simple linear combinations of means are called balanced incomplete designs.

In balanced designs, the standard (default) method of generating contrasts (weights) for main effects and interactions produces an analysis of variance table in which the sums of squares for the corresponding effects are not confounded with one another. The Specific effects option of the results window generates the missing contrasts, writing zeros into the missing cells of the design. Immediately after the Specific effects option is requested for some hypothesis, a results table with the actual weights appears. Note that in a balanced design the sums of squares of the corresponding effects are computed only if those effects are orthogonal (independent) to all other main effects and interactions. Otherwise, the Planned comparisons option must be used to explore meaningful comparisons between means.

Missing cells and pooled effects/error terms

If the Regression approach option on the start panel of the Analysis of variance module is not selected, the cell means model is used when computing the sums of squares for effects (the default setting). If the design is not balanced, then combining non-orthogonal effects (see the discussion of missing cells and specific effects above) can yield sums of squares consisting of non-orthogonal (overlapping) components. The results obtained this way are usually not interpretable. Therefore one must be very careful when selecting and implementing complex incomplete experimental designs.

There are many books that discuss the various types of designs in detail (Dodge, 1985; Heiberger, 1989; Lindman, 1974; Milliken and Johnson, 1984; Searle, 1987; Woodward and Bonett, 1990), but this information is beyond the scope of this textbook. However, the analysis of different types of designs will be demonstrated later in this section.

Assumptions and the effects of violating assumptions

Deviation from the assumption of normal distributions

Suppose the dependent variable is measured on a numerical scale, and assume that it is normally distributed within each group. The Analysis of variance module contains a wide range of graphs and statistics for checking this assumption.

Effects of violations. Overall, the F test is very robust to deviations from normality (see Lindman, 1974, for detailed results). If the kurtosis is greater than 0, the value of the F statistic may become very small, and the null hypothesis may be accepted even though it is false. The situation is reversed when the kurtosis is less than 0. Skewness of the distribution usually has little effect on the F statistic. If the number of observations per cell is large enough, deviations from normality are not particularly important, thanks to the central limit theorem, according to which the distribution of the mean is close to normal regardless of the initial distribution. A detailed discussion of the robustness of the F statistic can be found in Box and Anderson (1955) or Lindman (1974).
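This robustness is easy to probe by simulation. The sketch below (Python with SciPy; the sample sizes and the choice of an exponential population are arbitrary) repeatedly runs a one-way ANOVA on three groups drawn from the same strongly skewed population, so the empirical rejection rate should stay near the nominal α = 0.05:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sim, alpha = 5000, 0.05
rejections = 0
for _ in range(n_sim):
    # Three groups drawn from the SAME skewed (exponential) population,
    # so the null hypothesis of equal means is true by construction
    g1, g2, g3 = (rng.exponential(scale=2.0, size=15) for _ in range(3))
    _, p = stats.f_oneway(g1, g2, g3)
    rejections += p < alpha
print(f"empirical type I error: {rejections / n_sim:.3f}")  # near 0.05
```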

Uniformity of variance

Assumptions. It is assumed that the variances of the different groups in the design are the same. This is called the homogeneity of variance assumption. Recall that at the beginning of this section, when describing the calculation of the sum of squared errors, we summed within each group. If the variances of two groups differ from each other, adding them is not very natural and does not yield an estimate of the common within-group variance (since in that case no common variance exists at all). The Analysis of variance (ANOVA/MANOVA) module contains a large set of statistical tests for detecting violations of the homogeneity of variance assumption.

Effects of violations. Lindman (1974, p. 33) shows that the F test is quite robust to violations of the homogeneity of variance assumption (heterogeneity of variance; see also Box, 1954a, 1954b; Hsu, 1938).

Special case: correlation between means and variances. There are times when the F statistic can mislead. This happens when the cell means in the design are correlated with the variances. The Analysis of variance module allows you to plot scatterplots of the variance or standard deviation against the mean to detect such correlation. The reason this correlation is dangerous is as follows. Imagine there are 8 cells in the design, 7 of which have almost the same mean while one cell has a much larger mean. Then the F test may detect a statistically significant effect. But suppose that in the cell with the large mean the variance is also much larger, i.e., the cell means and variances are dependent (the larger the mean, the larger the variance). In this case the large mean is unreliable, since it may be driven by the large variance of the data. Nevertheless, the F statistic, based on the pooled within-cell variance, will pick up the large mean, even though tests based on each cell's own variance would not consider all the differences between the means significant.

This type of data (large mean and large variance) often occurs when there are outlier observations. One or two outlier observations greatly shift the mean and greatly increase the variance.

Homogeneity of Variance and Covariance

Assumptions. Multivariate designs with multivariate dependent measures also involve the homogeneity of variance assumption described earlier. However, since there are multiple dependent variables, it is also required that their cross-correlations (covariances) be homogeneous across all cells of the design. The Analysis of variance module offers various ways of testing these assumptions.

Effects of violations. The multivariate analogue of the F test is Wilks' λ test. Not much is known about the robustness of Wilks' λ test to violations of the above assumptions. However, since the interpretation of the Analysis of variance module's results is usually based on the significance of univariate effects (after the significance of the overall test has been established), the discussion of robustness chiefly concerns univariate analysis of variance. Therefore the significance of univariate effects should be examined carefully.

Special case: analysis of covariance. Particularly severe violations of variance/covariance homogeneity can occur when covariates are included in the design. In particular, if the correlation between the covariates and the dependent measures differs across the cells of the design, misinterpretation of the results may follow. Remember that analysis of covariance essentially performs a regression analysis within each cell to isolate the portion of the variance accounted for by the covariate. The homogeneity of variance/covariance assumption implies that this regression analysis is conducted under the constraint that all regression equations (slopes) are the same across cells. If this does not hold, large errors may appear. The Analysis of variance module has several special tests of this assumption; it is advisable to use them to make sure that the regression equations for the different cells are approximately the same.

Sphericity and complex symmetry: reasons for using a multivariate approach to repeated measures in analysis of variance

In designs containing repeated measures factors with more than two levels, the use of univariate ANOVA requires additional assumptions: the compound symmetry assumption and the sphericity assumption. Because these assumptions are rarely met (see below), multivariate analysis of variance has gained popularity for such designs in recent years (both approaches are combined in the Analysis of variance module).

The compound symmetry assumption. The assumption of compound symmetry is that the variances (pooled within groups) and covariances (pooled within groups) of the different repeated measures are homogeneous (the same). This is a sufficient condition for the univariate F test for repeated measures to be valid (i.e., for the reported F values to follow, on average, the F distribution); it is not, however, a necessary condition.

The sphericity assumption. The assumption of sphericity is a necessary and sufficient condition for the F test to be valid. It states that, within groups, all observations are independent and identically distributed. The nature of these assumptions, and the consequences of violating them, are usually not well described in books on analysis of variance; they are covered in the following paragraphs, where it is also shown that the results of the univariate approach may differ from those of the multivariate approach, and what this means is explained.

The need for independent hypotheses. The general way to analyze data in ANOVA is model fitting. If there are a priori hypotheses about the model that fits the data, the variance is partitioned to test these hypotheses (tests of main effects and interactions). Computationally, this approach generates a set of contrasts (a set of comparisons of design means). However, if the contrasts are not independent of each other, the partitioning of the variance becomes meaningless. For example, if two contrasts A and B are identical and the corresponding part of the variance is extracted, the same part is extracted twice. It is pointless to test the two identical hypotheses "the mean in cell 1 is higher than the mean in cell 2" and "the mean in cell 1 is higher than the mean in cell 2." So the hypotheses must be independent, or orthogonal.
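Orthogonality of contrasts can be checked mechanically: with equal cell sizes, two contrast vectors extract non-overlapping parts of the variance exactly when their dot product is zero. A tiny Python illustration:

```python
import numpy as np

# Contrasts over four cell means
c1 = np.array([1, -1, 0, 0])   # cell 1 vs cell 2
c2 = np.array([0, 0, 1, -1])   # cell 3 vs cell 4
c3 = np.array([1, -1, 0, 0])   # duplicates c1

print(np.dot(c1, c2))  # 0 -> orthogonal, variance is partitioned cleanly
print(np.dot(c1, c3))  # nonzero -> the same variance would be extracted twice
```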

Independent hypotheses in repeated measures. The general algorithm implemented in the Analysis of variance module attempts to generate independent (orthogonal) contrasts for each effect. For a repeated measures factor, these contrasts specify a set of hypotheses about the differences between the levels of the factor. However, if these differences are correlated within groups, the resulting contrasts are no longer independent. For example, in a course where students are measured three times in one semester, it may happen that the change between the 1st and 2nd measurements is negatively correlated with the change between the 2nd and 3rd: students who mastered most of the material between the 1st and 2nd measurements master a smaller portion in the time between the 2nd and 3rd. In fact, in most cases where ANOVA is applied to repeated measures, the changes across levels can be assumed to be correlated across subjects. When this happens, the compound symmetry and sphericity assumptions do not hold, and independent contrasts cannot be computed.

The impact of violations and ways to correct them. When the compound symmetry or sphericity assumptions are not met, ANOVA may produce erroneous results. Before multivariate procedures were sufficiently developed, several corrections were proposed to compensate for violations of these assumptions (see, for example, Greenhouse & Geisser, 1959, and Huynh & Feldt, 1970). These methods are still widely used, which is why they are available in the Analysis of variance module.

The multivariate analysis of variance approach to repeated measures. In general, the problems of compound symmetry and sphericity stem from the fact that the sets of contrasts involved in testing the effects of repeated measures factors (with more than 2 levels) are not independent of one another. However, they need not be independent if a multivariate test is used to simultaneously test the statistical significance of two or more repeated measures contrasts. This is why multivariate analysis of variance techniques have increasingly been used to test the significance of univariate repeated measures factors with more than 2 levels. The approach is widely accepted because it generally does not require compound symmetry or sphericity.

Cases in which the multivariate analysis of variance approach cannot be used. There are examples (designs) in which the multivariate approach cannot be applied. These are typically cases with a small number of subjects and many levels of the repeated measures factors, so that there are too few observations to conduct a multivariate analysis. For example, with 12 subjects and p = 4 repeated measures factors, each with k = 3 levels, the four-factor interaction would "consume" (k − 1)^p = 2^4 = 16 degrees of freedom. But there are only 12 subjects, so the multivariate test cannot be performed in this example. The Analysis of variance module detects such situations automatically and computes only the univariate tests.

Differences in univariate and multivariate results. If a study involves a large number of repeated measures, there may be cases where the univariate repeated measures ANOVA approach produces results that are very different from those obtained with the multivariate approach. This means that the differences between the levels of the corresponding repeated measures are correlated across subjects. Sometimes this fact is of some independent interest.

Multivariate analysis of variance and structural equation modeling

In recent years, structural equation modeling has become popular as an alternative to multivariate analysis of variance (see, for example, Bagozzi and Yi, 1989; Bagozzi, Yi, and Singh, 1991; Cole, Maxwell, Arvey, and Salas, 1993). This approach allows hypotheses to be tested not only about the means of different groups but also about the correlation matrices of the dependent variables. For example, one can relax the assumptions of homogeneity of variances and covariances and explicitly include error variances and covariances in the model for each group. The STATISTICA Structural Equation Modeling (SEPATH) module (see Volume III) allows such analyses.

The use of statistics in this note will be illustrated with a cross-cutting example. Let's say you are the production manager at Perfect Parachute. The parachutes are made from synthetic fibers supplied by four different suppliers. One of the main characteristics of a parachute is its strength. You need to ensure that all fibers supplied are of the same strength. To answer this question, an experimental design should be designed to measure the strength of parachutes woven from synthetic fibers from different suppliers. The information obtained from this experiment will determine which supplier provides the most durable parachutes.

Many applications involve experiments that consider several groups or levels of a single factor. Some factors, such as ceramic firing temperature, have several numerical levels (e.g., 300°, 350°, 400°, and 450°). Other factors, such as the location of items in a supermarket, have categorical levels (e.g., first supplier, second supplier, third supplier, fourth supplier). Single-factor experiments in which the experimental units are randomly assigned to groups or factor levels are called completely randomized.

Using the F test to assess differences between several expected values

If the measurements in the groups are continuous and some additional conditions are met, analysis of variance (ANOVA, from Analysis of Variance) is used to compare the expected values of several groups. Analysis of variance for completely randomized designs is called the one-way ANOVA procedure. In some ways the term analysis of variance is a misnomer, because it compares differences between the expected values of the groups rather than between variances. However, the comparison of expected values is carried out precisely through an analysis of the variation in the data. In the ANOVA procedure, the total variation of the measurements is partitioned into between-group and within-group components (Fig. 1). Within-group variation is explained by experimental error, and between-group variation by the effects of the experimental conditions. The symbol c denotes the number of groups.

Fig. 1. Partitioning of variation in a completely randomized experiment


Suppose that c groups are drawn from independent populations that are normally distributed and have equal variances. The null hypothesis is that the expected values of the populations are the same: H0: μ1 = μ2 = … = μc. The alternative hypothesis states that not all expected values are the same: H1: not all μj are the same (j = 1, 2, …, c).

Fig. 2 illustrates a true null hypothesis about the expected values of the five compared groups, assuming the populations are normally distributed with the same variance. The five populations associated with the different factor levels are identical; consequently they are superimposed on one another, having the same expected value, variation, and shape.

Fig. 2. Five populations with the same expected value: μ1 = μ2 = μ3 = μ4 = μ5

On the other hand, suppose the null hypothesis is in fact false, with the fourth level having the largest expected value, the first level a slightly smaller one, and the remaining levels equal and still smaller expected values (Fig. 3). Note that, except for the expected values, all five populations are identical (i.e., they have the same variability and shape).

Fig. 3. An effect of the experimental conditions is present: μ4 > μ1 > μ2 = μ3 = μ5

When testing the hypothesis of equality of the expected values of several populations, the total variation is partitioned into two parts: between-group variation, due to differences between groups, and within-group variation, due to differences between elements of the same group. The total variation is expressed by the total sum of squares (SST, sum of squares total). Since the null hypothesis is that the expected values of all c groups are equal, the total variation equals the sum of the squared differences between the individual observations and the grand mean (the mean of means) computed over all samples. The total variation is:

SST = Σ_j Σ_i (X_ij − X̄)²,  where j = 1, …, c and i = 1, …, n_j,

where X̄ is the grand mean, X_ij is the i-th observation in the j-th group (level), n_j is the number of observations in the j-th group, n is the total number of observations in all groups (i.e., n = n1 + n2 + … + nc), and c is the number of groups (levels) studied.

Between-group variation, usually called the between-group sum of squares (SSA, sum of squares among groups), equals the sum of the squared differences between the sample mean of each group, X̄_j, and the grand mean X̄, each multiplied by the size of the corresponding group n_j:

SSA = Σ_j n_j (X̄_j − X̄)²,  j = 1, …, c,

where c is the number of groups (levels) studied, n_j is the number of observations in the j-th group, X̄_j is the mean of the j-th group, and X̄ is the grand mean.

Within-group variation, usually called the within-group sum of squares (SSW, sum of squares within groups), equals the sum of the squared differences between the elements of each group and the sample mean of that group, X̄_j:

SSW = Σ_j Σ_i (X_ij − X̄_j)²,  j = 1, …, c and i = 1, …, n_j,

where X_ij is the i-th element of the j-th group and X̄_j is the mean of the j-th group.

Since c factor levels are compared, the between-group sum of squares has c − 1 degrees of freedom. Each of the c levels contributes n_j − 1 degrees of freedom, so the within-group sum of squares has n − c degrees of freedom, since (n1 − 1) + (n2 − 1) + … + (nc − 1) = n − c.

In addition, the total sum of squares has n − 1 degrees of freedom, since each observation X_ij is compared with the grand mean computed over all n observations. Dividing each of these sums by its number of degrees of freedom yields three kinds of variance estimates: the between-group mean square (mean square among, MSA), the within-group mean square (mean square within, MSW), and the total mean square (mean square total, MST):

MSA = SSA / (c − 1),   MSW = SSW / (n − c),   MST = SST / (n − 1).

Although the main purpose of analysis of variance is to compare the expected values of the c groups in order to detect an effect of the experimental conditions, its name derives from the fact that its main tool is the analysis of variances of various types. If the null hypothesis is true and there are no significant differences between the expected values of the c groups, all three mean squares, MSA, MSW, and MST, are estimates of the variance σ² inherent in the analyzed data. Thus, to test the null hypothesis H0: μ1 = μ2 = … = μc against the alternative H1: not all μj are the same (j = 1, 2, …, c), the F statistic is computed as the ratio of two variance estimates, MSA and MSW. The F statistic in one-way analysis of variance is:

F = MSA / MSW.
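The following sketch (Python with SciPy; the data are invented) computes SST, SSA, and SSW from the definitions above, checks the partition SST = SSA + SSW, and confirms the resulting F statistic against scipy.stats.f_oneway:

```python
import numpy as np
from scipy import stats

groups = [np.array([19.0, 20.5, 18.5]),
          np.array([24.0, 23.5, 25.0]),
          np.array([22.0, 21.5, 23.0])]  # hypothetical data, c = 3 groups
allx = np.concatenate(groups)
n, c = allx.size, len(groups)
grand = allx.mean()

sst = ((allx - grand) ** 2).sum()
ssa = sum(g.size * (g.mean() - grand) ** 2 for g in groups)
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
assert np.isclose(sst, ssa + ssw)  # the partition SST = SSA + SSW holds

msa, msw = ssa / (c - 1), ssw / (n - c)
F = msa / msw
p = stats.f.sf(F, c - 1, n - c)
print(F, p)
print(stats.f_oneway(*groups))  # SciPy reports the same F and p
```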

The F statistic follows the F distribution with c − 1 degrees of freedom in the numerator (the MSA term) and n − c degrees of freedom in the denominator (the MSW term). For a given significance level α, the null hypothesis is rejected if the computed F statistic exceeds F_U, the upper critical value of the F distribution with c − 1 degrees of freedom in the numerator and n − c in the denominator. Thus, as shown in Fig. 4, the decision rule is: reject the null hypothesis H0 if F > F_U; otherwise, do not reject it.

Fig. 4. Critical region of the analysis of variance when testing the hypothesis H0

If the null hypothesis H0 is true, the computed F statistic is close to 1, since its numerator and denominator estimate the same quantity, the variance σ² inherent in the analyzed data. If H0 is false (and there is a significant difference between the expected values of the groups), the computed F statistic will be much larger than 1, because its numerator, MSA, estimates the effect of the experimental conditions (the differences between groups) in addition to the natural variability of the data, while the denominator, MSW, estimates only the natural variability of the data. Thus the ANOVA procedure is an F test in which, at significance level α, the null hypothesis is rejected if the computed F statistic exceeds the upper critical value F_U of the F distribution with c − 1 degrees of freedom in the numerator and n − c in the denominator, as shown in Fig. 4.

To illustrate one-way analysis of variance, let's return to the scenario outlined at the beginning of the note. The purpose of the experiment is to determine whether parachutes woven from synthetic fibers obtained from different suppliers have the same strength. Each group contains five parachutes, and the groups are labeled by supplier: Supplier 1, Supplier 2, Supplier 3, and Supplier 4. The strength of the parachutes is measured with a device that tests the fabric for tearing from both sides; the force required to rupture the parachute is measured on a special scale. The higher the rupture force, the stronger the parachute. Excel computes the analysis in one step: choose Data → Data Analysis, select One-way ANOVA (Anova: Single Factor in English versions), and fill in the dialog that opens (Fig. 5). The experimental results (rupture strength), some descriptive statistics, and the one-way ANOVA results are shown in Fig. 6.

Fig. 5. The one-way ANOVA dialog of the Excel Analysis ToolPak

Fig. 6. Strength of parachutes woven from synthetic fibers from different suppliers: descriptive statistics and one-way ANOVA results

Fig. 6 shows that there is some difference between the sample means. The mean strength of the fibers from the first supplier is 19.52, from the second 24.26, from the third 22.84, and from the fourth 21.16. Is this difference statistically significant? The distribution of rupture force is shown in the scatter plot (Fig. 7), which clearly shows differences both between and within groups. If each group were larger, the data could also be examined with a stem-and-leaf display, a box plot, or a normal probability plot.

Fig. 7. Scatter plot of strength for parachutes woven from synthetic fibers from four suppliers

The null hypothesis states that there are no significant differences between the mean strengths: H0: μ1 = μ2 = μ3 = μ4. The alternative hypothesis is that at least one supplier's mean fiber strength differs from the others: H1: not all μj are the same (j = 1, 2, …, c).

The grand mean (see Fig. 6) = AVERAGE(D12:D15) = 21.945; it can also be obtained by averaging all 20 original values: =AVERAGE(A3:D7). The sums of squares are computed by the Analysis ToolPak and reported in the ANOVA table (see Fig. 6): SSA = 63.286, SSW = 97.504, SST = 160.790 (see the SS column in Fig. 6). The mean squares are obtained by dividing these sums of squares by the corresponding numbers of degrees of freedom. Since c = 4 and n = 20, the degrees of freedom are: for SSA, c − 1 = 3; for SSW, n − c = 16; for SST, n − 1 = 19 (see the df column). Thus: MSA = SSA / (c − 1) = 21.095; MSW = SSW / (n − c) = 6.094; MST = SST / (n − 1) = 8.463 (see the MS column). The F statistic = MSA / MSW = 3.462 (see the F column).

The upper critical value F_U of the F distribution is given by =F.OBR(0.95;3;16) = 3.239 (F.OBR is the Russian-locale name of Excel's F.INV). The parameters of the function are α = 0.05, with three degrees of freedom in the numerator and 16 in the denominator. Since the computed F statistic, 3.462, exceeds the upper critical value F_U = 3.239, the null hypothesis is rejected (Fig. 8).

Fig. 8. Critical region of the analysis of variance at a significance level of 0.05, with three degrees of freedom in the numerator and 16 in the denominator

The p-value, i.e., the probability of obtaining an F statistic of 3.46 or larger when the null hypothesis is true, equals 0.041, or 4.1% (see the p-value column of the ANOVA table in Fig. 6). Since this value does not exceed the significance level α = 5%, the null hypothesis is rejected. Moreover, the p-value indicates that the probability of finding this large (or a larger) difference between the expected values of the populations, when in fact they are identical, is 4.1%.
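These figures can be reproduced from the sums of squares alone. A short Python sketch (using SciPy; the SS values are taken from Fig. 6 as reported above):

```python
from scipy import stats

# Sums of squares reported in Fig. 6 for the parachute data
ssa, ssw = 63.286, 97.504
c, n = 4, 20

msa = ssa / (c - 1)                    # 21.095
msw = ssw / (n - c)                    # 6.094
F = msa / msw                          # 3.462
F_U = stats.f.ppf(0.95, c - 1, n - c)  # 3.239, same as =F.OBR(0.95;3;16)
p = stats.f.sf(F, c - 1, n - c)        # about 0.041
print(F, F_U, p)
```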

To summarize: there is a difference between the four sample means. The null hypothesis was that all expected values of the four populations are equal. Under these conditions, a measure of the total variability (the total variation SST) of the strength of all parachutes is computed by summing the squared differences between each observation X_ij and the grand mean X̄. The total variation was then partitioned into two components (see Fig. 1): the between-group variation SSA and the within-group variation SSW.

What explains the variability of the data? In other words, why are the observations not all the same? One reason is that different companies supply fibers of different strengths. This partly explains why the groups have different expected values: the stronger the effect of the experimental conditions, the larger the difference between the expected values of the groups. Another reason for variability is the natural variability of any process, in this case parachute production. Even if all the fibers were purchased from the same supplier, their strength would not be identical, all other things being equal. Because this effect operates within each group, it is called within-group variation.

The differences between the sample means constitute the between-group variation SSA. Part of it, as already indicated, is explained by the data belonging to different groups. However, even if the groups were exactly the same (i.e., the null hypothesis were true), between-group variation would still exist. The reason is the natural variability of the parachute manufacturing process: because the samples differ, their sample means differ from one another. Therefore, if the null hypothesis is true, both the between-group and the within-group variability are estimates of the population variability. If the null hypothesis is false, the between-group variability will be larger. This fact underlies the F test for comparing differences between the expected values of several groups.

After performing a one-way ANOVA and finding a significant difference between the firms, it remains unknown which supplier differs significantly from the others. We only know that the expected values of the populations are not all equal; in other words, at least one of them differs significantly from the rest. To determine which supplier differs from the others, the Tukey procedure of pairwise comparisons between suppliers can be used. This procedure was developed by John Tukey; later, he and K. Kramer independently modified it for situations in which the sample sizes differ.

Multiple comparison: Tukey-Kramer procedure

In our scenario, one-way analysis of variance was used to compare the strength of the parachutes. Having found significant differences between the expected values of the four groups, we must determine which groups differ from one another. Although there are several ways to solve this problem, we describe only the Tukey-Kramer multiple comparison procedure. This method is an example of an a posteriori (post hoc) comparison procedure, since the hypothesis being tested is formulated after the data analysis. The Tukey-Kramer procedure compares all pairs of groups simultaneously. In the first step, the differences X̄_j − X̄_j′, where j ≠ j′, between the means of the c(c − 1)/2 pairs of groups are computed. The critical range of the Tukey-Kramer procedure is computed by the formula:

critical range = Q_U √((MSW/2)(1/n_j + 1/n_j′)),

where Q_U is the upper critical value of the studentized range distribution with c degrees of freedom in the numerator and n − c degrees of freedom in the denominator.

If the sample sizes are unequal, the critical range is computed for each pair of means separately. In the final step, each of the c(c − 1)/2 pairs of means is compared with its critical range. The elements of a pair are considered significantly different if the absolute difference |X̄_j − X̄_j′| between them exceeds the critical range.

Let us apply the Tukey-Kramer procedure to the problem of the strength of parachutes. Since the parachute company has four suppliers, there are 4(4 – 1)/2 = 6 pairs of suppliers to check (Figure 9).

Fig. 9. Pairwise comparisons of sample means

Since all groups have the same size (all n_j = n_j′ = 5), it suffices to compute a single critical range. From the ANOVA table (Fig. 6) we take MSW = 6.094. Then we find Q_U for α = 0.05, c = 4 (the numerator degrees of freedom), and n − c = 20 − 4 = 16 (the denominator degrees of freedom). Unfortunately, I did not find a corresponding built-in function in Excel, so I used the table (Fig. 10).

Fig. 10. Critical value of the studentized range Q_U

We get: critical range = 4.05 × √((6.094/2)(1/5 + 1/5)) = 4.47.

Since only 4.74 > 4.47 (see the bottom table in Fig. 9), a statistically significant difference exists only between the first and second suppliers. The sample means of all other pairs do not allow us to conclude that they differ. Consequently, the mean strength of parachutes woven from fibers purchased from the first supplier is significantly lower than that of the second.
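As an alternative to the printed table, recent versions of SciPy expose the studentized range distribution directly (scipy.stats.studentized_range, available from SciPy 1.7 on). The sketch below recomputes Q_U, the critical range, and all six pairwise comparisons from the group means reported in Fig. 6:

```python
import numpy as np
from scipy.stats import studentized_range  # requires SciPy >= 1.7

means = {"s1": 19.52, "s2": 24.26, "s3": 22.84, "s4": 21.16}
msw, n_j, c, df_w = 6.094, 5, 4, 16

q_u = studentized_range.ppf(0.95, c, df_w)           # about 4.05
crit = q_u * np.sqrt(msw / 2 * (1 / n_j + 1 / n_j))  # about 4.47

for a, ma in means.items():
    for b, mb in means.items():
        if a < b:  # visit each unordered pair once
            diff = abs(ma - mb)
            verdict = "significant" if diff > crit else "ns"
            print(a, b, round(diff, 2), verdict)  # only s1 vs s2 exceeds 4.47
```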

Necessary conditions for one-way analysis of variance

When solving the problem of the strength of parachutes, we did not check whether the conditions for applying the one-way F-criterion were met. How do you know whether the one-way F-criterion can be applied to specific experimental data? It can be applied only if three basic assumptions hold: the experimental data must be random and independent, must come from normally distributed populations, and the population variances must be equal.

The first assumption - randomness and independence of the data - must always be satisfied, since the correctness of any experiment depends on the randomness of selection and/or the randomization process. To avoid biasing the results, the data must be drawn from the c general populations randomly and independently of each other. Similarly, the data should be randomly distributed across the c levels of the factor of interest (the experimental groups). Violating these conditions can seriously distort the results of the analysis of variance.

The second assumption - normality - means that the data are drawn from normally distributed populations. As with the t-criterion, one-way analysis of variance based on the F-criterion is relatively insensitive to violations of this condition. If the distribution does not deviate too far from normal, the significance level of the F-criterion changes little, especially when the sample size is large enough. If the normality condition is seriously violated, a nonparametric procedure (for example, the Kruskal-Wallis rank criterion) should be applied instead.

The third assumption - homogeneity of variance - means that the variances of the populations are equal to each other (i.e., σ1² = σ2² = … = σc²). This assumption allows one to decide whether to keep the within-group variances separate or to pool them. If the group sizes are equal, the homogeneity-of-variance condition has little effect on the conclusions drawn with the F-criterion. However, if the sample sizes are unequal, violating the equal-variances condition can seriously distort the results of the analysis of variance. Therefore, one should strive to keep the sample sizes equal. One method for checking the homogeneity-of-variance assumption is Levene's criterion, described below.

If, of the three conditions, only the homogeneity-of-variance condition is violated, a procedure similar to the t-criterion with separate variances can be applied. However, if the assumptions of normality and homogeneity of variance are violated simultaneously, the data must be normalized and the differences between variances reduced, or a nonparametric procedure must be applied.
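For completeness, here is a minimal sketch of the nonparametric route; the note itself does not show one, and the Kruskal-Wallis rank criterion is my example of such a procedure:

```python
from scipy.stats import kruskal

# Kruskal-Wallis rank criterion: a nonparametric substitute for one-way
# ANOVA that does not assume normality or equal variances.
groups = [
    [18.5, 24.0, 17.2, 19.9, 18.0],   # same supplier data as above
    [26.3, 25.3, 24.0, 21.2, 24.5],
    [20.6, 25.2, 20.8, 24.7, 22.9],
    [25.4, 19.9, 22.6, 17.5, 20.4],
]
h_stat, p_value = kruskal(*groups)
print(h_stat, p_value)  # reject equality of the distributions if p_value < α
```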

Levene's test for testing homogeneity of variance

Although the F-criterion is relatively resistant to violations of the equal-variances condition, a gross violation of this assumption significantly affects the significance level and power of the criterion. Perhaps one of the most powerful tests of this assumption is Levene's criterion. To check the equality of variances of c general populations, we test the following hypotheses:

H0: σ1² = σ2² = … = σc²

H1: not all σj² are equal (j = 1, 2, …, c)

The modified Levene's criterion is based on the following statement: if the variability in the groups is the same, then the null hypothesis of equality of variances can be tested by an analysis of variance of the absolute values of the differences between the observations and the group medians. So, first calculate the absolute differences between the observations and the median in each group, and then perform a one-way analysis of variance on these absolute differences. To illustrate Levene's criterion, let us return to the scenario outlined at the beginning of the note. Using the data presented in Fig. 6, we will conduct the same kind of analysis, but on the absolute differences between the initial data and the median of each sample (Fig. 11).
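A minimal sketch of this computation, using the same supplier data assumed above; SciPy's built-in `levene` with `center='median'` performs the median-based variant described here:

```python
import numpy as np
from scipy.stats import f_oneway, levene

groups = [
    np.array([18.5, 24.0, 17.2, 19.9, 18.0]),   # same supplier data as above
    np.array([26.3, 25.3, 24.0, 21.2, 24.5]),
    np.array([20.6, 25.2, 20.8, 24.7, 22.9]),
    np.array([25.4, 19.9, 22.6, 17.5, 20.4]),
]

# Step 1: absolute deviations of each observation from its group median
abs_dev = [np.abs(g - np.median(g)) for g in groups]

# Step 2: one-way ANOVA on the absolute deviations
f_stat, p_value = f_oneway(*abs_dev)
print(f_stat, p_value)  # a large p-value -> no evidence against equal variances

# SciPy's levene with center='median' performs exactly this test
w_stat, p2 = levene(*groups, center='median')
assert np.isclose(f_stat, w_stat)
```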
