Skewness and kurtosis of the distribution of a random variable. Calculation of skewness and kurtosis of an empirical distribution in Excel. Kurtosis coefficient of a normal distribution

The asymmetry (skewness) coefficient shows how far the distribution series is skewed relative to its center:

$$As = \frac{\mu_3}{\sigma^3},$$

where $\mu_3$ is the third-order central moment;

$\sigma^3$ is the cube of the standard deviation.

For this calculation method: if $As > 0$, the distribution is right-sided (positive asymmetry); if $As < 0$, the distribution is left-sided (negative asymmetry).

In addition to the central moment, asymmetry can be calculated using the mode or the median:

$$As = \frac{\bar{x} - Mo}{\sigma} \quad \text{or} \quad As = \frac{\bar{x} - Me}{\sigma}, \qquad (6.69)$$

For this calculation method: if $As > 0$, the distribution is right-sided (positive asymmetry); if $As < 0$, the distribution is left-sided (negative asymmetry) (Fig. 4).


Fig. 4. Asymmetric distributions

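The value showing the “steepness” (peakedness) of the distribution is called the kurtosis coefficient:

$$Ex = \frac{\mu_4}{\sigma^4} - 3,$$

where $\mu_4$ is the fourth-order central moment and $\sigma^4$ is the fourth power of the standard deviation.

If $Ex > 0$, the distribution is peaked and the kurtosis is positive; if $Ex < 0$, the distribution is flat-topped and the kurtosis is negative (Fig. 5).

Fig. 5. Kurtosis of distributions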
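The two coefficients above can be sketched in a few lines of Python. This is only an illustration: the function names and both data sets are hypothetical, and the moments are taken in the population form (division by n), which is an assumption about the intended formula.

```python
from math import sqrt

def central_moment(values, k):
    """k-th central moment of a sample, dividing by n (population form)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    """As = mu_3 / sigma^3."""
    sigma = sqrt(central_moment(values, 2))
    return central_moment(values, 3) / sigma ** 3

def excess_kurtosis(values):
    """Ex = mu_4 / sigma^4 - 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

right_tailed = [1, 1, 1, 2, 2, 3, 4, 9]  # hypothetical data with a long right tail
symmetric = [1, 2, 3, 4, 5]              # hypothetical symmetric data

print("As, right-tailed:", skewness(right_tailed))
print("As, symmetric:   ", skewness(symmetric))
print("Ex, symmetric:   ", excess_kurtosis(symmetric))
```

For the long right tail the coefficient is positive, for the symmetric series it is zero, and the evenly spread symmetric series has negative kurtosis (it is flatter than the normal curve).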

Example 5. Data are available on the number of sheep on farms in the region (Table 9). Determine:

1. The average number of sheep per farm.

2. The mode.

3. The median.

4. Variation indicators:

· variance;

· standard deviation;

· coefficient of variation.

5. Indicators of asymmetry and kurtosis.

Solution.

1. Since individual values in the population are repeated several times, each with a certain frequency, we calculate the mean using the weighted arithmetic mean formula:

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{53}{12} \approx 4.42 \text{ thousand head}$$

2. This series is discrete, so the mode is the value with the highest frequency: $Mo = 5.5$ thousand head.

3. The series contains an even number of units, so the median of the discrete series is the mean of the two middle values:

$$Me = \frac{x_{n/2} + x_{n/2+1}}{2} = \frac{4.0 + 5.5}{2} = 4.75$$

That is, half of the farms in the study population have up to 4.75 thousand head of sheep, and half have more.
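Steps 1 to 3 can be checked with Python's standard `statistics` module. The list below is read off the first column of Table 10, since Table 9 itself is not reproduced here; the variable names are illustrative.

```python
from statistics import mean, median, mode

# Number of sheep per farm, thousand head (first column of Table 10)
sheep = [2.0, 2.5, 2.5, 3.0, 3.0, 4.0, 5.5, 5.5, 5.5, 6.0, 6.5, 7.0]

avg = mean(sheep)          # arithmetic mean over all 12 farms
most_common = mode(sheep)  # value with the highest frequency
mid = median(sheep)        # mean of the two middle values, since n is even

print(round(avg, 2), most_common, mid)
```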

4. To calculate the variation indicators, we draw up Table 10, in which we compute the deviations from the mean and their squares. The calculation can be carried out with either the simple or the weighted formulas (in this example we use the simple one):

Table 10

            x_i      x_i − x̄    (x_i − x̄)²
            2.00     -2.42      5.84
            2.50     -1.92      3.67
            2.50     -1.92      3.67
            3.00     -1.42      2.01
            3.00     -1.42      2.01
            4.00     -0.42      0.17
            5.50      1.08      1.17
            5.50      1.08      1.17
            5.50      1.08      1.17
            6.00      1.58      2.51
            6.50      2.08      4.34
            7.00      2.58      6.67
Total       53.00     0.00      34.42
Average     4.4167

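Let's calculate the variance:

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{34.42}{12} \approx 2.87$$

Let's calculate the standard deviation:

$$\sigma = \sqrt{\sigma^2} \approx 1.69$$

Let's calculate the coefficient of variation:

$$V = \frac{\sigma}{\bar{x}} \cdot 100\% = \frac{1.6935}{4.4167} \cdot 100\% \approx 38.3\%$$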

5. To calculate the indicators of asymmetry and kurtosis, we build Table 11, in which we calculate $(x_i - \bar{x})$, $(x_i - \bar{x})^3$ and $(x_i - \bar{x})^4$.

Table 11

            x_i      x_i − x̄    (x_i − x̄)³    (x_i − x̄)⁴
            2.00     -2.42      -14.11        34.11
            2.50     -1.92       -7.04        13.50
            2.50     -1.92       -7.04        13.50
            3.00     -1.42       -2.84         4.03
            3.00     -1.42       -2.84         4.03
            4.00     -0.42       -0.07         0.03
            5.50      1.08        1.27         1.38
            5.50      1.08        1.27         1.38
            5.50      1.08        1.27         1.38
            6.00      1.58        3.97         6.28
            6.50      2.08        9.04        18.84
            7.00      2.58       17.24        44.53
Total       53.00     0.00        0.11       142.98
Average     4.4167

The skewness of the distribution is:

That is, left-sided asymmetry is observed, since , which is confirmed when calculated using the formula:

In this case, which for this formula also indicates left-sided asymmetry

The kurtosis of the distribution is equal to:

In our case, the kurtosis is negative, that is, flatness is observed.
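The calculations of steps 4 and 5 can be reproduced with a short script. This is a sketch under the assumption that the simple (unweighted, divide-by-n) formulas are meant, as step 4 says; the variable names are illustrative.

```python
from math import sqrt

# Number of sheep per farm, thousand head (first column of Tables 10 and 11)
sheep = [2.0, 2.5, 2.5, 3.0, 3.0, 4.0, 5.5, 5.5, 5.5, 6.0, 6.5, 7.0]
n = len(sheep)
mean = sum(sheep) / n

def central_moment(k):
    """k-th central moment with the simple (divide-by-n) formula."""
    return sum((x - mean) ** k for x in sheep) / n

variance = central_moment(2)
sigma = sqrt(variance)
cv_percent = sigma / mean * 100

as_moment = central_moment(3) / sigma ** 3  # moment-based asymmetry coefficient
as_mode = (mean - 5.5) / sigma              # mode-based coefficient, Mo = 5.5 from step 2
ex = central_moment(4) / sigma ** 4 - 3     # kurtosis coefficient

print(f"variance={variance:.2f}  sigma={sigma:.2f}  V={cv_percent:.1f}%")
print(f"As(moment)={as_moment:.4f}  As(mode)={as_mode:.2f}  Ex={ex:.2f}")
```

With these inputs the moment-based coefficient comes out very close to zero, while the mode-based coefficient is clearly negative, so the conclusion about left-sided asymmetry rests mainly on the second formula. The kurtosis is negative, in line with the conclusion about flatness.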

Example 6. Data on the wages of the workers of a farm are given (Table 12).

Solution.

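For an interval variation series, the mode is calculated using the formula:

$$Mo = x_{Mo} + h_{Mo} \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})}$$

where the modal interval is the interval with the highest frequency, in our case 3600-3800, with frequency 68;

$x_{Mo}$ is the lower limit of the modal interval (3600);

$h_{Mo}$ is the width of the modal interval (200);

$f_{Mo-1}$ is the frequency of the interval preceding the modal interval (25);

$f_{Mo+1}$ is the frequency of the interval following the modal interval (29);

$f_{Mo}$ is the frequency of the modal interval (68).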
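As a check, the quantities listed above can be substituted into the usual mode formula for an interval series. The function name is illustrative, and the formula is the standard textbook one, assumed to be the one intended here.

```python
def interval_mode(lower, width, f_mode, f_prev, f_next):
    """Mode of an interval series: lower limit plus a share of the interval width."""
    return lower + width * (f_mode - f_prev) / ((f_mode - f_prev) + (f_mode - f_next))

# Values from Example 6: modal interval 3600-3800
mo = interval_mode(lower=3600, width=200, f_mode=68, f_prev=25, f_next=29)
print(round(mo, 2))  # 3704.88
```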

Table 12

For an interval variation series, the median is calculated using the formula:

$$Me = x_{Me} + h_{Me} \cdot \frac{\frac{\sum f}{2} - S_{Me-1}}{f_{Me}}$$

where the median interval is the interval whose cumulative (accumulated) frequency is equal to or greater than half the sum of the frequencies; in our example it is 3600-3800;

$x_{Me}$ is the lower limit of the median interval (3600);

$h_{Me}$ is the width of the median interval (200);

$\sum f$ is the sum of the frequencies of the series (154);

$S_{Me-1}$ is the accumulated frequency of all intervals preceding the median interval (57);

$f_{Me}$ is the frequency of the median interval (68).
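The same check can be made for the median. The sketch assumes the standard median formula for an interval series; the function name is illustrative.

```python
def interval_median(lower, width, total_freq, cum_before, f_median):
    """Median of an interval series from the cumulative frequencies."""
    return lower + width * (total_freq / 2 - cum_before) / f_median

# Values from Example 6: median interval 3600-3800
me = interval_median(lower=3600, width=200, total_freq=154, cum_before=57, f_median=68)
print(round(me, 2))  # 3658.82
```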

Example 7. For three farms in one district, there is information on the capital intensity of production (the amount of fixed capital costs per 1 ruble of produced products): I – 1.29 rubles, II – 1.32 rubles, III – 1.27 rubles. It is necessary to calculate the average capital intensity.

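Solution. Since capital intensity is the inverse indicator of capital turnover, we use the simple harmonic mean formula:

$$\bar{x} = \frac{n}{\sum \frac{1}{x_i}} = \frac{3}{\frac{1}{1.29} + \frac{1}{1.32} + \frac{1}{1.27}} \approx 1.293 \text{ rubles}$$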
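A quick check of Example 7 with the standard library (`statistics.harmonic_mean` computes the simple, unweighted harmonic mean; the variable names are illustrative):

```python
from statistics import harmonic_mean, mean

capital_intensity = [1.29, 1.32, 1.27]  # rubles per 1 ruble of output, farms I-III

hm = harmonic_mean(capital_intensity)
am = mean(capital_intensity)
print(round(hm, 3), round(am, 3))
```

The harmonic mean is slightly below the arithmetic mean, as it always is for positive values that are not all equal.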

Example 8. For three farms in one district, there is data on the gross grain harvest and average yield (Table 13).

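Solution. The average yield cannot be calculated with the arithmetic mean, since there is no information on the size of the sown areas, so we use the weighted harmonic mean formula:

$$\bar{x} = \frac{\sum w_i}{\sum \frac{w_i}{x_i}},$$

where $w_i$ is the gross harvest and $x_i$ is the yield of farm $i$.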

Example 9. There is data on the average potato yield in individual areas and the number of hillings (Table 14)

Table 14

Let's group the data (Table 15):

Table 15

Grouping areas based on the number of weedings

1. Calculate the total variance of the sample (Table 16).

When analyzing variation series, the displacement from the center and the steepness of the distribution are characterized by special indicators. Empirical distributions, as a rule, are shifted to the right or left of the center of the distribution and are asymmetric. The normal distribution is strictly symmetric about the arithmetic mean, because its density is an even function of the deviation from the mean.

Skewness of distribution arises due to the fact that some factors act more strongly in one direction than in another, or the process of development of the phenomenon is such that some cause dominates. In addition, the nature of some phenomena is such that there is an asymmetrical distribution.

The simplest measure of asymmetry is the difference between the arithmetic mean, mode and median:

To determine the direction and magnitude of the shift (asymmetry) of the distribution, the asymmetry coefficient is calculated; it is the normalized third-order moment:

$As = \mu_3 / \sigma^3$, where $\mu_3$ is the third-order central moment and $\sigma^3$ is the standard deviation cubed; $\mu_3 = (m_3 - 3 m_1 m_2 + 2 m_1^3) k^3$.

For left-sided asymmetry the coefficient is negative ($As < 0$); for right-sided asymmetry it is positive ($As > 0$).

If the peak of the distribution is shifted to the left and the right branch is longer than the left, the asymmetry is right-sided; otherwise it is left-sided.

The relationship between the mode, median and arithmetic mean in symmetric and asymmetric series allows a simpler indicator to be used as a measure of asymmetry, the Pearson asymmetry coefficient:

$K_a = (\bar{x} - Mo) / \sigma$. If $K_a > 0$, the asymmetry is right-sided; if $K_a < 0$, the asymmetry is left-sided; at $K_a = 0$ the series is considered symmetric.

Asymmetry can be determined more accurately using the third-order central moment:

$As = \mu_3 / \sigma^3$, where $\mu_3 = (m_3 - 3 m_1 m_2 + 2 m_1^3) k^3$.

If $|As| > 0.5$, the asymmetry can be considered significant; if $|As| < 0.25$, it can be considered insignificant.

To characterize the degree of deviation of a symmetric distribution from the normal distribution along the ordinate, an indicator of peakedness (steepness of the distribution) is used, called kurtosis (excess):

$Ex = (\mu_4 / \sigma^4) - 3$, where $\mu_4$ is the fourth-order central moment.

For a normal distribution, $Ex = 0$, i.e. $\mu_4 / \sigma^4 = 3$. $\mu_4 = (m_4 - 4 m_3 m_1 + 6 m_2 m_1^2 - 3 m_1^4) k^4$.

High-peaked curves have positive kurtosis, while low-peaked curves have negative kurtosis (Fig. D.2).

Indicators of kurtosis and skewness are necessary in statistical analysis to determine the heterogeneity of the population, the asymmetry of the distribution, and the proximity of the empirical distribution to the normal law. With significant deviations of the asymmetry and kurtosis indicators from zero, the population cannot be considered homogeneous, and the distribution close to normal. Comparison of actual curves with theoretical ones allows one to mathematically substantiate the obtained statistical results, establish the type and nature of the distribution of socio-economic phenomena, and predict the likelihood of the occurrence of the events being studied.

4.7. Justification of the closeness of the empirical (actual) distribution to the theoretical normal distribution. Normal distribution (Gauss-Laplace law) and its characteristics. The “three sigma rule”. Goodness-of-fit criteria (using the example of the Pearson or Kolmogorov criterion).

A certain connection can be noticed between the changes in the frequencies and in the values of the varying characteristic. As the value of the characteristic increases, the frequencies first increase and then, after reaching a certain maximum, decrease. Such regular changes in frequencies in variation series are called distribution patterns.

To identify a distribution pattern, it is necessary that the variation series contain a sufficiently large number of units, and that the series themselves represent qualitatively homogeneous populations.

A distribution polygon constructed from actual data is an empirical (actual) distribution curve; it reflects not only the objective (general) conditions of the distribution but also subjective (random) ones that are not characteristic of the phenomenon being studied.

In practical work, the distribution law is found by comparing the empirical distribution with one of the theoretical ones and assessing the degree of difference or correspondence between them. Theoretical distribution curve reflects in its pure form, without taking into account the influence of random factors, the general pattern of frequency distribution (distribution density) depending on the values ​​of varying characteristics.

Various types of theoretical distributions are common in statistics: normal, binomial, Poisson, etc. Each of the theoretical distributions has its own specifics and scope.

The normal distribution law is characteristic of the distribution of equally probable events occurring under the interaction of many random factors. It underlies statistical methods for estimating distribution parameters, assessing the representativeness of sample observations, and measuring the relationships between mass phenomena. To check how well the actual distribution corresponds to the normal one, the frequencies of the actual distribution must be compared with the theoretical frequencies characteristic of the normal distribution law. These frequencies are a function of the normalized deviations. Therefore, the normalized deviations t are calculated from the data of the empirical distribution series, and then the corresponding theoretical frequencies are determined. This procedure is called smoothing (fitting) the empirical distribution.

The normal distribution, or the Gauss-Laplace law, is described by the equation

$$y_t = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \bar{x})^2}{2\sigma^2}},$$

where $y_t$ is the ordinate of the normal distribution curve, or the frequency (probability) of the value x of the normal distribution, and $\bar{x}$ is the mathematical expectation (average value) of the individual x values. If the deviations $(x - \bar{x})$ are measured in units of the standard deviation $\sigma$, i.e. as standardized (normalized) deviations $t = (x - \bar{x})/\sigma$, the formula takes the form:

$$y_t = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-t^2/2}.$$

The normal distribution of socio-economic phenomena in its pure form is rare; however, if the homogeneity of the population is maintained, actual distributions are often close to normal. The distribution pattern of the quantities being studied is revealed by checking whether the empirical distribution conforms to the theoretical normal distribution law. To do this, the actual distribution is fitted to the normal curve and goodness-of-fit criteria are calculated.

The normal distribution is characterized by two essential parameters that determine the center around which the individual values are grouped and the shape of the curve: the arithmetic mean $\bar{x}$ and the standard deviation $\sigma$. Normal distribution curves differ in the position of the distribution center $\bar{x}$ on the x-axis and in the spread of the values around this center, $\sigma$ (Fig. 4.1 and 4.2). A feature of the normal distribution curve is its symmetry about the center of the distribution: on both sides of its middle, two uniformly decreasing branches are formed, asymptotically approaching the abscissa axis. Therefore, in a normal distribution, the mean, mode and median coincide: $\bar{x} = Mo = Me$.

The normal distribution curve has two inflection points (transitions from convexity to concavity) at $t = \pm 1$, i.e. where the deviation from the mean, $(x - \bar{x})$, equals the standard deviation $\sigma$. With a normal distribution, 68.3% of the observations (frequencies of the distribution series) lie within $\bar{x} \pm \sigma$, 95.4% within $\bar{x} \pm 2\sigma$, and 99.7% within $\bar{x} \pm 3\sigma$. In practice, deviations exceeding $\pm 3\sigma$ almost never occur; therefore this relationship is called the “three sigma rule”.
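The three percentages can be verified numerically: for a normal distribution the share of values within k standard deviations of the mean equals erf(k/√2). A minimal sketch using only the standard library (the function name is illustrative):

```python
from math import erf, sqrt

def share_within(k):
    """Probability that a normal variable lies within k sigma of its mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {share_within(k) * 100:.1f}%")
```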

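To calculate the theoretical frequencies, the formula is used:

$$f' = \frac{k \sum f}{\sigma} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2},$$

where $\sum f$ is the sum of the frequencies of the series and $k$ is the interval width.

The quantity $\varphi(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}$ is a function of t, the density of the standard normal distribution, which is determined from a special table, excerpts from which are given in Table 4.2.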
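The calculation of theoretical frequencies can be sketched as follows. The sketch assumes the commonly used form f' = (k·Σf/σ)·φ(t), with k the interval width and φ the standard normal density; the grouped data are hypothetical and serve only as an illustration.

```python
from math import exp, pi, sqrt

def normal_density(t):
    """Standard normal density phi(t), the quantity usually read from a table."""
    return exp(-t * t / 2) / sqrt(2 * pi)

def theoretical_frequencies(midpoints, freqs, width):
    """Theoretical normal frequencies for a grouped series with equal intervals."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
    sigma = sqrt(sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n)
    return [n * width / sigma * normal_density((x - mean) / sigma) for x in midpoints]

# Hypothetical grouped series: interval midpoints and empirical frequencies
midpoints = [10, 20, 30, 40, 50]
freqs = [5, 20, 50, 20, 5]

theo = theoretical_frequencies(midpoints, freqs, width=10)
print([round(f, 1) for f in theo], round(sum(theo), 1))
```

The theoretical frequencies sum to approximately, not exactly, the number of observations, because the normal curve extends beyond the outermost intervals.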

Table 4.2. Normal distribution density values

Graph in Fig. 4.3 clearly demonstrates the closeness of the empirical (2) and normal (1) distributions.

Fig. 4.3. Distribution of postal service branches by number of workers: 1 – normal; 2 – empirical

To substantiate mathematically the closeness of the empirical distribution to the normal distribution law, goodness-of-fit criteria are calculated.

The Kolmogorov criterion is a goodness-of-fit criterion that allows one to assess the degree of closeness of the empirical distribution to normal. A. N. Kolmogorov proposed using the maximum difference between the accumulated frequencies (absolute or relative) of the two series to determine the correspondence between the empirical and theoretical normal distributions. To test the hypothesis that the empirical distribution corresponds to the normal distribution law, the goodness-of-fit criterion

$$\lambda = \frac{D}{\sqrt{n}}$$

is calculated, where D is the maximum difference between the cumulative (accumulated) empirical and theoretical frequencies and n is the number of units in the population. Using a special table, $P(\lambda)$ is determined: the probability that, if the characteristic is distributed according to the normal law, the maximum discrepancy between the empirical and theoretical accumulated frequencies arising from random causes will be no less than the one actually observed. Conclusions are drawn from the value of $P(\lambda)$: if the probability $P(\lambda)$ is sufficiently large, the hypothesis that the actual distribution corresponds to the normal law can be considered confirmed; if $P(\lambda)$ is small, the null hypothesis is rejected and the discrepancies between the actual and theoretical distributions are considered significant.
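A minimal sketch of the Kolmogorov calculation follows. The frequencies are hypothetical, and instead of a table lookup the probability P(λ) is computed from the series 2·Σ(−1)^(k−1)·exp(−2k²λ²) for the limiting Kolmogorov distribution, which is an assumption about how the tabulated values are defined.

```python
from math import exp, sqrt

def kolmogorov_lambda(empirical, theoretical):
    """lambda = D / sqrt(n), D being the largest gap between cumulative frequencies."""
    n = sum(empirical)
    cum_e = cum_t = d = 0.0
    for fe, ft in zip(empirical, theoretical):
        cum_e += fe
        cum_t += ft
        d = max(d, abs(cum_e - cum_t))
    return d / sqrt(n)

def kolmogorov_p(lam, terms=100):
    """P(lambda): probability of a discrepancy at least as large as observed."""
    if lam < 0.2:  # the probability is practically 1 here; the series converges slowly
        return 1.0
    s = sum((-1) ** (k - 1) * exp(-2 * k * k * lam * lam) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * s))

# Hypothetical empirical and theoretical frequencies for five intervals
empirical = [5, 20, 50, 20, 5]
theoretical = [4, 24, 44, 24, 4]

lam = kolmogorov_lambda(empirical, theoretical)
print(round(lam, 2), round(kolmogorov_p(lam), 4))
```

Here the largest cumulative gap is 3 units out of n = 100, so λ = 0.3 and P(λ) is close to one: the hypothesis of normality would not be rejected for these illustrative data.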

Table 4.3. Probability values $P(\lambda)$ for the goodness-of-fit criterion $\lambda$

The Pearson criterion $\chi^2$ (“chi-square”) is a goodness-of-fit criterion that allows one to assess the degree of closeness of the empirical distribution to normal:

$$\chi^2 = \sum \frac{(f_i - f'_i)^2}{f'_i},$$

where $f_i$ and $f'_i$ are the frequencies of the empirical and theoretical distributions in a given interval. The greater the difference between the observed and theoretical frequencies, the greater the criterion $\chi^2$. To distinguish significant differences between the empirical and theoretical frequencies from differences due to random sampling, the calculated value $\chi^2_{calc}$ is compared with the tabulated value $\chi^2_{tab}$ for the appropriate number of degrees of freedom and a given significance level. The significance level $\alpha$ is chosen so that $P(\chi^2_{calc} > \chi^2_{tab}) = \alpha$. The number of degrees of freedom is $h - l$, where $h$ is the number of groups and $l$ is the number of conditions that must be met when calculating the theoretical frequencies. To calculate the theoretical frequencies of the normal distribution curve, three parameters must be known, $\bar{x}$, $\sigma$ and $\sum f$; therefore the number of degrees of freedom is $h - 3$.

If $\chi^2_{calc} > \chi^2_{tab}$, i.e. $\chi^2$ falls into the critical region, the discrepancy between the empirical and theoretical frequencies is significant and cannot be explained by random fluctuations in the sample data. In this case, the null hypothesis is rejected. If $\chi^2_{calc} \le \chi^2_{tab}$, i.e. the calculated criterion does not exceed the maximum divergence of frequencies that can arise by chance, the hypothesis that the distributions correspond is accepted.

The Pearson criterion is effective with a significant number of observations ($n \ge 50$); the frequency of every interval must be at least five units (intervals with fewer are combined), and the number of intervals (groups) must be large ($h > 5$), since the estimate of $\chi^2$ depends on the number of degrees of freedom.

The Romanovsky criterion is a goodness-of-fit criterion that allows one to assess the degree of closeness of the empirical distribution to normal. V. I. Romanovsky proposed evaluating the closeness of the empirical distribution to the normal distribution curve by the ratio:

$$\frac{|\chi^2 - (h - 3)|}{\sqrt{2(h - 3)}},$$

where h is the number of groups.

If the ratio is greater than 3, then the discrepancy between the frequencies of the empirical and normal distributions cannot be considered random and the hypothesis of a normal distribution law should be rejected. If the ratio is less than or equal to 3, then we can accept the hypothesis that the data distribution is normal.
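The Pearson and Romanovsky criteria can be sketched together. The frequencies are hypothetical, and the Romanovsky ratio is assumed to take the commonly cited form |χ² − (h − 3)| / √(2(h − 3)), with h − 3 degrees of freedom for a fitted normal curve.

```python
from math import sqrt

def chi_square(empirical, theoretical):
    """Pearson chi-square statistic: sum of (f - f')^2 / f'."""
    return sum((f - ft) ** 2 / ft for f, ft in zip(empirical, theoretical))

def romanovsky_ratio(chi2, groups, fitted_params=3):
    """Distance of chi-square from its degrees of freedom, in units of sqrt(2 * dof)."""
    dof = groups - fitted_params
    return abs(chi2 - dof) / sqrt(2 * dof)

# Hypothetical empirical and theoretical frequencies, six groups, n = 100
empirical = [6, 18, 30, 24, 14, 8]
theoretical = [5, 19, 31, 25, 13, 7]

chi2 = chi_square(empirical, theoretical)
ratio = romanovsky_ratio(chi2, groups=len(empirical))
print(round(chi2, 3), round(ratio, 3), "normal" if ratio <= 3 else "not normal")
```

The tabulated critical value of χ² is not computed here; it would be taken from a chi-square table for h − 3 degrees of freedom and the chosen significance level.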

To obtain an approximate idea of ​​the shape of the distribution of a random variable, a graph of its distribution series (polygon and histogram), function or distribution density is plotted. In the practice of statistical research one encounters very different distributions. Homogeneous populations are characterized, as a rule, by single-vertex distributions. Multivertex indicates the heterogeneity of the population being studied. In this case, it is necessary to regroup the data in order to identify more homogeneous groups.

Determining the general nature of the distribution of a random variable involves assessing the degree of its homogeneity, as well as calculating the indicators of asymmetry and kurtosis. In a symmetric distribution, in which the mathematical expectation is equal to the median, i.e. $M(X) = Me$, there is no asymmetry. The more noticeable the asymmetry, the greater the deviation between the two characteristics of the distribution center, the mathematical expectation and the median.

The simplest coefficient of asymmetry of the distribution of a random variable can be taken to be $As = \frac{M(X) - Me}{\sigma}$, where $M(X)$ is the mathematical expectation, $Me$ is the median, and $\sigma$ is the standard deviation of the random variable.

In the case of right-sided asymmetry, left-sided asymmetry. If , the asymmetry is considered to be low, if - medium, and at - high. A geometric illustration of right- and left-sided asymmetry is shown in the figure below. It shows graphs of the distribution density of the corresponding types of continuous random variables.

Figure. Illustration of right- and left-sided asymmetry in density plots of distributions of continuous random variables.

There is another coefficient of asymmetry of the distribution of a random variable. It can be proven that a non-zero central moment of odd order indicates asymmetry in the distribution of the random variable. The previous indicator used an expression similar to a first-order moment. Usually, however, this second asymmetry coefficient uses the third-order central moment $\mu_3$, and to make the coefficient dimensionless it is divided by the cube of the standard deviation. The resulting asymmetry coefficient is $As = \frac{\mu_3}{\sigma^3}$. For this coefficient, as for the first one, $As > 0$ in the case of right-sided asymmetry and $As < 0$ in the case of left-sided asymmetry.

Kurtosis of a random variable

The kurtosis of the distribution of a random variable characterizes the degree of concentration of its values near the center of the distribution: the higher the concentration, the higher and narrower the graph of its distribution density. The kurtosis (peakedness) indicator is calculated using the formula $Ex = \frac{\mu_4}{\sigma^4} - 3$, where $\mu_4$ is the fourth-order central moment and $\sigma^4$ is the standard deviation raised to the fourth power. Since the powers of the numerator and denominator are the same, kurtosis is a dimensionless quantity. The normal distribution is taken as the standard of zero kurtosis. It can be proven that for a normal distribution $\frac{\mu_4}{\sigma^4} = 3$. Therefore, in the formula for calculating kurtosis, the number 3 is subtracted from this fraction.

Thus, for a normal distribution the kurtosis is zero: $Ex = 0$. If the kurtosis is greater than zero, i.e. $Ex > 0$, the distribution is more peaked than normal. If the kurtosis is less than zero, i.e. $Ex < 0$, the distribution is less peaked than normal. The limiting value of negative kurtosis is $Ex = -2$; positive kurtosis can be arbitrarily large. The figure shows what peaked and flat-topped distribution densities look like in comparison with the normal distribution.

Figure. Illustration of peaked and flat-topped density distributions of random variables compared to the normal distribution.

The asymmetry and kurtosis of the distribution of a random variable show how much it deviates from the normal law. For large asymmetries and kurtosis, calculation formulas for normal distribution should not be used. What is the level of admissibility of asymmetry and kurtosis for the use of normal distribution formulas in the analysis of data for a specific random variable should be determined by the researcher based on his knowledge and experience.

Definition. The mode $M_0$ of a discrete random variable is its most probable value. For a continuous random variable, the mode is the value of the random variable at which the distribution density has a maximum.

If the distribution polygon for a discrete random variable or the distribution curve for a continuous random variable has two or more maxima, then such a distribution is called bimodal or multimodal.

If a distribution has a minimum but no maximum, then it is called antimodal.

Definition. The median $M_D$ of a random variable X is the value relative to which a larger or a smaller value of the random variable is equally probable.

Geometrically, the median is the abscissa of the point at which the area limited by the distribution curve is divided in half.

Note that if the distribution is unimodal and symmetric, then the mode and the median coincide with the mathematical expectation.

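Definition. The initial moment of order k of a random variable X is the mathematical expectation of $X^k$: $\nu_k = M[X^k]$.

For a discrete random variable: $\nu_k = \sum_i x_i^k p_i$.

For a continuous random variable: $\nu_k = \int_{-\infty}^{\infty} x^k f(x)\,dx$.

The initial moment of the first order is equal to the mathematical expectation.

Definition. The central moment of order k of a random variable X is the mathematical expectation of $(X - m_x)^k$: $\mu_k = M[(X - m_x)^k]$, where $m_x = M(X)$.

For a discrete random variable: $\mu_k = \sum_i (x_i - m_x)^k p_i$.

For a continuous random variable: $\mu_k = \int_{-\infty}^{\infty} (x - m_x)^k f(x)\,dx$.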

The first order central moment is always zero, and the second order central moment is equal to the dispersion. The third-order central moment characterizes the asymmetry of the distribution.
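For a discrete random variable these definitions translate directly into sums over the distribution series. The sketch below uses a hypothetical four-point distribution; the function names are illustrative.

```python
def initial_moment(values, probs, k):
    """Initial moment of order k: sum of x_i^k * p_i."""
    return sum(p * x ** k for x, p in zip(values, probs))

def central_moment_discrete(values, probs, k):
    """Central moment of order k: sum of (x_i - M(X))^k * p_i."""
    m = initial_moment(values, probs, 1)
    return sum(p * (x - m) ** k for x, p in zip(values, probs))

# Hypothetical distribution series
values = [0, 1, 2, 3]
probs = [0.1, 0.2, 0.3, 0.4]

expectation = initial_moment(values, probs, 1)
mu1 = central_moment_discrete(values, probs, 1)  # always zero
mu2 = central_moment_discrete(values, probs, 2)  # the variance
mu3 = central_moment_discrete(values, probs, 3)  # sign shows the direction of asymmetry
print(expectation, mu1, mu2, mu3)
```

In this example the third central moment is negative, which reflects the longer tail toward small values.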

Definition. The ratio of the third-order central moment to the cube of the standard deviation is called the asymmetry coefficient: $As = \frac{\mu_3}{\sigma^3}$.

Definition. To characterize the peakedness or flatness of the distribution, a quantity called kurtosis (excess) is used: $Ex = \frac{\mu_4}{\sigma^4} - 3$.

In addition to the quantities considered, the so-called absolute moments are also used:

Absolute initial moment: $M[|X|^k]$.

Absolute central moment: $M[|X - m_x|^k]$.

The quantile $x_p$ corresponding to a given probability level $p$ is the value at which the distribution function takes a value equal to $p$, i.e. $F(x_p) = p$, where $p$ is the specified probability level.

In other words, the quantile is the value of the random variable for which $P(X < x_p) = p$.

The probability $p$, expressed as a percentage, gives its name to the corresponding quantile; for example, $x_{0.4}$ is called the 40% quantile.

20. Mathematical expectation and variance of the number of occurrences of an event in independent experiments.

Definition. The mathematical expectation of a continuous random variable X whose possible values belong to the segment $[a, b]$ is the definite integral

$$M(X) = \int_a^b x f(x)\,dx.$$

If the possible values of the random variable are considered on the entire number line, the mathematical expectation is found by the formula:

$$M(X) = \int_{-\infty}^{\infty} x f(x)\,dx.$$

In this case, of course, it is assumed that the improper integral converges.

The mathematical expectation of a discrete random variable is the sum of the products of its possible values and their corresponding probabilities:

$M(X) = x_1 p_1 + x_2 p_2 + \dots + x_n p_n$. (7.1)

If the number of possible values of the random variable is infinite, then $M(X) = \sum_{i=1}^{\infty} x_i p_i$, provided the resulting series converges absolutely.

Note 1. The mathematical expectation is sometimes called weighted average, since it is approximately equal to the arithmetic mean of the observed values ​​of the random variable over a large number of experiments.

Note 2. From the definition of mathematical expectation it follows that its value is no less than the smallest possible value of a random variable and no more than the largest.

Note 3. The mathematical expectation of a discrete random variable is a non-random (constant) quantity. We will see later that the same is true for continuous random variables.

Properties of mathematical expectation.

1. The mathematical expectation of a constant is equal to the constant itself:

$M(C) = C$. (7.2)

Proof. If we consider $C$ as a discrete random variable taking only one value $C$ with probability $p = 1$, then $M(C) = C \cdot 1 = C$.

2. A constant factor can be taken outside the mathematical expectation sign:

$M(CX) = C\,M(X)$. (7.3)

Proof. If the random variable X is given by the distribution series

x_i     x_1     x_2     …     x_n
p_i     p_1     p_2     …     p_n

then the distribution series for CX has the form:

Cx_i    Cx_1    Cx_2    …     Cx_n
p_i     p_1     p_2     …     p_n

Then $M(CX) = Cx_1 p_1 + Cx_2 p_2 + \dots + Cx_n p_n = C(x_1 p_1 + x_2 p_2 + \dots + x_n p_n) = C\,M(X)$.

The mathematical expectation of a continuous random variable is

$$M(X) = \int_{-\infty}^{\infty} x f(x)\,dx. \qquad (7.13)$$

Note 1. The general definition of variance remains the same for a continuous random variable as for a discrete one (def. 7.5), and the formula for calculating it has the form:

$$D(X) = \int_{-\infty}^{\infty} (x - M(X))^2 f(x)\,dx. \qquad (7.14)$$

The standard deviation is calculated using formula (7.12).

Note 2. If all possible values ​​of a continuous random variable do not fall outside the interval [ a, b], then the integrals in formulas (7.13) and (7.14) are calculated within these limits.

Theorem. The variance of the number of occurrences of an event in $n$ independent trials is equal to the product of the number of trials and the probabilities of the occurrence and non-occurrence of the event in one trial: $D(X) = npq$.

Proof. Let $X$ be the number of occurrences of the event in $n$ independent trials. It is equal to the sum of the occurrences of the event in each trial: $X = X_1 + X_2 + \dots + X_n$. Since the trials are independent, the random variables $X_1, X_2, \dots, X_n$ are independent, therefore $D(X) = D(X_1) + D(X_2) + \dots + D(X_n)$.

As shown above, $M(X_i) = p$ and $D(X_i) = pq$.

Then $D(X) = npq$.

In this case, as mentioned earlier, the standard deviation is $\sigma = \sqrt{npq}$.
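The theorem can be checked numerically by computing the variance directly from the binomial probabilities and comparing it with npq. The values n = 10 and p = 0.3 are arbitrary illustrative choices.

```python
from math import comb, sqrt

def binomial_pmf(n, p, k):
    """Probability of exactly k occurrences in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_mean_variance(n, p):
    """Mean and variance computed directly from the distribution series."""
    pmf = [binomial_pmf(n, p, k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    return mean, var

n, p = 10, 0.3  # illustrative values
mean, var = binomial_mean_variance(n, p)
print(round(mean, 6), round(var, 6), round(sqrt(var), 6))
```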

When analyzing the population distribution, of significant interest is the assessment of the deviation of a given distribution from symmetrical, or, in other words, its skewness. The degree of skewness (asymmetry) is one of the most important properties of population distribution. There are a number of statistics designed to calculate asymmetry. All of them meet at least two requirements for any skewness indicator: it must be dimensionless and equal to zero if the distribution is symmetrical.

Fig. 2 a, b shows the curves of two asymmetric population distributions, one skewed to the left and the other to the right. The relative position of the mode, median and mean is shown qualitatively. It can be seen that one possible skewness indicator can be constructed from the distance between the mean and the mode. However, given the difficulty of determining the mode from empirical data on the one hand, and the well-known relationship (3) between the mode, median and mean on the other, the following formula was proposed for calculating the asymmetry index:

$$As = \frac{3(\bar{x} - Me)}{s} \qquad (13)$$

From this formula it follows that distributions skewed to the left have positive skewness, and distributions skewed to the right have negative skewness. Naturally, for symmetric distributions, for which the mean and median coincide, the asymmetry is zero.

Let us calculate the asymmetry indicators for the data given in table. 1 and 2. For the distribution of the duration of the cardiac cycle we have:

Thus, this distribution is slightly left-skewed. The obtained value for asymmetry is approximate and not exact, since the values ​​and calculated in a simplified way were used to calculate it.

For the distribution of sulfhydryl groups in blood serum we have:

Thus, this distribution has a negative skewness, i.e. skewed to the right.

It has been shown theoretically that the value determined by formula (13) lies within ±3. In practice, however, this value very rarely reaches its limits, and for moderately asymmetric unimodal distributions its absolute value is usually less than one.

The asymmetry indicator can be used not only for a formal description of the population distribution, but also for a meaningful interpretation of the data obtained.

In fact, if the characteristic we observe is formed under the influence of a large number of causes independent of each other, each of which makes a relatively small contribution to the value of this characteristic, then, in accordance with some theoretical premises discussed in the section on probability theory, we have the right to expect that the population distribution obtained as a result of the experiment will be symmetrical. However, if a significant asymmetry value is obtained for the experimental data (the numerical value of As modulo is within a few tenths), then it can be assumed that the conditions specified above are not met.

In this case, it makes sense to assume either the existence of one or two factors, the contribution of which to the formation of the value observed in the experiment is significantly greater than the others, or to postulate the presence of a special mechanism that is different from the mechanism of the independent influence of many causes on the value of the observed characteristic.

So, for example, if changes in a quantity of interest to us, corresponding to the action of a certain factor, are proportional to this value itself and the intensity of the action of the cause, then the resulting distribution will always be skewed to the left, i.e. have a positive skewness. Biologists, for example, encounter such a mechanism when estimating quantities associated with the growth of plants and animals.

Another way to assess skewness is based on the method of moments, which will be discussed in Chapter 44. In accordance with this method, skewness is calculated from the sum of the deviations of all values of a data series from the mean, raised to the third power, i.e.:

$$As = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3} \qquad (14)$$

The third power ensures that the numerator of this expression is zero for symmetric distributions, since in this case the sums of the cubed deviations above and below the mean are equal in magnitude and opposite in sign. Dividing by $s^3$ makes the asymmetry measure dimensionless.

Formula (14) can be transformed as follows. In the previous paragraph, standardized values were introduced: $z_i = (x_i - \bar{x})/s$. Then

$$As = \frac{1}{n} \sum_{i=1}^{n} z_i^3 \qquad (15)$$

Thus, the measure of skewness is the average of the cubed standardized data.

For the same data for which asymmetry was calculated using formula (13), we find the indicator using formula (15). We have:

Naturally, the asymmetry indicators calculated using different formulas differ from each other in magnitude, but equally indicate the nature of the skewness. In application packages for statistical analysis, when calculating asymmetry, formula (15) is used as it gives more accurate values. For preliminary calculations using simple calculators, you can use formula (13).

Kurtosis. So, we have examined three of the four groups of indicators used to describe population distributions. The last of them is the group of indicators of peakedness, or kurtosis (from the Greek kyrtos, "humpbacked"). To calculate one of the possible indicators of kurtosis, the following formula is used:

$$K = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4}$$

Using the same approach that was applied when transforming the asymmetry formula (14), it is easy to show that:

$$K = \frac{1}{n} \sum_{i=1}^{n} z_i^4$$

It has been shown theoretically that the kurtosis of the normal (Gaussian) distribution curve, which plays a large role in statistics as well as in probability theory, is numerically equal to 3. For a number of reasons, the peakedness of this curve is taken as the standard, and therefore the following value is used as the indicator of kurtosis:

$$E_x = K - 3$$

where $K$ is the kurtosis, i.e. the average of the standardized values raised to the fourth power.
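The calculation can be sketched in a few lines. The function names `kurtosis` and `excess_kurtosis` and the sample values are illustrative assumptions, and the standard deviation is assumed to use the divisor n.

```python
from math import sqrt


def kurtosis(values):
    """Average of the standardized values raised to the fourth power."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: population standard deviation (divisor n).
    s = sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sum(((x - mean) / s) ** 4 for x in values) / n


def excess_kurtosis(values):
    """Kurtosis minus 3, so that the normal curve corresponds to zero."""
    return kurtosis(values) - 3


flat = [1, 2, 3, 4, 5]           # hypothetical evenly spread data
two_points = [-1, 1, -1, 1]      # hypothetical data concentrated at two values
print(excess_kurtosis(flat))
print(excess_kurtosis(two_points))
```

The evenly spread series gives a negative value (flatter than the normal curve), while data split equally between two values gives a kurtosis of 1, i.e. an indicator of -2.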

Let us find the kurtosis indicator for the data given in Table 1. We have:

Thus, the distribution curve of the duration of cardiac cycles is flattened compared to the normal curve, for which this indicator is equal to zero.

Table 3 shows the distribution of the number of marginal flowers in one of the chrysanthemum species. For this distribution

Kurtosis can take very large values, as can be seen from the example given, but it can never be less than one. It turns out that if the distribution is bimodal, the kurtosis approaches this lower limit, so the indicator (kurtosis minus 3) tends to -2. Thus, if the calculations give an indicator value below roughly -1 to -1.4, this strongly suggests that the population distribution at our disposal is at least bimodal. This is especially important to take into account when experimental data bypass the pre-processing stage and are analyzed directly on a computer, so that the researcher does not have a graphical representation of the population distribution before his eyes.

The two-peaked distribution curve of experimental data can arise for many reasons. In particular, such a distribution can appear by combining two sets of heterogeneous data into a single set. To illustrate this, we artificially combined data on the width of shells of two types of fossil mollusks into one set (Table 4, Fig. 3).

The figure clearly shows the presence of two modes, since two sets of data from different populations are mixed. The calculation gives a kurtosis of 1.74, and therefore the indicator equals 1.74 - 3 = -1.26. Thus, the calculated value of the peakedness index indicates, in accordance with the position stated earlier, that the distribution has two peaks.
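The effect of pooling two heterogeneous groups can be reproduced on a toy example. This is a minimal sketch with made-up numbers: the shell-width data of Table 4 are not reproduced here, so `group_a` and `group_b` are hypothetical stand-ins, and the standard deviation is assumed to use the divisor n.

```python
from math import sqrt


def excess_kurtosis(values):
    """Average fourth power of the standardized values, minus 3."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: population standard deviation (divisor n).
    s = sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sum(((x - mean) / s) ** 4 for x in values) / n - 3


# Hypothetical measurements for two well-separated groups.
group_a = [9, 10, 10, 10, 11]
group_b = [19, 20, 20, 20, 21]
mixture = group_a + group_b

print("group A:", excess_kurtosis(group_a))
print("group B:", excess_kurtosis(group_b))
print("mixture:", excess_kurtosis(mixture))
```

Each group on its own gives a value well above -1, while the pooled set, with its two separate peaks, gives a value close to the lower limit of -2.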

There is one caveat here. Indeed, in all cases when the population distribution has two maxima, the kurtosis value will be close to unity. However, this fact cannot automatically lead to the conclusion that the analyzed data set is a mixture of two heterogeneous samples. Firstly, such a mixture, depending on the number of its constituent aggregates, may not have two peaks, and the kurtosis index will be significantly greater than one. Secondly, a homogeneous sample can have two modes if, for example, the requirements for the selection of experimental data are violated. Thus, in this, as in other cases, after the formal calculation of various statistics, a thorough professional analysis must be carried out, which will allow the data obtained to be given a meaningful interpretation.
