**Author: Luis Latorre Lloréns**

What is a copula? An heuristic approach would be to say that copulas are multivariate probability distributions that embody rules determining the dependency between the random variables involved. So, the way in which these variates depend on each other lies inside the copula. If we change the copula, we will also change the way in which the variables depend between them. This suggests that copulas may be useful to model the dependence between random variables or between risks that affect a particular activity.

Before this idea the reader may reply that the correlation coefficient is a measure of statistical dependence and that the role of copulas is performed by the correlation coefficient. But this is true only in some cases. The correlation coefficient only measures dependence when dealing with a special class of distributions, the so-called elliptical distributions, such as normal or t multivariate distributions. Let us imagine a bivariate normal density function with a correlation coefficient of 0.5. If we make an horizontal cut to the surface that represents this 3-dimensional function, we will obtain an ellipse. The point is that there are other multivariate distributions (bivariate, to follow the example) where the figure obtained by this procedure in no way resembles an ellipse. In these, the correlation coefficient cannot be utilized as a measure of dependence.

The correlation coefficient may lead to erroneous conclusions when applied in situations where it is not suitable as a dependence measure. Let’s look at an example. Each of the following graphs shows 500 random simulations of the losses from two risks, X and Y (i.e.: claims from two different types of events, compensations to pay for two different causes, etc.)

Risk X and risk Y have been simulated from the same distribution function and therefore have the same mean, variance, etc. However, both graphs A and B, differ in something very

important: In A there has not been any result where the sum of the amount of risk X plus the amount of Y exceeds 10 units, while in B there are some experiences that do exceed this threshold (represented in A and B by the straight diagonally descending line). Is this difference due to the fact that the correlation coefficient is greater in B than in A? No. In both cases the coefficient is 0.5. The greater severity of the risks (X, Y) in graph B stems from the different way in which these risks depend on each other, severity that is not reflected by the correlation coefficient.

This example leads us to the conclusion that any bivariate (and, in general, multivariate) distribution function involves two aspects defining the statistical behavior of these random variables:

- The marginal distribution of each one, that is, how it behaves statistically when observed in isolation from the other variable, and
- The joint behavior of both variables (i.e., the dependence structure). The copula is the function (in our example, function of two variables) that specifically regulates such joint behavior; therefore, the copula does not tell us anything about the marginal behavior of those variates.

To obtain the copula from a distribution function, we must “extract” the marginal distributions from it. What remains after this operation is the copula. If we have a distribution function of two variables *F*(*x*,*y*) with marginal distribution functions *u=g(x), v=h(y)*, we know that these functions *g* and *h* determine the function *F*(*x*,*y*) so that in order to “isolate” or leave the copula function alone, we must “undo” the effect that *g *and *h* exert on *F*(*x*,*y*). This can be done in two steps:

First, we obtain the inverse functions *g*^{-1} and *h*^{-1}. This is done by solving *u=g(x)* and* v=h(y)* for *x* and *y *respectively. We thus obtain *x*=* g*^{-1}(*u*), *y*=* h*^{-1}(*v*). We note that *u* and *v* represent the probability corresponding to each value of *x* and *y* according to the marginal distribution function *g* and *h*. So, the range of *u* and *v* is the interval (0 -1).

Next, we insert the expressions x =* g*^{-1}(*u*) and *y*=* h*^{-1}(*v*), into *F*(*x*,*y*). It thus follows that *F*(*x*, *y*) = *F*[*g*^{-1}(*u*), *h*^{-1}(*v*)] is, in short, a function that depends on *u* and *v* that is typically designated by *C*(*u*,*v*), and is called a **copula**.

The copula is, therefore, “what remains” of a distribution function once the marginal behavior of the random variables is eliminated. Therefore it is the element that regulates and determines the dependence structure between those variables. The copula is a function defined for values between 0 and 1 of its independent variables *u*, *v*. Actually, it is the joint distribution function of these uniform variables (0,1) designated by *u*, *v*. The following graph shows the density function of the copula corresponding to a t-distribution with 4 degrees of freedom and a correlation coefficient of 0.6.

We have obtained the copula from a distribution function. In fact, it is also possible to follow the opposite route: To arrive at the distribution function by “connecting” the marginal distribution functions to the copula. That is, we would * symbolically* have:

**Copula + marginal distributions = Joint distribution function**

And also,

**Joint distribution function – Marginal distributions = Copula**

The possibility of combining a particular copula model with different distribution functions and, likewise, a certain set of copulas with different copula models, opens up a variety of different models for describing the joint statistical behavior of two or more random variables (i.e. of joint distribution models).

We have already seen how a copula can be obtained from a known multinomial distribution (Gauss, t-distribution). However, different copula models can also be obtained if we build functions that meet the essential requirements for the result to be considered a copula. Thus, in addition to the aforementioned Gauss and Student copulas, there are many others such as Clayton, Gumbel, Frank, etc., so that the professional has a remarkable variety of options for modeling the phenomenon of interest.

What is, then, the use of copulas? Wherever there is a diversity of risks that may be dependent on each other, there will be scope for attempting to apply copulas. Regulations of financial institutions (banks and insurance companies) tend to include a plurality of risks that affect them in their solvency systems. Our interest will be not only on how these risks are distributed individually or separately (marginal distribution) but also on the way in which they depend on one another. It may be the case that, under normal conditions, there is no dependency between them, but when a crisis occurs, things get worse in all of them. In this situation, the correlation coefficient is not a suitable instrument, while the application of copula theory may in fact provide a satisfactory result. In the European Union regulations on insurance company solvency (Solvency II), the coefficient of correlation is in some cases increased so as to allow for the insufficiency of this coefficient. Despite the advantage of its simplicity, this solution seems too crude to us to be satisfactory. Thus, in such cases it is advisable to consider the application of copula theory, that offers models flexible enough to adapt to a wide variety of phenomena.

The interested reader may use my work entitled *Teoría de Cópulas. Introducción y Aplicaciones a Solvencia II* (*Theory of Copulas. Introduction and Applications to Solvency II, *only available in Spanish) (Fundación MAPFRE) as an introduction to this topic, and subsequently continue with the study of the extensive literature available, proof of the great interest that it deserves among scholars and professionals of these issues.