Generalized Additive Models for Conditional Dependence Structures

Thibault Vatter*, Valérie Chavez-Demoulin
Faculty of Business and Economics, University of Lausanne, 1015 Lausanne, Switzerland

Abstract

We develop a generalized additive modeling framework for taking into account the effect of predictors on the dependence structure between two variables. We consider dependence or concordance measures that are solely functions of the copula, because they contain no marginal information: rank correlation coefficients or tail-dependence coefficients represent natural choices. We propose a maximum penalized log-likelihood estimator, derive its root-n-consistency and asymptotic normality, discuss details of the estimation procedure and the selection of the smoothing parameter. Finally, we present the results from a simulation study and apply the new methodology to a real dataset. Using intraday asset returns, we show that an intraday dependence pattern, due to the cyclical nature of market activity, is shaped similarly to the individual conditional second moments.

Keywords: Conditional rank correlations, Copula, Penalized log-likelihood, Regression splines, Semi-parametric modeling, Intraday financial returns.

1. INTRODUCTION

Generalized additive models (Hastie and Tibshirani 1986) are a natural extension of linear and generalized linear models. Built on roughness penalty smoothing, a generalized additive model (GAM) is a flexible data analysis tool in a traditionally univariate context. Consider

* Corresponding author. Phone: +41 21 693 61 04. Postal Address: University of Lausanne, Anthropole Building, Office 3016, 1015 Lausanne, Switzerland. Email addresses: (Thibault Vatter), (Valérie Chavez-Demoulin)

Preprint submitted to Journal of Multivariate Analysis, January 29, 2015
the following research questions, however:

• There are relationships between a population's life expectancy and the country's GDP, as well as between male and female life expectancy in a given country. Can we measure the effect of the GDP on the latter while controlling for the former?

• Cellular biologists study how predictor genes coordinate the expression of target genes. Can we further quantify how the association between the targets depends on the predictors?

• Volatilities of intraday asset returns show periodicities due to the cyclical nature of market activity and macroeconomic news releases. Is this also true for their dependence structure?

To obtain a statistically sound answer, we need to extend GAMs to the dependence structure between random variables. In this context, we distinguish between two closely related concepts, namely dependence and concordance. For instance, Pearson's correlation is often used as a measure of concordance and its absolute value as a measure of dependence. It detects linear relationships between variables, but it depends on their margins and lacks robustness to outliers. Borrowing from Nelsen (1999), two desirable properties of a dependence (respectively concordance) measure are:

• invariance to monotone increasing transformations of the margins (up to a sign change if one of the transformations is monotone decreasing);

• the existence and uniqueness of a minimum (a zero), which is attained whenever the variables are independent.

Essentially, the first property states that such a measure should depend on the copula only. This is the case for rank correlation coefficients (concordance) or tail-dependence coefficients (dependence). Furthermore, convenient mappings between such measures and the parameters of common copulas often exist. As such, many conditional dependence and concordance measures, although unobservable, can be modeled directly.
While copulas are well studied (Joe 1997; Nelsen 1999), their formal extension to conditional distributions has rather recent origins. Patton (2002) first extended the standard theory by imposing a mutual conditioning algebra for each margin and the copula. Fermanian and Wegkamp (2012) relax this mutual algebra requirement and develop the concept of pseudo-copulas under strong mixing assumptions.

However, modeling the dependence structure as a function of covariates has only recently been explored. As often in statistics, it is useful to distinguish between distribution-free and parametric methods. In Gijbels et al. (2011), the authors suggest two kernel-based estimators of conditional copulas and corresponding conditional association measures. While their estimators are useful as descriptive statistics, their framework is limited to a single covariate without formal inference. On the parametric side, Acar et al. (2011) consider a copula parameter that varies with a single covariate. The authors estimate their model using the concept of local likelihood and further suggest a testing framework in Acar et al. (2013). In Craiu and Sabeti (2012), the authors develop Bayesian inference tools for a bivariate copula, conditional on a single covariate, coupling mixed or continuous outcomes. This approach is extended to multiple covariates in the continuous case by Sabeti et al. (2014).

Compared to existing methods, the framework that we develop in this paper benefits directly from the complete GAM toolbox. We consider models where the dependence structure varies with an arbitrary set of covariates in a parametric, non-parametric or semi-parametric way.

The structure of the paper is as follows: in Section 2, we develop the theoretical framework of generalized additive models for the dependence structure. We present the general model for the conditional dependence or concordance measure in Section 2.1.
In Section 2.2, we state some asymptotic properties of the penalized log-likelihood estimator, namely its root-n-consistency and asymptotic normality. In Section 2.3, we recast the penalized likelihood estimation as an iteratively reweighted generalized ridge regression. We close our theoretical considerations in Section 2.4, by discussing a measure of the penalized model's effective dimension and the selection of smoothing parameters. In Section 3, we present a simulation
study and an application using a real dataset. We analyze the results of the simulation study in Section 3.1. We study the cross-sectional dynamics of intraday asset returns in Section 3.2. We conclude and suggest directions for further work in Section 4.

2. GENERALIZED ADDITIVE MODELS FOR CONDITIONAL DEPENDENCE STRUCTURES

In this section, we detail the approach to model the dependence structure between two random variables as a parametric, non-parametric or semi-parametric function of an arbitrary set of exogenous predictors (covariates). To set up the notation, we use double-struck capital letters for real intervals (or cartesian products thereof), except for the expectation operator $\mathrm{E}$. Subscripts and superscripts are integers and $C^k(\mathbb{W})$ denotes functions with $k$ continuous (partial) derivatives on $\mathbb{W} \subseteq \mathbb{R}^l$.

Let $(Y_1, Y_2) \in \mathbb{Y} \subseteq \mathbb{R}^2$ be the random variables (responses) of interest, $X \in \mathbb{X} \subseteq \mathbb{R}^q$ be a set of $q$ covariates (predictors) and $F_{Y_i|X}(y_i \mid x) = \mathrm{P}(Y_i \le y_i \mid X = x)$ (respectively $U_i = F_{Y_i|X}(Y_i \mid X)$) be the conditional margins (respectively conditional probability integral transforms) for $i \in \{1, 2\}$. Assume further that all variables are continuous and have strictly positive densities. In Patton (2002), the author showed a conditional equivalent to Sklar's theorem (Sklar 1959): for all $x \in \mathbb{X}$, there exists a unique conditional copula $C(\cdot, \cdot \mid x)$ which is the conditional distribution of $(U_1, U_2) \mid X = x$. In other words, we have that

$$C(u_1, u_2 \mid x) = \mathrm{P}(U_1 \le u_1, U_2 \le u_2 \mid X = x) = F_{Y_1, Y_2 | X}\big\{F_{Y_1|X}^{-1}(u_1 \mid x), F_{Y_2|X}^{-1}(u_2 \mid x) \mid x\big\}. \quad (1)$$

Remark. Because the conditioning vector is the same for the two margins and the copula, this definition is similar to that of Patton (2002). In time series, conditioning algebras are usually augmented with past observations. Therefore, the concept of conditional copulas can be rather
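As a concrete illustration (ours, not from the paper), the conditional probability integral transform defining $U_i$ can be checked by simulation: if the conditional margin is correctly specified, applying its CDF yields standard uniforms regardless of the covariate. The data-generating process below is a hypothetical choice for illustration only.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: Y | X = x ~ N(x, 1), X ~ U(0, 1).
x = rng.uniform(0.0, 1.0, n)
y = x + rng.standard_normal(n)

# Conditional probability integral transform: U = F_{Y|X}(Y | X) = Phi(Y - X).
u = norm.cdf(y - x)

# If the conditional margin is correct, U is Uniform(0, 1) independently
# of X, so its sample moments match the uniform ones (mean 1/2, var 1/12).
print(round(u.mean(), 2), round(u.var(), 2))  # ≈ 0.5 and 0.08
```

The same check applied with a misspecified margin (e.g., the wrong conditional mean) would produce non-uniform transforms, which is what separate inference for margins and copula relies on.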
restrictive (see Fermanian and Wegkamp 2012). However, conditional copulas and exogeneity of the predictors are sufficient to develop a regression-like theory for the dependence structure.

The joint conditional density is

$$\frac{\partial F_{Y_1|X}(y_1 \mid x)}{\partial y_1} \, \frac{\partial F_{Y_2|X}(y_2 \mid x)}{\partial y_2} \, \frac{\partial^2 C(u_1, u_2 \mid x)}{\partial u_1 \partial u_2},$$

where $u_i = F_{Y_i|X}(y_i \mid x)$ for $i \in \{1, 2\}$. The log-likelihood function is then a sum of three terms and inference for the margins and copula can be done separately. In the unconditional case, the efficiency loss incurred by the two-step procedure (compared to full likelihood maximization) is small (cf. Joe 2005). Henceforth assuming known margins, we focus on bivariate copulas parametrized by a single scalar $\eta$ taking values in $\mathbb{H} \subseteq \mathbb{R}$. As for the conditional copula density, namely

$$c(u_1, u_2 \mid x) = \frac{\partial^2 C(u_1, u_2 \mid x)}{\partial u_1 \partial u_2},$$

we denote $c\{u_1, u_2; \eta(x)\} \equiv c(u_1, u_2 \mid x)$ with $\eta(x): \mathbb{X} \to \mathbb{H}$ a parametrization for all $x \in \mathbb{X}$.

2.1. The Model

Consider an arbitrary dependence or concordance measure $\psi$ between $(Y_1, Y_2)$ that we want to condition on $X$. Let us further assume that

• $\psi$ satisfies the two properties exposed in Section 1 and takes values in a closed $\mathbb{P} \subset \mathbb{R}$;

• there exists a mapping $\nu: \mathbb{P} \to \mathbb{H}$ linking $\psi$ to the copula parameter, with $\nu \in C^\infty(\mathbb{P})$ strictly increasing.

For instance, Kendall's tau, defined by $\psi = 4 \int C \, dC - 1$, is a natural choice. In this case, the mappings for common copulas such as the Gaussian, Clayton and Gumbel are $\nu(\psi) = \sin(\pi \psi / 2)$, $\nu(\psi) = 2\psi/(1 - \psi)$ and $\nu(\psi) = 1/(1 - \psi)$, respectively.

Remark. For copula families with several parameters, all but one need to be treated as nuisances. The issue is discussed in Section 3.1.
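The three mappings above are simple enough to sketch directly; the following minimal Python helpers (our own illustration, not code from the paper) convert a Kendall's tau in (0, 1) to the Gaussian, Clayton and Gumbel copula parameters.

```python
import math

# nu(psi): Kendall's tau -> copula parameter, as given in the text.
def nu_gaussian(tau):
    # Gaussian copula: rho = sin(pi * tau / 2)
    return math.sin(math.pi * tau / 2)

def nu_clayton(tau):
    # Clayton copula: theta = 2 * tau / (1 - tau), for tau in (0, 1)
    return 2 * tau / (1 - tau)

def nu_gumbel(tau):
    # Gumbel copula: theta = 1 / (1 - tau), for tau in [0, 1)
    return 1 / (1 - tau)

# All three mappings are strictly increasing on (0, 1), consistent with
# the requirement that nu be strictly increasing.
tau = 0.5
print(round(nu_gaussian(tau), 4), nu_clayton(tau), nu_gumbel(tau))
# → 0.7071 2.0 2.0
```

Note how the Clayton and Gumbel parameters diverge as tau approaches 1, reflecting their unbounded parameter spaces, while the Gaussian correlation stays in (0, 1).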
A generalized additive model for the conditional measure can then be written

$$\psi(x; \theta) = g\Big\{ z^\top \beta + \sum_{k=1}^{K} h_k(t_k) \Big\}, \quad (2)$$

where

• $g: \mathbb{R} \to \mathbb{P}$ is a strictly increasing and $C^\infty(\mathbb{R})$ link expressing the relationship between the GAM and $\psi$ (e.g., $g(x) = (e^x - 1)/(e^x + 1)$ when $\mathbb{P} = [-1, 1]$),

• $z \in \mathbb{R}^p$ and $t \in \mathbb{R}^K$ are subsets of $x$ or products thereof to consider interactions,

• $\beta \in \mathbb{R}^p$ is a column vector of parameters,

• $h_k: \mathbb{H}_k \to \mathbb{R}$ are smooth functions supported on closed $\mathbb{H}_k \subset \mathbb{R}$ and

• $\theta \in \Theta$ is the column vector of stacked parameters containing both $\beta$ and the information encoding the $h_k$.

Remark. As it introduces the requirement of a one-to-one mapping, our choice of modeling a dependence or concordance measure instead of the copula parameter directly may seem an unnecessary complication. In fact, this is dictated by the context of application and has two desirable properties. First, a dependence or concordance measure has a more natural interpretation than a copula parameter. Second, it is easier to compare various parametric families, for instance using information criteria, when the modeled distributional feature is the same. Nonetheless, the whole theory can be adapted in a straightforward fashion, using $\nu$ as the identity and modifying $g$ accordingly (or vice-versa). In Section 3, we also illustrate models specified for the copula parameter.

In this paper, we assume that the smooth functions $h_k \in C^2(\mathbb{H}_k)$ admit a finite-dimensional basis-quadratic penalty representation (Hastie and Tibshirani 1990; Green and Silverman 2000; Wood 2006). Using this representation, we denote by $h_k = (h_{k,1}, \ldots, h_{k,m_k})^\top \in \mathbb{R}^{m_k}$ an $m_k$-
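To fix ideas, a minimal evaluation of (2) can be sketched as follows. This is our own illustration under stated assumptions: one linear term, one smooth represented by a generic callable, and the link $g(x) = (e^x - 1)/(e^x + 1)$, which equals $\tanh(x/2)$.

```python
import math

def g(x):
    # Link mapping the additive predictor to P = [-1, 1];
    # (e^x - 1)/(e^x + 1) is identically tanh(x/2).
    return math.tanh(x / 2)

def psi(z, beta, smooths, t):
    """Conditional measure psi(x; theta) = g{z' beta + sum_k h_k(t_k)}."""
    add_pred = sum(zi * bi for zi, bi in zip(z, beta))
    add_pred += sum(h(tk) for h, tk in zip(smooths, t))
    return g(add_pred)

# Hypothetical components: one linear covariate with beta = 0.3 and
# one smooth h_1(t) = sin(t), evaluated at t = 0.
z, beta = [1.0], [0.3]
smooths, t = [math.sin], [0.0]

val = psi(z, beta, smooths, t)
print(round(val, 4))  # g(0.3) = tanh(0.15) ≈ 0.1489
```

Whatever the additive predictor, the output stays in $[-1, 1]$, which is what makes this link suitable for rank correlations.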
dimensional parametrization and by $S_k$ the fixed matrix such that

$$\int_{\mathbb{H}_k} h_k''(t)^2 \, dt = h_k^\top S_k h_k. \quad (3)$$

$h_k$ can be further constrained using a matrix $C_k$, such that an additional set of constraints is met whenever $C_k h_k = 0$. For instance, a sensible identifiability requirement is that $h_k$ integrates to zero over the range of its covariates. Following Wood (2004), in the presence of such constraints, we can define $Z_k$ as a basis of the null space of $C_k$. With $w_k$ a reparametrization such that $h_k = Z_k w_k$, then $C_k h_k = 0$ is automatically satisfied. Hence, we do not mention the additional constraints further and consider that the vector of parameters is $\theta = (\beta^\top, h_1^\top, \cdots, h_K^\top)^\top$, taking values in a closed $\Theta \subset \mathbb{R}^d$ with $d = p + \sum_{k=1}^{K} m_k$.

Remark. In addition to potential multicollinearities arising from the linear part of (2), concurvity (see e.g. Hastie and Tibshirani 1990, pages 118–123) may also play a role in the model's identifiability. To check for ill-conditioning in a matrix of linear predictors, the condition number is the standard measure. Similarly, there exist checks for concurvity with respect to the data and chosen bases.

A natural cubic spline (NCS) $h: \mathbb{H} \subset \mathbb{R} \to \mathbb{R}$ with fixed knots is a particular case. Suppose that the $m$ fixed knots are such that $\inf \mathbb{H} = s_0 < s_1 < \cdots < s_m < s_{m+1} = \sup \mathbb{H}$. As an NCS is linear on the two extreme intervals $[s_0, s_1]$ and $[s_m, s_{m+1}]$ and twice continuously differentiable on its support $\mathbb{H}$, $h$ has only $m$ free parameters. For such an NCS, there exists a unique $m \times m$ symmetric matrix of rank $m - 2$ such that it verifies (3). This matrix is fixed in the sense that it depends on the knots but not on $h$ itself. Apart from NCSs, many alternative $C^2$ smoothers admit this finite-dimensional basis-quadratic penalty representation. For instance, tensor product splines (functions of multiple predictors) or cyclic cubic splines (to model periodic functions) are included in the GAM toolbox.

Remark.
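The constraint-absorbing reparametrization $h_k = Z_k w_k$ can be sketched numerically. This is our illustration, using a hypothetical sum-to-zero constraint as a discrete stand-in for the integrate-to-zero requirement:

```python
import numpy as np
from scipy.linalg import null_space

m = 6
# Hypothetical constraint matrix C_k: a single row of ones, so that
# C_k h_k = 0 enforces sum(h_k) = 0, a discrete analogue of
# "h_k integrates to zero over the range of its covariates".
C = np.ones((1, m))

# Z_k: orthonormal basis of the null space of C_k, of shape (m, m - 1)
# since C_k has rank 1.
Z = null_space(C)

# Any h_k = Z_k w_k satisfies the constraint automatically, for any w_k.
rng = np.random.default_rng(1)
w = rng.standard_normal(m - 1)
h = Z @ w
print(Z.shape, bool(abs(C @ h).max() < 1e-10))  # → (6, 5) True
```

Working in the unconstrained coordinates $w_k$ is what lets the rest of the theory ignore the constraints, exactly as the text states.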
Fixing the number and location of the knots beforehand amounts to assuming that the true model can be represented using the finite-dimensional basis (Yu and Ruppert 2002). If this is not the case, consistency would require either additional assumptions on the smooth
functions or an infinity of knots, or both. However, a finite-dimensional parameter space implies root-n-consistency and asymptotic normality under standard regularity assumptions (see Section 2.2).

For $x \in \mathbb{X}$ and $\theta \in \Theta$, the copula parameter is $\eta(x; \theta) = \nu\{\psi(x; \theta)\}$ and we denote the log-likelihood function for $(u_1, u_2) \in [0, 1]^2$ by

$$\ell_0(u_1, u_2, x; \theta) = \log\big(c[u_1, u_2; \nu\{\psi(x; \theta)\}]\big). \quad (4)$$

Considering a sample of $n$ observations $\{u_{i1}, u_{i2}, x_i\}_{i=1}^{n}$, the parameters can be estimated by maximizing the penalized log-likelihood

$$\ell(\theta, \gamma) = \ell(\theta) - \frac{1}{2} \sum_{k=1}^{K} \gamma_k \int_{\mathbb{H}_k} h_k''(t_k)^2 \, dt_k = \ell(\theta) - \frac{1}{2} \sum_{k=1}^{K} \gamma_k h_k^\top S_k h_k, \quad (5)$$

with $\ell(\theta) = \sum_{i=1}^{n} \ell_0(u_{i1}, u_{i2}, x_i; \theta)$, $\gamma = (\gamma_1, \ldots, \gamma_K)$ and $\gamma_k \in \mathbb{R}^+ \cup \{0\}$ for all $k$. The integral terms are roughness penalties on each component and $\gamma$ is a vector of smoothing parameters. The penalized maximum log-likelihood estimator is defined as

$$\hat{\theta}_n = \underset{\theta \in \Theta}{\operatorname{argmax}} \; \ell(\theta, \gamma).$$

2.2. Asymptotic Properties

In this section, we first establish the required assumptions to ensure root-n-consistency and asymptotic normality of the penalized maximum likelihood estimator. Second, we explicitly formulate the resulting asymptotic properties. Third, we discuss a misspecified case where the
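To make (4) and (5) concrete, here is a small sketch of the penalized objective. It is our own illustration under stated assumptions: a Gaussian copula with Kendall's tau mapped through $\nu(\psi) = \sin(\pi\psi/2)$, a single smooth, and toy inputs; it is not the paper's estimation code.

```python
import numpy as np
from scipy.stats import norm

def log_c_gaussian(u1, u2, rho):
    """Log-density of the bivariate Gaussian copula at (u1, u2)."""
    z1, z2 = norm.ppf(u1), norm.ppf(u2)
    r2 = rho * rho
    return (-0.5 * np.log(1 - r2)
            + (2 * rho * z1 * z2 - r2 * (z1**2 + z2**2)) / (2 * (1 - r2)))

def penalized_loglik(psi_vals, u1, u2, gamma, h, S):
    """(5) with one smooth: sum of (4) minus (gamma / 2) * h' S h."""
    rho = np.sin(np.pi * psi_vals / 2)       # eta = nu(psi), Gaussian case
    ll = log_c_gaussian(u1, u2, rho).sum()   # (4) summed over observations
    return ll - 0.5 * gamma * (h @ S @ h)

# Toy check: at psi = 0 the Gaussian copula density is identically 1, so
# the unpenalized log-likelihood is 0; the penalty then lowers the value.
u1 = np.array([0.2, 0.7]); u2 = np.array([0.4, 0.9])
h = np.array([1.0, -1.0]); S = np.eye(2)
print(penalized_loglik(np.zeros(2), u1, u2, gamma=0.0, h=h, S=S))  # → 0.0
print(penalized_loglik(np.zeros(2), u1, u2, gamma=1.0, h=h, S=S))  # → -1.0
```

Maximizing this objective over the stacked parameters, with $\psi$ given by (2), is exactly the problem that Section 2.3 recasts as an iteratively reweighted ridge regression.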
smooth components do not admit a finite-dimensional basis-quadratic penalty representation.

In what follows, partial derivatives, gradients and Hessians are taken with respect to the vector of parameters $\theta$, and expectations, weak convergence and convergence in probability with respect to the true vector of parameters $\theta_0 \in \Theta$. For all $x \in \mathbb{X}$ and $\theta \in \Theta$, we have:

Assumption 1. $\gamma_k = o(1)$ for $k \in \{1, \cdots, K\}$.

Assumption 2. All mixed partial derivatives of $c[u_1, u_2; \nu\{\psi(x; \theta)\}]$ up to degree 3 exist and are continuous for all $(u_1, u_2) \in [0, 1]^2$.

Assumption 3. The conditional expectation of the log-likelihood function's gradient satisfies $\mathrm{E}\{\nabla \ell_0(U_1, U_2, X; \theta) \mid X = x\} = 0$.

Assumption 4. The conditional Fisher information $I(\theta, x) = \mathrm{Cov}\{\nabla \ell_0(U_1, U_2, X; \theta) \mid X = x\}$ is positive-definite and $I(\theta) = \lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} I(\theta, X_i)$ satisfies, for all $1 \le q, r \le d$, $I(\theta)_{qr} < \infty$ and

$$\lim_{n \to \infty} \max_{i \in \{1, \cdots, n\}} n^{-1} I(\theta, X_i)_{qr} / I(\theta)_{qr} = 0.$$

Assumption 5. The conditional expected Hessian $J(\theta, x) = -\mathrm{E}\{\nabla^2 \ell_0(U_1, U_2, X; \theta) \mid X = x\}$ is positive-definite and $J(\theta) = \lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} J(\theta, X_i)$ satisfies $J(\theta)_{qr} < \infty$ for all $1 \le q, r \le d$.

Assumption 6. There exists a measurable function $M_{qrs}: [0, 1]^2 \times \mathbb{X} \to \mathbb{R}$ such that $|\partial_{qrs} \ell_0(u_1, u_2, x; \theta)| \le M_{qrs}(u_1, u_2, x)$ with $m_{qrs} = \lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} M_{qrs}(U_{i1}, U_{i2}, X_i) < \infty$ for any triplet $1 \le q, r, s \le d$.
Assumption 1 expresses a penalty that vanishes asymptotically. It is necessary in order to obtain consistency when the true model can be represented by the underlying finite-dimensional basis. Assumptions 2–6 are the usual regularity conditions augmented with Lindeberg–Feller type assumptions to account for the independent-but-not-identically-distributed context. Typically satisfied for common parametric copulas in the standard case (Joe 1997, Section 10.1.1), they are a simple requirement to deal with exogenous predictors.

Remark. Although the mapping $\nu$ and link $g$ are both $C^\infty$ and strictly increasing, usually $\{\partial \eta(x; \theta) / \partial \theta_q\}^2 \ne \partial^2 \eta(x; \theta) / \partial \theta_q^2$ for $1 \le q \le d$, which implies $I(\theta) \ne J(\theta)$.

Theorem 2.1 (Asymptotic properties of the penalized maximum likelihood estimator). If Assumptions 1–6 hold, then $\hat{\theta}_n$ is

1. root-n-consistent, with $\hat{\theta}_n - \theta_0 = O_p(n^{-1/2})$,

2. and asymptotically normal, with $\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} \mathcal{N}\big(0, J(\theta_0)^{-1} I(\theta_0) J(\theta_0)^{-1}\big)$.

Remark. The proof is provided in the Appendix. While the theorem applies to $\hat{\theta}_n$, it can still be used to construct confidence intervals for the dependence measure or the copula parameter. Using the asymptotic distribution of $\hat{\theta}_n$, sampling a large number of parameter realizations is computationally straightforward. Then combining (2) and the parameter realizations samples any derived quantity, which is much faster than bootstrap techniques.

Remark. The theorem can also be applied to compare nested models using likelihood-ratio statistics: $2\{\ell(\hat{\theta}_1, \gamma_1) - \ell(\hat{\theta}_0, \gamma_0)\} \xrightarrow{d} \chi^2(d_1 - d_0)$, where $d_0 < d_1$ are the numbers of parameters in the null and alternative nested models.
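The confidence-interval construction described in the first remark can be sketched as follows. Everything here is illustrative: the point estimate, its covariance and the link are hypothetical placeholders, not output of the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical fitted parameters and estimated asymptotic covariance
# (stand-ins for theta_hat and J^{-1} I J^{-1} / n).
theta_hat = np.array([0.2, -0.1])
cov_hat = np.array([[0.04, 0.01],
                    [0.01, 0.09]])

# Design point at which we want a CI for psi(x; theta) = tanh(z' theta / 2).
z = np.array([1.0, 0.5])

# Sample parameter realizations from the asymptotic normal distribution,
# then push each draw through the model to sample the derived quantity.
draws = rng.multivariate_normal(theta_hat, cov_hat, size=20_000)
psi_draws = np.tanh(draws @ z / 2)

# Pointwise 95% confidence interval for the conditional measure.
lo, hi = np.quantile(psi_draws, [0.025, 0.975])
point = np.tanh(theta_hat @ z / 2)
print(bool(lo < point < hi))  # → True
```

No refitting is required for each derived quantity, which is why this is much cheaper than bootstrapping the whole estimation procedure.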
In many practical cases, smoothers are used as approximations of the underlying functions of interest. In exploratory data analysis for instance, smoothers "let the data speak for themselves" using a minimal set of assumptions. When the true underlying functions do not admit a finite-dimensional basis-quadratic penalty representation, $\hat{\theta}_n$ is naturally interpreted as a projection.

Let us assume that the true model is like (2) and replace each $h_k$ by an arbitrary square-integrable $f_k \in C^2(\mathbb{H}_k)$. In this case, the log-likelihood function $\ell_0$ in (4) using $h_k$ is misspecified; each $f_k$ is to be approximated by the corresponding $h_k$. We denote by $\theta_0 \in \Theta$ the parameter that solves

$$0 = \int_{[0,1]^2} \nabla \ell_0(u_1, u_2, x; \theta_0) \, dC(u_1, u_2 \mid x), \quad (6)$$

with integration taken with respect to the true copula. Hence, the misspecified $\hat{\theta}_n$ is interpreted as a plug-in estimator of $\theta_0$. Taking expectations, weak convergence and convergence in probability with respect to the true model in Assumptions 1–6, we further require:

Assumption 7. (6) has a unique solution $\theta_0 \in \Theta$.

Then Theorem 2.1 still holds and the proof is the same as in Appendix A.

2.3. Penalized Maximum Likelihood Estimation and Iteratively Reweighted Ridge Regression

In this section, we describe the estimation procedure, recasting the penalized maximum likelihood estimation into an iteratively reweighted ridge regression problem. The idea to use iteratively reweighted least squares for maximum likelihood estimation was first proposed in Green (1984). It was then extended to iteratively reweighted ridge regression for penalized maximum likelihood estimation in the context of exponential families in O'Sullivan et al. (1986). Finally, it appeared in the general setting that we use in Green (1987). This reformulation is particularly convenient, because algorithms solving the problem that appears at each iteration are implemented in standard software packages.
Let $\psi(\theta)$ be the $n \times 1$ vector with $\psi_i(\theta) = \psi(x_i; \theta)$, $D(\theta)$ the $n \times d$ matrix $D(\theta) = \partial \psi(\theta) / \partial \theta$ and $v(\theta)$ the $n \times 1$ vector $\partial \ell(\theta, \gamma) / \partial \psi$. The penalized maximum log-likelihood estimator $\hat{\theta}_n$ satisfies the $d$ score equations

$$\nabla \ell(\hat{\theta}_n, \gamma) = D^\top(\hat{\theta}_n) v(\hat{\theta}_n) - P(\gamma) \hat{\theta}_n = 0,$$

where $P(\gamma)$ is a $d \times d$ block-diagonal penalty matrix with $K + 1$ blocks; the first $p \times p$ block is filled with zeros and the r
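A single iteration of the reweighted ridge scheme can be sketched generically. This is our own illustration: a penalized Newton step written in ridge-regression form, with hypothetical $D$, weights and working response; it is not the paper's exact algorithm.

```python
import numpy as np

def ridge_step(D, W, z, P):
    """One iteration: solve (D' W D + P) theta_new = D' W z.

    D: n x d matrix of derivatives of psi with respect to theta,
    W: n x n diagonal weight matrix,
    z: n-vector of working responses,
    P: d x d penalty matrix P(gamma).
    This is the generalized ridge regression solved at each iteration.
    """
    A = D.T @ W @ D + P
    b = D.T @ W @ z
    return np.linalg.solve(A, b)

# Toy check with P = 0: the step reduces to weighted least squares, which
# recovers theta exactly when the working response is noiseless.
rng = np.random.default_rng(7)
D = rng.standard_normal((50, 3))
theta_true = np.array([1.0, -2.0, 0.5])
z = D @ theta_true
W = np.eye(50)
est = ridge_step(D, W, z, P=np.zeros((3, 3)))
print(bool(np.allclose(est, theta_true)))  # → True
```

With a nonzero $P(\gamma)$ the solution is shrunk toward the penalty's null space, which is exactly how the roughness penalties in (5) act at each iteration; standard least-squares routines solve this system directly.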