A fast majorize-minimize algorithm for the recovery of sparse and low-rank matrices


of 12
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
We introduce a novel algorithm to recover sparse and low-rank matrices from noisy and undersampled measurements. We pose the reconstruction as an optimization problem, where we minimize a linear combination of data consistency error, nonconvex
  742 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 2, FEBRUARY 2012 A Fast Majorize–Minimize Algorithm for theRecovery of Sparse and Low-Rank Matrices Yue Hu  , Student Member, IEEE  , Sajan Goud Lingala  , Student Member, IEEE  , and Mathews Jacob  , Member, IEEE   Abstract— We introduce a novel algorithm to recover sparse andlow-rank matrices from noisy and undersampled measurements.We pose the reconstruction as an optimization problem, wherewe minimize a linear combination of data consistency error,nonconvex spectral penalty, and nonconvex sparsity penalty. Wemajorize the nondifferentiable spectral and sparsity penaltiesin the criterion by quadratic expressions to realize an iterativethree-step alternating minimization scheme. Since each of thesesteps can be evaluated either analytically or using fast schemes,we obtain a computationally efficient algorithm. We demonstratethe utility of the algorithm in the context of dynamic magneticresonance imaging (MRI) reconstruction from sub-Nyquist sam-pled measurements. The results show a significant improvementin signal-to-noise ratio and image quality compared with classicaldynamic imaging algorithms. We expect the proposed scheme tobe useful in a range of applications including video restorationand multidimensional MRI.  Index Terms— Dynamicmagnetic resonance imaging (MRI), lowrank, majorize minimize, matrix recovery, sparse. I. I NTRODUCTION T HE RECOVERY of a low-rank or approximately low-rank matrix from very few measurements of its entrieshas received a lot of attention in recent years, mainly due to itsapplication in machine learning, computer vision, and recom-mendation systems [1], [2]. Our own motivation in this area isto use this framework to recover dynamic imaging/video datafrom sparse and noisy measurements. Several researchers haveposed the reconstruction of dynamic image data as the recoveryof a low-rank Casorati matrix, whose columns correspond tospatial pixels and rows correspond to temporal intensity varia-tions of the pixels [3]–[7]. In addition, the low-rank matrix maybe known to be additionally sparse in a specified domain. Forexample, each of the frames of a dynamic imaging data set isa natural image and can hence have sparse wavelet coefficients Manuscript received March 14, 2011; revised June 10, 2011 and July 28,2011; accepted July 29, 2011. Date of publication August 22, 2011; date of cur-rent version January 18, 2012. This work is supported by the National ScienceFoundation Award CCF-0844812 and Award CCF-1116067. The associate ed-itor coordinating the review of this manuscript and approving it for publicationwas Prof. Ali Bilgin.Y. Hu is with the Department of Electrical and Computer Engineering, Uni-versity of Rochester, Rochester, NY 014627 USA (e-mail: yue.hu@rochester.edu).S. G. Lingala and M. Jacob are with the Department of Biomedical Engi-neering and the Department of Electrical and Computer Engineering, Univer-sity of Iowa, Iowa City, IA 52242 USA (e-mail: sajangoud-lingala@uiowa.edu;mathews-jacob@uiowa.edu).Color versions of one or more of the figures in this paper are available onlineat http://ieeexplore.ieee.org.Digital Object Identifier 10.1109/TIP.2011.2165552 or gradients [8]. It is known that sparsity and low-rank proper-ties are somewhat complementary [1]; most randomly selectedlow-rank matrices are not sparse, and most sparse random ma-trices are not of low rank [9]. However, there are matrices thatare simultaneously sparse and of low rank. For example, a ma-trix with only one nonzero entry will have a sparsity of one andwillbeofunitrank.Weexploitthefewerdegreesofthefreedomof the set of matrices that are simultaneously sparse and of lowrank, as compared with only sparse or only low-rank matrices,tosignificantlyreducethenumberofmeasurements.We,aswellas other researchers, have demonstrated the utility of modelingdynamic imaging data set as sparse and low-rank Casorati ma-trices [5], [6], [10], [11].In this paper, we focus on nonconvex spectral and sparsitypenalties to further decrease the number of measurementsrequired for recovery. Specifically, we consider Schatten-functionals, which are the extensions of the classical nu-clear-norm spectral penalty. This choice is inspired by therecent success of nonconvex compressed sensing schemesthat rely on penalties [12], [13]. Similar to thevector penalty, the Schatten- functions cease to be norms andare nonconvex for . We also use nonconvex sparsitypenalties as in [12]–[14]. Thus, we pose the recovery of thesparse and low-rank matrices as a nonconvex optimizationproblem, where the cost function is a linear combination of the data consistency term, the nonconvex spectral penalty, andthe nonconvex sparsity penalty. Most of the current matrixrecovery algorithms are variants of iterative singular-valuethresholding (IST) [15]–[17]. Unfortunately, it is not easy toadapt this scheme to our case since our penalty is a linearcombination of nonconvex spectral and sparsity functions; itis difficult to efficiently evaluate the proximal mapping of thelinear combination of penalties. It may be possible to extendthe algorithm in [18], which relies on multiple proximal projec-tions, to solve the matrix recovery problem with convex priors.However, the proximal projections will not have closed-formexpressions when nonconvex penalties are used, making theefficient implementation of these schemes challenging.We introduce a novel majorize–minimize (MM) algorithm torecover sparse and low-rank matrices from undersampled mea-surements. In contrast with current matrix recovery schemesthat majorize the data consistency term, we majorize each of the penalty terms with quadratic functions. We use the propertyof unitarily invariant matrix penalties to majorize the spectralpenalty. This majorization of the penalty terms enables us tosolve for the matrix using a three-step alternating minimizationscheme with closed-form shrinkage rules. The iterative algo-rithm alternates between the three simple steps: 1) the solution 1057-7149/$26.00 © 2011 IEEE  HU  et al. : FAST MAJORIZE–MINIMIZE ALGORITHM FOR THE RECOVERY OF SPARSE AND LOW-RANK MATRICES 743 of a linear system of equations; 2) a singular value shrinkage;and 3) a gradient shrinkage. The linear system of equations canbe solved using either analytical expressions or a few conjugategradient (CG) steps. Both the shrinkage steps are obtained asthe proximal mappings [19] of new matrix functions, which arerelated to the srcinal spectral and sparsity penalties. Due to theproperty of convex conjugate matrix functions, these shrinkagerules have analytical expressions, even when nonconvex penal-ties are used. Since each step of the algorithm can be evaluatedanalyticallyorusingfastschemes,thealgorithmiscomputation-ally very efficient. The proposed MM scheme is equivalent tothe variable splitting (VS) scheme, which was introduced in ourearlier work [6], when nuclear-norm and total-variation (TV)penalties are used. However, the proximal projections in the VSscheme do not have closed-form expressions when nonconvexpenalties are used. While the proximal projections can be ap-proximated, as shown in [20], we observe that the resulting ap-proximate VS algorithm is significantly slower than the corre-spondingMMscheme.Wealsointroduceacontinuationschemeto accelerate the convergence of the algorithm. In addition toproviding fast algorithms, this approach makes the algorithmrobust to local minima. Specifically, we use a sequence of cri-teria with gradually increasing complexity while using the so-lution from the previous iteration to initialize the new criterion.Similar homotopy continuation schemes are used in nonconvexcompressed sensing to minimize local-minimum effects [21].The rest of this paper is organized as follows: We brieflyreview the background literature in Section II to make thispaper self-contained. We introduce the MM algorithm inSection III, whereas its numerical implementation is describedin Section IV. We study the convergence of the algorithm andits utility in practical applications in the results in Section V.II. B ACKGROUND  A. Matrix Recovery Using Nuclear-Norm Minimization Current theoretical results indicate that matrixof rank can be perfectly recovered from itslinear measurements [1], [2]. This recovery can beformulated as the following constrained optimization problem:such that rank (1)To realize computationally efficient algorithms, the afore-mentioned problem is often reformulated as an unconstrainedconvex optimization scheme, i.e.,(2)where is the nuclear norm of matrix. This penalty is the convex relax-ation of the rank and is defined as the sum of the singularvalues of : .  B. MM Algorithms MM algorithms rely on a surrogate function thatmajorizes the objective function using the current iterate. The surrogate function is equal to the objective function attangent [i.e., ] and is larger than theobjective function elsewhere, i.e.,(3)The successive minimization of the majorant functionensures that the cost function monotonicallydecreases. This property guarantees global convergence forconvex cost functions. We rely on homotopy continuationschemes to minimize local-minimum problems, when non-convex cost functions are used. C. Matrix Recovery Using IST  Thecommonapproachestosolvefor(2)involvedifferentfla-vors of IST [15]–[17]. These schemes majorize the data-consis-tency term in (2) with the following quadratic expression:(4)Here, is a constant such that , is a constantthat is independent of , and .Here, is the identity operator. Thus, we have(5)Theminimizationoftheaforementionedexpressionistermedas the proximal mapping of , associated with the nuclear-norm penalty [19]. This proximal mapping has an analytical so-lution [15], i.e.,(6)where and are the singular vectors and values are thesingular values of . The thresholding function in (6) is de-fined asif else. (7)Unfortunately,itisnotstraightforwardtoadaptthisalgorithmtooptimizationschemeswithmultiplenondifferentiablepenaltyterms (e.g., spectral and sparsity penalties), as previously dis-cussed.  D. Unitarily Invariant Matrix Functions We focus on the general class of unitarily invariant spectralpenalties, which satisfy the following property:(8)  744 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 2, FEBRUARY 2012 Here, is the set of unitary matrices. The afore-mentioned definition implies that is invariant to pre- andpostmultiplication of by unitary matrices. This class has sev-eral attractive properties, which are valuable in realizing fast al-gorithms. Specifically, Lewis has shown that unitarily invariantconvex matrix functions are fully characterized by compositefunctions of the singular values of the matrix [22]. If de-notes the dimensional vectorofsingular valuesof ,any unitarily invariant penalty can be specified as(9)Here, is a function that is invariantunder sign changes and permutations of the elements of .An interesting case is when , whereis a function that is invariant to sign of . Thisclass includes most of the spectral penalties of practical in-terest. For example, the Schatten- spectral norms correspondto . These penalties are convex and are normswhen . Clearly, the nuclear norm is a special case of Schatten- norms, when . Due to the results in [22],many of the properties of vector functions can be extendedto matrices. For example, the subgradient of is givenby [22] . Here,is the singular-value decomposition of . When , we have(10)where is the subgradient of . Similarly, the convexconjugate of unitarily invariant penalties can be easily derivedin terms of the convex conjugates of the corresponding func-tions. We use these results to extend the MM algorithms, whichare srcinally developed for vector recovery, to matrix recoveryproblems.  E. Dynamic Imaging Using Matrix Recovery Schemes Our motivation in developing this algorithm is to use it in dy-namic imaging and video restoration. We denote the spatiotem-poral signal as , where is the spatial location and de-notes time. We denote the sparse and noisy measurements to berelated to as , where is the measurementoperator and is the noise process. The vectors correspondingto the temporal profiles of the voxels are often highly correlatedor linearly dependent. The spatiotemporal signal can berearranged as a Casorati matrix to exploit the following correla-tions [3], [4], [6], [7]:... (11)The th row of corresponds to the temporal intensity varia-tions of voxel . Similarly, the th column of represents theimageatthetimepoint .Sincetherowsofthis matrixarelinearlydependent,therankof isgivenby .Wewill refer to the dynamic imaging data set either as or asin the remaining sections. The low-rank structure of dynamicimaging data sets was used to recover them from undersam-pled Fourier measurements by several authors [3], [4], [6], [7],[23]. These schemes rely on either simpler two-step algorithms,which are relatively inefficient at high acceleration factors, orgreedy low-rank decomposition schemes. In contrast with thesemethods, the proposed scheme is computationally efficient, ac-curate,highlyflexible,andcapableofusingmultiplenonconvexspectral and sparsity priors.III. MM A LGORITHM FOR  M ATRIX  R ECOVERY We introduce the problem formulation and the algorithmhere. The details of the numerical implementation are coveredin Section IV.  A. Matrix Recovery Using Sparsity and Spectral Penalties To exploit the low rank and the sparsity of the matrix in thetransformdomain(specifiedby and ),weformulatethema-trix recoveryas the following constrained optimization scheme:such that rank (12)We rewrite the aforementioned constrained optimizationproblem using Lagrange’s multipliers and relax the penaltiesto obtain(13)Here, the spectral penalty is the relaxation of the rank con-straint. We choose it as the class of Schatten- matrix penalties, specified by(14)Similarly, we specify the sparsity penalty as, which is the norm of the matrix entries,specified by(15)When and , the cost function (13) is convex andhence has a unique minimum. We now generalize the sparsitypenalty to account for nonseparable convex and nonconvexTV-like penalties, i.e.,(16)which are widely used in imaging applications [24], [25]. Theaforementioned penalty is often implemented using finite-dif-ference operators. Rewriting the aforementioned expression interms of matrix , we get(17)  HU  et al. : FAST MAJORIZE–MINIMIZE ALGORITHM FOR THE RECOVERY OF SPARSE AND LOW-RANK MATRICES 745 where . Here, and ,, are matrices that operate on the rows and thecolumns of , respectively. The nonseparable gradient penaltyin (16) is thus obtained when , , and correspond to thefinite differences of along , , and , respectively;and are the corresponding finite-difference matrices.Gao  et al. , have recently used a linear combination of sparseand low-rank matrices [26] to model dynamic imaging data setand recover it from undersampled measurements. They chosethe regularization parameters such that the low-rank componentis the static background signal. The dynamic components areassumed to be sparse in a preselected basis/frame, which isenforced by a convex sparsity prior. The use of a sparse modelto capture the dynamic components is conceptually similarto classical compressed sensing dynamic imaging schemes[27]–[29]. We have shown that the basis functions estimatedfrom the data itself (using low-rank recovery) are more effec-tive in representing the data compared with preselected basisfunctions, particularly when the significant respiratory motionis present [6]. We plan to compare the proposed scheme withthe model in [26] and other state-of-the-art dynamic imagingschemes in the future.  B. Algorithm Formulation We now derive a fast MM algorithm to solve (13). Specifi-cally, we majorize the penalty terms by quadratic functions of , i.e.,(18)(19)Here, and , ,areauxiliarymatrixvariables,and is the Frobenius norm of . By definition,and are matrix functions that are dependent onand , respectively.Analytical expressions for and can be derived in manycases, as shown below. However, we find in Section IV thatanalytical expressions for and are not required for efficientimplementation. Using the aforementioned majorizations, wesimplify the srcinal cost function in (13) as(20)where(21)We propose to use an iterative alternating minimizationscheme to minimize the aforementioned criterion. Specifically,we alternatively minimize (21) with respect to each of thevariables, assuming others to be fixed. We denote the thiterate of these variables as , , and ,respectively. One iteration of this scheme is described below.1) Derive , assuming and , i.e.,(22)Since this expression is quadratic in , we derive theanalytical solutions for many measurement operators inSection IV.2) Derive , assuming , i.e.,(23)The optimal is thus obtained as the proximal mappingof , corresponding to the spectral penalty . We de-rive analytical expressions for this step for the widely usednuclear-norm and Schatten- functionals in Section IV.3) Derive , assuming , i.e.,(24)The optimal is thus the proximalmapping of , associated withthe matrix penalty . Since is nonseparable, the corre-sponding shrinkage involves the simultaneous processingof the component matrices .This step also has analytical expressions, as shown inSection IV. C. Expression of  We now focus on determining function , such that the ma- jorization of the spectral penalty term in (18) holds. Since ana-lytical expressions for and are not essential to realize an ef-ficient algorithm, readers may skip this section and go directlyto Section IV.We reorder the terms in (18) to obtain(25)Here, is the inner product of twomatrices. From the theory in [22], the aforementioned relationis satisfiedif is aconvexfunction and is theconvexdual of , i.e.,(26)  746 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 2, FEBRUARY 2012 Fig. 1. Huber approximation of the spectral penalty and the corresponding   function. Note that the approximation of the srcinal spectral penalty by theHuber function improves with increasing values of     . Clearly, large values of    are required to approximate Schatten-    norms;      . It is observed that            ,      , when      . Hence, the VS interpretation (see Section IV-E)is equivalent to the MM scheme. However, this equivalence breaks down when     . Specifically,   00 0    00 0     only when      . (a)            .(b)         . (c)         . (d)               . (e)           . (f)            . Note that need not be convex for the aforementioned rela-tion to hold. This majorization is valid if is convex, whichis possible even when is concave. Due to the property of uni-tarily invariant functions, the dual of a specified matrix functionis obtained as(27)Thus, is the convex conjugate of . Fromtheaforementionedrelations,wehave ,where .We now approximate the nondifferentiable penalties bycontinuously differentiable Huber functionals. These approxi-mations are required to ensure that is convex. In addi-tion, the differentiability of also provides additional simpli-fications.1) Nuclear norm: We approximate the nuclear-norm penaltyas . Here,is the standard scalar Huber function, i.e.,if else.(28)Note that as . With this choice,the corresponding is given byif else.(29)Note that is convex for any . Using the property of convexconjugatefunctionsdescribedearlier,wefindintheAppendix that . Thus, we have ,.2) Schatten- norm: We approximate the Schatten- matrixnorm by the corresponding Huber matrix function, i.e.,if else.(30)Here, . The threshold specified byandconstant ischosensuchthat iscontinu-ously differentiable and is convex. The aforementionedformula is essentially an extension of the generalizationproposed by [24] to matrix functionals. It is difficult to de-rive analytical expressions for for arbitrary .However,wecannumericallysolvefor andeval-uate for specific values of , as shown in Fig. 1. Weshow in the next section that analytical expressions for theproximal mapping, specified by (23), can be derived evenif analytical expressions for are not available.We plot , , and for and for differentvalues of in Fig. 1. Note that , , when .However, is different from when . Thisimplies that the VS interpretation of the MM algorithm breaksdown when , as explained in Section IV.  D. Expression for  The Huber approximation of the TV norm wasconsidered in [30], where they showed that(31)Analytical expressions of cannot be obtained when .However, we derive the analytical expression for the shrinkagein Section IV, which will enable the efficient implementation.IV. N UMERICAL  A LGORITHM We now focus on the numerical implementation of the threemain subproblems. Specifically, we show that all of the threesteps can be solved either analytically or using efficient algo-rithms for most penalties and measurement operators of prac-tical interest. This enables us to realize a computationally ef-ficient algorithm. We also introduce a continuation scheme toaccelerate the convergence of the algorithm.  A. Quadratic Subproblem Specified by (22) Since subproblem (22) is entirely quadratic, we rewrite it asa Tikhonov regularized image recovery problem, i.e.,(32)Here, and are the 3-D data sets corre-sponding to the corresponding Casorati matrices. Similarly,is the linear operator such that . We obtain theEuler–Lagrange equation of this variational problem as(33)
Related Search
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks