Building Composite Grid tasks using STROLL File-system


Abdulrahman Azab
University of Stavanger, 4036 Stavanger, Norway

Hein Meling
University of Stavanger, 4036 Stavanger, Norway

Abstract: Despite the availability of a range of Grid computing platforms, domain specialists and scientists only rarely take advantage of these computing facilities. One reason for this is the complexity of Grid computing and the need to learn a new programming environment to interact with the Grid. Moreover, users cannot easily deploy their compute tasks to multiple Grid platforms without rewriting their programs to use different task submission interfaces. STROLL is a universal filesystem-based interface for seamless task submission to one or more Grid facilities. Users interact with the Grid through simple read and write filesystem commands. STROLL supports simple tasks, i.e. single and batch, and composite tasks, i.e. workflows. This paper describes how to build composite STROLL tasks in the form of filesystem directory structures. An R code example is described as a use case, and an evaluation of the CPU consumption of the user machine shows that implementing a composite task can reduce the computation overhead on that machine.

I. INTRODUCTION

Grid computing provides the infrastructure for aggregating different types of resources (e.g. desktops, mainframes, storage servers) for solving intensive problems in different scientific and industrial fields, e.g. DNA analysis, weather forecasting, and modelling and simulation of geological phenomena [1]. However, despite its many benefits, the adoption of Grid computing in different scientific communities is perhaps not at the level one might expect or hope. The primary reason for this is complexity [2]. Managing and using a Grid computing framework is a challenging undertaking, which limits adoption to expert programmers who are able and willing to learn how to program against the APIs specific to different Grid frameworks.
This model is not very approachable for non-computer scientists. Therefore, a major challenge faced by Grid computing vendors is to provide a ubiquitous and easy-to-use interface for accessing the Grid, enabling domain-specialist programmers to easily develop and deploy Grid computing applications. STROLL [3] is a universal interface for seamless deployment of compute tasks to a variety of Grid platforms. STROLL is based on the ubiquitous filesystem interface, and allows task submission, monitoring, and administration through simple read() and write() filesystem functions or commands. The interface is implemented as a user-space virtual filesystem (VFS), enabling access to it from any application or programming language that can interact with a filesystem. Hence, the approach lends itself well to providing Grid access to domain specialists who, although not computer scientists, are expected to have general familiarity with common filesystem operations.

In [3], we described the construction and handling of single and batch Grid tasks [4] using STROLL. This paper describes the structure of composite Grid tasks in the form of directory structures. We use a multi-procedure R code as a use case. The evaluation of the CPU consumption of the user machine shows that implementing a composite task reduces the computation overhead on that machine.

II. STROLL ARCHITECTURE

STROLL provides access for submission, monitoring/control, and output collection of Grid tasks through the execution of read() and write() filesystem commands. The architecture of the communication between the user and the targeted Grid architecture(s) is composed of five layers and is depicted in Fig. 1.

Fig. 1: STROLL Architecture

1) Grid access (front end): the front end, where the user accesses the Grid by executing read()/write() filesystem commands.
2) Filesystem interface: the interface through which STROLL is accessed by executing filesystem commands. This access is supported by any OS, either locally, e.g. through a shell, or remotely, e.g. by sharing the virtual access path and mounting it as network storage.
3) STROLL filesystem: the Grid access filesystem, whose role is to translate filesystem commands into standard Grid commands and submit them to the target Grid client.
4) Grid client: one or more Grid clients attached to one or more Grids of different architectures, e.g. Condor [5] and Unicore [6]. The installed Grid clients are either command-line based, e.g. condor_schedd for Condor, or API based, e.g. HiLA [7] for Unicore, and have to be granted access to the existing Grid back end.
5) Grid architecture (back end): one or more Grid architectures can be included as a back end in the setup.

To execute Grid read/write commands on one of the attached Grid architectures through STROLL, users need to be registered in the targeted Grid(s) and granted the associated privileges.

III. TASK STRUCTURE

Generally, a Grid task is composed of the executable, the task input files, and the configuration file (e.g. a task submission script). Since STROLL provides an alternative submission and manipulation mechanism based on the filesystem interface, the submission script is replaced by STROLL filesystem submission and monitoring commands. The following are the general task identity rules:

•  Each Grid task must be created as a directory.
•  The task name is the task directory name.
•  Directory names must be unique.

Files in a STROLL task directory are classified into real task files and virtual control files. Real task files are the original Grid task files, e.g. executable and input files.
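The task identity rules above amount to ordinary directory operations. As a minimal sketch, the script below creates a task directory and copies the real task files into it. It runs against a scratch directory that merely stands in for a mounted STROLL filesystem; the variable STROLL_ROOT, the helper new_task, and the placeholder file names are assumptions made for illustration and are not part of STROLL.

```shell
#!/bin/sh
# Sketch only: STROLL_ROOT is a scratch directory standing in for the
# mounted STROLL filesystem (assumption; a real mount point would be used).
STROLL_ROOT=$(mktemp -d)

# new_task NAME FILE...   (hypothetical helper, not part of STROLL)
# Creates the task directory NAME and copies the real task files into it.
# An existing name is refused, since directory names must be unique.
new_task() {
    name=$1
    shift
    if [ -e "$STROLL_ROOT/$name" ]; then
        echo "task '$name' already exists" >&2
        return 1
    fi
    mkdir "$STROLL_ROOT/$name" || return 1
    for f in "$@"; do
        cp "$f" "$STROLL_ROOT/$name/" || return 1
    done
}

# Placeholder real task files: one executable and one input file.
WORK=$(mktemp -d)
echo 'placeholder executable' > "$WORK/job.class"
echo 'placeholder input' > "$WORK/in0"

# The task name is the directory name, here "job".
new_task job "$WORK/job.class" "$WORK/in0"
ls "$STROLL_ROOT/job"

# A second task with the same name is rejected.
new_task job "$WORK/job.class" || echo "duplicate rejected"
```

On a real STROLL mount the virtual control files would appear in the new directory automatically; the scratch directory used here contains only what the script copies into it.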
Virtual control files are automatically created by STROLL for the user to configure, monitor, and control the task execution. There are three main ones:

1) status file, which has only read access and displays the task status in real time, e.g. the state (running, waiting, or halted).
2) control file, which has only write access and is used to execute task control commands, e.g. submission and forced termination. Three control commands are supported:
   •  SUBMIT: Submits the task for execution.
   •  TERMINATE: Forces task termination.
   •  RESTART: Terminates the task and re-submits it.
3) config directory, which is a repository for the task configuration parameters. It is composed of a set of virtual files, each named after a configuration parameter and containing the value of that parameter as text. The main task configuration parameters/files provided by STROLL are listed in Table I.

STROLL provides two ways of setting the configuration parameter values of a task: individually and collectively. The first is to write each parameter value individually into the associated virtual configuration file, as follows:

    echo 'job.class' > taskDir/config/exec
    echo 'java' > taskDir/config/rt

The second is to set all the parameters collectively in a single write() command to the control file, as follows:

    echo "exec = java.class ; rt = java" > taskDir/control

TABLE I: Task parameters (files in the config directory)

CONFIG    DESCRIPTION
exec      Executable file name.
rt        Runtime environment needed to run the executable, e.g. JVM, R, or CLR.
wait      Value true causes the submit command to hold until task termination.
subtasks  Number of parallel subtasks to be executed for batch tasks.
in        Comma-separated list of input files. For batch tasks, the input file names should include a zero-based index, e.g. in1,in2,..,in20 for 20 subtasks.
out       Comma-separated list of output files.
args      Comma-separated list of task arguments.
req       Resource requirements, e.g. memory and load average. The format is similar to Condor's ClassAd [8], only more compact: LA0.1 && M512 corresponds to LoadAvg<=0.1 && Memory>=512.
parents   Comma-separated list of the parent tasks. This configuration parameter is used for configuring parent-child relations in composite tasks.

Grid tasks provided by STROLL are classified into: a) simple, and b) composite. A simple task, described in [3], has a single execution stage and is classified into: i) single, which has a single executable and a single data file and consumes one CPU element, and ii) batch [4], which has a single executable but multiple data files and consumes multiple CPU elements in parallel (e.g. parallel matrix multiplication). A composite task is described in Section IV.

IV. COMPOSITE TASKS

A composite Grid task is composed of a set of simple and/or composite tasks. These tasks are collected in one directory, which is the composite task folder (CTF). An example of a composite task is shown in Fig. 2.

Fig. 2: Composite task structure

The main composite task, taskDir, is composed of a set of simple tasks {T1, T2, T17, T18, T19, T20} in addition to two composite tasks {Tx, Ty}, each of which is in turn composed of further tasks. Each directory created inside the STROLL root contains the auto-generated config directory. The configuration set in the config sub-directory of any CTF is automatically inherited by all tasks inside the CTF, i.e. any task inside a CTF has the configuration of the holding CTF in addition to any local configuration in its local config directory. In case of a value conflict, the local configuration value overrides the CTF value. Fig. 3 shows an example of setting the configuration parameters of T1 using taskDir/config and taskDir/T1/config.

Fig. 3: Configuration inheritance example

The tasks contained in one composite task, e.g. {T6, T10, T11, T15}, are organised into a workflow by setting parent-child relations between them. This is done by setting the parents configuration parameter of a child task to the names of its parent tasks. Taking Fig. 2 as an example: parents(T12) = {T7, T8}, parents(T19) = {T16, T17}, etc. The composite task rules are as follows:

1) $\forall$ task $A$: IF $\exists\, \{B, C\} \subset \mathrm{Parents}(A)$ THEN $\mathrm{Inputs}(A) \subset \mathrm{Outputs}(B) \cup \mathrm{Outputs}(C)$
2) $\forall$ task $A$: IF $A$ IsComposite THEN $|\mathrm{Parents}(A)| \le 1$
3) $\forall$ task $T \in \mathrm{Parents}(A)$: IF $T.\mathrm{Status} = \mathrm{Fail}$ THEN $A.\mathrm{Status} = \mathrm{Fail}$

STROLL composite tasks do not provide loops, because their structure is a filesystem directory structure, which is tree based. This limitation is resolved using active tasks.

A. Active Task

An active task is a STROLL task that internally submits STROLL tasks at run time. Active task names have to be prefixed with '%' to indicate that the task has to be submitted to a worker with the STROLL filesystem mounted. This allows tasks running on the worker to execute STROLL commands. We call such workers STROLL peers. To set this up within the STROLL filesystem, read/write privileges on the STROLL virtual drive must be granted to Grid users. Setting up a similar configuration on regular Grid systems would require granting the Grid user execute privileges for Grid commands in addition to filesystem read/write privileges. An example of an active task is %T3 in Fig. 2, which internally and iteratively calls T3.1. Since increasing the depth of internal submissions may negatively affect the execution of the whole workflow, a maximum depth value can be set upon STROLL installation. Generally, an active task should run locally on the user machine unless it is itself computation or data intensive.

V. A USE CASE

To demonstrate the usability and evaluate the efficiency of the STROLL composite task model, we implement the model in an R-based [9] modelling project. The project models the relationship between ECG [10] characteristics and CPR [11] quality during cardiac arrest. It is based on the PSM [12] stochastic modelling package and mainly uses the PSM.estimate() procedure, which estimates population parameters in a mixed-effects model based on stochastic differential equations [12]. PSM.estimate() calls optim [13], a general-purpose optimisation procedure, to optimise the APL.KF procedure with a gradient procedure. APL.KF evaluates the population likelihood, is used by both optim and the gradient procedure, and is the main computationally intensive and parallelizable procedure in the R project. To accelerate the model run, the APL.KF procedure is restructured to internally submit a STROLL batch task, APL.KF.Parallel, which carries out its main computationally intensive loop in parallel. APL.KF is called iteratively by optim and the gradient procedure, and the iterations continue until the optimum value is reached with minimal error or until the maximum number of iterations, which is an input parameter to PSM.estimate, is reached. Thus, the batch task APL.KF.Parallel is submitted iteratively and internally from APL.KF. In our previous implementation [3], we ran all procedures on the user machine, and only APL.KF.Parallel was submitted as a batch task to the Grid. Here we construct a composite task, depicted in Fig. 4, in which all procedures other than the main simmod run on the Grid. Simmod is the main submitting routine, and Normalize is an analysis procedure that analyses the final results after all required iterations have finished. The composite task has two active tasks: %PSM.estimate, which iteratively submits the %optim task, which in turn iteratively submits APL.KF.Parallel within the Grid through STROLL. Listing 1 presents the composite task submission using a Unix shell.

Fig. 4: CPR code execution as a composite task

Listing 1: Composite R task submission using Unix shell

    #!/bin/sh
    # Make a new task directory in the stroll root
    mkdir /stroll/R
    cd /stroll/R
    # Set the general configuration, inherited by all tasks in this directory
    echo 'R' > config/rt
    echo 'true' > config/wait
    echo 'M128' > config/req
    # Create task directories inside the virtual path: /stroll
    mkdir simmod %PSM.estimate %optim APL.KF.parallel Normalize
    # Set specific configuration for each task
    echo "simmod.R" > simmod/config/exec
    echo "exec=PSM.estimate.R;parents=simmod" > %PSM.estimate/config/control
    echo "exec=optim.R;parents=PSM.estimate" > %optim/config/control
    # Single quotes keep the shell from expanding $(i) before it is written
    echo 'exec=APL.KF.parallel.R;parents=optim;in=inp$(i);out=out$(i);subtasks=20' > APL.KF.parallel/config/control
    # Copy task files into task directories
    cp ~/simmod.R simmod
    cp ~/Normalize.R Normalize
    cp ~/PSM.estimate.R %PSM.estimate
    cp ~/optim.R %optim
    cp ~/APL.KF.parallel.R APL.KF.parallel
    # Submit the task
    echo "submit" > simmod/control
    # Wait until the 'echo' command returns, then collect the output
    cp Normalize/out.RData ~/

A. Performance Evaluation

Here we evaluate the impact of implementing a composite task on reducing the resource consumption of the user machine. Our Grid is Condor based with 16 worker machines. Each machine runs Windows XP with a 3 GHz dual-core processor and 2 GB of RAM. The user machine has the same specifications. We run the code using two structures. First, all procedures run on the user machine, except APL.KF.Parallel, which is submitted to the Grid as a simple batch task. Second, the composite task in Fig. 4 is submitted to the Grid, so that only Simmod runs on the user machine while the rest run on the Grid.
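Before Listing 1 is run against a real STROLL mount, the directory layout of the composite task can be rehearsed in an ordinary scratch directory. The sketch below is such a dry run and does not talk to STROLL or to a Grid. It creates the config directories by hand (STROLL would generate them), writes the parents values of Listing 1 as individual files, and checks that every listed parent corresponds to a task directory. The helper check_parents is hypothetical, and accepting a parent name written without the '%' prefix of an active task is an assumption based on how Listing 1 spells the names.

```shell
#!/bin/sh
# Dry run in a scratch directory; nothing here talks to STROLL or a Grid.
ROOT=$(mktemp -d)
cd "$ROOT" || exit 1

# Task directories as in Listing 1. config/ is created by hand here,
# whereas STROLL would generate it automatically.
for t in simmod %PSM.estimate %optim APL.KF.parallel Normalize; do
    mkdir -p "$t/config"
done

# Parent-child relations as given in Listing 1, one file per parameter.
echo 'simmod'       > %PSM.estimate/config/parents
echo 'PSM.estimate' > %optim/config/parents
echo 'optim'        > APL.KF.parallel/config/parents

# check_parents (hypothetical helper): print each child/parent pair and
# report every parent name that has no matching task directory.
# Assumption: a parent may be written without the '%' active-task prefix.
check_parents() {
    missing=0
    for f in */config/parents; do
        [ -f "$f" ] || continue
        child=${f%%/*}
        # parents holds a comma-separated list of task names
        for p in $(tr ',' ' ' < "$f"); do
            if [ -d "$p" ] || [ -d "%$p" ]; then
                echo "$child <- $p"
            else
                echo "$child: unknown parent $p" >&2
                missing=1
            fi
        done
    done
    return "$missing"
}

check_parents && echo "layout ok"
```

A check of this kind catches misspelt parent names before any task is submitted. It says nothing about how STROLL itself resolves the names.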
Figure 5 shows that the CPU consumption of the user machine is much lower when a composite task is implemented, since only the main procedure, Simmod, runs on the user machine while the rest are submitted to Grid nodes. The overhead shown in this case is caused by the condor_shadow process, which is the client of the Condor Grid.

Fig. 5: CPU consumption of the user machine in case of submitting one simple batch task, and in case of submitting six tasks as a composite task. (Plot of % CPU consumption against time in seconds, 0 to 1,000 s; series: simple task, composite task.)

VI. CONCLUSION

STROLL is a universal filesystem-based interface for seamless task submission to one or more Grid computing facilities. STROLL provides drivers for both Condor and Unicore, enabling non-expert users to submit compute tasks to the Grid both manually and programmatically. In this paper, we described how composite Grid tasks for STROLL can be built simply as filesystem directory structures. We also showed that implementing a composite task can reduce the execution overhead on the user machine by allowing most of the code procedures to run on the Grid instead.

REFERENCES

[1] I. Foster, C. Kesselman, and S. Tuecke, “The anatomy of the grid: Enabling scalable virtual organizations,” International J. Supercomputer Applications, vol. 15, no. 3, 2001.
[2] M. Bijsterbosch, M. K. Elbaek, P. Hochstenbach, J. Ludwig, G. S. Pedersen, R. Russell, B. Schmidt, B. Sierman, M. Vanderfeesten, and K. V. Godtsenhoven, “Technology watch report,” DRIVER, Digital Repository Infrastructure Vision for European Research II, Report 1, December 2008.
[3] A. Azab and H. Meling, “Stroll: A universal filesystem-based interface for seamless task deployment in grid computing,” in DAIS, 2012, pp. 162–176.
[4] “Batch processing – userguide/hpcxuser/batch processing.html,” 6 Retrieved 2011.
[5] M. Litzkow, M. Livny, and M. Mutka, “Condor - a hunter of idle workstations,” in Proceedings of the 8th International Conference of Distributed Computing Systems, June 1988.
[6] D. W. Erwin and D. F. Snelling, “Unicore: A grid computing environment,” in European Conference on Parallel Processing, 2001, pp. 825–834.
[7] B. Hagemeier, R. Menday, B. Schuller, and A. Streit, “A universal API for grids,” in Cracow Grid Workshop '06 Proceedings, M. Bubak, M. Turała, and K. Wiatr, Eds. ul. Nawojki 11, 30-950 Kraków 61, P.O. Box 386, Poland: Academic Computer Centre CYFRONET AGH, July 2007, pp. 312–319.
[8] “Condor submit description file – /v6.6/condor submit.html.”
[9] R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2011, ISBN 3-900051-07-0.
[10] S. Bowbrick and A. Borg, ECG complete. Churchill Livingstone, 2006.
[11] S. P, “History of cardiopulmonary-cerebral resuscitation,” in Cardiopulmonary Resuscitation, New York, 1989, pp. 1–53.
[12] K. S, M. SB, K. NR, O. RV, and M. H, “Population stochastic modelling (PSM) - an R package for mixed-effects models based on stochastic differential equations,” Computer Methods and Programs in Biomedicine, vol. 3, no. 94, pp. 279–289, 2009.
[13] “R-forge optimizer –”