DistSim

Revision as of 16:51, 25 February 2008

DistSim provides tools to define parameters of simulations, execute them on various hosts in a LAN and collect the results in a database. The parameters, results and references to the code used to perform the simulations are kept in a central MySQL database, so that little manual organization of data is required and all results of past simulations are easily accessible for comparison. DistSim consists of the following components:

architecture
  • The simulation wrapper is a small program running on the hosts intended to execute the simulations. It regularly checks for changes in the database, fetches new simulation jobs and executes them.
  • The object-relational mapping library is a helper library to simplify storing the results of simulations. It can save (almost) arbitrary Java objects into the database using a simple persistence mapping.
  • The configuration library provides routines to define simulations, groups of simulations and studies. It can be used to programmatically create simulations.
  • The configuration library is also the back end for the configuration client, a Swing GUI for defining parameters of simulations.

The following guide shows how to set up a simple simulation and touches various aspects of DistSim. It is intended to be used as a starting point for more sophisticated setups.

Installation

In order to use DistSim you should set up a MySQL database. I won't go into great detail here, as there are other guides for that. You can execute the SQL script "tabellen" in the wrapper directory to create the necessary tables:

mysql> source wrapper/tabellen

Now you need to provide user accounts for the various roles. One role is used by the configuration client, which needs write access to the configuration tables. Another one is used by the wrapper, which needs write access to the results tables. You can reconfigure the database connection later, though. So if you just want to test the framework without caring much about security, a simple

mysql> grant all on simulation.* to test_user identified by 'test_pwd';

and

mysql> grant all on simulation_results.* to test_user identified by 'test_pwd';

should be enough.

You also need a JDK that supports at least Java language version 5.0. The wrapper and the OR-mapping library should also work with version 1.4, but the client won't. Additionally, for this tutorial you need Apache Ant, preferably version 1.6.5 or newer. To access the source code for DistSim you need a Subversion client.

The installation procedure for DistSim itself is fairly short and easy. First you have to obtain the latest version of DistSim from the sarforge Subversion repository. To do that, check out https://sarforge.informatik.hu-berlin.de/svn/berlinroofnet/BerlinRoofNet/trunk/simulation/distsim:

$ svn co https://sarforge.informatik.hu-berlin.de/svn/berlinroofnet/BerlinRoofNet/trunk/simulation/distsim

In the distsim directory you'll find four subdirectories: client for the configuration client and library, or-mapper for the object-relational mapping library, and wrapper for the simulation wrapper. The fourth one, text, contains the actual thesis I wrote about DistSim. In order to make use of DistSim you should start at least one wrapper. First you will want to compile the wrapper with

$ ant compile

in the wrapper directory. Then you should edit the wrapper.properties and host1.properties files in the wrapper directory. You need to provide database parameters for the results and configuration databases as well as various properties of the host itself. The example configuration looks like this:

host1.properties:

host.id 1 
host.description gurkenhannes
host.architecture x86_linux 
host.name brn-suse093-1

wrapper.properties:

definitions.host localhost
definitions.database simulation
definitions.username alve
definitions.password hannes
results.host localhost
results.database simulation_results
results.username alve
results.password hannes
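
Both listings use a whitespace-separated key/value layout, which happens to be one of the forms that java.util.Properties can read. Whether the wrapper actually parses its input this way is an assumption; the following sketch only illustrates how two such files, concatenated into a single stream, could be read. The class name StdinPropertiesSketch and its parse method are hypothetical and not part of DistSim.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not taken from the DistSim sources.
final class StdinPropertiesSketch {

    // Loads "key value" pairs. java.util.Properties accepts whitespace,
    // '=' or ':' as the separator between key and value.
    static Properties parse(Reader in) {
        Properties raw = new Properties();
        try {
            raw.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Properties.load keeps trailing whitespace in values, so trim it.
        Properties trimmed = new Properties();
        for (String key : raw.stringPropertyNames()) {
            trimmed.setProperty(key, raw.getProperty(key).trim());
        }
        return trimmed;
    }

    public static void main(String[] args) {
        // Stands in for the concatenated content of both files.
        String piped = "host.id 1 \n"
                + "host.architecture x86_linux \n"
                + "definitions.host localhost\n";
        Properties p = parse(new StringReader(piped));
        System.out.println(p.getProperty("host.id"));
        System.out.println(p.getProperty("definitions.host"));
    }
}
```

To read from real standard input instead of a string, pass new InputStreamReader(System.in) to parse.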

When these properties all contain correct values you can start the wrapper by piping the properties into its standard input. For example like this:

$ cat host1.properties wrapper.properties | ant run

For this trick you need at least Ant 1.6.5; otherwise the input won't be passed on. Now the wrapper is running and waiting for simulation jobs. To define these jobs you'll need the configuration client. The client needs to know the interface RemoteSimulation so that it can connect to the wrapper's RMI interface. This interface is part of the wrapper, so you need to build a .jar library of the wrapper and copy it to client/lib. In the wrapper directory this involves the following commands:

$ ant jar
$ cp wrapper.jar ../client/lib

Now you can start the client with

$ ant run

in the client directory.

defining a package

In order to actually simulate anything, a package with the code to be executed needs to be available. I'm using JiST with a simple simulation as an example. The code for the example simulation can be found here. The example has to be copied to the directory src/jist/minisim in the JiST distribution. As the example simulation uses the OR-mapping library, that library needs to be added to JiST's libraries. To do this, build the library with

$ ant jar

in the or-mapper directory and then move or-mapper.jar to JiST's libs directory. Additionally we need a small Ant build file, which will start the simulation later. The content of the file can be found here. It needs to be placed as build.xml in the root directory of the JiST distribution. Now, in order to compile the example, you need to call

$ ant compile

in the JiST root directory. Then the whole JiST root directory with our modifications needs to be packed as a .zip file, for example with:

$ zip -r distsim-test.zip jist-swans-1.0.6

The resulting package should then be uploaded to a place where it is accessible to the hosts executing the simulations. In the long run an FTP server might be a good solution. For now we can just leave it somewhere on the local disk and start the wrapper on the same computer. Let's assume the package is at /home/user/distsim-test.zip.

Now the package can be announced in the database. To do this, start the client, connect to the database and choose define packages. The name and version fields are free-form strings; for example, you can call the package DistSim-Test and assign it the version 1.0. Using the architecture field you can define different, functionally equivalent packages for different kinds of hosts. Its content is also a free-form string, but it is matched against the architecture string in the host configuration. The special architecture all matches any host architecture. As JiST and the test simulation are written in Java, we can safely specify all here. Finally, the URL should point to the place where the package can be found; in our case this would be file:///home/user/distsim-test.zip. There is also an option to upload a local file to the specified URL, but this doesn't work very well because of limitations in Java's implementation of various URL handlers. define commits your input to the database. After that you can define more packages, but that isn't necessary for now.
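
The matching rule for the architecture field can be pictured in a few lines of Java. This is a sketch under the assumption that, apart from the wildcard all, the two strings are compared for exact equality; the guide only says they are "matched". The class and method names are hypothetical and not taken from DistSim.

```java
// Hypothetical helper, not part of DistSim.
final class ArchitectureMatchSketch {

    // Wildcard value described in the text.
    static final String ANY = "all";

    // Assumption: apart from the wildcard, the strings must be exactly equal.
    static boolean matches(String packageArch, String hostArch) {
        return ANY.equals(packageArch) || packageArch.equals(hostArch);
    }

    public static void main(String[] args) {
        // Host architecture as in the example host1.properties.
        String host = "x86_linux";
        System.out.println(matches("all", host));
        System.out.println(matches("x86_linux", host));
        System.out.println(matches("some_other_arch", host));
    }
}
```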

defining a study

Now that the code package is fully defined, a study needs to be created. A study consists of many similar simulations, all run with the same code. To define a study, type its name in the combo box and define an initial version for the study. Later you can branch a study and create newer versions from it by changing the version number. All studies of the same tree are shown in the upper part of the window.

Now you can add the package to the study and define the path where it should be unpacked. You should pick a relative and unique path here because it will be deleted after the simulation. Each instance of the wrapper runs in its own directory, so as long as a relative path is chosen here you can operate multiple wrappers in the same file system. You can actually place multiple packages in the same directory, as they are all deleted after the simulation anyway. You shouldn't use the directory for anything else, though.
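
Why a relative path keeps several wrappers apart can be shown with java.nio.file. The sketch below is an illustration only: it assumes the unpack path is resolved against each wrapper's working directory, and the additional rejection of absolute or escaping paths is my own precaution, not documented DistSim behaviour. UnpackPathSketch and resolve are hypothetical names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; DistSim's own path handling is not shown in this guide.
final class UnpackPathSketch {

    // Resolves the study's unpack path against one wrapper's working directory.
    static Path resolve(String wrapperDir, String unpackPath) {
        Path base = Paths.get(wrapperDir).toAbsolutePath().normalize();
        Path target = base.resolve(unpackPath).normalize();
        // Reject absolute paths and paths that leave or equal the wrapper directory,
        // because the target is deleted after the simulation.
        if (Paths.get(unpackPath).isAbsolute()
                || !target.startsWith(base)
                || target.equals(base)) {
            throw new IllegalArgumentException("not a relative sub-path: " + unpackPath);
        }
        return target;
    }

    public static void main(String[] args) {
        // The same relative path yields a distinct directory per wrapper.
        System.out.println(resolve("wrapper1", "baum"));
        System.out.println(resolve("wrapper2", "baum"));
    }
}
```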

If you chose the path baum for our test package you can now specify the command to be executed in order to start the actual simulation. You do that by editing the respective cell in the studies table. The command would be

ant -f baum/jist-swans-1.0.6/build.xml run

Again, this only works with ant 1.6.5 or newer. When you have done all that, you can save your changes and switch to the next tab. If you save too early you won't be able to change the command or the packages as the client doesn't let you overwrite existing simulations in order to prevent a loss of association between parameters and results of simulations.

defining parameters

The same principle applies to parameter definition. The only exception is that you can define additional simulations even after saving the first time. The tables in the simulation tab allow you to define parameters in two different ways: either you define a list of literals or you define a range of expressions. Literals can be numerical or strings; expressions are always numerical. Each column of the table represents one group of simulations, which contains one simulation for each possible combination of the specified parameters. For example, if you define parameter par1 to be a OR b and par2 to be 5 TO 10 BY 5, the result will be four simulations:

sim1: a, 5
sim2: a, 10
sim3: b, 5
sim4: b, 10
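
Generating one simulation per combination amounts to a Cartesian product over the parameter value lists. The following is a minimal sketch of that idea, not DistSim's actual implementation; the class GroupExpansionSketch and its method are hypothetical, and all values are treated as plain strings for simplicity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not taken from the DistSim sources.
final class GroupExpansionSketch {

    // Builds one parameter assignment per combination of the given values.
    static List<Map<String, String>> combinations(Map<String, List<String>> params) {
        List<Map<String, String>> result = new ArrayList<>();
        result.add(new LinkedHashMap<>()); // start with one empty assignment
        for (Map.Entry<String, List<String>> e : params.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : result) {
                for (String value : e.getValue()) {
                    // Extend every partial assignment by each value of this parameter.
                    Map<String, String> extended = new LinkedHashMap<>(partial);
                    extended.put(e.getKey(), value);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        params.put("par1", Arrays.asList("a", "b"));
        params.put("par2", Arrays.asList("5", "10"));
        // Prints four assignments, in the same order as sim1 to sim4 above.
        for (Map<String, String> sim : combinations(params)) {
            System.out.println(sim);
        }
    }
}
```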

You can use arbitrary mathematical expressions and reference other parameters, literals or expressions, here. For example the following would be valid:

par1: 1 OR 2 OR 5.5 OR 11
par2: 0 TO 20 BY par1

The evaluation of such a range uses the lower bound as the first actual parameter value for a simulation. It then increments by the step value until the result is further from the upper bound than the previous value; that last result is not used. So if par1 is 11 in the previous example, par2 will be 0, 11 and 22. All these parameters can also be referenced in the command line for the study by enclosing them in dollar signs. For example

ant -f baum/jist-swans-1.0.6/build.xml $target$

would call ant with a target depending on the actual simulation. $target$ would be substituted by the value of the parameter target when the simulation is started. As our example simulation doesn't need any configuration, we only need to create one group and one "dummy" list parameter. Then one value should be assigned to the parameter so that one simulation is generated.
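
The range evaluation rule and the dollar-sign substitution described above can both be sketched in a few lines of Java. This is an illustration under stated assumptions, not the client's real code: the names ParameterSketch, expandRange and substitute are hypothetical, a value that is exactly as far from the upper bound as its predecessor is kept (the guide does not cover that case), and the parameter value run is only an example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not taken from the DistSim sources.
final class ParameterSketch {

    // Expands "lower TO upper BY step" into the list of parameter values.
    static List<Double> expandRange(double lower, double upper, double step) {
        if (step == 0 || Double.isNaN(lower + upper + step)
                || Double.isInfinite(lower + upper + step)) {
            throw new IllegalArgumentException("bounds and step must be finite, step non-zero");
        }
        List<Double> values = new ArrayList<>();
        double current = lower;
        values.add(current);
        while (true) {
            double next = current + step;
            // Stop once a step lands further from the upper bound than the
            // previous value (or no longer changes the value at all).
            if (next == current
                    || Math.abs(next - upper) > Math.abs(current - upper)) {
                break;
            }
            values.add(next);
            current = next;
        }
        return values;
    }

    // Replaces every $name$ in the command by the value of parameter "name".
    static String substitute(String command, Map<String, String> params) {
        String result = command;
        for (Map.Entry<String, String> e : params.entrySet()) {
            result = result.replace("$" + e.getKey() + "$", e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // 0 TO 20 BY 11 from the example above.
        System.out.println(expandRange(0, 20, 11)); // prints [0.0, 11.0, 22.0]

        Map<String, String> params = new LinkedHashMap<>();
        params.put("target", "run"); // example value only
        System.out.println(
                substitute("ant -f baum/jist-swans-1.0.6/build.xml $target$", params));
    }
}
```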

In the input and output files tables you can specify additional configuration and results files that should be stored in the database. We don't need them for now.

When you are done with parameter definition, you can save and generate simulations from the group you have created. This will switch you to the monitoring tab. You may need to refresh the table in order to see the newly created simulation. Now the wrapper should find the simulation and execute it. After this is done, you can view the result in the evaluation tab.

using the configuration library

class structure

The configuration library is the back end of the client GUI and performs all actual functions. It can be found in the Java package client.data. The configuration is represented as a tree of DbBackedData objects. These objects can be loaded from, updated in and saved to the database. They contain Saveable child objects, which are affected by certain changes in their parent object and may again have children. You can add child objects to potential parents and remove them again. commit applies these changes to the database.

loadFromDb loads a DbBackedData object with certain characteristics from the database. This method also loads all direct children of that object, but not its grandchildren. For example, loading a study will load all groups for that study, but not its simulations. This avoids the potentially long delay caused by loading all simulation parameters from the database into memory. getDependent and certain constructors perform the same function. Details are explained in the Javadoc comments.

In contrast, methods writing to the database perform their function recursively over the whole object tree. The rationale behind that is that once the data is loaded we know that the machine can handle it and we expect changes to affect all ancestors of an object. Group and Study additionally contain subobjects. Calls to the superobjects are passed on to the subobjects. For example, if a Study is loaded from the database, its StudyPackages are loaded, too.
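
The asymmetry between one-level loading and recursive writing can be pictured with a toy model. The sketch below deliberately does not use the client.data classes, whose signatures are documented only in the Javadoc: Node, loadShallow and the in-memory map standing in for the database are all hypothetical, and only the method names add and commit are borrowed from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; these are not the client.data classes.
final class ShallowLoadSketch {

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        boolean childrenLoaded = true; // new objects own their child list

        Node(String name) { this.name = name; }

        void add(Node child) { children.add(child); }

        // Reads a node and its direct children, but not its grandchildren.
        static Node loadShallow(Map<String, List<String>> db, String name) {
            Node node = new Node(name);
            for (String childName : db.getOrDefault(name, new ArrayList<>())) {
                Node child = new Node(childName);
                child.childrenLoaded = false; // grandchildren stay in the database
                node.add(child);
            }
            return node;
        }

        // Writes this node and, recursively, everything held in memory below it.
        void commit(Map<String, List<String>> db) {
            if (!childrenLoaded) return; // nothing in memory to write
            List<String> names = new ArrayList<>();
            for (Node child : children) {
                names.add(child.name);
                child.commit(db);
            }
            db.put(name, names);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> db = new HashMap<>();
        db.put("study1", new ArrayList<>(Arrays.asList("groupA")));
        db.put("groupA", new ArrayList<>(Arrays.asList("sim1", "sim2")));

        Node study = Node.loadShallow(db, "study1");
        // groupA is present, but its simulations were not loaded.
        System.out.println(study.children.get(0).children.size());

        Node groupB = new Node("groupB");
        groupB.add(new Node("sim3"));
        study.add(groupB);
        study.commit(db); // writes study1, groupB and sim3 in one call
        System.out.println(db.get("study1"));
        System.out.println(db.get("groupB"));
    }
}
```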

The restrictions on writing previously defined simulations aren't enforced in the library. You can actually delete existing simulations from the database, provided you don't violate a foreign key constraint. Also it's important to remember that the wrapper will execute any studies that are present in the database and not yet assigned to another instance of the wrapper. Thus, it is important to fully define all parameters of a study before committing it to the database. Writing operations as well as the wrapper's reading operation are secured by transactions, so you can write a whole study at once without caring about dirty reads.