
Open Tool for Parameter Optimization (OTPO) Documentation


What is OTPO?

OTPO (Open Tool for Parameter Optimization) is an Open MPI-specific tool for exploring the MCA (Modular Component Architecture) parameter space. In Open MPI, users can set the values of many MCA parameters at run time; to list them all, run ompi_info --param all all. Alternatively, you can focus on a single group of parameters, e.g., those of the OpenIB BTL (ompi_info --param btl openib).

OTPO takes a list of MCA parameters, each with a user-specified range of values, and for every combination of those values it executes an MPI job and measures its performance. The design allows for metrics such as bandwidth, but execution time is the only measurement available right now. The benchmarks used for the measurements are modular; right now, OTPO supports Netpipe, Skampi, and NPB (see below).

OTPO outputs a list of the best parameter combinations for a given test.
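Because OTPO runs one MPI job per combination, the number of runs is the product of the number of values listed for each parameter, so the search space grows quickly. The Python sketch below is only an illustration (it is not OTPO's own code, which relies on ADCL for this logic); it enumerates the combinations with itertools.product, and the parameter names and values are illustrative assumptions.

```python
from itertools import product


def parameter_combinations(space):
    """Yield one dict per combination of MCA parameter values.

    `space` maps each MCA parameter name to the list of values to try.
    """
    names = list(space)
    # product() walks the Cartesian product; the last parameter varies fastest.
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


# Illustrative search space; check real names and values with ompi_info.
space = {
    "btl_openib_eager_limit": [4096, 8192, 16384],
    "btl_openib_max_send_size": [32768, 65536],
}

combos = list(parameter_combinations(space))
print(len(combos))  # 3 values * 2 values = 6 MPI jobs
for combo in combos:
    print(combo)
```

Adding a third parameter with four values would raise the count to 24 jobs, which is why narrowing the ranges in the input file matters.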

The main purpose of OTPO is to explore how MCA parameters affect performance on machines with different architectures and configurations, and to explore the dependencies among the MCA parameters themselves. OTPO is meant to run on the head node of a cluster: for each combination, it exports the current MCA parameter values to the nodes and forks an MPI job.
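Open MPI accepts MCA parameters both as environment variables named OMPI_MCA_<parameter_name> and as --mca <name> <value> arguments to mpirun; the latter apply to every process that mpirun launches. The hypothetical helpers mca_environment and mpirun_command sketch both ways of handing one combination to a job. This page does not say which mechanism OTPO uses internally, and the process count and executable are placeholders.

```python
import os


def mca_environment(combo, base_env=None):
    """Copy the environment and add one OMPI_MCA_<name> variable per parameter."""
    env = dict(os.environ if base_env is None else base_env)
    for name, value in combo.items():
        env["OMPI_MCA_" + name] = str(value)
    return env


def mpirun_command(combo, nprocs, executable, hostfile=None):
    """Build an mpirun argument list that passes each parameter with --mca."""
    cmd = ["mpirun", "-np", str(nprocs)]
    if hostfile is not None:
        cmd += ["--hostfile", hostfile]
    for name, value in combo.items():
        cmd += ["--mca", name, str(value)]
    cmd.append(executable)
    return cmd


combo = {"btl_openib_eager_limit": 8192}
print(mca_environment(combo, base_env={}))
# {'OMPI_MCA_btl_openib_eager_limit': '8192'}
print(" ".join(mpirun_command(combo, 2, "./NPmpi")))
# mpirun -np 2 --mca btl_openib_eager_limit 8192 ./NPmpi
# A driver would then launch and time the job, e.g. with
# subprocess.run(cmd, env=env, check=True).
```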

OTPO is built on top of ADCL (Abstract Data and Communication Library). ADCL is an application-level communication library that aims to provide the highest possible performance for communication operations in a given execution environment. OTPO uses ADCL for its runtime selection logic, i.e., for choosing the best combination of parameters.

How to build OTPO?

./autogen.sh
./configure    # configures OTPO with the included copy of ADCL
make

How to run OTPO?

OTPO includes a copy of the ADCL library; users who already have ADCL installed can instead point configure at that installation's directory. To run OTPO, first write an input file describing the parameters to tune. For each parameter, the file contains the parameter's name followed by these options:

  • -d default_value: the default value of the parameter

Then either set the possible values manually:

  • -p {possible_values}: explicitly list the possible values of the parameter

OR specify a range that is traversed by repeatedly applying an operation, together with an RPN (Reverse Polish Notation) condition:

  • -r start_value end_value: specify the start and end value for the parameter
  • -t traversal_method arguments: the method used to traverse the parameter's range. Currently only the increment method is available; it takes the operation and the operand as arguments.
  • -i rpn: RPN condition that the parameter combinations must satisfy.
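To make the range options concrete, here is a minimal Python sketch, assuming that an increment traversal repeatedly applies the operation to the current value until it passes end_value, and that the RPN condition refers to parameters by name. The operator sets, the helper names increment_range and rpn_holds, and the example ranges are assumptions for illustration, not OTPO's input syntax or implementation.

```python
import operator
from itertools import product

# Step operations for the increment traversal (assumed set, not OTPO's list).
STEP_OPS = {"+": operator.add, "*": operator.mul}
# Binary RPN operators this sketch understands (also an assumption).
RPN_OPS = {
    "+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv,
    "<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
    "and": lambda a, b: bool(a and b), "or": lambda a, b: bool(a or b),
}


def increment_range(start, end, op, operand):
    """Values from start up to end (inclusive), applying `op operand` at each step."""
    step = STEP_OPS[op]
    values, value = [], start
    while value <= end:
        values.append(value)
        nxt = step(value, operand)
        if nxt <= value:  # stop a traversal that would never reach `end`
            break
        value = nxt
    return values


def rpn_holds(expr, combo):
    """Evaluate a whitespace-separated RPN condition for one combination.

    Tokens naming a parameter in `combo` push its value; other tokens are numbers.
    """
    stack = []
    for tok in expr.split():
        if tok in RPN_OPS:
            if len(stack) < 2:
                raise ValueError("malformed RPN expression: " + expr)
            right = stack.pop()
            left = stack.pop()
            stack.append(RPN_OPS[tok](left, right))
        elif tok in combo:
            stack.append(combo[tok])
        else:
            stack.append(float(tok))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression: " + expr)
    return bool(stack[0])


# Hypothetical ranges: start, end, operation, operand.
ranges = {
    "btl_openib_eager_limit": increment_range(1024, 16384, "*", 2),
    "btl_openib_max_send_size": increment_range(16384, 65536, "*", 2),
}
condition = "btl_openib_eager_limit btl_openib_max_send_size <"

names = list(ranges)
valid = [
    combo
    for combo in (dict(zip(names, vals)) for vals in product(*ranges.values()))
    if rpn_holds(condition, combo)
]
print(ranges["btl_openib_eager_limit"])  # [1024, 2048, 4096, 8192, 16384]
print(len(valid))  # 14 of the 15 combinations satisfy eager < max_send
```

Filtering with a condition like this prunes combinations before any MPI job is launched, which is cheaper than running and discarding them.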

A sample input file (OpenIB_Parameters) is included for convenience. Note, however, that Open MPI's MCA parameter space changes over time, so some of its parameters may no longer be valid or some values may not make sense; the file is only meant to illustrate the input format.

Next, the user needs a compiled benchmark that is ready to run. Currently, OTPO supports three benchmarks:

  • Netpipe
  • Skampi (OTPO requires Skampi version 5.0.1)
  • NPB (NAS Parallel Benchmarks)

Because the design is modular, writing a plugin for another benchmark is straightforward.

Once the parameter file and the benchmark are ready, the user can run OTPO. The usage options are:

Required:

  • -p <InputFileName> (file that contains the parameters)
  • -t <test> (name of test; currently supported: Netpipe, NPB, Skampi)
  • -w <test_path> (path to the benchmark executable on your system, including the executable's name)

Example: -w /home/user1/Netpipe/NPmpi

Optional:

  • -d (debug output)
  • -v (verbose output)
  • -s (status output)
  • -n (silent/no output)
  • -l <message_length> (default is 1 byte)
  • -h <hostfile>
  • -m <mca_params> (MCA parameters to set when running with Open MPI. These are not the parameters being tuned; they are settings applied to every test run)
  • -f <format> (output format: TXT)
  • -o <output_dir> (directory where the results will be placed, default: results)
  • -b <interrupt_file> (file to write intermediate data when interrupted, default: interrupt.txt)
  • -r <interrupt_file> (the file which contains the data to resume execution)
  • -c <collective_number> (collective operation number, if using Skampi). Valid numbers are:
    • 0 - Bcast
    • 1 - Barrier
    • 2 - Reduce
    • 3 - Allreduce
    • 4 - Gather
    • 5 - Allgather
    • 6 - Gatherv
    • 7 - Allgatherv
    • 8 - Alltoall
    • 9 - Alltoallv
    • 10 - Scatter
    • 11 - Scatterv
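  • -a <num_processes> (number of processes, if using Skampi)
  • -e <operation> (operation for Reduce/Allreduce, e.g., MPI_MAX; with -x, the union or intersection operation)
  • -x <result_files> (generate an input file from one or more output result files)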

A sample run command would be:

./otpo -p OpenIB_Parameters -t Netpipe -w path_to_where_netpipe_is_compiled/NPmpi

The --generate_input_file (-x) option lets a user pass OTPO previously generated result files. OTPO parses the parameters and their recorded best values from these files and automatically generates a new input parameter file from them. The files can be combined by UNION or INTERSECTION, selected with the operation option (-e or --operations).

For example, to run this feature on three result files (R1, R2, and R3):

./otpo -x R1 R2 R3 -e union -o union_input_file
./otpo -x R1 R2 R3 -e intersection -o intersection_input_file
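The union and intersection operations can be pictured as set operations on the best values recorded for each parameter. This sketch assumes the result files have already been parsed into dictionaries (parsing and output formatting are not shown); the helper merge_best_values, its treatment of parameters missing from some files, and the sample data are all hypothetical.

```python
def merge_best_values(results, operation):
    """Combine per-parameter best values from several parsed result files.

    results: list of dicts mapping a parameter name to a set of best values.
    operation: "union" or "intersection".
    """
    if not results:
        return {}
    if operation == "union":
        merged = {}
        for result in results:
            for name, values in result.items():
                merged.setdefault(name, set()).update(values)
        return merged
    if operation == "intersection":
        # Keep only parameters present in every file (iterating a dict yields keys)...
        common = set(results[0]).intersection(*results[1:])
        merged = {name: set.intersection(*(r[name] for r in results)) for name in common}
        # ...and drop parameters whose best-value sets do not overlap.
        return {name: values for name, values in merged.items() if values}
    raise ValueError("operation must be 'union' or 'intersection'")


# Made-up contents standing in for parsed result files R1, R2, and R3.
r1 = {"btl_openib_eager_limit": {4096, 8192}, "btl_openib_max_send_size": {65536}}
r2 = {"btl_openib_eager_limit": {8192}, "btl_openib_max_send_size": {32768}}
r3 = {"btl_openib_eager_limit": {8192, 16384}}

print(merge_best_values([r1, r2, r3], "union"))
print(merge_best_values([r1, r2, r3], "intersection"))
# {'btl_openib_eager_limit': {8192}}
```

The union widens the next search to every value that was best somewhere, while the intersection narrows it to values that were best everywhere.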

NOTE: Using Skampi to tune parameters of the COLL hierarch module works only with the OMPI trunk and v1.5 and later. The COLL tuned module works only with OMPI versions in which the use_dynamic_rules MCA parameter works correctly: the 1.4 series starting with v1.4.2, the upcoming v1.5 series, and the trunk starting with revision 22510.

What are the results?

The results are placed in a subdirectory (results by default; see the -o option). Each run of OTPO produces a time-stamped file containing the best parameter combinations: the best combination found, together with the combinations around that best value. These result files are meant as intermediate input to an OTPO analysis tool that takes any number of result files, analyzes them, and presents the final analysis to the user. The analysis option is still under development and research.
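This page does not define how OTPO decides which combinations are "around" the best value. One plausible reading, sketched here in Python, ranks the measured combinations by execution time and keeps those within a relative tolerance of the fastest; the helper near_best, the 5% default tolerance, and the timings are hypothetical.

```python
def near_best(measurements, tolerance=0.05):
    """Return (combination, time) pairs within `tolerance` of the fastest time.

    measurements: iterable of (combination dict, execution time) pairs.
    The result is sorted fastest first; the first entry is the best combination.
    """
    ranked = sorted(measurements, key=lambda m: m[1])
    if not ranked:
        return []
    threshold = ranked[0][1] * (1 + tolerance)
    return [m for m in ranked if m[1] <= threshold]


# Made-up timings (seconds) for four values of one parameter.
measurements = [
    ({"btl_openib_eager_limit": 4096}, 10.4),
    ({"btl_openib_eager_limit": 8192}, 10.0),
    ({"btl_openib_eager_limit": 16384}, 10.3),
    ({"btl_openib_eager_limit": 32768}, 12.1),
]
for combo, seconds in near_best(measurements):
    print(combo, seconds)
```

A relative tolerance keeps the cutoff meaningful regardless of the benchmark's absolute run time; an absolute cutoff or a fixed top-N list would be equally simple alternatives.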

OTPO and ADCL:

As mentioned earlier, OTPO is built on top of ADCL. ADCL is an MPI application, but OTPO does not need the parts of ADCL that require MPI. We therefore created a dummy MPI library within ADCL that can be used instead of a real MPI library (selected with a configure option). A second reason for the dummy MPI library is that combining MPI with fork() causes problems in the application. ADCL's user-level timings and number of tests must also be set at configure time. In short, users who build their own ADCL must configure it with the following options:

  • --enable-printf-tofile
  • --with-num-tests=1
  • --enable-userlevel-timings
  • --enable-dummy-mpi

To Do

OTPO still lacks the analysis component. It generates results, but those results still need to be interpreted and presented to the user in a readable format. This is challenging because the result files may be very large and may not contain the same attributes.

For some benchmarks, such as the Skampi tests, it could be useful to run the benchmark for several message lengths at once and to report the analysis results separately for each message length.