From: Chudin, Eugene (eugene_chudin_at_[hidden])
Date: 2007-05-03 17:41:54


I was wondering whether it is expected to get error messages from Valgrind
when checking Open MPI code.

For instance, I have the following trivial code:

#include <mpi.h>
#include <iostream>

template <typename T>
void distribute_val(T& val, int _procid, int _np)
{
        MPI_Bcast(&val, sizeof(T), MPI_CHAR, 0, MPI_COMM_WORLD);
}

using namespace std;

int main(int argc, char** argv)
{
        int procid;
        int nproc;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank (MPI_COMM_WORLD, &procid);
        MPI_Comm_size (MPI_COMM_WORLD, &nproc);
        double val = 0;
        if(procid == 0)
                val = 3.14159;
        distribute_val(val, procid, nproc);

        cout << "ProcID=\t" << procid << "\tval=" << val << endl;
        MPI_Finalize();
        return 0;
}

This produces errors in Valgrind if I run it on 2 processors connected
by a network.
If I run it on 2 processors located on the same node, I get no errors
from Valgrind.
In both cases the code runs as expected, but I am still worried about the
cause of the Valgrind errors.

Below is the output from valgrind:
> mpiCC -g -Wall test.cpp -o test
> mpirun -np 2 --machinefile ./mpd.2 --prefix /toolbox/openmpi valgrind --leak-check=full ./test
==14823== Memcheck, a memory error detector.
==14823== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==14823== Using LibVEX rev 1732, a library for dynamic binary translation.
==14823== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==14823== Using valgrind-3.2.3, a dynamic binary instrumentation framework.
==14823== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==14823== For more details, rerun with: -v
==14823==
==13545== Memcheck, a memory error detector.
==13545== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==13545== Using LibVEX rev 1732, a library for dynamic binary translation.
==13545== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==13545== Using valgrind-3.2.3, a dynamic binary instrumentation framework.
==13545== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==13545== For more details, rerun with: -v
==13545==
==14823== Syscall param writev(vector[...]) points to uninitialised byte(s)
==14823== at 0x59BFA86: do_writev (in /lib64/tls/libc.so.6)
==14823== by 0x831771E: mca_btl_tcp_frag_send (in /toolbox64/openmpi/lib/openmpi/mca_btl_tcp.so)
==14823== by 0x83160C9: mca_btl_tcp_endpoint_send_handler (in /toolbox64/openmpi/lib/openmpi/mca_btl_tcp.so)
==14823== by 0x4F50951: opal_event_base_loop (in /toolbox64/openmpi/lib/libopen-pal.so.0.0.0)
==14823== by 0x4F509E4: opal_event_loop (in /toolbox64/openmpi/lib/libopen-pal.so.0.0.0)
==14823== by 0x4F4AE50: opal_progress (in /toolbox64/openmpi/lib/libopen-pal.so.0.0.0)
==14823== by 0x4C8014B: ompi_request_wait_all (in /toolbox64/openmpi/lib/libmpi.so.0.0.0)
==14823== by 0x873412D: ompi_coll_tuned_bcast_intra_generic (in /toolbox64/openmpi/lib/openmpi/mca_coll_tuned.so)
==14823== by 0x8734293: ompi_coll_tuned_bcast_intra_binomial (in /toolbox64/openmpi/lib/openmpi/mca_coll_tuned.so)
==14823== by 0x872EA9F: ompi_coll_tuned_bcast_intra_dec_fixed (in /toolbox64/openmpi/lib/openmpi/mca_coll_tuned.so)
==14823== by 0x4C957BA: PMPI_Bcast (in /toolbox64/openmpi/lib/libmpi.so.0.0.0)
==14823== by 0x408A9D: void distribute_val<double>(double&, int, int) (test.cpp:7)
==14823== Address 0x41EEE2C is not stack'd, malloc'd or (recently) free'd
ProcID= 0 val=3.14159
ProcID= 1 val=3.14159
==13545==
==13545== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 8 from 5)
==13545== malloc/free: in use at exit: 1,920 bytes in 1 blocks.
==13545== malloc/free: 1 allocs, 0 frees, 1,920 bytes allocated.
==13545== For counts of detected errors, rerun with: -v
==13545== searching for pointers to 1 not-freed blocks.
==13545== checked 1,155,400 bytes.
==13545==
==13545== LEAK SUMMARY:
==13545== definitely lost: 0 bytes in 0 blocks.
==13545== possibly lost: 0 bytes in 0 blocks.
==13545== still reachable: 1,920 bytes in 1 blocks.
==13545== suppressed: 0 bytes in 0 blocks.
==13545== Reachable blocks (those to which a pointer was found) are not shown.
==13545== To see them, rerun with: --leak-check=full --show-reachable=yes
==14823==
==14823== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 7 from 4)
==14823== malloc/free: in use at exit: 1,920 bytes in 1 blocks.
==14823== malloc/free: 1 allocs, 0 frees, 1,920 bytes allocated.
==14823== For counts of detected errors, rerun with: -v
==14823== searching for pointers to 1 not-freed blocks.
==14823== checked 1,158,440 bytes.
==14823==
==14823== LEAK SUMMARY:
==14823== definitely lost: 0 bytes in 0 blocks.
==14823== possibly lost: 0 bytes in 0 blocks.
==14823== still reachable: 1,920 bytes in 1 blocks.
==14823== suppressed: 0 bytes in 0 blocks.
==14823== Reachable blocks (those to which a pointer was found) are not shown.
==14823== To see them, rerun with: --leak-check=full --show-reachable=yes
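
The "Syscall param writev(vector[...]) points to uninitialised byte(s)" report appears only on the TCP path. One common way this class of Memcheck warning arises is a transport that builds a header struct whose padding bytes are never written and passes it to writev(), because Memcheck checks every byte handed to the kernel. The sketch below is a hypothetical illustration of that pattern, not Open MPI's code: the struct `FragHeader` and the functions `header_padding`, `send_frag_writev` and `send_frag_memcpy` are invented here, and whether `mca_btl_tcp_frag_send` really behaves this way is an assumption. The memcpy variant is a guess at why the same-node (shared-memory) run stays quiet: Memcheck does not report merely copying uninitialised bytes, only using them in a branch, an address, or a syscall.

```cpp
#include <sys/uio.h>  // writev(), struct iovec
#include <unistd.h>   // ssize_t, pipe(), read(), close()
#include <cassert>
#include <cstddef>    // offsetof, std::size_t
#include <cstdint>
#include <cstring>    // std::memcpy, std::memset

// Hypothetical fragment header (NOT Open MPI's real layout): a 1-byte tag
// followed by an 8-byte length. On common ABIs the compiler inserts padding
// between the two fields, and nothing ever writes those padding bytes.
struct FragHeader {
    std::uint8_t  tag;
    std::uint64_t length;
};

// Number of padding bytes between `tag` and `length` on this ABI.
constexpr std::size_t header_padding() {
    return offsetof(FragHeader, length) - sizeof(std::uint8_t);
}

// "Network" path: header and payload go to the kernel in one writev() call.
// With zero_padding == false the padding bytes stay uninitialised, which is
// the kind of buffer Memcheck flags as
// "Syscall param writev(vector[...]) points to uninitialised byte(s)".
ssize_t send_frag_writev(int fd, const void* payload, std::size_t len,
                         bool zero_padding) {
    assert(fd >= 0);
    FragHeader hdr;
    if (zero_padding) {
        std::memset(&hdr, 0, sizeof hdr);  // also clears the padding bytes
    }
    hdr.tag = 1;
    hdr.length = len;

    iovec iov[2];
    iov[0].iov_base = &hdr;
    iov[0].iov_len = sizeof hdr;
    iov[1].iov_base = const_cast<void*>(payload);
    iov[1].iov_len = len;
    return writev(fd, iov, 2);
}

// "Shared-memory" path (assumed): the same header is copied with memcpy.
// Copying uninitialised bytes is not itself a Memcheck error, so this path
// produces no report even though the padding is still never written.
std::size_t send_frag_memcpy(unsigned char* dst, const void* payload,
                             std::size_t len) {
    assert(dst != nullptr);
    FragHeader hdr;
    hdr.tag = 1;
    hdr.length = len;
    std::memcpy(dst, &hdr, sizeof hdr);
    std::memcpy(dst + sizeof hdr, payload, len);
    return sizeof hdr + len;
}
```

Calling `send_frag_writev` on a pipe or socket with `zero_padding` set to false and running the program under Valgrind would be expected to produce a report of the same shape as the one above, while `send_frag_memcpy` stays silent. Clearing the header with `memset` before filling it is one way a sender can avoid the warning without changing what is transmitted.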
