Segmentation Fault (core dumped) error

8 years 3 months ago - 8 years 2 months ago #88 by MamtaMohan
Hello Cassandra team,

I am having trouble running the NPT example for diethyl ether.

The build for 8 cores on Ubuntu 14.04 LTS with Makefile.gfortran.openMP was successful.

I am particularly interested in NPT simulation of the system.

I see that a core dumped error has already been reported in the forum, but I do not see a satisfactory answer there.

To test the simulation mentioned above, I created a test directory containing dee.ff, dee.pdb, dee.mcf, npt.inp, npt.inp.xyz and Read_Old, along with cassandra_gfortran_openMP.exe.

I would appreciate your help.

Dear Cassandra team,

As suggested, I used the Intel Fortran compiler with OpenMP (=8).

I tested the NPT simulation and the test failed again.

I am listing the result below:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
cassandra_intel_o 000000000057F5B5 Unknown Unknown Unknown
cassandra_intel_o 000000000057D377 Unknown Unknown Unknown
cassandra_intel_o 000000000052FD64 Unknown Unknown Unknown
cassandra_intel_o 000000000052FB76 Unknown Unknown Unknown
cassandra_intel_o 00000000004E2829 Unknown Unknown Unknown
cassandra_intel_o 00000000004E8F30 Unknown Unknown Unknown
libpthread.so.0 00007F3D17A40340 Unknown Unknown Unknown
cassandra_intel_o 00000000004478F5 Unknown Unknown Unknown
cassandra_intel_o 000000000043D5A3 Unknown Unknown Unknown
cassandra_intel_o 000000000047E69E Unknown Unknown Unknown
cassandra_intel_o 0000000000403E3A Unknown Unknown Unknown
cassandra_intel_o 000000000040345E Unknown Unknown Unknown
libc.so.6 00007F3D1768CEC5 Unknown Unknown Unknown
cassandra_intel_o 0000000000403369 Unknown Unknown Unknown

I would appreciate your help.

Thank you.
Mamta
Last Edit: 8 years 2 months ago by MamtaMohan. Reason: compiled Cassandra with intel Fortran compilers and tested again.
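
Every frame in the traceback above reports Unknown for the routine and line, which usually means the executable carries no debug information. The following is a minimal sketch of one way to get more out of such a trace, assuming the binary can be rebuilt with debug symbols (for ifort, the -g and -traceback options). The helper trace_addrs is hypothetical: it pulls the program-counter column for one image out of a saved traceback so the addresses can be handed to addr2line. The file name traceback.txt and the executable path in the last comment are placeholders, not names taken from the posts.

```shell
# Hypothetical helper: read a forrtl traceback on stdin and print the
# program-counter column for frames that belong to the given image.
trace_addrs() {
    awk -v img="$1" '$1 == img { print "0x" $2 }'
}

# Example with two frames copied from the traceback above; only the frame
# from the Cassandra image is printed.
printf '%s\n' \
    'cassandra_intel_o 000000000057F5B5 Unknown Unknown Unknown' \
    'libpthread.so.0 00007F3D17A40340 Unknown Unknown Unknown' \
    | trace_addrs cassandra_intel_o

# With a binary rebuilt with -g, the addresses could then be resolved to
# function names and source lines (placeholder file and executable names):
# trace_addrs cassandra_intel_o < traceback.txt | xargs addr2line -f -e ./your_cassandra_executable
```

The addresses are only meaningful for the exact binary that produced the traceback, so the crash would need to be reproduced with the rebuilt executable before resolving them.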

8 years 1 month ago #98 by piskuliche
Hello,

I am not an admin or someone who works on this code, but I thought I would offer an answer anyway. In my experience, Cassandra is very sensitive to which version of the Intel compiler you use. My group has found that version 13 of this compiler seems to work, while other versions cause segmentation faults. It is also important that the Intel compiler is available not only when you build Cassandra, but also whenever you run something with it (if you are using a system like Torque, you can do this with the command module load intel_compiler).

Hope that helps,
Zeke
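
One way to check whether the compiler's runtime is visible at run time is to ask the dynamic loader which shared libraries it cannot resolve. This is a rough sketch, not a procedure from the Cassandra documentation: the helper missing_libs is hypothetical, and the executable path is a placeholder to be replaced with the binary from your own build.

```shell
# Hypothetical helper: list shared libraries the loader cannot resolve.
# ldd marks each unresolved dependency with "not found".
missing_libs() {
    ldd "$1" 2>/dev/null | grep "not found"
}

# Placeholder path; substitute the executable produced by your build.
exe=./cassandra_intel_openMP.exe

if [ ! -e "$exe" ]; then
    echo "no such file: $exe"
elif missing_libs "$exe"; then
    echo "Runtime libraries are missing; load the compiler module before running."
else
    echo "No unresolved libraries reported for $exe"
fi
```

If libraries are reported as missing, running the same check again after loading the compiler module should show whether the module provides them.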

8 years 1 month ago #99 by emarin
Hello Zeke,

We are aware that there is a problem with the gfortran compiler and we are trying to fix it. However, we have not been able to reproduce any error with the Intel compiler 15.0. Which version of this compiler did you use? Could you send us more information so that we can try to reproduce the error?

Thanks!

8 years 1 month ago #100 by palafox
Hi There,

I was the one doing the testing. Try running any GEMC example (I used methane) for more than 12 hours with any compiler other than ifort 13.01, and it gets caught in an infinite loop. However, I don't think this is the problem originally posted here. To solve that problem, I added ulimit -s unlimited, and the code runs.
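
The stack-limit workaround can be sketched as the shell session below. The ulimit command is the one mentioned above. OMP_STACKSIZE is the standard OpenMP variable for worker-thread stacks, and setting it here is an assumption about what might also help, not something reported in this thread. The value 512M is an arbitrary illustration, and the run line is left as a comment with the executable and input names used elsewhere in these posts.

```shell
# Show the soft stack limit, then try to raise it for this shell and for
# any program started from it.
echo "stack limit before: $(ulimit -s)"
ulimit -s unlimited 2>/dev/null || echo "could not raise the limit (the hard limit is lower)"
echo "stack limit after: $(ulimit -s)"

# ulimit -s governs the main thread only; OpenMP worker threads take their
# stack size from OMP_STACKSIZE. 512M is an illustrative value, not a
# tested recommendation.
export OMP_STACKSIZE=512M
export OMP_NUM_THREADS=8

# Run from the same shell so that the settings are inherited:
# ./cassandra_intel_openMP.exe npt.inp
```

The limit applies only to the shell in which it is set, so in a batch job the commands would need to appear in the job script itself.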

I can share part of my discussion with our IT team. Please note that we were testing *both* versions, V1.1 and V1.0, which added a layer of complexity when communicating with IT about the new commands in V1.1. Here is part of the conversation. In my particular case, it pointed to a problem in the gemc_nvt_volume.f90 subroutine. I assume this has not been solved.

Best,

Pablo

Here is our discussion when testing icc 15.0.2, which was used after failed trials of ifort 15.0.2 and gfortran-gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC).


"Hi Pablo,

It doesn't look like changing the length of the output logs or using v1.1
helped any. Both running jobs, "7806110" and "7807311", are stuck at a futex
wait (often referred to as a "deadlock"). I was able to successfully run a
Cassandra GEMC simulation to completion using a single threaded version which
ran for ~30hrs. I was also able to run Cassandra under a debugger and determine
the code which is causing the deadlock. Below are my notes for replicating the
deadlock along with the debugger backtrace:

$ icc -v
icc version 15.0.2 (gcc version 4.4.7 compatibility)

$ cat /etc/issue
Red Hat Enterprise Linux Server release 6.4 (Santiago)

$ pwd
/home/wmason/Cassandra_V1.0/Src

$ make -f Makefile.intel.openMP

$ cp cassandra_intel_openMP.exe Methane; cd Methane

$ export OMP_NUM_THREADS=4

$ gdb ./cassandra_intel_openMP.exe

(gdb) run gemc_methane.inp
Starting program:
/home/wmason/Cassandra_V1.0/Src/Methane/cassandra_intel_openMP.exe
gemc_methane.inp
[Thread debugging using libthread_db enabled]
Beginning Cassandra Simulation
[New Thread 0x7ffff7ff9700 (LWP 21006)]
[New Thread 0x7fffee87e740 (LWP 21007)]
[New Thread 0x7ffff687e780 (LWP 21008)]
[New Thread 0x7ffff647d7c0 (LWP 21009)]
successfully inserted molecule 1
....
successfully inserted molecule 500

^C
Program received signal SIGINT, Interrupt.
0x00007ffff760b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install
glibc-2.12-1.149.el6_6.5.x86_64 libgcc-4.4.7-3.el6.x86_64
(gdb) bt
#0 0x00007ffff760b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1 0x00007ffff78c2e6e in __kmp_suspend_template (th_gtid=-156846204, flag=0x80)
at ../../src/z_Linux_util.c:1834
#2 __kmp_suspend_64 (th_gtid=-156846204, flag=0x80) at
../../src/z_Linux_util.c:1889
#3 0x00007ffff78629d8 in suspend (bt=4138121092, this_thr=0x80, gtid=1, tid=-1,
reduce=0x7ffff6a6b700, itt_sync_obj=0x0) at ../../src/kmp_wait_release.h:405
#4 __kmp_wait_template (bt=4138121092, this_thr=0x80, gtid=1, tid=-1,
reduce=0x7ffff6a6b700, itt_sync_obj=0x0) at ../../src/kmp_wait_release.h:224
#5 wait (bt=4138121092, this_thr=0x80, gtid=1, tid=-1, reduce=0x7ffff6a6b700,
itt_sync_obj=0x0) at ../../src/kmp_wait_release.h:414
#6 __kmp_hyper_barrier_gather(enum barrier_type, kmp_info_t *, int, int, void
(*)(void *, void *), void *) (bt=4138121092, this_thr=0x80, gtid=1, tid=-1,
reduce=0x7ffff6a6b700, itt_sync_obj=0x0)
at ../../src/kmp_barrier.cpp:510
#7 0x00007ffff7865dcd in __kmp_join_barrier (gtid=-156846204) at
../../src/kmp_barrier.cpp:1375
#8 0x00007ffff788d222 in __kmp_internal_join (id=0x7ffff6a6b784, gtid=128,
team=0x1) at ../../src/kmp_runtime.c:7247
#9 0x00007ffff7893d8e in __kmp_join_call (loc=0x7ffff6a6b784, gtid=128,
exit_teams=1) at ../../src/kmp_runtime.c:2349
#10 0x00007ffff78671cd in __kmpc_fork_call (loc=0x7ffff6a6b784, argc=128,
microtask=0x1) at ../../src/kmp_csupport.c:326
#11 0x000000000042b218 in energy_routines::compute_total_system_energy
(this_box=Cannot access memory at address 0xf6
) at energy_routines.f90:2439
#12 0x00000000004b82a9 in gemc_nvt_volume (box1=1, box2=Cannot access memory at
address 0x80
) at gemc_nvt_volume.f90:461
#13 0x00000000004d1328 in gemc_driver () at gemc_driver.f90:162
#14 0x0000000000406806 in L_MAIN___557__par_region0_2_0 () at main.f90:543
#15 0x0000000000403c4e in main ()
(gdb)

"
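
The backtrace above covers only the thread that was interrupted. When a job is stuck at a barrier, the stacks of all threads are usually needed to see which one never arrived. The sketch below attaches gdb to a running process and dumps every thread's backtrace. The function name dump_threads is hypothetical. It assumes gdb is installed and that you are allowed to attach to the process.

```shell
# Hypothetical helper: print a backtrace for every thread of a running process.
dump_threads() {
    if [ -z "$1" ]; then
        echo "usage: dump_threads PID" >&2
        return 2
    fi
    # -batch exits after the commands have run; -p attaches to the given PID;
    # "thread apply all bt" prints one backtrace per thread.
    gdb -batch -p "$1" -ex "thread apply all bt"
}

# Example (placeholder PID of a hung Cassandra run):
# dump_threads 12345 > all_threads.txt
```

Comparing the frames of the waiting threads with the one thread that is not in __kmp_join_barrier would be a natural next step.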
