Community driven support forum for Folding@home
https://foldingforum.org/

Re: List of SMP WUs with the "1 core usage" issue
Posted: Wed Aug 19, 2009 1:36 pm

List edited ... it's becoming a long list.
Code:
Reading file work/wudata_06.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 64
NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp
Making 1D domain decomposition 1 x 1 x 4
starting mdrun 'IBX in water'
7250000 steps, 14500.0 ps (continuing from step 7000000, 14000.0 ps).
-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090307
Source code file: nsgrid.c, line: 357
Range checking error:
Explanation: During neighborsearching, we assign each particle to a grid
based on its coordinates. If your system contains collisions or parameter
errors that give particles very high velocities you might end up with some
coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
put these on a grid, so this is usually where we detect those errors.
Make sure your system is properly energy-minimized and that the potential
energy seems reasonable before trying again.
Variable ci has value -2147483269. It should have been within [ 0 .. 9464 ]
For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------
Thanx for Using GROMACS - Have a Nice Day
Error on node 0, will try to stop all the nodes
-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090307
Source code file: nsgrid.c, line: 357
Range checking error:
Explanation: During neighborsearching, we assign each particle to a grid
based on its coordinates. If your system contains collisions or parameter
errors that give particles very high velocities you might end up with some
coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
put these on a grid, so this is usually where we detect those errors.
Make sure your system is properly energy-minimized and that the potential
energy seems reasonable before trying again.
Variable ci has value -2147483611. It should have been within [ 0 .. 256 ]
For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------
Thanx for Using GROMACS - Have a Nice Day
Error on node 3, will try to stop all the nodes
Halting parallel program mdrun on CPU 3 out of 4
gcq#0: Thanx for Using GROMACS - Have a Nice Day
[cli_3]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 3
Halting parallel program mdrun on CPU 0 out of 4
gcq#0: Thanx for Using GROMACS - Have a Nice Day
[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
Code:
quad8.parkut.com
14:24:01 up 11 days, 11:25, 0 users, load average: 1.00, 1.00, 1.00
20077 99.6 20077 S ? 01:25:05 ./FahCore_a2.exe -dir work/ -nice 19 -suffix 06 -checkpoint 15 -verbose -lifeline 3086 -version 624
20080 0.3 20080 S ? 00:00:18 ./FahCore_a2.exe -dir work/ -nice 19 -suffix 06 -checkpoint 15 -verbose -lifeline 3086 -version 624
20078 0.0 20078 S ? 00:00:04 ./FahCore_a2.exe -dir work/ -nice 19 -suffix 06 -checkpoint 15 -verbose -lifeline 3086 -version 624
20079 0.0 20079 S ? 00:00:04 ./FahCore_a2.exe -dir work/ -nice 19 -suffix 06 -checkpoint 15 -verbose -lifeline 3086 -version 624
...
model name : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
cpu MHz : 3006.932
cache size : 4096 KB
Memory: 1.96 GB physical, 1.94 GB virtual
...
Client Version 6.24R3
Core: FahCore_a2.exe
Core Version 2.08 (Mon May 18 14:47:42 PDT 2009)
Current Work Unit
-----------------
Name: p2671_IBX in water
Tag: P2671R37C79G78
Download time: August 20 16:58:37
Due time: August 23 16:58:37
Progress: 1% [__________]
ChasR wrote:... If you find a WU that hangs on 2.07 and runs on one core on 2.08, then you will have convinced me.

It's true: when one of these WUs broke my client, which ran 2.07, the client downloaded 2.08 after I deleted the core and restarted the (same) WU. It did start, and it ran on only one core.
Code:
[02:02:56] Connecting to http://171.64.65.56:8080/
[02:03:03] Posted data.
[02:03:03] Initial: 0000; - Receiving payload (expected size: 1508832)
[02:03:04] - Downloaded at ~1473 kB/s
[02:03:04] - Averaged speed for that direction ~1230 kB/s
[02:03:04] + Received work.
[02:03:04] Trying to send all finished work units
[02:03:04] + No unsent completed units remaining.
[02:03:04] + Closed connections
[02:03:04]
[02:03:04] + Processing work unit
[02:03:04] At least 4 processors must be requested.
[02:03:04] Core required: FahCore_a2.exe
[02:03:04] Core found.
[02:03:05] Working on queue slot 02 [August 21 02:03:05 UTC]
[02:03:05] + Working ...
[02:03:05] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 02 -priority 96 -checkpoint 10 -verbose -lifeline 28432 -version 624'
[02:03:05]
[02:03:05] *------------------------------*
[02:03:05] Folding@Home Gromacs SMP Core
[02:03:05] Version 2.08 (Mon May 18 14:47:42 PDT 2009)
[02:03:05]
[02:03:05] Preparing to commence simulation
[02:03:05] - Ensuring status. Please wait.
[02:03:06] Called DecompressByteArray: compressed_data_size=1508320 data_size=23973757, decompressed_data_size=23973757 diff=0
[02:03:06] - Digital signature verified
[02:03:06]
[02:03:06] Project: 2669 (Run 13, Clone 29, Gen 178)
[02:03:06]
[02:03:06] Assembly optimizations on if available.
[02:03:06] Entering M.D.
[02:03:16] Project: 2669 (Run 13, Clone 29, Gen 178)
[02:03:16]
[02:03:16] Entering M.D.
[02:03:53] Completed 0 out of 250000 steps (0%)