171.67.108.*** Problems getting workunits
Posted: Sat Mar 19, 2016 9:54 am
I'm looking for a little help with getting work units. Apologies if I'm in the wrong place - if so could Mods please move it?
I'm running a machine as spec'd below:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 4
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 16
Model: 9
Model name: AMD Opteron(tm) Processor 6128
Stepping: 1
CPU MHz: 1999.958
BogoMIPS: 4000.10
Virtualization: AMD-V
L1d cache: 64K
L1i cache: 64K
L2 cache: 512K
L3 cache: 5118K
NUMA node0 CPU(s): 0-3
NUMA node1 CPU(s): 4-7
NUMA node2 CPU(s): 8-11
NUMA node3 CPU(s): 12-15
NUMA node4 CPU(s): 16-19
NUMA node5 CPU(s): 20-23
NUMA node6 CPU(s): 24-27
NUMA node7 CPU(s): 28-31
My machine has recently had trouble with getting work units (snippet of log below).
This is typical of performance before the work units stopped:
******************************* Date: 2016-03-11 *******************************
******************************* Date: 2016-03-12 *******************************
******************************* Date: 2016-03-12 *******************************
******************************* Date: 2016-03-12 *******************************
******************************* Date: 2016-03-12 *******************************
******************************* Date: 2016-03-13 *******************************
******************************* Date: 2016-03-13 *******************************
******************************* Date: 2016-03-13 *******************************
20:47:23:WARNING:WU00:FS00:Server did not like results, dumping
******************************* Date: 2016-03-13 *******************************
22:06:39:WARNING:WU01:FS00:Server did not like results, dumping
23:18:13:WARNING:WU00:FS00:Server did not like results, dumping
00:25:10:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.45:8080': Empty work server assignment
00:25:11:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.204:80': Empty work server assignment
00:25:11:ERROR:WU00:FS00:Exception: Could not get an assignment
00:25:12:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.45:8080': Empty work server assignment
00:25:12:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.204:80': Empty work server assignment
00:25:12:ERROR:WU00:FS00:Exception: Could not get an assignment
00:26:12:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.45:8080': Empty work server assignment
00:26:13:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.204:80': Empty work server assignment
00:26:13:ERROR:WU00:FS00:Exception: Could not get an assignment
00:33:33:WARNING:WU01:FS00:Server did not like results, dumping
01:49:09:WARNING:WU00:FS00:Server did not like results, dumping
02:52:02:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.45:8080': Empty work server assignment
02:52:02:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.204:80': Empty work server assignment
02:52:02:ERROR:WU00:FS00:Exception: Could not get an assignment
...etc etc...
******************************* Date: 2016-03-15 *******************************
06:29:52:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.45:8080': Empty work server assignment
06:29:52:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.204:80': Empty work server assignment
06:29:52:ERROR:WU00:FS00:Exception: Could not get an assignment
******************************* Date: 2016-03-15 *******************************
12:29:52:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.45:8080': Empty work server assignment
12:29:52:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.204:80': Empty work server assignment
12:29:52:ERROR:WU00:FS00:Exception: Could not get an assignment
******************************* Date: 2016-03-15 *******************************
******************************* Date: 2016-03-16 *******************************
******************************* Date: 2016-03-16 *******************************
******************************* Date: 2016-03-16 *******************************
******************************* Date: 2016-03-16 *******************************
******************************* Date: 2016-03-17 *******************************
14:19:09:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.45:8080': Empty work server assignment
14:19:09:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.204:80': Empty work server assignment
14:19:09:ERROR:WU00:FS00:Exception: Could not get an assignment
14:19:11:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.45:8080': Empty work server assignment
14:19:12:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.204:80': Empty work server assignment
14:19:12:ERROR:WU00:FS00:Exception: Could not get an assignment
14:20:11:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.45:8080': Empty work server assignment
14:20:12:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.204:80': Empty work server assignment
14:20:12:ERROR:WU00:FS00:Exception: Could not get an assignment
******************************* Date: 2016-03-17 *******************************
******************************* Date: 2016-03-18 *******************************
******************************* Date: 2016-03-18 *******************************
******************************* Date: 2016-03-18 *******************************
******************************* Date: 2016-03-18 *******************************
******************************* Date: 2016-03-19 *******************************
******************************* Date: 2016-03-19 *******************************
14:34:07:WU01:FS00:0xa4:Completed 80000 out of 80000 steps (100%)
14:34:09:WU01:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
14:34:13:WU00:FS00:Download 69.15%
14:34:19:WU00:FS00:Download 74.16%
14:34:19:WU01:FS00:0xa4:
14:34:19:WU01:FS00:0xa4:Finished Work Unit:
14:34:19:WU01:FS00:0xa4:- Reading up to 4117896 from "01/wudata_01.trr": Read 4117896
14:34:19:WU01:FS00:0xa4:trr file hash check passed.
14:34:19:WU01:FS00:0xa4:- Reading up to 3189560 from "01/wudata_01.xtc": Read 3189560
14:34:19:WU01:FS00:0xa4:xtc file hash check passed.
14:34:19:WU01:FS00:0xa4:edr file hash check passed.
14:34:19:WU01:FS00:0xa4:logfile size: 19947
14:34:19:WU01:FS00:0xa4:Leaving Run
14:34:20:WU01:FS00:0xa4:- Writing 7329795 bytes of core data to disk...
14:34:22:WU01:FS00:0xa4:Done: 7329283 -> 7058738 (compressed to 96.3 percent)
14:34:22:WU01:FS00:0xa4: ... Done.
14:34:26:WU00:FS00:Download 81.17%
14:34:32:WU00:FS00:Download 89.19%
14:34:37:WU00:FS00:Download complete
14:34:37:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9752 run:1275 clone:0 gen:720 core:0xa4 unit:0x00000389ab40416355417363f2a22f96
14:40:36:WU01:FS00:0xa4:- Shutting down core
14:40:36:WU01:FS00:0xa4:
14:40:36:WU01:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
14:41:31:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
14:41:31:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:9752 run:572 clone:0 gen:808 core:0xa4 unit:0x0000041aab4041635541726963e76164
14:41:31:WU01:FS00:Uploading 6.73MiB to 171.64.65.99
14:41:31:WU01:FS00:Connecting to 171.64.65.99:8080
14:41:31:WU00:FS00:Starting
14:41:31:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 00 -suffix 01 -version 704 -lifeline 1530 -checkpoint 15 -np 32
14:41:31:WU00:FS00:Started FahCore on PID 47248
14:41:31:WU00:FS00:Core PID:47252
14:41:31:WU00:FS00:FahCore 0xa4 started
14:41:31:WU00:FS00:0xa4:
14:41:31:WU00:FS00:0xa4:*------------------------------*
14:41:31:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
14:41:31:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
14:41:31:WU00:FS00:0xa4:
14:41:31:WU00:FS00:0xa4:Preparing to commence simulation
14:41:31:WU00:FS00:0xa4:- Looking at optimizations...
14:41:31:WU00:FS00:0xa4:- Created dyn
14:41:31:WU00:FS00:0xa4:- Files status OK
14:41:32:WU00:FS00:0xa4:- Expanded 6539191 -> 22431316 (decompressed 343.0 percent)
14:41:32:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=6539191 data_size=22431316, decompressed_data_size=22431316 diff=0
14:41:32:WU00:FS00:0xa4:- Digital signature verified
14:41:32:WU00:FS00:0xa4:
14:41:32:WU00:FS00:0xa4:Project: 9752 (Run 1275, Clone 0, Gen 720)
14:41:32:WU00:FS00:0xa4:
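To make a long excerpt like the one above easier to read, the warnings and errors can be counted per date. The following is only a minimal sketch in Python: the function name `tally_problems` and the category labels are my own illustrative choices, not part of FAHClient, and the patterns assume the log line layout shown in this post (`HH:MM:SS:LEVEL:WUnn:FSnn:message` plus the `Date:` separator lines).

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt in this post:
#   00:25:10:WARNING:WU00:FS00:Failed to get assignment from '...': Empty work server assignment
PROBLEM_RE = re.compile(r"^\d{2}:\d{2}:\d{2}:(WARNING|ERROR):WU\d+:FS\d+:(.*)$")
# Separator lines such as "***** Date: 2016-03-13 *****"
DATE_RE = re.compile(r"^\*+ Date: (\d{4}-\d{2}-\d{2}) \*+$")


def classify(message):
    """Map a log message to a short, hypothetical category label."""
    if message.startswith("Server did not like results"):
        return "results rejected"
    if message.startswith("Failed to get assignment"):
        return "assignment failed"
    if message.startswith("Exception: Could not get an assignment"):
        return "no assignment"
    return "other"


def tally_problems(lines):
    """Count WARNING/ERROR lines, keyed by (last date header seen, category)."""
    counts = Counter()
    current_date = None  # stays None until the first date separator appears
    for raw in lines:
        line = raw.strip()
        date_match = DATE_RE.match(line)
        if date_match:
            current_date = date_match.group(1)
            continue
        problem_match = PROBLEM_RE.match(line)
        if problem_match:
            counts[(current_date, classify(problem_match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally.py < log.txt
    for (date, kind), n in sorted(tally_problems(sys.stdin).items(), key=str):
        print(date, kind, n)
```

Each line is attributed to the most recent date separator above it, so a trimmed excerpt (like mine) will only be as accurate as the separators it still contains.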
I wondered if there might be a lack of units for 32 cores, so I implemented multiple slots of 12, 12, 8, leaving the initial slot set to -1. I immediately got work units; however, the PPD appears much lower than before.
Could somebody please help me work out what's going on and, if possible, how to resolve it? This machine, unlike my others, is dedicated, so I'd like to use it for the maximum benefit of the folding project. I'd appreciate any input on config.
Thanks very much.
L