Merged problems with projects 6903/6904, Part 1

Moderators: Site Moderators, FAHC Science Team

Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III 17 970 4.3Ghz DDR3 2000 2-500GB Segate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: Merged problems with projects 6903/6904

Post by Grandpa_01 »

According to Kasson, the new server code does not allow nuking them the way the old code did; when they try the old way, the server regenerates the 512-byte download + missing file issue. I think he may be running into the same issue some of us are. I tried running one of them on my 4P, figuring it might help; I was running the new WUs anyway, so it was not going to be a big loss as far as PPD goes. It took a little under 2 days to run, but it would not send; it just died at the end (same error as harlam and probably Patriot). I do not know whether Kasson is having the same problem, but I am sure he is working as fast as he can. I do not think there is anything any of us can do to help, but if there is, I am willing to do what I can.

MtM, I do not think any new ones are being generated. I think the old ones are just not getting completed and keep getting regenerated, and as they time out on different folders' accounts the problem keeps increasing. Methinks Kasson may need a bigger mousetrap.
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
SKeptical_Thinker
Posts: 76
Joined: Tue Apr 29, 2008 11:02 pm
Hardware configuration: XP-32 Pro SP-3
Antec NSK-2480 with two Thermaltake 120mm Smart Fans
Gigabyte ga-ma78gm-s2h 780G IGP
BE-2350 with 10.5 x multiplier, 1.250V in BIOS, clock at 272 (2.856GHz)
EVGA 8800 GS
Ninja Mini CPU HS
GeIL 4GB (2 x 2GB) 240-Pin DDR2 SDRAM DDR2 800
Seagate 500GB SATA hard drive
ASUS 18X DVD±R DVD Burner PATA Model DRW-1814BL

WU dumped after completion and next one seems to be hung

Post by SKeptical_Thinker »

Code:

*********************** Log Started 2012-02-10T21:40:33 ************************
21:40:33:************************* Folding@home Client *************************
21:40:33:    Website: http://folding.stanford.edu/
21:40:33:  Copyright: (c) 2009-2012 Stanford University
21:40:33:     Author: Joseph Coffland <[email protected]>
21:40:33:       Args: --child --lifeline 6630 /etc/fahclient/config.xml --run-as
21:40:33:             fahclient --pid-file=/var/run/fahclient.pid --daemon
21:40:33:     Config: /etc/fahclient/config.xml
21:40:33:******************************** Build ********************************
21:40:33:    Version: 7.1.43
21:40:33:       Date: Jan 2 2012
21:40:33:       Time: 04:27:48
21:40:33:    SVN Rev: 3223
21:40:33:     Branch: fah/trunk/client
21:40:33:   Compiler: GNU 4.1.2 20080704 (Red Hat 4.1.2-46)
21:40:33:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
21:40:33:             -fno-unsafe-math-optimizations -msse2
21:40:33:   Platform: linux2 2.6.18-164.11.1.el5
21:40:33:       Bits: 64
21:40:33:       Mode: Release
21:40:33:******************************* System ********************************
21:40:33:        CPU: Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
21:40:33:     CPU ID: GenuineIntel Family 6 Model 44 Stepping 2
21:40:33:       CPUs: 24
21:40:33:     Memory: 47.13GiB
21:40:33:Free Memory: 46.89GiB
21:40:33:    Threads: POSIX_THREADS
21:40:33: On Battery: false
21:40:33: UTC offset: -5
21:40:33:        PID: 6637
21:40:33:        CWD: /var/lib/fahclient
21:40:33:         OS: Linux 2.6.38.2-f x86_64
21:40:33:    OS Arch: AMD64
21:40:33:       GPUs: 2
21:40:33:      GPU 0: UNSUPPORTED: Rage XL (Intel Corporation)
21:40:33:      GPU 1: UNSUPPORTED: ES1000
21:40:33:       CUDA: Not detected
21:40:33:***********************************************************************
21:40:33:Started thread 1 on PID 6637
21:40:33:<config>
21:40:33:  <!-- Client Control -->
21:40:33:  <cycle-rate v='4'/>
21:40:33:  <cycles v='-1'/>
21:40:33:  <data-directory v='.'/>
21:40:33:  <disable-project-lookup v='false'/>
21:40:33:  <exec-directory v='/usr/bin'/>
21:40:33:  <exit-when-done v='false'/>
21:40:33:  <threads v='4'/>
21:40:33:
21:40:33:  <!-- Configuration -->
21:40:33:  <config-rotate v='true'/>
21:40:33:  <config-rotate-dir v='configs'/>
21:40:33:  <config-rotate-max v='16'/>
21:40:33:
21:40:33:  <!-- Debugging -->
21:40:33:  <assignment-servers>
21:40:33:    assign3.stanford.edu:8080 assign4.stanford.edu:80
21:40:33:  </assignment-servers>
21:40:33:  <capture-directory v='capture'/>
21:40:33:  <capture-sockets v='false'/>
21:40:33:  <debug-sockets v='false'/>
21:40:33:  <exception-locations v='true'/>
21:40:33:  <gpu-assignment-servers>
21:40:33:    assign-GPU.stanford.edu:80 assign-GPU.stanford.edu:8080
21:40:33:  </gpu-assignment-servers>
21:40:33:  <stack-traces v='false'/>
21:40:33:
21:40:33:  <!-- Error Handling -->
21:40:33:  <max-slot-errors v='5'/>
21:40:33:  <max-unit-errors v='5'/>
21:40:33:
21:40:33:  <!-- FahCore Control -->
21:40:33:  <checkpoint v='15'/>
21:40:33:  <core-dir v='cores'/>
21:40:33:  <core-priority v='idle'/>
21:40:33:  <cpu-affinity v='false'/>
21:40:33:  <cpu-usage v='100'/>
21:40:33:  <no-assembly v='false'/>
21:40:33:
21:40:33:  <!-- Folding Slot Configuration -->
21:40:33:  <client-subtype v='LINUX'/>
21:40:33:  <client-type v='bigadv'/>
21:40:33:  <cpu-species v='X86_PENTIUM_II'/>
21:40:33:  <cpu-type v='AMD64'/>
21:40:33:  <cpus v='-1'/>
21:40:33:  <cuda-index v='0'/>
21:40:33:  <gpu v='false'/>
21:40:33:  <gpu-usage v='100'/>
21:40:33:  <max-packet-size v='big'/>
21:40:33:  <opencl-index v='0'/>
21:40:33:  <os-species v='UNKNOWN'/>
21:40:33:  <os-type v='LINUX'/>
21:40:33:  <project-key v='0'/>
21:40:33:  <smp v='true'/>
21:40:33:
21:40:33:  <!-- Logging -->
21:40:33:  <log v='log.txt'/>
21:40:33:  <log-color v='true'/>
21:40:33:  <log-crlf v='false'/>
21:40:33:  <log-date v='false'/>
21:40:33:  <log-date-periodically v='21600'/>
21:40:33:  <log-debug v='true'/>
21:40:33:  <log-domain v='false'/>
21:40:33:  <log-header v='true'/>
21:40:33:  <log-level v='true'/>
21:40:33:  <log-no-info-header v='true'/>
21:40:33:  <log-redirect v='false'/>
21:40:33:  <log-rotate v='true'/>
21:40:33:  <log-rotate-dir v='logs'/>
21:40:33:  <log-rotate-max v='16'/>
21:40:33:  <log-short-level v='false'/>
21:40:33:  <log-simple-domains v='true'/>
21:40:33:  <log-thread-id v='false'/>
21:40:33:  <log-thread-prefix v='true'/>
21:40:33:  <log-time v='true'/>
21:40:33:  <log-to-screen v='true'/>
21:40:33:  <log-truncate v='false'/>
21:40:33:  <verbosity v='7'/>
21:40:33:
21:40:33:  <!-- Network -->
21:40:33:  <proxy v=''/>
21:40:33:  <proxy-enable v='false'/>
21:40:33:  <proxy-pass v=''/>
21:40:33:  <proxy-user v=''/>
21:40:33:
21:40:33:  <!-- Process Control -->
21:40:33:  <child v='true'/>
21:40:33:  <daemon v='true'/>
21:40:33:  <pid v='false'/>
21:40:33:  <pid-file v='/var/run/fahclient.pid'/>
21:40:33:  <respawn v='false'/>
21:40:33:  <service v='false'/>
21:40:33:
21:40:33:  <!-- Remote Command Server -->
21:40:33:  <command-address v='0.0.0.0'/>
21:40:33:  <command-allow v='127.0.0.1'/>
21:40:33:  <command-allow-no-pass v='127.0.0.1'/>
21:40:33:  <command-deny v='0.0.0.0/0'/>
21:40:33:  <command-deny-no-pass v='0.0.0.0/0'/>
21:40:33:  <command-port v='36330'/>
21:40:33:
21:40:33:  <!-- Slot Control -->
21:40:33:  <max-shutdown-wait v='60'/>
21:40:33:  <pause-on-battery v='false'/>
21:40:33:  <pause-on-start v='false'/>
21:40:33:
21:40:33:  <!-- User Information -->
21:40:33:  <machine-id v='0'/>
21:40:33:  <passkey v='********************************'/>
21:40:33:  <team v='31574'/>
21:40:33:  <user v='Skeptical_Thinker'/>
21:40:33:
21:40:33:  <!-- Work Unit Control -->
21:40:33:  <dump-after-deadline v='true'/>
21:40:33:  <max-queue v='16'/>
21:40:33:  <max-units v='0'/>
21:40:33:  <next-unit-percentage v='99'/>
21:40:33:
21:40:33:  <!-- Folding Slots -->
21:40:33:</config>
21:40:33:Switching to user fahclient
21:40:33:Trying to access database...
21:40:33:Successfully acquired database lock
21:40:33:Enabled folding slot 00: READY smp:24
21:40:33:Started thread 4 on PID 6637
21:40:33:Started thread 3 on PID 6637
21:40:33:Started thread 5 on PID 6637
21:40:33:Started thread 6 on PID 6637
21:40:33:Started thread 7 on PID 6637
21:40:33:WU01:FS00:Starting
21:40:33:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a5.fah/FahCore_a5 -dir 01 -suffix 01 -version 701 -checkpoint 15 -np 24
21:40:33:WU01:FS00:Started FahCore on PID 6645
21:40:33:Started thread 8 on PID 6637
21:40:33:WU01:FS00:Core PID:6649
21:40:33:WU01:FS00:FahCore 0xa5 started
21:40:34:WU01:FS00:0xa5:
21:40:34:WU01:FS00:0xa5:*------------------------------*
21:40:34:WU01:FS00:0xa5:Folding@Home Gromacs SMP Core
21:40:34:WU01:FS00:0xa5:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
21:40:34:WU01:FS00:0xa5:
21:40:34:WU01:FS00:0xa5:Preparing to commence simulation
21:40:34:WU01:FS00:0xa5:- Ensuring status. Please wait.
21:40:43:WU01:FS00:0xa5:- Looking at optimizations...
21:40:43:WU01:FS00:0xa5:- Working with standard loops on this execution.
21:40:43:WU01:FS00:0xa5:- Previous termination of core was improper.
21:40:43:WU01:FS00:0xa5:- Going to use standard loops.
21:40:43:WU01:FS00:0xa5:- Files status OK
21:40:49:WU01:FS00:0xa5:- Expanded 57246854 -> 71846524 (decompressed 50.4 percent)
21:40:49:WU01:FS00:0xa5:Called DecompressByteArray: compressed_data_size=57246854 data_size=71846524, decompressed_data_size=71846524 diff=0
21:40:49:WU01:FS00:0xa5:- Digital signature verified
21:40:49:WU01:FS00:0xa5:
21:40:49:WU01:FS00:0xa5:Project: 6903 (Run 5, Clone 13, Gen 69)
21:40:49:WU01:FS00:0xa5:
21:40:50:WU01:FS00:0xa5:Entering M.D.
21:40:56:WU01:FS00:0xa5:Using Gromacs checkpoints
21:41:02:WU01:FS00:0xa5:Mapping NT from 24 to 24
21:41:42:WU01:FS00:0xa5:Resuming from checkpoint
21:41:45:WU01:FS00:0xa5:Verified 01/wudata_01.log
21:41:48:WU01:FS00:0xa5:Verified 01/wudata_01.trr
21:41:53:WU01:FS00:0xa5:Verified 01/wudata_01.xtc
21:41:54:WU01:FS00:0xa5:Verified 01/wudata_01.edr
21:41:54:WU01:FS00:0xa5:Completed 216335 out of 500000 steps  (43%)
22:29:19:WU01:FS00:0xa5:Completed 220000 out of 500000 steps  (44%)
23:33:32:WU01:FS00:0xa5:Completed 225000 out of 500000 steps  (45%)
00:36:41:WU01:FS00:0xa5:Completed 230000 out of 500000 steps  (46%)
01:40:12:WU01:FS00:0xa5:Completed 235000 out of 500000 steps  (47%)
02:43:37:WU01:FS00:0xa5:Completed 240000 out of 500000 steps  (48%)
******************************** Date: 11/02/12 ********************************
03:47:12:WU01:FS00:0xa5:Completed 245000 out of 500000 steps  (49%)
04:51:20:WU01:FS00:0xa5:Completed 250000 out of 500000 steps  (50%)
05:54:38:WU01:FS00:0xa5:Completed 255000 out of 500000 steps  (51%)
06:58:02:WU01:FS00:0xa5:Completed 260000 out of 500000 steps  (52%)
08:01:31:WU01:FS00:0xa5:Completed 265000 out of 500000 steps  (53%)
09:05:25:WU01:FS00:0xa5:Completed 270000 out of 500000 steps  (54%)
******************************** Date: 11/02/12 ********************************
10:08:37:WU01:FS00:0xa5:Completed 275000 out of 500000 steps  (55%)
11:12:08:WU01:FS00:0xa5:Completed 280000 out of 500000 steps  (56%)
12:15:46:WU01:FS00:0xa5:Completed 285000 out of 500000 steps  (57%)
13:19:16:WU01:FS00:0xa5:Completed 290000 out of 500000 steps  (58%)
14:21:31:WU01:FS00:Downloading project 6903 description
14:21:31:WU01:FS00:Connecting to fah-web.stanford.edu:80
14:21:32:WU01:FS00:Project 6903 description downloaded successfully
14:23:06:WU01:FS00:0xa5:Completed 295000 out of 500000 steps  (59%)
15:26:35:WU01:FS00:0xa5:Completed 300000 out of 500000 steps  (60%)
******************************** Date: 11/02/12 ********************************
16:30:17:WU01:FS00:0xa5:Completed 305000 out of 500000 steps  (61%)
17:34:10:WU01:FS00:0xa5:Completed 310000 out of 500000 steps  (62%)
18:37:42:WU01:FS00:0xa5:Completed 315000 out of 500000 steps  (63%)
19:40:54:WU01:FS00:0xa5:Completed 320000 out of 500000 steps  (64%)
20:45:00:WU01:FS00:0xa5:Completed 325000 out of 500000 steps  (65%)
21:48:13:WU01:FS00:0xa5:Completed 330000 out of 500000 steps  (66%)
******************************** Date: 11/02/12 ********************************
22:51:33:WU01:FS00:0xa5:Completed 335000 out of 500000 steps  (67%)
23:54:58:WU01:FS00:0xa5:Completed 340000 out of 500000 steps  (68%)
00:58:19:WU01:FS00:0xa5:Completed 345000 out of 500000 steps  (69%)
02:02:00:WU01:FS00:0xa5:Completed 350000 out of 500000 steps  (70%)
03:06:00:WU01:FS00:0xa5:Completed 355000 out of 500000 steps  (71%)
04:09:23:WU01:FS00:0xa5:Completed 360000 out of 500000 steps  (72%)
******************************** Date: 12/02/12 ********************************
05:13:53:WU01:FS00:0xa5:Completed 365000 out of 500000 steps  (73%)
06:17:01:WU01:FS00:0xa5:Completed 370000 out of 500000 steps  (74%)
07:20:12:WU01:FS00:0xa5:Completed 375000 out of 500000 steps  (75%)
08:23:30:WU01:FS00:0xa5:Completed 380000 out of 500000 steps  (76%)
09:27:29:WU01:FS00:0xa5:Completed 385000 out of 500000 steps  (77%)
10:31:15:WU01:FS00:0xa5:Completed 390000 out of 500000 steps  (78%)
******************************** Date: 12/02/12 ********************************
11:35:29:WU01:FS00:0xa5:Completed 395000 out of 500000 steps  (79%)
12:38:45:WU01:FS00:0xa5:Completed 400000 out of 500000 steps  (80%)
13:42:27:WU01:FS00:0xa5:Completed 405000 out of 500000 steps  (81%)
14:45:48:WU01:FS00:0xa5:Completed 410000 out of 500000 steps  (82%)
15:49:06:WU01:FS00:0xa5:Completed 415000 out of 500000 steps  (83%)
16:52:25:WU01:FS00:0xa5:Completed 420000 out of 500000 steps  (84%)
******************************** Date: 12/02/12 ********************************
17:55:36:WU01:FS00:0xa5:Completed 425000 out of 500000 steps  (85%)
18:59:04:WU01:FS00:0xa5:Completed 430000 out of 500000 steps  (86%)
20:02:58:WU01:FS00:0xa5:Completed 435000 out of 500000 steps  (87%)
21:06:13:WU01:FS00:0xa5:Completed 440000 out of 500000 steps  (88%)
22:09:35:WU01:FS00:0xa5:Completed 445000 out of 500000 steps  (89%)
23:12:29:WU01:FS00:0xa5:Completed 450000 out of 500000 steps  (90%)
******************************** Date: 13/02/12 ********************************
00:15:40:WU01:FS00:0xa5:Completed 455000 out of 500000 steps  (91%)
01:19:54:WU01:FS00:0xa5:Completed 460000 out of 500000 steps  (92%)
02:23:10:WU01:FS00:0xa5:Completed 465000 out of 500000 steps  (93%)
03:26:26:WU01:FS00:0xa5:Completed 470000 out of 500000 steps  (94%)
04:30:22:WU01:FS00:0xa5:Completed 475000 out of 500000 steps  (95%)
05:33:26:WU01:FS00:0xa5:Completed 480000 out of 500000 steps  (96%)
******************************** Date: 13/02/12 ********************************
06:37:17:WU01:FS00:0xa5:Completed 485000 out of 500000 steps  (97%)
07:40:32:WU01:FS00:0xa5:Completed 490000 out of 500000 steps  (98%)
08:44:18:WU01:FS00:0xa5:Completed 495000 out of 500000 steps  (99%)
08:44:19:WU00:FS00:Connecting to assign3.stanford.edu:8080
08:44:20:WU00:FS00:News: Welcome to Folding@Home
08:44:20:WU00:FS00:Assigned to work server 130.237.232.237
08:44:20:WU00:FS00:Requesting new work unit for slot 00: RUNNING smp:24 from 130.237.232.237
08:44:20:WU00:FS00:Connecting to 130.237.232.237:8080
08:44:31:WU00:FS00:Downloading 44.36MiB
08:44:37:WU00:FS00:Download 48.33%
08:44:41:WU00:FS00:Download complete
08:44:41:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:OK project:6904 run:2 clone:18 gen:54 core:0xa5 unit:0x0000005d52be746d4dfbca2cd51e4bf9
08:44:41:WU00:FS00:Downloading project 6904 description
08:44:41:WU00:FS00:Connecting to fah-web.stanford.edu:80
08:44:41:WU00:FS00:Project 6904 description downloaded successfully
09:47:35:WU01:FS00:0xa5:Completed 500000 out of 500000 steps  (100%)
09:48:02:WU01:FS00:0xa5:DynamicWrapper: Finished Work Unit: sleep=10000
09:48:12:WU01:FS00:0xa5:
09:48:12:WU01:FS00:0xa5:Finished Work Unit:
09:48:16:WU01:FS00:0xa5:- Reading up to 182433744 from "01/wudata_01.trr": Read 182433744
09:48:17:WU01:FS00:0xa5:trr file hash check passed.
09:48:23:WU01:FS00:0xa5:- Reading up to 207685912 from "01/wudata_01.xtc": Read 207685912
09:48:24:WU01:FS00:0xa5:xtc file hash check passed.
09:48:24:WU01:FS00:0xa5:edr file hash check passed.
09:48:24:WU01:FS00:0xa5:logfile size: 414859
09:48:24:WU01:FS00:0xa5:Leaving Run
09:48:28:WU01:FS00:0xa5:- Writing 390878507 bytes of core data to disk...
09:49:39:WU01:FS00:0xa5:Done: 390877995 -> 378477591 (compressed to 8.9 percent)
09:49:39:WU01:FS00:0xa5:- Compressed data size (378477591) exceeds limit.
09:49:39:WU01:FS00:0xa5:- Error: Could not write out results to file
09:49:39:WU01:FS00:0xa5:- Shutting down core
09:49:39:WU01:FS00:0xa5:
09:49:39:WU01:FS00:0xa5:Folding@home Core Shutdown: FILE_IO_ERROR
09:49:39:WU01:FS00:FahCore returned: FILE_IO_ERROR (117 = 0x75)
09:49:39:WARNING:WU01:FS00:Fatal error, dumping
09:49:39:WU01:FS00:Sending unit results: id:01 state:SEND error:DUMPED project:6903 run:5 clone:13 gen:69 core:0xa5 unit:0x0000005452be746d4de923422e50378d
09:49:39:WU01:FS00:Uploading 512B to 130.237.232.237
09:49:39:WU01:FS00:Connecting to 130.237.232.237:8080
09:49:39:WU00:FS00:Starting
09:49:39:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a5.fah/FahCore_a5 -dir 00 -suffix 01 -version 701 -checkpoint 15 -np 24
09:49:39:WU00:FS00:Started FahCore on PID 13497
09:49:39:Started thread 9 on PID 6637
09:49:39:WU00:FS00:Core PID:13501
09:49:39:WU00:FS00:FahCore 0xa5 started
09:49:40:WU01:FS00:Upload complete
09:49:40:WU01:FS00:Server responded WORK_ACK (400)
09:49:40:WU01:FS00:Cleaning up
09:49:40:WU00:FS00:0xa5:
09:49:40:WU00:FS00:0xa5:*------------------------------*
09:49:40:WU00:FS00:0xa5:Folding@Home Gromacs SMP Core
09:49:40:WU00:FS00:0xa5:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
09:49:40:WU00:FS00:0xa5:
09:49:40:WU00:FS00:0xa5:Preparing to commence simulation
09:49:40:WU00:FS00:0xa5:- Looking at optimizations...
09:49:40:WU00:FS00:0xa5:- Created dyn
09:49:40:WU00:FS00:0xa5:- Files status OK
09:49:45:WU00:FS00:0xa5:- Expanded 46509365 -> 71843392 (decompressed 62.1 percent)
09:49:45:WU00:FS00:0xa5:Called DecompressByteArray: compressed_data_size=46509365 data_size=71843392, decompressed_data_size=71843392 diff=0
09:49:45:WU00:FS00:0xa5:- Digital signature verified
09:49:45:WU00:FS00:0xa5:
09:49:45:WU00:FS00:0xa5:Project: 6904 (Run 2, Clone 18, Gen 54)
09:49:45:WU00:FS00:0xa5:
09:49:45:WU00:FS00:0xa5:Assembly optimizations on if available.
09:49:45:WU00:FS00:0xa5:Entering M.D.
09:49:53:WU00:FS00:0xa5:Mapping NT from 24 to 24
09:49:58:WU00:FS00:0xa5:Completed 0 out of 13750000 steps  (0%)
The last entry is nearly 8 hours old.

This is what I see when I restart the service:

Code:

cat /var/lib/fahclient/log.txt
*********************** Log Started 2012-02-13T17:24:16 ************************
17:24:16:************************* Folding@home Client *************************
17:24:16:    Website: http://folding.stanford.edu/
17:24:16:  Copyright: (c) 2009-2012 Stanford University
17:24:16:     Author: Joseph Coffland <[email protected]>
17:24:16:       Args: --child --lifeline 31158 /etc/fahclient/config.xml --run-as
17:24:16:             fahclient --pid-file=/var/run/fahclient.pid --daemon
17:24:16:     Config: /etc/fahclient/config.xml
17:24:16:******************************** Build ********************************
17:24:16:    Version: 7.1.43
17:24:16:       Date: Jan 2 2012
17:24:16:       Time: 04:27:48
17:24:16:    SVN Rev: 3223
17:24:16:     Branch: fah/trunk/client
17:24:16:   Compiler: GNU 4.1.2 20080704 (Red Hat 4.1.2-46)
17:24:16:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
17:24:16:             -fno-unsafe-math-optimizations -msse2
17:24:16:   Platform: linux2 2.6.18-164.11.1.el5
17:24:16:       Bits: 64
17:24:16:       Mode: Release
17:24:16:******************************* System ********************************
17:24:16:        CPU: Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
17:24:16:     CPU ID: GenuineIntel Family 6 Model 44 Stepping 2
17:24:16:       CPUs: 24
17:24:16:     Memory: 47.13GiB
17:24:16:Free Memory: 44.68GiB
17:24:16:    Threads: POSIX_THREADS
17:24:16: On Battery: false
17:24:16: UTC offset: -5
17:24:16:        PID: 31165
17:24:16:Started thread 1 on PID 31165
17:24:16:        CWD: /var/lib/fahclient
17:24:16:         OS: Linux 2.6.38.2-f x86_64
17:24:16:    OS Arch: AMD64
17:24:16:       GPUs: 2
17:24:16:      GPU 0: UNSUPPORTED: Rage XL (Intel Corporation)
17:24:16:      GPU 1: UNSUPPORTED: ES1000
17:24:16:       CUDA: Not detected
17:24:16:***********************************************************************
17:24:16:<config>
17:24:16:  <!-- Client Control -->
17:24:16:  <cycle-rate v='4'/>
17:24:16:  <cycles v='-1'/>
17:24:16:  <data-directory v='.'/>
17:24:16:  <disable-project-lookup v='false'/>
17:24:16:  <exec-directory v='/usr/bin'/>
17:24:16:  <exit-when-done v='false'/>
17:24:16:  <threads v='4'/>
17:24:16:
17:24:16:  <!-- Configuration -->
17:24:16:  <config-rotate v='true'/>
17:24:16:  <config-rotate-dir v='configs'/>
17:24:16:  <config-rotate-max v='16'/>
17:24:16:
17:24:16:  <!-- Debugging -->
17:24:16:  <assignment-servers>
17:24:16:    assign3.stanford.edu:8080 assign4.stanford.edu:80
17:24:16:  </assignment-servers>
17:24:16:  <capture-directory v='capture'/>
17:24:16:  <capture-sockets v='false'/>
17:24:16:  <debug-sockets v='false'/>
17:24:16:  <exception-locations v='true'/>
17:24:16:  <gpu-assignment-servers>
17:24:16:    assign-GPU.stanford.edu:80 assign-GPU.stanford.edu:8080
17:24:16:  </gpu-assignment-servers>
17:24:16:  <stack-traces v='false'/>
17:24:16:
17:24:16:  <!-- Error Handling -->
17:24:16:  <max-slot-errors v='5'/>
17:24:16:  <max-unit-errors v='5'/>
17:24:16:
17:24:16:  <!-- FahCore Control -->
17:24:16:  <checkpoint v='15'/>
17:24:16:  <core-dir v='cores'/>
17:24:16:  <core-priority v='idle'/>
17:24:16:  <cpu-affinity v='false'/>
17:24:16:  <cpu-usage v='100'/>
17:24:16:  <no-assembly v='false'/>
17:24:16:
17:24:16:  <!-- Folding Slot Configuration -->
17:24:16:  <client-subtype v='LINUX'/>
17:24:16:  <client-type v='bigadv'/>
17:24:16:  <cpu-species v='X86_PENTIUM_II'/>
17:24:16:  <cpu-type v='AMD64'/>
17:24:16:  <cpus v='-1'/>
17:24:16:  <cuda-index v='0'/>
17:24:16:  <gpu v='false'/>
17:24:16:  <gpu-usage v='100'/>
17:24:16:  <max-packet-size v='big'/>
17:24:16:  <opencl-index v='0'/>
17:24:16:  <os-species v='UNKNOWN'/>
17:24:16:  <os-type v='LINUX'/>
17:24:16:  <project-key v='0'/>
17:24:16:  <smp v='true'/>
17:24:16:
17:24:16:  <!-- Logging -->
17:24:16:  <log v='log.txt'/>
17:24:16:  <log-color v='true'/>
17:24:16:  <log-crlf v='false'/>
17:24:16:  <log-date v='false'/>
17:24:16:  <log-date-periodically v='21600'/>
17:24:16:  <log-debug v='true'/>
17:24:16:  <log-domain v='false'/>
17:24:16:  <log-header v='true'/>
17:24:16:  <log-level v='true'/>
17:24:16:  <log-no-info-header v='true'/>
17:24:16:  <log-redirect v='false'/>
17:24:16:  <log-rotate v='true'/>
17:24:16:  <log-rotate-dir v='logs'/>
17:24:16:  <log-rotate-max v='16'/>
17:24:16:  <log-short-level v='false'/>
17:24:16:  <log-simple-domains v='true'/>
17:24:16:  <log-thread-id v='false'/>
17:24:16:  <log-thread-prefix v='true'/>
17:24:16:  <log-time v='true'/>
17:24:16:  <log-to-screen v='true'/>
17:24:16:  <log-truncate v='false'/>
17:24:16:  <verbosity v='7'/>
17:24:16:
17:24:16:  <!-- Network -->
17:24:16:  <proxy v=''/>
17:24:16:  <proxy-enable v='false'/>
17:24:16:  <proxy-pass v=''/>
17:24:16:  <proxy-user v=''/>
17:24:16:
17:24:16:  <!-- Process Control -->
17:24:16:  <child v='true'/>
17:24:16:  <daemon v='true'/>
17:24:16:  <pid v='false'/>
17:24:16:  <pid-file v='/var/run/fahclient.pid'/>
17:24:16:  <respawn v='false'/>
17:24:16:  <service v='false'/>
17:24:16:
17:24:16:  <!-- Remote Command Server -->
17:24:16:  <command-address v='0.0.0.0'/>
17:24:16:  <command-allow v='127.0.0.1'/>
17:24:16:  <command-allow-no-pass v='127.0.0.1'/>
17:24:16:  <command-deny v='0.0.0.0/0'/>
17:24:16:  <command-deny-no-pass v='0.0.0.0/0'/>
17:24:16:  <command-port v='36330'/>
17:24:16:
17:24:16:  <!-- Slot Control -->
17:24:16:  <max-shutdown-wait v='60'/>
17:24:16:  <pause-on-battery v='false'/>
17:24:16:  <pause-on-start v='false'/>
17:24:16:
17:24:16:  <!-- User Information -->
17:24:16:  <machine-id v='0'/>
17:24:16:  <passkey v='********************************'/>
17:24:16:  <team v='31574'/>
17:24:16:  <user v='Skeptical_Thinker'/>
17:24:16:
17:24:16:  <!-- Work Unit Control -->
17:24:16:  <dump-after-deadline v='true'/>
17:24:16:  <max-queue v='16'/>
17:24:16:  <max-units v='0'/>
17:24:16:  <next-unit-percentage v='99'/>
17:24:16:
17:24:16:  <!-- Folding Slots -->
17:24:16:</config>
17:24:16:Switching to user fahclient
17:24:16:Trying to access database...
17:24:16:Successfully acquired database lock
17:24:16:Enabled folding slot 00: READY smp:24
17:24:17:Started thread 3 on PID 31165
17:24:17:WU00:FS00:Starting
17:24:17:Started thread 5 on PID 31165
17:24:17:Started thread 6 on PID 31165
17:24:17:Started thread 4 on PID 31165
17:24:17:Started thread 7 on PID 31165
17:24:17:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a5.fah/FahCore_a5 -dir 00 -suffix 01 -version 701 -checkpoint 15 -np 24
17:24:17:WU00:FS00:Started FahCore on PID 31173
17:24:17:Started thread 8 on PID 31165
17:24:17:WU00:FS00:Core PID:31177
17:24:17:WU00:FS00:FahCore 0xa5 started
17:24:17:WU00:FS00:0xa5:
17:24:17:WU00:FS00:0xa5:*------------------------------*
17:24:17:WU00:FS00:0xa5:Folding@Home Gromacs SMP Core
17:24:17:WU00:FS00:0xa5:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
17:24:17:WU00:FS00:0xa5:
17:24:17:WU00:FS00:0xa5:Preparing to commence simulation
17:24:17:WU00:FS00:0xa5:- Looking at optimizations...
17:24:17:WU00:FS00:0xa5:- Files status OK
17:24:22:WU00:FS00:0xa5:- Expanded 46509365 -> 71843392 (decompressed 62.1 percent)
17:24:22:WU00:FS00:0xa5:Called DecompressByteArray: compressed_data_size=46509365 data_size=71843392, decompressed_data_size=71843392 diff=0
17:24:23:WU00:FS00:0xa5:- Digital signature verified
17:24:23:WU00:FS00:0xa5:
17:24:23:WU00:FS00:0xa5:Project: 6904 (Run 2, Clone 18, Gen 54)
17:24:23:WU00:FS00:0xa5:
17:24:23:WU00:FS00:0xa5:Assembly optimizations on if available.
17:24:23:WU00:FS00:0xa5:Entering M.D.
17:24:29:WU00:FS00:0xa5:Using Gromacs checkpoints
17:24:33:WU00:FS00:0xa5:Mapping NT from 24 to 24
17:24:42:WU00:FS00:0xa5:Resuming from checkpoint
17:24:58:WU00:FS00:0xa5:Verified 00/wudata_01.log
17:25:00:WU00:FS00:0xa5:Verified 00/wudata_01.trr
17:25:00:WU00:FS00:0xa5:Verified 00/wudata_01.xtc
17:25:00:WU00:FS00:0xa5:Verified 00/wudata_01.edr
17:25:03:WU00:FS00:0xa5:Completed 24765 out of 13750000 steps  (0%)
Is this WU really expected to take 173.5 days?
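The 173.5-day figure can be sanity-checked from the two logs above with a straight-line extrapolation. The sketch below is only that: the function name is made up for illustration, and it assumes the 24765 steps found in the checkpoint were completed between the 09:49:58 start and the 17:24:16 restart (the checkpoint itself may be up to 15 minutes older). The result lands in the same range as the client's estimate, which suggests the ETA follows from the 13,750,000-step count rather than from a display glitch.

```python
from datetime import datetime, timedelta


def estimate_total_days(t_start, t_end, steps_done, steps_total):
    """Linearly extrapolate the full run time of a WU, in days.

    t_start and t_end are HH:MM:SS log timestamps. If t_end is earlier
    than t_start, it is assumed to fall on the following day.
    """
    fmt = "%H:%M:%S"
    start = datetime.strptime(t_start, fmt)
    end = datetime.strptime(t_end, fmt)
    if end < start:
        end += timedelta(days=1)
    elapsed_s = (end - start).total_seconds()
    if elapsed_s <= 0 or steps_done <= 0:
        raise ValueError("need a positive elapsed time and step count")
    seconds_per_step = elapsed_s / steps_done
    return seconds_per_step * steps_total / 86400.0


# P6904 as assigned: 0 steps at 09:49:58, 24765 steps at the 17:24:16 restart.
# Using the restart time as the end point is an assumption (see above).
days_6904 = estimate_total_days("09:49:58", "17:24:16", 24765, 13_750_000)
print(f"P6904 at 13,750,000 steps: about {days_6904:.0f} days")

# P6903 from the first log: 495000 -> 500000 steps between 08:44:18 and 09:47:35.
days_6903 = estimate_total_days("08:44:18", "09:47:35", 5000, 500_000)
print(f"P6903 at 500,000 steps: about {days_6903:.1f} days")
```

The same function applied to the earlier 6903 unit shows how long a 500,000-step run takes on this 24-core machine, which is useful for comparing against the deadline.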
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: WU dumped after completion and next one seems to be hung

Post by 7im »

No, the ETAs on V7 are currently hosed. From the bug ticket history, it looks like the next beta version will have much improved ETA and PPD information.

It also looks like PG is still having issues with the 6903/6904 WUs. There was a combined thread on that here. Maybe watch that thread for updates/ideas.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: Merged problems with projects 6903/6904

Post by kasson »

Bad WUs should be offline, but the rest of the project should be up and assigning now. Please post if you see further problems. Thanks.
Nathan_P
Posts: 1164
Joined: Wed Apr 01, 2009 9:22 pm
Hardware configuration: Asus Z8NA D6C, 2 [email protected] Ghz, , 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)

Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS

Not currently folding
Asus Z9PE- D8 WS, 2 [email protected] Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
Location: Jersey, Channel islands

Re: Merged problems with projects 6903/6904

Post by Nathan_P »

6903 seems to be assigning fine. I had a string of SMP last night and this morning, but picked up a 6903 about 7 hours ago.
Leonardo
Posts: 260
Joined: Tue Dec 04, 2007 5:09 am
Hardware configuration: GPU slots on home-built, purpose-built PCs.
Location: Eagle River, Alaska

6903 (Run 2, Clone 13, Gen 39); 10,000,000 steps

Post by Leonardo »

Project 6903 (Run 2, Clone 13, Gen 39); 10,000,000 steps, hung at 0 completed for nearly four hours.

Linux, Client Version 6.34

Code:

 [14:46:51] Completed 250000 out of 250000 steps  (100%)
...
[14:48:11] Sending work to server
...
[14:48:13] Connecting to http://130.237.232.237:8080/
[14:48:25] Posted data.
[14:48:25] Initial: 0000; - Receiving payload (expected size: 46513362)
...
[14:49:22] Project: 6903 (Run 2, Clone 13, Gen 39)
[14:49:22]... 
[14:49:32] Completed 0 out of 10000000 steps  (0%)
I stopped the client at approximately 18:30 and deleted work, queue.dat, machinedependent.dat, and unitinfo.txt. Upon client restart, the system downloaded a fresh unit and started processing normally, as far as I could tell, based on CPU core temps and utilization.
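For anyone who wants to script the cleanup described above, here is a minimal sketch. The four names come from the post; the function name and the client directory argument are hypothetical, and the sketch assumes the v6 client has already been stopped. Deleting these items discards the current WU along with any progress on it.

```python
import shutil
from pathlib import Path

# Names taken from the post above. The client directory is not given in
# the thread, so the caller has to supply it.
STALE_ITEMS = ["work", "queue.dat", "machinedependent.dat", "unitinfo.txt"]


def clear_v6_work(client_dir):
    """Delete the work folder and state files; return the names removed."""
    base = Path(client_dir)
    removed = []
    for name in STALE_ITEMS:
        target = base / name
        if target.is_dir() and not target.is_symlink():
            shutil.rmtree(target)      # real directory: remove recursively
        elif target.exists() or target.is_symlink():
            target.unlink()            # regular file or symlink
        else:
            continue                   # nothing to delete for this name
        removed.append(name)
    return removed
```

Anything else in the directory, such as the client's configuration, is left alone, so the client can fetch a fresh unit on restart as described in the post.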
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III 17 970 4.3Ghz DDR3 2000 2-500GB Segate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: 6903 (Run 2, Clone 13, Gen 39); 10,000,000 steps

Post by Grandpa_01 »

Congrats, you are the winner :mrgreen: you got the first one since the fix. Hopefully Kasson will be able to catch it and remove it before it spreads too far. Just delete it and move on. You could fold it if you wanted to; it will only take a month or several, and then it will not send after it completes. Just joking about the last part. :D

Edit
I am surprised that one is still floating around. I had it a while back, and the report was merged into another thread. I thought Kasson had got all of them. viewtopic.php?f=19&t=20692&start=0#p206671
Last edited by Grandpa_01 on Sun Feb 19, 2012 7:10 pm, edited 1 time in total.
Leonardo
Posts: 260
Joined: Tue Dec 04, 2007 5:09 am
Hardware configuration: GPU slots on home-built, purpose-built PCs.
Location: Eagle River, Alaska

Re: 6903 (Run 2, Clone 13, Gen 39); 10,000,000 steps

Post by Leonardo »

Is there a prize, Grandpa?

EDIT: Moderators, sorry about not posting in the existing thread. Without my morning coffee, I couldn't find the subject thread's location.
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III 17 970 4.3Ghz DDR3 2000 2-500GB Segate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: 6903 (Run 2, Clone 13, Gen 39); 10,000,000 steps

Post by Grandpa_01 »

Yes, there is: you get 0 points for the CPU cycles you used. :lol:
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: Merged problems with projects 6903/6904

Post by kasson »

Thanks--stopped that one. Sorry you encountered it.
ChelseaOilman
Posts: 1037
Joined: Sun Dec 02, 2007 3:47 pm
Location: Colorado @ 10,000 feet

Re: Merged problems with projects 6903/6904

Post by ChelseaOilman »

I got Project: 6903 (Run 6, Clone 0, Gen 72) last night which was reported by Amaruk on page 2 of this thread back on Feb. 9.

Code:

[20:48:38] Project: 6903 (Run 6, Clone 0, Gen 72)
[20:48:38] 
[20:48:38] Assembly optimizations on if available.
[20:48:38] Entering M.D.
[20:48:46] Mapping NT from 48 to 48 
[20:48:51] Completed 0 out of 500000 steps  (0%)
[21:05:30] g NT from 48 to 48 
[21:06:34] Resuming from checkpoint
[21:07:05] Verified work/wudata_06.log
[21:07:05] Verified work/wudata_06.trr
[21:07:05] Verified work/wudata_06.xtc
[21:07:05] Verified work/wudata_06.edr
[21:07:06] Completed 2615 out of 500000 steps  (0%)
[21:19:35] Completed 5000 out of 500000 steps  (1%)
[21:45:57] Completed 10000 out of 500000 steps  (2%)
[22:12:18] Completed 15000 out of 500000 steps  (3%)
[22:38:45] Completed 20000 out of 500000 steps  (4%)
[23:05:01] Completed 25000 out of 500000 steps  (5%)
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Merged problems with projects 6903/6904

Post by bruce »

ChelseaOilman wrote:I got Project: 6903 (Run 6, Clone 0, Gen 72) last night which was reported by Amaruk on page 2 of this thread back on Feb. 9.
As I understand it, this is good. The bad projects had too many steps and they've been corrected to have only 500 000 steps.
Joe_H
Site Admin
Posts: 7929
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Merged problems with projects 6903/6904

Post by Joe_H »

bruce wrote:
ChelseaOilman wrote:I got Project: 6903 (Run 6, Clone 0, Gen 72) last night which was reported by Amaruk on page 2 of this thread back on Feb. 9.
As I understand it, this is good. The bad projects had too many steps and they've been corrected to have only 500 000 steps.
Not good as the correct number of steps is supposed to be 250,000. This was one of the least wrong at only 2x.
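A quick way to spot these units in a log is to compare the total in the "Completed X out of Y steps" lines against the expected count. The sketch below assumes the 250,000 figure given above for project 6903; the function name and the lookup table are made up for illustration, and no other project's step count is stated in this thread. The regular expressions match the line formats shown in the v6 and v7 logs posted here.

```python
import re

# Expected step count for project 6903, per the post above (an assumption
# for any unit not discussed in this thread).
EXPECTED_STEPS = {6903: 250_000}

PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
STEPS_RE = re.compile(r"Completed \d+ out of (\d+) steps")


def check_log(lines):
    """Yield (project, run, clone, gen, total_steps, ratio) for each WU
    whose total step count differs from the expected value."""
    current = None
    for line in lines:
        m = PROJECT_RE.search(line)
        if m:
            current = tuple(int(g) for g in m.groups())
            continue
        m = STEPS_RE.search(line)
        if m and current is not None:
            total = int(m.group(1))
            expected = EXPECTED_STEPS.get(current[0])
            if expected is not None and total != expected:
                yield (*current, total, total / expected)
            current = None  # check each WU only once


# Lines copied from two of the logs posted in this thread.
sample = [
    "[20:48:38] Project: 6903 (Run 6, Clone 0, Gen 72)",
    "[20:48:51] Completed 0 out of 500000 steps  (0%)",
    "[14:49:22] Project: 6903 (Run 2, Clone 13, Gen 39)",
    "[14:49:32] Completed 0 out of 10000000 steps  (0%)",
]
for p, r, c, g, total, ratio in check_log(sample):
    print(f"P{p} (R{r}, C{c}, G{g}): {total} steps, {ratio:g}x expected")
```

Checking only the first "Completed" line after each "Project:" line keeps the output to one report per WU, which is enough to decide whether to dump the unit before spending days on it.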

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
ChelseaOilman
Posts: 1037
Joined: Sun Dec 02, 2007 3:47 pm
Location: Colorado @ 10,000 feet

Re: Merged problems with projects 6903/6904

Post by ChelseaOilman »

Joe_H wrote:Not good as the correct number of steps is supposed to be 250,000. This was one of the least wrong at only 2x.
I believe that's correct. This WU has been out for a while. Dr. Kasson should have caught it on his first go around.
Schmidde
Posts: 1
Joined: Sun Feb 26, 2012 7:08 pm

Re: Merged problems with projects 6903/6904

Post by Schmidde »

No, Project: 6903 (Run 6, Clone 0, Gen 72) is a "bad" WU.
I'm actually folding it myself with those 500 000 steps. Normally there are 250 000 steps.

Please fix; it's disappointing to fold for 4-5 days and only get base points.