9704 (Run 10, Clone 13, Gen 119) (102 = 0x66)

Posted: Fri Oct 30, 2015 11:52 am
by parkut
Model Name: NVIDIA:5 GM206 [GeForce GTX 960] - CentOS Linux system
Driver Version: 352.41 - GPU temp: 68C - Client Version: 7.3.6

Project: 9704 (Run 10, Clone 13, Gen 119) was INTERRUPTED (102 = 0x66) at 25%.
The core restarted, but found a problem with the checkpoint file:
ERROR:Guru Meditation #76e83436e7d7dcd.bb54b21bd5dbdf80 (41594500.41598750) '01/01/checkpointState.xml'
and then: FahCore returned: BAD_WORK_UNIT (114 = 0x72)

13:31:49:WU01:FS01:0x21:Project: 9704 (Run 10, Clone 13, Gen 119)
13:31:49:WU01:FS01:0x21:Unit: 0x000000a7ab404162553ebd50969d9738
13:31:49:WU01:FS01:0x21:CPU: 0x00000000000000000000000000000000
13:31:49:WU01:FS01:0x21:Machine: 1
13:31:49:WU01:FS01:0x21:Reading tar file core.xml
13:31:49:WU01:FS01:0x21:Reading tar file system.xml
13:31:50:WU01:FS01:0x21:Reading tar file integrator.xml
13:31:50:WU01:FS01:0x21:Reading tar file state.xml
13:31:52:WU01:FS01:0x21:Digital signatures verified
13:31:52:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
13:31:52:WU01:FS01:0x21:Version 0.0.11
13:33:02:WU01:FS01:0x21:Completed 0 out of 640000 steps (0%)
13:33:02:WU01:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
13:36:48:WU01:FS01:0x21:Completed 6400 out of 640000 steps (1%)
13:40:23:WU01:FS01:0x21:Completed 12800 out of 640000 steps (2%)
13:43:58:WU01:FS01:0x21:Completed 19200 out of 640000 steps (3%)
13:47:33:WU01:FS01:0x21:Completed 25600 out of 640000 steps (4%)
13:51:08:WU01:FS01:0x21:Completed 32000 out of 640000 steps (5%)
13:54:43:WU01:FS01:0x21:Completed 38400 out of 640000 steps (6%)
13:58:18:WU01:FS01:0x21:Completed 44800 out of 640000 steps (7%)
14:01:53:WU01:FS01:0x21:Completed 51200 out of 640000 steps (8%)
14:05:28:WU01:FS01:0x21:Completed 57600 out of 640000 steps (9%)
14:09:02:WU01:FS01:0x21:Completed 64000 out of 640000 steps (10%)
14:12:37:WU01:FS01:0x21:Completed 70400 out of 640000 steps (11%)
14:16:13:WU01:FS01:0x21:Completed 76800 out of 640000 steps (12%)
14:20:02:WU01:FS01:0x21:Completed 83200 out of 640000 steps (13%)
14:23:38:WU01:FS01:0x21:Completed 89600 out of 640000 steps (14%)
14:27:12:WU01:FS01:0x21:Completed 96000 out of 640000 steps (15%)
14:30:47:WU01:FS01:0x21:Completed 102400 out of 640000 steps (16%)
14:34:22:WU01:FS01:0x21:Completed 108800 out of 640000 steps (17%)
14:37:57:WU01:FS01:0x21:Completed 115200 out of 640000 steps (18%)
14:41:32:WU01:FS01:0x21:Completed 121600 out of 640000 steps (19%)
14:45:07:WU01:FS01:0x21:Completed 128000 out of 640000 steps (20%)
14:48:42:WU01:FS01:0x21:Completed 134400 out of 640000 steps (21%)
14:52:17:WU01:FS01:0x21:Completed 140800 out of 640000 steps (22%)
14:55:52:WU01:FS01:0x21:Completed 147200 out of 640000 steps (23%)
14:59:27:WU01:FS01:0x21:Completed 153600 out of 640000 steps (24%)
15:03:02:WU01:FS01:0x21:Completed 160000 out of 640000 steps (25%)
15:03:15:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
15:03:15:WU01:FS01:Starting
15:03:15:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 01 -suffix 01 -version 703 -lifeline 1482 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
15:03:15:WU01:FS01:Started FahCore on PID 7101
15:03:15:WU01:FS01:Core PID:7105
15:03:15:WU01:FS01:FahCore 0x21 started
15:03:15:WU01:FS01:0x21:*********************** Log Started 2015-10-29T15:03:15Z ***********************
15:03:15:WU01:FS01:0x21:Project: 9704 (Run 10, Clone 13, Gen 119)
15:03:15:WU01:FS01:0x21:Unit: 0x000000a7ab404162553ebd50969d9738
15:03:15:WU01:FS01:0x21:CPU: 0x00000000000000000000000000000000
15:03:15:WU01:FS01:0x21:Machine: 1
15:03:15:WU01:FS01:0x21:Digital signatures verified
15:03:15:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
15:03:15:WU01:FS01:0x21:Version 0.0.11
15:03:15:WU01:FS01:0x21:  Found a checkpoint file
15:03:27:WU01:FS01:0x21:ERROR:Guru Meditation #76e83436e7d7dcd.bb54b21bd5dbdf80 (41594500.41598750) '01/01/checkpointState.xml'
15:03:27:WU01:FS01:0x21:WARNING:Unexpected exit() call
15:03:27:WU01:FS01:0x21:WARNING:Unexpected exit from science code
15:03:27:WU01:FS01:0x21:Saving result file logfile_01.txt
15:03:27:WU01:FS01:0x21:Saving result file checkpt.crc
15:03:27:WU01:FS01:0x21:Saving result file log.txt
15:03:27:WU01:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
15:03:27:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
15:03:27:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:9704 run:10 clone:13 gen:119 core:0x21 unit:0x000000a7ab404162553ebd50969d9738
15:03:27:WU01:FS01:Uploading 3.75KiB to 171.64.65.98
15:03:27:WU01:FS01:Connecting to 171.64.65.98:8080
15:03:27:WU01:FS01:Upload complete
15:03:27:WU01:FS01:Server responded WORK_ACK (400)
15:03:27:WU01:FS01:Cleaning up

Re: 9704 (Run 10, Clone 13, Gen 119) (102 = 0x66)

Posted: Fri Oct 30, 2015 2:07 pm
by toTOW
Someone has been able to complete this WU ...

Re: 9704 (Run 10, Clone 13, Gen 119) (102 = 0x66)

Posted: Fri Oct 30, 2015 6:03 pm
by Joe_H
You have reported two WUs. Both logs show the processing of the WU being interrupted and then restarted immediately. The immediate restart does not give the files enough time to be closed, which is probably why the checkpoint files were not usable. If you can figure out what is interrupting the WUs like this, that may help in avoiding the problem. It might also be a problem with the software.
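
One way to look for this pattern in a client log is a short script along the following lines. It is only a sketch: the function name `restart_gaps` and the regular expression are assumptions based on the log lines posted in this thread, not part of the FAH client, and other client versions may format their lines differently.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the log in this thread, e.g.
# "15:03:15:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)"
LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(?:WARNING:)?(WU\d+):(FS\d+):(.*)$")


def restart_gaps(lines):
    """Yield (wu, slot, seconds) for each INTERRUPTED -> Starting pair."""
    pending = {}  # (wu, slot) -> time of the most recent INTERRUPTED return
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue
        stamp, wu, slot, msg = m.groups()
        t = datetime.strptime(stamp, "%H:%M:%S")
        key = (wu, slot)
        if msg.startswith("FahCore returned: INTERRUPTED"):
            pending[key] = t
        elif msg == "Starting" and key in pending:
            # The log only carries time of day, so wrap around midnight.
            gap = (t - pending.pop(key)).total_seconds() % 86400
            yield wu, slot, gap


# Three lines copied from the log posted above.
sample = [
    "15:03:02:WU01:FS01:0x21:Completed 160000 out of 640000 steps (25%)",
    "15:03:15:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)",
    "15:03:15:WU01:FS01:Starting",
]

for wu, slot, gap in restart_gaps(sample):
    print(f"{wu}/{slot}: restarted {gap:.0f} s after the interrupt")
```

On the lines from this thread the gap comes out as 0 seconds, which matches the immediate restart described above. The script only measures the gap; it does not say what caused the interrupt.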

Re: 9704 (Run 10, Clone 13, Gen 119) (102 = 0x66)

Posted: Fri Oct 30, 2015 6:41 pm
by toTOW
It might also be a good idea to update the client to 7.4.4 for better error handling ...