16434 (run:232 clone:3 gen:13) WORK_QUIT

Posted: Fri May 15, 2020 4:00 am
by legalalien
Just received "Server responded WORK_QUIT (404)" followed by "Server did not like results, dumping" ... there go several days of GPU work :( The GPU is an EVGA Nvidia GT 710 2GB, no overclocking.

I suspect it may have something to do with CORE_OUTDATED received in the middle of uploading the results:

Code:

***************************** Date: 2020-05-15 *******************************
00:52:48:WU02:FS01:0x22:Completed 2475000 out of 2500000 steps (99%)
00:52:49:WU01:FS01:Connecting to 65.254.110.245:80
00:52:49:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:80': No WUs available for this configuration
00:52:49:WU01:FS01:Connecting to 18.218.241.186:80
00:52:50:WU01:FS01:Assigned to work server 206.223.170.146
00:52:50:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GK208 [GeForce GT 710 LP] from 206.223.170.146
00:52:50:WU01:FS01:Connecting to 206.223.170.146:8080
00:52:59:WU01:FS01:Downloading 35.98MiB
00:53:05:WU01:FS01:Download 44.64%
00:53:11:WU01:FS01:Download 79.73%
00:53:13:WU01:FS01:Download complete
00:53:13:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:14251 run:114 clone:0 gen:11 core:0x22 unit:0x00000013cedfaa925eab0323e1367802
02:18:53:WU02:FS01:0x22:Completed 2500000 out of 2500000 steps (100%)
02:19:26:WU02:FS01:0x22:Saving result file ../logfile_01.txt
02:19:26:WU02:FS01:0x22:Saving result file checkpointState.xml
02:19:26:WU02:FS01:0x22:Saving result file checkpt.crc
02:19:26:WU02:FS01:0x22:Saving result file positions.xtc
02:19:27:WU02:FS01:0x22:Saving result file science.log
02:19:27:WU02:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
02:19:27:WU02:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
02:19:27:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:16434 run:232 clone:3 gen:13 core:0x22 unit:0x0000001503854c135e9cbacc8fa1f2ea
02:19:27:WU02:FS01:Uploading 104.38MiB to 3.133.76.19
02:19:27:WU02:FS01:Connecting to 3.133.76.19:8080
02:19:27:WU01:FS01:Starting
02:19:27:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 982 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0 -tmax=80 -twait=900
02:19:27:WU01:FS01:Started FahCore on PID 18857
02:19:27:WU01:FS01:Core PID:18861
02:19:27:WU01:FS01:FahCore 0x22 started
02:19:28:WARNING:WU01:FS01:FahCore returned: CORE_OUTDATED (110 = 0x6e)
02:19:28:WU01:FS01:Downloading core from http://cores.foldingathome.org/v7/lin/64bit/Core_22.fah
02:19:28:WU01:FS01:Connecting to cores.foldingathome.org:80
02:19:29:WU01:FS01:FahCore 22: Downloading 3.59MiB
02:19:29:WU01:FS01:FahCore 22: Download complete
02:19:29:WU01:FS01:Valid core signature
02:19:29:WU01:FS01:Unpacked 9.32MiB to cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22
02:19:30:WU01:FS01:Starting
02:19:30:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 982 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0 -tmax=80 -twait=900
02:19:30:WU01:FS01:Started FahCore on PID 18864
02:19:30:WU01:FS01:Core PID:18868
02:19:30:WU01:FS01:FahCore 0x22 started
02:19:30:WU01:FS01:0x22:*********************** Log Started 2020-05-15T02:19:30Z ***********************
02:19:30:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
02:19:30:WU01:FS01:0x22:       Type: 0x22
02:19:30:WU01:FS01:0x22:       Core: Core22
02:19:30:WU01:FS01:0x22:    Website: https://foldingathome.org/
02:19:30:WU01:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
02:19:30:WU01:FS01:0x22:     Author: John Chodera <[email protected]> and Rafal Wiewiora
02:19:30:WU01:FS01:0x22:             <[email protected]>
02:19:30:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 18864 -checkpoint 15
02:19:30:WU01:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
02:19:30:WU01:FS01:0x22:             0 -gpu 0 -tmax=80 -twait=900
02:19:30:WU01:FS01:0x22:     Config: <none>
02:19:30:WU01:FS01:0x22:************************************ Build *************************************
02:19:30:WU01:FS01:0x22:    Version: 0.0.5
02:19:30:WU01:FS01:0x22:       Date: Apr 22 2020
02:19:30:WU01:FS01:0x22:       Time: 03:57:11
02:19:30:WU01:FS01:0x22: Repository: Git
02:19:30:WU01:FS01:0x22:   Revision: 2d69202c898bd9bb3e093f51cd32bf411c2a0388
02:19:30:WU01:FS01:0x22:     Branch: HEAD
02:19:30:WU01:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
02:19:30:WU01:FS01:0x22:    Options: -std=c++11 -O3 -funroll-loops
02:19:30:WU01:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
02:19:30:WU01:FS01:0x22:       Bits: 64
02:19:30:WU01:FS01:0x22:       Mode: Release
02:19:30:WU01:FS01:0x22:************************************ System ************************************
02:19:30:WU01:FS01:0x22:        CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 5600+
02:19:30:WU01:FS01:0x22:     CPU ID: AuthenticAMD Family 15 Model 67 Stepping 3
02:19:30:WU01:FS01:0x22:       CPUs: 2
02:19:30:WU01:FS01:0x22:     Memory: 7.77GiB
02:19:30:WU01:FS01:0x22:Free Memory: 3.36GiB
02:19:30:WU01:FS01:0x22:    Threads: POSIX_THREADS
02:19:30:WU01:FS01:0x22: OS Version: 5.4
02:19:30:WU01:FS01:0x22:Has Battery: false
02:19:30:WU01:FS01:0x22: On Battery: false
02:19:30:WU01:FS01:0x22: UTC Offset: -5
02:19:30:WU01:FS01:0x22:        PID: 18868
02:19:30:WU01:FS01:0x22:        CWD: /var/lib/fahclient/work
02:19:30:WU01:FS01:0x22:         OS: Linux 5.4.0-29-generic x86_64
02:19:30:WU01:FS01:0x22:    OS Arch: AMD64
02:19:30:WU01:FS01:0x22:********************************************************************************
02:19:30:WU01:FS01:0x22:Project: 14251 (Run 114, Clone 0, Gen 11)
02:19:30:WU01:FS01:0x22:Unit: 0x00000013cedfaa925eab0323e1367802
02:19:30:WU01:FS01:0x22:Reading tar file core.xml
02:19:30:WU01:FS01:0x22:Reading tar file integrator.xml
02:19:30:WU01:FS01:0x22:Reading tar file state.xml
02:19:32:WU01:FS01:0x22:Reading tar file system.xml
02:19:33:WU01:FS01:0x22:Digital signatures verified
02:19:33:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
02:19:33:WU01:FS01:0x22:Version 0.0.5
02:21:38:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
02:21:38:WU02:FS01:Connecting to 3.133.76.19:80
02:21:45:WU02:FS01:Upload 0.06%
02:22:35:WU01:FS01:0x22:Completed 0 out of 500000 steps (0%)
02:22:36:WU01:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
02:22:43:WU02:FS01:Upload 0.12%
02:22:44:WARNING:WU02:FS01:Exception: Failed to send results to work server: Transfer failed
02:22:44:WU02:FS01:Trying to send results to collection server
02:22:44:WU02:FS01:Uploading 104.38MiB to 3.21.157.11
02:22:44:WU02:FS01:Connecting to 3.21.157.11:8080
02:22:50:WU02:FS01:Upload 11.50%
02:22:56:WU02:FS01:Upload 22.33%
02:23:02:WU02:FS01:Upload 32.33%
02:23:08:WU02:FS01:Upload 42.21%
02:23:14:WU02:FS01:Upload 52.57%
02:23:20:WU02:FS01:Upload 63.41%
02:23:26:WU02:FS01:Upload 75.14%
02:23:32:WU02:FS01:Upload 86.40%
02:23:38:WU02:FS01:Upload 98.97%
02:23:39:WU02:FS01:Upload complete
02:23:39:WU02:FS01:Server responded WORK_QUIT (404)
02:23:39:WARNING:WU02:FS01:Server did not like results, dumping
02:23:39:WU02:FS01:Cleaning up
Thoughts?

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Posted: Fri May 15, 2020 4:29 am
by bruce
No. The CORE_OUTDATED event had nothing to do with the server's rejection of the WU.

You're talking about project:16434 run:232 clone:3 gen:13 which was also identified as WU02:FS01. It was rejected by the Collection Server.

Perhaps it had expired. You would have to look back in the previous logs to find when that WU was assigned and downloaded.

I happen to have personal experience with a GT710 which is a very slow GPU. It is very difficult to complete assignments before they expire.
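Looking back through older logs for the assignment time can be scripted. The sketch below is only one way to do it and is not part of the FAH client: it assumes the v7 log layout shown in this thread (a `Date:` banner followed by `HH:MM:SS:WUxx:FSxx:` lines), and `find_assignments` is a made-up helper name. The sample lines are copied from the log posted above.

```python
import re
from datetime import datetime

# Banner such as "**** Date: 2020-05-15 ****" that sets the date for following lines.
DATE_RE = re.compile(r"\*+ Date: (\d{4}-\d{2}-\d{2}) \*+")
# "Received Unit" line carrying the project/run/clone/gen identifiers.
UNIT_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}):WU\d+:FS\d+:Received Unit:.*"
    r"project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)"
)


def find_assignments(lines, project, run, clone, gen):
    """Return the timestamps at which the given WU was received."""
    current_date = None
    found = []
    for line in lines:
        banner = DATE_RE.search(line)
        if banner:
            current_date = banner.group(1)
            continue
        unit = UNIT_RE.search(line)
        if unit and current_date:
            ids = tuple(int(x) for x in unit.group(2, 3, 4, 5))
            if ids == (project, run, clone, gen):
                stamp = f"{current_date} {unit.group(1)}"
                found.append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"))
    return found


sample = [
    "***************************** Date: 2020-05-15 *******************************",
    "00:53:13:WU01:FS01:Download complete",
    "00:53:13:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR "
    "project:14251 run:114 clone:0 gen:11 core:0x22 "
    "unit:0x00000013cedfaa925eab0323e1367802",
]
print(find_assignments(sample, 14251, 114, 0, 11))
```

The function accepts any iterable of lines, so an open log file can be passed in directly; searching for 16434 / 232 / 3 / 13 in the older logs would be the equivalent lookup for the rejected WU.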

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Posted: Fri May 15, 2020 4:33 am
by legalalien
The WU was definitely completed before the deadline; I've been watching it very carefully.

ETA: Yes, I realize that the core was updated when the next WU started; I was wondering if core restarting in the middle of the completed unit submission had some effect on the submission itself.

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Posted: Fri May 15, 2020 5:05 am
by bruce
When a WU is not returned quickly, duplicates are issued. Assuming your result was error-free, it should still have been accepted, even if a duplicate was completed before you uploaded your result. I'll have to check to see if that happened.

According to the public records, two people returned it (not counting you). One had an error and the other completed it successfully.

Your WU was rejected at 02:23:39, presumably on 2020-05-15, i.e. 2020-05-15T02:23:39.

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Posted: Fri May 15, 2020 5:16 am
by bruce
The successful WU was issued at 2020-05-09T06:28:32, returned at 2020-05-09T18:18:46, and credited at 2020-05-09T18:25:42.
Your WU was rejected at 2020-05-15T02:23:39.
Unfortunately I can't tell when your WU was assigned to you. Can you find that date/time?

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Posted: Fri May 15, 2020 1:21 pm
by legalalien
bruce wrote:The successful WU was issued at 2020-05-09T06:28:32, returned at 2020-05-09T18:18:46, and credited at 2020-05-09T18:25:42.
Your WU was rejected at 2020-05-15T02:23:39.
Unfortunately I can't tell when your WU was assigned to you. Can you find that date/time?
Back on May 8, it seems:

Code:

******************************* Date: 2020-05-08 *******************************
18:37:35:WU01:FS01:0x22:Completed 960000 out of 1000000 steps (96%)
19:27:11:WU01:FS01:0x22:Completed 970000 out of 1000000 steps (97%)
20:16:56:WU01:FS01:0x22:Completed 980000 out of 1000000 steps (98%)
20:18:44:WU00:FS00:0xa7:Completed 975000 out of 1250000 steps (78%)
21:05:50:WU01:FS01:0x22:Completed 990000 out of 1000000 steps (99%)
21:05:50:WU02:FS01:Connecting to 65.254.110.245:80
21:05:50:WARNING:WU02:FS01:Failed to get assignment from '65.254.110.245:80': No WUs available for this configuration
21:05:50:WU02:FS01:Connecting to 18.218.241.186:80
21:05:51:WU02:FS01:Assigned to work server 3.133.76.19
21:05:51:WU02:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GK208 [GeForce GT 710 LP] from 3.133.76.19
21:05:51:WU02:FS01:Connecting to 3.133.76.19:8080
21:07:08:WU02:FS01:Downloading 67.19MiB
21:07:14:WU02:FS01:Download 60.00%
21:07:17:WU02:FS01:Download complete
21:07:17:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:16434 run:232 clone:3 gen:13 core:0x22 unit:0x0000001503854c135e9cbacc8fa1f2ea
Like you said earlier, GT 710 is a slow card by today's standards, but I was watching the progress very closely and the unit finished about 20 hours ahead of the expiration deadline (taking into account that the times are in UTC). It usually doesn't cut it *that* close, but indeed struggles to complete units before the "timeout".
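For reference, the turnaround times implied by the timestamps in this thread can be worked out as below. This is just arithmetic on the values quoted above; the variable names are illustrative, and the project's actual timeout and expiration deadlines are not stated in the thread, so the margin before expiration is not computed here.

```python
from datetime import datetime

# Timestamps taken from the logs and posts in this thread (all UTC).
received = datetime.fromisoformat("2020-05-08T21:07:17")   # "Received Unit" for 16434
rejected = datetime.fromisoformat("2020-05-15T02:23:39")   # WORK_QUIT (404)
other_issued = datetime.fromisoformat("2020-05-09T06:28:32")
other_returned = datetime.fromisoformat("2020-05-09T18:18:46")

gt710_turnaround = rejected - received
other_turnaround = other_returned - other_issued

print("GT 710 turnaround:", gt710_turnaround)
print("Successful copy turnaround:", other_turnaround)
```

On these numbers the GT 710 held the WU for a little over six days, while the copy that was credited came back in under twelve hours.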

<off-topic>
bruce wrote:According to the public records, two people returned it (not counting you). One had an error and the other completed it successfully.
This is an old Linux box that I had resurrected to fold, with the idea of it sitting quietly in a corner and not being used for anything else other than looking for a possible cure. Based on the above, even if my client returns complete WUs, it is likely that someone else has already finished the same WU. I'll have to rethink whether it's a rational use of electricity, as much as I would like to help.

I'm sure this question has been asked before .. will go do some soul-googling with the morning coffee.
</off-topic>

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Posted: Fri May 15, 2020 4:58 pm
by _r2w_ben
legalalien wrote:This is an old Linux box that I had resurrected to fold, with the idea of it sitting quietly in a corner and not being used for anything else other than looking for a possible cure. Based on the above, even if my client returns complete WUs, it is likely that someone else has already finished the same WU. I'll have to rethink whether it's a rational use of electricity, as much as I would like to help.
A lot of COVID19 projects run on the CPU so you can still contribute. Removing the GPU slot would let the CPU slot use both cores.

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Posted: Wed Jun 17, 2020 9:04 pm
by bruce
I have a variety of GPUs, including one GT710. (It's the only GPU that will fit in that slot.) FAH is doing some WU/GPU optimization studies and they are distributing some small proteins to slow GPUs. (They haven't said anything about the proposed deadlines, though.) I hope they do manage to send my GT710 WUs where it can do some good and avoid burdening it with big WUs where it can't help. I'll be very happy to have the GPU fold when it can and just display my desktop the rest of the time.

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Posted: Fri Jun 19, 2020 7:24 pm
by JohnChodera
Oh no! I'm so sorry this happened.

> I was wondering if core restarting in the middle of the completed unit submission had some effect on the submission itself.

I don't think this should impact things.

We're about to roll out a bunch of COVID Moonshot projects that work well for older GPUs and run in just a few hours. Hopefully this will help---we very much value these contributions!

~ John Chodera // MSKCC

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Posted: Tue Sep 29, 2020 11:19 pm
by legalalien
JohnChodera wrote:Oh no! I'm so sorry this happened.

We're about to roll out a bunch of COVID Moonshot projects that work well for older GPUs and run in just a few hours. Hopefully this will help---we very much value these contributions!

~ John Chodera // MSKCC
Revisiting the old subject, in case someone runs into this thread...not sure if the new (smaller) projects have been released, or if the new '13' core with support for CUDA is making a difference, but now my old GT710 is cranking through WUs in 3 hours or less, well before the timeout deadline. :D

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Posted: Tue Sep 29, 2020 11:53 pm
by PantherX
The CUDA optimizations on Nvidia GPUs do provide a decent speed-up of anywhere from ~15% to 100%, depending on the simulation type. Traditional simulations are toward the ~15% end, while the free energy calculations (primarily used in Moonshot projects) are toward the 100% end. For more details, you can read the blog post: https://foldingathome.org/2020/09/28/fo ... a-support/
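To put those percentages in terms of time per WU, here is a rough sketch. It assumes "speed-up" means an increase in throughput (steps per unit time), which the post does not spell out, and the 10-hour baseline is a hypothetical figure chosen only for illustration.

```python
def time_after_speedup(baseline_hours, speedup_percent):
    """Completion time if throughput rises by speedup_percent (assumed meaning)."""
    return baseline_hours / (1 + speedup_percent / 100)


baseline = 10.0  # hypothetical WU duration in hours, not from the thread
for pct in (15, 100):
    print(f"{pct}% speed-up: {time_after_speedup(baseline, pct):.2f} h")
```

Under that reading, a 15% speed-up trims roughly 13% off the run time, while a 100% speed-up halves it.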