Re: Project 13415 problematic or WU dumping a new hobby?
Posted: Tue Jun 30, 2020 4:49 pm
by bruce
@JimF It could be that all the earlier crash reports have paid off. I understand that FAHCore v0.0.11 has started being distributed, and maybe you got a copy.
Re: Project 13415 problematic or WU dumping a new hobby?
Posted: Tue Jun 30, 2020 5:46 pm
by JimF
I am still on Version 0.0.10 at the moment, so the new one should help further.
Re: Project 13415 problematic or WU dumping a new hobby?
Posted: Tue Jun 30, 2020 5:50 pm
by Joe_H
I believe 0.0.11 is still in beta (viewtopic.php?f=66&t=35692). It should get pushed out to everyone if this version becomes a release version; otherwise it might be 0.0.12 or later. JohnChodera will announce the status if it changes.
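To see which Core22 build a slot actually ran (0.0.10 versus the 0.0.11 beta), you can read the version from the core's start-up banner, e.g. `0x22: Version: 0.0.10`. This is a minimal sketch, assuming the FAHClient v7 log format pasted in this thread; `core22_versions` and `has_core` are hypothetical helper names, not part of FAHClient.

```python
import re

# Core banner lines look like "00:52:00:WU01:FS01:0x22: Version: 0.0.10".
# The ":0x22:" prefix keeps the client's own "Version: 7.6.13" line out.
CORE_VERSION = re.compile(r":0x22:\s*Version:\s*(\d+(?:\.\d+)+)")

def core22_versions(log_text):
    """Return the distinct Core22 versions found in a log, as sorted int tuples."""
    found = {tuple(int(part) for part in m.group(1).split("."))
             for m in CORE_VERSION.finditer(log_text)}
    return sorted(found)

def has_core(log_text, minimum=(0, 0, 11)):
    """True if any Core22 run in the log used at least the `minimum` version."""
    return any(v >= minimum for v in core22_versions(log_text))
```

Comparing integer tuples rather than strings means a hypothetical 0.0.12 still sorts after 0.0.11, and 0.0.10 sorts after 0.0.9.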
Re: Project 13415 problematic or WU dumping a new hobby?
Posted: Tue Jun 30, 2020 9:34 pm
by Crawdaddy79
JimF wrote: I have not had a P13415 error (of any kind) in almost two days on my two RX 570s, one on Win7 and the other on Ubuntu 18.04 (running 24/7).
It looks like they have found the major problems.
EDIT: Misread your post initially - my brain excluded the word "error". That one word makes my initial post irrelevant. Good day to you.
Re: Project 13415 problematic or WU dumping a new hobby?
Posted: Wed Jul 01, 2020 3:23 am
by JohnChodera
All: Thanks for the reports here. 13415 is quite small, with a short processing time, but it is necessary for predicting binding affinities for potential therapies for the COVID Moonshot (http://postera.ai/covid).
We simulate each of these RUNs for quality control checks before deploying them to Folding@home, but due to issues with the system setup that we don't yet understand, some fraction of the RUNs in both 13414 and 13415 eventually fail partway through the calculation. Because 13415 is short, you're likely to see more failures. We've been working hard to reduce failure rates with each new pair of 134xx projects for the Moonshot sprints, and we do analyze all the failures that get uploaded back to the servers to systematically reduce failure rates.
Thanks so much for bearing with us!
~ John Chodera // MSKCC
Re: Project 13415 problematic or WU dumping a new hobby?
Posted: Wed Jul 01, 2020 3:49 am
by JohnChodera
Due to the issues with 13415, I've also increased the points by 10% to help make up for the failure rate. We have a few thousand more RUNs (of 100 CLONEs each) that we need to get through in the next couple of days to get data.
Re: Project 13415 problematic or WU dumping a new hobby?
Posted: Wed Jul 01, 2020 5:38 am
by aetch
Not a failure but probably notable.
I had 2 units for 13415 go through my laptop at the same time, with wildly different folding times and different GPU loads. My laptop has dual GTX 965M GPUs.
Project: 13415 (Run 2919, Clone 43, Gen 1): TPF ~50 seconds (the run time on this one is a bit skewed, as my laptop was down for about an hour while I did some work on it), GPU load ~60%
Project: 13415 (Run 3415, Clone 29, Gen 1): TPF ~2 minutes 30 seconds (this one still has 1 hour to run), GPU load ~40%
When run 2919 finished, I moved 3415 over to the other GPU in case the GPU was the issue. It still ran at ~40% GPU load and a TPF of ~2m 30s.
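TPF (time per frame) can be read straight off the client log: in these logs Core22 prints a progress line every 10000 steps, which is 1% of a 1,000,000-step unit, so the gap between consecutive progress lines is one frame. This is a minimal sketch, assuming the FAHClient v7 log format pasted later in this thread; `frame_times` is a hypothetical helper, and it assumes no midnight rollover between the lines it compares.

```python
import re
from datetime import datetime

# Progress lines look like
# "00:52:57:WU01:FS01:0x22:Completed 10000 out of 1000000 steps (1%)".
# Requiring ":0x22:" skips the CPU slot's 0xa7 lines that are interleaved.
PROGRESS = re.compile(
    r"^(\d\d:\d\d:\d\d):(WU\d+):FS\d+:0x22:Completed (\d+) out of (\d+) steps"
)

def frame_times(log_lines, wu):
    """Return gaps in seconds between consecutive progress lines of one WU.

    WU ids are reused for later units, so pass lines from a single unit only.
    """
    stamps = []
    for line in log_lines:
        m = PROGRESS.match(line)
        if m and m.group(2) == wu:
            stamps.append(datetime.strptime(m.group(1), "%H:%M:%S"))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]
```

Fed the first few progress lines of run 2919 from aetch's log, this returns 48 seconds per frame, in line with the ~50 s quoted above.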
Re: Project 13415 problematic or WU dumping a new hobby?
Posted: Wed Jul 01, 2020 5:45 am
by bruce
Project 13415 is not a uniform project where all WUs are identical. Each run is unique and results are expected to vary.
Re: Project 13415 problematic or WU dumping a new hobby?
Posted: Wed Jul 01, 2020 6:15 am
by aetch
I wouldn't have thought that one unit running at 3 times the speed of the other, for the same project on the same hardware, was an acceptable variation.
Re: Project 13415 problematic or WU dumping a new hobby?
Posted: Wed Jul 01, 2020 7:00 am
by JohnChodera
> Project 13415 is not a uniform project where all WUs are identical. Each run is unique and results are expected to vary.
What's weird here is that we built all the RUNs with the same number of atoms, but for reasons we still don't quite understand, the time is indeed highly variable from RUN to RUN. We're still investigating and hope to improve this in the next 134xx project set.
~ John Chodera // MSKCC
Re: Project 13415 problematic or WU dumping a new hobby?
Posted: Wed Jul 01, 2020 8:05 am
by aetch
I suppose I'd better give you my logs. It seems run 3415 got to the end of its run, tried to upload to one server, got a short response, then tried to upload to another server (which is currently showing as at capacity on the server stats page), and the unit has now been dumped. I've looked up the WU status page and I don't appear there, so I've no idea what happened to the unit.
The log is a mess.
FS00 is not relevant.
Run 3415 started on FS02, and I moved it to FS01 partway through, mainly to check whether the unit or the GPU was at fault.
*********************** Log Started 2020-07-01T00:31:29Z ***********************
00:31:29:Trying to access database...
00:31:29:Successfully acquired database lock
00:31:29:Read GPUs.txt
00:31:29:Enabled folding slot 01: PAUSED gpu:0:GM204 [GeForce GTX 965M] 1945 (by user)
00:31:29:****************************** FAHClient ******************************
00:31:29: Version: 7.6.13
00:31:29: Author: Joseph Coffland <[email protected]>
00:31:29: Copyright: 2020 foldingathome.org
00:31:29: Homepage: https://foldingathome.org/
00:31:29: Date: Apr 27 2020
00:31:29: Time: 21:21:01
00:31:29: Revision: 5a652817f46116b6e135503af97f18e094414e3b
00:31:29: Branch: master
00:31:29: Compiler: Visual C++ 2008
00:31:29: Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
00:31:29: Platform: win32 10
00:31:29: Bits: 32
00:31:29: Mode: Release
00:31:29: Args: --open-web-control
00:31:29: Config: C:\Users\Helen\AppData\Roaming\FAHClient\config.xml
00:31:29:******************************** CBang ********************************
00:31:29: Date: Apr 24 2020
00:31:29: Time: 17:07:55
00:31:29: Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
00:31:29: Branch: master
00:31:29: Compiler: Visual C++ 2008
00:31:29: Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
00:31:29: Platform: win32 10
00:31:29: Bits: 32
00:31:29: Mode: Release
00:31:29:******************************* System ********************************
00:31:29: CPU: Intel(R) Core(TM) i7-5700HQ CPU @ 2.70GHz
00:31:29: CPU ID: GenuineIntel Family 6 Model 71 Stepping 1
00:31:29: CPUs: 8
00:31:29: Memory: 15.96GiB
00:31:29: Free Memory: 12.62GiB
00:31:29: Threads: WINDOWS_THREADS
00:31:29: OS Version: 6.2
00:31:29: Has Battery: true
00:31:29: On Battery: false
00:31:29: UTC Offset: 1
00:31:29: PID: 13444
00:31:29: CWD: C:\Users\Helen\AppData\Roaming\FAHClient
00:31:29: Win32 Service: false
00:31:29: OS: Windows 10 Enterprise
00:31:29: OS Arch: AMD64
00:31:29: GPUs: 2
00:31:29: GPU 0: Bus:2 Slot:0 Func:0 NVIDIA:5 GM204 [GeForce GTX 965M] 1945
00:31:29: GPU 1: Bus:1 Slot:0 Func:0 NVIDIA:5 GM204 [GeForce GTX 965M] 1945
00:31:29: CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:5.2 Driver:10.2
00:31:29: CUDA Device 1: Platform:0 Device:1 Bus:2 Slot:0 Compute:5.2 Driver:10.2
00:31:29:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:442.19
00:31:29:OpenCL Device 1: Platform:0 Device:1 Bus:2 Slot:0 Compute:1.2 Driver:442.19
00:31:29:******************************* libFAH ********************************
00:31:29: Date: Apr 15 2020
00:31:29: Time: 14:53:14
00:31:29: Revision: 216968bc7025029c841ed6e36e81a03a316890d3
00:31:29: Branch: master
00:31:29: Compiler: Visual C++ 2008
00:31:29: Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
00:31:29: Platform: win32 10
00:31:29: Bits: 32
00:31:29: Mode: Release
00:31:29:***********************************************************************
Multiple Configs removed
00:33:31:<config>
00:33:31: <!-- Configuration -->
00:33:31: <config-rotate-max v='0'/>
00:33:31:
00:33:31: <!-- HTTP Server -->
00:33:31: <allow v='127.0.0.1 192.168.0.0/24'/>
00:33:31:
00:33:31: <!-- Logging -->
00:33:31: <log-rotate-max v='0'/>
00:33:31:
00:33:31: <!-- Network -->
00:33:31: <proxy v=':8080'/>
00:33:31:
00:33:31: <!-- Remote Command Server -->
00:33:31: <command-allow-no-pass v='127.0.0.1 192.168.0.0/24'/>
00:33:31:
00:33:31: <!-- Slot Control -->
00:33:31: <pause-on-start v='true'/>
00:33:31: <power v='full'/>
00:33:31:
00:33:31: <!-- User Information -->
00:33:31: <passkey v='*****'/>
00:33:31: <user v='Aetch'/>
00:33:31:
00:33:31: <!-- Folding Slots -->
00:33:31: <slot id='0' type='CPU'>
00:33:31: <cpus v='4'/>
00:33:31: </slot>
00:33:31: <slot id='1' type='GPU'/>
00:33:31: <slot id='2' type='GPU'/>
00:33:31:</config>
FS00 Removed
00:51:03:FS01:Unpaused
00:51:03:WU01:FS01:Connecting to assign1.foldingathome.org:80
00:51:03:WU01:FS01:Assigned to work server 18.188.125.154
00:51:03:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GM204 [GeForce GTX 965M] 1945 from 18.188.125.154
00:51:03:WU01:FS01:Connecting to 18.188.125.154:8080
00:51:04:WU01:FS01:Downloading 432.52KiB
00:51:04:WU01:FS01:Download complete
00:51:04:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13415 run:2919 clone:43 gen:1 core:0x22 unit:0x0000000112bc7d9a5ef1ae49511d2a88
Core_22 Update
00:51:59:WU01:FS01:Starting
00:51:59:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Helen\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 706 -lifeline 13444 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 1 -cuda-device 1 -gpu 1
00:51:59:WU01:FS01:Started FahCore on PID 9036
00:52:00:WU01:FS01:Core PID:9244
00:52:00:WU01:FS01:FahCore 0x22 started
00:52:00:WU01:FS01:0x22:*********************** Log Started 2020-07-01T00:52:00Z ***********************
00:52:00:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
00:52:00:WU01:FS01:0x22: Core: Core22
00:52:00:WU01:FS01:0x22: Type: 0x22
00:52:00:WU01:FS01:0x22: Version: 0.0.10
00:52:00:WU01:FS01:0x22: Author: Joseph Coffland <[email protected]>
00:52:00:WU01:FS01:0x22: Copyright: 2020 foldingathome.org
00:52:00:WU01:FS01:0x22: Homepage: https://foldingathome.org/
00:52:00:WU01:FS01:0x22: Date: Jun 16 2020
00:52:00:WU01:FS01:0x22: Time: 14:33:22
00:52:00:WU01:FS01:0x22: Revision: 147051aad40bcbec7d4b25105bbedfab425f1dc2
00:52:00:WU01:FS01:0x22: Branch: core22-0.0.10
00:52:00:WU01:FS01:0x22: Compiler: Visual C++ 2015
00:52:00:WU01:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
00:52:00:WU01:FS01:0x22: Platform: win32 10
00:52:00:WU01:FS01:0x22: Bits: 64
00:52:00:WU01:FS01:0x22: Mode: Release
00:52:00:WU01:FS01:0x22:Maintainers: John Chodera <[email protected]> and Peter Eastman
00:52:00:WU01:FS01:0x22: <[email protected]>
00:52:00:WU01:FS01:0x22: Args: -dir 01 -suffix 01 -version 706 -lifeline 9036 -checkpoint 15
00:52:00:WU01:FS01:0x22: -gpu-vendor nvidia -opencl-platform 0 -opencl-device 1 -cuda-device
00:52:00:WU01:FS01:0x22: 1 -gpu 1
00:52:00:WU01:FS01:0x22:************************************ libFAH ************************************
00:52:01:WU01:FS01:0x22: Date: Jun 15 2020
00:52:01:WU01:FS01:0x22: Time: 18:05:04
00:52:01:WU01:FS01:0x22: Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
00:52:01:WU01:FS01:0x22: Branch: HEAD
00:52:01:WU01:FS01:0x22: Compiler: Visual C++ 2015
00:52:01:WU01:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
00:52:01:WU01:FS01:0x22: Platform: win32 10
00:52:01:WU01:FS01:0x22: Bits: 64
00:52:01:WU01:FS01:0x22: Mode: Release
00:52:01:WU01:FS01:0x22:************************************ CBang *************************************
00:52:01:WU01:FS01:0x22: Date: Jun 16 2020
00:52:01:WU01:FS01:0x22: Time: 14:31:33
00:52:01:WU01:FS01:0x22: Revision: 75fcee0b8e713cb47f5191a3689d5f4f07244c7f
00:52:01:WU01:FS01:0x22: Branch: HEAD
00:52:01:WU01:FS01:0x22: Compiler: Visual C++ 2015
00:52:01:WU01:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
00:52:01:WU01:FS01:0x22: Platform: win32 10
00:52:01:WU01:FS01:0x22: Bits: 64
00:52:01:WU01:FS01:0x22: Mode: Release
00:52:01:WU01:FS01:0x22:************************************ System ************************************
00:52:01:WU01:FS01:0x22: CPU: Intel(R) Core(TM) i7-5700HQ CPU @ 2.70GHz
00:52:01:WU01:FS01:0x22: CPU ID: GenuineIntel Family 6 Model 71 Stepping 1
00:52:01:WU01:FS01:0x22: CPUs: 8
00:52:01:WU01:FS01:0x22: Memory: 15.96GiB
00:52:01:WU01:FS01:0x22:Free Memory: 12.37GiB
00:52:01:WU01:FS01:0x22: Threads: WINDOWS_THREADS
00:52:01:WU01:FS01:0x22: OS Version: 6.2
00:52:01:WU01:FS01:0x22:Has Battery: true
00:52:01:WU01:FS01:0x22: On Battery: false
00:52:01:WU01:FS01:0x22: UTC Offset: 1
00:52:01:WU01:FS01:0x22: PID: 9244
00:52:01:WU01:FS01:0x22: CWD: C:\Users\Helen\AppData\Roaming\FAHClient\work
00:52:01:WU01:FS01:0x22:********************************************************************************
00:52:01:WU01:FS01:0x22:Project: 13415 (Run 2919, Clone 43, Gen 1)
00:52:01:WU01:FS01:0x22:Unit: 0x0000000112bc7d9a5ef1ae49511d2a88
00:52:01:WU01:FS01:0x22:Reading tar file core.xml
00:52:01:WU01:FS01:0x22:Reading tar file integrator.xml
00:52:01:WU01:FS01:0x22:Reading tar file state.xml
00:52:01:WU01:FS01:0x22:Reading tar file system.xml
00:52:01:WU01:FS01:0x22:Digital signatures verified
00:52:01:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
00:52:01:WU01:FS01:0x22:Version 0.0.10
00:52:01:WU01:FS01:0x22: Checkpoint write interval: 50000 steps (5%) [20 total]
00:52:01:WU01:FS01:0x22: JSON viewer frame write interval: 10000 steps (1%) [100 total]
00:52:01:WU01:FS01:0x22: XTC frame write interval: 250000 steps (25%) [4 total]
00:52:01:WU01:FS01:0x22: Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
00:52:09:WU01:FS01:0x22:Completed 0 out of 1000000 steps (0%)
00:52:57:WU01:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
00:52:59:WU00:FS00:0xa7:Completed 35000 out of 250000 steps (14%)
00:53:45:WU01:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
00:54:01:WU00:FS00:0xa7:Completed 37500 out of 250000 steps (15%)
00:54:33:WU01:FS01:0x22:Completed 30000 out of 1000000 steps (3%)
02:06:27:FS02:Unpaused
02:06:27:WU02:FS02:Connecting to assign1.foldingathome.org:80
02:06:28:WU02:FS02:Assigned to work server 18.188.125.154
02:06:28:WU02:FS02:Requesting new work unit for slot 02: READY gpu:1:GM204 [GeForce GTX 965M] 1945 from 18.188.125.154
02:06:28:WU02:FS02:Connecting to 18.188.125.154:8080
02:06:30:WU02:FS02:Downloading 435.27KiB
02:06:30:WU02:FS02:Download complete
02:06:30:WU02:FS02:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:13415 run:3415 clone:29 gen:1 core:0x22 unit:0x0000000712bc7d9a5ef50d2dc42cc1c3
02:06:30:WU02:FS02:Starting
02:06:30:WU02:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Helen\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 02 -suffix 01 -version 706 -lifeline 13144 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
02:06:30:WU02:FS02:Started FahCore on PID 9740
02:06:30:WU02:FS02:Core PID:8788
02:06:30:WU02:FS02:FahCore 0x22 started
02:06:30:WU02:FS02:0x22:*********************** Log Started 2020-07-01T02:06:30Z ***********************
02:06:30:WU02:FS02:0x22:*************************** Core22 Folding@home Core ***************************
02:06:30:WU02:FS02:0x22: Core: Core22
02:06:30:WU02:FS02:0x22: Type: 0x22
02:06:30:WU02:FS02:0x22: Version: 0.0.10
02:06:30:WU02:FS02:0x22: Author: Joseph Coffland <[email protected]>
02:06:30:WU02:FS02:0x22: Copyright: 2020 foldingathome.org
02:06:30:WU02:FS02:0x22: Homepage: https://foldingathome.org/
02:06:30:WU02:FS02:0x22: Date: Jun 16 2020
02:06:30:WU02:FS02:0x22: Time: 14:33:22
02:06:30:WU02:FS02:0x22: Revision: 147051aad40bcbec7d4b25105bbedfab425f1dc2
02:06:30:WU02:FS02:0x22: Branch: core22-0.0.10
02:06:30:WU02:FS02:0x22: Compiler: Visual C++ 2015
02:06:30:WU02:FS02:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
02:06:30:WU02:FS02:0x22: Platform: win32 10
02:06:30:WU02:FS02:0x22: Bits: 64
02:06:30:WU02:FS02:0x22: Mode: Release
02:06:30:WU02:FS02:0x22:Maintainers: John Chodera <[email protected]> and Peter Eastman
02:06:30:WU02:FS02:0x22: <[email protected]>
02:06:30:WU02:FS02:0x22: Args: -dir 02 -suffix 01 -version 706 -lifeline 9740 -checkpoint 15
02:06:30:WU02:FS02:0x22: -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
02:06:30:WU02:FS02:0x22: 0 -gpu 0
02:06:30:WU02:FS02:0x22:************************************ libFAH ************************************
02:06:30:WU02:FS02:0x22: Date: Jun 15 2020
02:06:30:WU02:FS02:0x22: Time: 18:05:04
02:06:30:WU02:FS02:0x22: Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
02:06:30:WU02:FS02:0x22: Branch: HEAD
02:06:30:WU02:FS02:0x22: Compiler: Visual C++ 2015
02:06:30:WU02:FS02:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
02:06:30:WU02:FS02:0x22: Platform: win32 10
02:06:30:WU02:FS02:0x22: Bits: 64
02:06:30:WU02:FS02:0x22: Mode: Release
02:06:30:WU02:FS02:0x22:************************************ CBang *************************************
02:06:30:WU02:FS02:0x22: Date: Jun 16 2020
02:06:30:WU02:FS02:0x22: Time: 14:31:33
02:06:30:WU02:FS02:0x22: Revision: 75fcee0b8e713cb47f5191a3689d5f4f07244c7f
02:06:30:WU02:FS02:0x22: Branch: HEAD
02:06:30:WU02:FS02:0x22: Compiler: Visual C++ 2015
02:06:30:WU02:FS02:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
02:06:30:WU02:FS02:0x22: Platform: win32 10
02:06:30:WU02:FS02:0x22: Bits: 64
02:06:30:WU02:FS02:0x22: Mode: Release
02:06:30:WU02:FS02:0x22:************************************ System ************************************
02:06:30:WU02:FS02:0x22: CPU: Intel(R) Core(TM) i7-5700HQ CPU @ 2.70GHz
02:06:30:WU02:FS02:0x22: CPU ID: GenuineIntel Family 6 Model 71 Stepping 1
02:06:30:WU02:FS02:0x22: CPUs: 8
02:06:30:WU02:FS02:0x22: Memory: 15.96GiB
02:06:30:WU02:FS02:0x22:Free Memory: 12.91GiB
02:06:30:WU02:FS02:0x22: Threads: WINDOWS_THREADS
02:06:30:WU02:FS02:0x22: OS Version: 6.2
02:06:30:WU02:FS02:0x22:Has Battery: true
02:06:30:WU02:FS02:0x22: On Battery: false
02:06:30:WU02:FS02:0x22: UTC Offset: 1
02:06:30:WU02:FS02:0x22: PID: 8788
02:06:30:WU02:FS02:0x22: CWD: C:\Users\Helen\AppData\Roaming\FAHClient\work
02:06:30:WU02:FS02:0x22:********************************************************************************
02:06:30:WU02:FS02:0x22:Project: 13415 (Run 3415, Clone 29, Gen 1)
02:06:30:WU02:FS02:0x22:Unit: 0x0000000712bc7d9a5ef50d2dc42cc1c3
02:06:30:WU02:FS02:0x22:Reading tar file core.xml
02:06:30:WU02:FS02:0x22:Reading tar file integrator.xml
02:06:30:WU02:FS02:0x22:Reading tar file state.xml
02:06:30:WU02:FS02:0x22:Reading tar file system.xml
02:06:30:WU02:FS02:0x22:Digital signatures verified
02:06:30:WU02:FS02:0x22:Folding@home GPU Core22 Folding@home Core
02:06:30:WU02:FS02:0x22:Version 0.0.10
02:06:30:WU02:FS02:0x22: Checkpoint write interval: 50000 steps (5%) [20 total]
02:06:30:WU02:FS02:0x22: JSON viewer frame write interval: 10000 steps (1%) [100 total]
02:06:30:WU02:FS02:0x22: XTC frame write interval: 250000 steps (25%) [4 total]
02:06:30:WU02:FS02:0x22: Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
02:06:37:WU02:FS02:0x22:Completed 0 out of 1000000 steps (0%)
03:25:30:WU01:FS01:0x22:Completed 1000000 out of 1000000 steps (100%)
03:25:30:WU01:FS01:0x22:Average performance: 307.473 ns/day
03:25:30:WU01:FS01:0x22:Saving result file ..\logfile_01.txt
03:25:30:WU01:FS01:0x22:Saving result file checkpointState.xml
03:25:30:WU01:FS01:0x22:Saving result file globals.csv
03:25:30:WU01:FS01:0x22:Saving result file positions.xtc
03:25:30:WU01:FS01:0x22:Saving result file science.log
03:25:30:WU01:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
03:25:31:WU01:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
03:25:31:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:13415 run:2919 clone:43 gen:1 core:0x22 unit:0x0000000112bc7d9a5ef1ae49511d2a88
03:25:31:WU01:FS01:Uploading 550.40KiB to 18.188.125.154
03:25:31:WU01:FS01:Connecting to 18.188.125.154:8080
03:25:32:WU01:FS01:Upload complete
03:25:32:WU01:FS01:Server responded WORK_ACK (400)
03:25:32:WU01:FS01:Final credit estimate, 9348.00 points
03:25:32:WU01:FS01:Cleaning up
I removed FS02 to migrate run 3415 to FS01, then added FS02 back in. The run moved, but it didn't perform any better on FS01 than it had on FS02.
06:15:44:WU02:FS01:0x22:Completed 980000 out of 1000000 steps (98%)
06:17:00:WU01:FS02:0x22:Completed 680000 out of 2000000 steps (34%)
06:18:01:WU02:FS01:0x22:Completed 990000 out of 1000000 steps (99%)
06:18:40:WU00:FS00:0xa7:Completed 55000 out of 250000 steps (22%)
06:20:26:WU02:FS01:0x22:Completed 1000000 out of 1000000 steps (100%)
06:20:26:WU02:FS01:0x22:Average performance: 122.207 ns/day
06:20:26:WU02:FS01:0x22:Saving result file ..\logfile_01.txt
06:20:26:WU02:FS01:0x22:Saving result file checkpointState.xml
06:20:26:WU02:FS01:0x22:Saving result file globals.csv
06:20:26:WU02:FS01:0x22:Saving result file positions.xtc
06:20:26:WU02:FS01:0x22:Saving result file science.log
06:20:26:WU02:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
06:20:26:WU02:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
06:20:26:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:13415 run:3415 clone:29 gen:1 core:0x22 unit:0x0000000712bc7d9a5ef50d2dc42cc1c3
06:20:26:WU02:FS01:Uploading 550.63KiB to 18.188.125.154
06:20:26:WU02:FS01:Connecting to 18.188.125.154:8080
06:20:34:WU02:FS01:Upload 11.62%
06:21:27:WU01:FS02:0x22:Completed 700000 out of 2000000 steps (35%)
06:22:01:WU00:FS00:0xa7:Completed 57500 out of 250000 steps (23%)
06:24:18:WU02:FS01:Upload 69.74%
06:25:07:WU00:FS00:0xa7:Completed 60000 out of 250000 steps (24%)
06:25:55:WU01:FS02:0x22:Completed 720000 out of 2000000 steps (36%)
06:27:42:WARNING:WU02:FS01:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
06:27:42:WU02:FS01:Trying to send results to collection server
06:27:42:WU02:FS01:Uploading 550.63KiB to 3.21.157.11
06:27:42:WU02:FS01:Connecting to 3.21.157.11:8080
06:27:48:WU02:FS01:Upload complete
06:27:48:WU02:FS01:Server responded WORK_QUIT (404)
06:27:48:WARNING:WU02:FS01:Server did not like results, dumping
06:27:48:WU02:FS01:Cleaning up
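The two outcomes in this log can be pulled out mechanically: each unit's "Average performance" line and the last "Server responded" line for it. This is a minimal sketch, assuming the log format above; `summarize` is a hypothetical helper, and because WU ids such as WU01 are reused for later units, it should be run on one unit's lines at a time.

```python
import re

# Result lines as they appear in the log above.
PERF = re.compile(r":(WU\d+):FS\d+:0x22:Average performance: ([\d.]+) ns/day")
RESP = re.compile(r":(WU\d+):FS\d+:Server responded (\w+) \((\d+)\)")

def summarize(log_lines):
    """Map each WU id to its average ns/day and the last server response seen."""
    out = {}
    for line in log_lines:
        if m := PERF.search(line):
            out.setdefault(m.group(1), {})["ns_per_day"] = float(m.group(2))
        elif m := RESP.search(line):
            out.setdefault(m.group(1), {})["response"] = (m.group(2), int(m.group(3)))
    return out
```

On aetch's two units this gives 307.473 / 122.207, or about 2.5, so by the core's own ns/day figure run 2919 ran roughly two and a half times faster than run 3415. Only run 3415 ended with WORK_QUIT (404), the response after which the client logged "Server did not like results, dumping".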
Re: Project 13415 problematic or WU dumping a new hobby?
Posted: Wed Jul 01, 2020 11:57 am
by Crawdaddy79
After many days of 13415, got my first failure today.
https://apps.foldingathome.org/wu?p=134 ... ne=1&gen=0
I will also edit my post above, because I misread the statement I quoted.
Log (summary: particle coordinate is NaN):
07:35:21:WU02:FS01:FahCore 0x22 started
07:35:22:WU02:FS01:0x22:*********************** Log Started 2020-07-01T07:35:21Z ***********************
07:35:22:WU02:FS01:0x22:*************************** Core22 Folding@home Core ***************************
07:35:22:WU02:FS01:0x22: Core: Core22
07:35:22:WU02:FS01:0x22: Type: 0x22
07:35:22:WU02:FS01:0x22: Version: 0.0.11
07:35:22:WU02:FS01:0x22: Author: Joseph Coffland <[email protected]>
07:35:22:WU02:FS01:0x22: Copyright: 2020 foldingathome.org
07:35:22:WU02:FS01:0x22: Homepage: https://foldingathome.org/
07:35:22:WU02:FS01:0x22: Date: Jun 26 2020
07:35:22:WU02:FS01:0x22: Time: 19:49:16
07:35:22:WU02:FS01:0x22: Revision: 22010df8a4db48db1b35d33e666b64d8ce48689d
07:35:22:WU02:FS01:0x22: Branch: core22-0.0.11
07:35:22:WU02:FS01:0x22: Compiler: Visual C++ 2015
07:35:22:WU02:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
07:35:22:WU02:FS01:0x22: Platform: win32 10
07:35:22:WU02:FS01:0x22: Bits: 64
07:35:22:WU02:FS01:0x22: Mode: Release
07:35:22:WU02:FS01:0x22:Maintainers: John Chodera <[email protected]> and Peter Eastman
07:35:22:WU02:FS01:0x22: <[email protected]>
07:35:22:WU02:FS01:0x22: Args: -dir 02 -suffix 01 -version 706 -lifeline 5608 -checkpoint 15
07:35:22:WU02:FS01:0x22: -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
07:35:22:WU02:FS01:0x22:************************************ libFAH ************************************
07:35:22:WU02:FS01:0x22: Date: Jun 26 2020
07:35:22:WU02:FS01:0x22: Time: 19:47:12
07:35:22:WU02:FS01:0x22: Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
07:35:22:WU02:FS01:0x22: Branch: HEAD
07:35:22:WU02:FS01:0x22: Compiler: Visual C++ 2015
07:35:22:WU02:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
07:35:22:WU02:FS01:0x22: Platform: win32 10
07:35:22:WU02:FS01:0x22: Bits: 64
07:35:22:WU02:FS01:0x22: Mode: Release
07:35:22:WU02:FS01:0x22:************************************ CBang *************************************
07:35:22:WU02:FS01:0x22: Date: Jun 26 2020
07:35:22:WU02:FS01:0x22: Time: 19:46:11
07:35:22:WU02:FS01:0x22: Revision: f8529962055b0e7bde23e429f5072ff758089dee
07:35:22:WU02:FS01:0x22: Branch: master
07:35:22:WU02:FS01:0x22: Compiler: Visual C++ 2015
07:35:22:WU02:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
07:35:22:WU02:FS01:0x22: Platform: win32 10
07:35:22:WU02:FS01:0x22: Bits: 64
07:35:22:WU02:FS01:0x22: Mode: Release
07:35:22:WU02:FS01:0x22:************************************ System ************************************
07:35:22:WU02:FS01:0x22: CPU: AMD Ryzen 7 2700X Eight-Core Processor
07:35:22:WU02:FS01:0x22: CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
07:35:22:WU02:FS01:0x22: CPUs: 16
07:35:22:WU02:FS01:0x22: Memory: 31.95GiB
07:35:22:WU02:FS01:0x22:Free Memory: 22.81GiB
07:35:22:WU02:FS01:0x22: Threads: WINDOWS_THREADS
07:35:22:WU02:FS01:0x22: OS Version: 6.2
07:35:22:WU02:FS01:0x22:Has Battery: false
07:35:22:WU02:FS01:0x22: On Battery: false
07:35:22:WU02:FS01:0x22: UTC Offset: -4
07:35:22:WU02:FS01:0x22: PID: 2868
07:35:22:WU02:FS01:0x22: CWD: G:\FoldingData\work
07:35:22:WU02:FS01:0x22:********************************************************************************
07:35:22:WU02:FS01:0x22:Project: 13415 (Run 4900, Clone 1, Gen 0)
07:35:22:WU02:FS01:0x22:Unit: 0x0000000412bc7d9a5efc076d16ed8cc2
07:35:22:WU02:FS01:0x22:Reading tar file core.xml
07:35:22:WU02:FS01:0x22:Reading tar file integrator.xml
07:35:22:WU02:FS01:0x22:Reading tar file state.xml
07:35:22:WU02:FS01:0x22:Reading tar file system.xml
07:35:22:WU02:FS01:0x22:Digital signatures verified
07:35:22:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
07:35:22:WU02:FS01:0x22:Version 0.0.11
07:35:22:WU02:FS01:0x22: Checkpoint write interval: 50000 steps (5%) [20 total]
07:35:22:WU02:FS01:0x22: JSON viewer frame write interval: 10000 steps (1%) [100 total]
07:35:22:WU02:FS01:0x22: XTC frame write interval: 250000 steps (25%) [4 total]
07:35:22:WU02:FS01:0x22: Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
07:35:24:WU01:FS01:Upload complete
07:35:24:WU01:FS01:Server responded WORK_ACK (400)
07:35:24:WU01:FS01:Final credit estimate, 53038.00 points
07:35:24:WU01:FS01:Cleaning up
07:35:30:WU02:FS01:0x22:Completed 0 out of 1000000 steps (0%)
07:36:49:WU02:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
07:38:06:WU02:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
07:39:07:WU00:FS00:0xa7:Completed 390000 out of 500000 steps (78%)
07:39:24:WU02:FS01:0x22:Completed 30000 out of 1000000 steps (3%)
07:40:10:WU02:FS01:0x22:An exception occurred at step 35892: Particle coordinate is nan
07:40:10:WU02:FS01:0x22:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
07:40:10:WU02:FS01:0x22:Folding@home Core Shutdown: CORE_RESTART
07:40:10:WARNING:WU02:FS01:FahCore returned: CORE_RESTART (98 = 0x62)
07:40:11:WU02:FS01:Starting
07:40:11:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" G:\FoldingData\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 02 -suffix 01 -version 706 -lifeline 10216 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
07:40:11:WU02:FS01:Started FahCore on PID 9064
07:40:11:WU02:FS01:Core PID:14384
07:40:11:WU02:FS01:FahCore 0x22 started
07:40:11:WU02:FS01:0x22:*********************** Log Started 2020-07-01T07:40:11Z ***********************
07:40:11:WU02:FS01:0x22:*************************** Core22 Folding@home Core ***************************
07:40:11:WU02:FS01:0x22: Core: Core22
07:40:11:WU02:FS01:0x22: Type: 0x22
07:40:11:WU02:FS01:0x22: Version: 0.0.11
07:40:11:WU02:FS01:0x22: Author: Joseph Coffland <[email protected]>
07:40:11:WU02:FS01:0x22: Copyright: 2020 foldingathome.org
07:40:11:WU02:FS01:0x22: Homepage: https://foldingathome.org/
07:40:11:WU02:FS01:0x22: Date: Jun 26 2020
07:40:11:WU02:FS01:0x22: Time: 19:49:16
07:40:11:WU02:FS01:0x22: Revision: 22010df8a4db48db1b35d33e666b64d8ce48689d
07:40:11:WU02:FS01:0x22: Branch: core22-0.0.11
07:40:11:WU02:FS01:0x22: Compiler: Visual C++ 2015
07:40:11:WU02:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
07:40:11:WU02:FS01:0x22: Platform: win32 10
07:40:11:WU02:FS01:0x22: Bits: 64
07:40:11:WU02:FS01:0x22: Mode: Release
07:40:11:WU02:FS01:0x22:Maintainers: John Chodera <[email protected]> and Peter Eastman
07:40:11:WU02:FS01:0x22: <[email protected]>
07:40:11:WU02:FS01:0x22: Args: -dir 02 -suffix 01 -version 706 -lifeline 9064 -checkpoint 15
07:40:11:WU02:FS01:0x22: -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
07:40:11:WU02:FS01:0x22:************************************ libFAH ************************************
07:40:11:WU02:FS01:0x22: Date: Jun 26 2020
07:40:11:WU02:FS01:0x22: Time: 19:47:12
07:40:11:WU02:FS01:0x22: Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
07:40:11:WU02:FS01:0x22: Branch: HEAD
07:40:11:WU02:FS01:0x22: Compiler: Visual C++ 2015
07:40:11:WU02:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
07:40:11:WU02:FS01:0x22: Platform: win32 10
07:40:11:WU02:FS01:0x22: Bits: 64
07:40:11:WU02:FS01:0x22: Mode: Release
07:40:11:WU02:FS01:0x22:************************************ CBang *************************************
07:40:11:WU02:FS01:0x22: Date: Jun 26 2020
07:40:11:WU02:FS01:0x22: Time: 19:46:11
07:40:11:WU02:FS01:0x22: Revision: f8529962055b0e7bde23e429f5072ff758089dee
07:40:11:WU02:FS01:0x22: Branch: master
07:40:11:WU02:FS01:0x22: Compiler: Visual C++ 2015
07:40:11:WU02:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
07:40:11:WU02:FS01:0x22: Platform: win32 10
07:40:11:WU02:FS01:0x22: Bits: 64
07:40:11:WU02:FS01:0x22: Mode: Release
07:40:11:WU02:FS01:0x22:************************************ System ************************************
07:40:11:WU02:FS01:0x22: CPU: AMD Ryzen 7 2700X Eight-Core Processor
07:40:11:WU02:FS01:0x22: CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
07:40:11:WU02:FS01:0x22: CPUs: 16
07:40:11:WU02:FS01:0x22: Memory: 31.95GiB
07:40:11:WU02:FS01:0x22:Free Memory: 22.83GiB
07:40:11:WU02:FS01:0x22: Threads: WINDOWS_THREADS
07:40:11:WU02:FS01:0x22: OS Version: 6.2
07:40:11:WU02:FS01:0x22:Has Battery: false
07:40:11:WU02:FS01:0x22: On Battery: false
07:40:11:WU02:FS01:0x22: UTC Offset: -4
07:40:11:WU02:FS01:0x22: PID: 14384
07:40:11:WU02:FS01:0x22: CWD: G:\FoldingData\work
07:40:11:WU02:FS01:0x22:********************************************************************************
07:40:11:WU02:FS01:0x22:Project: 13415 (Run 4900, Clone 1, Gen 0)
07:40:11:WU02:FS01:0x22:Unit: 0x0000000412bc7d9a5efc076d16ed8cc2
07:40:11:WU02:FS01:0x22:Digital signatures verified
07:40:11:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
07:40:11:WU02:FS01:0x22:Version 0.0.11
07:40:11:WU02:FS01:0x22: Checkpoint write interval: 50000 steps (5%) [20 total]
07:40:11:WU02:FS01:0x22: JSON viewer frame write interval: 10000 steps (1%) [100 total]
07:40:11:WU02:FS01:0x22: XTC frame write interval: 250000 steps (25%) [4 total]
07:40:11:WU02:FS01:0x22: Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
07:40:19:WU02:FS01:0x22:Completed 0 out of 1000000 steps (0%)
07:41:38:WU02:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
07:42:55:WU02:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
07:43:19:WU00:FS00:0xa7:Completed 395000 out of 500000 steps (79%)
07:44:13:WU02:FS01:0x22:Completed 30000 out of 1000000 steps (3%)
07:44:59:WU02:FS01:0x22:An exception occurred at step 35892: Particle coordinate is nan
07:44:59:WU02:FS01:0x22:Max number of attempts to resume from last checkpoint (2) reached. Aborting.
07:44:59:WU02:FS01:0x22:ERROR:114: Max number of attempts to resume from last checkpoint reached.
07:44:59:WU02:FS01:0x22:Saving result file ..\logfile_01.txt
07:44:59:WU02:FS01:0x22:Saving result file globals.csv
07:44:59:WU02:FS01:0x22:Saving result file science.log
07:44:59:WU02:FS01:0x22:Saving result file state.xml
07:44:59:WU02:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
07:45:00:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
07:45:00:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13415 run:4900 clone:1 gen:0 core:0x22 unit:0x0000000412bc7d9a5efc076d16ed8cc2
07:45:00:WU02:FS01:Uploading 344.01KiB to 18.188.125.154
07:45:00:WU02:FS01:Connecting to 18.188.125.154:8080
07:45:00:WU02:FS01:Upload complete
07:45:00:WU02:FS01:Server responded WORK_ACK (400)
07:45:00:WU02:FS01:Cleaning up
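Why did the restart fail at exactly the same step? With a checkpoint written every 50000 steps, the last step-based checkpoint before step 35892 is step 0, so the restarted core replays the same stretch and, if the NaN is reproducible from that starting state, hits it again. The sketch below is a hypothetical model of the sequence visible in this log (CORE_RESTART 98, then BAD_WORK_UNIT 114 after 2 attempts), not Core22's actual code; `last_checkpoint` and `simulate` are illustrative names, and the model ignores the `-checkpoint 15` argument passed on the command line.

```python
# Values taken from the log above.
CHECKPOINT_INTERVAL = 50_000  # "Checkpoint write interval: 50000 steps"
MAX_ATTEMPTS = 2              # "Max number of attempts ... (2) reached"

def last_checkpoint(step, interval=CHECKPOINT_INTERVAL):
    """Most recent step-based checkpoint at or before `step`."""
    return (step // interval) * interval

def simulate(fail_step, max_attempts=MAX_ATTEMPTS):
    """Replay a failure that recurs at `fail_step` on every attempt.

    Each attempt resumes from the same checkpoint and hits the NaN again, so
    the core restarts until the attempt limit, then reports a bad work unit.
    """
    resume_from = last_checkpoint(fail_step)
    events = []
    for attempt in range(1, max_attempts + 1):
        events.append(("start", resume_from))
        events.append(("nan", fail_step))
        if attempt < max_attempts:
            events.append(("exit", "CORE_RESTART", 98))
        else:
            events.append(("exit", "BAD_WORK_UNIT", 114))
    return events
```

Calling `simulate(35892)` yields one CORE_RESTART (98) followed by BAD_WORK_UNIT (114), both attempts starting from step 0, which matches the exit codes in the log.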
Re: Project 13415 problematic or WU dumping a new hobby?
Posted: Thu Jul 02, 2020 11:28 pm
by wuchzael
Jensen at home... My Vega keeps getting this Project 13415 WU over and over again (it takes 1h 16min for 14k credits, with ridiculous GPU utilization), while my Turing card gets one 50K WU after another. I wouldn't be bothered if the Vega card didn't outperform the Turing card on many of the bigger WUs, but the assignment seems to be very unbalanced and very NVIDIA-biased right now.
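For comparing cards, PPD (points per day) is just a unit's credit scaled up to a full day. This is a minimal sketch, assuming units of the same size run back to back with no download or upload gaps between them; `ppd` is a hypothetical helper, and the 14k figure is the approximate credit quoted above.

```python
MINUTES_PER_DAY = 24 * 60

def ppd(credits, minutes):
    """Points per day if identical units ran back to back with no idle time."""
    return credits * MINUTES_PER_DAY / minutes
```

For the Vega, `ppd(14000, 76)` (1h 16min is 76 minutes) comes to roughly 265,000 PPD.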
Re: Project 13415 problematic or WU dumping a new hobby?
Posted: Fri Jul 03, 2020 1:11 am
by sklivas
My Vega 64 and P106s are getting very poor performance on these WUs, which I believe is due to low GPU utilization (<70% reported). I'm getting less than 1/4 of the PPD I normally would on both systems.
On the bright side, my house is no longer a sauna, which is a refreshing change.