Project 13405 (542, 31, 0)

Posted: Tue May 12, 2020 5:17 pm
by Bill_Kirchner

13:01:25:WU02:FS02:0x22:Completed 570000 out of 1000000 steps (57%)
13:04:06:WU02:FS02:0x22:Completed 580000 out of 1000000 steps (58%)
13:04:25:WU00:FS01:0x22:Completed 460000 out of 1000000 steps (46%)
13:04:53:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
13:04:53:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
13:05:06:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
13:05:06:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
13:05:18:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
13:05:18:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
13:05:18:WU00:FS01:0x22:ERROR:114: Max Retries Reached
13:05:18:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
13:05:18:WU00:FS01:0x22:Saving result file badstate-0.xml
13:05:18:WU00:FS01:0x22:Saving result file badstate-1.xml
13:05:18:WU00:FS01:0x22:Saving result file badstate-2.xml
13:05:18:WU00:FS01:0x22:Saving result file checkpointState.xml
13:05:19:WU00:FS01:0x22:Saving result file checkpt.crc
13:05:19:WU00:FS01:0x22:Saving result file globals.csv
13:05:19:WU00:FS01:0x22:Saving result file positions.xtc
13:05:19:WU00:FS01:0x22:Saving result file science.log
13:05:19:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
13:05:20:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
13:05:20:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13405 run:542 clone:31 gen:0 core:0x22 unit:0x0000000412bc7d9a5eb97d3e17645957
13:05:20:WU00:FS01:Uploading 4.97MiB to 18.188.125.154
13:05:20:WU00:FS01:Connecting to 18.188.125.154:8080
13:05:20:WU01:FS01:Connecting to 65.254.110.245:8080
13:05:20:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
13:05:20:WU01:FS01:Connecting to 18.218.241.186:80
13:05:20:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
13:05:20:ERROR:WU01:FS01:Exception: Could not get an assignment
13:05:20:WU01:FS01:Connecting to 65.254.110.245:8080
13:05:21:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
13:05:21:WU01:FS01:Connecting to 18.218.241.186:80
13:05:21:WU01:FS01:Assigned to work server 18.188.125.154
13:05:21:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:TU116 [GeForce GTX 1660 Ti] from 18.188.125.154
13:05:21:WU01:FS01:Connecting to 18.188.125.154:8080
13:05:22:WU01:FS01:Downloading 6.26MiB
13:05:23:WU00:FS01:Upload complete
13:05:23:WU00:FS01:Server responded WORK_ACK (400)
13:05:23:WU00:FS01:Cleaning up
13:05:25:WU01:FS01:Download complete
13:05:25:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13404 run:425 clone:28 gen:1 core:0x22 unit:0x0000000412bc7d9a5eb58474bf46c928
13:05:25:WU01:FS01:Starting
13:05:25:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\kirch\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 705 -lifeline 5076 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
13:05:25:WU01:FS01:Started FahCore on PID 4864
13:05:25:WU01:FS01:Core PID:11796
My system is NOT overclocked. It runs Microsoft Windows 10 Pro with all current updates installed. The CPU is an Intel 9900KS running at 5.0 GHz, which is normal for that CPU: it has 8 cores and 16 threads, and all 8 cores can run at 5.0 GHz simultaneously. I use two EVGA video cards, a 1080 and a 1660 Ti; the 1660 Ti is the one that detected the fault. Neither card is overclocked.

Re: Project 13405 (542, 31, 0)

Posted: Wed May 13, 2020 4:35 am
by Nuitari
That WU has been declared FAULTY by 5 folders, so I don't think it's your system.

Re: Project 13405 (542, 31, 0)

Posted: Wed May 13, 2020 4:39 am
by Bastiaan_NL
I've had this 5 times in the past 24 hours on 3 different cards.
All with 13404 and 13405 units. One card is overclocked (never had an issue with that), the others are at stock clocks.
I'm pretty sure it's not our systems indeed.
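
For anyone who wants to check their own log the same way, here is a rough sketch that tallies FAULTY results by project. It assumes the "Sending unit results" line format shown in the log at the top of this thread; the function names are made up for illustration and are not part of the F@H client.

```python
import re
from collections import Counter

# Field layout taken from the "Sending unit results" line quoted in this thread.
RESULT_RE = re.compile(
    r"error:(?P<error>\w+) project:(?P<project>\d+) run:(?P<run>\d+) "
    r"clone:(?P<clone>\d+) gen:(?P<gen>\d+)"
)


def faulty_units(log_lines):
    """Return (project, run, clone, gen) for each result sent with error:FAULTY."""
    units = []
    for line in log_lines:
        if "Sending unit results" not in line:
            continue  # ignore "Received Unit" and all other log lines
        match = RESULT_RE.search(line)
        if match and match.group("error") == "FAULTY":
            units.append(
                tuple(int(match.group(k)) for k in ("project", "run", "clone", "gen"))
            )
    return units


def faulty_by_project(log_lines):
    """Count FAULTY results per project number."""
    return Counter(unit[0] for unit in faulty_units(log_lines))


# Two lines copied from the log posted above.
sample = [
    "13:05:20:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY "
    "project:13405 run:542 clone:31 gen:0 core:0x22 "
    "unit:0x0000000412bc7d9a5eb97d3e17645957",
    "13:05:25:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR "
    "project:13404 run:425 clone:28 gen:1 core:0x22 "
    "unit:0x0000000412bc7d9a5eb58474bf46c928",
]

print(faulty_units(sample))
print(faulty_by_project(sample))
```

If the failures cluster on one or two projects across several cards, as they seem to here, that points at the work units rather than the hardware. To run it on a real log, pass the lines of your own log file instead of `sample`.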

Re: Project 13405 (542, 31, 0)

Posted: Wed May 13, 2020 8:27 am
by PantherX
Welcome to the F@H Forum Bill_Kirchner,

Please note that Project 13404 and Project 13405 are highly experimental (what those two projects are doing hasn't been attempted before by F@H) and this is what the researcher has to say:
JohnChodera wrote: ...we're testing out some new workloads that help us prioritize compounds for synthesis via the COVID Moonshot (https://covid.postera.ai/covid/submissions/compounds) and are continuing to refine our process to make everything more stable!
The next batch of projects should make significant improvements over the first batch.

Thanks so much for your patience!

~ John Chodera // MSKCC