
Problem with WU 18806 (19, 5, 167)

Posted: Fri Apr 05, 2024 12:23 am
by jdkdomain
My computer is getting "stuck" on this WU at 99% and not finishing. I can't find a way to abort or end this work unit, and the workstation can't seem to get a new WU. I tried shutting down and restarting, but that did not fix it. I also tried going into FAHControl but don't see anything about the WU there. Any suggestions on how to fix this? Thanks.

Re: Problem with WU 18806 (19, 5, 167)

Posted: Fri Apr 05, 2024 12:45 am
by jdkdomain
The WU finally reset about 15 minutes after I posted this; the computer appears to be working on a new WU now.

Re: Problem with WU 18806 (19, 5, 167)

Posted: Wed Jul 24, 2024 1:03 am
by Badsinger
I got an 18806 this evening with the 0xa9 core, and the WU would not even start. After several failed attempts it reported the failure and downloaded a 0x23 WU. I've copied the 0xa9 portion of the log. Do I need to send it somewhere, or is the info it reported back sufficient? Thanks.

Re: Problem with WU 18806 (19, 5, 167)

Posted: Wed Jul 24, 2024 2:01 am
by BobWilliams757
Badsinger,

Please check your log to see if you got a work unit starting from a checkpoint. See the thread below; it sounds similar.


https://foldingforum.org/viewtopic.php?t=41731


I wasn't sure if it was a project problem or a server problem. In either case, it has also been reported by another user on the Discord channel.

Re: Problem with WU 18806 (19, 5, 167)

Posted: Wed Jul 24, 2024 4:32 am
by Badsinger
(Quoting BobWilliams757's post of Wed Jul 24, 2024 2:01 am.)
My problem is identical, just with different step numbers:
FS01:0xa9:steps first=38500000 total=38750000

Same error message after the start, and after a few tries it went back to the 0x23 core and its tasks.

Re: Problem with WU 18806 (19, 5, 167)

Posted: Wed Jul 24, 2024 5:40 am
by BobWilliams757
Interesting. I know that the user on Discord had the same issue, and that the specific work unit he had also failed for another user. You might want to see if the one you have also failed for someone else.


https://apps.foldingathome.org/wu

Re: Problem with WU 18806 (19, 5, 167)

Posted: Wed Jul 24, 2024 6:42 am
by BobWilliams757
And just for info, I've emailed and PM'd the scientist via the links here. I'll update if I get any response. But I'm not sure whether this has anything to do with that project or is somehow related to the core update, and I have no idea who is driving that update, so I have no way to contact them.

If anyone gets any scoop, please pass it on.

Re: Problem with WU 18806 (19, 5, 167)

Posted: Wed Jul 24, 2024 7:24 pm
by muziqaz
The researcher has been informed, though it is possible this is not their fault.

Re: Problem with WU 18806 (19, 5, 167)

Posted: Thu Jul 25, 2024 6:59 pm
by Badsinger
18806 has just failed for me again on a different machine. The core downloads fine, but it fails to start three times, then the client grabs a different WU and off it goes.

18:48:01:WU00:FS01:0xa9:********************************************************************************
18:48:01:WU00:FS01:0xa9:Project: 18806 (Run 29, Clone 59, Gen 8)
18:48:01:WU00:FS01:0xa9:Reading tar file core.xml
18:48:01:WU00:FS01:0xa9:Reading tar file frame8.tpr
18:48:01:WU00:FS01:0xa9:Digital signatures verified
18:48:01:WU00:FS01:0xa9:Calling: mdrun -c frame8.gro -s frame8.tpr -x frame8.xtc -cpt 15 -nt 1 -ntmpi 1 -update gpu -nb gpu -bonded gpu -pme gpu -pmefft gpu -gpu_id 0
18:48:01:WU00:FS01:0xa9:Steps: first=2000000 total=2250000
18:48:01:WARNING:WU00:FS01:FahCore returned an unknown error code which probably indicates that it crashed
18:48:01:WARNING:WU00:FS01:FahCore returned: UNKNOWN_ERROR (121 = 0x79)
18:48:01:WARNING:WU00:FS01:Too many errors, failing
18:48:01:WU00:FS01:Sending unit results: id:00 state:SEND error:FAILED project:18806 run:29 clone:59 gen:8 core:0xa9 unit:0x080000003b0000001d00000076490000
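For anyone who wants to pull the key details out of a log like the one above before reporting it (for example, to look the unit up on the WU status page), here is a minimal Python sketch. It assumes the log lines look exactly like those quoted in this thread; other FAHClient versions may format them differently. `summarize_failure` is a hypothetical helper for illustration, not part of any Folding@home tool.

```python
import re
from typing import Optional

# Patterns based only on the log lines quoted in this thread (assumption:
# other client versions may word or format these lines differently).
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
RETURN_RE = re.compile(r"FahCore returned: (\w+) \((\d+) = (0x[0-9a-fA-F]+)\)")


def summarize_failure(log_text: str) -> Optional[dict]:
    """Return project/run/clone/gen and the FahCore return code, if both are present.

    Hypothetical helper for illustration only.
    """
    project = PROJECT_RE.search(log_text)
    returned = RETURN_RE.search(log_text)
    if project is None or returned is None:
        return None
    proj, run, clone, gen = (int(g) for g in project.groups())
    code = int(returned.group(2))
    # The log prints the code in decimal and hex; check that the two agree.
    if code != int(returned.group(3), 16):
        raise ValueError("decimal and hex return codes disagree")
    return {
        "project": proj,
        "run": run,
        "clone": clone,
        "gen": gen,
        "status": returned.group(1),
        "code": code,
    }


# Two lines copied from the log excerpt above.
sample = (
    "18:48:01:WU00:FS01:0xa9:Project: 18806 (Run 29, Clone 59, Gen 8)\n"
    "18:48:01:WARNING:WU00:FS01:FahCore returned: UNKNOWN_ERROR (121 = 0x79)\n"
)
print(summarize_failure(sample))
# {'project': 18806, 'run': 29, 'clone': 59, 'gen': 8, 'status': 'UNKNOWN_ERROR', 'code': 121}
```

Returning `None` when either line is missing keeps the sketch simple; a real reporting script would probably want to handle several WUs in one log.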

Re: Problem with WU 18806 (19, 5, 167)

Posted: Thu Jul 25, 2024 7:30 pm
by muziqaz
I sent the researcher an email. Hopefully they read emails more frequently than our internal communication channels :D

Re: Problem with WU 18806 (19, 5, 167)

Posted: Fri Jul 26, 2024 6:38 pm
by muziqaz
Things should be fixed now. Hopefully.

Re: Problem with WU 18806 (19, 5, 167)

Posted: Wed Oct 30, 2024 8:33 pm
by Demmers
Thought I'd "re-open" this thread, as I had this WU fail on me today. Unfortunately I haven't been able to copy the logs, but I do know that at around 90% I noticed it had stopped (I was monitoring on my phone), and when I checked the logs, "bad unit" was mentioned. Interestingly, before this happened, the last thing I noticed when I left the PC today was that FAHClient.exe was using a mammoth amount of memory, about 6GB, but fortunately I run with 16GB. I've never seen that before, but I didn't want to change anything in case I failed the WU myself. For whatever reason, it eventually crashed on its own.

Re: Problem with WU 18806 (19, 5, 167)

Posted: Wed Oct 30, 2024 9:04 pm
by muziqaz
High RAM usage has been observed before. This is Windows only, though.
Funnily enough, the system that was showing high RAM usage before is now showing only 1.2MB used by FAHClient.exe.

Re: Problem with WU 18806 (19, 5, 167)

Posted: Wed Oct 30, 2024 9:08 pm
by muziqaz
(Quoting Demmers's post of Wed Oct 30, 2024 8:33 pm.)
If you could report your full system details in this issue tracker, that would be great:
https://github.com/FoldingAtHome/fah-cl ... issues/291

Alternatively, you can list your system details here, in this thread (please include OS + build, CPU, GPU, motherboard and RAM used)

Thanks

Re: Problem with WU 18806 (19, 5, 167)

Posted: Thu Oct 31, 2024 12:50 am
by Demmers
(Quoting muziqaz's post of Wed Oct 30, 2024 9:08 pm.)
Thanks, I've submitted the details on GitHub.