Project: 8004 (Run 24, Clone 27, Gen 211): UNSTABLE_MACHINE

Posted: Sun Aug 26, 2012 9:13 am
by Rum@NoV
After checking my logs, I found one error, which occurred on the 17th of August 2012. This machine has completed other P8004 WUs without any problems.

Code:

18:12:40:WU01:FS00:Connecting to assign3.stanford.edu:8080
18:12:40:WU01:FS00:News: Welcome to Folding@Home
18:12:40:WU01:FS00:Assigned to work server 171.67.108.59
18:12:40:WU01:FS00:Requesting new work unit for slot 00: RUNNING smp:4 from 171.67.108.59
18:12:40:WU01:FS00:Connecting to 171.67.108.59:8080
18:12:41:WU01:FS00:Downloading 48.84KiB
18:12:42:WU01:FS00:Download complete
18:12:42:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:OK project:8004 run:24 clone:27 gen:211 core:0xa4 unit:0x000001316652edcb4ee8fedd0e216613
18:12:54:WU01:FS00:Starting
18:12:54:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/www.stanford.edu/~pande/Linux/AMD64/beta/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 701 -lifeline 960 -checkpoint 30 -np 4 -forceasm
18:12:54:WU01:FS00:Started FahCore on PID 5656
18:12:54:WU01:FS00:Core PID:5660
18:12:54:WU01:FS00:FahCore 0xa4 started
18:12:55:WU01:FS00:0xa4:
18:12:55:WU01:FS00:0xa4:*------------------------------*
18:12:55:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
18:12:55:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
18:12:55:WU01:FS00:0xa4:
18:12:55:WU01:FS00:0xa4:Preparing to commence simulation
18:12:55:WU01:FS00:0xa4:- Assembly optimizations manually forced on.
18:12:55:WU01:FS00:0xa4:- Not checking prior termination.
18:12:55:WU01:FS00:0xa4:- Expanded 49499 -> 1305600 (decompressed 2637.6 percent)
18:12:55:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=49499 data_size=1305600, decompressed_data_size=1305600 diff=0
18:12:55:WU01:FS00:0xa4:- Digital signature verified
18:12:55:WU01:FS00:0xa4:
18:12:55:WU01:FS00:0xa4:Project: 8004 (Run 24, Clone 27, Gen 211)
18:12:55:WU01:FS00:0xa4:
18:12:55:WU01:FS00:0xa4:Assembly optimizations on if available.
18:12:55:WU01:FS00:0xa4:Entering M.D.
18:13:01:WU01:FS00:0xa4:mdrun returned 255
18:13:01:WU01:FS00:0xa4:Going to send back what have done -- stepsTotalG=250000
18:13:01:WU01:FS00:0xa4:Work fraction=906238099456.0000 steps=250000.
18:13:05:WU01:FS00:0xa4:logfile size=6836 infoLength=6836 edr=25 trr=1
18:13:05:WU01:FS00:0xa4:logfile size: 6836 info=6836 bed=25 hdr=1
18:13:05:WU01:FS00:0xa4:- Writing 7374 bytes of core data to disk...
18:13:05:WU01:FS00:0xa4:Done: 6862 -> 2452 (compressed to 35.7 percent)
18:13:05:WU01:FS00:0xa4:  ... Done.
18:13:05:WU01:FS00:0xa4:
18:13:05:WU01:FS00:0xa4:Folding@home Core Shutdown: UNSTABLE_MACHINE
18:13:05:WU01:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)
18:13:05:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY project:8004 run:24 clone:27 gen:211 core:0xa4 unit:0x000001316652edcb4ee8fedd0e216613
18:13:05:WU01:FS00:Uploading 2.89KiB to 171.67.108.59
18:13:05:WU01:FS00:Connecting to 171.67.108.59:8080
18:13:05:WU01:FS00:Upload complete
18:13:05:WU01:FS00:Server responded WORK_ACK (400)
18:13:05:WU01:FS00:Cleaning up
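
For anyone who wants to scan their own logs for the same failure, here is a minimal Python sketch. It assumes the line layout seen in the log above (`HH:MM:SS:WUnn:FSnn:message`); the function name `find_core_returns` and the regular expressions are illustrative guesses based on this excerpt, not part of the FAHClient software.

```python
import re

# All patterns are assumptions derived from the log excerpt in this post,
# not from any official description of the FAHClient log format.
PREFIX_RE = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):(.*)$")
PRCG_RE = re.compile(r"project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)")
RETURN_RE = re.compile(r"FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)")


def find_core_returns(lines):
    """Return one dict per 'FahCore returned' line, tagged with the last
    project/run/clone/gen seen for the same WU and slot."""
    last_prcg = {}  # (WU, slot) -> (project, run, clone, gen)
    returns = []
    for line in lines:
        m = PREFIX_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a client log line
        time, wu, slot, msg = m.groups()
        key = (wu, slot)
        prcg = PRCG_RE.search(msg)
        if prcg:
            last_prcg[key] = tuple(int(x) for x in prcg.groups())
        ret = RETURN_RE.search(msg)
        if ret:
            returns.append({
                "time": time,
                "prcg": last_prcg.get(key),
                "status": ret.group(1),
                "code": int(ret.group(2)),
            })
    return returns


# Two lines copied from the log above.
sample = [
    "18:12:42:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:OK "
    "project:8004 run:24 clone:27 gen:211 core:0xa4 "
    "unit:0x000001316652edcb4ee8fedd0e216613",
    "18:13:05:WU01:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)",
]

for entry in find_core_returns(sample):
    if entry["status"] == "UNSTABLE_MACHINE":
        print(entry["time"], entry["prcg"], entry["status"], entry["code"])
```

To use it on a real log, pass the open file object (or a list of its lines) instead of `sample`.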

Re: Project: 8004 (Run 24, Clone 27, Gen 211): UNSTABLE_MACH

Posted: Sun Aug 26, 2012 11:02 am
by bollix47
Thank you for reporting.

A number of folders have been unable to complete that work unit. I've marked it bad.

The WU (P8004,R24,C27,G211) has been reported as a bad WU. Note that WUs on the reported list are stopped daily at 8 am Pacific time.

Posted: Sat Oct 13, 2012 10:52 pm
by hnougher
It seems that every time I get a Project 8004 WU, my entire system freezes at random intervals.

Is there a way to cancel my current WU and not get any more of project 8004?

Re:

Posted: Sat Oct 13, 2012 11:09 pm
by P5-133XL
hnougher wrote: It seems that every time I get a Project 8004 WU, my entire system freezes at random intervals.

Is there a way to cancel my current WU and not get any more of project 8004?
There is no client-level control for refusing a specific project, so your only solution is to permanently remove the SMP or uniprocessor slot.

On a normally running machine, a WU should not cause freezes or lockups. If that is occurring, I would be looking at your OC and/or temps but there could be other HW issues like RAM or a flaky MB.

Re: Project: 8004 (Run 24, Clone 27, Gen 211): UNSTABLE_MACH

Posted: Sun Oct 14, 2012 7:42 am
by bruce
Not all projects are identical. Not all overclocking benchmarks are identical.

If your machine is unstable when running the project or benchmark that puts the highest workload on your system (which, according to you, is P8004), then your hardware is not stable. Reduce your overclock until your machine is stable under any benchmark or any WU, or make other changes to your system so it doesn't crash. From past experience, I'd say you will reduce your total PPD very little, and certainly by less than you're going to lose from hangs/crashes.

Re: Project: 8004 (Run 24, Clone 27, Gen 211): UNSTABLE_MACH

Posted: Sun Oct 14, 2012 12:17 pm
by hnougher
My machine has never been overclocked and has been folding SMP and GPU for over 2 years.
It has only recently started collecting 8004 WUs which is when this freeze problem started.
Only just last week it had a run of other SMP WUs which always completed without a hitch.
They came after running an 8004 that froze a number of times before it finally completed.

So my way out here is to let this SMP WU time out while doing GPU WUs and hope not to get another 8004 later.

Re: Project: 8004 (Run 24, Clone 27, Gen 211): UNSTABLE_MACH

Posted: Sun Oct 14, 2012 12:49 pm
by bollix47
Your ATI HD5850 GPU requires a full CPU core by itself. Have you tried setting your SMP slot in v7 to use only 3 cores?

If you're still using v6 then use the -smp 3 argument.

Re: Project: 8004 (Run 24, Clone 27, Gen 211): UNSTABLE_MACH

Posted: Sun Oct 14, 2012 6:11 pm
by Joe_H
To add to what has been said on stability, my machines have processed hundreds of the Project 8004 WU's without any issues such as freezing. If you are seeing this with all 8004's, then the issue is with hardware or software on your machine. I would recommend you either follow bollix47's suggestion or try folding with the GPU slot paused while doing these WU's.

Re: Project: 8004 (Run 24, Clone 27, Gen 211): UNSTABLE_MACH

Posted: Sun Oct 14, 2012 8:49 pm
by hnougher
Recently, because of the freezing, I have had the GPU paused in case it was the cause (new AMD drivers and such), but it made no difference.
I suppose it could be a bug in a recent Windows update, though I have not seen any problems in other applications/games.

Is there any good way to test all combinations of CPU instructions rather than load testing it on a few?

Re: Project: 8004 (Run 24, Clone 27, Gen 211): UNSTABLE_MACH

Posted: Mon Oct 15, 2012 2:44 am
by bruce
hnougher wrote:Is there any good way to test all combinations of CPU instructions rather than load testing it on a few?
Every overclocking benchmark makes an attempt to test a representative sampling of CPU instructions, but the best option is to run several benchmarks in the hope that you'll get one that represents the software in question. Stresscpu2 was built around the stressful portion of the actual GROMACS code and comes awfully close to the SMP or Uniprocessor code.

I did check on the status of Project: 8004 (Run 24, Clone 27, Gen 211) and several people have returned it for 0 points so it's what we call a bad WU, not a problem in your machine. That trajectory seems to have been suspended quite some time ago. I don't understand how it got assigned to you recently.

Posted: Tue Oct 16, 2012 11:21 am
by hnougher
I have just finished running Stresscpu2 for a day and had no problems.

Also, I have never given the exact RCG numbers since I think it's a number of different ones, but I can say it was not (Run 24, Clone 27, Gen 211).

Re: Project: 8004 (Run 24, Clone 27, Gen 211): UNSTABLE_MACH

Posted: Tue Oct 16, 2012 1:38 pm
by bruce
In this particular forum, "Problems with a specific WU," the conventional recommendation is to start a new topic with the title specifying the particular WU and its problem. Rum@NoV has done that and I was replying to his original post.

If you have a series of problems then it's a more general issue and it should be reported in one of the other forums.
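
One rough way to tell the two cases apart from your own logs is to count faulty returns per WU and per project. This is a minimal Python sketch, assuming the `Sending unit results: ... error:FAULTY project:... run:... clone:... gen:...` line layout from the log in the first post. `tally_faulty` is a hypothetical helper, and the second sample line is invented purely for illustration.

```python
import re
from collections import Counter

# Assumed layout, taken from the "Sending unit results" line in the first post.
FAULTY_RE = re.compile(
    r"error:FAULTY project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)"
)


def tally_faulty(lines):
    """Count faulty returns per (project, run, clone, gen) and per project."""
    per_wu = Counter()
    per_project = Counter()
    for line in lines:
        m = FAULTY_RE.search(line)
        if m:
            project, run, clone, gen = (int(x) for x in m.groups())
            per_wu[(project, run, clone, gen)] += 1
            per_project[project] += 1
    return per_wu, per_project


sample = [
    # Copied from the log in the first post.
    "18:13:05:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY "
    "project:8004 run:24 clone:27 gen:211 core:0xa4 "
    "unit:0x000001316652edcb4ee8fedd0e216613",
    # HYPOTHETICAL line, made up for this example only.
    "09:00:00:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY "
    "project:8004 run:1 clone:2 gen:3 core:0xa4 unit:0x0",
]

per_wu, per_project = tally_faulty(sample)
for project, count in per_project.items():
    distinct = sum(1 for wu in per_wu if wu[0] == project)
    print(f"project {project}: {count} faulty returns across {distinct} distinct WUs")
```

A single WU failing points to this forum; many different WUs from the same project failing on one machine suggests the more general kind of issue.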

See Troubleshooting "Bad WUs" at the top of this forum.

Posted: Tue Oct 16, 2012 8:51 pm
by hnougher
Oh.. sorry. I found this topic via the search and did not know about the entire forum just for this stuff. I will get it right next time.