Project: 8004 (Run 24, Clone 27, Gen 211): UNSTABLE_MACHINE

Moderators: Site Moderators, FAHC Science Team

Rum@NoV
Posts: 133
Joined: Tue Dec 25, 2007 1:29 pm
Hardware configuration: Orion (SMP):
---------------
CPU: Intel Core i7-2600
Memory: 8192 MBytes PC12800 DDR3 SDRAM
Graphics: Sapphire RADEON HD 5850, 1024 MB GDDR5 SDRAM
OS: Microsoft Windows 7 Home Premium (x64) Build 7601

Deepcore_II (SMP):
----------------------
CPU: Intel Core 2 Quad Q9650
Memory: 4096 MBytes PC6400 DDR2-SDRAM
OS: Ubuntu Server 10.04

Nostromo (SMP+GPU):
--------------------------
CPU: Intel Core 2 Duo E8500
Memory: 4096 MBytes PC6400 DDR2-SDRAM
Graphics: EVGA e-GeForce 8800 GT, 512 MB GDDR3 SDRAM
OS: Microsoft Windows 7 Home Premium (x64) Build 7601
Location: Flanders, Belgium

Project: 8004 (Run 24, Clone 27, Gen 211): UNSTABLE_MACHINE

Post by Rum@NoV »

After checking my logs I found one error, which occurred on the 17th of August 2012. This machine has completed other P8004 WUs without any problems.

Code:

18:12:40:WU01:FS00:Connecting to assign3.stanford.edu:8080
18:12:40:WU01:FS00:News: Welcome to Folding@Home
18:12:40:WU01:FS00:Assigned to work server 171.67.108.59
18:12:40:WU01:FS00:Requesting new work unit for slot 00: RUNNING smp:4 from 171.67.108.59
18:12:40:WU01:FS00:Connecting to 171.67.108.59:8080
18:12:41:WU01:FS00:Downloading 48.84KiB
18:12:42:WU01:FS00:Download complete
18:12:42:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:OK project:8004 run:24 clone:27 gen:211 core:0xa4 unit:0x000001316652edcb4ee8fedd0e216613
18:12:54:WU01:FS00:Starting
18:12:54:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/www.stanford.edu/~pande/Linux/AMD64/beta/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 701 -lifeline 960 -checkpoint 30 -np 4 -forceasm
18:12:54:WU01:FS00:Started FahCore on PID 5656
18:12:54:WU01:FS00:Core PID:5660
18:12:54:WU01:FS00:FahCore 0xa4 started
18:12:55:WU01:FS00:0xa4:
18:12:55:WU01:FS00:0xa4:*------------------------------*
18:12:55:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
18:12:55:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
18:12:55:WU01:FS00:0xa4:
18:12:55:WU01:FS00:0xa4:Preparing to commence simulation
18:12:55:WU01:FS00:0xa4:- Assembly optimizations manually forced on.
18:12:55:WU01:FS00:0xa4:- Not checking prior termination.
18:12:55:WU01:FS00:0xa4:- Expanded 49499 -> 1305600 (decompressed 2637.6 percent)
18:12:55:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=49499 data_size=1305600, decompressed_data_size=1305600 diff=0
18:12:55:WU01:FS00:0xa4:- Digital signature verified
18:12:55:WU01:FS00:0xa4:
18:12:55:WU01:FS00:0xa4:Project: 8004 (Run 24, Clone 27, Gen 211)
18:12:55:WU01:FS00:0xa4:
18:12:55:WU01:FS00:0xa4:Assembly optimizations on if available.
18:12:55:WU01:FS00:0xa4:Entering M.D.
18:13:01:WU01:FS00:0xa4:mdrun returned 255
18:13:01:WU01:FS00:0xa4:Going to send back what have done -- stepsTotalG=250000
18:13:01:WU01:FS00:0xa4:Work fraction=906238099456.0000 steps=250000.
18:13:05:WU01:FS00:0xa4:logfile size=6836 infoLength=6836 edr=25 trr=1
18:13:05:WU01:FS00:0xa4:logfile size: 6836 info=6836 bed=25 hdr=1
18:13:05:WU01:FS00:0xa4:- Writing 7374 bytes of core data to disk...
18:13:05:WU01:FS00:0xa4:Done: 6862 -> 2452 (compressed to 35.7 percent)
18:13:05:WU01:FS00:0xa4:  ... Done.
18:13:05:WU01:FS00:0xa4:
18:13:05:WU01:FS00:0xa4:Folding@home Core Shutdown: UNSTABLE_MACHINE
18:13:05:WU01:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)
18:13:05:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY project:8004 run:24 clone:27 gen:211 core:0xa4 unit:0x000001316652edcb4ee8fedd0e216613
18:13:05:WU01:FS00:Uploading 2.89KiB to 171.67.108.59
18:13:05:WU01:FS00:Connecting to 171.67.108.59:8080
18:13:05:WU01:FS00:Upload complete
18:13:05:WU01:FS00:Server responded WORK_ACK (400)
18:13:05:WU01:FS00:Cleaning up
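
For anyone who wants to pull entries like this out of a longer log, here is a minimal sketch in Python. It assumes the v7 line layout shown in the log above (time, WU id, slot id, then the message). The function name `failed_units` is hypothetical, and treating only `FINISHED_UNIT` as a normal exit is an assumption you may want to widen.

```python
import re

# Core banner line, e.g.
# "18:12:55:WU01:FS00:0xa4:Project: 8004 (Run 24, Clone 27, Gen 211)"
PRCG = re.compile(
    r"^\d\d:\d\d:\d\d:(?P<wu>WU\d+):(?P<slot>FS\d+):0x\w+:"
    r"Project: (?P<project>\d+) "
    r"\(Run (?P<run>\d+), Clone (?P<clone>\d+), Gen (?P<gen>\d+)\)"
)

# Client line reporting how the core exited, e.g.
# "18:13:05:WU01:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)"
RETURNED = re.compile(
    r"^(?P<time>\d\d:\d\d:\d\d):(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"FahCore returned: (?P<status>\w+) \((?P<code>\d+) = 0x[0-9a-fA-F]+\)"
)


def failed_units(lines, ok_statuses=("FINISHED_UNIT",)):
    """Return (time, (project, run, clone, gen), status, code) per failed WU.

    ok_statuses is an assumption: statuses listed there are not reported.
    """
    current = {}   # (wu, slot) -> last PRCG seen for that work unit
    failures = []
    for line in lines:
        line = line.strip()
        m = PRCG.match(line)
        if m:
            key = (m["wu"], m["slot"])
            current[key] = tuple(
                int(m[k]) for k in ("project", "run", "clone", "gen")
            )
            continue
        m = RETURNED.match(line)
        if m and m["status"] not in ok_statuses:
            # PRCG is None if the banner line was not seen before the exit line
            prcg = current.get((m["wu"], m["slot"]))
            failures.append((m["time"], prcg, m["status"], int(m["code"])))
    return failures


if __name__ == "__main__":
    sample = [
        "18:12:55:WU01:FS00:0xa4:Project: 8004 (Run 24, Clone 27, Gen 211)",
        "18:13:01:WU01:FS00:0xa4:mdrun returned 255",
        "18:13:05:WU01:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)",
    ]
    for entry in failed_units(sample):
        print(entry)
```

To scan a real log, pass an open file object instead of the sample list. The log location differs per install, so no path is assumed here.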
bollix47
Posts: 2963
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: Project: 8004 (Run 24, Clone 27, Gen 211): UNSTABLE_MACH

Post by bollix47 »

Thank you for reporting.

A number of folders have failed to complete that work unit. I've marked it bad.

The WU (P8004,R24,C27,G211) has been reported as a bad WU. Note that the list of reported WUs is stopped daily at 8am Pacific time.
hnougher
Posts: 13
Joined: Sat Apr 23, 2011 1:57 am
Hardware configuration: Win7 x86_64
Core2 Quad 2.5GHz
6GiB RAM
ATI HD5850 1GiB

Post by hnougher »

It seems that every time I get a Project 8004 WU my entire system freezes at random intervals.

Is there a way to cancel my current WU and not get any more of project 8004?
P5-133XL
Posts: 2948
Joined: Sun Dec 02, 2007 4:36 am
Hardware configuration: Machine #1:

Intel Q9450; 2x2GB=8GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460; Windows Server 2008 X64 (SP1).

Machine #2:

Intel Q6600; 2x2GB=4GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460 video card; Windows 7 X64.

Machine 3:

Dell Dimension 8400, 3.2GHz P4, 4x512MB Ram, Video card GTX 460, Windows 7 X32

I am currently folding just on the 5x GTX 460's for approx. 70K PPD
Location: Salem, OR USA

Re:

Post by P5-133XL »

hnougher wrote: It seems that every time I get a Project 8004 WU my entire system freezes at random intervals.

Is there a way to cancel my current WU and not get any more of project 8004?
The client has no setting to refuse a specific project, so your only solution is to permanently remove the SMP or uniprocessor slot.

On a normally running machine, a WU should not cause freezes or lockups. If that is occurring, I would be looking at your OC and/or temps but there could be other HW issues like RAM or a flaky MB.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 8004 (Run 24, Clone 27, Gen 211): UNSTABLE_MACH

Post by bruce »

Not all projects are identical. Not all overclocking benchmarks are identical.

If your machine is unstable when running the project or benchmark that puts the highest workload on your system (which, according to you, is P8004) then your hardware is not stable. Reduce your overclock until your machine is stable under any benchmark or any WU, or make other changes to your system so it doesn't crash. From past experience, I'd say you will reduce your total PPD very little, and certainly less than you're going to lose from hangs/crashes.
hnougher
Posts: 13
Joined: Sat Apr 23, 2011 1:57 am
Hardware configuration: Win7 x86_64
Core2 Quad 2.5GHz
6GiB RAM
ATI HD5850 1GiB

Re: Project: 8004 (Run 24, Clone 27, Gen 211): UNSTABLE_MACH

Post by hnougher »

My machine has never been overclocked and has been folding SMP and GPU for over 2 years.
It has only recently started collecting 8004 WUs which is when this freeze problem started.
Only just last week it had a run of other SMP WUs which always completed without a hitch.
They came after an 8004 run during which the machine froze a number of times before the WU finally completed.

So my way out here is to let this SMP WU time out while doing GPU WUs and hope not to get another 8004 later.
bollix47
Posts: 2963
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: Project: 8004 (Run 24, Clone 27, Gen 211): UNSTABLE_MACH

Post by bollix47 »

Your ATI HD5850 GPU requires a full CPU core by itself. Have you tried setting your SMP slot in v7 to use only 3 cores?

If you're still using v6 then use the -smp 3 argument.
Joe_H
Site Admin
Posts: 7939
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Project: 8004 (Run 24, Clone 27, Gen 211): UNSTABLE_MACH

Post by Joe_H »

To add to what has been said on stability, my machines have processed hundreds of the Project 8004 WU's without any issues such as freezing. If you are seeing this with all 8004's, then the issue is with hardware or software on your machine. I would recommend you either follow bollix47's suggestion or try folding with the GPU slot paused while doing these WU's.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
hnougher
Posts: 13
Joined: Sat Apr 23, 2011 1:57 am
Hardware configuration: Win7 x86_64
Core2 Quad 2.5GHz
6GiB RAM
ATI HD5850 1GiB

Re: Project: 8004 (Run 24, Clone 27, Gen 211): UNSTABLE_MACH

Post by hnougher »

Recently, due to the freezing, I have had the GPU paused in case it was the cause (new AMD drivers and such) but it made no difference.
I suppose it could be a bug in a recent Windows update, though I have not seen any problems in other applications/games.

Is there any good way to test all combinations of CPU instructions rather than load testing it on a few?
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 8004 (Run 24, Clone 27, Gen 211): UNSTABLE_MACH

Post by bruce »

hnougher wrote: Is there any good way to test all combinations of CPU instructions rather than load testing it on a few?
Every overclocking benchmark makes an attempt to test a representative sampling of CPU instructions, but the best option is to run several benchmarks in the hope that you'll get one that represents the software in question. Stresscpu2 was built around the stressful portion of the actual GROMACS code and comes awfully close to the SMP or Uniprocessor code.

I did check on the status of Project: 8004 (Run 24, Clone 27, Gen 211) and several people have returned it for 0 points so it's what we call a bad WU, not a problem in your machine. That trajectory seems to have been suspended quite some time ago. I don't understand how it got assigned to you recently.
hnougher
Posts: 13
Joined: Sat Apr 23, 2011 1:57 am
Hardware configuration: Win7 x86_64
Core2 Quad 2.5GHz
6GiB RAM
ATI HD5850 1GiB

Post by hnougher »

I have just finished running Stresscpu2 for a day and had no problems.

Also, I have never given the exact RCG numbers since I think it's a number of different ones, but I can say it was not (Run 24, Clone 27, Gen 211).
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 8004 (Run 24, Clone 27, Gen 211): UNSTABLE_MACH

Post by bruce »

In this particular forum, "Problems with a specific WU," the conventional recommendation is to start a new topic with the title specifying the particular WU and its problem. Rum@NoV has done that and I was replying to his original post.

If you have a series of problems then it's a more general issue and it should be reported in one of the other forums.

See Troubleshooting "Bad WUs" at the top of this forum.
hnougher
Posts: 13
Joined: Sat Apr 23, 2011 1:57 am
Hardware configuration: Win7 x86_64
Core2 Quad 2.5GHz
6GiB RAM
ATI HD5850 1GiB

Post by hnougher »

Oh.. sorry. I found this topic via the search and did not know about the entire forum just for this stuff. I will get it right next time.