Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Ivoshiee
Site Moderator
Posts: 822
Joined: Sun Dec 02, 2007 12:05 am
Location: Estonia

Post by Ivoshiee »

Whenever the same WU fails multiple times at the same point with an UNKNOWN error, 0x0, 0x1 or anything other than an EUE, you should back up the WU data by stopping the client shortly before the point where it errors out, and then try to run it on some other computer. You can also post it for someone else to test. This makes it possible to improve the FAH core files so that they detect those errors and classify them as EUEs.

Why is it needed? If for no other reason, then for the points: WUs that end with UNKNOWN errors, 0x0 or 0x1 will get you no points, but an EUE will get partial credit.


For example:
http://foldingforum.org/viewtopic.php?t=258

Also:
http://fahwiki.net/index.php/Common_Error_Messages
http://fahwiki.net/index.php/Error_0x0_and_0x1

Note: Whenever you have an excessive number of WU failures, you should test your computer for errors - memory (http://www.memtest.org/), temperatures, ...
klasseng
Posts: 126
Joined: Thu Dec 27, 2007 6:08 am
Hardware configuration: System 1: Mac Studio, M1 Max,
System 2: Mac Mini, M2
Location: Canada

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Post by klasseng »

Ivoshiee:

I've just had a WU failure (multiple times at the same point) and came across this post . . . but I need more information about:
a) "you should back up the WU" . . . just how is that done?
b) "run it on some other computer" . . . how does that get done?
c) "you can post it for someone else to test" . . . post what and where?

peace,
klasseng
codysluder
Posts: 1024
Joined: Sun Dec 02, 2007 12:43 pm

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Post by codysluder »

klasseng wrote:Ivoshiee:

I've just had a WU failure (multiple times at the same point) and came across this post . . . but I need more information about:
a) "you should back up the WU" . . . just how is that done?
b) "run it on some other computer" . . . how does that get done?
c) "you can post it for someone else to test" . . . post what and where?

peace,
klasseng
a) Copy the entire installation directory somewhere else. (You can be more selective and copy less data if you know what you're doing; see the WIKI instructions for "sneakernetting.")

b) If you have more than one computer running the same OS, see the WIKI for instructions about "sneakernetting."

c) If you have only one computer, contact someone else with the same OS and see if they can process your backup. "What" is the same thing backed up in step a. "Where" depends on whether you have your own website or if you need to upload the data to one of the advertising supported hosts. In some cases, the data can be emailed but there are often limitations on the size of email attachments that prohibit this method.
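
For anyone who would rather script step a), here is a minimal Python sketch of "copy the entire installation directory somewhere else". It assumes the client has already been stopped and that everything lives in a single directory. The function name backup_client_dir and the backup folder naming are made up for illustration; nothing here comes from the client itself.

```python
import shutil
import time
from pathlib import Path


def backup_client_dir(client_dir, backup_root):
    """Copy the whole client directory into a new timestamped folder.

    Assumption: the client is stopped before this runs, so the files
    are not changing while they are copied.
    """
    src = Path(client_dir).resolve()
    if not src.is_dir():
        raise FileNotFoundError(f"client directory not found: {src}")

    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root).resolve() / f"{src.name}-backup-{stamp}"

    # Putting the backup inside the directory being copied would make
    # the copy try to include itself, so refuse that layout.
    if src == dest or src in dest.parents:
        raise ValueError("backup_root must be outside the client directory")

    # copytree creates dest (and missing parents) and raises an error if
    # dest already exists, so an earlier backup is never overwritten.
    shutil.copytree(src, dest)
    return dest
```

Call it with the path of your client folder and a backup location on another disk or a USB stick; the returned path is the folder you would carry to the other computer or zip up for posting.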
MacBozo
Posts: 1
Joined: Sun Jan 13, 2008 4:15 am

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Post by MacBozo »

I keep getting the following on Protein: p3045_FORMIN BINDING PROTEIN WUs:
[09:00:39] Completed 585000 out of 1500000 steps (39)
Warning: 1-4 interaction at distance larger than 3.24
These are ignored for the rest of the simulation
turn on -debug for more information
[09:08:46] CoreStatus = 0 (0)
[09:08:46] Client-core communications error: ERROR 0x0
[09:08:46] Deleting current work unit & continuing...
I've successfully completed other WUs without problem, but these 3045s keep cutting out at the same point with the same error. Is there a way to block them from being downloaded? Mac OS X 10.5.1, Client v6.o text (terminal)

Thanks,
Michael
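
A quick way to confirm that a WU really dies "at the same point" is to pull the last completed step before each error out of the log. The Python sketch below is only an illustration: the names failure_points, STEP_RE and ERROR_RE are made up, and the patterns are based solely on the log lines quoted above, so other client versions may word these lines differently.

```python
import re

# Patterns modelled on the quoted log lines (an assumption, not a spec).
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps")
ERROR_RE = re.compile(
    r"Client-core communications error: ERROR (0x[0-9a-fA-F]+)"
)


def failure_points(log_text):
    """Return one (last_step, total_steps, error_code) tuple per core error.

    last_step and total_steps are None when no progress line was seen
    before the error.
    """
    results = []
    last_step = None
    for line in log_text.splitlines():
        step = STEP_RE.search(line)
        if step:
            last_step = (int(step.group(1)), int(step.group(2)))
            continue
        err = ERROR_RE.search(line)
        if err:
            done, total = last_step if last_step else (None, None)
            results.append((done, total, err.group(1)))
            # The next attempt starts counting from its own progress lines.
            last_step = None
    return results
```

Fed the excerpt above, it returns a single entry: step 585000 of 1500000 with error 0x0. If the same step shows up for every attempt across a longer log, that is the "same point" failure this thread is about, and it is worth including in the report.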
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Post by bruce »

MacBozo wrote:Is there a way to block them from being downloaded?
There's nothing that YOU can do to block them except to make the kind of report you just made (preferably with a title such as "Project 3404 Run xxxx Clone xxx Gen XX" indicating the specific WU you're having trouble with). The Pande Group has already taken a few Run/Clone combinations offline when it was clear that something was wrong with that WU.
Oldhat
Posts: 30
Joined: Mon Dec 03, 2007 11:42 am
Location: Auckland

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Post by Oldhat »

Ivoshiee wrote:Whenever the same WU fails multiple times at the same point with an UNKNOWN error, 0x0, 0x1 or anything other than an EUE, you should back up the WU data by stopping the client shortly before the point where it errors out, and then try to run it on some other computer. You can also post it for someone else to test. This makes it possible to improve the FAH core files so that they detect those errors and classify them as EUEs.
You mention stopping the client prior to the error and then trying it on a different computer.

With the Linux client I have found that merely stopping it at any point prior to the error and restarting is normally sufficient to allow successful completion of the WU.

Only a few times has this been unsuccessful.
LookN2Find
Posts: 1
Joined: Mon Feb 25, 2008 6:21 am

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Post by LookN2Find »

So, what's going on, exactly? I have not been able to complete a WU on my laptop or GPU clients for a couple of weeks now. The GPU client was running great, and then all of a sudden it started tossing out blank WUs over and over, and now gives everything a sense of false measurement. My PS3 is folding fine, and a friend's PS3 is folding fine, but our Core 2 Duos will not complete a WU in the console client to save anyone's life (hm, literally). I have not tried running the graphical client for our CPUs yet.

I am running an ATI X1950 Pro GPU/video card. I am running a 1.66 GHz Core 2 Duo that I have been folding with for almost a solid year and a half, 24/7. I am also having problems on a Pentium D unit and a Celeron D unit. All of them failed within the same 24-hour time frame, and none of them will complete a work unit. I have reinstalled clients. I have tried betas and standard releases. I have read forums. I have changed settings, rolled the Catalyst video drivers back to recommended versions, etc. I think I have done everything that can possibly be done. I am about to attempt to fire up another GPU and an E6850 Core 2, but is F@H having software issues on both clients now?

None of my units will run. I would very much appreciate it if someone would let me know whether we're waiting for the release of another "soon to come" client. I am just so exhausted: reinstalling 4 different versions on 5 individual clients, times changing settings 3-4 times per unit, plus the downloading etc., adds up to an estimated minimum of 60 and maximum of 80 attempts/failures (not to mention how many WUs failed within each test of settings and clients). I'm seriously exhausted from trying to fold :shock: .

Someone please help so I can help!
Ivoshiee
Site Moderator
Posts: 822
Joined: Sun Dec 02, 2007 12:05 am
Location: Estonia

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Post by Ivoshiee »

LookN2Find wrote:So, what's going on, exactly? I have not been able to complete a WU on my laptop or GPU clients for a couple of weeks now. The GPU client was running great, and then all of a sudden it started tossing out blank WUs over and over, and now gives everything a sense of false measurement. My PS3 is folding fine, and a friend's PS3 is folding fine, but our Core 2 Duos will not complete a WU in the console client to save anyone's life (hm, literally). I have not tried running the graphical client for our CPUs yet.

I am running an ATI X1950 Pro GPU/video card. I am running a 1.66 GHz Core 2 Duo that I have been folding with for almost a solid year and a half, 24/7. I am also having problems on a Pentium D unit and a Celeron D unit. All of them failed within the same 24-hour time frame, and none of them will complete a work unit. I have reinstalled clients. I have tried betas and standard releases. I have read forums. I have changed settings, rolled the Catalyst video drivers back to recommended versions, etc. I think I have done everything that can possibly be done. I am about to attempt to fire up another GPU and an E6850 Core 2, but is F@H having software issues on both clients now?

None of my units will run. I would very much appreciate it if someone would let me know whether we're waiting for the release of another "soon to come" client. I am just so exhausted: reinstalling 4 different versions on 5 individual clients, times changing settings 3-4 times per unit, plus the downloading etc., adds up to an estimated minimum of 60 and maximum of 80 attempts/failures (not to mention how many WUs failed within each test of settings and clients). I'm seriously exhausted from trying to fold :shock: .

Someone please help so I can help!
Have you checked the GPU temperatures?
I had over 5000 WUs EUE over the course of 2 days because of a failed GPU cooler.
spazzcat
Posts: 6
Joined: Thu Apr 10, 2008 1:17 am

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Post by spazzcat »

I think I'm having the same issue? The WU was ambda5_99sb

Extra SSE boost OK.

[17:37:50] Warning: long 1-4 interactions
[17:37:54] CoreStatus = 0 (0)
[17:37:54] Client-core communications error: ERROR 0x0
[17:37:54] Deleting current work unit & continuing...
[17:42:22] - Warning: Could not delete all work unit files (1): Core returned invalid code
[17:42:22] Trying to send all finished work units
[17:42:22] + No unsent completed units remaining.
[17:42:22] - Preparing to get new work unit...
[17:42:22] + Attempting to get work packet
[17:42:22] - Will indicate memory of 3894 MB
[17:42:22] - Connecting to assignment server
[17:42:22] Connecting to http://assign.stanford.edu:8080/
[17:42:23] Posted data.
[17:42:23] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[17:42:23] + News From Folding@Home: Welcome to Folding@Home
[17:42:23] Loaded queue successfully.
[17:42:23] Connecting to http://171.64.65.63:8080/
[17:42:23] Posted data.
[17:42:23] Initial: 0000; + Could not connect to Work Server
[17:42:23] - Attempt #1 to get work failed, and no other work to do.
Waiting before retry.
[17:42:34] + Attempting to get work packet
[17:42:34] - Will indicate memory of 3894 MB
[17:42:34] - Connecting to assignment server
[17:42:34] Connecting to http://assign.stanford.edu:8080/
[17:42:34] Posted data.
[17:42:34] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[17:42:34] + News From Folding@Home: Welcome to Folding@Home
[17:42:34] Loaded queue successfully.
[17:42:34] Connecting to http://171.64.65.63:8080/
jrweiss
Posts: 704
Joined: Tue Dec 04, 2007 6:56 am
Hardware configuration: Ryzen 7 5700G, 22.40.46 VGA driver; 32GB G-Skill Trident DDR4-3200; Samsung 860EVO 1TB Boot SSD; VelociRaptor 1TB; MSI GTX 1050ti, 551.23 studio driver; BeQuiet FM 550 PSU; Lian Li PC-9F; Win11Pro-64, F@H 8.3.5.

[Suspended] Ryzen 7 3700X, MSI X570MPG, 32GB G-Skill Trident Z DDR4-3600; Corsair MP600 M.2 PCIe Gen4 Boot, Samsung 840EVO-250 SSDs; VelociRaptor 1TB, Raptor 150; MSI GTX 1050ti, 526.98 driver; Kingwin Stryker 500 PSU; Lian Li PC-K7B. Win10Pro-64, F@H 8.3.5.
Location: @Home

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Post by jrweiss »

I've been away from home for a week. I noticed by monitoring my stats that my SMP setup is not producing. I was finally able to check it remotely from Amsterdam (I've been WAY more remote than that!) and found that it keeps downloading 3062 5/6/93 and gets EUEs at 44%. I cannot run it on another machine.

I'll open a new topic or look for a current one, and post the logs.

How do I ensure this particular WU doesn't just re-appear yet again?
Ryzen 7 5700G, 22.40.46 VGA driver; MSI GTX 1050ti, 551.23 studio driver
Ryzen 7 3700X; MSI GTX 1050ti, 551.23 studio driver [Suspended]
Ivoshiee
Site Moderator
Posts: 822
Joined: Sun Dec 02, 2007 12:05 am
Location: Estonia

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Post by Ivoshiee »

jrweiss wrote:I've been away from home for a week. I noticed by monitoring my stats that my SMP setup is not producing. I was finally able to check it remotely from Amsterdam (I've been WAY more remote than that!) and found that it keeps downloading 3062 5/6/93 and gets EUEs at 44%. I cannot run it on another machine.

I'll open a new topic or look for a current one, and post the logs.

How do I ensure this particular WU doesn't just re-appear yet again?
As the 0x0 will not get reported, the FAH DC system should assign the WU to you again about 3-5 times before moving on. If you insist on not having the WU again, there is nothing more you can do than keep dumping the WU until you get another one.
jrweiss
Posts: 704
Joined: Tue Dec 04, 2007 6:56 am
Hardware configuration: Ryzen 7 5700G, 22.40.46 VGA driver; 32GB G-Skill Trident DDR4-3200; Samsung 860EVO 1TB Boot SSD; VelociRaptor 1TB; MSI GTX 1050ti, 551.23 studio driver; BeQuiet FM 550 PSU; Lian Li PC-9F; Win11Pro-64, F@H 8.3.5.

[Suspended] Ryzen 7 3700X, MSI X570MPG, 32GB G-Skill Trident Z DDR4-3600; Corsair MP600 M.2 PCIe Gen4 Boot, Samsung 840EVO-250 SSDs; VelociRaptor 1TB, Raptor 150; MSI GTX 1050ti, 526.98 driver; Kingwin Stryker 500 PSU; Lian Li PC-K7B. Win10Pro-64, F@H 8.3.5.
Location: @Home

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Post by jrweiss »

Well, some of us don't have the luxury of moving a WU to another machine, and some of us can't dump a WU by remote control from half way around the world...

Maybe the EUE re-assignment process should be rethunk, so it doesn't go back to the same computer...
Ryzen 7 5700G, 22.40.46 VGA driver; MSI GTX 1050ti, 551.23 studio driver
Ryzen 7 3700X; MSI GTX 1050ti, 551.23 studio driver [Suspended]
Sunin
Posts: 22
Joined: Fri Apr 11, 2008 3:50 pm

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Post by Sunin »

The way it works now, I believe, is that until the WU expires you will be given that same failed WU again... I've gotten numerous identical WUs to rechug that had EUE... and of course everything before them worked great and everything after has worked flawlessly... but it takes a few days and maybe a set # of failures before it reassigns them.
rada
Posts: 5
Joined: Sun Dec 02, 2007 4:06 am

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Post by rada »

you can be more selective and copy less data if you know what you're doing. (See the WIKI instructions for "sneakernetting.") <to figure out what to back up and what to post to help troubleshooters>

Well, I missed it on the first few reads, but it seems to say that queue.dat and all of the work/ directory is enough.

Unfortunately, that barely cut a couple of % from the compressed size of the whole folding directory on my problem unit. So I will archive all of the folding dir with tar + bzip2 unless it creates problems. Anyway, I figured I'd post that here since it was buried deep in the wiki text.
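
For the selective route, the same tar + bzip2 archive can be built with Python's standard tarfile module. This is only a sketch under the assumption stated above, that queue.dat and the work/ directory are what needs to be packed; the function name archive_wu is made up, and the client should be stopped first.

```python
import tarfile
from pathlib import Path


def archive_wu(client_dir, archive_path):
    """Pack queue.dat and work/ from client_dir into a .tar.bz2 archive.

    Assumption: these two items are enough to move the WU, as the wiki
    text on sneakernetting seems to say.
    """
    root = Path(client_dir)
    items = [root / "queue.dat", root / "work"]

    # Check everything up front so a half-written archive is never left
    # behind when something is missing.
    for item in items:
        if not item.exists():
            raise FileNotFoundError(f"missing {item}")

    # "w:bz2" writes a bzip2-compressed tar, like `tar cjf` would.
    with tarfile.open(archive_path, "w:bz2") as tar:
        for item in items:
            # arcname keeps paths in the archive relative to the client
            # directory; directories are added recursively.
            tar.add(item, arcname=item.name)
    return Path(archive_path)
```

To archive the whole folding directory instead, pass the directory itself to tar.add in place of the two items.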
jayrex
Posts: 3
Joined: Wed Jul 30, 2008 12:36 pm

Re: Multiple failures at the same point - UNKNOWN, 0x0 or 0x1.

Post by jayrex »

Hi there,

I'm not sure if it's multiple errors I'm getting, but it certainly is one big error.

I had just finished my second work unit when my software went to download the next one. It attempts to download bytes and says

'Core download error (#1), waiting before retry'

The attempt is repeated more than once: #2, #3, #4 and so on.

What should I do?