
How does the WU system work?

Posted: Sat Jan 23, 2010 10:50 pm
by chumbucket843
I'm confused about this. Is Folding@home embarrassingly parallel, so that you parallelize the simulation over all systems, or am I doing the same WU over and over, with different results each time, like Rosetta? What kind of fault tolerance system does F@H have?

You don't have to go into great detail. I have been thinking about how this works and it's confusing me.

Thanks

Re: How does the WU system work?

Posted: Sat Jan 23, 2010 11:20 pm
by Nathan_P
From what I understand (taken from the wiki):
Each project consists of multiple WUs, and each WU calculates a slightly different portion of a trajectory for a particular protein. These trajectory parts are identified by the Run, Clone and Generation numbers.

Once someone has folded Gen 1 of a Run and Clone (e.g. R42 C21 G1), the next gen, G2, will be sent out to be folded, and so on. Each WU only gets processed once unless:

1. If you miss the preferred deadline, the WU will be resent to someone else.
2. If the WU goes back early because of an error (I think this is true; can someone please confirm?)

If the WU is processed normally, sent back OK, and you get credit, then that's it.
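Here is a minimal sketch of the bookkeeping described above, assuming each project/run/clone trajectory has exactly one WU outstanding at a time: an OK return advances it to the next gen, while a missed deadline or an error sends the same gen out again. The `Trajectory` and `WorkUnit` classes, the outcome strings, and project number 1234 are hypothetical, not the actual FAH work-server code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkUnit:
    project: int
    run: int
    clone: int
    gen: int


@dataclass
class Trajectory:
    """One project/run/clone trajectory (hypothetical model). Only its next gen is ever handed out."""
    project: int
    run: int
    clone: int
    next_gen: int = 1  # numbering follows the R42 C21 G1 example in this post
    history: list = field(default_factory=list)

    def current_wu(self) -> WorkUnit:
        return WorkUnit(self.project, self.run, self.clone, self.next_gen)

    def report(self, outcome: str) -> WorkUnit:
        """Record how the outstanding WU ended and return the WU to send out next.

        "ok"                       -> credit given, advance to the next gen
        "deadline_missed"/"error"  -> resend the same gen to someone
        """
        wu = self.current_wu()
        if outcome == "ok":
            self.next_gen += 1
        elif outcome not in ("deadline_missed", "error"):
            raise ValueError(f"unknown outcome: {outcome!r}")
        self.history.append((wu.gen, outcome))
        return self.current_wu()


if __name__ == "__main__":
    traj = Trajectory(project=1234, run=42, clone=21)
    print(traj.report("ok"))               # G1 came back OK -> G2 goes out
    print(traj.report("deadline_missed"))  # G2 missed its deadline -> G2 is resent
    print(traj.report("ok"))               # G2 came back OK -> G3 goes out
```

The point of the design is that a gen is never duplicated on purpose: the same gen only goes out again when the previous attempt did not come back usable.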

PG (the Pande Group) spent several years validating the science before they started to use the results sent back to them, and they have now published in excess of 70 papers, so there must be a lot of fault tolerance built into the system at various points.

If you want a bit more info, try the wiki: http://fahwiki.net

Re: How does the WU system work?

Posted: Sun Jan 24, 2010 12:21 am
by 7im
Both serial and parallel at the same time.

Serial in that Generation 2 work units are not created until the results from Generation 1 are returned to Stanford.

Parallel in that Project 1234 and Project 1235 might be working on the same protein, but with slightly different environment settings: hotter, colder, more water, etc.
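To see why "serial and parallel at the same time" still scales, here is a toy round-based scheduler. It is an illustration under simplifying assumptions, not how Stanford's servers actually schedule work: every WU takes one round, nothing fails, a donor folds one WU per round, and a trajectory's next gen only exists once its previous gen has come back. The function name `rounds_to_finish` is hypothetical.

```python
def rounds_to_finish(num_trajectories: int, gens_per_trajectory: int, donors: int) -> int:
    """Toy model: count rounds until every trajectory has finished all its gens.

    Assumptions: every WU takes exactly one round, nothing fails, each donor folds
    at most one WU per round, and gen g+1 of a trajectory is only created after
    gen g has been returned (so each trajectory has at most one WU out at a time).
    """
    if donors < 1 or gens_per_trajectory < 0:
        raise ValueError("need at least one donor and a non-negative gen count")
    remaining = [gens_per_trajectory] * num_trajectories
    rounds = 0
    while any(remaining):
        # Trajectories with work left; serve the longest ones first so the tail is short.
        ready = sorted((i for i, r in enumerate(remaining) if r > 0),
                       key=lambda i: -remaining[i])
        for i in ready[:donors]:  # different trajectories -> different donors (parallel)
            remaining[i] -= 1     # at most one gen per trajectory per round (serial)
        rounds += 1
    return rounds


if __name__ == "__main__":
    print(rounds_to_finish(1, 5, 100))    # 5: extra donors can't speed up one trajectory
    print(rounds_to_finish(100, 5, 100))  # 5: 100 trajectories finish in the same time
    print(rounds_to_finish(100, 5, 10))   # fewer donors than trajectories -> more rounds
```

A single trajectory cannot use more than one donor at a time, so adding donors only helps when there are many runs, clones, or projects to spread across them.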

Here is a good FAH wiki article: http://fahwiki.net/index.php/Runs,_Clones_and_Gens

Re: How does the WU system work?

Posted: Sun Jan 24, 2010 3:24 am
by bruce
Nathan_P wrote: 2. If the WU goes back early because of an error (i think this is true - can someone please confirm)
Sort of true.

Each individual Project/Run/Clone starts with an assumed random distribution of atomic velocities and continues to evolve with a degree of randomness due to thermal motions. Some of those combinations lead to realistic motions that fold quickly, some to realistic motions that simply stay at approximately the same shape, and some lead to motions that would never happen in nature, and to an error. Thus some trajectories are very useful, some produce no results, and some must be discarded. This is a form of parallelism that FAH has been able to exploit, because in real life there are long periods of time when nothing useful is happening, and then folding happens abruptly, due to those random thermal motions. After discarding the "bad WUs", the overall statistics lead to useful results.
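A toy illustration of that point, with made-up per-step probabilities and no actual physics: on each step a trajectory may fold (`p_fold`), hit an unphysical state and error out (`p_error`), or do nothing. Any single short trajectory rarely catches the folding event, but an ensemble of many independent ones does, and the "error" trajectories are discarded before computing statistics. All function names and numbers here are hypothetical.

```python
import random


def run_trajectory(rng: random.Random, steps: int, p_fold: float, p_error: float) -> str:
    """Toy trajectory: on each step it errors out with probability p_error,
    folds with probability p_fold, and otherwise keeps going."""
    for _ in range(steps):
        u = rng.random()
        if u < p_error:
            return "error"     # unphysical motion -> this WU would be discarded
        if u < p_error + p_fold:
            return "folded"    # rare, abrupt event
    return "unfolded"          # nothing useful happened in this window


def fold_probability(steps: int, p_fold: float, p_error: float) -> float:
    """Closed-form chance that one toy trajectory folds within `steps`."""
    p_any = p_fold + p_error
    return p_fold / p_any * (1 - (1 - p_any) ** steps)


def ensemble(n: int, steps: int, p_fold: float, p_error: float, seed: int = 0):
    """Run n independent trajectories (think: n runs/clones), drop the errors,
    and return the outcome counts plus the folded fraction of the usable ones."""
    rng = random.Random(seed)
    outcomes = [run_trajectory(rng, steps, p_fold, p_error) for _ in range(n)]
    counts = {k: outcomes.count(k) for k in ("folded", "unfolded", "error")}
    usable = counts["folded"] + counts["unfolded"]
    frac = counts["folded"] / usable if usable else float("nan")
    return counts, frac


if __name__ == "__main__":
    n = 5000
    counts, frac = ensemble(n, steps=100, p_fold=0.001, p_error=0.0002, seed=1)
    print(counts)
    print(f"simulated P(fold): {counts['folded'] / n:.3f}")
    print(f"analytic  P(fold): {fold_probability(100, 0.001, 0.0002):.3f}")
    print(f"folded fraction after discarding errors: {frac:.3f}")
```

`fold_probability` is the closed-form chance of folding within `steps` for this toy model, so the simulated ensemble can be checked against it.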

On the other hand, if someone's hardware is malfunctioning due to overclocking, overheating, or a defect of some kind, it will generate an error. From a single failure there is no way to know whether the error is due to the hardware or to the randomness mentioned above. Those WUs are reassigned: if they fail again, the error was a random bad WU; if the other computer does not produce an error, the original failure was due to overclocking, etc. This is an important part of the overall FAH system redundancy. The scientists use redundancy wherever it makes sense, but also minimize it wherever possible so that more work can be completed.
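The reassignment logic described above can be sketched as a tiny decision function. This is a simplification: exactly as in the post, one retry on a different donor settles the question, and it ignores anything the real servers may track beyond that. The donor IDs and `Verdict` names are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class Verdict(Enum):
    NO_FAILURE = "nothing to explain"
    REASSIGN = "failed once; reassign to a different donor"
    BAD_WU = "failed on independent hardware too: random bad WU"
    SUSPECT_HARDWARE = "succeeded elsewhere: first donor's hardware is suspect"


def classify_failure(attempts: list[tuple[str, bool]]) -> Verdict:
    """attempts: (donor_id, failed) pairs in the order the WU was handed out."""
    if not attempts or not attempts[0][1]:
        return Verdict.NO_FAILURE
    first_donor = attempts[0][0]
    # Only a retry on *different* hardware tells us anything new.
    retries = [failed for donor, failed in attempts[1:] if donor != first_donor]
    if not retries:
        return Verdict.REASSIGN
    return Verdict.BAD_WU if retries[0] else Verdict.SUSPECT_HARDWARE


if __name__ == "__main__":
    print(classify_failure([("donor_a", True)]).value)
    # failed once; reassign to a different donor
    print(classify_failure([("donor_a", True), ("donor_b", True)]).value)
    # failed on independent hardware too: random bad WU
    print(classify_failure([("donor_a", True), ("donor_b", False)]).value)
    # succeeded elsewhere: first donor's hardware is suspect
```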

Re: How does the WU system work?

Posted: Sat Jan 30, 2010 6:03 pm
by whynot
Nathan_P wrote: 2. If the WU goes back early because of an error (i think this is true - can someone please confirm)
Here is at least one EUE (Early Unit End) that was sent to two different donors. I can't find the thread, but I was once told that failed WUs are resent (sometimes immediately; look through the server-problems forum, there are lots of threads about that). That said, I've got the impression that there are different strategies for different types of failures (I could be wrong about this).

In short, I dare to state (though, to be honest, I'm not in a position of authority to) that no successful WU is ever re-issued.

Re: How does the WU system work?

Posted: Sat Jan 30, 2010 11:17 pm
by Nathan_P
whynot wrote:
Shortly, I dare to state (I'm not in authority to, to be honest) none successful WU is re-issued (ever) .
You are correct in that statement, with one exception: they will sometimes reissue completed projects on new cores to validate that the new core isn't returning garbage. You can see this with the new A3 core; most of its projects are reissues of the existing SMP A1/A2 projects.
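The post doesn't say how PG decides that a new core "isn't returning garbage". One plausible approach, and purely an assumption here, is statistical: because trajectories are stochastic, bit-identical output isn't expected, so you compare an ensemble-averaged observable from the old-core and new-core runs against its combined standard error. The function name, the `z_max` threshold, and the sample values are all hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def cores_agree(old: Sequence[float], new: Sequence[float], z_max: float = 3.0) -> bool:
    """Return True if the mean of an observable from the old core and the new core
    differ by at most z_max combined standard errors (needs >= 2 samples each)."""
    se = math.sqrt(stdev(old) ** 2 / len(old) + stdev(new) ** 2 / len(new))
    diff = abs(mean(old) - mean(new))
    if se == 0:
        return diff == 0
    return diff / se <= z_max


if __name__ == "__main__":
    # Hypothetical per-trajectory values of some observable (arbitrary units).
    a1_a2 = [1.0, 1.2, 0.9, 1.1, 1.0]
    a3_ok = [1.05, 1.15, 0.95, 1.1, 0.98]
    a3_garbage = [5.0, 5.2, 4.9, 5.1, 5.0]
    print(cores_agree(a1_a2, a3_ok))       # True
    print(cores_agree(a1_a2, a3_garbage))  # False
```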

Re: How does the WU system work?

Posted: Sat Feb 06, 2010 1:42 pm
by whynot
Nathan_P wrote: You are correct in that statement with one exception, they will sometimes reissue completed projects on new core's to validate that the new core isn't returning garbage. You can see it with the new a3 core, most of the projects are reissues of the existing smp a1/a2 projects
Please note that it's the new core that is being validated, not the WU. Thanks for the correction.