How Does F@H Really Work
Posted: Wed Oct 29, 2008 6:02 am
by KE1HA
I don't understand something, daft as it may be.
I do understand that FAH is not a true parallel application, yet we employ MPICH (a library designed specifically for parallel computing / message passing) to distribute WU segments to multiple cores, which decreases processing time (at least I think that's what we're doing with SMP). In these GPU cards we've employed not 4 or 8 or 16 CPU cores, but hundreds of stream processors (all on a single die, mind you), which does the same thing, I think: more cores (processors) = faster processing times.
I'm assuming, though I've not tried it, that we can run an SMP client on an MP machine with, say, 8 MP processors and have ourselves a 32-core folding machine using SMP-MPICH. Yes / no?
If we can use MPICH on a single motherboard (multiple CPUs, MP or otherwise) to process WU segments, why can't we build a listening daemon (program) for use on a second machine to double the number of cores that process a particular WU segment? Why does the processing have to be resident on the client computer?
MOD/Admin: If this is the wrong thread, please move as needed.
Re: How Does F@H Really Work
Posted: Wed Oct 29, 2008 7:16 am
by 7im
The fah SMP client was hard-coded for 4 cores to simplify development and troubleshooting, and to standardize the data for comparison.
The GPU2 client was not hard-coded for a set number; it was designed to run with a much higher stream count, which is different in both code and architecture.
Yes, MPICH was originally designed to run across multiple computers, but the MPICH in the fah client is not configured that way. The fah client passes too much data between the fahcores while processing a work unit for the slow connections between computers to be helpful. Besides, fah already breaks work units up and sends them to different computers, so in essence fah is already a big computer cluster. No need for you to make another cluster.
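To put rough numbers on that (these are illustrative guesses, not measured F@H figures), here's a quick Python sketch of why synchronizing every timestep over a network is so much worse than synchronizing cores on one board:

[code]
# Back-of-envelope sketch: all numbers are assumptions for illustration.
compute_per_step = 1e-4      # 0.1 ms of arithmetic per core per timestep (assumed)
sync_costs = {
    "shared memory (cores on one board)": 1e-6,   # ~1 us core-to-core (assumed)
    "gigabit LAN (two machines)": 2e-4,           # ~200 us round trip (assumed)
}

for link, sync in sync_costs.items():
    fraction_waiting = sync / (compute_per_step + sync)
    print(f"{link}: {fraction_waiting:.0%} of each step spent waiting")
[/code]

With those assumed costs, cores on one board waste about 1% of each step waiting, while two machines on a LAN would waste about two thirds of it.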
Re: How Does F@H Really Work
Posted: Fri Oct 31, 2008 6:26 am
by codysluder
KE1HA wrote:I do understand that FAH is not a true parallel application, yet . . . .
There is more than one level of parallelism. I want to be sure you understand that some WU assignments must be done serially and others can be assigned in parallel, but within a specific WU, some of the calculations can be done in parallel and some cannot.
Start a single trajectory as follows: assume an initial position and velocity for every atom in a protein. Suppose it takes many months to compute the motions during successive time intervals to reach a folded state. If you break that series of calculations up into day-sized chunks, it will still take just as long to finish the many months' worth of work, but the successive day-sized chunks can be sent out to a series of different people. Of course you can't start any chunk until the previous chunk is completed, so nothing is really gained by assigning the chunks to different people.
Now, instead of starting a single trajectory, you start 120 of them from different positions or different velocities. Now 120 people can all be working on the same project at the same time, but it still takes many months to compute all 120 trajectories.
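Here is a toy Python model of that dependency structure (a stand-in for illustration, not actual F@H code): chunks within one trajectory are strictly serial, while whole trajectories are independent of each other.

[code]
from concurrent.futures import ProcessPoolExecutor

def run_chunk(state):
    """Stand-in for integrating the equations of motion for one 'day'."""
    return state + 1

def run_trajectory(initial_state, n_chunks=30):
    state = initial_state
    for _ in range(n_chunks):      # strictly serial: each chunk needs the last
        state = run_chunk(state)
    return state

if __name__ == "__main__":
    # 120 trajectories with different starting conditions can run at once...
    with ProcessPoolExecutor() as pool:
        finals = list(pool.map(run_trajectory, range(120)))
    # ...but a 121st volunteer gains nothing: nobody can start chunk 31
    # of a trajectory before chunk 30 of that same trajectory is done.
    print(len(finals), "trajectories finished")
[/code]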
What if you want the results sooner than "many months"? As I've already said, even if you have 240 people willing to work on that project, it will not go any faster, because only 120 can work on it at a time . . . but if the protein has 2000 atoms and the computer has 4 cores, you can assign 500 atoms to each core. After one time interval, the new positions and velocities of all 2000 atoms become the starting point for the next time step. You cannot assign 500 atoms each to four different people, though, because you need the results from the other three people before you can proceed. Thus SMP can only work in an environment where the inter-CPU communication is extremely rapid.
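A toy Python sketch of that pattern (an assumed structure for illustration, not the actual fahcore). Note that CPython's GIL means the arithmetic won't truly run in parallel here; the barrier after every step is the point:

[code]
import threading

N_ATOMS, N_CORES, N_STEPS = 2000, 4, 3
positions = [0.0] * N_ATOMS
barrier = threading.Barrier(N_CORES)

def worker(core_id):
    chunk = N_ATOMS // N_CORES                 # 500 atoms per core
    lo, hi = core_id * chunk, (core_id + 1) * chunk
    for _ in range(N_STEPS):
        for i in range(lo, hi):
            positions[i] += 1.0                # stand-in for a force calculation
        barrier.wait()                         # no core may start the next step early

threads = [threading.Thread(target=worker, args=(c,)) for c in range(N_CORES)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("advanced", len(positions), "atoms through", N_STEPS, "steps")
[/code]

That barrier.wait() is cheap between cores on one die and ruinously expensive between machines, which is the answer to the original question.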
If a GPU has 100 processors, each one can work on 20 atoms out of the 2000, but before they can proceed to the next time step, all must stop and wait for new data from the other 99 processors. As you divide a single protein up into smaller and smaller pieces, the computations being done on a single trajectory take less and less time but the time spent waiting for data becomes a larger and larger fraction of the total time for that WU. A GPU that has 400 of the same type of processors and operates at the same clock speed will be faster, but less than four times as fast.
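The same argument with made-up numbers (a fixed per-step communication cost, everything else idealized):

[code]
atoms = 2000
work_per_atom = 1.0      # time units of math per atom per step (assumed)
comm_per_step = 5.0      # time units spent exchanging data each step (assumed)

def speedup(n_procs):
    serial_time = atoms * work_per_atom
    parallel_time = serial_time / n_procs + comm_per_step
    return serial_time / parallel_time

s100, s400 = speedup(100), speedup(400)
print(f"100 processors: {s100:.0f}x faster than one")   # 80x
print(f"400 processors: {s400:.0f}x faster than one")   # 200x
print(f"4x the processors bought only {s400 / s100:.1f}x the speed")
[/code]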
Re: How Does F@H Really Work
Posted: Fri Oct 31, 2008 6:55 am
by sdack
I wonder: since it is possible to carry numerical errors from one iteration to the next, is there any statistical analysis involved on top, too? I know that WUs get sent to more than one client, but that is not what I mean.
Are we all working on one large simulation of a protein or are we working on multiple simulations of the same protein?
I imagine that one way to parallelize a simulation like this is to run multiple simulations (e.g. with variations in time offsets) and to combine the different results with statistical analysis.
But if the answer is that it is one giant act of magic, then that would be OK, too. It would allow me to believe in magic.
Re: How Does F@H Really Work
Posted: Fri Oct 31, 2008 7:24 am
by codysluder
When a real protein folds in the lab, there is a statistical analysis involved, too. Many protein molecules start from statistically similar conditions which are not identical. The folded result is a statistical average.
I would think that the overall FAH simulation would reflect similar statistics based on the simulation of multiple trajectories. FAH is self-checking, so most errors are detected, discarded, and recalculated. If you assume a single undetected error, the other trajectories would still average out to the correct answer. If you assume a series of similar errors rather than a single one, the average results might be wrong, but then the results could not be confirmed by lab tests.
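A sketch of that idea in Python (all numbers invented for illustration): a crude outlier check discards the one bad trajectory, and the remaining ones average out to the answer.

[code]
import statistics

good = [10.0 + 0.05 * (i % 7) for i in range(119)]    # 119 plausible results
results = good + [999.0]                              # one bad result slips in

median = statistics.median(results)
kept = [r for r in results if abs(r - median) < 5.0]  # crude self-check
print(f"naive mean of everything: {statistics.mean(results):.2f}")
print(f"mean after discarding outliers: {statistics.mean(kept):.2f}")
[/code]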