Re: Project: 7085 (Run 0, Clone 695, Gen 16)
Posted: Fri Dec 27, 2013 12:05 pm
by Napoleon
I started a separate discussion ( viewtopic.php?f=16&t=25464 ) about updating the main FAQ.
Re: Project: 7085 (Run 0, Clone 695, Gen 16)
Posted: Fri Dec 27, 2013 6:59 pm
by abdulwahidc
@netblazer I've changed the project mode so you shouldn't be assigned WUs from 7085 (and also prevent others from running into the same issue).
Re: Project: 7085 (Run 0, Clone 695, Gen 16)
Posted: Fri Dec 27, 2013 8:01 pm
by netblazer
abdulwahidc wrote:@netblazer I've changed the project mode so you shouldn't be assigned WUs from 7085 (and also prevent others from running into the same issue).
Thank you.
I don't mind working on that project. Just give me something that I have a chance in hell to finish so that we don't waste the collective resources.
I can spare the $1 per week in added electric bill coming this way.
P.S., just to add an idea in the melting pot : Would it be possible to start the same project in 2 "classes" / versions?
You could have 1 version with say 10 M steps like this one has. And then the other class with 1M steps and give the 1M steps WU to older, lower end machines while the i5s, i7s could take care of the bigger steps. And then when you see a generation getting too far behind, assign them to a faster machine to catch up a little bit. This logic is very simple to code and requires very little testing (compared to the amount of work required to change your FAHcores).
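The two-class idea above can be sketched in a few lines. This is only an illustration of the suggestion, not how the FAH assignment servers work: the `Machine` record, the `steps_per_day` estimate, the 4-day deadline and the 8 M steps/day figure for the faster machine are all hypothetical.

```python
from dataclasses import dataclass

# Step counts for the two hypothetical classes described in the post.
BIG_STEPS = 10_000_000
SMALL_STEPS = 1_000_000


@dataclass
class Machine:
    name: str
    steps_per_day: float  # assumed to be estimated from past returns


def pick_class(machine: Machine, deadline_days: float) -> int:
    """Return the step count of the WU class this machine should receive."""
    # Give the big class only to machines that can finish it before the deadline.
    if machine.steps_per_day * deadline_days >= BIG_STEPS:
        return BIG_STEPS
    return SMALL_STEPS


laptop = Machine("old laptop", 1_000_000)   # about 1 M steps per day, as reported
desktop = Machine("i7 desktop", 8_000_000)  # hypothetical figure

print(pick_class(laptop, deadline_days=4))
print(pick_class(desktop, deadline_days=4))
```

With these made-up numbers the laptop lands in the 1 M class and the desktop in the 10 M class.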
Re: Project: 7085 (Run 0, Clone 695, Gen 16)
Posted: Fri Dec 27, 2013 11:02 pm
by PantherX
Dynamically changing the number of steps within the same Project isn't supported AFAIK. Not sure why that is. Thus, all WUs within the same Project have the same number of steps.
Re: Project: 7085 (Run 0, Clone 695, Gen 16)
Posted: Sat Dec 28, 2013 1:00 am
by netblazer
PantherX wrote:Dynamically changing the number of steps within the same Project isn't supported AFAIK. Not sure why that is. Thus, all WUs within the same Project have the same number of steps.
Hence, a new solution that may be interesting to consider.
My crappy old laptop has absolutely no problem / bottleneck running this 1221-point WU. It just takes too darn long. I can run at maybe 1 M steps per day. It's stupidly simple, but having completed 10 projects for 1221 points feels a lot better than only 1 for the same amount of points (over 10 days without any gratification for completing & achieving something). And it will keep me more interested in being an active participant, as I feel like I'm really contributing something (the psychology of numbers is really interesting).
An even much simpler solution is just to take the same project and cut it all down to 2M step runs for everybody (and the i7s don't lose out). That doesn't require any coding & testing and can work unchanged for the next 20 years, not to say forever.
There's just no reason to throw out those older ACTIVE systems...
Re: Project: 7085 (Run 0, Clone 695, Gen 16)
Posted: Sat Dec 28, 2013 1:59 am
by bruce
There has never been a way to dynamically subdivide a WU. If the WU consists of 10M steps, finishing 1M steps means you've only done 10% of it and it won't be returned until it's finished. The PI associated with the project could create an independent project representing 1M steps, but then every WU assigned from that project would be 1M steps, loading up the communications links with 10x as many uploads and downloads. Obviously some policy settings can overcome such issues. One way or another, there need to be projects that you can complete, and the assignment logic needs to take your hardware into account. At least temporarily, restricting machines like yours from this particular project is probably the best way to handle it.
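The 10x figure is simple counting. A minimal sketch, assuming each WU costs exactly one download and one upload (the function name is made up for illustration):

```python
def transfers_for(total_steps: int, steps_per_wu: int) -> int:
    """Count downloads plus uploads needed to cover total_steps."""
    wus = -(-total_steps // steps_per_wu)  # ceiling division
    return 2 * wus  # one download and one upload per WU (assumption)


big = transfers_for(10_000_000, 10_000_000)   # one 10M-step WU
small = transfers_for(10_000_000, 1_000_000)  # ten 1M-step WUs

print(big, small, small // big)
```

This counts transfers only; it says nothing about how large each transfer is.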
By the way, 1 step of one protein is not the same as 1 step of another protein. As the number of atoms increases, each step needs a lot more computing.
Are your power-saving settings set for maximum performance, including Never Sleep?
Re: Project: 7085 (Run 0, Clone 695, Gen 16)
Posted: Sat Dec 28, 2013 2:33 am
by netblazer
Yes perf mode is on. Only the screen goes off, but the fahcores keep working... actually this is when I get the best TPF, I check that multiple times daily as well.
I've gone to viper's site to turn off all the unwanted services. I monitor the task manager regularly to wipe out anything I don't absolutely need. I'm stuck working with this thing right now because I lost my own. This is as tuned as it's ever going to get without overclocking it. I'm an SQL DBA and I specialize in performance tuning (hardware included). I'm not saying I'm perfect, but I don't think this laptop will go that much faster unless I flat out stop using it & start overclocking it... and it was worth like $300 5 years ago. So you get what you pay for!
What I meant is that Server side, you get 2 lengths of the WU (same everything, just adjust the step count). In some sequences you go in steps of 10 M, and in other sections you go in steps of 1 M and assign to correct hardware (obviously divide the PP WU by 10 as well).
In the generation script, all that it takes is a second counter with an if MOD @CounterVar = 0 (10 small units) else (1 big unit) to split the work according to what the folders need the repartition to be. That part is rather easy to code (just call the same function and change 1 parameter with step count). I don't know the impact on the rest of the pipeline but it could be negligible or even nonexistent depending on how you coded your stuff. It clearly has no impact on the fahcores nor the client. The uploading process stays the same. It requires a new field in the assignment table to send to the correct folder, but this is negligible DB size and coding size. Then processing the zipped file should work just fine since you didn't touch the client. You still have the same fields to link from preceding PRCG to the next one. Etc. This sounds really feasible without rebuilding the entire system, but this is as far as I can go without looking at the code, which is usually when all hell breaks loose.
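A rough sketch of the counter-and-MOD idea described above, purely to illustrate the suggestion. It does not reflect the real generation scripts; `plan_segments` and `small_every` are hypothetical names, and the step counts are the ones used in this thread.

```python
from typing import List

BIG_STEPS = 10_000_000
SMALL_STEPS = 1_000_000


def plan_segments(num_segments: int, small_every: int) -> List[List[int]]:
    """Plan each 10M-step segment as either 1 big unit or 10 small units."""
    plan = []
    for counter in range(num_segments):
        if counter % small_every == 0:
            # Split this segment into small units for slower machines.
            plan.append([SMALL_STEPS] * (BIG_STEPS // SMALL_STEPS))
        else:
            # Keep this segment as a single big unit.
            plan.append([BIG_STEPS])
    return plan


plan = plan_segments(4, small_every=2)
for segment in plan:
    print(len(segment), "unit(s),", sum(segment), "steps")
```

Every segment still covers the same 10 M steps; only the number of units it is cut into changes. Whether the rest of the pipeline could cope with mixed unit lengths is exactly the open question raised in this thread.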
I've thought about the bandwidth issue, but the fact is that your service has to be download bound rather than upload bound unlike "most" web servers. So uploading more wouldn't be much of an issue (but it has to be considered as it will indeed increase load, but only for slow folders, so again might be a non-issue). However since only 10% of the work would have been completed (all things considered), the download should be roughly 9-10 times smaller. Processing would then be also 9-10 times faster. The cycles would be shorter and could be resent sooner. So the work increase there would be very small, if actually noticeable.
This might also alleviate some pressure and leave you free to not change the bigadv requisites (again).
Re: Project: 7085 (Run 0, Clone 695, Gen 16)
Posted: Sat Dec 28, 2013 2:44 am
by 7im
netblazer wrote:... and can work unchanged for the next 20 years, not to say forever.
There's just no reason to throw out those older ACTIVE systems...
And in just 10 years we've gone from a single-threaded Celeron 500 MHz to as many as 128 2.6 GHz threads in a 4-processor server. That's over 660x the speed, not to mention SSE speed doubled in that same time frame, making it 1330x the speed, or roughly 133x per year if averaged linearly, and that's just in hardware. FAH software has increased in speed 10x during that time. And we've gone from non-existent GPUs to GPUs rivaling even the faster CPUs in folding performance.
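The arithmetic behind those figures can be checked quickly. This sketch uses only the numbers quoted above and assumes speed scales with threads times clock rate; the compound yearly rate is an extra calculation that is not in the post.

```python
threads = 128
new_clock_ghz = 2.6
old_clock_ghz = 0.5  # single-threaded Celeron 500 MHz
years = 10

raw_speedup = threads * new_clock_ghz / old_clock_ghz  # about 665.6x
with_sse = raw_speedup * 2                             # about 1331x
linear_per_year = with_sse / years                     # about 133x per year
compound_per_year = with_sse ** (1 / years)            # about 2.05x per year

print(round(raw_speedup, 1), round(with_sse, 1))
print(round(linear_per_year, 1), round(compound_per_year, 2))
```

Read as compounding growth, a 1330x gain over 10 years works out to roughly a doubling every year rather than 133x every year.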
Yes, there is good reason to throw out older active systems. They become boat anchors in 3-5 years, slowing down the overall speed of results, slowing down finding the cures for diseases that people die from each and every day. One disease that killed my father last year. Another that I survived with extreme treatments just this year. In the US, 1 in 4 die from Cancer. If you have a sibling and both parents, which one will it be? Both grandparents? You get 2 guesses...
PG has discussed adaptive sampling, but not for slower systems. They want to do it with the fastest systems to get longer trajectories very quickly. It's like scouting the trail ahead to see if they are heading in the right direction to find the best answers. And if not, they can change their direction sooner, making it a shorter path, or at least a more direct path to the final answers. The faster the better.
Re: Project: 7085 (Run 0, Clone 695, Gen 16)
Posted: Sat Dec 28, 2013 3:54 am
by netblazer
And would that adaptive sampling be on bigadv only, or for anybody with, say, an i7 and up?
Re: Project: 7085 (Run 0, Clone 695, Gen 16)
Posted: Sat Dec 28, 2013 4:47 am
by bruce
As with any potential change, don't expect an announcement about adaptive sampling (if its value to improved scientific throughput can be demonstrated) until it is ready for a roll-out. Until that time, if it ever happens, details about which projects would be included are unknown, as that's likely to depend on the preliminary test data they'd have at that time.
At the present time, the servers know very little about your hardware, basing their decisions primarily on the core count (threads or CPUs) reported by your OS. That is admittedly a rather poor predictor of speed, except perhaps for uniprocessor systems.
See "Upcoming changes to bigadv threshold":
kasson wrote:We also recognize that core count is not the most robust metric of machine capability, but given our current infrastructure it is the most straightforward surrogate to evaluate.
Re: Project: 7085 (Run 0, Clone 695, Gen 16)
Posted: Sun Dec 29, 2013 11:32 am
by PantherX
netblazer wrote:...There's just no reason to throw out those older ACTIVE systems...
You are correct. We tend to support old hardware for as long as possible, and there are two ways to do it. The first is your suggestion of breaking up the larger projects into smaller ones. The second is to have projects targeted at these older/smaller systems from the beginning. The second method is what PG is currently using and, so far, it has worked very well. You can see that your CPU is still being assigned valid WUs that you can finish before the Preferred Deadline. Occasionally, a server glitch or an incorrectly configured server may result in WUs being assigned to systems they were not designed to run on. It has happened and was rectified, like now. Moreover, the benchmark machine is a dedicated system, so its results will never be the same as those of a non-dedicated system. AFAIK, CPUs not supporting SSE2 are no longer getting WUs, since FahCore_78 hasn't been assigned since roughly August. Still waiting to hear whether it will be fixed or officially deprecated.
Re: Project: 7085 (Run 0, Clone 695, Gen 16)
Posted: Sun Dec 29, 2013 4:35 pm
by 7im
Correct. PG does try to support clients and hardware as long as possible. Active systems can run as long as they can... but not 20 years, not past their usable life. Power and efficiency increase too quickly. At some point, even with active systems, it's cheaper to replace them with more efficient hardware just for the energy savings.
Re: Project: 7085 (Run 0, Clone 695, Gen 16)
Posted: Fri Jan 17, 2014 10:25 am
by netblazer
abdulwahidc wrote:@netblazer I've changed the project mode so you shouldn't be assigned WUs from 7085 (and also prevent others from running into the same issue).
Same problem, different project :
project:7083 run:0 clone:494 gen:26 core:0xa4 unit:0x000000e60001329c4fe0eac412bf4aae
Re: Project: 7085 (Run 0, Clone 695, Gen 16)
Posted: Fri Jan 17, 2014 12:19 pm
by netblazer
netblazer wrote:abdulwahidc wrote:@netblazer I've changed the project mode so you shouldn't be assigned WUs from 7085 (and also prevent others from running into the same issue).
Same problem, different project :
project:7083 run:0 clone:494 gen:26 core:0xa4 unit:0x000000e60001329c4fe0eac412bf4aae
And 7084 too while you're at it. They seem to have the exact same demand on CPU power.
Re: Project: 7085 (Run 0, Clone 695, Gen 16)
Posted: Sat Jan 18, 2014 3:31 am
by bruce
You reported CeleronM CPU 560 @ 2.13GHz. If you exclude whatever downtime you may typically experience (including SLEEP/HIBERNATE time) and any time when you're running some other program which puts heavy demands on the CPU such as video encoding, other DC projects, etc., over the course of a week, how many hours can FAH expect to find your computer essentially idle except for FAH? Does your computer run a pretty screensaver or something like "dark screen"?