Re: Suggestion

Posted: Thu Nov 13, 2008 5:02 pm
by MtM
RMAC9.5 wrote:Bruce and 7im,
I am new to Folding but I am also a DC old-timer, and some of what you say makes no sense to me. I am a dial-up user and the fastest way for me to finish and return a WU is to allow me to cache ONE extra WU. For example, I have two ATI Radeon 3850 video cards which I recently bought for Folding. They take anywhere from 8 to 16 hours to complete a GPU WU depending on how many other DC processes are running in parallel. If I could cache ONE extra GPU WU per PC, I would configure these two PCs so that the video cards would run flat out and they would finish 2.5 to 3 GPU WUs per day. Currently, I can't/won't do this because I am not willing for both the video card and the CPU to sit idle for multiple hours per day. Instead of finishing 5 - 6 GPU WUs per day these two PCs finish 2 - 3 GPU WUs per day.

I also have 4 other PCs with empty PCI-E video card slots that could be used for Folding, but the management effort needed to make sure that each GPU folding run finishes in the morning before I go to work or in the evening after I come home from work is simply too great.
There are 3rd party tools like AFC which make it easier for the GPU clients to work in parallel with other DC projects, and there is a lot of information about this on the forums as well.

I run a single GPU and dual Linux SMPs in VMware, and I never idle; only the GPU has had the '24h pause', due to now-resolved problems with the newest cores. What you're describing is not what should happen, and frankly not what does happen; you're having problems which 99% of the donor community does not experience. So what you need to do, imho, is open a thread about your issues and ask for help, rather than blame the project for your particular configuration problems.

Re: Suggestion

Posted: Thu Nov 13, 2008 5:05 pm
by MtM
Rattledagger wrote:
7im wrote:I think we have had too much speculation about BOINC vs. FAH already. Bringing it up again doesn't change the previous answers Rattledagger.

BOINC would not be faster. Its SMP support is to download 4 work units, one for each of the 4 processors, NOT to process a single work unit 4 times faster like FAH does. I don't see how Max WUs = 1 changes that or makes BOINC faster. :roll:
BOINC v6.3.xx supports both SMP and GPU crunching, so there's no problem running an application that, for example, uses 5 CPUs. Projects can even specify that they use a fractional CPU, for example "use 1 GPU + 0.5 CPU".

These, together with "max_wus_in_progress", which can be used to disable caching of WUs in a particular project, are fairly new features, not present during the "beta" test back in 2005.

The only info posted by Pande Group is "some issues", with no specification of what these "issues" are...
And since some of the BOINC projects I've seen have to process a single work unit as many as 2 or 3 times to verify correct computations, it is 2 or 3 times more wasteful than FAH, because WUs only need to be processed once for FAH.
Hmm, what has this to do with whatever scheduling needs FAH would have?

Each project chooses the replication it needs to make sure its results are scientifically usable. Some projects run with 1 result/WU, just like FAH does; AFAIK the maximum is 5 results/WU issued with 3 validated results/WU required, which LHC@home uses since they need the results back ASAP.

Projects can even choose to use another fairly new feature called "adaptive replication", a method that should give around 1.1 results/WU instead of the 2 results/WU that is common when replication is needed. At least one project uses this method, while SETI@home is considering it and is currently testing it on their beta test.
Please list the current projects supporting GPU crunching with BOINC. I know of only one, and they're not nearly as far along as F@H is in client development and therefore scientific results.

Also, please read the reply I posted above comparing F@H's current approach with the old uniprocessor approach BOINC uses. It doesn't matter that you specify x CPUs; they're all just single uniprocessor clients, each working on its own core. That's not SMP.

Re: Suggestion

Posted: Thu Nov 13, 2008 7:25 pm
by Rattledagger
MtM wrote:Please list the current projects supporting GPU crunching with BOINC. I know of only one, and they're not nearly as far along as F@H is in client development and therefore scientific results.
There's only one currently available to the public, with at least 2 other projects working on CUDA/GPU, including one for Mac.
Also, please read the reply I posted above comparing F@H's current approach with the old uniprocessor approach BOINC uses. It doesn't matter that you specify x CPUs; they're all just single uniprocessor clients, each working on its own core. That's not SMP.
It wouldn't make sense for a project to specify more than 1 CPU if they haven't got a multi-threaded application that can effectively use more than 1 CPU to speed up crunching of a single WU... BOINC v6.3.xx handles the scheduling of GPU and CPU, including SMP applications, but it's up to the individual projects to program the actual applications. Currently no project has made an SMP application available to the public, but it's been used internally.

Re: Suggestion

Posted: Thu Nov 13, 2008 11:41 pm
by MtM
Rattledagger wrote:
MtM wrote:Please list the current projects supporting GPU crunching with BOINC. I know of only one, and they're not nearly as far along as F@H is in client development and therefore scientific results.
There's only one currently available to the public, with at least 2 other projects working on CUDA/GPU, including one for Mac.
Also, please read the reply I posted above comparing F@H's current approach with the old uniprocessor approach BOINC uses. It doesn't matter that you specify x CPUs; they're all just single uniprocessor clients, each working on its own core. That's not SMP.
It wouldn't make sense for a project to specify more than 1 CPU if they haven't got a multi-threaded application that can effectively use more than 1 CPU to speed up crunching of a single WU... BOINC v6.3.xx handles the scheduling of GPU and CPU, including SMP applications, but it's up to the individual projects to program the actual applications. Currently no project has made an SMP application available to the public, but it's been used internally.
If there are no projects active, you have no point to make.

Re: Suggestion

Posted: Fri Nov 14, 2008 5:41 am
by RMAC9.5
MtM wrote,
What you're describing is not what should happen, and frankly not what does happen; you're having problems which 99% of the donor community does not experience. So what you need to do, imho, is open a thread about your issues and ask for help, rather than blame the project for your particular configuration problems.
We seem to be having what I would call a classic "failure to communicate" situation.

First, while dial up users like myself are probably a small and growing smaller part of the donor community, I rather doubt that we have dropped all the way down to 1%. I will also argue that I actually represent all donors who don't have an "always connected" connection to the Internet.

Second, I am not complaining or blaming the project for my particular configuration problems. Acting on the assumption that the Pande group wants to maximize their donor productivity, I am suggesting a policy change that I believe will increase the total number of GPU WUs crunched and decrease the average time needed to download, process, and upload GPU WUs.

Re: Suggestion

Posted: Fri Nov 14, 2008 8:18 am
by MtM
RMAC9.5 wrote:MtM wrote,
What you're describing is not what should happen, and frankly not what does happen; you're having problems which 99% of the donor community does not experience. So what you need to do, imho, is open a thread about your issues and ask for help, rather than blame the project for your particular configuration problems.
We seem to be having what I would call a classic "failure to communicate" situation.

First, while dial up users like myself are probably a small and growing smaller part of the donor community, I rather doubt that we have dropped all the way down to 1%. I will also argue that I actually represent all donors who don't have an "always connected" connection to the Internet.

Second, I am not complaining or blaming the project for my particular configuration problems. Acting on the assumption that the Pande group wants to maximize their donor productivity, I am suggesting a policy change that I believe will increase the total number of GPU WUs crunched and decrease the average time needed to download, process, and upload GPU WUs.
Possible scenario: they want to make sure the client's foundation works well before going into other things, one of which might be setting up project ranges to run on fast hardware which don't need the quick return times.
Plausible outcome: by the time they are ready for this, the group on dial-up, which is going to get smaller and smaller with time, might have dissolved completely?

As to your second part, what you say goes directly against the serial nature of the project at present, given the state of clients/servers/cores. You will maybe cause a small, almost negligible increase in the number of WUs crunched, but you will also introduce lag into a system which at present depends on return times rather than sheer volume.

Sorry about the failure to communicate. I have an autistic spectrum disorder, it seems, and it does get in the way of my social and communicative skills a lot :)

Re: Suggestion

Posted: Fri Nov 14, 2008 4:26 pm
by 7im
The suggestion for WU caching has been a perpetual suggestion. There have been many threads on this topic already. Thank you for reminding the project that it is still a feature request, though growing less needed as more move to broadband connections. Unfortunately, this is not high on the priority list due to the limited resources at Pande Group, and the higher broadband usage over time which tends to resolve the problem moving forward.

Alternately, I would comment that the use of a dial-up connection with multiple high performance clients is counterintuitive, although I am sure the project appreciates the efforts you make to maximize your contributions.

Re: Suggestion

Posted: Sat Nov 15, 2008 10:06 am
by codysluder
The BOINC project has done an excellent job of selling their donors on the fact that the BOINC concepts are universally the best way to do things, and those donors come here periodically to sell FAH on doing it their way. That seems to be happening here. How about we accept the fact that the systems are intentionally designed differently and stop trying to convince anyone of anything.

The original question was what about dial-up? There are two parts of the question.
1) How long the client needs to wait for a connection to be established. If the modem is configured to dial whenever a connection is needed, that time is minimized.
2) How long is the CPU idle waiting for the data to be transferred? For a moderately fast connection, the issue is unimportant. In my case, the CPU is idle for about 20 seconds every couple of days. You can't convince me that's worth making any system-wide changes. As a general rule, a slow connection should select small WUs to maximize the ratio of processing time to data transfer time.

If the Pande Group decides to improve the software in this area, good. If not, then the best we can do is work with the recommended configurations that minimize wasted computing time. Additional arguing over it will not convince the software gurus of anything. 7im has already said the suggestion has been received.

Re: Suggestion

Posted: Sat Nov 15, 2008 7:29 pm
by MtM
codysluder wrote:The BOINC project has done an excellent job of selling their donors on the fact that the BOINC concepts are universally the best way to do things, and those donors come here periodically to sell FAH on doing it their way. That seems to be happening here. How about we accept the fact that the systems are intentionally designed differently and stop trying to convince anyone of anything.

The original question was what about dial-up? There are two parts of the question.
1) How long the client needs to wait for a connection to be established. If the modem is configured to dial whenever a connection is needed, that time is minimized.
2) How long is the CPU idle waiting for the data to be transferred? For a moderately fast connection, the issue is unimportant. In my case, the CPU is idle for about 20 seconds every couple of days. You can't convince me that's worth making any system-wide changes. As a general rule, a slow connection should select small WUs to maximize the ratio of processing time to data transfer time.

If the Pande Group decides to improve the software in this area, good. If not, then the best we can do is work with the recommended configurations that minimize wasted computing time. Additional arguing over it will not convince the software gurus of anything. 7im has already said the suggestion has been received.
Repeating arguments will not help either.

Re: Suggestion

Posted: Sun Nov 16, 2008 2:13 pm
by Ren02
codysluder wrote: 2) How long is the CPU idle waiting for the data to be transferred? For a moderately fast connection, the issue is unimportant. In my case, the CPU is idle for about 20 seconds every couple of days. You can't convince me that's worth making any system-wide changes. As a general rule, a slow connection should select small WUs to maximize the ratio of processing time to data transfer time.
I have cable connection with 2Mbps download and 256Kbps upload. Don't know if you consider this moderate or agonizingly slow. ;)
The result files of A2 SMP WUs are ~25MB.
My connection to Stanford is usually 25-26KB/s so uploading takes about 16 minutes.
If I forget to turn off bittorrent for the upload period then the connection is just 6-7KB/s, that's about 1 hour of idle time. It takes me 18 hours to complete the WU.

The proportion of idle time is not that insignificant actually. Still, I don't think caching is the solution. If Stanford would download new WU before uploading the results of the previous one, I'd be pretty happy already. It's not going to happen though, because "it has been suggested before". ;)
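
The upload figures above can be checked with a little arithmetic. This is only a rough sketch: it assumes 1 MB = 1024 KB and takes the midpoints of the quoted ranges (25.5 KB/s and 6.5 KB/s), which are assumptions rather than measurements.

```python
def upload_minutes(size_mb: float, rate_kb_per_s: float) -> float:
    """Minutes needed to upload size_mb megabytes at rate_kb_per_s KB/s."""
    return size_mb * 1024 / rate_kb_per_s / 60


def idle_fraction(upload_min: float, compute_hours: float) -> float:
    """Share of one WU cycle (compute + upload) spent idle during the upload."""
    compute_min = compute_hours * 60
    return upload_min / (compute_min + upload_min)


# 25 MB result file, 18 hours of computing per WU (figures from the post);
# the two rates are assumed midpoints of the quoted ranges.
for label, rate in (("normal", 25.5), ("with bittorrent", 6.5)):
    minutes = upload_minutes(25, rate)
    share = idle_fraction(minutes, 18)
    print(f"{label}: {minutes:.0f} min upload, {share:.1%} of the cycle idle")
```

Under those assumptions the idle share is on the order of 1.5% in the normal case and a little under 6% when the upload competes with other traffic, counting only the upload and not the download of the next WU.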

Re: Suggestion

Posted: Wed Nov 19, 2008 5:29 am
by codysluder
MtM wrote:Repeating arguments will not help either.
Ren02 wrote:The proportion of idle time is not that insignificant actually. Still, I don't think caching is the solution. If Stanford would download new WU before uploading the results of the previous one, I'd be pretty happy already. It's not going to happen though, because "it has been suggested before". ;)
Such a pessimist. There is no advantage to repeatedly complaining about something that's already on the suggestion list, but that doesn't mean it is not going to happen. It just means that they're busy fixing the serious issues with the gpu and smp and ps3 clients and this doesn't get as high a priority. I'm confident that it'll happen someday, but not on a schedule that you can influence.

Re: Suggestion

Posted: Wed Nov 26, 2008 3:36 pm
by Mr.Nosmo
After reading this thread, mostly about different DC clients, I understand and accept that Pande Group has a lot to fix that is more important (getting the ATI GPU to use all the power in the cards, I hope), BUT I must say that it would be nice if you could just get a new WU before the old one finished uploading, because I have the fastest internet connection on offer here in Marbella, Spain, but it still takes me 12+ minutes to upload an SMP WU!
Pande Group talks a lot about fast returns, and for a good reason, but they still ask us to run only 1 SMP WU on an Intel quad-core. For me that is not perfect, because the Intel design is #¤%&! (only a dual dual-core until a few days ago, when Core i7 was released) and the FSB overhead of using all 4 cores is too big. E.g. a 2400MHz Q6600 running 2xSMP will finish in the same time as a 2000MHz one will finish a single WU. For me this means that "We want fast returns" is not looking at reality, because it also means that a lot of idle CPU is not used and a lot of extra science is "lost"!
In my "perfect" world, and as has been said before, please let us get a new WU as soon as the "old" one has finished, so we can start working on the new one while the result is uploading, and consider officially supporting people using 2xSMP clients & AffinityChanger to make more calculations (at least until most people are running on real quad/octo/hex-core CPUs). This way we can do more science in the same time!

Maybe I'm out of line, but if I look at reality in my line of work (logistics), the company I work for has limited all our trucks to a maximum of 85 km/h (the speed limit is normally 80 km/h, but most trucks drive at the electronic limit of 90 km/h), because we save 11% fuel (less CO2). Put another way: when you overclock your PC and need to increase Vcore, the heat/energy use increases a lot more than the MHz percentage. Scaled to the extreme: we might find a cure for cancer, but we die from global warming! What can we learn from this? I think: fast is not always best!

Re: Suggestion

Posted: Wed Nov 26, 2008 7:56 pm
by Veix
So maybe we could discuss how to implement (with verification) the "download new WU before uploading the results of the previous one" idea? Or is that too much inside info at PG?

Re: Suggestion

Posted: Wed Nov 26, 2008 11:36 pm
by 7im
Go ahead and discuss it. Having a good logic tree for the process would certainly make it easier for Pande Group to implement, if they ever do get around to doing this. But again the priority is low because as broadband density and speeds increase, the small delays get smaller, so the problem tends to solve itself. And there are problems yet to solve that would have a much larger impact on the project, so they get higher priority.

Alternately, if an easy to implement and use method were found, it might also get worked in during some other critical update. Have at it. ;)
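
As a starting point for such a logic tree, here is a minimal sketch of the "fetch the next WU while the previous result uploads" idea. It does not describe how the FAH client or servers actually work: `fetch_wu`, `compute`, and `upload` are hypothetical callables supplied by the caller, and the verification question raised above is not addressed.

```python
import threading
from typing import Callable, Optional


def run_overlapped(
    fetch_wu: Callable[[], str],
    compute: Callable[[str], str],
    upload: Callable[[str], None],
    units: int,
) -> None:
    """Process `units` WUs, letting each upload overlap the next WU's work."""
    pending: Optional[threading.Thread] = None
    for _ in range(units):
        wu = fetch_wu()        # previous upload (if any) is still running
        result = compute(wu)   # the core keeps working during that upload
        if pending is not None:
            pending.join()     # allow at most one upload in flight
        pending = threading.Thread(target=upload, args=(result,))
        pending.start()        # upload in the background, loop continues
    if pending is not None:
        pending.join()         # wait for the final upload before exiting
```

In this sketch the client never holds more than the WU it is currently computing, so it overlaps transfers with work without caching extra units.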

Re: Suggestion [or how to eliminate slack time between WUs?]

Posted: Thu Nov 27, 2008 2:27 am
by shatteredsilicon
You're not supposed to be pre-loading units as a lot of people citing PG advisories will tell you, because "it's bad for the project".

But since we don't live in an ideal world - you can pre-load a WU by using 2 clients for each "resource" (CPU or GPU). You run the client with -oneunit and set up a process to monitor the current WU progress. When it hits some threshold (e.g. 99%), it fires up the secondary client and lets the previous one finish. The optimal threshold would need tuning according to your CPU/GPU speed and internet connection. Pretty easy to implement if you are running on Linux and are familiar with shell scripting.
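
The monitoring step described above could be sketched roughly as follows. Everything except the `-oneunit` flag is an assumption: the progress file and its "Progress: NN%" line format are hypothetical, and the command used to start the secondary client is left to the caller.

```python
import re
import subprocess
import time
from pathlib import Path
from typing import List, Optional

# Assumed format of a progress line; adjust to whatever your client writes.
PROGRESS_RE = re.compile(r"Progress:\s*(\d+)\s*%")


def read_progress(text: str) -> Optional[int]:
    """Return the percentage from a 'Progress: NN%' line, or None if absent."""
    match = PROGRESS_RE.search(text)
    return int(match.group(1)) if match else None


def should_start_secondary(progress: Optional[int], threshold: int) -> bool:
    """True once the primary WU has reached the chosen threshold."""
    return progress is not None and progress >= threshold


def watch(progress_file: Path, secondary_cmd: List[str], secondary_dir: Path,
          threshold: int = 99, poll_seconds: int = 60) -> subprocess.Popen:
    """Poll the primary client's progress, then launch the secondary client."""
    while True:
        text = progress_file.read_text() if progress_file.exists() else ""
        if should_start_secondary(read_progress(text), threshold):
            # secondary_cmd is caller-supplied, e.g. [client_path, "-oneunit"]
            return subprocess.Popen(secondary_cmd, cwd=secondary_dir)
        time.sleep(poll_seconds)
```

A call might look like `watch(Path("client1/progress.txt"), ["./your-fah-client", "-oneunit"], Path("client2"))`, where both paths are placeholders. The 99% default follows the example threshold above and would need the same tuning for CPU/GPU speed and connection.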