
Re: New Assignment Server feedback/problem

Posted: Fri Oct 31, 2014 5:06 pm
by Breach
I have just received a core 17:


16:05:49:WU01:FS01:Connecting to 171.67.108.200:80
16:05:50:WU01:FS01:Assigned to work server 171.67.108.52
16:05:50:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GM204 [GeForce GTX 970] from 171.67.108.52
16:05:50:WU01:FS01:Connecting to 171.67.108.52:8080
16:05:52:WU01:FS01:Downloading 1.53MiB
16:05:53:WU01:FS01:Download complete
16:05:53:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:9201 run:562 clone:3 gen:2 core:0x17 unit:0x0000000b6652edc45399ec2237cfa30d
As is evident, it comes from a different WS: 171.67.108.52. So I'm not sure whether it's a problem with the AS or whether 171.64.65.105 is simply out of Core 17 WUs.
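For anyone who wants to check which WS and core a slot was handed, the relevant fields can be pulled out of FAHClient log lines like the ones above. This is a minimal Python sketch that assumes the line format shown in this excerpt (other client versions may format lines differently). `summarize` is a hypothetical helper, not part of FAHClient.

```python
import re

# Sample lines copied from the log excerpt above.
LOG = """\
16:05:50:WU01:FS01:Assigned to work server 171.67.108.52
16:05:53:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:9201 run:562 clone:3 gen:2 core:0x17 unit:0x0000000b6652edc45399ec2237cfa30d
"""

# Matches the work server IP in an "Assigned to work server" line.
ASSIGNED_RE = re.compile(r"Assigned to work server (\d+\.\d+\.\d+\.\d+)")
# Matches the space-separated key:value pairs after "Received Unit:".
KV_RE = re.compile(r"(\w+):(\S+)")


def summarize(log: str) -> dict:
    """Return the assigned WS, project number and core type found in the log."""
    summary = {}
    for line in log.splitlines():
        m = ASSIGNED_RE.search(line)
        if m:
            summary["work_server"] = m.group(1)
        elif "Received Unit:" in line:
            fields = dict(KV_RE.findall(line.split("Received Unit:", 1)[1]))
            summary["project"] = int(fields["project"])
            summary["core"] = fields["core"]
    return summary


print(summarize(LOG))
# {'work_server': '171.67.108.52', 'project': 9201, 'core': '0x17'}
```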

Re: New Assignment Server feedback/problem

Posted: Fri Oct 31, 2014 5:21 pm
by Joe_H
According to the Project Summaries, you should not expect to get Core_17 WUs from 171.64.65.105. Only Core_15 projects are listed as coming from that WS.

Re: New Assignment Server feedback/problem

Posted: Fri Oct 31, 2014 7:22 pm
by Breach
Thanks. So the next logical question is: is the AS assigning us to this WS considered expected behaviour?

Re: New Assignment Server feedback/problem

Posted: Fri Oct 31, 2014 7:48 pm
by Joe_H
What is expected is that if the AS can not connect a Windows computer with a GPU requesting a new WU to a WS with Core_17 work, then it will get assigned to 171.64.65.105 and be assigned a Core_15 WU. Linux GPU WU requests should get an "Empty work server" message if the AS can not connect to a WS with available Core_17 WUs.
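The fallback described here can be written down as a small decision function. This is a hypothetical simplification for illustration only: the real AS logic has not been published, and the `WorkServer` record, its fields and `assign_gpu` are assumptions. Only the 171.64.65.105 fallback address and the Windows/Linux difference come from this post.

```python
from dataclasses import dataclass
from typing import List, Optional

# Core_15 fallback WS named in this thread.
FALLBACK_CORE15_WS = "171.64.65.105"


@dataclass
class WorkServer:
    address: str
    core: str            # e.g. "0x17" (hypothetical field)
    available_wus: int   # hypothetical stock counter


def assign_gpu(os_name: str, servers: List[WorkServer]) -> Optional[str]:
    """Return the WS a GPU slot would be sent to, or None ("Empty work server")."""
    # First choice: any Core_17 WS that still has work.
    for ws in servers:
        if ws.core == "0x17" and ws.available_wus > 0:
            return ws.address
    # No Core_17 work: Windows GPUs fall back to Core_15, Linux GPUs get nothing.
    if os_name == "windows":
        return FALLBACK_CORE15_WS
    return None


empty = [WorkServer("171.67.108.52", "0x17", 0)]
print(assign_gpu("windows", empty))  # 171.64.65.105
print(assign_gpu("linux", empty))    # None
```

The point of the sketch is only the ordering: a Core_17 WS is tried first, and the OS decides what happens when none has work.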

Re: New Assignment Server feedback/problem

Posted: Fri Oct 31, 2014 7:59 pm
by JimF
Joe_H wrote:What is expected is that if the AS can not connect a Windows computer with a GPU requesting a new WU to a WS with Core_17 work, then it will get assigned to 171.64.65.105 and be assigned a Core_15 WU.
I just finished my sole Core_17, and was assigned another Core_15, so I guess it can not connect to a WS with Core_17 work.

Re: New Assignment Server feedback/problem

Posted: Sat Nov 01, 2014 10:50 am
by Breach
JimF wrote:
Joe_H wrote:What is expected is that if the AS can not connect a Windows computer with a GPU requesting a new WU to a WS with Core_17 work, then it will get assigned to 171.64.65.105 and be assigned a Core_15 WU.
I just finished my sole Core_17, and was assigned another Core_15, so I guess it can not connect to a WS with Core_17 work.
After so much time on FAH I have just discovered this page ;-) :

http://fah-web.stanford.edu/pybeta/serverstat.html

According to that, 171.67.108.52 is 'full' (in full operation, should be giving out WUs), but is also marked as 'Blue' ("Blue - if the AS has decided not to assign to that machine, eg. the AS thinks it is down or out of jobs (blue means iced)"). The WU stats for this WS are null; I guess it's either out of work or there's another reason the AS considers it unavailable...

Re: New Assignment Server feedback/problem

Posted: Sat Nov 01, 2014 11:15 am
by JimF
Breach wrote:According to that, 171.67.108.52 is 'full' (in full operation, should be giving out WUs), but is also marked as 'Blue' ("Blue - if the AS has decided not to assign to that machine, eg. the AS thinks it is down or out of jobs (blue means iced)"). The WU stats for this WS are null; I guess it's either out of work or there's another reason the AS considers it unavailable...
Good find. But I now have two core_17s from that work server, even though it still shows as "blue". I will let PG figure it all out. It seems possible to me, though, that as they transition to core_18 they may have shortages of 17s and have to fill in with 15s.

Re: New Assignment Server feedback/problem

Posted: Sat Nov 01, 2014 11:27 am
by heikosch
To my mind, the problem is that the available WU count is 0 for 171.67.108.52. According to the documentation, the color changes to blue when a WS runs low on available WUs. So there's always a chance to get a WU.

Heiko

Re: New Assignment Server feedback/problem

Posted: Sat Nov 01, 2014 1:44 pm
by Gary480six
To my mind, what has never been explained is why, a month ago, the Maxwell cards were being assigned the P13000 and P13001 work units and completing them just fine.

Then changes were made to the Assignment Server... and suddenly the Maxwell cards could not complete the P13000 work.

Is somebody going to address That issue?

Re: New Assignment Server feedback/problem

Posted: Sat Nov 01, 2014 1:58 pm
by 7im
Gary480six wrote:snip

Is somebody going to address That issue?
Two issues, actually: AS updates and Maxwell support. On the AS updates, no. No one is going to explain them in any more detail than already given. On Maxwell support, new GPU devices and new chip architectures are highly dependent on functional (for computing, not gaming) drivers from the manufacturers, as stated in the install guides. It takes time for both the OEMs and for fah to work out the kinks on new GPUs, especially when that may not be their focus right now.

Re: New Assignment Server feedback/problem

Posted: Sat Nov 01, 2014 2:38 pm
by kimben777
What is the deal with the 171.67.108.52 server showing no or very few WUs since Thursday morning? How long does it take to refill it with WUs?

Re: New Assignment Server feedback/problem

Posted: Sat Nov 01, 2014 3:13 pm
by bruce
kimben777 wrote:What is the deal with the 171.67.108.52 server showing no or very few WUs since Thursday morning? How long does it take to refill it with WUs?
Every time somebody successfully completes a WU, a new WU is generated, so anybody who returns a WU for Core_17 and is assigned a WU for Core_15 is helping to refill the server until it turns non-blue and starts assigning again. Thus a server can alternate between having enough WUs to assign and not having enough. That process is automatic (i.e., it works unattended).
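To illustrate the self-refilling cycle described above, here is a toy Python simulation of one WS. The threshold, return rate and request rate are made-up numbers, and the rule "blue while the stock is below a threshold" is an assumption based on the serverstat legend discussed earlier in the thread, not the actual AS algorithm.

```python
from typing import List, Tuple


def simulate(steps: int, returns_per_step: int, requests_per_step: int,
             threshold: int, start: int = 0) -> List[Tuple[int, bool]]:
    """Toy WS model: per step, record (WUs available afterwards, was the WS blue?)."""
    available = start
    history = []
    for _ in range(steps):
        # Each returned WU causes a new WU to be generated on the server.
        available += returns_per_step
        # Assumed rule: the AS ices (marks blue) a WS whose stock is below the threshold.
        blue = available < threshold
        if not blue:
            # Non-blue: the AS sends requesting clients here until the stock runs out.
            available -= min(requests_per_step, available)
        history.append((available, blue))
    return history


for step, (available, blue) in enumerate(simulate(6, 2, 6, 5), start=1):
    print(step, available, "blue" if blue else "assigning")
```

With these numbers the WS sits blue for two steps, assigns everything it has on the third, and repeats, which is the alternation described in the post.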

A separate issue is whether science NEEDS more WUs. FAH does not assign "busy work" but insists that assignments are actually needed by science, so no answer can be given that doesn't consider the science.

At some point every project reaches the stage where it has "enough" completed WUs to draw the necessary scientific conclusions; the project is then ended, and no new WUs will be added. [You and I have no way of knowing when that's about to happen.]

On the other hand, a project may need many more WUs to be completed, and these can be added by the PI after digesting the completed WUs (and perhaps moving data off-line) to make room for newly generated WUs. [In that case, your question is a good one!] I don't have a good answer, but I do know it takes a fair amount of processing time and a certain amount of manual work.

Re: New Assignment Server feedback/problem

Posted: Sat Nov 01, 2014 3:25 pm
by Gary480six
7im wrote: snip

On Maxwell support, new GPU devices and new chip architectures are highly dependent on functional (for computing, not gaming) drivers from the manufacturers, as stated in the install guides. It takes time for both the OEMs and for fah to work out the kinks on new GPUs, especially when that may not be their focus right now.
7im, I would understand this issue better if the new Maxwell cards had never worked for folding on Windows systems. But they were stable and producing work on the P13000 and P13001 work units for several weeks before everything blew up.

Re: New Assignment Server feedback/problem

Posted: Sat Nov 01, 2014 6:16 pm
by 7im
Do you know about some Kepler GPUs needing to use the older 327.xx driver to fold at full speed? The newer drivers fold just fine also, just slower.

The ability to fold or not fold a single project is not a good indicator. Neither is a newer driver an indicator of a better driver. Fah is very dependent on third party hardware and software that is out of their control.
I don't know whether the AS changes were related or not, but as I said, that won't be explained either. But an AS only routes a connection to a WS and has no effect on the fahcore or the work unit data, so it is unlikely to be the cause.

Re: New Assignment Server feedback/problem

Posted: Sat Nov 01, 2014 8:30 pm
by heikosch
7im wrote:Do you know about some Kepler GPUs needing to use the older 327.xx driver to fold at full speed? The newer drivers fold just fine also, just slower.

The ability to fold or not fold a single project is not a good indicator. Neither is a newer driver an indicator of a better driver. Fah is very dependent on third party hardware and software that is out of their control.
I don't know whether the AS changes were related or not, but as I said, that won't be explained either. But an AS only routes a connection to a WS and has no effect on the fahcore or the work unit data, so it is unlikely to be the cause.
When the new AS was activated (overnight, for me!), my GTX 750Ti began to throw errors with P1300x. The Core 0x17 version didn't change, and I didn't change the nVidia driver or install any other software or updates.
Maybe they changed not only the AS but also, independently, the content of the P1300x WUs. Shortly after that, they stopped assigning P1300x to Maxwell GPUs.

Heiko

PS: You need nVidia 344.11 for the GTX970/980 (344.16 or 344.48 is better), but 337.88 is OK for the GTX 750Ti. I don't remember whether 327.23 works with the GTX 750Ti.
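The driver notes above could be encoded as a simple lookup for checking a rig. Treat the minimums as assumptions: 344.11 for the GTX970/980 is what the post says is needed, while treating 337.88 as the floor for the GTX 750Ti is only a reading of "337.88 is ok". `driver_ok` and `MIN_DRIVER` are hypothetical names, not part of any NVIDIA or FAH tool.

```python
from typing import Tuple

# Assumed minimum drivers, taken from the PS above (not an official compatibility list).
MIN_DRIVER = {
    "GTX970": "344.11",
    "GTX980": "344.11",
    "GTX 750Ti": "337.88",  # assumption: "337.88 is ok" read as the floor
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a driver version like '344.48' into a comparable tuple (344, 48)."""
    return tuple(int(part) for part in version.split("."))


def driver_ok(gpu: str, installed: str) -> bool:
    """True if the installed driver meets the assumed minimum for this GPU."""
    if gpu not in MIN_DRIVER:
        raise ValueError(f"no driver note for {gpu!r}")
    return parse_version(installed) >= parse_version(MIN_DRIVER[gpu])


print(driver_ok("GTX970", "344.48"))     # True
print(driver_ok("GTX980", "337.88"))     # False
print(driver_ok("GTX 750Ti", "337.88"))  # True
```

Versions are compared as integer tuples rather than strings so that, for example, 344.16 correctly ranks above 344.11.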