Recent WUs are too large to upload
Posted: Thu Feb 18, 2021 1:33 am
by Markus_Laker
I have a reasonably powerful PC (12 Threadripper cores, 24 threads), but a pitifully slow Internet connection that I can't upgrade because of my location. A couple of recent WUs have produced 70MB or 90MB of results. I get most of the way through an upload, and then the server times out after an hour, forcing me to start the upload over and over again. It never succeeds, and so I can never upload the results.
Code:
01:12:43:WU00:FS00:Upload 88.91%
01:12:51:WU00:FS00:Upload 89.11%
01:12:57:WU00:FS00:Upload 89.25%
01:13:05:WU00:FS00:Upload 89.45%
01:13:06:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
01:13:07:WU00:Sending unit results: id:00 state:SEND error:NO_ERROR project:17219 run:1068 clone:1 gen:0 core:0xa7 unit:0x0000000100000000000043430000042c
01:13:07:WU00:Uploading 91.67MiB to 128.252.203.11
01:13:07:WU00:Connecting to 128.252.203.11:8080
01:13:14:WU00:Upload 0.20%
01:13:20:WU00:Upload 0.34%
I've repeatedly had to discard results manually, which wastes my machine's time and energy and delays the science. And the repeated attempts to upload results waste the small amount of upload bandwidth I do have -- which is badly needed during lockdown.
Is it possible to stop the collection server from timing out after an hour?
Failing that, is it possible to restrict my client so that it won't download WUs that will produce more than 10 or 20MB of results, so that I can actually upload them within the hour permitted to me?
Is it possible to avoid WUs that use Core A7? From reading around, I understand that that's the core that tends to produce large results.
One thing I'm trying now is to delete my one 20-thread CPU slot (yep, another big WU wastefully discarded) and replace it with two 10-thread slots in the hope that each slot will get smaller jobs. I get fewer PPD that way, and I know that F@H prefers to have one monster slot rather than several smaller ones, but I don't know what else to try.
Thanks for any ideas you can come up with,
Markus
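The rate at which an upload like this crawls along can be read straight off the log. A minimal Python sketch, assuming the client's `HH:MM:SS:...:Upload NN.NN%` line format quoted above and that all lines come from a single upload attempt that does not cross midnight. The function name `upload_rate_mib_per_s` is made up for illustration and is not part of the client:

```python
import re
from datetime import datetime

# Matches progress lines such as "01:12:43:WU00:FS00:Upload 88.91%".
# The "FS00:" part is optional because some log lines omit it.
UPLOAD_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):WU\d+:(?:FS\d+:)?Upload (\d+(?:\.\d+)?)%$"
)

def upload_rate_mib_per_s(lines, total_mib):
    """Estimate throughput from the first and last progress lines."""
    samples = []
    for line in lines:
        match = UPLOAD_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S")
            samples.append((stamp, float(match.group(2))))
    if len(samples) < 2:
        raise ValueError("need at least two progress lines")
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    seconds = (t1 - t0).total_seconds()
    if seconds <= 0:
        raise ValueError("progress lines must span a positive time interval")
    # Percentage points gained, converted to MiB, divided by elapsed time.
    return (p1 - p0) / 100.0 * total_mib / seconds

log = [
    "01:12:43:WU00:FS00:Upload 88.91%",
    "01:12:51:WU00:FS00:Upload 89.11%",
    "01:12:57:WU00:FS00:Upload 89.25%",
    "01:13:05:WU00:FS00:Upload 89.45%",
]
rate = upload_rate_mib_per_s(log, total_mib=91.67)
per_hour = rate * 3600
print(f"{rate * 1024:.1f} KiB/s, about {per_hour:.0f} MiB per hour")
```

For the four progress lines quoted above this works out to roughly 0.0225 MiB/s, or about 81 MiB per hour. That is only a 22-second sample, so treat it as a rough figure, but it fits with a 91.67 MiB upload being cut off at around 89% after an hour.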
Re: Recent WUs are too large to upload
Posted: Thu Feb 18, 2021 9:33 am
by ajm
Maybe hotspotting?
And I heard that, in the past, with dial-up modems, there was a way to get only smaller WUs. You had to add an Expert option "max-packet-size" with value "SMALL". But I don't know if this is still in use.
Re: Recent WUs are too large to upload
Posted: Thu Feb 18, 2021 11:44 am
by gunnarre
There has been an advanced option called "max-packet-size" which could be set to "small", "normal" or "big", but I don't think this option is supported anymore in the newer cores. Might be worth trying?
You can blacklist particular servers by IP address if you want to avoid the servers running A7 core projects. You can do this in your local machine's firewall or your router. Search for "GRO_A7" here
https://apps.foldingathome.org/serverstats
and blacklist the IP addresses in question. A few servers (one at time of writing) have both A8 and A7 projects on them. You probably have to check back on that page periodically to make sure you're not blacklisting GPU or A8 projects in future.
Re: Recent WUs are too large to upload
Posted: Thu Feb 18, 2021 3:16 pm
by Joe_H
The max-packet-size option is still supported. Its default value, normal, allows return file sizes of up to 25 MB. The main issue we have been running into is researchers who have not set up their projects correctly on the servers to require the "big" parameter when a WU's returned results are larger than that.
The issue is not whether the project uses Core_A7, _A8 or the GPU Core_22, but how big the resulting data file becomes after the processing is completed. I will post a reminder to the person running this project.
Re: Recent WUs are too large to upload
Posted: Thu Feb 18, 2021 3:24 pm
by gunnarre
So no user intervention needed once that has been fixed - thanks.
Re: Recent WUs are too large to upload
Posted: Thu Feb 18, 2021 5:43 pm
by Markus_Laker
25MB in one hour would be achievable, even when there's a video call going on elsewhere in the house. Many thanks for your help, everyone. I appreciate it!
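The arithmetic behind that can be sketched in a few lines of Python. It assumes the one-hour cut-off observed in the logs earlier in the thread and treats the quoted sizes as MiB. The helper name `min_rate_kib_per_s` is hypothetical:

```python
def min_rate_kib_per_s(size_mib, window_s=3600):
    """Slowest sustained upload rate that still finishes within the window."""
    return size_mib * 1024 / window_s

# 25 is the default "normal" limit; the other two are sizes seen in the logs.
for size_mib in (25, 83.54, 91.67):
    needed = min_rate_kib_per_s(size_mib)
    print(f"{size_mib:6.2f} MiB needs at least {needed:.1f} KiB/s")
```

A 25 MiB result needs only about 7 KiB/s sustained for the hour, while the 80 to 90 MiB results need roughly 24 to 26 KiB/s.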
Re: Recent WUs are too large to upload
Posted: Thu Feb 18, 2021 6:03 pm
by Joe_H
The setting should be in place now to prevent this project from being assigned to clients with the default setting. For those with faster connections, setting 'max-packet-size' to 'big' should still get you these COVID-related WUs.
I checked my logs. The last time I got WUs from this project, 17219, was a week and a half ago as a beta tester. I did not have any problems uploading, though it did take about 20 minutes over my DSL connection. That is something I try to schedule for the early AM, when I am asleep, as much as possible.
Re: Recent WUs are too large to upload
Posted: Tue Feb 23, 2021 11:58 pm
by bruce
If you're folding with your CPU (FahCore A7 or A8) there's another option.
(I'm assuming you've changed the POWER setting to FULL.) A wide variety of proteins is being folded, with a wide range of atom counts.
See
https://apps.foldingathome.org/psummary
There are several different reasons for the size of the upload package, but one of them is the simple number of atoms. Setting preferences that give you smaller proteins would be a place to start.
One thing to try would be to divide up your CPU into several independent "slots". I would not recommend ever using a slot with a single CPU, but two or more might work. Starting from your 12 Threadripper cores (24 threads), you might try 3 or 4 slots of 6 cores each, perhaps leaving a few unused. Slots with fewer CPUs tend to be assigned smaller proteins, and those tend to end up with smaller upload packages.
This is probably not the best way to maximize your PPD, but you need to experiment and see what works best for you.
Re: Recent WUs are too large to upload
Posted: Wed Feb 24, 2021 11:21 am
by Markus_Laker
Thanks, Bruce.
Unfortunately, the problem has recurred with two slots of ten threads each:
Code:
10:03:40:WU04:FS01:Sending unit results: id:04 state:SEND error:NO_ERROR project:17219 run:2652 clone:0 gen:2 core:0xa7 unit:0x00000000000000020000434300000a5c
10:03:40:WU04:FS01:Uploading 83.54MiB to 128.252.203.11
There's no way my puny ADSL connection can upload 83MiB in an hour. I'm going to have to discard that work unit just to unclog my system. Joe_H, could you have another word with the person running this project, please?
Meanwhile, I'll reconfigure my slots again. I'll see what happens with five slots of four threads each.
Re: Recent WUs are too large to upload
Posted: Wed Feb 24, 2021 11:54 am
by Neil-B
You need to test it of course but there is a danger that having more hopefully smaller WUs running/completing and needing uploading may actually end up needing a greater total upload over time than the single big ones ... You might actually need just to run one or two small slots and not fully utilise your folding power.
It might be quicker, and mean fewer dumped WUs, if you start with, say, a single 4-core slot, let that run, and see whether your network capacity can handle it before trying multiple larger ones.
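The trade-off can be made concrete with a rough Python sketch. Every number in it is hypothetical and chosen only to illustrate the calculation: result sizes and run times per slot are invented, and the link speed of about 81 MiB per hour is an estimate from the upload log earlier in the thread. The `Slot` class and `upload_hours_per_day` function are illustrative names, not part of the client:

```python
from dataclasses import dataclass

@dataclass
class Slot:
    result_mib: float    # size of each finished result (hypothetical)
    hours_per_wu: float  # wall-clock hours to finish one WU (hypothetical)

def upload_hours_per_day(slots, link_mib_per_hour):
    """Hours per day the link spends uploading if every slot folds non-stop."""
    daily_mib = sum(s.result_mib * 24 / s.hours_per_wu for s in slots)
    return daily_mib / link_mib_per_hour

LINK_MIB_PER_HOUR = 81.0  # rough estimate, not a measured figure

one_big = [Slot(result_mib=80, hours_per_wu=8)]
five_small = [Slot(result_mib=20, hours_per_wu=3) for _ in range(5)]

print(f"one big slot:     {upload_hours_per_day(one_big, LINK_MIB_PER_HOUR):.1f} h/day")
print(f"five small slots: {upload_hours_per_day(five_small, LINK_MIB_PER_HOUR):.1f} h/day")
```

With these invented figures the five small slots keep the link busy for longer each day than the single big slot, even though every individual upload is smaller. Real sizes and run times would have to come from your own logs.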
Re: Recent WUs are too large to upload
Posted: Wed Feb 24, 2021 6:19 pm
by bruce
Were those uploads "clean", or did they contain a multitude of error reports?
If the former, I presume that all completed WUs are over 80 MB and a change to the WU is called for.
If the latter, what can we do to identify why you're getting so many errors?
If neither of the above, perhaps it's because the slot is set to run on idle and there were lots of Pause/Resume cycles, which also cause the log files to grow.
Re: Recent WUs are too large to upload
Posted: Thu Feb 25, 2021 4:57 pm
by Markus_Laker
And here comes a third one from the same project that I'll need to discard:
Code:
15:50:16:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:17219 run:2899 clone:1 gen:2 core:0xa7 unit:0x00000001000000020000434300000b53
15:50:16:WU02:FS01:Uploading 83.54MiB to 128.252.203.11
Neil-B, that's an interesting point, but this machine's a couple of years old now. I've been folding more or less flat-out since I bought it, and my Internet connection seems to cope in general, until these 80+MiB uploads turn up.
Bruce, I'm not absolutely sure I understand your question. Large uploads always fail, but it's always because the upload server times out after an hour:
Code:
16:49:56:WU02:FS01:Upload 97.63%
16:50:02:WU02:FS01:Upload 97.78%
16:50:10:WU02:FS01:Upload 98.00%
16:50:16:WU02:FS01:Upload 98.15%
16:50:18:WARNING:WU02:FS01:Exception: Failed to send results to work server: Transfer failed
16:50:18:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:17219 run:2899 clone:1 gen:2 core:0xa7 unit:0x00000001000000020000434300000b53
16:50:18:WU02:FS01:Uploading 83.54MiB to 128.252.203.11
16:50:18:WU02:FS01:Connecting to 128.252.203.11:8080
16:50:25:WU02:FS01:Upload 0.22%
16:50:31:WU02:FS01:Upload 0.37%
How frustrating is that?
It'd be nice if the timeout mechanism could be a bit cleverer and not close the connection if the upload was obviously still making progress.
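To illustrate the difference between the two policies, here is a minimal Python sketch. It is not how the work server is implemented, which this thread does not describe. The one-hour limit is the cut-off observed in the logs, and the 60-second idle limit, the `StallTimeout` class and `fixed_deadline_abort` are all hypothetical:

```python
class StallTimeout:
    """Abort only when no new bytes have arrived for idle_limit_s seconds."""

    def __init__(self, idle_limit_s, now=0.0):
        self.idle_limit_s = idle_limit_s
        self.last_progress_at = now
        self.bytes_seen = 0

    def on_progress(self, total_bytes, now):
        # Only a genuine increase in received bytes resets the idle clock.
        if total_bytes > self.bytes_seen:
            self.bytes_seen = total_bytes
            self.last_progress_at = now

    def should_abort(self, now):
        return now - self.last_progress_at > self.idle_limit_s


def fixed_deadline_abort(started_at, now, limit_s=3600):
    """The behaviour seen in the logs: cut off after a fixed wall-clock limit."""
    return now - started_at > limit_s


# Simulate a slow but steady upload that reports progress every 7 seconds.
timer = StallTimeout(idle_limit_s=60)
stall_aborted = False
last_t = 0
for t in range(0, 4001, 7):
    timer.on_progress(total_bytes=t * 23_000, now=t)
    stall_aborted = stall_aborted or timer.should_abort(now=t)
    last_t = t

print("fixed deadline would abort:", fixed_deadline_abort(0, last_t))
print("stall timeout would abort: ", stall_aborted)
```

In this simulation the fixed deadline cuts the transfer off once it passes the hour, while the stall timeout lets it continue because bytes keep arriving. The stall timeout would still fire if the upload went quiet for more than the idle limit.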
I've not interrupted folding at all: the machine has been folding flat-out for the last 20 hours, without interruption, and that's enough time for several WUs per slot, even with only four threads per slot. Plenty of other, smaller uploads have succeeded in that time.
Now, we know that Joe_H asked the project owner to set the "big" parameter on these WUs. I guess it's possible that the project owner has done so, but that the change didn't take effect on WUs that were already queued up, and we're still working through those. Or it's possible that the project owner hasn't quite got round to making the necessary change yet, which would be annoying.
Re: Recent WUs are too large to upload
Posted: Thu Feb 25, 2021 5:06 pm
by Joe_H
I have asked the researcher to check up on this. The setting was supposed to have gone in. Possibly the setting did not take, or it was removed accidentally.
Re: Recent WUs are too large to upload
Posted: Thu Feb 25, 2021 5:08 pm
by Neil-B
I think what bruce was getting at wasn't the upload failing, but whether, during the processing of these larger-upload WUs, there were any errors that the client/core managed to recover from ... When an error occurs mid-WU and the core can correct it, the error reports can significantly inflate the size of the final package ... Pausing a WU many times can sometimes have the same effect.
If it is always just this one project (which, to be honest, has a very large atom count and a very large base credit, which means it may well just be naturally big), then it is really a matter of waiting for Joe_H's message to hopefully get the project flagged - and yes, it may be that there are some pre-flagged WUs around ... Making the changes may well not be a simple change (it depends on the server admin processes/procedures and the availability of the researcher), but someone will get to it as soon as they can (if they can).
Re: Recent WUs are too large to upload
Posted: Thu Feb 25, 2021 5:17 pm
by Neil-B
I'd check what size my uploads for that project are, but my server is down for a rebuild at the moment, so I haven't got access to those logs ... If someone else spots this and is running 17219 WUs, then perhaps they could confirm the upload sizes they are seeing?