Lost WUs in Ubuntu 11.10; completed one that will not upload
Posted: Tue Feb 14, 2012 10:21 pm
by Agencyman
I came back to this area because I'm apparently incapable of graduating up the ladder.
I have just completed a 6903 in just under 2 days (I just did the math and changed this from 2-3/4 days), and it won't upload. I'll include the text of the log at the bottom. The only thing I can think of is that I did a [Ctrl + C] to wake up the terminal, which seemed to be stopped with no prompt. I did a "-send all" and it claims there are no finished units to send.
When this happened with a 6904 a week or so ago, I decided to do a clean new build of Ubuntu 11.10 on its own HDD and start over.
Can this be salvaged, or has the new WU I just started overwritten the 'Work' folder?
Sorry forgot the log:
Edited to add:
Code:
[19:20:42] Completed 250000 out of 250000 steps (100%)
Writing final coordinates.
Average load imbalance: 7.4 %
Part of the total run time spent waiting due to load imbalance: 3.1 %
Parallel run - timing based on wallclock.
NODE (s) Real (s) (%)
Time: 20149.531 20149.531 100.0
5h35:49
(Mnbf/s) (GFlops) (ns/day) (hour/ns)
Performance: 1353.780 71.210 0.536 44.782
Thanx for Using GROMACS - Have a Nice Day
[19:21:01] DynamicWrapper: Finished Work Unit: sleep=10000
[19:21:11]
[19:21:11] Finished Work Unit:
[19:21:11] - Reading up to 121622496 from "work/wudata_01.trr": Read 121622496
[19:21:12] trr file hash check passed.
[19:21:12] - Reading up to 108766508 from "work/wudata_01.xtc": Read 108766508
[19:21:12] xtc file hash check passed.
[19:21:13] edr file hash check passed.
[19:21:13] logfile size: 208420
[19:21:13] Leaving Run
[19:21:13] - Writing 230770416 bytes of core data to disk...
[19:21:44] Done: 230769904 -> 222431631 (compressed to 3.3 percent)
[19:21:44] ... Done.
^C[19:26:42] ***** Got an Activate signal (2)
[19:26:42] Killing all core threads
Folding@Home Client Shutdown.
bruce@bruce-desktop:~/folding$ ./fah6 -smp -send all
Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.
24 cores detected
--- Opening Log file [February 14 19:26:58 UTC]
# Linux SMP Console Edition ###################################################
###############################################################################
Folding@Home Client Version 6.34
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /home/bruce/folding
Executable: ./fah6
Arguments: -smp -send all -smp 24 -bigadv -verbosity 9
[19:26:58] - Ask before connecting: Yes
[19:26:58] - User name: Agencyman (Team 196420)
[19:26:58] - User ID: 7AFBDD71663B501F
[19:26:58] - Machine ID: 3
[19:26:58]
[19:26:58] Loaded queue successfully.
[19:26:58] Attempting to return result(s) to server...
[19:26:58] Trying to send all finished work units
[19:26:58] + No unsent completed units remaining.
[19:26:58] ***** Got a SIGTERM signal (15)
[19:26:58] Killing all core threads
Folding@Home Client Shutdown.
Mod Edit: Changed Quote Tags To Code Tags - PantherX
Re: Lost WUs in Ubuntu 11.10; completed one that will not upl
Posted: Wed Feb 15, 2012 4:00 am
by PinHead
We need more of the log from the client in question, posted between code tags: '['code']'log goes here'['/code']' (without the quotes, which are there for display purposes only).
Time gaps make detecting the error difficult.
Re: Lost WUs in Ubuntu 11.10; completed one that will not upl
Posted: Wed Feb 15, 2012 5:34 am
by bruce
The normal sequence of messages goes like this.
Code:
Done: 4798915 -> 4745650 (compressed to 98.8 percent)
... Done.
- Shutting down core
Folding@home Core Shutdown: FINISHED_UNIT
CoreStatus = 64 (100)
Sending work to server
The WU is compressed and prepared for uploading until the message "CoreStatus = 64 (100)" is received. During that time, the process cannot be interrupted. The time required to compress depends on the amount of data involved, on the amount of free RAM your system has at that time, and on the type of filesystem being used on the hard disk. (With the default mount options, the delay is MUCH more significant on ext4 than on ext3.) If the process is interrupted during this time, the WU is lost.
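As a rough illustration of the sequence described above, here is a small Python sketch that classifies a client log by how far the end-of-WU sequence has progressed. The function name `finish_state` and the three labels it returns are made up for this example; the only strings it relies on are the "steps (100%)" progress line and the "CoreStatus = 64 (100)" message quoted in this thread. It is a sketch under those assumptions, not something that ships with the client.

```python
def finish_state(log_text: str) -> str:
    """Classify how far the end-of-WU sequence got in a client log.

    The labels "folding", "finalizing" and "finished" are hypothetical
    names used only in this sketch.
    """
    lines = log_text.splitlines()
    # Index of the last progress line that reports 100% of the steps.
    last_full = max(
        (i for i, line in enumerate(lines) if "steps (100%)" in line),
        default=None,
    )
    if last_full is None:
        return "folding"
    # Only look at what was logged after that 100% line.
    tail = lines[last_full:]
    if any("CoreStatus = 64 (100)" in line for line in tail):
        return "finished"
    # 100% reached but no CoreStatus message yet: the WU is still being
    # compressed and prepared, so the client should be left alone.
    return "finalizing"
```

Running it on the text of FAHlog.txt and getting "finalizing" would suggest the client is still in the window where an interruption loses the WU.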
Re: Lost WUs in Ubuntu 11.10; completed one that will not upl
Posted: Wed Feb 15, 2012 9:55 am
by Agencyman
The WU is compressed and prepared for uploading until the message "CoreStatus = 64 (100)" is received. During that time, the process cannot be interrupted
That explains it all, and so it is lost. I had read long ago that using [Ctrl + C] would merely pause the WU. When I have done so, the client would simply resume when the computer was available again.
Thank you for this. I will have to wait it out at the end of the current WU, another 6903 that is only at 25%; that will take about 37 hrs. I will post the time it takes to get from 100% of steps to the end of the uploading process, in case that helps other inexperienced folks avoid such a catastrophe.
Bruce Hinton
Re: Lost WUs in Ubuntu 11.10; completed one that will not upl
Posted: Thu Feb 16, 2012 7:31 pm
by Agencyman
Been sitting here with no change for most of an hour. What now???
This is where I decided to end the "lost one" the other day:
Bruce
Code:
Launch directory: /home/bruce/folding
Executable: ./fah6
Arguments: -smp 24 -bigadv -verbosity 9
[00:48:20] - Ask before connecting: Yes
[00:48:20] - User name: Agencyman (Team 196420)
[00:48:20] - User ID: 7AFBDD71663B501F
[00:48:20] - Machine ID: 3
[00:48:20]
[00:48:20] Loaded queue successfully.
[00:48:20]
[00:48:20] + Processing work unit
[00:48:20] Core required: FahCore_a5.exe
[00:48:20] - Autosending finished units... [February 15 00:48:20 UTC]
[00:48:20] Core found.
[00:48:20] Trying to send all finished work units
[00:48:20] + No unsent completed units remaining.
[00:48:20] - Autosend completed
[00:48:20] Working on queue slot 02 [February 15 00:48:20 UTC]
[00:48:20] + Working ...
[00:48:20] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 02 -np 24 -checkpoint 15 -verbose -lifeline 2260 -version 634'
[00:48:20]
[00:48:20] *------------------------------*
[00:48:20] Folding@Home Gromacs SMP Core
[00:48:20] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[00:48:20]
[00:48:20] Preparing to commence simulation
[00:48:20] - Looking at optimizations...
[00:48:20] - Files status OK
[00:48:24] - Expanded 57247383 -> 71846524 (decompressed 50.4 percent)
[00:48:24] Called DecompressByteArray: compressed_data_size=57247383 data_size=71846524, decompressed_data_size=71846524 diff=0
[00:48:24] - Digital signature verified
[00:48:24]
[00:48:24] Project: 6903 (Run 3, Clone 15, Gen 51)
[00:48:24]
[00:48:25] Assembly optimizations on if available.
[00:48:25] Entering M.D.
[00:48:31] Using Gromacs checkpoints
[00:48:35] Mapping NT from 24 to 24
[00:48:41] Resuming from checkpoint
[00:48:54] Verified work/wudata_02.log
[00:48:55] Verified work/wudata_02.trr
[00:48:55] Verified work/wudata_02.xtc
[00:48:55] Verified work/wudata_02.edr
[00:48:56] Completed 12625 out of 250000 steps (5%)
[01:14:14] Completed 15000 out of 250000 steps (6%)
[01:40:54] Completed 17500 out of 250000 steps (7%)
[02:07:38] Completed 20000 out of 250000 steps (8%)
[02:34:19] Completed 22500 out of 250000 steps (9%)
[03:00:58] Completed 25000 out of 250000 steps (10%)
[03:27:42] Completed 27500 out of 250000 steps (11%)
[03:54:24] Completed 30000 out of 250000 steps (12%)
[04:20:57] Completed 32500 out of 250000 steps (13%)
[04:47:20] Completed 35000 out of 250000 steps (14%)
[05:13:39] Completed 37500 out of 250000 steps (15%)
[05:40:01] Completed 40000 out of 250000 steps (16%)
[06:06:26] Completed 42500 out of 250000 steps (17%)
[06:32:45] Completed 45000 out of 250000 steps (18%)
[06:59:33] Completed 47500 out of 250000 steps (19%)
[07:26:13] Completed 50000 out of 250000 steps (20%)
[07:52:41] Completed 52500 out of 250000 steps (21%)
[08:19:06] Completed 55000 out of 250000 steps (22%)
[08:44:49] Completed 57500 out of 250000 steps (23%)
[09:10:30] Completed 60000 out of 250000 steps (24%)
[09:36:41] Completed 62500 out of 250000 steps (25%)
[10:03:13] Completed 65000 out of 250000 steps (26%)
[10:29:54] Completed 67500 out of 250000 steps (27%)
[10:55:54] Completed 70000 out of 250000 steps (28%)
[11:21:57] Completed 72500 out of 250000 steps (29%)
[11:47:58] Completed 75000 out of 250000 steps (30%)
[12:14:02] Completed 77500 out of 250000 steps (31%)
[12:42:00] Completed 80000 out of 250000 steps (32%)
[13:08:55] Completed 82500 out of 250000 steps (33%)
[13:36:33] Completed 85000 out of 250000 steps (34%)
[14:03:10] Completed 87500 out of 250000 steps (35%)
[14:29:03] Completed 90000 out of 250000 steps (36%)
[14:55:57] Completed 92500 out of 250000 steps (37%)
[15:22:10] Completed 95000 out of 250000 steps (38%)
[15:47:54] Completed 97500 out of 250000 steps (39%)
[16:13:35] Completed 100000 out of 250000 steps (40%)
[16:39:13] Completed 102500 out of 250000 steps (41%)
[17:04:45] Completed 105000 out of 250000 steps (42%)
[17:30:07] Completed 107500 out of 250000 steps (43%)
[17:55:42] Completed 110000 out of 250000 steps (44%)
[18:21:12] Completed 112500 out of 250000 steps (45%)
[18:46:39] Completed 115000 out of 250000 steps (46%)
[19:12:11] Completed 117500 out of 250000 steps (47%)
[19:37:47] Completed 120000 out of 250000 steps (48%)
[20:03:19] Completed 122500 out of 250000 steps (49%)
[20:29:42] Completed 125000 out of 250000 steps (50%)
[20:56:09] Completed 127500 out of 250000 steps (51%)
[21:23:13] Completed 130000 out of 250000 steps (52%)
[21:50:22] Completed 132500 out of 250000 steps (53%)
[22:16:51] Completed 135000 out of 250000 steps (54%)
[22:43:33] Completed 137500 out of 250000 steps (55%)
[23:13:30] Completed 140000 out of 250000 steps (56%)
[23:40:10] Completed 142500 out of 250000 steps (57%)
[00:07:03] Completed 145000 out of 250000 steps (58%)
[00:33:44] Completed 147500 out of 250000 steps (59%)
[01:00:32] Completed 150000 out of 250000 steps (60%)
[01:27:30] Completed 152500 out of 250000 steps (61%)
[01:54:19] Completed 155000 out of 250000 steps (62%)
[02:20:39] Completed 157500 out of 250000 steps (63%)
[02:46:52] Completed 160000 out of 250000 steps (64%)
[03:13:13] Completed 162500 out of 250000 steps (65%)
[03:39:41] Completed 165000 out of 250000 steps (66%)
[04:06:08] Completed 167500 out of 250000 steps (67%)
[04:32:33] Completed 170000 out of 250000 steps (68%)
[04:58:51] Completed 172500 out of 250000 steps (69%)
[05:24:55] Completed 175000 out of 250000 steps (70%)
[05:51:10] Completed 177500 out of 250000 steps (71%)
[06:17:36] Completed 180000 out of 250000 steps (72%)
[06:44:01] Completed 182500 out of 250000 steps (73%)
[07:10:29] Completed 185000 out of 250000 steps (74%)
[07:36:30] Completed 187500 out of 250000 steps (75%)
[08:02:30] Completed 190000 out of 250000 steps (76%)
[08:28:39] Completed 192500 out of 250000 steps (77%)
[08:55:15] Completed 195000 out of 250000 steps (78%)
[09:22:06] Completed 197500 out of 250000 steps (79%)
[09:49:53] Completed 200000 out of 250000 steps (80%)
[10:16:20] Completed 202500 out of 250000 steps (81%)
[10:42:46] Completed 205000 out of 250000 steps (82%)
[11:09:10] Completed 207500 out of 250000 steps (83%)
[11:35:13] Completed 210000 out of 250000 steps (84%)
[12:01:08] Completed 212500 out of 250000 steps (85%)
[12:27:03] Completed 215000 out of 250000 steps (86%)
[12:53:40] Completed 217500 out of 250000 steps (87%)
[13:21:03] Completed 220000 out of 250000 steps (88%)
[13:48:18] Completed 222500 out of 250000 steps (89%)
[14:15:14] Completed 225000 out of 250000 steps (90%)
[14:41:18] Completed 227500 out of 250000 steps (91%)
[15:10:46] Completed 230000 out of 250000 steps (92%)
[15:37:18] Completed 232500 out of 250000 steps (93%)
[16:03:24] Completed 235000 out of 250000 steps (94%)
[16:29:28] Completed 237500 out of 250000 steps (95%)
[16:55:43] Completed 240000 out of 250000 steps (96%)
[17:22:01] Completed 242500 out of 250000 steps (97%)
[17:47:59] Completed 245000 out of 250000 steps (98%)
[18:14:22] Completed 247500 out of 250000 steps (99%)
[18:40:34] Completed 250000 out of 250000 steps (100%)
[18:40:54] DynamicWrapper: Finished Work Unit: sleep=10000
[18:41:04]
[18:41:04] Finished Work Unit:
[18:41:04] - Reading up to 121622496 from "work/wudata_02.trr": Read 121622496
[18:41:04] trr file hash check passed.
[18:41:04] - Reading up to 108766700 from "work/wudata_02.xtc": Read 108766700
[18:41:05] xtc file hash check passed.
[18:41:05] edr file hash check passed.
[18:41:05] logfile size: 208488
[18:41:05] Leaving Run
[18:41:07] - Writing 230770676 bytes of core data to disk...
[18:41:54] Done: 230770164 -> 222430633 (compressed to 3.3 percent)
[18:41:54] ... Done.
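For anyone who wants to turn progress lines like these into a pace figure, here is a minimal Python sketch. It assumes the timestamps are plain HH:MM:SS with no date, so a timestamp that goes backwards is taken to mean midnight has passed; the function name `minutes_per_percent` is invented for this example.

```python
import re
from datetime import timedelta

# Matches lines such as:
# [00:48:56] Completed 12625 out of 250000 steps (5%)
PROGRESS = re.compile(
    r"\[(\d\d):(\d\d):(\d\d)\] Completed \d+ out of \d+ steps \((\d+)%\)"
)

def minutes_per_percent(log_text: str) -> float:
    """Average wall-clock minutes per 1% between first and last progress line."""
    points = []
    day = 0
    prev = None
    for match in PROGRESS.finditer(log_text):
        h, m, s, pct = map(int, match.groups())
        t = timedelta(hours=h, minutes=m, seconds=s)
        if prev is not None and t < prev:
            day += 1  # clock went backwards, so assume midnight passed
        prev = t
        points.append((t + timedelta(days=day), pct))
    if len(points) < 2 or points[-1][1] == points[0][1]:
        raise ValueError("need two progress lines at different percentages")
    elapsed = points[-1][0] - points[0][0]
    return elapsed.total_seconds() / 60 / (points[-1][1] - points[0][1])
```

Fed the progress lines above (5% at 00:48:56 through 100% at 18:40:34 the next day), this should come out at roughly 26 minutes per 1%, which fits a 6903 taking just under 2 days on this machine. It says nothing about the time spent after 100%, since the log shown here stops before the "CoreStatus" message.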
Re: Lost WUs in Ubuntu 11.10; completed one that will not upl
Posted: Thu Feb 16, 2012 7:35 pm
by Agencyman
Well OK it must have seen that message; now says 'core shutdown' and FINISHED UNIT. Maybe it will ask to send now.
Bruce
Re: Lost WUs in Ubuntu 11.10; completed one that will not upl
Posted: Thu Feb 16, 2012 7:43 pm
by bruce
Agencyman wrote:Well OK it must have seen that message; now says 'core shutdown' and FINISHED UNIT. Maybe it will ask to send now.
Bruce
There should be no need to "ask it to send now", nor is it even possible until the message "CoreStatus = 64 (100)" is issued. Everything should continue without intervention.
If the WU is permanently hung, we need to figure out why. If it takes an hour or two for the "CoreStatus = 64 (100)" message to be issued because of something on your system and you're getting impatient, that's not necessarily a permanently hung WU. I don't know an easy way to tell the difference.
Re: Lost WUs in Ubuntu 11.10; completed one that will not upl
Posted: Thu Feb 16, 2012 7:55 pm
by Agencyman
And so it did, and after 50 min, the work has gone up. Hopefully.
To the other Bruce, I appreciate your advice, however harsh the news was.
I wonder why these steps take so long when the folding just roars on through?
Re: Lost WUs in Ubuntu 11.10; completed one that will not upl
Posted: Thu Feb 16, 2012 7:58 pm
by Agencyman
FYI, it is my doing that it has to ask; I replied yes to the config question about asking before sending.
Why it took so long is a mystery, but it was enough to tempt me to pull the plug on the last one, and on a 6904 before it, with [Ctrl + C]. So much good work down the drain.
At least this one wasn't lost!!!
Re: Lost WUs in Ubuntu 11.10; completed one that will not upl
Posted: Thu Feb 16, 2012 8:02 pm
by bruce
Agencyman wrote:I wonder why these steps take so long when the folding just roars on through?
See my previous post on the subject. Probably because you have an ext4 filesystem, but you never confirmed whether that applies to you. There are some possible fixes for that sort of problem.
Reconfigure your client to NOT ask for permission to upload. In most systems there's no reason for that to ever be set, even if the system has a manual modem connection.
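One way to confirm which filesystem and mount options apply to the folding directory is to look it up in /proc/mounts. The Python sketch below does that, assuming a Linux system and an absolute path; the helper names `parse_mounts` and `filesystem_for` are invented for this example.

```python
def parse_mounts(mounts_text: str) -> list:
    """Return (mount_point, fs_type, options) for each line of /proc/mounts text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Format: device mount_point fs_type options dump pass
        if len(fields) >= 4:
            entries.append((fields[1], fields[2], fields[3]))
    return entries

def filesystem_for(path: str, mounts_text: str) -> tuple:
    """Pick the entry whose mount point is the longest prefix of an absolute path."""
    best = None
    for mount_point, fs_type, options in parse_mounts(mounts_text):
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or (path + "/").startswith(prefix):
            # ">=" lets a later entry for the same mount point win,
            # since a later mount shadows an earlier one.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type, options)
    if best is None:
        raise LookupError(f"no mount point found for {path}")
    return best
```

For the launch directory shown in the logs, a call might look like `filesystem_for("/home/bruce/folding", open("/proc/mounts").read())`; the second element of the result is the filesystem type (for example ext3 or ext4) and the third is the list of mount options in effect.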
Re: Lost WUs in Ubuntu 11.10; completed one that will not upl
Posted: Thu Feb 16, 2012 8:37 pm
by Agencyman
I had set it to ask, just so that if I wanted to drop down from -bigadv I'd run config again, but apparently giving it permission to upload made it decide it had permission to download, so another 6903 just fired up. Oh well. Progress
Re: Lost WUs in Ubuntu 11.10; completed one that will not upl
Posted: Thu Feb 16, 2012 8:44 pm
by Agencyman
I'll work on the ext3 vs. ext4 thing. I simply put in an extra drive and let it build a single working partition. There is probably a small system partition, but no user-sized ones.
Bruce
Re: Lost WUs in Ubuntu 11.10; completed one that will not upl
Posted: Thu Feb 16, 2012 9:20 pm
by 7im
Agencyman wrote:I had set it to ask, just so that if I wanted to drop down from -bigadv I'd run config again, but apparently giving it permission to upload made it decide it had permission to download, so another 6903 just fired up. Oh well. Progress
Use the prompt setting together with -oneunit. The client will prompt when done and then shut down; it will not download a new work unit. You will have to manually restart the client to continue folding, but since you are already there to answer the prompt, restarting isn't a big deal.
Re: Lost WUs in Ubuntu 11.10; completed one that will not upl
Posted: Thu Feb 16, 2012 10:48 pm
by Agencyman
Much better idea. I can change that by pausing it now. Then I can see how the finishing and uploading go. That could be the solution. Not to mention quick upload if I'm on a job somewhere...
Thx!,
Bruce
Re: Lost WUs in Ubuntu 11.10; completed one that will not upl
Posted: Fri Feb 17, 2012 5:46 pm
by Adak
Agencyman wrote:I'll work on the ext3 vs. ext4 thing. I simply put in an extra drive and let it build a single working partition. There is probably a small system partition, but no user-sized ones.
Bruce
You need to re-install Linux, and don't accept the default install (either one of them). Select "advanced mode", and delete the Linux partitions you are now using for folding. Select the option to make a new partition, and be sure to change the file system from the default ext4 to ext3 before you move on to the next partition. The swap partition can be left with its default file system (which is neither ext3 nor ext4).