
No Bonus for 6900

Posted: Sun Nov 28, 2010 6:32 pm
by stephen123
I completed my first 6900 unit in about my usual time for bigadv units, but received no bonus. I got 8,955 points. My previous bigadv unit was slightly slower and earned 65,372.

Re: No Bonus for 6900

Posted: Sun Nov 28, 2010 6:46 pm
by P5-133XL
Check to see if the correct passkey is still configured in the client. Also, you can stop getting the bonus if, for any reason, you are not returning at least 80% of your WUs valid and on time.
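The two conditions above can be written down as a small check. This is only a sketch of the rule as stated in this post: the function name, the list-of-booleans representation, and the choice to treat an empty history as "no bonus" are assumptions for illustration, and the time span the servers actually use for the 80% figure is not given here.

```python
def bonus_enabled(passkey: str, results: list, required_rate: float = 0.80) -> bool:
    """Hypothetical helper, not part of any FAH client or server.

    passkey: the passkey configured in the client ("" if none is set).
    results: one boolean per returned WU, True meaning "valid and on time".
    """
    if not passkey:
        return False  # no passkey configured, so no bonus
    if not results:
        return False  # assumption: no history yet means no bonus yet
    success_rate = sum(results) / len(results)
    return success_rate >= required_rate


# 8 good WUs out of 10 meets the 80% threshold; 7 out of 10 does not.
print(bonus_enabled("examplepasskey", [True] * 8 + [False] * 2))  # True
print(bonus_enabled("examplepasskey", [True] * 7 + [False] * 3))  # False
```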

Re: No Bonus for 6900

Posted: Sun Nov 28, 2010 7:12 pm
by stephen123
It may be the 80% issue. It depends what time span the 80% is calculated over. I did have a series of units fail while upgrading my computer. Usually, I get about a 1/3 chance of FAH unit failure if I reboot. But it's been higher recently and I was rebooting a lot while upgrading drives and memory. In hindsight, I suppose I should have stopped FAH for a few days while upgrading, but I wasn't actually aware that unit failure was harmful to FAH. I guess I had not thought through the statistical effect and was just thinking that the system is designed to handle it.

Do you know what time span the 80% is calculated over? How does one recover after falling below 80%? Just rise above 80%? Or is it 10 consecutive units again?

Re: No Bonus for 6900

Posted: Sun Nov 28, 2010 7:21 pm
by ChelseaOilman
According to the WU database, you didn't receive bonus points because you exceeded the preferred deadline of 4 days for a P6900 WU.

Days taken to complete WU: 4.75

Hi stephen123 (team 1971),
Your WU (P6900 R10 C11 G1) was added to the stats database on 2010-11-28 07:05:08 for 8955 points of credit.
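As a minimal sketch of the comparison described above, using only the two figures quoted in this post (4.75 days taken, 4-day preferred deadline). The function name is made up for illustration, and the real stats server may apply additional checks.

```python
def earns_bonus(days_taken: float, preferred_deadline_days: float) -> bool:
    """Hypothetical check: a WU is bonus-eligible only if it is returned
    within the preferred deadline."""
    return days_taken <= preferred_deadline_days


# Figures from the WU database entry for P6900 R10 C11 G1.
print(earns_bonus(days_taken=4.75, preferred_deadline_days=4.0))  # False
```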

Re: No Bonus for 6900

Posted: Sun Nov 28, 2010 7:48 pm
by stephen123
OK, thanks. That means the unit downloaded, ran, failed, started over from scratch and ran again without acquiring a new unit.

I'm including a part of my log in this post, because the failure mode does not look familiar to me:

Code:

[20:50:11] Completed 177500 out of 250000 steps  (71%)
[21:30:33] Completed 180000 out of 250000 steps  (72%)
[21:54:36] ***** Got a SIGTERM signal (15)
[21:54:36] Killing all core threads

Folding@Home Client Shutdown.


--- Opening Log file [November 25 21:55:45 UTC] 


# Mac OS X SMP Console Edition ################################################
###############################################################################

                       Folding@Home Client Version 6.29r3

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /Users/stephen/Library/Folding@home
Executable: /usr/local/fah/fah6
Arguments: -smp 8 -verbosity 9 -bigadv 

[21:55:45] - Ask before connecting: No
[21:55:45] - User name: stephen123 (Team 1971)
[21:55:45] - User ID: XXXXXXXXXX
[21:55:45] - Machine ID: 1
[21:55:45] 
[21:55:46] Loaded queue successfully.
[21:55:46] 
[21:55:46] - Autosending finished units... [21:55:46][21:55:46] + Processing work unit
Trying to send all finished work units
[21:55:46] Core required: FahCore_a3.exe
[21:55:46] Core found.
[21:55:46] + No unsent completed units remaining.
[21:55:46] - Autosend completed
[21:55:46] Working on queue slot 01 [November 25 21:55:46 UTC]
[21:55:46] + Working ...
[21:55:46] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 8 -checkpoint 5 -verbose -lifeline 80 -version 629'

[21:55:46] 
[21:55:46] *------------------------------*
[21:55:46] Folding@Home Gromacs SMP Core
[21:55:46] Version 2.22 (May 7 2010)
[21:55:46] 
[21:55:46] Preparing to commence simulation
[21:55:46] - Looking at optimizations...
[21:55:46] - Files status OK
[21:55:49] - Expanded 24861359 -> 30796293 (decompressed 123.8 percent)
[21:55:49] Called DecompressByteArray: compressed_data_size=24861359 data_size=30796293, decompressed_data_size=30796293 diff=0
[21:55:49] - Digital signature verified
[21:55:49] 
[21:55:49] Project: 6900 (Run 10, Clone 11, Gen 1)
[21:55:49] 
[21:55:50] Assembly optimizations on if available.
[21:55:50] Entering M.D.
[21:55:56] Using Gromacs checkpoints
[21:56:06] fcSaveRestoreState: I/O failed dir=0, var=B068FFB4, varsize=20
[21:56:06] fcCheckPointResume: failure in call to fcSaveRestoreState() to restore cpt hash.
[21:56:07] fcSaveRestoreState: I/O failed dir=0, var=B058BFB4, varsize=20
[21:56:07] fcCheckPointResume: failure in call to fcSaveRestoreState() to restore cpt hash.
[21:56:07] fcSaveRestoreState: I/O failed dir=0, var=B060DFB4, varsize=20
[21:56:07] fcCheckPointResume: failure in call to fcSaveRestoreState() to restore cpt hash.
[21:56:07] mdrun returned 3
[21:56:07] Gromacs detected an invalid checkpoint.  Restarting...fcSaveRestoreState: I/O failed dir=0, var=B0383FB4, varsize=20
[21:56:08] fcCheckPointResume: failure in call to fcSaveRestoreState() to restore cpt hash.
[21:56:08] fcSaveRestoreState: I/O failed dir=0, var=B0509FB4, varsize=20
[21:56:08] fcCheckPointResume: failure in call to fcSaveRestoreState() to restore cpt hash.
[21:56:09] Can't open checkpoint file 
[21:56:09] Can't open checkpoint file 
[21:56:09] Resuming from checkpoint
[21:56:09] Can't open checkpoint file 
[21:56:32] 
[21:56:32] Folding@home Core Shutdown: UNKNOWN_ERROR
[21:56:32] CoreStatus = 62 (98)
[21:56:32] + Restarting core (settings changed) 
[21:56:32] 
[21:56:32] + Processing work unit
[21:56:32] Core required: FahCore_a3.exe
[21:56:32] Core found.
[21:56:32] Working on queue slot 01 [November 25 21:56:32 UTC]
[21:56:32] + Working ...
[21:56:32] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 8 -checkpoint 5 -notermcheck -verbose -lifeline 80 -version 629'

[21:56:33] 
[21:56:33] *------------------------------*
[21:56:33] Folding@Home Gromacs SMP Core
[21:56:33] Version 2.22 (May 7 2010)
[21:56:33] 
[21:56:33] Preparing to commence simulation
[21:56:33] - Looking at optimizations...
[21:56:33] - Not checking prior termination.
[21:56:35] - Expanded 24861359 -> 30796293 (decompressed 123.8 percent)
[21:56:35] Called DecompressByteArray: compressed_data_size=24861359 data_size=30796293, decompressed_data_size=30796293 diff=0
[21:56:36] - Digital signature verified
[21:56:36] 
[21:56:36] Project: 6900 (Run 10, Clone 11, Gen 1)
[21:56:36] 
[21:56:36] Assembly optimizations on if available.
[21:56:36] Entering M.D.
[21:56:48] Completed 0 out of 250000 steps  (0%)
[22:34:59] Completed 2500 out of 250000 steps  (1%)
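A rough back-of-the-envelope estimate of what the restart cost can be made from the timestamps in the log above. It assumes the time per 1% (2500 steps) stays roughly constant and that the machine folds continuously; both are simplifications, and the helper name is only illustrative.

```python
from datetime import datetime, timedelta


def seconds_between(start: str, end: str) -> float:
    """Seconds from one HH:MM:SS log timestamp to the next (same or next day)."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)  # the clock wrapped past midnight
    return delta.total_seconds()


# 71% -> 72% before the shutdown, and 0% -> 1% after the restart.
frame_before = seconds_between("20:50:11", "21:30:33")  # 2422 s per 1%
frame_after = seconds_between("21:56:48", "22:34:59")   # 2291 s per 1%

SECONDS_PER_DAY = 86400
lost_days = 72 * frame_before / SECONDS_PER_DAY    # the 72% that was discarded
rerun_days = 100 * frame_after / SECONDS_PER_DAY   # one full run from 0%

print(f"work lost:      {lost_days:.2f} days")
print(f"full rerun:     {rerun_days:.2f} days")
print(f"total estimate: {lost_days + rerun_days:.2f} days")
```

Under those assumptions the discarded 72% is about two days of work, and a full rerun adds roughly 2.65 more, which is in the same range as the 4.75 days recorded in the WU database.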

Re: No Bonus for 6900

Posted: Sun Nov 28, 2010 7:50 pm
by toTOW
Judging by the error messages, it failed to resume from checkpoint :(
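For anyone who wants to spot this in their own FAHlog, here is a small sketch that scans for "Completed N out of M steps" lines and reports any point where the step count goes backwards. The regular expression is based only on the log lines quoted in this thread, and the function name is hypothetical; other client or core versions may format progress lines differently.

```python
import re

# Matches lines such as "[21:30:33] Completed 180000 out of 250000 steps  (72%)".
STEP_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] Completed (\d+) out of (\d+) steps")


def find_progress_resets(log_lines):
    """Return (before, after) pairs where the completed step count dropped."""
    resets = []
    previous = None
    for line in log_lines:
        match = STEP_RE.match(line)
        if not match:
            continue  # not a progress line
        current = (match.group(1), int(match.group(2)))
        if previous is not None and current[1] < previous[1]:
            resets.append((previous, current))
        previous = current
    return resets


SAMPLE = [
    "[21:30:33] Completed 180000 out of 250000 steps  (72%)",
    "[21:54:36] ***** Got a SIGTERM signal (15)",
    "[21:56:07] Gromacs detected an invalid checkpoint.",
    "[21:56:48] Completed 0 out of 250000 steps  (0%)",
    "[22:34:59] Completed 2500 out of 250000 steps  (1%)",
]

for before, after in find_progress_resets(SAMPLE):
    print(f"progress fell from {before[1]} steps at {before[0]} "
          f"to {after[1]} steps at {after[0]}")
```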

Re: No Bonus for 6900

Posted: Sun Nov 28, 2010 8:27 pm
by stephen123
OK, thanks. Assuming it doesn't repeat, I guess this is resolved.

Re: No Bonus for 6900

Posted: Mon Nov 29, 2010 3:36 am
by codysluder
stephen123 wrote: OK, thanks. That means the unit downloaded, ran, failed, started over from scratch and ran again without acquiring a new unit.

I'm including a part of my log in this post, because the failure mode does not look familiar to me:

Code:

[21:56:07] mdrun returned 3
[21:56:07] Gromacs detected an invalid checkpoint.
[21:56:09] Can't open checkpoint file 
[21:56:32] 
[21:56:32] Folding@home Core Shutdown: UNKNOWN_ERROR
[21:56:32] CoreStatus = 62 (98)
[21:56:32] + Restarting core (settings changed) 
[21:56:32] 
[21:56:32] + Processing work unit
[21:56:32] Core required: FahCore_a3.exe
[21:56:32] Core found.
[21:56:32] Working on queue slot 01 [November 25 21:56:32 UTC]
[21:56:32] + Working ...
[21:56:32] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 8 -checkpoint 5 -notermcheck -verbose -lifeline 80 -version 629'

[21:56:33] 
[21:56:33] *------------------------------*
[21:56:33] Folding@Home Gromacs SMP Core
[21:56:33] Version 2.22 (May 7 2010)
[21:56:33] 
[21:56:33] Preparing to commence simulation
[21:56:33] - Looking at optimizations...
[21:56:33] - Not checking prior termination.
[21:56:35] - Expanded 24861359 -> 30796293 (decompressed 123.8 percent)
[21:56:35] Called DecompressByteArray: compressed_data_size=24861359 data_size=30796293, decompressed_data_size=30796293 diff=0
[21:56:36] - Digital signature verified
[21:56:36] 
[21:56:36] Project: 6900 (Run 10, Clone 11, Gen 1)
[21:56:36] 
[21:56:36] Assembly optimizations on if available.
[21:56:36] Entering M.D.
[21:56:48] Completed 0 out of 250000 steps  (0%)
[22:34:59] Completed 2500 out of 250000 steps  (1%)

I've never seen that error either, but it does make sense. While you were upgrading, you probably failed to allow the OS to complete the shutdown process normally and parts of the checkpoint file were still in cache when you killed the power. Whether that's what happened or not, FAH detected an invalid checkpoint and had to start over, just as you said.

Fortunately, the WU won't count against your 80% since you did return the WU by the final deadline. You just exceeded the Preferred Deadline, which results in no bonus.
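To illustrate the point about checkpoint data still sitting in cache when the power goes: a common general technique for making a checkpoint survive an unclean shutdown is to write it to a temporary file, force it to disk, and only then rename it over the old one. The sketch below shows that pattern in generic terms. It is not how FahCore_a3 or Gromacs writes its checkpoints (that is not described in this thread), and the function name and file names are made up for the example.

```python
import os
import tempfile


def write_checkpoint_atomically(path: str, payload: bytes) -> None:
    """Hypothetical helper: replace `path` with `payload` so that a reader
    sees either the complete old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cpt-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            # Ask the OS to push its cached data to the device. On some
            # systems (macOS included) the drive's own cache may still hold it.
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        try:
            os.unlink(tmp_path)  # clean up the partial temporary file
        except FileNotFoundError:
            pass
        raise


# Usage: overwrite a checkpoint twice; only the complete latest version remains.
workdir = tempfile.mkdtemp()
checkpoint = os.path.join(workdir, "state.cpt")
write_checkpoint_atomically(checkpoint, b"step=180000")
write_checkpoint_atomically(checkpoint, b"step=182500")
```

Even with this pattern, cutting power in the middle of a shutdown can still lose the most recent checkpoint; the aim is only that the previous one stays readable.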