No Bonus for 6900

Moderators: Site Moderators, FAHC Science Team

stephen123
Posts: 31
Joined: Thu Nov 27, 2008 5:00 pm

No Bonus for 6900

Post by stephen123 »

I completed my first 6900 unit in a time similar to my usual for bigadv units, but received no bonus. I got 8,955 points. My previous bigadv unit was slightly slower and earned 65,372.
P5-133XL
Posts: 2948
Joined: Sun Dec 02, 2007 4:36 am
Hardware configuration: Machine #1:

Intel Q9450; 2x2GB=8GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460; Windows Server 2008 X64 (SP1).

Machine #2:

Intel Q6600; 2x2GB=4GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460 video card; Windows 7 X64.

Machine 3:

Dell Dimension 8400, 3.2GHz P4, 4x512MB Ram, Video card GTX 460, Windows 7 X32

I am currently folding just on the 5x GTX 460s for approx. 70K PPD
Location: Salem. OR USA

Re: No Bonus for 6900

Post by P5-133XL »

Check to see if the correct passkey is still configured in the client. Also, you can stop getting the bonus if, for any reason, you are not returning at least 80% of your WUs valid and on time.
stephen123
Posts: 31
Joined: Thu Nov 27, 2008 5:00 pm

Re: No Bonus for 6900

Post by stephen123 »

It may be the 80% issue, depending on what time span the 80% is calculated over. I did have a series of units fail while upgrading my computer. Usually, I get about a 1/3 chance of FAH unit failure if I reboot, but it's been higher recently and I was rebooting a lot while upgrading drives and memory. In hindsight, I suppose I should have stopped FAH for a few days while upgrading, but I wasn't actually aware that unit failure was harmful to FAH. I guess I had not thought through the statistical effect and was just thinking that the system is designed to handle it.

Do you know what time span the 80% is calculated over? How does one recover after falling below 80%? Just rise above 80%? Or is it 10 consecutive units again?
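
As a rough illustration of the statistical effect mentioned above: if each reboot independently carries about a 1/3 chance of losing the unit (the figure quoted in this post; independence between reboots is an assumption), then the chance that a unit survives n reboots is (2/3)^n. A minimal Python sketch, with a hypothetical function name:

```python
def survival_probability(reboots: int, p_fail: float = 1 / 3) -> float:
    """Chance a unit survives `reboots` reboots, assuming each reboot
    independently fails the unit with probability `p_fail` (assumption)."""
    return (1 - p_fail) ** reboots

# Compare against the 80% threshold mentioned earlier in the thread.
for n in range(4):
    p = survival_probability(n)
    print(f"{n} reboot(s): {p:.3f} {'(below 0.80)' if p < 0.80 else ''}")
```

Under these assumptions, even a single reboot per unit gives an expected success rate of about 67%, which is already under 80%.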
ChelseaOilman
Posts: 1037
Joined: Sun Dec 02, 2007 3:47 pm
Location: Colorado @ 10,000 feet

Re: No Bonus for 6900

Post by ChelseaOilman »

According to the WU database you didn't receive bonus points because you exceeded the preferred deadline of 4 days for a p6900 WU.

Days taken to complete WU: 4.75

Hi stephen123 (team 1971),
Your WU (P6900 R10 C11 G1) was added to the stats database on 2010-11-28 07:05:08 for 8955 points of credit.
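
The bonus conditions mentioned so far in this thread (a configured passkey, at least 80% of WUs returned valid and on time, and finishing within the preferred deadline) can be sketched as a simple check. This only illustrates the rule as described here and is not the actual stats-server logic; the function name and parameters are hypothetical, and treating "exactly on the deadline" as qualifying is an assumption.

```python
from datetime import timedelta

def qualifies_for_bonus(time_taken: timedelta,
                        preferred_deadline: timedelta,
                        has_passkey: bool = True,
                        success_rate: float = 1.0) -> bool:
    """Hypothetical check combining the three conditions named in the thread."""
    return (has_passkey
            and success_rate >= 0.80
            and time_taken <= preferred_deadline)

p6900_preferred = timedelta(days=4)   # preferred deadline quoted above
taken = timedelta(days=4.75)          # "Days taken to complete WU" quoted above

print(qualifies_for_bonus(taken, p6900_preferred))  # False
print(taken - p6900_preferred)                      # 18:00:00 over the deadline
```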
stephen123
Posts: 31
Joined: Thu Nov 27, 2008 5:00 pm

Re: No Bonus for 6900

Post by stephen123 »

OK, thanks. That means the unit downloaded, ran, failed, started over from scratch and ran again without acquiring a new unit.

I'm including a part of my log in this post, because the failure mode does not look familiar to me:


[20:50:11] Completed 177500 out of 250000 steps  (71%)
[21:30:33] Completed 180000 out of 250000 steps  (72%)
[21:54:36] ***** Got a SIGTERM signal (15)
[21:54:36] Killing all core threads

Folding@Home Client Shutdown.


--- Opening Log file [November 25 21:55:45 UTC] 


# Mac OS X SMP Console Edition ################################################
###############################################################################

                       Folding@Home Client Version 6.29r3

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /Users/stephen/Library/Folding@home
Executable: /usr/local/fah/fah6
Arguments: -smp 8 -verbosity 9 -bigadv 

[21:55:45] - Ask before connecting: No
[21:55:45] - User name: stephen123 (Team 1971)
[21:55:45] - User ID: XXXXXXXXXX
[21:55:45] - Machine ID: 1
[21:55:45] 
[21:55:46] Loaded queue successfully.
[21:55:46] 
[21:55:46] - Autosending finished units... [21:55:46][21:55:46] + Processing work unit
Trying to send all finished work units
[21:55:46] Core required: FahCore_a3.exe
[21:55:46] Core found.
[21:55:46] + No unsent completed units remaining.
[21:55:46] - Autosend completed
[21:55:46] Working on queue slot 01 [November 25 21:55:46 UTC]
[21:55:46] + Working ...
[21:55:46] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 8 -checkpoint 5 -verbose -lifeline 80 -version 629'

[21:55:46] 
[21:55:46] *------------------------------*
[21:55:46] Folding@Home Gromacs SMP Core
[21:55:46] Version 2.22 (May 7 2010)
[21:55:46] 
[21:55:46] Preparing to commence simulation
[21:55:46] - Looking at optimizations...
[21:55:46] - Files status OK
[21:55:49] - Expanded 24861359 -> 30796293 (decompressed 123.8 percent)
[21:55:49] Called DecompressByteArray: compressed_data_size=24861359 data_size=30796293, decompressed_data_size=30796293 diff=0
[21:55:49] - Digital signature verified
[21:55:49] 
[21:55:49] Project: 6900 (Run 10, Clone 11, Gen 1)
[21:55:49] 
[21:55:50] Assembly optimizations on if available.
[21:55:50] Entering M.D.
[21:55:56] Using Gromacs checkpoints
[21:56:06] fcSaveRestoreState: I/O failed dir=0, var=B068FFB4, varsize=20
[21:56:06] fcCheckPointResume: failure in call to fcSaveRestoreState() to restore cpt hash.
[21:56:07] fcSaveRestoreState: I/O failed dir=0, var=B058BFB4, varsize=20
[21:56:07] fcCheckPointResume: failure in call to fcSaveRestoreState() to restore cpt hash.
[21:56:07] fcSaveRestoreState: I/O failed dir=0, var=B060DFB4, varsize=20
[21:56:07] fcCheckPointResume: failure in call to fcSaveRestoreState() to restore cpt hash.
[21:56:07] mdrun returned 3
[21:56:07] Gromacs detected an invalid checkpoint.  Restarting...fcSaveRestoreState: I/O failed dir=0, var=B0383FB4, varsize=20
[21:56:08] fcCheckPointResume: failure in call to fcSaveRestoreState() to restore cpt hash.
[21:56:08] fcSaveRestoreState: I/O failed dir=0, var=B0509FB4, varsize=20
[21:56:08] fcCheckPointResume: failure in call to fcSaveRestoreState() to restore cpt hash.
[21:56:09] Can't open checkpoint file 
[21:56:09] Can't open checkpoint file 
[21:56:09] Resuming from checkpoint
[21:56:09] Can't open checkpoint file 
[21:56:32] 
[21:56:32] Folding@home Core Shutdown: UNKNOWN_ERROR
[21:56:32] CoreStatus = 62 (98)
[21:56:32] + Restarting core (settings changed) 
[21:56:32] 
[21:56:32] + Processing work unit
[21:56:32] Core required: FahCore_a3.exe
[21:56:32] Core found.
[21:56:32] Working on queue slot 01 [November 25 21:56:32 UTC]
[21:56:32] + Working ...
[21:56:32] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 8 -checkpoint 5 -notermcheck -verbose -lifeline 80 -version 629'

[21:56:33] 
[21:56:33] *------------------------------*
[21:56:33] Folding@Home Gromacs SMP Core
[21:56:33] Version 2.22 (May 7 2010)
[21:56:33] 
[21:56:33] Preparing to commence simulation
[21:56:33] - Looking at optimizations...
[21:56:33] - Not checking prior termination.
[21:56:35] - Expanded 24861359 -> 30796293 (decompressed 123.8 percent)
[21:56:35] Called DecompressByteArray: compressed_data_size=24861359 data_size=30796293, decompressed_data_size=30796293 diff=0
[21:56:36] - Digital signature verified
[21:56:36] 
[21:56:36] Project: 6900 (Run 10, Clone 11, Gen 1)
[21:56:36] 
[21:56:36] Assembly optimizations on if available.
[21:56:36] Entering M.D.
[21:56:48] Completed 0 out of 250000 steps  (0%)
[22:34:59] Completed 2500 out of 250000 steps  (1%)
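
A restart from scratch like the one in this log can be spotted by scanning the "Completed N out of M steps" lines for a drop in the step count. The following is a minimal Python sketch; the function name is hypothetical, and the regular expression assumes the line format shown in the log above.

```python
import re

# Matches progress lines such as:
# [21:30:33] Completed 180000 out of 250000 steps  (72%)
STEP_RE = re.compile(r"^\[\d{2}:\d{2}:\d{2}\] Completed (\d+) out of (\d+) steps")

def find_restarts(lines):
    """Return (previous_steps, new_steps) pairs where progress went backwards."""
    restarts = []
    last = None
    for line in lines:
        match = STEP_RE.match(line)
        if not match:
            continue  # ignore non-progress lines
        steps = int(match.group(1))
        if last is not None and steps < last:
            restarts.append((last, steps))
        last = steps
    return restarts

sample = [
    "[20:50:11] Completed 177500 out of 250000 steps  (71%)",
    "[21:30:33] Completed 180000 out of 250000 steps  (72%)",
    "[21:54:36] ***** Got a SIGTERM signal (15)",
    "[21:56:48] Completed 0 out of 250000 steps  (0%)",
    "[22:34:59] Completed 2500 out of 250000 steps  (1%)",
]
print(find_restarts(sample))  # [(180000, 0)]
```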
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: No Bonus for 6900

Post by toTOW »

Judging by the error messages, it failed to resume from checkpoint :(

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
stephen123
Posts: 31
Joined: Thu Nov 27, 2008 5:00 pm

Re: No Bonus for 6900

Post by stephen123 »

OK, thanks. Assuming it doesn't repeat, I guess this is resolved.
codysluder
Posts: 1024
Joined: Sun Dec 02, 2007 12:43 pm

Re: No Bonus for 6900

Post by codysluder »

stephen123 wrote: OK, thanks. That means the unit downloaded, ran, failed, started over from scratch and ran again without acquiring a new unit.

I'm including a part of my log in this post, because the failure mode does not look familiar to me:


[21:56:07] mdrun returned 3
[21:56:07] Gromacs detected an invalid checkpoint.
[21:56:09] Can't open checkpoint file 
[21:56:32] 
[21:56:32] Folding@home Core Shutdown: UNKNOWN_ERROR
[21:56:32] CoreStatus = 62 (98)
[21:56:32] + Restarting core (settings changed) 
[21:56:32] 
[21:56:32] + Processing work unit
[21:56:32] Core required: FahCore_a3.exe
[21:56:32] Core found.
[21:56:32] Working on queue slot 01 [November 25 21:56:32 UTC]
[21:56:32] + Working ...
[21:56:32] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 8 -checkpoint 5 -notermcheck -verbose -lifeline 80 -version 629'

[21:56:33] 
[21:56:33] *------------------------------*
[21:56:33] Folding@Home Gromacs SMP Core
[21:56:33] Version 2.22 (May 7 2010)
[21:56:33] 
[21:56:33] Preparing to commence simulation
[21:56:33] - Looking at optimizations...
[21:56:33] - Not checking prior termination.
[21:56:35] - Expanded 24861359 -> 30796293 (decompressed 123.8 percent)
[21:56:35] Called DecompressByteArray: compressed_data_size=24861359 data_size=30796293, decompressed_data_size=30796293 diff=0
[21:56:36] - Digital signature verified
[21:56:36] 
[21:56:36] Project: 6900 (Run 10, Clone 11, Gen 1)
[21:56:36] 
[21:56:36] Assembly optimizations on if available.
[21:56:36] Entering M.D.
[21:56:48] Completed 0 out of 250000 steps  (0%)
[22:34:59] Completed 2500 out of 250000 steps  (1%)

I've never seen that error either, but it does make sense. While you were upgrading, you probably failed to allow the OS to complete the shutdown process normally and parts of the checkpoint file were still in cache when you killed the power. Whether that's what happened or not, FAH detected an invalid checkpoint and had to start over, just as you said.

Fortunately, the WU won't count against your 80%, since you did return the WU by the final deadline. You just exceeded the Preferred Deadline, which results in no bonus.
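
A back-of-the-envelope check using two progress lines from the posted log supports this explanation. Assuming a roughly constant pace (an assumption; actual frame times vary), the 72% of lost progress plus one complete run adds up to well over the 4-day preferred deadline, in the same range as the 4.75 days reported by the database. A rough Python sketch:

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Two consecutive progress lines from the log (71% -> 72%).
t71 = datetime.strptime("20:50:11", FMT)
t72 = datetime.strptime("21:30:33", FMT)
seconds_per_percent = (t72 - t71).total_seconds()  # 40 min 22 s

SECONDS_PER_DAY = 86400
lost_percent = 72  # progress reached before the checkpoint was rejected

lost_days = lost_percent * seconds_per_percent / SECONDS_PER_DAY
full_run_days = 100 * seconds_per_percent / SECONDS_PER_DAY
total_days = lost_days + full_run_days

print(f"lost: about {lost_days:.2f} days")
print(f"full run: about {full_run_days:.2f} days")
print(f"total: about {total_days:.2f} days")
```

With these numbers the estimate comes out at roughly 2.0 days lost plus 2.8 days for a full run, or about 4.8 days in total.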