
P5786 (R2, C98, G165) & P5783 (R10, C63, G56)

Posted: Thu Jul 22, 2010 11:42 am
by PantherX
I have just installed the 258.96 WHQL drivers, and this WU hit an EUE at 50%. My video driver crashed and was automatically restored. I don't know whether the driver crash caused the EUE or the EUE caused the driver to crash. Here is the FAHLog:

Code:

[10:25:16] + Attempting to get work packet
[10:25:16] Passkey found
[10:25:16] - Will indicate memory of 4091 MB
[10:25:16] Gpu type=2 species=30.
[10:25:16] - Connecting to assignment server
[10:25:16] Connecting to http://assign-GPU.stanford.edu:8080/
[10:25:22] Posted data.
[10:25:22] Initial: 43AB; - Successful: assigned to (171.67.108.21).
[10:25:23] + News From Folding@Home: Welcome to Folding@Home
[10:25:23] Loaded queue successfully.
[10:25:23] Gpu type=2 species=30.
[10:25:23] Sent data
[10:25:23] Connecting to http://171.67.108.21:8080/
[10:25:25] Posted data.
[10:25:25] Initial: 0000; - Receiving payload (expected size: 65227)
[10:25:29] - Downloaded at ~15 kB/s
[10:25:29] - Averaged speed for that direction ~22 kB/s
[10:25:29] + Received work.
[10:25:29] Trying to send all finished work units
[10:25:29] + No unsent completed units remaining.
[10:25:29] + Closed connections
[10:25:29] 
[10:25:29] + Processing work unit
[10:25:29] Core required: FahCore_11.exe
[10:25:29] Core found.
[10:25:29] Working on queue slot 03 [July 22 10:25:29 UTC]
[10:25:29] + Working ...
[10:25:29] - Calling '.\FahCore_11.exe -dir work/ -suffix 03 -nice 19 -priority 96 -nocpulock -checkpoint 15 -verbose -lifeline 1260 -version 630'

[10:25:29] 
[10:25:29] *------------------------------*
[10:25:29] Folding@Home GPU Core
[10:25:29] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[10:25:29] 
[10:25:29] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[10:25:29] Build host: amoeba
[10:25:29] Board Type: Nvidia
[10:25:29] Core      : 
[10:25:29] Preparing to commence simulation
[10:25:29] - Looking at optimizations...
[10:25:29] DeleteFrameFiles: successfully deleted file=work/wudata_03.ckp
[10:25:29] - Created dyn
[10:25:29] - Files status OK
[10:25:29] - Expanded 64715 -> 341507 (decompressed 527.7 percent)
[10:25:29] Called DecompressByteArray: compressed_data_size=64715 data_size=341507, decompressed_data_size=341507 diff=0
[10:25:29] - Digital signature verified
[10:25:29] 
[10:25:29] Project: 5786 (Run 2, Clone 98, Gen 165)
[10:25:29] 
[10:25:29] Assembly optimizations on if available.
[10:25:29] Entering M.D.
[10:25:35] Tpr hash work/wudata_03.tpr:  2130414023 2038943993 26319499 2415291690 611598672
[10:25:35] 
[10:25:35] Calling fah_main args: 14 usage=100
[10:25:35] 
[10:25:35] Working on GRoups of Organic Molecules in ACtion for Science
[10:25:36] Client config found, loading data.
[10:25:36] Starting GUI Server
[10:26:55] Completed 1%
[10:28:14] Completed 2%
SNIP
[11:26:25] Completed 46%
[11:27:44] Completed 47%
[11:29:04] Completed 48%
[11:30:23] Completed 49%
[11:31:42] Completed 50%
[11:32:56] Run: exception thrown during GuardedRun
[11:32:57] Run: exception thrown in GuardedRun -- Gromacs cannot continue further.
[11:32:57] Going to send back what have done -- stepsTotalG=20000000
[11:32:57] Work fraction=0.5089 steps=20000000.
[11:33:01] logfile size=0 infoLength=0 edr=0 trr=23
[11:33:01] + Opened results file
[11:33:01] - Writing 642 bytes of core data to disk...
[11:33:01] Done: 130 -> 127 (compressed to 97.6 percent)
[11:33:01]   ... Done.
[11:33:01] DeleteFrameFiles: successfully deleted file=work/wudata_03.ckp
[11:33:01] 
[11:33:01] Folding@home Core Shutdown: EARLY_UNIT_END
[11:33:03] CoreStatus = 72 (114)
[11:33:03] Sending work to server
[11:33:03] Project: 5786 (Run 2, Clone 98, Gen 165)
[11:33:03] - Read packet limit of 540015616... Set to 524286976.


[11:33:03] + Attempting to send results [July 22 11:33:03 UTC]
[11:33:03] - Reading file work/wuresults_03.dat from core
[11:33:03]   (Read 639 bytes from disk)
[11:33:03] Gpu type=2 species=30.
[11:33:03] Connecting to http://171.67.108.21:8080/
[11:33:06] Posted data.
[11:33:06] Initial: 0000; - Uploaded at ~0 kB/s
[11:33:06] - Averaged speed for that direction ~6 kB/s
[11:33:06] + Results successfully sent
[11:33:06] Thank you for your contribution to Folding@Home.
EDIT - I had another EUE, this time on Project: 5783 (Run 10, Clone 63, Gen 56). Here is the FAHLog:

Code:

[11:46:47] - Autosending finished units... [July 22 11:46:47 UTC]
[11:46:47] + Processing work unit
[11:46:47] Trying to send all finished work units
[11:46:47] Core required: FahCore_11.exe
[11:46:47] + No unsent completed units remaining.
[11:46:47] Core found.
[11:46:47] - Autosend completed
[11:46:47] Working on queue slot 04 [July 22 11:46:47 UTC]
[11:46:47] + Working ...
[11:46:47] - Calling '.\FahCore_11.exe -dir work/ -suffix 04 -nice 19 -priority 96 -nocpulock -checkpoint 15 -verbose -lifeline 3516 -version 630'

[11:46:47] 
[11:46:47] *------------------------------*
[11:46:47] Folding@Home GPU Core
[11:46:47] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[11:46:47] 
[11:46:47] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[11:46:47] Build host: amoeba
[11:46:47] Board Type: Nvidia
[11:46:47] Core      : 
[11:46:47] Preparing to commence simulation
[11:46:47] - Looking at optimizations...
[11:46:47] - Files status OK
[11:46:47] - Expanded 65050 -> 343707 (decompressed 528.3 percent)
[11:46:47] Called DecompressByteArray: compressed_data_size=65050 data_size=343707, decompressed_data_size=343707 diff=0
[11:46:47] - Digital signature verified
[11:46:47] 
[11:46:47] Project: 5783 (Run 10, Clone 63, Gen 56)
[11:46:47] 
[11:46:47] Assembly optimizations on if available.
[11:46:47] Entering M.D.
[11:46:53] Will resume from checkpoint file
[11:46:53] Tpr hash work/wudata_04.tpr:  54510407 780862327 3818981750 1027029249 3238888524
[11:46:53] 
[11:46:53] Calling fah_main args: 14 usage=100
[11:46:53] 
[11:46:55] Working on GROwing Monsters And Cloning Shrimps
[11:46:55] Client config found, loading data.
[11:46:55] Resuming from checkpoint
[11:46:55] Starting GUI Server
[11:46:56] fcCheckPointResume: retreived and current tpr file hash:
[11:46:56]    0     54510407     54510407
[11:46:56]    1    780862327    780862327
[11:46:56]    2   3818981750   3818981750
[11:46:56]    3   1027029249   1027029249
[11:46:56]    4   3238888524   3238888524
[11:46:56] fcCheckPointResume: file hashes same.
[11:46:56] fcCheckPointResume: state restored.
[11:46:56] Verified work/wudata_04.log
[11:46:56] Verified work/wudata_04.edr
[11:46:56] Verified work/wudata_04.xtc
[11:46:56] Completed 3%
[11:48:17] Completed 4%
SNIP
[13:23:34] Completed 72%
[13:24:55] Completed 73%
[13:26:16] Completed 74%
[13:27:37] Completed 75%
[13:28:17] Run: exception thrown during GuardedRun
[13:28:17] Run: exception thrown in GuardedRun -- Gromacs cannot continue further.
[13:28:17] Going to send back what have done -- stepsTotalG=20000000
[13:28:17] Work fraction=0.7549 steps=20000000.
[13:28:21] logfile size=11134 infoLength=11134 edr=0 trr=23
[13:28:21] + Opened results file
[13:28:21] - Writing 11670 bytes of core data to disk...
[13:28:21] Done: 11158 -> 4061 (compressed to 36.3 percent)
[13:28:21]   ... Done.
[13:28:21] DeleteFrameFiles: successfully deleted file=work/wudata_04.ckp
[13:28:21] 
[13:28:21] Folding@home Core Shutdown: EARLY_UNIT_END
[13:28:23] CoreStatus = 72 (114)
[13:28:23] Sending work to server
[13:28:23] Project: 5783 (Run 10, Clone 63, Gen 56)
[13:28:23] - Read packet limit of 540015616... Set to 524286976.


[13:28:23] + Attempting to send results [July 22 13:28:23 UTC]
[13:28:23] - Reading file work/wuresults_04.dat from core
[13:28:23]   (Read 4573 bytes from disk)
[13:28:23] Gpu type=2 species=30.
[13:28:23] Connecting to http://171.67.108.21:8080/
[13:28:25] Posted data.
[13:28:25] Initial: 0000; - Uploaded at ~2 kB/s
[13:28:25] - Averaged speed for that direction ~5 kB/s
[13:28:25] + Results successfully sent
I suspect the new drivers don't like my OC, so I have dropped the shaders from 1512 MHz to 1458 MHz. Hopefully it will run without any further problems.
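
For anyone who wants to check a long FAHLog for EUEs without reading it line by line, here is a minimal Python sketch. It assumes the line formats seen in the two logs above (`Completed N%`, `Work fraction=`, `Folding@home Core Shutdown:`). The name `summarize_fahlog` is made up for illustration and is not part of the F@H client.

```python
import re

# Patterns assume the log line formats quoted in this post (client 6.30, FahCore_11).
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
PERCENT_RE = re.compile(r"Completed (\d+)%")
FRACTION_RE = re.compile(r"Work fraction=(\d+\.\d+)")
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")


def _blank_record():
    # One record per work unit run; fields stay None until seen in the log.
    return {"project": None, "last_percent": None,
            "work_fraction": None, "shutdown": None}


def summarize_fahlog(text):
    """Hypothetical helper: return one summary dict per core shutdown in the log."""
    records = []
    current = _blank_record()
    for line in text.splitlines():
        match = PROJECT_RE.search(line)
        if match:
            # Store (project, run, clone, gen) as integers.
            current["project"] = tuple(int(g) for g in match.groups())
            continue
        match = PERCENT_RE.search(line)
        if match:
            current["last_percent"] = int(match.group(1))
            continue
        match = FRACTION_RE.search(line)
        if match:
            current["work_fraction"] = float(match.group(1))
            continue
        match = SHUTDOWN_RE.search(line)
        if match:
            # A shutdown line closes the current work unit's record.
            current["shutdown"] = match.group(1)
            records.append(current)
            current = _blank_record()
    return records
```

Any record whose `shutdown` is `EARLY_UNIT_END` is an EUE, and `last_percent` shows roughly where it died. Other shutdown codes would need their own handling.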

Re: P5786 (R2, C98, G165) & P5783 (R10, C63, G56)

Posted: Thu Jul 22, 2010 8:11 pm
by sortofageek
Both of those projects have been completed successfully for full credit by several others.

Re: P5786 (R2, C98, G165) & P5783 (R10, C63, G56)

Posted: Thu Jul 22, 2010 9:49 pm
by ikerekes
sortofageek wrote: Both of those projects have been completed successfully for full credit by several others.
Both projects have a preferred deadline of 15 days. My slowest card can finish these 783-point WUs in about 4 hours.
What is causing these WUs to be assigned to several others?

I was under the impression that F@H is not duplicating work. Don't tell me that 7im can be wrong :twisted:

Re: P5786 (R2, C98, G165) & P5783 (R10, C63, G56)

Posted: Thu Jul 22, 2010 10:40 pm
by 7im
Yes, it's rare, but I can be wrong. This just isn't one of those times. :twisted:

I almost always qualify my quoting of PG's "No Dupes" policy appropriately. ;) As I said before, PG has a standard policy of not duping work units. However, as I also said before, there are certain circumstances that can cause WUs to be re-assigned a 2nd time, or more. WUs that run past a preferred deadline are just one example. EUEs are another, as are failed downloads, communication errors, etc. Another (rare) example is when PG mis-programs a setting in the WU or Work Server. I don't know the full list, just what I've read about here in the forum and experienced personally.

Lately, there seem to be more dupes reported than usual. But there are any number of explanations... GPU overclocks that were stable in December may not be so stable in July. ;) That, plus GPU3 just came out. The stability track record for fahcore_15 isn't fully known yet, and neither is that of the new GTX cards. That, plus lots of people upgrading from G80s to GTXs, or adding a GTX to a G80 system, and they are all trying to find their top stable overclocks, work out driver issues, etc... maybe bombing a few WUs here and there? I've also read about them testing new work server code, who knows?

That said, PG is just as careful to prevent this as we are to report it. PG's progress is obviously slowed by one machine when a dupe WU is sent out instead of new WUs. They don't want dupes any more than we do. And as long as we get credit for a dupe, we probably care less than they do. :lol:

Re: P5786 (R2, C98, G165) & P5783 (R10, C63, G56)

Posted: Fri Jul 23, 2010 4:08 am
by PantherX
sortofageek wrote: Both of those projects have been completed successfully for full credit by several others.
Thanks for informing me. I believe that it was an OC issue. With 197.45 WHQL, shaders at 1512 MHz were no problem, but apparently with 258.96 WHQL they were unstable, so I reduced them to 1458 MHz and so far it has been stable without any issues whatsoever. I will monitor it for a day or so before calling it "stable" :lol:
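
To put that clock change in perspective, here is a quick back-of-the-envelope calculation in Python. The function name is just for illustration, and the only inputs are the two shader clocks mentioned above.

```python
def clock_reduction_percent(old_mhz, new_mhz):
    """Percentage by which the clock was lowered, relative to the old value."""
    return (old_mhz - new_mhz) / old_mhz * 100.0


# Shader clocks from this thread: 1512 MHz before, 1458 MHz after.
print(f"{clock_reduction_percent(1512, 1458):.2f}%")  # prints "3.57%"
```

So the drop is 54 MHz, or a bit under 4% of the original shader clock.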