
Project: 11177 (Run 12, Clone 170, Gen 9)

Posted: Mon Oct 10, 2011 6:58 pm
by tcphillips

Code:

[18:13:18] Initial: F656; + 1542921 bytes downloaded
[18:13:18] Verifying core Core_15.fah...
[18:13:18] Signature is VALID
[18:13:18] 
[18:13:18] Trying to unzip core FahCore_15.exe
[18:13:18] Decompressed FahCore_15.exe (4618752 bytes) successfully
[18:13:23] + Core successfully engaged
[18:13:29] 
[18:13:29] + Processing work unit
[18:13:29] Core required: FahCore_15.exe
[18:13:29] Core found.
[18:13:29] Working on queue slot 01 [October 10 18:13:29 UTC]
[18:13:29] + Working ...
[18:13:29] - Calling '.\FahCore_15.exe -dir work/ -suffix 01 -nice 19 -priority 96 -checkpoint 15 -verbose -lifeline 3712 -version 630'

[18:13:29] 
[18:13:29] *------------------------------*
[18:13:29] Folding@Home GPU Core
[18:13:29] Version                2.20 (Tue Aug 2 15:33:05 PDT 2011)
[18:13:29] Build host             amoeba 
[18:13:29] Board Type             NVIDIA/CUDA
[18:13:29] Core                   15
[18:13:29] 
[18:13:29] Window's signal control handler registered.
[18:13:29] Preparing to commence simulation
[18:13:29] - Looking at optimizations...
[18:13:29] DeleteFrameFiles: successfully deleted file=work/wudata_01.ckp
[18:13:29] - Created dyn
[18:13:29] - Files status OK
[18:13:29] sizeof(CORE_PACKET_HDR) = 512 file=<>
[18:13:29] - Expanded 38128 -> 167707 (decompressed 439.8 percent)
[18:13:29] Called DecompressByteArray: compressed_data_size=38128 data_size=167707, decompressed_data_size=167707 diff=0
[18:13:29] - Digital signature verified
[18:13:29] 
[18:13:29] Project: 11177 (Run 12, Clone 170, Gen 9)
[18:13:29] 
[18:13:29] Assembly optimizations on if available.
[18:13:29] Entering M.D.
[18:13:31] Tpr hash work/wudata_01.tpr:  3019175444 496459859 426088452 4121980785 1664664548
[18:13:31] calling fah_main gpuDeviceId=0
[18:13:32] Working on ALZHEIMER'S DISEASE AMYLOID
[18:13:32] Client config found, loading data.
[18:13:32] Starting GUI Server
[18:14:43] Setting checkpoint frequency: 500000
[18:14:43] Completed         0 out of 50000000 steps (0%).
[18:14:43] mdrun_gpu returned 53
[18:14:43] Calculated & specified T inconsisitent
[18:14:43] 
[18:14:43] Folding@home Core Shutdown: UNSTABLE_MACHINE
[18:14:45] CoreStatus = 7A (122)
[18:14:45] Sending work to server
[18:14:45] Project: 11177 (Run 12, Clone 170, Gen 9)
[18:14:45] - Error: Could not get length of results file work/wuresults_01.dat
[18:14:45] - Error: Could not read unit 01 file. Removing from queue.
[18:14:45] Trying to send all finished work units
[18:14:45] + No unsent completed units remaining.
[18:14:45] - Preparing to get new work unit...
[18:14:45] Cleaning up work directory
[18:14:45] + Attempting to get work packet
[18:14:45] Passkey found
[18:14:45] - Will indicate memory of 2047 MB
[18:14:45] Gpu type=2 species=30.
[18:14:45] - Connecting to assignment server
[18:14:45] Connecting to http://assign-GPU.stanford.edu:8080/
[18:14:46] Posted data.
[18:14:46] Initial: 43AB; - Successful: assigned to (171.67.108.31).
[18:14:46] + News From Folding@Home: Welcome to Folding@Home
[18:14:46] Loaded queue successfully.
[18:14:46] Gpu type=2 species=30.
[18:14:46] Sent data
[18:14:46] Connecting to http://171.67.108.31:8080/
[18:14:46] ***** Got a SIGTERM signal (2)
[18:14:46] Killing all core threads

Folding@Home Client Shutdown.
...and so on a couple dozen times

NVIDIA 9800 GT - no overclock at all
Passes memtestg80 256 100 finer'n frog hair

Deleted "queue.dat" and the "work" folder multiple times, but I kept being assigned this same unit.
Downloaded a new core multiple times (that is where the log above picks up).
I even ran an A11 unit once over the past couple of days without incident.

Eventually I reconfigured the client and got a new unit that seems to be running fine.

Mod Edit: Added Code Tags - PantherX

Re: Project: 11177 (Run 12, Clone 170, Gen 9)

Posted: Mon Oct 10, 2011 7:05 pm
by sortofageek
Project: 11177 (Run 12, Clone 170, Gen 9) sure seemed to not like something about your folder. Just FYI, however, this WU was completed successfully for full credit by several other donors, so it wasn't a bad WU. If this doesn't continue to happen, one possibility is a corrupt download. If it does continue, I would check my disks next.

Re: Project: 11177 (Run 12, Clone 170, Gen 9)

Posted: Mon Oct 10, 2011 8:49 pm
by codysluder
So what does "mdrun_gpu returned 53" mean? Rather than a corrupt download, couldn't it be bad hardware or drivers that don't handle an unusual condition well?

Re: Project: 11177 (Run 12, Clone 170, Gen 9)

Posted: Mon Oct 10, 2011 8:58 pm
by sortofageek
You're probably on a better track than I am. I've never owned a GPU, much less folded with it. tcphillips, ignore me. Listen to the guy with experience.

What I should have said is that it isn't a bad WU, something else caused the problem ... and then I should have shut up and sat down. :)

Re: Project: 11177 (Run 12, Clone 170, Gen 9)

Posted: Mon Oct 10, 2011 9:08 pm
by codysluder
I'm not the guy with experience. I asked the question because I didn't know the answer, not with an intention of humiliating you. Maybe someone with actual experience will happen by and enlighten both of us.

Re: Project: 11177 (Run 12, Clone 170, Gen 9)

Posted: Mon Oct 10, 2011 9:12 pm
by sortofageek
No worries. I shamed myself. I didn't even look at the error messages or even think in terms of a GPU. Long day, long story.

I did turn up several hits searching on that message, but don't have the time right now to wade through everything. Hopefully, someone else will get the OP started in a good direction, especially if this happens again.

Re: Project: 11177 (Run 12, Clone 170, Gen 9)

Posted: Tue Oct 11, 2011 10:25 am
by tcphillips
OK, so someone else finished the WU... I'll check a few things on my machine, then this is me forgetting about it.
Thanks, all....

--T

Re: Project: 11177 (Run 12, Clone 170, Gen 9)

Posted: Wed Oct 26, 2011 3:29 pm
by powerarmour
(Edit: Ignore this one, I fixed the problem by downloading a newer version of the V6 console client, which then picked up a 10512 on the restart. Running as fine as normal now)

I'm also getting problems with this WU on my GT 240:

Code:

[15:23:08] Project: 11177 (Run 12, Clone 170, Gen 9)
[15:23:08] 
[15:23:08] Assembly optimizations on if available.
[15:23:08] Entering M.D.
[15:23:10] Tpr hash work/wudata_02.tpr:  3019175444 496459859 426088452 4121980785 1664664548
[15:23:10] calling fah_main gpuDeviceId=0
[15:23:10] Working on ALZHEIMER'S DISEASE AMYLOID
[15:23:10] Client config found, loading data.
[15:23:11] Starting GUI Server
[15:24:16] Setting checkpoint frequency: 500000
[15:24:16] Completed         0 out of 50000000 steps (0%).
[15:24:16] mdrun_gpu returned 53
[15:24:16] Calculated & specified T inconsisitent
[15:24:16] 
[15:24:16] Folding@home Core Shutdown: UNSTABLE_MACHINE
[15:24:19] CoreStatus = 7A (122)
[15:24:19] Sending work to server
[15:24:19] Project: 11177 (Run 12, Clone 170, Gen 9)
[15:24:19] - Read packet limit of 540015616... Set to 524286976.
[15:24:19] - Error: Could not get length of results file work/wuresults_02.dat
[15:24:19] - Error: Could not read unit 02 file. Removing from queue.
[15:24:19] - Preparing to get new work unit...
[15:24:19] Cleaning up work directory
[15:24:19] + Attempting to get work packet
[15:24:19] Passkey found
[15:24:19] Gpu type=2 species=30.
[15:24:19] - Connecting to assignment server
[15:24:20] - Successful: assigned to (171.67.108.31).
[15:24:20] + News From Folding@Home: Welcome to Folding@Home
[15:24:20] Loaded queue successfully.
[15:24:20] Gpu type=2 species=30.
[15:24:22] + Closed connections
[15:24:27] 
[15:24:27] + Processing work unit
[15:24:27] Core required: FahCore_15.exe
[15:24:27] Core found.
[15:24:27] Working on queue slot 03 [October 26 15:24:27 UTC]
[15:24:27] + Working ...
[15:24:27] 
[15:24:27] *------------------------------*
[15:24:27] Folding@Home GPU Core
[15:24:27] Version                2.20 (Tue Aug 2 15:33:05 PDT 2011)
[15:24:27] Build host             amoeba 
[15:24:27] Board Type             NVIDIA/CUDA
[15:24:27] Core                   15
[15:24:27] 
[15:24:27] Window's signal control handler registered.
[15:24:27] Preparing to commence simulation
[15:24:27] - Looking at optimizations...
[15:24:27] DeleteFrameFiles: successfully deleted file=work/wudata_03.ckp
[15:24:27] - Created dyn
[15:24:27] - Files status OK
[15:24:27] sizeof(CORE_PACKET_HDR) = 512 file=<>
[15:24:27] - Expanded 38128 -> 167707 (decompressed 439.8 percent)
[15:24:27] Called DecompressByteArray: compressed_data_size=38128 data_size=167707, decompressed_data_size=167707 diff=0
[15:24:27] - Digital signature verified
[15:24:27] 
[15:24:27] Project: 11177 (Run 12, Clone 170, Gen 9)
[15:24:27] 
[15:24:27] Assembly optimizations on if available.
[15:24:27] Entering M.D.
[15:24:29] Tpr hash work/wudata_03.tpr:  3019175444 496459859 426088452 4121980785 1664664548
[15:24:29] calling fah_main gpuDeviceId=0
[15:24:29] Working on ALZHEIMER'S DISEASE AMYLOID
[15:24:29] Client config found, loading data.
[15:24:29] Starting GUI Server
[15:25:34] Setting checkpoint frequency: 500000
[15:25:34] Completed         0 out of 50000000 steps (0%).
[15:25:34] mdrun_gpu returned 53
[15:25:34] Calculated & specified T inconsisitent
[15:25:34] 
[15:25:34] Folding@home Core Shutdown: UNSTABLE_MACHINE
[15:25:37] CoreStatus = 7A (122)
[15:25:37] Sending work to server
[15:25:37] Project: 11177 (Run 12, Clone 170, Gen 9)
[15:25:37] - Read packet limit of 540015616... Set to 524286976.
[15:25:37] - Error: Could not get length of results file work/wuresults_03.dat
[15:25:37] - Error: Could not read unit 03 file. Removing from queue.
[15:25:37] - Preparing to get new work unit...
[15:25:37] Cleaning up work directory
[15:25:37] + Attempting to get work packet
[15:25:37] Passkey found
[15:25:37] Gpu type=2 species=30.
[15:25:37] - Connecting to assignment server
[15:25:38] - Successful: assigned to (171.67.108.31).
[15:25:38] + News From Folding@Home: Welcome to Folding@Home
[15:25:38] Loaded queue successfully.
[15:25:38] Gpu type=2 species=30.
[15:25:40] + Closed connections
[15:25:45] 
[15:25:45] + Processing work unit
[15:25:45] Core required: FahCore_15.exe
[15:25:45] Core found.
[15:25:45] Working on queue slot 04 [October 26 15:25:45 UTC]
[15:25:45] + Working ...
[15:25:45] 
[15:25:45] *------------------------------*
[15:25:45] Folding@Home GPU Core
[15:25:45] Version                2.20 (Tue Aug 2 15:33:05 PDT 2011)
[15:25:45] Build host             amoeba 
[15:25:45] Board Type             NVIDIA/CUDA
[15:25:45] Core                   15
[15:25:45] 
[15:25:45] Window's signal control handler registered.
[15:25:45] Preparing to commence simulation
[15:25:45] - Looking at optimizations...
[15:25:45] DeleteFrameFiles: successfully deleted file=work/wudata_04.ckp
[15:25:45] - Created dyn
[15:25:45] - Files status OK
[15:25:45] sizeof(CORE_PACKET_HDR) = 512 file=<>
[15:25:45] - Expanded 38128 -> 167707 (decompressed 439.8 percent)
[15:25:45] Called DecompressByteArray: compressed_data_size=38128 data_size=167707, decompressed_data_size=167707 diff=0
[15:25:45] - Digital signature verified
[15:25:45] 
[15:25:45] Project: 11177 (Run 12, Clone 170, Gen 9)
[15:25:45] 
[15:25:45] Assembly optimizations on if available.
[15:25:45] Entering M.D.
[15:25:47] Tpr hash work/wudata_04.tpr:  3019175444 496459859 426088452 4121980785 1664664548
[15:25:47] calling fah_main gpuDeviceId=0
[15:25:47] Working on ALZHEIMER'S DISEASE AMYLOID
[15:25:47] Client config found, loading data.
[15:25:47] Starting GUI Server
[15:26:52] Setting checkpoint frequency: 500000
[15:26:52] Completed         0 out of 50000000 steps (0%).
[15:26:52] mdrun_gpu returned 53
[15:26:52] Calculated & specified T inconsisitent
[15:26:52] 
[15:26:52] Folding@home Core Shutdown: UNSTABLE_MACHINE
[15:26:55] CoreStatus = 7A (122)
[15:26:55] Sending work to server
[15:26:55] Project: 11177 (Run 12, Clone 170, Gen 9)
[15:26:55] - Read packet limit of 540015616... Set to 524286976.
[15:26:55] - Error: Could not get length of results file work/wuresults_04.dat
[15:26:55] - Error: Could not read unit 04 file. Removing from queue.
[15:26:55] - Preparing to get new work unit...
[15:26:55] Cleaning up work directory
[15:26:55] + Attempting to get work packet
[15:26:55] Passkey found
[15:26:55] Gpu type=2 species=30.
[15:26:55] - Connecting to assignment server
[15:26:57] - Successful: assigned to (171.67.108.31).
[15:26:57] + News From Folding@Home: Welcome to Folding@Home
[15:26:57] Loaded queue successfully.
[15:26:57] Gpu type=2 species=30.
[15:26:59] + Closed connections
[15:27:04] 
[15:27:04] + Processing work unit
[15:27:04] Core required: FahCore_15.exe
[15:27:04] Core found.
[15:27:04] Working on queue slot 05 [October 26 15:27:04 UTC]
[15:27:04] + Working ...
[15:27:04] 
[15:27:04] *------------------------------*
[15:27:04] Folding@Home GPU Core
[15:27:04] Version                2.20 (Tue Aug 2 15:33:05 PDT 2011)
[15:27:04] Build host             amoeba 
[15:27:04] Board Type             NVIDIA/CUDA
[15:27:04] Core                   15
[15:27:04] 
[15:27:04] Window's signal control handler registered.
[15:27:04] Preparing to commence simulation
[15:27:04] - Looking at optimizations...
[15:27:04] DeleteFrameFiles: successfully deleted file=work/wudata_05.ckp
[15:27:04] - Created dyn
[15:27:04] - Files status OK
[15:27:04] sizeof(CORE_PACKET_HDR) = 512 file=<>
[15:27:04] - Expanded 38128 -> 167707 (decompressed 439.8 percent)
[15:27:04] Called DecompressByteArray: compressed_data_size=38128 data_size=167707, decompressed_data_size=167707 diff=0
[15:27:04] - Digital signature verified
[15:27:04] 
[15:27:04] Project: 11177 (Run 12, Clone 170, Gen 9)
[15:27:04] 
[15:27:04] Assembly optimizations on if available.
[15:27:04] Entering M.D.
[15:27:06] Tpr hash work/wudata_05.tpr:  3019175444 496459859 426088452 4121980785 1664664548
[15:27:06] calling fah_main gpuDeviceId=0
[15:27:06] Working on ALZHEIMER'S DISEASE AMYLOID
[15:27:06] Client config found, loading data.
[15:27:07] Starting GUI Server
Obviously it's not picking up another WU at the moment because I'm stuck on this one, but this card has been folding other WUs fine until now.
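
For anyone wanting to check a long log for the same pattern, here is a rough sketch that counts how often each work unit ends in a core shutdown. It only assumes the two line formats visible in the logs in this thread ("Project: ... (Run, Clone, Gen)" and "Folding@home Core Shutdown: ..."). The function name `count_shutdowns` and the `sample` text are made up for illustration and are not part of the client.

```python
import re
from collections import Counter

# Patterns match the log lines quoted in this thread.
WU_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")


def count_shutdowns(log_text):
    """Count core shutdown reasons per work unit (project, run, clone, gen)."""
    counts = Counter()
    current_wu = None
    for line in log_text.splitlines():
        wu = WU_RE.search(line)
        if wu:
            # Remember which WU the following lines belong to.
            current_wu = tuple(int(g) for g in wu.groups())
            continue
        shutdown = SHUTDOWN_RE.search(line)
        if shutdown and current_wu is not None:
            counts[(current_wu, shutdown.group(1))] += 1
    return counts


# Hypothetical excerpt built from lines in the logs above.
sample = """\
[18:13:29] Project: 11177 (Run 12, Clone 170, Gen 9)
[18:14:43] mdrun_gpu returned 53
[18:14:43] Folding@home Core Shutdown: UNSTABLE_MACHINE
[18:14:45] CoreStatus = 7A (122)
"""

for (wu, reason), n in count_shutdowns(sample).items():
    print(wu, reason, n)
```

To run it on a real log, read the file into a string and pass that to `count_shutdowns` in place of `sample`. A single WU showing up with a high count while other WUs finish normally would match what is described in this thread.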

Re: Project: 11177 (Run 12, Clone 170, Gen 9)

Posted: Wed Oct 26, 2011 5:13 pm
by Duboisi
Was the GPU card driver updated recently? Try an older driver, before 169... something.

Re: Project: 11177 (Run 12, Clone 170, Gen 9)

Posted: Mon Nov 14, 2011 4:16 am
by pbj
I've encountered the same errors with this WU on my GT 240 as well. Mine's in a Linux box under Wine, GPU version 6.31 and NVIDIA driver version 256.35. It's been stable for a long while (as you can see by the out-of-date versions :D )