GPU clients crash and burn under a v7.1.52 service
Posted: Tue Jun 19, 2012 9:05 am
Hi, I just reloaded the OS on my computer and installed the new v7 client from the Windows installer. I set it to use SMP and GPUs. The SMP client seems to be working fine, but both GPU clients are crashing. GPU 0 is a GeForce GTX 275 and GPU 1 is a GeForce GTX 460. Both have been running for over a year on the previous GPU client with no problems.
The logs for both cards are posted below, after my questions.
Are there any extra flags that I need to set for the GPU clients to make them run correctly? I remember that with the previous clients I had to set a flag for the GPU microarchitecture (g80 for the 275 and fermi for the 460). Is this still required? Also, is there still an analogue of the -bigadv flag from previous clients? I'm afraid I've been out of the loop on the state of F@H for a while, so I'm not familiar with any tweaks that may be required to get things running optimally (or running at all) with the new client. If there's an FAQ somewhere that answers these questions, please let me know.
Thanks,
rab38505
Here's the log for the GTX 275:
08:44:28:WU01:FS00:Connecting to 171.67.108.21:8080
08:44:28:WU03:FS00:Connecting to assign-GPU.stanford.edu:80
08:44:29:WU03:FS00:News: Welcome to Folding@Home
08:44:29:WU03:FS00:Assigned to work server 171.67.108.21
08:44:29:WU03:FS00:Requesting new work unit for slot 00: READY gpu:0:"GT200b [GeForce GTX 275]" from 171.67.108.21
08:44:29:WU03:FS00:Connecting to 171.67.108.21:8080
08:44:29:WU01:FS00:Upload complete
08:44:29:WU01:FS00:Server responded WORK_ACK (400)
08:44:29:WU01:FS00:Cleaning up
08:44:29:WU03:FS00:Downloading 61.86KiB
08:44:30:WU03:FS00:Download complete
08:44:30:WU03:FS00:Received Unit: id:03 state:DOWNLOAD error:OK project:10504 run:331 clone:0 gen:287 core:0x11 unit:0x0000030d6652eda54b75b1b600008d90
08:44:30:WU03:FS00:Starting
08:44:30:WU03:FS00:Running FahCore: d:\programs\FAHClient7/FAHCoreWrapper.exe d:/FAHClient7Data/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/G80/Core_11.fah/FahCore_11.exe -dir 03 -suffix 01 -version 701 -lifeline 1784 -checkpoint 15 -gpu 0 -service
08:44:30:WU03:FS00:Started FahCore on PID 3048
08:44:30:WU03:FS00:Core PID:3128
08:44:30:WU03:FS00:FahCore 0x11 started
08:44:31:WU03:FS00:0x11:
08:44:31:WU03:FS00:0x11:*------------------------------*
08:44:31:WU03:FS00:0x11:Folding@Home GPU Core
08:44:31:WU03:FS00:0x11:Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
08:44:31:WU03:FS00:0x11:
08:44:31:WU03:FS00:0x11:Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
08:44:31:WU03:FS00:0x11:Build host: amoeba
08:44:31:WU03:FS00:0x11:Board Type: Nvidia
08:44:31:WU03:FS00:0x11:Core :
08:44:31:WU03:FS00:0x11:Preparing to commence simulation
08:44:31:WU03:FS00:0x11:- Looking at optimizations...
08:44:31:WU03:FS00:0x11:DeleteFrameFiles: successfully deleted file=03/wudata_01.ckp
08:44:31:WU03:FS00:0x11:- Created dyn
08:44:31:WU03:FS00:0x11:- Files status OK
08:44:31:WU03:FS00:0x11:- Expanded 62828 -> 336799 (decompressed 536.0 percent)
08:44:31:WU03:FS00:0x11:Called DecompressByteArray: compressed_data_size=62828 data_size=336799, decompressed_data_size=336799 diff=0
08:44:31:WU03:FS00:0x11:- Digital signature verified
08:44:31:WU03:FS00:0x11:
08:44:31:WU03:FS00:0x11:Project: 10504 (Run 331, Clone 0, Gen 287)
08:44:31:WU03:FS00:0x11:
08:44:31:WU03:FS00:0x11:Assembly optimizations on if available.
08:44:31:WU03:FS00:0x11:Entering M.D.
08:44:37:WU03:FS00:0x11:Tpr hash 03/wudata_01.tpr: 1279372263 2054781382 2694799241 3133399893 1444846642
08:44:37:WU03:FS00:0x11:
08:44:37:WU03:FS00:0x11:Calling fah_main args: 14 usage=100
08:44:37:WU03:FS00:0x11:
08:44:37:WU03:FS00:0x11:mdrun_gpu returned
08:44:37:WU03:FS00:0x11:Going to send back what have done -- stepsTotalG=0
08:44:37:WU03:FS00:0x11:Work fraction=0.0000 steps=0.
08:44:41:WU03:FS00:0x11:logfile size=0 infoLength=0 edr=0 trr=25
08:44:41:WU03:FS00:0x11:+ Opened results file
08:44:41:WU03:FS00:0x11:- Writing 635 bytes of core data to disk...
08:44:41:WU03:FS00:0x11:Done: 123 -> 123 (compressed to 100.0 percent)
08:44:41:WU03:FS00:0x11: ... Done.
08:44:41:WU03:FS00:0x11:DeleteFrameFiles: successfully deleted file=03/wudata_01.ckp
08:44:41:WU03:FS00:0x11:
08:44:41:WU03:FS00:0x11:Folding@home Core Shutdown: UNSTABLE_MACHINE
08:44:41:WU03:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)
08:44:41:WU03:FS00:Sending unit results: id:03 state:SEND error:FAULTY project:10504 run:331 clone:0 gen:287 core:0x11 unit:0x0000030d6652eda54b75b1b600008d90
08:44:41:WU03:FS00:Uploading 635B to 171.67.108.21
08:44:41:WU03:FS00:Connecting to 171.67.108.21:8080
08:44:42:WU03:FS00:Upload complete
08:44:42:WU03:FS00:Server responded WORK_ACK (400)
08:44:42:WU03:FS00:Cleaning up
Here's the log for the 460:
08:58:13:WU00:FS01:Starting
08:58:13:WU00:FS01:Running FahCore: d:\programs\FAHClient7/FAHCoreWrapper.exe d:/FAHClient7Data/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe -dir 00 -suffix 01 -version 701 -lifeline 1784 -checkpoint 15 -gpu 1 -service
08:58:13:WU00:FS01:Started FahCore on PID 1444
08:58:13:WU00:FS01:Core PID:1256
08:58:13:WU00:FS01:FahCore 0x15 started
08:58:14:WU00:FS01:0x15:
08:58:14:WU00:FS01:0x15:*------------------------------*
08:58:14:WU00:FS01:0x15:Folding@Home GPU Core
08:58:14:WU00:FS01:0x15:Version 2.22 (Thu Dec 8 17:08:05 PST 2011)
08:58:14:WU00:FS01:0x15:Build host SimbiosNvdWin7
08:58:14:WU00:FS01:0x15:Board Type NVIDIA/CUDA
08:58:14:WU00:FS01:0x15:Core 15
08:58:14:WU00:FS01:0x15:GPU device info vendor=0 device=0 name=NA match=0 deviceId=1
08:58:14:WU00:FS01:0x15:
08:58:14:WU00:FS01:0x15:Window's signal control handler registered.
08:58:14:WU00:FS01:0x15:Preparing to commence simulation
08:58:14:WU00:FS01:0x15:- Ensuring status. Please wait.
08:58:23:WU00:FS01:0x15:- Looking at optimizations...
08:58:23:WU00:FS01:0x15:- Working with standard loops on this execution.
08:58:23:WU00:FS01:0x15:- Previous termination of core was improper.
08:58:23:WU00:FS01:0x15:- Going to use standard loops.
08:58:23:WU00:FS01:0x15:- Files status OK
08:58:23:WU00:FS01:0x15:sizeof(CORE_PACKET_HDR) = 512 file=<>
08:58:23:WU00:FS01:0x15:- Expanded 145445 -> 660994 (decompressed 454.4 percent)
08:58:23:WU00:FS01:0x15:Called DecompressByteArray: compressed_data_size=145445 data_size=660994, decompressed_data_size=660994 diff=0
08:58:23:WU00:FS01:0x15:- Digital signature verified
08:58:23:WU00:FS01:0x15:
08:58:23:WU00:FS01:0x15:Project: 8020 (Run 5, Clone 269, Gen 57)
08:58:23:WU00:FS01:0x15:
08:58:23:WU00:FS01:0x15:Entering M.D.
08:58:25:WU00:FS01:0x15:Tpr hash 00/wudata_01.tpr: 205948020 226739098 28531194 2781083651 111202163
08:58:25:WU00:FS01:0x15:GPU device info: vendor=0 device=0 name=<NA> match=0
08:58:25:WU00:FS01:0x15:Working on Gromacs Runs On Most of All Computer Systems
08:58:25:WU00:FS01:0x15:Client config unavailable.
08:58:25:WU00:FS01:FahCore returned: UNKNOWN_ENUM (-1 = 0xffffffff)
08:58:25:WARNING:WU00:FS01:FahCore returned an unknown error code which probably indicates that it crashed
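In case it helps anyone compare the two failures, here is a rough Python sketch that pulls the "FahCore returned" lines out of a client log. The names `RETURN_RE` and `core_exits` are made up for illustration, and the pattern only assumes the line format visible in the two logs above; it is not part of the FAH client.

```python
import re

# Matches core exit lines as they appear in the logs above, e.g.
#   08:44:41:WU03:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)
# Assumption: this layout is inferred from these two logs only.
RETURN_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):"
    r"(?:WARNING:)?"
    r"(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"FahCore returned: (?P<status>\w+) "
    r"\((?P<code>-?\d+) = 0x[0-9a-fA-F]+\)"
)


def core_exits(lines):
    """Return (time, work unit, slot, status, numeric code) per core exit line."""
    exits = []
    for line in lines:
        m = RETURN_RE.match(line.strip())
        if m:
            exits.append(
                (m["time"], m["wu"], m["slot"], m["status"], int(m["code"]))
            )
    return exits


# Four lines copied from the logs above; only two are core exit lines.
sample = [
    "08:44:41:WU03:FS00:0x11:Folding@home Core Shutdown: UNSTABLE_MACHINE",
    "08:44:41:WU03:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)",
    "08:58:25:WU00:FS01:FahCore returned: UNKNOWN_ENUM (-1 = 0xffffffff)",
    "08:58:25:WARNING:WU00:FS01:FahCore returned an unknown error code "
    "which probably indicates that it crashed",
]

for entry in core_exits(sample):
    print(entry)
```

Run against the lines above, it reports one exit per slot: UNSTABLE_MACHINE (122) on FS00 and UNKNOWN_ENUM (-1) on FS01.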