Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Posted: Fri Jan 27, 2017 5:03 pm
by neognomic
SombraGuerrero wrote:Hopefully Nvidia will merge the fix into the final Linux package, because as of the beta, they haven't. [...snip]
Hey, thanks for that. I was going to try the beta, but there is no point if it is not going to fix this bug. Saved me some time. :)

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Posted: Sat Jan 28, 2017 5:21 am
by Leonardo
neognomic wrote:
Joe_H wrote:I know that this is probably the wrong place to post that since this is essentially a useless/dead thread but do not see a "378.xx" thread to post 378.49 results. :wink:
No, not useless or dead, not by a long shot. Many of us are sitting on our 373.06 or 372.XX drivers until we are certain newer Nvidia GPU drivers don't kick GPU Folding in the head.

Snapshot, did you mean to write that you went back to "373.06"? (You wrote "363.06".)

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Posted: Sat Jan 28, 2017 6:43 am
by snapshot
Yeah, my bad. I did mean 373.06 - my hacked version so I can use a GTX 1050 or 1050 Ti.

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Posted: Sun Jan 29, 2017 7:14 am
by neognomic
nVidia 378.49 might have similar problems with Core 21:
...
02:03:53:WU02:FS01:0x21:ERROR:exception: Error downloading array energyBuffer: clEnqueueReadBuffer (-5)
...
02:04:37:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core
02:04:37:WU00:FS01:0x21:Version 0.0.17
02:04:37:WU00:FS01:0x21:ERROR:Bad platformId size.
...
02:06:01:WU02:FS01:0x21:Folding@home GPU Core21 Folding@home Core
02:06:01:WU02:FS01:0x21:Version 0.0.17
02:06:01:WU02:FS01:0x21:ERROR:Bad platformId size.
...
and a half dozen more for Core 21 until it "FAILED" ...

The odd thing is that it completed about a dozen Core 21 WUs before the errors above started.
So maybe it is the driver ...or maybe something else.

I have rebooted and am starting again. It chose Core 18 and is running without errors. If it fails for Core 21 again, I'll post one more time.

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Posted: Sun Jan 29, 2017 10:33 am
by Nathan_P
02:03:53:WU02:FS01:0x21:ERROR:exception: Error downloading array energyBuffer: clEnqueueReadBuffer (-5)

That line in the logs is the same error message that comes from the borked drivers. Are you sure the box was upgraded properly? You might need to do a full uninstall and use a driver cleaner or similar to remove all traces of the old driver. If you are on Win10, make sure Windows Update hasn't rolled back to a WHQL driver.
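
A quick way to rule out a silent rollback is to read back the installed driver version and test it against the range in the thread title. Below is a minimal Python sketch, assuming the title's rule is applied literally (375.xx or 376.xx with xx < 48). The function name `is_flagged` is made up for illustration, and the version string has to come from wherever your system reports it.

```python
def is_flagged(version: str) -> bool:
    """Return True if a driver version such as "375.70" falls in the range
    named in the thread title: 375.xx or 376.xx with xx < 48.

    Assumption: the title is taken literally. It says nothing about 378.xx
    and does not distinguish between operating systems.
    """
    major_text, _, minor_text = version.strip().partition(".")
    major, minor = int(major_text), int(minor_text)
    return major in (375, 376) and minor < 48


if __name__ == "__main__":
    # Version numbers mentioned in this thread.
    for v in ("373.06", "375.26", "376.33", "376.48", "378.49"):
        print(v, "flagged" if is_flagged(v) else "not flagged")
```

This only tells you whether the version number is in the flagged range. It does not detect a broken or partial install.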

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Posted: Sun Jan 29, 2017 3:12 pm
by SombraGuerrero
Barring all that, I suppose it's possible that the application profile Nvidia put in may not be all-encompassing. Perhaps you have found a work unit/project that wasn't taken into account with the JIT compile time change. It's important to remember that Nvidia hasn't actually fixed the problem with these drivers. They've just done their best to sidestep the offending bug in OpenMM. That's why an update to core21 is still being pursued.

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Posted: Sun Jan 29, 2017 9:55 pm
by neognomic
Nathan_P wrote:02:03:53:WU02:FS01:0x21:ERROR:exception: Error downloading array energyBuffer: clEnqueueReadBuffer (-5)

That line in the logs is the same error message that comes from the borked drivers. Are you sure the box was upgraded properly? You might need to do a full uninstall and use a driver cleaner or similar to remove all traces of the old driver. If you are on Win10, make sure Windows Update hasn't rolled back to a WHQL driver.
Hi Nathan,
Thanks for that :), but I did a clean install, then a remove and another clean install, more than once. If there are entries left in the registry or elsewhere after all that, it is nVidia's fault.

The error this time, ...'energyBuffer'..., is different from the error with the 375.xx driver. Here, with 375.xx on Win 8.1, it was

FS01:0x21:ERROR:exception: Error downloading array interactionCount: clEnqueueReadBuffer (-5)

I.e., it was "interactionCount" before, with 375.xx.
I suspect it is all part of the same issue, so the difference may be irrelevant.
Of course, you are correct that there may be a dozen other reasons. It is, after all, a new build that is still being configured and broken in.
Still, it is quite odd that Core 21 failed in exactly the same way, albeit not with the same error message, while Core 18 runs fine with 378.49 on Win 8.1.
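
For comparing the two messages side by side, here is a minimal Python sketch that pulls the array name and the OpenCL status code out of such lines. The regular expression is fitted only to the two lines quoted in this thread, and the helper name `parse_read_error` is made up. In the OpenCL headers, -5 is `CL_OUT_OF_RESOURCES`, so both messages report the same status and differ only in which buffer was being read back.

```python
import re

# Only the status code seen in this thread is mapped; -5 is
# CL_OUT_OF_RESOURCES in the OpenCL headers (CL/cl.h).
CL_STATUS_NAMES = {-5: "CL_OUT_OF_RESOURCES"}

# Pattern fitted to the two log lines quoted in this thread (an assumption,
# not a documented log format).
READ_ERROR = re.compile(
    r"Error downloading array (?P<array>\w+): "
    r"(?P<call>\w+) \((?P<code>-?\d+)\)"
)


def parse_read_error(line):
    """Return (array, call, code, status_name), or None if the line does not match."""
    match = READ_ERROR.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    return (
        match.group("array"),
        match.group("call"),
        code,
        CL_STATUS_NAMES.get(code, "unknown"),
    )


if __name__ == "__main__":
    lines = [
        "02:03:53:WU02:FS01:0x21:ERROR:exception: Error downloading array "
        "energyBuffer: clEnqueueReadBuffer (-5)",
        "FS01:0x21:ERROR:exception: Error downloading array "
        "interactionCount: clEnqueueReadBuffer (-5)",
    ]
    for line in lines:
        print(parse_read_error(line))
```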

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Posted: Mon Jan 30, 2017 7:55 am
by JT3rd
Over the weekend, I installed the 378.49 driver on a GTX 1050 and a GTX 1050 Ti on Win7.

Based on the past few WUs, the PPD numbers appear on par with driver 376.48.

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Posted: Tue Jan 31, 2017 4:18 pm
by neognomic
SombraGuerrero wrote:Barring all that, I suppose it's possible that the application profile Nvidia put in may not be all-encompassing. Perhaps you have found a work unit/project that wasn't taken into account with the JIT compile time change. It's important to remember that Nvidia hasn't actually fixed the problem with these drivers. They've just done their best to sidestep the offending bug in OpenMM. That's why an update to core21 is still being pursued.
Maybe.
I managed to get back to Win 8.1 for a while and started the FAH GPU client.
No errors.
All GPU WUs, including both Core 21 and Core 18, loaded and completed. That's at least 10 completed without errors.
...
I do not know what caused the failure before. As long as it does not repeat, I don't care.

I think I am going to try the nVidia 378.xx beta for Linux anyway. If it fails as 375.xx does, I will complain to nVidia in their forum. They do listen, sometimes ;).

Thanks for the input.

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Posted: Tue Jan 31, 2017 6:19 pm
by tictoc
The Linux beta driver does not have the hotfix that was in the Windows driver. It is now a moot point, because the new version of core_21 is being pushed out to all clients. viewtopic.php?f=24&t=29633

With the updated core version you will be able to use the latest Linux driver (375.26) or the beta driver. I have been running a 1050ti and a 1070 on the 375.26 Linux driver without issue for the last three days.

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Posted: Tue Jan 31, 2017 7:40 pm
by SombraGuerrero
I have successfully folded a 0x21 work unit on project 11707 using driver set 376.33.
08:30:02:WU02:FS01:Final credit estimate, 101218.00 points
I think this updated core is going to do a lot of good once all the projects get refactored!

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Posted: Wed Feb 01, 2017 3:55 am
by neognomic
tictoc wrote: The Linux beta driver does not have the hotfix that was in the Windows driver. It is now a moot point, because the new version of core_21 is being pushed out to all clients. viewtopic.php?f=24&t=29633

With the updated core version you will be able to use the latest Linux driver (375.26) or the beta driver. I have been running a 1050ti and a 1070 on the 375.26 Linux driver without issue for the last three days.
Well, I had version 17 on Windows all this AM and, based on your response, just started FAH-GPU on Linux (driver 375.26), but the servers are still pushing version 17 too, which fails as before:

03:20:46:WU00:FS02:0x21:Folding@home GPU Core21 Folding@home Core
03:20:46:WU00:FS02:0x21:Version 0.0.17
03:20:46:WU00:FS02:0x21:ERROR:Bad platformId size.
...
Version 18 is available but it yields the same error info:

03:21:45:WU01:FS02:0x21:Folding@home GPU Core21 Folding@home Core
03:21:45:WU01:FS02:0x21:Version 0.0.18
03:21:45:WU01:FS02:0x21:ERROR:126: Bad platformId size.
03:21:45:WU01:FS02:0x21:Saving result file logfile_01.txt
03:21:45:WU01:FS02:0x21:Saving result file log.txt
03:21:45:WU01:FS02:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
03:21:45:WARNING:WU01:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
and so forth until it "Failed" ...

03:26:51:WU01:FS02:0x21:Folding@home GPU Core21 Folding@home Core
03:26:51:WU01:FS02:0x21:Version 0.0.18
03:26:51:WU01:FS02:0x21:ERROR:126: Bad platformId size.
03:26:51:WU01:FS02:0x21:Saving result file logfile_01.txt
03:26:51:WU01:FS02:0x21:Saving result file log.txt
03:26:51:WU01:FS02:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
03:26:51:WARNING:WU01:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
03:26:51:WU01:FS02:Sending unit results: id:01 state:SEND error:FAULTY project:10496 run:64 clone:16 gen:64 core:0x21 unit:0x0000005e8ca304f556bbaab8218f13f8
03:26:51:WU01:FS02:Uploading 1.97KiB to 140.163.4.245
03:26:51:WU01:FS02:Connecting to 140.163.4.245:8080
03:26:51:WU01:FS02:Upload complete
03:26:52:WU01:FS02:Server responded WORK_ACK (400)
03:26:52:WU01:FS02:Cleaning up
FWIW, FAH-CPU folds just fine. And Core 18 worked on Win 8.1 this AM. If I could get Core 18 for Linux, it might work too.
Maybe version 19 for Core 21 will work for my system...
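
To see at a glance which core version each error belongs to, the log can be grouped mechanically. Below is a minimal Python sketch, assuming the line layout seen in the excerpts above (`HH:MM:SS:WUnn:FSnn:0x21:...`). The patterns are fitted to this thread only, and the function name `errors_by_core_version` is made up.

```python
import re
from collections import Counter

# Patterns fitted to the Core21 log excerpts in this thread (assumed layout,
# not a documented format).
PREFIX = r"^\d\d:\d\d:\d\d:(?P<wu>WU\d+):(?P<fs>FS\d+):0x21:"
VERSION = re.compile(PREFIX + r"Version (?P<version>\S+)")
ERROR = re.compile(PREFIX + r"ERROR:(?P<message>.*)$")


def errors_by_core_version(lines):
    """Count ERROR lines per Core21 version, pairing each error with the most
    recent "Version" line seen for the same work unit and slot."""
    current = {}        # (wu, fs) -> last version string seen
    counts = Counter()  # (version, message) -> occurrences
    for line in lines:
        line = line.strip()
        m = VERSION.match(line)
        if m:
            current[(m["wu"], m["fs"])] = m["version"]
            continue
        m = ERROR.match(line)
        if m:
            version = current.get((m["wu"], m["fs"]), "unknown")
            counts[(version, m["message"].strip())] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "03:20:46:WU00:FS02:0x21:Version 0.0.17",
        "03:20:46:WU00:FS02:0x21:ERROR:Bad platformId size.",
        "03:21:45:WU01:FS02:0x21:Version 0.0.18",
        "03:21:45:WU01:FS02:0x21:ERROR:126: Bad platformId size.",
        "03:26:51:WU01:FS02:0x21:Version 0.0.18",
        "03:26:51:WU01:FS02:0x21:ERROR:126: Bad platformId size.",
    ]
    for (version, message), n in errors_by_core_version(sample).items():
        print(version, repr(message), n)
```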

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Posted: Thu Feb 02, 2017 3:24 am
by tictoc
That is odd. I have completed 7 p10496 WUs, and 19 other WUs, in Linux with the updated core version.

*********************** Log Started 2017-01-30T05:19:19Z ***********************
05:19:19:************************* Folding@home Client *************************
05:19:19:    Website: http://folding.stanford.edu/
05:19:19:  Copyright: (c) 2009-2014 Stanford University
05:19:19:     Author: Joseph Coffland <[email protected]>
05:19:19:       Args: --config /opt/fah/config.xml --exec-directory=/opt/fah
05:19:19:             --data-directory=/opt/fah
05:19:19:     Config: /opt/fah/config.xml
05:19:19:******************************** Build ********************************
05:19:19:    Version: 7.4.4
05:19:19:       Date: Mar 4 2014
05:19:19:       Time: 12:02:38
05:19:19:    SVN Rev: 4130
05:19:19:     Branch: fah/trunk/client
05:19:19:   Compiler: GNU 4.4.7
05:19:19:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
05:19:19:             -fno-unsafe-math-optimizations -msse2
05:19:19:   Platform: linux2 3.2.0-1-amd64
05:19:19:       Bits: 64
05:19:19:       Mode: Release
05:19:19:******************************* System ********************************
05:19:19:        CPU: Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
05:19:19:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
05:19:19:       CPUs: 4
05:19:19:     Memory: 7.81GiB
05:19:19:Free Memory: 6.07GiB
05:19:19:    Threads: POSIX_THREADS
05:19:19: OS Version: 4.9
05:19:19:Has Battery: false
05:19:19: On Battery: false
05:19:19: UTC Offset: 
05:19:19:        PID: 4871
05:19:19:        CWD: /opt/fah
05:19:19:         OS: Linux 4.9.6-1-ARCH x86_64
05:19:19:    OS Arch: AMD64
05:19:19:       GPUs: 2
05:19:19:      GPU 0: ATI:5 Ellesmere XT [Radeon RX 470/480]
05:19:19:      GPU 1: NVIDIA:5 GP107 [GeForce GTX 1050 Ti]
05:19:19:       CUDA: 6.1
05:19:19:CUDA Driver: 8000
05:19:19:***********************************************************************
--------------------------------------------------------------------------------------------------------
15:59:39:WU00:FS00:0x21:*********************** Log Started 2017-01-31T15:59:39Z ***********************
15:59:39:WU00:FS00:0x21:Project: 10496 (Run 189, Clone 7, Gen 0)
15:59:39:WU00:FS00:0x21:Unit: 0x000000028ca304f556bbb36a3242d856
15:59:39:WU00:FS00:0x21:CPU: 0x00000000000000000000000000000000
15:59:39:WU00:FS00:0x21:Machine: 0
15:59:39:WU00:FS00:0x21:Reading tar file core.xml
15:59:39:WU00:FS00:0x21:Reading tar file system.xml
15:59:39:WU00:FS00:0x21:Reading tar file integrator.xml
15:59:39:WU00:FS00:0x21:Reading tar file state.xml
15:59:41:WU00:FS00:0x21:Digital signatures verified
15:59:41:WU00:FS00:0x21:Folding@home GPU Core21 Folding@home Core
15:59:41:WU00:FS00:0x21:Version 0.0.18
15:59:56:WU00:FS00:0x21:Completed 0 out of 2000000 steps (0%)
15:59:56:WU00:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
16:05:18:WU00:FS00:0x21:Completed 20000 out of 2000000 steps (1%)
--------------------------------------------------------------------------------
00:44:22:WU00:FS00:0x21:Completed 2000000 out of 2000000 steps (100%)
00:44:32:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
00:44:33:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:10496 run:189 clone:7 gen:0 core:0x21 unit:0x000000028ca304$
--------------------------------------------------------------------------------
00:48:52:WU00:FS00:Final credit estimate, 54677.00 points
Did you try deleting the core and downloading it again? I don't think that would be the issue, but maybe the new core was corrupted when you downloaded it.

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Posted: Thu Feb 02, 2017 5:18 am
by SombraGuerrero
There also has been no post to confirm that all projects are updated yet. I would expect failures to continue at the work unit level for at least a few more days. There are a lot of projects out there...

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Posted: Fri Feb 03, 2017 12:08 am
by neognomic
tictoc wrote:That is odd. I have completed 7 p10496 WUs, and 19 other WUs, in Linux with the updated core version.
Thanks for posting that. Yes, it's odd, but there are a number of factors that differ. This is a dual-CPU (Westmere) 'workstation' system using a GTX 1060 6GB with the nVidia 375.26 driver, and the OS is currently Arch/Manjaro on kernel 4.9.6.
I upgraded to 24GB RAM today and, after a fresh boot, just tried FAH-GPU. Again, all attempts failed:

22:55:28:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:9838 run:0 clone:9 gen:49 core:0x21 unit:0x00000032ab436ca0588272e2e89a266c
22:55:36:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:9838 run:0 clone:7 gen:28 core:0x21 unit:0x0000001eab436ca0588272e23615e283
22:55:58:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:11424 run:8 clone:5 gen:23 core:0x21 unit:0x0000001a8ca304f1571f9dcb8847b37c
22:57:08:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:11710 run:2 clone:436 gen:64 core:0x21 unit:0x000000688ca304e75814df49cb3b3682
22:57:16:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:9190 run:2 clone:35 gen:85 core:0x21 unit:0x0000008fab40415457cb2c672b93e825
22:57:48:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:9176 run:24 clone:1 gen:259 core:0x21 unit:0x00000183ab436c6957b24c294cb303b9
22:58:22:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:9180 run:38 clone:1 gen:195 core:0x21 unit:0x00000112ab436c9f57bdce05459afb73
22:58:47:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13206 run:42 clone:9 gen:6 core:0x21 unit:0x00000003ab436c665791887fe1cbe95a
22:59:23:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:9179 run:3 clone:5 gen:264 core:0x21 unit:0x0000018aab436c9f57bdce04c05517f8
22:59:38:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:11707 run:131 clone:18 gen:44 core:0x21 unit:0x000000308ca304f35876a53c8282c036
Those are all based on a preference for "Cancer", so I switched the cause to "Any" and got:

23:15:09:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:10496 run:29 clone:10 gen:104 core:0x21 unit:0x000000a98ca304f556bba8314719d1be
23:15:57:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:9175 run:14 clone:2 gen:167 core:0x21 unit:0x00000111ab436c6957b24c289ff56d2c
...etc.
Again, all attempts failed.
Not once did it try to get Core 18 ...
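
For anyone comparing notes, the "Sending unit results" lines can be tallied per core and per project instead of read by eye. Below is a minimal Python sketch. The regular expression is fitted to the lines pasted in this thread (an assumption, not a documented format), and `tally_results` is a made-up name.

```python
import re
from collections import Counter

# Fitted to the "Sending unit results" lines pasted in this thread.
RESULT = re.compile(
    r"Sending unit results: id:\d+ state:\w+ error:(?P<error>\w+) "
    r"project:(?P<project>\d+) run:(?P<run>\d+) clone:(?P<clone>\d+) "
    r"gen:(?P<gen>\d+) core:(?P<core>0x[0-9a-fA-F]+)"
)


def tally_results(lines):
    """Return (Counter keyed by (core, error), sorted list of project
    numbers that reported anything other than NO_ERROR)."""
    outcome = Counter()
    failed_projects = set()
    for line in lines:
        m = RESULT.search(line)
        if m is None:
            continue
        outcome[(m["core"], m["error"])] += 1
        if m["error"] != "NO_ERROR":
            failed_projects.add(int(m["project"]))
    return outcome, sorted(failed_projects)


if __name__ == "__main__":
    sample = [
        "22:55:28:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY "
        "project:9838 run:0 clone:9 gen:49 core:0x21 "
        "unit:0x00000032ab436ca0588272e2e89a266c",
        "22:55:58:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY "
        "project:11424 run:8 clone:5 gen:23 core:0x21 "
        "unit:0x0000001a8ca304f1571f9dcb8847b37c",
    ]
    outcome, failed = tally_results(sample)
    print(dict(outcome))
    print(failed)
```

Feeding it a whole log file (one line per list element) gives a quick picture of whether failures are tied to particular projects or hit every Core 21 WU.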
tictoc wrote:Did you try and delete the core version, and download another one? I don't think that would be the issue, but maybe when you downloaded the new core it was corrupted.
No, I did not try deleting anything and starting over. I have stopped and disabled foldingathome.service, restarted, and tried again, but got the same result, "Failed", every time.
However, Arch does not have a client from Stanford, so I am using an AUR (Arch User Repository) "non-root" version of the FAH client. I can remove it and change to the root version, at least to see if it helps.

I did try to get the nVidia beta too, but it will not install due to some conflicts that I do not have time to chase down and fix. ...

...later.

{EDIT
Starting fresh with the 'run as root' FAH client and a completely new setup (I did not even reuse the config.xml file) did not help.
It did get a Core 18 WU first, which promptly failed:

01:17:53:WU01:FS01:Started FahCore on PID 18533
01:17:53:WU01:FS01:Core PID:18537
01:17:53:WU01:FS01:FahCore 0x18 started
01:17:53:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
01:17:53:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:9152 run:11 clone:6 gen:413 core:0x18 unit:0x000001ceab436c9f56623c60ccb4c6b2
...etc.
Perhaps needless to say, all the Core 21 WUs failed too.
}