Pausing mitigates TDR bug?

It seems that a lot of GPU problems revolve around specific versions of drivers. Though NVidia has their own support structure, you can often learn from information reported by others who fold.

csvanefalk
Posts: 147
Joined: Mon May 21, 2012 10:28 am

Pausing mitigates TDR bug?

Post by csvanefalk »

I fold on a GTX770, using the 319.76 Linux drivers on a Fedora 20 box.

While I was previously rebooting every 36 hours to avoid the TDR bug, I have noticed that pausing the folding seems to have the same effect. Letting the card rest for 5-10 minutes between WUs, I am now approaching 72 hours of folding without rebooting.

Can anyone confirm if this is expected behavior?
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicone fan gaskets and silicone mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Pausing mitigates TDR bug?

Post by 7im »

The TDR bug was simply time-related. It didn't matter whether you were folding, gaming, or doing nothing. So pausing would have no effect.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
csvanefalk

Re: Pausing mitigates TDR bug?

Post by csvanefalk »

Understood. Could it have something to do with my OS, then? I am far past the 36-hour cutoff, and there have been no broken WUs, no crashes, or any other symptoms of the bug at all.
ChristianVirtual
Posts: 1576
Joined: Tue May 28, 2013 12:14 pm
Location: Tokyo

Re: Pausing mitigates TDR bug?

Post by ChristianVirtual »

I experienced the TDR bug mainly on the GTX 780 at the time, which uses the GK110 chip (as do the Titan and the 780 Ti). The 770 uses GK104.

With newer drivers the TDR bug was fixed, but GK104-based cards (like my 660 Ti) got slower. I split my GPUs across different systems and gave each one a matching driver. I haven't seen a TDR for 9 months or so.
Please contribute your logs to http://ppd.fahmm.net
7im

Re: Pausing mitigates TDR bug?

Post by 7im »

csvanefalk wrote: Understood. Could it have something to do with my OS, then? I am far past the 36-hour cutoff, and there have been no broken WUs, no crashes, or any other symptoms of the bug at all.
Two possibilities: either you are on a driver version from before the TDR bug, or the GPU did a reset. Check the FAH logs to see whether there were any folding interruptions in the last 2 days other than your own pausing of the client.
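For anyone who wants to do that log check on Linux without scrolling through a very large log, here is a minimal Python sketch. It assumes the v7 FAHClient log format, where each core exit is written as a line containing "FahCore returned: STATUS (...)"; that pattern and the function names are my own assumptions, not part of any FAH tool. Any exit other than FINISHED_UNIT is worth a closer look, but a pause you triggered yourself may also show up as an interrupted core, so compare the timestamps against when you paused.

```python
import re
import sys
from collections import Counter

# Assumption: FAHClient v7 logs each core exit as a line such as
# "07:44:01:WU00:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)".
CORE_RETURN = re.compile(r"FahCore returned: (\w+)")


def core_return_statuses(lines):
    """Count how often each core exit status appears in the log."""
    counts = Counter()
    for line in lines:
        match = CORE_RETURN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def unexpected_exits(lines, expected=("FINISHED_UNIT",)):
    """List (line number, text) for every core exit that was not a normal finish."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = CORE_RETURN.search(line)
        if match and match.group(1) not in expected:
            hits.append((number, line.rstrip()))
    return hits


# Only scan a file when one is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        log_lines = log.readlines()
    print(dict(core_return_statuses(log_lines)))
    for number, text in unexpected_exits(log_lines):
        print(f"{number}: {text}")
```

Save it under any name (for example a hypothetical scan_fah_log.py) and run it with the path to your log.txt as the only argument. It only reports what the log says; it cannot tell a driver reset from any other kind of interruption by itself.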

Optionally, there is a v55 fahcore that has no folding slowdown, so you could upgrade past the TDR bug driver version and just use the latest NV driver.
bollix47
Posts: 2959
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: Pausing mitigates TDR bug?

Post by bollix47 »

7im wrote: Optionally, there is a v55 fahcore that has no folding slowdown, so you could upgrade past the TDR bug driver version and just use the latest NV driver.
AFAIK that version of the core is Windows-only at this time.
7im

Re: Pausing mitigates TDR bug?

Post by 7im »

bollix47 wrote:
7im wrote: Optionally, there is a v55 fahcore that has no folding slowdown, so you could upgrade past the TDR bug driver version and just use the latest NV driver.
AFAIK that version of the core is Windows-only at this time.
Yep. Time to poke Prot again.
heikosch
Posts: 110
Joined: Thu Apr 30, 2009 7:31 pm
Hardware configuration: [email protected]
[email protected]

[email protected]
GTX460@800MHz
Location: Essen, Germany

Re: Pausing mitigates TDR bug?

Post by heikosch »

7im wrote:
bollix47 wrote:
7im wrote: Optionally, there is a v55 fahcore that has no folding slowdown, so you could upgrade past the TDR bug driver version and just use the latest NV driver.
AFAIK that version of the core is Windows-only at this time.
Yep. Time to poke Prot again.
In my opinion v55 is still beta.

Heiko
7im

Re: Pausing mitigates TDR bug?

Post by 7im »

Operationally, yes (simply because no one has moved it to public yet).

Is there some functional reason you think they should not release it as public?
heikosch

Re: Pausing mitigates TDR bug?

Post by heikosch »

7im wrote:Operationally, yes (simply because no one has moved it to public yet).

Is there some functional reason you think they should not release it as public?
No, but I have no idea who decides on the public release of a fahcore, or why it takes so long to release a fahcore version that obviously works.

Heiko
csvanefalk

Re: Pausing mitigates TDR bug?

Post by csvanefalk »

7im, neither of the cases you mentioned seems to apply. The driver version is 319.76, and I have had the TDR issue with it before:

[christopher@chrisdesktop ~]$ nvidia-smi 
Fri Jul 18 07:44:01 2014       
+------------------------------------------------------+                       
| NVIDIA-SMI 5.319.76   Driver Version: 319.76         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 770     Off  | 0000:03:00.0     N/A |                  N/A |
| 50%   66C  N/A     N/A /  N/A |      688MB /  2047MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+
I also cannot find any evidence in the log of the GPU resetting, apart from my own pauses (the log is too large to post here):

http://hastebin.com/zomevafedu.coffee
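If you run several folding boxes on different drivers, as ChristianVirtual describes, pulling the version number out of the nvidia-smi banner makes them easier to compare. Below is a minimal Python sketch, assuming the banner contains "Driver Version: X.Y" as in the output above; the function name is hypothetical, and the sketch knows nothing about which driver versions are affected by the TDR bug.

```python
import re
import shutil
import subprocess

# Assumption: the nvidia-smi banner contains a line such as
# "| NVIDIA-SMI 5.319.76   Driver Version: 319.76         |".
DRIVER_VERSION = re.compile(r"Driver Version:\s*(\d+(?:\.\d+)*)")


def driver_version(smi_output):
    """Return the driver version as a tuple of ints, or None if not found."""
    match = DRIVER_VERSION.search(smi_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


# Only call nvidia-smi when it is actually installed.
if __name__ == "__main__" and shutil.which("nvidia-smi"):
    output = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
    print(driver_version(output))
```

Returning a tuple means versions compare number by number, which comparing the raw version strings does not always do correctly.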
7im

Re: Pausing mitigates TDR bug?

Post by 7im »

The bug, as reported in the NV forum, was time based. You are welcome to look it up.

Also keep trying your pause trick. Does it work consistently, or just once in a while? Let us know.
csvanefalk

Re: Pausing mitigates TDR bug?

Post by csvanefalk »

I have not used the pause trick for at least 48 hours, and the folding process continues without error. There appear to be no traces of the bug at all. I wish I could determine exactly how I got to this stage for the benefit of other Linux GPU folders, but the only major change I can recall doing was to recompile the driver after updating to kernel 3.15.

Complete log is here: http://hastebin.com/bahegomewu.coffee