9401 fails on GM107 but not GK106 {Hopefully fixed}

It seems that a lot of GPU problems revolve around specific versions of drivers. Though NVidia has their own support structure, you can often learn from information reported by others who fold.

Moderators: Site Moderators, FAHC Science Team

tofuwombat
Posts: 19
Joined: Mon Nov 22, 2010 4:06 pm

Re: 9401 fails on 750ti

Post by tofuwombat »

I think this proves that there is a need for a CUDA flag, and/or assignment server tweaks for this card.
7im wrote:
tofuwombat wrote:Seems the CUDA core was written well enough to keep working with new CUDA hardware.
OpenCL in Fahcore_17 does not appear to be the same.

. . . If that is unclear there, ask here!
My lack of clarity is the point to my babbling today.

Everyone's help and patience is MUCH appreciated.
Freightanimal
Posts: 9
Joined: Tue Mar 11, 2014 11:12 pm
Location: Pennsylvania

Re: 9401 fails on 750ti

Post by Freightanimal »

tofuwombat wrote:I think this proves that there is a need for a CUDA flag, and/or assignment server tweaks for this card.
7im wrote:
tofuwombat wrote:Seems the CUDA core was written well enough to keep working with new CUDA hardware.
OpenCL in Fahcore_17 does not appear to be the same.

. . . If that is unclear there, ask here!
My lack of clarity is the point to my babbling today.

Everyone's help and patience is MUCH appreciated.

I second that thought (or some other measure that can keep these cards folding). I am not sure how many of us have the new card. Most of us are most likely not folding with it because of the Core 17 issues. I keep deleting the GPU in the configuration and adding it back so that it keeps trying for Core 15 work units. I am seeing 15k PPD on Core 15 units (I can usually do 3 per day) when I can get them. So far none today. I don't care about the points; I care about the project and the help it can do for science and medicine. I don't game at all. Folding is the biggest reason I bought this card instead of one for 1/3 its cost (I was going to get a GT 520).
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 9401 fails on 750ti

Post by bruce »

One of my machines has a GPU that is normally happy folding with Core_17. Recently there was a server problem which made it impossible to download WUs for Core_17. During that time, WUs were available for Core_15 or Core_16. Like you, I have no control over the assignment process. Frankly, I'm glad to be able to fold rather than have my GPU sitting idle, waiting for an assignment from a particular group of projects.

The most important job of the assignment process is to give everyone's hardware something it can do even under extenuating situations when a preferred choice is not available.
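
To make the idea of an assignment tweak concrete, here is a minimal sketch of a fallback rule of the kind discussed above: skip cores known to fail on a GPU family and hand out whatever else is available. This is not the real assignment server logic. `Project`, `KNOWN_BAD_CORES`, and `eligible_projects` are hypothetical names, the Core_17/Maxwell entry simply encodes what this thread reports, and project number 1 is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    number: int
    core: str  # FahCore the project needs, e.g. "Core_15" or "Core_17"


# Assumption for illustration only: GPU families mapped to cores reported as
# failing on them (here, Core_17 on Maxwell, as described in this thread).
KNOWN_BAD_CORES = {"Maxwell": {"Core_17"}}


def eligible_projects(gpu_family, projects, preferred_core="Core_17"):
    """Return the projects this GPU family can run, preferred core first."""
    blocked = KNOWN_BAD_CORES.get(gpu_family, set())
    usable = [p for p in projects if p.core not in blocked]
    # sorted() is stable and False sorts before True, so projects on the
    # preferred core move to the front while the rest keep their order.
    return sorted(usable, key=lambda p: p.core != preferred_core)


if __name__ == "__main__":
    # Project 1 is a made-up placeholder; 9401 is the project from this thread.
    queue = [Project(1, "Core_15"), Project(9401, "Core_17")]
    print(eligible_projects("Maxwell", queue))
    print(eligible_projects("Kepler", queue))
```

With a rule like this, a Maxwell card would only be offered the Core_15 project, while other families would still get Core_17 work first and fall back to Core_15 when nothing else is available.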
Zagen30
Posts: 823
Joined: Tue Mar 25, 2008 12:45 am
Hardware configuration: Core i7 3770K @3.5 GHz (not folding), 8 GB DDR3 @2133 MHz, 2xGTX 780 @1215 MHz, Windows 7 Pro 64-bit running 7.3.6 w/ 1xSMP, 2xGPU

4P E5-4650 @3.1 GHz, 64 GB DDR3 @1333MHz, Ubuntu Desktop 13.10 64-bit

Re: 9401 fails on 750ti

Post by Zagen30 »

Anyone with a 750 (Ti) could temporarily install the v6 GPU client, as it cannot get core 17 projects and would therefore only get core 15.
Freightanimal
Posts: 9
Joined: Tue Mar 11, 2014 11:12 pm
Location: Pennsylvania

Re: 9401 fails on 750ti

Post by Freightanimal »

Zagen30 wrote:Anyone with a 750 (Ti) could temporarily install the v6 GPU client, as it cannot get core 17 projects and would therefore only get core 15.
Thanks for the info. I personally wouldn't want to do that. Unfortunately I kind of doubt most people will either. Hopefully they can fix core 17 soon.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 9401 fails on 750ti

Post by bruce »

Freightanimal wrote:Hopefully they can fix core 17 soon.
A specific bug in Core_17 has not been identified so without more information, they're not going to fix anything except to provide better support for Maxwell.
uddarts
Posts: 21
Joined: Sun Feb 26, 2012 3:10 pm

Re: 9401 fails on 750ti

Post by uddarts »

Three weeks, and we find out they don't have enough info.

Win7 64-bit, 3770 running 6 cores, and 334.89 drivers.

All Core 17 WUs fail without engaging the GPU.

ud
win7 64bit / amd 630 cpu 3 / 750ti - gm 107 / 337.88 drivers / v7.4.4
bfromcolo
Posts: 56
Joined: Fri Mar 01, 2013 1:12 am

Re: 9401 fails on 750ti

Post by bfromcolo »

My system with the 750ti is Ubuntu 12.04 64-bit and a 1045T (6 cores). If there is any information I can provide, tell me what you need.

Note my Win 7 system has an 8320 (8 cores) and a 7850, and it has completed 9401s without a problem.
rwh202
Posts: 410
Joined: Mon Nov 15, 2010 8:51 pm
Hardware configuration: 8x GTX 1080
3x GTX 1080 Ti
3x GTX 1060
Various other bits and pieces
Location: South Coast, UK

Re: 9401 fails on 750ti

Post by rwh202 »

bruce wrote: A specific bug in Core_17 has not been identified so without more information, they're not going to fix anything.
Can we at least state that there is a bug, though? Whether it is in Core_17 or the driver, 750 Tis do not fold Core_17 as it stands. This is trivial to reproduce. Users reeling off their setups here isn't going to help - it just needs debugging by Stanford and, if necessary, tickets raised with NVIDIA.
folding_hoomer
Posts: 349
Joined: Sun Feb 10, 2013 6:06 pm
Hardware configuration: Sys 1: I7 2700K@4,4GHz with NH-C14
8GB G.Skill Sniper DDR3 1866MHz CL 9-10-9-28
MSI Z68A-GD65 (G3), various operating systems (WinXP, Ubuntu: 10.4.3 LTS, 12.04.2 LTS)
Optional: GTX560TI 448@stock/OC´d

Sys 2: I7 3930K@4,4GHz with Corsair H110
16GB G.Skill Ripjaws X DDR3 1866MHz CL 9-10-9-28
ASUS Rampage IV Formula, Ubuntu 10.10

Sys 3 i7 875K@3,826 GHz with Scythe Mine2
8GB G.Skill Sniper DDR3 1866MHz CL 9-10-9-28
MSI P55-GD80, Win7 64Bit Pro
Sapphire Radeon HD5870@1,163V 900/1250MHz
Sapphire Radeon HD7870@1,218V 1200/1300MHz

Sys 4 i7 2600K@4,4GHz with Scythe Mine2
8GB G.Skill Sniper DDR3 1866MHz CL 9-10-9-28
MSI Z68A-GD65 (G3), various operating systems (WinXP, Ubuntu: 10.4.3 LTS, 12.04.2 LTS)
Optional: GTX560TI 448@stock/OC´d

Optional:
ASUS P5Q Pro with Q9550
ASUS P5Q Pro with Q6300
Location: Bavaria, Germany

Re: 9401 fails on 750ti

Post by folding_hoomer »

bruce wrote:
Freightanimal wrote:Hopefully they can fix core 17 soon.
A specific bug in Core_17 has not been identified so without more information, they're not going to fix anything.

Do you happen to be folding with 7 CPUs? I'm beginning to suspect that project 9401 fails with 7 cores and its assignments need to be restricted, but I don't have enough information to propose such a change. My 640ti seems to work just fine and it's unlikely that the problem is JUST the 750ti.
Bruce - your suggestion that -smp7 and WU 9401 could be the reason for the issue might be wrong.
I'm folding under Ubuntu 13.04 with -smp7 (no issue for half a year) and my GTX670 is ATM folding one 9401 after the other - without any issue, too.
IMO it has something to do with the changed structure of the Maxwell GPU, or with how Core17 is handled (differently) on it.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 9401 fails on 750ti

Post by bruce »

Yes, it might or might not be smp 7. I could also suggest that the problem is overclocking or defective hardware or a bad set of drivers. I do not have any way to know for sure, nor do I have a system which fails, so for me, my statement that I have no problem is just as valid as your statement that you do have a problem. What's different?

Maxwell is a high probability reason. Development is already working on some issues associated with Maxwell and there's nothing you can do until they finish. If there is anything else that you might change to get your system into production, I think those reasons should be explored, but that's not required if you're certain you know what makes you unique.
Sam-I-Am
Posts: 18
Joined: Mon Oct 29, 2012 2:34 am

Re: 9401 fails on 750ti [Maxwell]

Post by Sam-I-Am »

Apparently FahBench (w/ OpenMM 5.1) ran successfully on GTX 750 Ti, with both implicit and explicit solvent.
Does anyone know what's different between FahBench and FahCore17, regarding setting up tasks for the
GPU, before calling OpenMM 5.1?

The issue with GTX 750 Ti might be due to the new "Unified Virtual Memory" feature in Maxwell.
I believe this feature allows CPU code and GPU code to reside in the same virtual memory space.
This is not possible with NVIDIA GPU architectures prior to Maxwell.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 9401 fails on 750ti [Maxwell]

Post by bruce »

The FahCore that works on Fermi/Kepler doesn't use that new feature. You used the word "allows", which implies that software is not required to use unified memory. How could that prevent the existing code from working?
rwh202
Posts: 410
Joined: Mon Nov 15, 2010 8:51 pm
Hardware configuration: 8x GTX 1080
3x GTX 1080 Ti
3x GTX 1060
Various other bits and pieces
Location: South Coast, UK

Re: 9401 fails on 750ti [Maxwell]

Post by rwh202 »

bruce wrote:I do not have any way to know for sure, nor do I have a system which fails, so for me, my statement that I have no problem is just as valid as your statement that you do have a problem.
It might be valid, but is it relevant? "I have two goldfish" is valid, but hardly relevant.

The topic here is that core_17 does not fold on maxwell. No one has got it working. Maybe every maxwell chip is defective and overclocked to the hilt, but seems unlikely. Does someone at Stanford want a stock 750/ti to test? I'm sure newegg/amazon can get one to them for Monday if you give me an address.
Sam-I-Am
Posts: 18
Joined: Mon Oct 29, 2012 2:34 am

Re: 9401 fails on 750ti [Maxwell]

Post by Sam-I-Am »

bruce wrote:The FahCore that works on Fermi/Kepler doesn't use that new feature. You used the word "allows", which implies that software is not required to use unified memory. How could that prevent the existing code from working?
From the published GTX 750 Ti benchmark results from Tom's and AnandTech, I think the problem is probably
unrelated to OpenMM or OpenCL, and it's probably related to CPU <-> GPU communication. Specifically, memory
barriers and memory coherence come to mind.

To quote Dr. Pande, "... because of how our old core 15 and 16 was written, it was in fact easier for us to write
the core (17) from scratch." Perhaps the new FahCore17 code makes certain assumptions about how future
GPUs will communicate with the CPU, and these assumptions are no longer valid on Maxwell. However, the comparable
assumptions made in FahCore15 are still valid. Hence FahCore15 still runs fine, though with degraded performance
(due to the fact that FahCore15 does not support OpenMM 5.1).