Stats files and unique id's

Posted: Tue Apr 08, 2008 4:53 pm
by Bok
Hi,

I started looking at doing all stats for Folding@Home at the free-dc stats site I run. I can parse and interpret the data into mysql, adding all the ranking and such easily enough, but I run into problems because the data contains non-unique id's: http://fah-web.stanford.edu/daily_user_summary.txt

Now, I presume that internally a unique id is used, just like in the team file, so is there any reason that this id is not in the user file at all or am I missing something ?

Thanks

Bok

Re: Stats files and unique id's

Posted: Tue Apr 08, 2008 5:26 pm
by 7im
All I see on the Team stats are the team number, team name, total score, and total WU. I don't see an "internally unique ID" on the Team list. Which number is the unique ID in your reference?



P.S. Welcome to the forum.

Re: Stats files and unique id's

Posted: Tue Apr 08, 2008 5:43 pm
by Bok
thanks for the welcome !

teamnumber is unique in the teamfile, that's what's missing in the user file, a 'usernumber'

Bok

Re: Stats files and unique id's

Posted: Tue Apr 08, 2008 5:51 pm
by ChelseaOilman
He's probably talking about what look like duplicate user names. They aren't really duplicates. Many people use an email address as their user name, and Pande Group's policy is to post only the part before the @ sign:
"If you choose your email address as your username, we will NOT print your full email address. Instead, just the part before the @ sign will be used in any stats listing, etc."

Re: Stats files and unique id's

Posted: Tue Apr 08, 2008 5:52 pm
by ChelseaOilman
Bok wrote:teamnumber is unique in the teamfile, that's what's missing in the user file, a 'usernumber'
Can you quote an example? I'm not sure what you mean.

Re: Stats files and unique id's

Posted: Tue Apr 08, 2008 5:55 pm
by 7im
I don't think there is a unique "usernumber" because anyone can use "John" as a user name. I could configure my client to submit work units to the username Bok if I wanted. But if you only look at the user name, there is no way to distinguish the points you submit to that user account from the points I submit to it.

As a result, some stats sites arbitrarily assign a record number to each user name, and then display each user and team # combo separately. Some sites combine all the Johns into one account for display purposes. There is no better or easier way to do it, AFAIK. Handle it how you best see fit.
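
The two approaches described above can be sketched roughly as follows. This is only an illustration: the tab-separated column layout (name, credit, work units, team), the sample rows, and the function names are all assumptions made for the example, not the documented format of the summary file or anyone's actual stats code.

```python
from collections import defaultdict

# Hypothetical sample rows. ASSUMPTION: tab-separated columns are
# name, credit, work units, team. The real file layout may differ.
SAMPLE = """John\t1200\t10\t0
John\t300\t3\t12345
Bok\t5000\t40\t12345
"""

def parse_rows(text):
    """Return (name, credit, wus, team) tuples, skipping lines that don't fit."""
    rows = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 4:
            continue
        name, credit, wus, team = parts
        try:
            rows.append((name, int(credit), int(wus), int(team)))
        except ValueError:
            continue  # header or malformed line
    return rows

def assign_record_ids(rows):
    """Approach 1: give each (name, team) combo its own arbitrary record number."""
    ids = {}
    for name, _credit, _wus, team in rows:
        ids.setdefault((name, team), len(ids) + 1)
    return ids

def lump_by_name(rows):
    """Approach 2: combine every row that shares a name into one account."""
    totals = defaultdict(lambda: [0, 0])
    for name, credit, wus, _team in rows:
        totals[name][0] += credit
        totals[name][1] += wus
    return {name: tuple(v) for name, v in totals.items()}

rows = parse_rows(SAMPLE)
print(assign_record_ids(rows))
print(lump_by_name(rows))
```

Note that record numbers assigned this way depend on row order, so a stats site would need to store the mapping between runs to keep them stable.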

I haven't had to deal with this personally, so I'll let someone else with more Stats experience comment further. Sorry.

Re: Stats files and unique id's

Posted: Tue Apr 08, 2008 6:17 pm
by Bok
yup, I agree that's what I've done in the past but it's not really optimum.

This post is more to see if the folding@home admins would perhaps modify the file to contain a unique id to get around these issues and allow the various stats sites to be more accurate :) They must hold an internal id otherwise how would the system itself know where to post points to if you changed your username to be 'Bok'....

Do the admins read the boards at all ?

Bok

p.s. re-reading your post, if there were no way to distinguish, then shouldn't there NOT be non-uniques in the output file? As the folding backend software would not be able to distinguish either and therefore lump them together?? But there are non-uniques which makes me think there is an internal identifier somehow.

Re: Stats files and unique id's

Posted: Tue Apr 08, 2008 6:28 pm
by 7im
Bok wrote:yup, I agree that's what I've done in the past but it's not really optimum.

This post is more to see if the folding@home admins would perhaps modify the file to contain a unique id to get around these issues and allow the various stats sites to be more accurate :)

Do the admins read the boards at all ?

Bok
Yes, Pande Group members do read and post. Check Vijay's post count.

Questions. Your computer and my computer both submit a work unit to the user name Bok. Should that username get two unique IDs, or just one? Is there a benefit from either choice? If two IDs, how does Stanford know which one of those two computers is yours, and which one is mine?

The problem is that Stanford can't tell them apart. That's the big flaw. Stanford has no way to distinguish unique users who all use the same username of "John". With no way to distinguish between them, there is no good way to assign unique IDs.

Hence the addition of a Passkey number (unique identifier) in the v6 client. However, those are confidential, so that doesn't help your problem, even in the future when we all start using v6 clients.

Re: Stats files and unique id's

Posted: Tue Apr 08, 2008 6:33 pm
by Bok
7im wrote:
Yes, Pande Group members do read and post. Check Vijay's post count.

Questions. Your computer and my computer both submit a work unit to the user name Bok. Should that username get two unique IDs, or just one? Is there a benefit from either choice? If two IDs, how does Stanford know which one of those two computers is yours, and which one is mine?
yes, ideally they would be differing ID's, much like most other projects would give. Either that or prevent a user registering the same name as is already taken. (is this the case ? - it doesn't appear to be to me)
7im wrote: The problem is that Stanford can't tell them apart. That's the big flaw. Stanford has no way to distinguish unique users who all use the same username of "John". With no way to distinguish between them, there is no good way to assign unique IDs.
If that's the case, then yes it's the big flaw, but see my added comments to my previous post and you'll see why I was thinking they were holding an internal id somehow.
7im wrote: Hence the addition of a Passkey number (unique identifier) in the v6 client. However, those are confidential, so that doesn't help your problem, even in the future when we all start using v6 clients.
true.

So for now, I'll just lump them together. :mrgreen:

Bok

Re: Stats files and unique id's

Posted: Tue Apr 08, 2008 6:54 pm
by 7im
There are no easy answers. Lumping is the least troublesome, and is what most other Stats sites do.


EDIT: I also sent you a PM.

Re: Stats files and unique id's

Posted: Tue Apr 08, 2008 8:45 pm
by Bok
ok,

preliminary stats are at http://stats4.free-dc.org/stats.php?page=teams&proj=fah

I initially ran it off some old files as I was tweaking the scripts, then ran it against the current data, so the 'last update' is probably for a few weeks' worth of data.

This needs to run for a few days to get the data looking consistent.

Should I filter out the default team and anonymous/PS3 user ?

Bok

Re: Stats files and unique id's

Posted: Wed Apr 09, 2008 3:50 pm
by VijayPande
People have nailed down most of the issues. I'll just elaborate that while the passkeys are private info, we could expose a unique identifier (different from the passkey) for 3rd party stats to use to distinguish donors that have given us passkeys.
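
As a purely illustrative sketch of how a public identifier distinct from the passkey might be derived, one option is a keyed hash of the passkey. Nothing in this thread says this is how it would be done: the function name, the server-side secret, and the 16-character truncation are all assumptions made for the example.

```python
import hashlib
import hmac

def public_donor_id(passkey: str, server_secret: bytes) -> str:
    """Hypothetical: derive a stable public ID from a private passkey.

    ASSUMPTION: server_secret is known only to the project's servers,
    so the passkey cannot be recovered or checked from the published ID.
    """
    digest = hmac.new(server_secret, passkey.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated for readability in a stats file

# Made-up values for demonstration only.
secret = b"example-server-secret"
print(public_donor_id("made-up-passkey-1", secret))
print(public_donor_id("made-up-passkey-2", secret))
```

The same passkey always maps to the same ID, so a third-party stats site could tell two donors named "John" apart without ever seeing either passkey.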