Stats files and unique id's
Moderator: Site Moderators
Hi,
I started looking at doing all stats for Folding@Home at the free-dc stats site I run. I can parse and interpret the data into MySQL, adding all the rankings and such, easily enough, but I run into problems because the data contains non-unique IDs: http://fah-web.stanford.edu/daily_user_summary.txt
Now, I presume that internally a unique ID is used, just like in the team file, so is there any reason this ID is not in the user file at all, or am I missing something?
Thanks
Bok
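For anyone following along, here is a minimal sketch of how the non-unique names could be spotted after parsing. The column layout (tab-separated name, score, wu, team with one header row) is an assumption made for illustration, as are the sample rows and the team number 12345; the real daily_user_summary.txt may be laid out differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in an ASSUMED layout: tab-separated name, score, wu, team.
# The real daily_user_summary.txt may differ (extra header lines, column order).
SAMPLE = (
    "name\tscore\twu\tteam\n"
    "John\t1200\t10\t0\n"
    "John\t300\t2\t12345\n"
    "Bok\t5000\t40\t12345\n"
)


def find_duplicate_names(text):
    """Return {name: [rows]} for every user name that occurs on more than one row."""
    rows_by_name = defaultdict(list)
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        rows_by_name[row["name"]].append(row)
    return {name: rows for name, rows in rows_by_name.items() if len(rows) > 1}


duplicates = find_duplicate_names(SAMPLE)
for name, rows in duplicates.items():
    # Show which teams each repeated name appears under.
    print(name, [row["team"] for row in rows])
```

Any name that shows up in the result cannot be used as a primary key on its own, which is the problem described above.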
-
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
Re: Stats files and unique id's
All I see on the Team stats are the team number, team name, total score, and total WU. I don't see an "internally unique ID" on the Team list. Which number is the unique ID in your reference?
P.S. Welcome to the forum.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
Re: Stats files and unique id's
Thanks for the welcome!
teamnumber is unique in the teamfile; that's what's missing in the user file: a 'usernumber'.
Bok
-
- Posts: 1037
- Joined: Sun Dec 02, 2007 3:47 pm
- Location: Colorado @ 10,000 feet
Re: Stats files and unique id's
He's probably talking about what look like duplicate user names. They aren't actually duplicates. Many people use an email address as their user name, and Pande Group's policy is to post only the part before the @ sign:
"If you choose your email address as your username, we will NOT print your full email address. Instead, just the part before the @ sign will be used in any stats listing, etc."
Re: Stats files and unique id's
Bok wrote: teamnumber is unique in the teamfile, that's what's missing in the user file, a 'usernumber'
Can you quote an example? I'm not sure what you mean.
Re: Stats files and unique id's
I don't think there is a unique "usernumber", because anyone can use "John" as a user name. I could configure my client to submit work units to the username Bok if I wanted, and if you look only at the user name there is no way to distinguish the points you submit to that account from the points I submit to it.
As a result, some stats sites arbitrarily assign a record number to each user name and then display each user and team # combo separately. Other sites combine all the Johns into one account for display purposes. There is no better or easier way to do it, AFAIK. Handle it however you see fit.
I haven't had to deal with this personally, so I'll let someone else with more Stats experience comment further. Sorry.
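The two approaches described in this post can be sketched roughly as follows. This is only an illustration under stated assumptions: the row layout (name, team, score, work units), the sample values, and the function names are all made up here and are not taken from any actual stats site.

```python
from collections import defaultdict

# Hypothetical rows: (user name, team number, score, work units).
ROWS = [
    ("John", 0, 1200, 10),
    ("John", 12345, 300, 2),
    ("Bok", 12345, 5000, 40),
]


def assign_record_ids(rows):
    """Approach 1: give each (name, team) combination its own record number."""
    ids = {}
    for name, team, _score, _wu in rows:
        key = (name, team)
        if key not in ids:
            ids[key] = len(ids) + 1  # arbitrary, site-local number
    return ids


def lump_by_name(rows):
    """Approach 2: combine every row sharing a name into one displayed account."""
    totals = defaultdict(lambda: {"score": 0, "wu": 0})
    for name, _team, score, wu in rows:
        totals[name]["score"] += score
        totals[name]["wu"] += wu
    return dict(totals)


record_ids = assign_record_ids(ROWS)
lumped = lump_by_name(ROWS)
print(record_ids)
print(lumped)
```

With the first approach the name-to-number mapping would presumably have to be stored between updates so the numbers stay stable, and two different people using the same name on the same team would still share one record. The second approach is simpler but merges unrelated donors.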
Re: Stats files and unique id's
Yup, I agree. That's what I've done in the past, but it's not really optimal.
This post is more to see if the Folding@home admins would perhaps modify the file to contain a unique ID, to get around these issues and allow the various stats sites to be more accurate. They must hold an internal ID; otherwise how would the system itself know where to post points to if you changed your username to 'Bok'?
Do the admins read the boards at all?
Bok
P.S. Re-reading your post: if there were no way to distinguish, then shouldn't there NOT be non-uniques in the output file? The folding backend software would not be able to distinguish them either, and would therefore lump them together. But there are non-uniques, which makes me think there is an internal identifier somehow.
Last edited by Bok on Tue Apr 08, 2008 6:29 pm, edited 1 time in total.
Re: Stats files and unique id's
Bok wrote: yup, I agree that's what I've done in the past but it's not really optimum. This post is more to see if the folding@home admins would perhaps modify the file to contain a unique id to get around these issues and allow the various stats sites to be more accurate. Do the admins read the boards at all ?
Yes, Pande Group members do read and post. Check Vijay's post count.
Questions. Your computer and my computer both submit a work unit to the user name Bok. Should that username get two unique IDs, or just one? Is there a benefit from either choice? If two IDs, how does Stanford know which one of those two computers is yours, and which one is mine?
The problem is that Stanford can't tell them apart. That's the big flaw. Stanford has no way to distinguish unique users who all use the same username of "John". With no way to distinguish between them, there is no good way to assign unique IDs.
Hence the addition of a Passkey number (unique identifier) in the v6 client. However, those are confidential, so that doesn't help your problem, even in the future when we all start using v6 clients.
Re: Stats files and unique id's
7im wrote: Yes, Pande Group members do read and post. Check Vijay's post count. Questions. Your computer and my computer both submit a work unit to the user name Bok. Should that username get two unique IDs, or just one? Is there a benefit from either choice? If two IDs, how does Stanford know which one of those two computers is yours, and which one is mine?
Yes, ideally they would be differing IDs, much like most other projects would give. Either that, or prevent a user from registering a name that is already taken. (Is this the case? It doesn't appear to be, to me.)
7im wrote: The problem is that Stanford can't tell them apart. That's the big flaw. Stanford has no way to distinguish unique users who all use the same username of "John" With no way to distinguish between them, there is no good way to assign unique IDs.
If that's the case, then yes, it's the big flaw, but see my added comments to my previous post and you'll see why I was thinking they were holding an internal ID somehow.
7im wrote: Hence the addition of a Passkey number (unique identifier) in the v6 client. However, those are confidential, so that doesn't help your problem, even in the future when we all start using v6 clients.
True.
So for now, I'll just lump them together.
Bok
Re: Stats files and unique id's
There are no easy answers. Lumping is the least troublesome, and is what most other Stats sites do.
EDIT: I also sent you a PM.
Re: Stats files and unique id's
OK,
Preliminary stats are at http://stats4.free-dc.org/stats.php?page=teams&proj=fah
I initially ran it off some old files while I was tweaking the scripts, then ran it against the current data, so the 'last update' probably covers a few weeks' worth of data.
This needs to run for a few days to get the data looking consistent.
Should I filter out the default team and the anonymous/PS3 user?
Bok
-
- Pande Group Member
- Posts: 2058
- Joined: Fri Nov 30, 2007 6:25 am
- Location: Stanford
Re: Stats files and unique id's
People have nailed down most of the issues. I'll just elaborate that while the passkeys are private info, we could expose a unique identifier (different from the passkey) for 3rd party stats to use to distinguish donors that have given us passkeys.