Message boards : Number crunching : Who are you talking to????
Author | Message |
---|---|
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
I've downloaded the user.gz zipped XML export file. One limitation is that my Excel only has 65,536 rows, so it only gives me access to that many users. To get a small but workable random/arbitrary distribution, I chose to work with three groups of 250 userID numbers: Group 1, userIDs 10000 to 10250, join dates 7-8 Nov 2005; Group 2, userIDs 30000 to 30250, join date 5 Dec 2005; Group 3, userIDs 50000 to 50250, join dates 13-14 Jan 2006. Some userIDs weren't listed in the XML, so the sample isn't 750 but only 503 users. The figures below cover the full list of users (as contained in the stats dump for these groups); none have been removed. Of the 503 users, only 22 have ever posted to these fora (4.4%); 479 have never posted here (95.23%); 2 I couldn't determine (0.40%). Of the 22 who have posted: 11 posted only once, 4 posted twice, and one each has posted 4, 5, 8, 10, 21, 42, and 150 times. Of the 503 users: 198 still have a computer attached (39.36%); 35 are hidden (6.96%); 269 are inactive, with no attached computer (53.48%); 1 was undeterminable (0.2%). Can you think of any other interesting stats that can be determined from this? |
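Each share above is a simple count/total ratio, and each breakdown should add up to the 503-user sample. A small Python check recomputes every percentage from the raw counts (the category labels are just shorthand for the lines above, nothing from the dump itself):

```python
# Recompute the percentages from the raw counts quoted in the post.
SAMPLE = 503  # users found in the three userID ranges


def share(count: int, total: int = SAMPLE) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


posting = {"posted": 22, "never posted": 479, "undetermined": 2}
hosts = {"computer attached": 198, "hidden": 35, "inactive": 269, "undetermined": 1}

for title, groups in (("Forum posting", posting), ("Host status", hosts)):
    # Every user should land in exactly one category of each breakdown.
    assert sum(groups.values()) == SAMPLE, title
    print(title)
    for label, count in groups.items():
        print(f"  {label:<18}{count:>4}  {share(count):6.2f}%")
```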
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
I looked in BOINCstats and see they report total users as 73,712 and active users as 33,220. This means Willy sees it at 45% (where mine shows 39%). If I add those users who are hidden, then I'm at 46%. I guess this data is "nearly" representative as a sample. Half of the posters only posted ONCE. I wonder if those were just statements such as "I'm here", or perhaps unanswered questions leading to their departure? I realize those eleven only represent 2.19% of the sample, but I wonder. Assuming it would take at least two posts to constitute an actual conversation on these boards, only 2.2% of ALL users have ever had a conversation here. Only 7 have posted more than twice; that's 1.4%. I wish there were a way to see who had actually "viewed" the fora. This would show how many have "looked, but not posted", i.e., how many users are just "lurkers"? |
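The "at least two posts" reasoning is a threshold count over the per-user post totals listed in the first post. A short Python sketch, using only the numbers given in this thread (treating attached plus hidden hosts as "active" follows the assumption made above):

```python
SAMPLE = 503  # users in the sample, posters and non-posters alike

# Post totals of the 22 sampled users who have posted at all:
# 11 posted once, 4 posted twice, one each posted 4, 5, 8, 10, 21, 42 and 150 times.
post_counts = [1] * 11 + [2] * 4 + [4, 5, 8, 10, 21, 42, 150]


def at_least(counts, k):
    """Number of users with k or more posts."""
    return sum(1 for c in counts if c >= k)


for k in (1, 2, 3):
    n = at_least(post_counts, k)
    print(f">= {k} posts: {n:2d} users = {100 * n / SAMPLE:.2f}% of the sample")

# Active-user share: BOINCstats totals vs. the sample (attached + hidden hosts).
print(f"BOINCstats: {100 * 33220 / 73712:.1f}% active")
print(f"Sample:     {100 * (198 + 35) / SAMPLE:.1f}% attached or hidden")
```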
pieface Joined: 20 Sep 05 Posts: 17 Credit: 797,661 RAC: 0 |
Numbers, numbers, numbers... I guess I am one of the lurkers. I visit fairly often (but stick pretty much to the NC forum), and seldom post unless I have a problem. |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
As of this minute, Alex Dixon (userID 101741) is Rosetta's newest member. Since I can only fit 65,536 rows in Excel, Lithis (join date March 15th) is the newest I can show. |
John McLeod VII Joined: 17 Sep 05 Posts: 108 Credit: 195,137 RAC: 0 |
Astro wrote: "As of this minute Alex Dixon UserID101741 is Rosetta's newest member." Have you thought of using Access instead? It won't have the limit on the number of rows, and it should be possible to do the statistics easily. BOINC WIKI |
Alexander W. Janssen Joined: 31 May 06 Posts: 33 Credit: 97,311 RAC: 0 |
I was bored and I wrote a Perl script which pumps the file into a MySQL database. It can easily be adapted to other databases as well, since it uses DBI. 1) Grab the script: https://opz.ynfonatic.de/pastebin.php?dl=83 Save it as gobbleup.pl or whatever you want to call it. Give it execution rights: chmod 755 gobbleup.pl 2) Log in to MySQL as root. 3) create database boinc; use boinc; 4) create table users ( id INTEGER, name VARCHAR(80), country VARCHAR(80), create_time INTEGER, total_credit REAL, expavg_credit REAL, expavg_time REAL, cpid VARCHAR(80), teamid INTEGER, url VARCHAR(80)); 5) Create a boinc user who's allowed to push crap into the database: GRANT insert, select, update, delete ON boinc.users TO 'boinc'@'localhost' IDENTIFIED BY 'YourFavouritePassword'; 6) Adapt the variables $db, $user and $password in gobbleup.pl. 7) Grab the latest user.gz, save it into the same folder as gobbleup.pl, and unpack it. 8) Run: "./gobbleup.pl user" 9) Wait. It took me about a minute and twenty seconds on my machine: $ time ./gobbleup.pl user real 1m20.784s user 1m3.577s sys 0m1.907s (That was on https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=268780) 10) Check that the data's there: $ echo "select count(id) from users;" | mysql -u boinc -p boinc Enter password: count(id) 74949 Voila! Have fun, Alex. "I am tired of all this sort of thing called science here... We have spent millions in that sort of thing for the last few years, and it is time it should be stopped." -- Simon Cameron, U.S. Senator, on the Smithsonian Institute, 1901. |
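If setting up a MySQL server is the sticking point, roughly the same import can be sketched with Python's built-in sqlite3 module, which needs no server, user accounts or GRANT statement. This is not Alex's gobbleup.pl: it swaps Perl/DBI/MySQL for Python/sqlite3, it assumes each <user> element in the dump has child tags named like the step-4 columns (not checked against a real dump), and the script name load_users.py and its arguments are hypothetical.

```python
import gzip
import sqlite3
import sys
import xml.etree.ElementTree as ET

# Same columns, in the same order, as the CREATE TABLE in step 4.
COLUMNS = ("id", "name", "country", "create_time", "total_credit",
           "expavg_credit", "expavg_time", "cpid", "teamid", "url")

SCHEMA = """CREATE TABLE IF NOT EXISTS users (
    id INTEGER, name VARCHAR(80), country VARCHAR(80), create_time INTEGER,
    total_credit REAL, expavg_credit REAL, expavg_time REAL,
    cpid VARCHAR(80), teamid INTEGER, url VARCHAR(80))"""


def iter_users(chunks):
    """Yield one row tuple per <user> element, parsing incrementally so the
    whole dump never has to be held as one string."""
    parser = ET.XMLPullParser(events=("end",))

    def drain():
        for _event, elem in parser.read_events():
            if elem.tag == "user":
                # Missing child elements become None (NULL in the table).
                yield tuple(elem.findtext(col) for col in COLUMNS)
                elem.clear()  # drop the finished user's children

    for chunk in chunks:
        parser.feed(chunk)
        yield from drain()
    parser.close()
    yield from drain()


def load(conn, chunks):
    """Create the table if needed, insert every user, return count(id)."""
    conn.execute(SCHEMA)
    placeholders = ", ".join("?" * len(COLUMNS))
    conn.executemany(f"INSERT INTO users VALUES ({placeholders})", iter_users(chunks))
    conn.commit()
    return conn.execute("SELECT count(id) FROM users").fetchone()[0]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python load_users.py user.gz users.db
    with gzip.open(sys.argv[1], "rb") as dump:
        conn = sqlite3.connect(sys.argv[2])
        # Read the decompressed stream in 1 MiB chunks until EOF (empty bytes).
        print(load(conn, iter(lambda: dump.read(1 << 20), b"")))
        conn.close()
```

Afterwards, sqlite3 users.db "select count(id) from users;" plays the role of step 10.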
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Thanks, Alexander. One problem: I'm decent at BOINC, less knowledgeable about Rosetta, and completely ignorant of Linux, Apache, and MySQL. I hope this info will help someone else, though. Thanks for taking the time to present it. tony Basically, my interest in doing some of this stuff lessened when I found out I'd have to learn those three just to look up that stuff. LOL. That's why this board hasn't been "blessed" with more of my data. I'm currently working on a cross-project credit analysis, and have been thinking of doing a "decoy/hr study based on CPU type", but haven't allocated the time for that just yet. |
Alexander W. Janssen Joined: 31 May 06 Posts: 33 Credit: 97,311 RAC: 0 |
Astro wrote: "Thanks Alexander, One problem, I'm decent at boinc, less knowledgeable at Rosetta, and completely ignorant of Linux, Apache, and Mysql. I hope this info will help someone else though. thanks for taking the time to present it." Hey Tony, ah, never mind ;) You're welcome. When I wrote that I was just bored, I really meant it (I've been on holiday since Monday and my wife is still working, so I've got lots of spare time), and I thought it was about time to learn about the dodgy XML stuff. If you like, I could give you access to my database so you can connect to it. You can use normal SQL commands to extract data from it for your survey. I don't mind; the data is already there and setting up an account is a piece of cake. But... #define "decent at BOINC". Are you into the BOINC runtime lib? I started reading a bit about the API (say: brew-your-own-science-app) and could use a hint or two... Alex. |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Alexander wrote: "If you like i could give you access to my database where you can connect to. You can use normal SQL-commands to extract data from it to get data for your survey. I don't mind, the data is already there and setting up an account is a piece of cake." I have NO programming skills (other than AB PLCs for machine controls). I'd define "decent at BOINC" as: I have been here since day one of BOINC. I was undergoing cancer treatment, so I had nothing better to do than read every thread/post at SETI/BOINC, get involved in helping others when I could, learn more and more from others helping others, remember what I'd seen/read, and become pretty decent at helping others get BOINC going and troubleshooting the issues users have. My eyes glaze over when I see ACK, SYN, HTML and XML tags, or anything involving programming, i.e., I'm "decent at the user end of BOINC"; the back end is a mystery (though I do know some small bits that have unintentionally slipped past my filter, LOL). As to your offer of help, I think what I was trying to do at the time was to get a list of random users from the "user.gz" XML dump that could be used later to manually look up info that isn't in the XML dumps. I.e., I wanted an .xls sheet with <userid>,</userid> <username> </username> (see, stupid XML tags are everywhere) for, say, userID 14 (the first one, Dr. Baker by the way) all the way to the end. But Excel only allows 65,536 rows, so I'd need them divided into different columns: columns A and B, skip C, then D and E, skip F, then G and H, where the first of each pair would be the userID (columns A, D, and G) and the second the username (columns B, E, and H). The User.gz dump is here tony |
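The column layout described above (pairs in A-B, D-E, G-H with C and F left blank) is a reshaping step. A minimal Python sketch, assuming the (userid, username) pairs have already been pulled out of user.gz; pack_rows and write_csv are hypothetical names, not part of any BOINC tool. Writing CSV rather than .xls keeps it dependency-free, and Excel opens CSV files directly.

```python
import csv

EXCEL_XLS_MAX_ROWS = 65536  # per-sheet row limit of .xls (Excel 97-2003)


def pack_rows(users, per_row=3):
    """Lay (userid, username) pairs side by side, per_row pairs per line,
    with an empty spacer column between pairs. With per_row=3 the ids land
    in columns A, D, G and the names in B, E, H (C and F stay blank)."""
    rows, row = [], []
    for i, (userid, name) in enumerate(users):
        if i % per_row:
            row.append("")  # spacer column before every pair except the first
        row.extend([userid, name])
        if i % per_row == per_row - 1:  # line is full
            rows.append(row)
            row = []
    if row:  # last, partly filled line
        rows.append(row)
    return rows


def write_csv(path, users, per_row=3):
    """Write the packed rows to a CSV file that Excel can open."""
    rows = pack_rows(users, per_row)
    if len(rows) > EXCEL_XLS_MAX_ROWS:
        raise ValueError(f"{len(rows)} rows exceed the .xls limit; raise per_row")
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return len(rows)
```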
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Oh yeah, I'm also a BOINC alpha tester, which means I download the latest software, try it out, try to find bugs, and report them. I'm not much help in offering solutions in the form of program changes. While doing this I'm also on all the mailing lists, so I keep informed of the changes behind the scenes (some of which have forced me to learn more about the back end). |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
If you are pondering doing this, it might be nice to also have: /user/cpid, /user/expavg_credit/#agg and /user/total_credit/#agg. The rest of user.gz is not needed. tony |
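Picking out just those fields amounts to looking up a few child tags per <user>. A Python sketch, assuming the fields are plain child elements of each <user> (the "#agg" part of the paths isn't modelled, and the int/float conversions are an assumption); fromstring loads the whole document, so for the full dump a streaming parser would be gentler on memory:

```python
import xml.etree.ElementTree as ET

# id and name plus the three fields asked for; assumed to be child tags of <user>.
WANTED = ("id", "name", "cpid", "expavg_credit", "total_credit")
NUMERIC = {"id": int, "expavg_credit": float, "total_credit": float}


def pick_fields(xml_text, wanted=WANTED):
    """Return one dict per <user>, keeping only the wanted child fields."""
    root = ET.fromstring(xml_text)
    rows = []
    for user in root.iter("user"):
        row = {}
        for tag in wanted:
            text = user.findtext(tag)  # None if the tag is missing
            convert = NUMERIC.get(tag)
            row[tag] = convert(text) if convert and text else text
        rows.append(row)
    return rows
```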
Alexander W. Janssen Joined: 31 May 06 Posts: 33 Credit: 97,311 RAC: 0 |
Astro wrote: "if you are pondering doing this it might be nice to also have:" Tony, piece of cake, I can do that. I was just about to wander off and go to bed, so let's discuss the details tomorrow. Cheers from .de, Alex. |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Well, that's mighty nice. Just for clarity, what I'm hoping to get would look something like this: That includes every user in that file. Have a good sleep. tony |
Stefan Joined: 12 Feb 06 Posts: 5 Credit: 15,058 RAC: 0 |
Wow, interesting stats. I've been posting on other boards, but I thought I'd check this one out too; after all, I did rejoin Rosetta this past month :) It's kind of interesting to see how many users don't post on the boards; problems might not be completely reported... Human Stupidity Is Infinite... |
Ananas Joined: 1 Jan 06 Posts: 232 Credit: 752,471 RAC: 0 |
That isn't so surprising. Several DC projects (or, in the case of BOINC, DC platforms) are advertised as screensavers: install them, enjoy them, and do something for science. A lot of people who installed them for this purpose will not even notice if something is not as it should be. A screensaver that you have to take care of and even upgrade now and then is not really what those people wanted; they need something to install and forget. If a screensaver doesn't work at all, people will use a different one. If it produces errors but basically looks like a screensaver, they will enjoy it. |
Alexander W. Janssen Joined: 31 May 06 Posts: 33 Credit: 97,311 RAC: 0 |
Astro wrote: "well, thats' mighty nice. Just for clarity, what I'm hoping to get would look something like this:" I found Excel too cumbersome, so I went for CSV files; you can easily load them in Excel. There is a Spreadsheet::WriteExcel module for Perl, but I ain't usually a Perl programmer, so I've chosen the lazy way... :) The script I used to generate the CSV files: gobbledown.pl (it needs the XML::Parser module). The CSV file with two users on each line: user2.csv.gz (37522 lines). The CSV file with three users on each line: user3.csv.gz (25015 lines). Astro wrote: "have a good sleep." Oh, I had a very good sleep until one of my %$!"-colleagues, who knows that I'm on holiday and who also knows that I enjoy a long sleep, called me up early in the morning: Oh, hi, it's me, $arse, I know you're on holiday, just have a short^Wdumb question... He deserves something more painful than death... :) Have fun with that stuff. I'd advise you to get Perl for Windows if you don't have some Unix-ish machine handy; you can download Perl from ActiveState's homepage. Cheers, Alex - who's going to make coffee now. |
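How many lines such a file needs is a ceiling division: n users at k per line take ceil(n / k) lines, and the fewest users per line that still fits under the 65,536-row .xls limit is ceil(n / 65,536). A quick Python check, using the 74,949 users from Alex's MySQL count earlier in the thread as the example input:

```python
EXCEL_XLS_MAX_ROWS = 65536  # rows per sheet in Excel 97-2003 (.xls)


def lines_needed(n_users: int, per_line: int) -> int:
    """CSV lines needed when per_line users share a line (ceiling division)."""
    return -(-n_users // per_line)


def min_per_line(n_users: int, max_rows: int = EXCEL_XLS_MAX_ROWS) -> int:
    """Fewest users per line that keeps the file within max_rows lines."""
    return -(-n_users // max_rows)


n = 74949  # users counted in the MySQL import earlier in the thread
for k in (1, 2, 3):
    rows = lines_needed(n, k)
    print(f"{k} per line: {rows} lines, fits in .xls: {rows <= EXCEL_XLS_MAX_ROWS}")
print("minimum users per line:", min_per_line(n))
```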
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Alexander wrote: "Cheers, Alex - who's going to make coffee now." It's 4:32 a.m. here, so I will go make the coffee. I get up early so I can work outdoors before it hits 100F. I'm building brick columns/pillars (8) along the gate and front of the property (part of my wife's "honey-do" list). |
Alexander W. Janssen Joined: 31 May 06 Posts: 33 Credit: 97,311 RAC: 0 |
Astro wrote: "It's 4:32 a.m here, I will go make the coffee. I get up early so I can work outdoors before it hits 100F." We had quite a long heat wave in Germany. It was a constant 30-35 degrees Celsius for weeks without rain; not even the occasional thunderstorm. This all eased off a couple of days ago. Now we've got splendid, fine German rainy summer weather :) Which I really enjoy, 'cause I've got my office in the attic. The roof tiles keep heating up during the day, casting their ray-of-death into the house all day and night. Astro wrote: "I'm building brick columns/pillars (8) along the gate and front of the property (part of wifes' 'honey-do' list." Uh-oh... I promised my wife I'd build her a new kitchen during my holidays. I know exactly what you're talking about... ;-) It would be nice if you could tell me whether that CSV stuff is what you want, although it would probably be easier for you to install Perl so that you can brew your own file next time. I'd be happy to give you a hand, so if you have questions, drop me a line. Cheers, Alex (slurping coffee. Hmmm, good coffee...) |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Both the user2 and user3 links point to the user2.csv.gz file. I haven't opened them yet. |
Alexander W. Janssen Joined: 31 May 06 Posts: 33 Credit: 97,311 RAC: 0 |
Astro wrote: "both the user2 and user3.gz links point to the user2.gz file. I haven't opened them yet." Sorry, that was a typo... user2.csv.gz (37522 lines) user3.csv.gz (25015 lines) Sorry for the confusion, Alex. |
©2024 University of Washington
https://www.bakerlab.org