BOINC over a cluster

Message boards : Number crunching : BOINC over a cluster

To post messages, you must log in.

AuthorMessage
Profile Dragokatzov

Send message
Joined: 5 Oct 05
Posts: 25
Credit: 2,446,376
RAC: 0
Message 7731 - Posted: 27 Dec 2005, 17:10:05 UTC

I see over at the Folding@home forums theres a sticky with LOTS of information on setting up a diskless stack. I was woundering would it be possible to rin BOINC with Rosetta on one of these? how many cpu's does BOINC support anyways? thanks for your information!
Victory is the ONLY option!
ID: 7731 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Honza

Send message
Joined: 18 Sep 05
Posts: 48
Credit: 173,517
RAC: 0
Message 7732 - Posted: 27 Dec 2005, 17:28:31 UTC

It is up to you how many CPUs can BOINC use
Just go to the General preferences ( https://boinc.bakerlab.org/rosetta/prefs.php?subset=global ) and set "On multiprocessors, use at most ... processors"
It should be fine with 8 if you get to a quad dual-core Opteron machine for example.

When running a farm, I would suggest to use BOINCView ( http://boincview.amanheis.de/ ) to monitor/control machines.

You can run BOINC over a network, from a Ramdrive or even from a flash disk [only CPDN is not suitable for this since it is quite HD space and I/O demanding].
ID: 7732 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Tern
Avatar

Send message
Joined: 25 Oct 05
Posts: 576
Credit: 4,695,450
RAC: 13
Message 7735 - Posted: 27 Dec 2005, 18:04:25 UTC

One warning is that you need a separate installation of BOINC for each host, and each needs it's own access to the internet. If your "stack" appears to the OS as a single computer with multiple processors, all is well; if each boots individually, you have multiple "hosts", and you'll have to do a bit more work to make BOINC work correctly.

ID: 7735 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
J D K
Avatar

Send message
Joined: 23 Sep 05
Posts: 168
Credit: 101,266
RAC: 0
Message 7745 - Posted: 27 Dec 2005, 20:12:41 UTC

I have seen 64....
BOINC Wiki

ID: 7745 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
John McLeod VII
Avatar

Send message
Joined: 17 Sep 05
Posts: 108
Credit: 195,137
RAC: 0
Message 8141 - Posted: 1 Jan 2006, 22:08:04 UTC

There is nothing that I have seen in the code that limits the number of CPUs. However, the communications between the BOINC daemon and the project applications is through shared memory - and AFAIK clusters do not do shared memory, thus requiring a separate installation for each host.


BOINC WIKI
ID: 8141 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile River~~
Avatar

Send message
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 8169 - Posted: 2 Jan 2006, 9:56:51 UTC - in response to Message 8141.  

There is nothing that I have seen in the code that limits the number of CPUs. However, the communications between the BOINC daemon and the project applications is through shared memory - and AFAIK clusters do not do shared memory, thus requiring a separate installation for each host.


In principle either a software hack (*) or a hardware hack could provide shared memory across a cluster. But nobody has done one as far as I know because you would not want to anyway.

With many cpus sharing the same memory you reach a bottleneck where they are fighting one another for access to that memory. Even if memory speeds can be pushed a little, the speed of light limits how far the memory can be away from the chip at a given clock speed. If you had a 1GHz memory bus for example light would travel 1 foot (30cm) in that time, and because of the ask/fetch model that limits the memory to be within 6in (15cm). Or 4in (10cm) as the signals travel at 'only' 2/3 the speed of light.

The consequence is that two real cpus will not give quite twice the crunch as a single cpu; by the time you get to four cpus in one box the effect should be noticeable even if they are real cpus in separate sockets.

If you got to (say) 64 cpus the whole thing would be memory bound and adding more cpus would just slow it down. The cpus would spend more time haggling over memory access than doing real work. Well before we see the 64-cpu motherboard my guess is we will see motherboards with several groups of 2-4 cpus, each group having its own RAM...

River~~

(*) the software hack is to associate a file with the shared memory area, and every member of the cluster use the same file. If the word 'performance' has just walked across your mind, you just independently figured out why you wouldn't want to do it...
ID: 8169 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile River~~
Avatar

Send message
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 8170 - Posted: 2 Jan 2006, 10:17:26 UTC - in response to Message 7732.  

You can run BOINC over a network, from a Ramdrive or even from a flash disk.


Ouch!

Please don't use a flash disk directly.

You only have a few times 10,000 rewrites on a flash disk before you wear out the memory. The flash mechanism is fine for photos and passing files from box to box, but not for uses where the file is repeatedly re-written. Remember it was developped from write once read many (WORM) memories used for BIOS, and so on. It is just not designed for this job.

Reading a flash memory does not wear it out, writing does.

If you want to do run from a flash drive, figure out how to make you flash disk copy itself onto a ramdisk before BOINC starts and copy itself back after it stops. This means of course that a power failure wipes out all work that hasn't been reported, as it is all on the ramdisk, but on Rosetta we're used to poor checkpointing anyway ;-(.

Linux users might be able to achieve the same effect by setting 'async' in fstab - but check whether the kernel really does no writes till umount. Personally I would still want to do a copy to ramdisk before and after because I haven't looked at the kernel code.

If you are running Rosetta it ls less of a problem than running Einstein - Rosetta only does about 10 checkpionts per wu, Einstein (Albert app) 1 checkpoint every few sec. Rosetta would take only a thousand wu to wear out the memory in your slot directories. That is three wu a day for a year.
Albert could wear it out in a few dozen wu...

ID: 8170 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
PCZ

Send message
Joined: 16 Sep 05
Posts: 26
Credit: 2,024,330
RAC: 0
Message 8228 - Posted: 3 Jan 2006, 1:03:04 UTC
Last modified: 3 Jan 2006, 1:19:27 UTC

What you have seen described as a discless stack are PXE boot nodes.

I and many others use these.
Basically you have one Server which runs DHCP, TFTP and NFS.
The nodes are just basic MB's without any HD's.
The nodes boot up over the network and load a kernel via TFTP.
The OS is then mounted as an NFS share from the Server.
There are other NFS shares setup for the various DC projects that you want to run.

So the nodes have no HD's and all there files are on a central Server.
However once booted they are individual entities, each node runs it's own copy
of boinc.
You control them via telnet or SSH.
Each one of the discless nodes is infact a linux PC but without Screen or HD's.

Folks use these clusters because the node cost is very low, just MB, RAM, CPU and PSU.
Adding extra nodes only takes seconds, all that is required to add a node is a quick edit of dhcpd.conf to add the MAC address and some NFS exports for the new node.


These PXE stacks are not true clusters as in they are not slaves to the central server, they are individual PC's using resources provided by a Server but not controlled by it.



ID: 8228 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
raukondil

Send message
Joined: 6 Jan 06
Posts: 4
Credit: 6,465,556
RAC: 1,277
Message 8828 - Posted: 12 Jan 2006, 9:27:59 UTC

did anyone have experience in using openmosix with boinc???
i've read sometimes in forums that this is not possible, but i'm not quite sure...
would be happy to know if it is possible ...

@DEV-team
do i have to change the code of the boinc manager or just the project-code...

answers are welcome

greeting david
ID: 8828 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Paul D. Buck

Send message
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 8836 - Posted: 12 Jan 2006, 12:13:53 UTC

The current structure and design of BOINC does not support this concept.

Long term there is some vague plan to maybe do something along these lines. But, the demand for this type of hosting is not really there. And, with BOINC running on a per-node basis it simulates a very loosly coupled cluster ...

For example, I have 9 computers, all running BOINC and each computer manages its work but the main account records all the information from my "cluster".

Also, I manage the group of computers with BOINC View as a "Cluster Console" ...

So, without development work on BOINC it is not currently possible. The biggest stumbling block is the use of shared memory segments to send messages between the BOINC Daemon and the Science Applications.

Changes would have to include modifications to the science applications also, or, a 'stub" Daemon with shared memory that would run on each cluster node linked to a BOINC Daemon/Manager on the cluster control using some other mechanism.

Bottom line, probably not worth the effort as there is no particular gain to be had.
ID: 8836 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile dcdc

Send message
Joined: 3 Nov 05
Posts: 1832
Credit: 119,675,695
RAC: 11,002
Message 8857 - Posted: 12 Jan 2006, 16:54:56 UTC - in response to Message 8170.  

You can run BOINC over a network, from a Ramdrive or even from a flash disk.


Ouch!

Please don't use a flash disk directly.

You only have a few times 10,000 rewrites on a flash disk before you wear out the memory. The flash mechanism is fine for photos and passing files from box to box, but not for uses where the file is repeatedly re-written. Remember it was developped from write once read many (WORM) memories used for BIOS, and so on. It is just not designed for this job.

Reading a flash memory does not wear it out, writing does.

If you want to do run from a flash drive, figure out how to make you flash disk copy itself onto a ramdisk before BOINC starts and copy itself back after it stops. This means of course that a power failure wipes out all work that hasn't been reported, as it is all on the ramdisk, but on Rosetta we're used to poor checkpointing anyway ;-(.



JFI you can do this using the WinXP embedded (XPe) Enhanced Write Filter (by overwriting a couple of files from XPe to XP). There's loads of info on it at www.mp3car.com I'm working on one at the moment but having I/O problems with my CF IDE converter...

ID: 8857 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile River~~
Avatar

Send message
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 8876 - Posted: 12 Jan 2006, 19:57:07 UTC - in response to Message 8228.  
Last modified: 12 Jan 2006, 19:57:52 UTC

What you have seen described as a discless stack are PXE boot nodes.

I and many others use these.
Basically you have one Server which runs DHCP, TFTP and NFS.
The nodes are just basic MB's without any HD's.
The nodes boot up over the network and load a kernel via TFTP.
The OS is then mounted as an NFS share from the Server.
There are other NFS shares setup for the various DC projects that you want to run.


Sounds interesting. Ive already got Linux going on my LAN, but so far with several separate boxes and fixed IP addresses. Can you recommend a good link to get me going in DCHP, TFTP, NFS & PXE, none of which I've used before?

Or even better, would someone like to write an article for the Wiki on getting from a typical Linux distro up to a diskless stack of this kind?

As well as learning the software, I also need to know stuff like how do I work out how much power is needed (same question really as how many MB can I run off one power supply if there are no extra HD on the extra MB) and similar practical ideas.

And thank you *very* much to Dragokatzov for asking this question - its given me something to think about instead of going out and buying another 5 second hand boxes.

River~~
ID: 8876 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
BennyRop

Send message
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 8901 - Posted: 13 Jan 2006, 4:16:32 UTC

I've seen the descriptions of how to setup "headless crunchers" for FaH; but opted for replacing my work system with a dual core cpu instead of experimenting with headless crunchers.
Don't some of the other BOINC clients have writeups on how to setup "headless crunchers" to use PXE to boot up and run their client? Then all you should need to do is match the headless machines to Rosetta's requirements and change the client used.
ID: 8901 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Paul D. Buck

Send message
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 8931 - Posted: 13 Jan 2006, 12:26:37 UTC - in response to Message 8876.  

Or even better, would someone like to write an article for the Wiki on getting from a typical Linux distro up to a diskless stack of this kind?

Since he has the most recent experience, perhaps River~~ might do it ...
ID: 8931 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile spacemeat

Send message
Joined: 15 Dec 05
Posts: 1
Credit: 961,743
RAC: 0
Message 9346 - Posted: 19 Jan 2006, 13:43:56 UTC - in response to Message 8876.  

Sounds interesting. Ive already got Linux going on my LAN, but so far with several separate boxes and fixed IP addresses. Can you recommend a good link to get me going in DCHP, TFTP, NFS & PXE, none of which I've used before?

Or even better, would someone like to write an article for the Wiki on getting from a typical Linux distro up to a diskless stack of this kind?


http://www.gentoo.org/doc/en/diskless-howto.xml
ID: 9346 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
TPR_Mojo

Send message
Joined: 20 Sep 05
Posts: 4
Credit: 684,947
RAC: 0
Message 9697 - Posted: 24 Jan 2006, 10:48:38 UTC

Rather than look into a HOW-TO, just download a preconfigued Linux distro designed for this. I use K12LTSP from K12LTSP.org and it runs diskless nodes fine on BOINC projects right out of the box.

Configuration help can be found at my team's site http://forums.teamphoenixrising.net
ID: 9697 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile networkman

Send message
Joined: 19 Jan 06
Posts: 1
Credit: 1,004,646
RAC: 0
Message 9759 - Posted: 25 Jan 2006, 3:30:39 UTC

Hmm.. interesting..

"Beer is proof that God loves us and wants us to be happy." - Benjamin Franklin
---
ID: 9759 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote

Message boards : Number crunching : BOINC over a cluster



©2024 University of Washington
https://www.bakerlab.org