A short summary of current progress in the development of evolution@home.
When reading this report, please bear in mind that evolution@home currently has no full-time staff and resources are extremely tight in many respects. This accounts to a large degree for the slow speed of development on the non-biological, technical side.
Laurence Loewe, the main developer, has been working mainly on theoretical models of evolution and on methods for estimating parameters that are important for such models. His recent work explored how systems biology can contribute towards answering some of the most difficult questions in evolutionary biology. Related work on formal modelling methods led to the realisation that some of the simulation approaches used in systems biology can also greatly simplify simulation work in evolution and ecology. These foundational insights are important for developing the biology and computing technology behind future simulators.
The transition of the website to Plone as a content management system has been very successful and has proved to be a good decision. Development of content will continue slowly but steadily. The successful installation of "LinguaPlone" now facilitates the translation of pages into other languages.
Thanks to science@home, we also have a forum in English and German. You can contribute to discussions there and perhaps you are even interested enough to help with the moderation.
Fully automated global computing. Many of you have repeatedly requested basic global computing features such as full automation of task distribution, results submission and high score updates. We are pleased to announce the successful launch of the first BOINC-EvoHo prototype, the first fully automated global computing implementation for evolution@home. It was developed in cooperation with Rechenkraft.net's "yoyo" project and is currently operated via their servers. A number of tasks remain after this launch. These include:
- High scores improvements. Currently, all BOINC results are listed as a single entry, without the more detailed rankings provided by the current high scores. Also, these high scores are still computed semi-automatically.
- Checkpoints. Due to the absence of checkpoints in the current Simulator005 release 6, the length of BOINC simulations is seriously limited.
- Data handling. The large number of new results demands a new and more scalable way of handling and storing the data.
- Porting. The current prototype only works under Windows. Plans are underway to port the code to Linux and the Mac as well.
These tasks will be addressed by rebuilding Simulator005 on a more solid foundation that will also facilitate quicker development of future simulators.
Complexity. It is particularly challenging to deal with the complexity of data management problems and the interrelation of various issues from widely differing parts of the system in evolution@home. Evolution@home is one of those global computing efforts that are more of the 'ant-hill' type than of the 'ruby-in-the-rubbish' type. The latter just need to write results to archives and keep at hand those few records that exceed a specific score of interest. Ant-hill-type problems, in contrast, need to keep almost all results at hand, because they are typically used in combination to understand trends in the big picture. Thus they require special data warehouses that need considerable infrastructure to operate and to grow with the needs.
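The difference between the two types can be pictured with a minimal Python sketch. It is only an illustration and not code from evolution@home: the record layout, the field names `mutation_rate` and `score`, the threshold and all numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def keep_rubies(results, threshold):
    """'Ruby-in-the-rubbish': retain only records whose score exceeds the threshold."""
    return [r for r in results if r["score"] > threshold]


def ant_hill_trend(results, key):
    """'Ant-hill': use every record, summarised jointly by one parameter."""
    groups = defaultdict(list)
    for r in results:
        groups[r[key]].append(r["score"])
    # The trend only emerges when all results for a parameter value are combined.
    return {k: mean(v) for k, v in sorted(groups.items())}


# Hypothetical results; real records would carry many more fields.
results = [
    {"mutation_rate": 0.1, "score": 2.0},
    {"mutation_rate": 0.1, "score": 4.0},
    {"mutation_rate": 0.5, "score": 9.0},
    {"mutation_rate": 0.5, "score": 11.0},
]

rubies = keep_rubies(results, threshold=10.0)   # one record survives
trend = ant_hill_trend(results, "mutation_rate")  # needs all four records
```

In the first case three of the four records could be archived and forgotten; in the second case discarding any record changes the computed trend, which is why ant-hill-type projects have much heavier storage requirements.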
Simulators + parameters. Much of our limited energy goes into organizing data at various levels, from designing a new simulator (what parameters are interesting enough to include, what will just bloat databases?) to final results analysis (what should be stored, what is easily recomputed?). Development of the biological core of the next simulator takes up a considerable amount of time that is well invested, as it reduces the overall amount of uninteresting results that have to be computed in order to find the interesting ones. Work on estimating the distribution of mutational effects and work in evolutionary systems biology contribute towards this goal.
Online results databases. The goal of this process is to eventually make simulation results publicly accessible for analysis by other scientists who do not have the computing power to address the corresponding models, but need their output to understand the organisms they are studying. This can only be started once the other infrastructure is in place.
Computing progress. Finally, we want to express a big thank you to all of you who contributed over 500 years of CPU time to what is already the world's largest database for Muller's ratchet simulation results. We have completed our first major computing projects (P1+P2) and are now in the process of crunching through the challenging part of the next project (P3).
Biological analysis of existing evolution@home results
Since there are more than 500 CPU years' worth of simulation results in our database, we started analyzing them long ago, an activity that also takes up a significant amount of time. For example, the first computational project of Simulator005 was a parameter grid search of the speed of Muller's ratchet for a wide range of parameters. To analyze results in a meaningful way, they have to be applied to actual biological organisms. To do this, we need to compile enough details about these organisms to have reasonable upper and lower estimates for the parameters in question (such as population size and mutation rate). Then the results of these analyses need to be written up and reviewed by other scientists for accuracy. Only after that will we publish them here.
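The step of applying grid results to a particular organism can be sketched roughly as follows. This is an illustrative Python example under stated assumptions, not the analysis code used for Simulator005: the grid values, the bounds and the helper `within_bounds` are all hypothetical.

```python
from itertools import product

# Hypothetical grid axes; the real Simulator005 grid is not reproduced here.
population_sizes = [10**k for k in range(2, 7)]   # 100 ... 1,000,000
mutation_rates = [0.01, 0.03, 0.1, 0.3, 1.0]

# Every combination of the two parameters is one grid point to simulate.
grid = list(product(population_sizes, mutation_rates))


def within_bounds(grid, bounds):
    """Select grid points lying between an organism's lower and upper estimates."""
    n_lo, n_hi = bounds["population_size"]
    u_lo, u_hi = bounds["mutation_rate"]
    return [(n, u) for n, u in grid if n_lo <= n <= n_hi and u_lo <= u <= u_hi]


# Hypothetical lower and upper estimates for one organism.
organism_bounds = {
    "population_size": (1_000, 100_000),
    "mutation_rate": (0.03, 0.3),
}

relevant = within_bounds(grid, organism_bounds)
```

With these made-up numbers, 9 of the 25 grid points fall inside the organism's plausible parameter range, and only the simulation results for those points would enter its analysis.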