Department of Mathematics and Computer Science

Areas for PhD Projects

The Research Training Program on Efficient Algorithms and High Quality Solutions has funding for a "mobility" PhD stipend in one of the areas of on-line, graph, and bioinformatics algorithms. These three areas are all either directly practically motivated or aimed at facilitating the development of better algorithms for practical problems — are algorithm theory with a practical focus. The focus of this Ph.D. project will be on efficient algorithms as well as on quality improvements of approximative solutions to hard problems. These problems may be hard because they are NP-hard (belong to a well recognized class of computationally hard problems) and/or are on-line (where requests have to be served before all data is available, so doing as well as an algorithm knowing all the data in advance is often impossible). In either case, improvements to the quality of the solutions found lead to very significant savings in resources or costs in important industrial applications in many different scheduling, planning, and packing problems. Examples of such problems include packing items in containers so that as few containers as possible are used, planning routes for vehicles as efficiently as possible, or scheduling tasks in a factory or business so that deadlines are met. In the area of bioinformatics, examples include inferring phylogenetic trees based on gene order data, determining the co-evolutionary history of host-parasite systems, and the prediction of RNA secondary structure. The goal is to find efficient, low cost solutions. The emphasis of this Ph.D. project will be on the theoretical understanding of such algorithms and on their analyses.

On-line Algorithms

[Boyar, Favrholdt, Larsen] An on-line algorithm is a special type of approximation algorithm, where the algorithms are restricted to behaving as if the data comes on-line. Many real world problems have an on-line component, some decisions that have to be made immediately, before all related data is available. An example of an on-line problem is the paging problem where one considers two levels of computer memory: a fast, but relatively small memory and a larger, but several magnitudes slower memory (cache versus RAM or RAM versus disk, for example). A paging algorithm has to make decisions as to what chunks (pages) of data to keep in the fast memory, even if it does not know what pages will be needed by the running programs in the future.

In analyzing algorithms, it is crucial to have a well-defined theoretical tool for predicting how well algorithms will do in practice. The standard measure for the quality of on-line algorithms is the competitive ratio, which has existed for more than twenty years.

However, it is well recognized that there are many cases where the results the competitive ratio gives are unsatisfactory as predictors of algorithms' performances. One of these is the paging problem where the competitive ratio does not separate algorithms that are known to perform very differently in practice. Hence, there have been many attempts to provide supplements or alternatives to the competitive ratio.

One such new measure for the quality of on-line algorithms, the relative worst order ratio, was proposed by Boyar and Favrholdt. In contrast to most earlier proposals, the relative worst order ratio has already been applied to several problems of different nature, giving results which are superior to those obtained with competitive analysis. Recent work announced at an international meeting through four presentations by Boyar, Favrholdt, Larsen, and Ehmsen (one of the group's Ph.D. students) shows its superiority to some other measures as well. There was great interest in comparisons of the many proposed supplements and alternatives to the competitive ratio, as presented by the SDU group as well as other invited speakers. In the past, most researchers compared new quality measures to the competitive ratio only; comparisons to other quality measures are only very recently starting to happen. More full comparisons of the ability of different quality measures to predict the behavior of online algorithms for online problems are needed. This will pave the way for drawing general conclusions about which measures are most useful in which situations and how these can be characterized.

Graph Algorithms

[Bang-Jensen, Toft] Graph theory plays an essential role within algorithmics, partly because graphs often occur as the underlying model for the problems treated and partly because the area of graph theory offers a large number of challenging algorithmic problems. Graphs are an essential tool for modelling many practical problems, hence making algorithmic graph theory an important area with many practical applications.

Both of the books co-authored by Bang-Jensen and Toft (Digraphs: Theory, Algorithms and Applications, by J. Bang-Jensen and G. Gutin, and Graph Coloring Problems, by T.R. Jensen and B. Toft) contain a large number of open problems and conjectures many of which could form the starting point of a Ph.D. thesis. One such topic is deciding the existence of orientations of undirected graphs (the assignment of directions to the edges of the undirected graph, making it directed) such that certain parameters are met (such as connectivity or containment/avoidance of certain substructures). These problems very often have both theoretical (structural results, which are often the key step in developing efficient algorithms) and algorithmic (determining the existence of orientations with the desired properties or finding the desired orientations) subproblems of independent interest. Graph coloring (assigning colors to the vertices so that no two adjacent vertices have the same color) is one of the most popular graph theoretical topics within computer science and operations research because of the many challenging open problems (of which many are described in Toft's book) and because many real-life problems (in particular timetabling and scheduling problems, for example how to schedule without, or with only few, waiting periods.) can be formulated as graph coloring problems. Colorings of undirected graphs is also closely related to orientations of graphs and there are a number of interesting open problems dealing with this connection between a digraph and the chromatic number of its underlying undirected graph. Studying these connections would be a very interesting Ph.D. project.

Another topic for a project is to study problems for directed graphs where we are looking for stuctures, only part of which need to adhere with the orientation of the arcs in the digraph. One such example is the question of deciding when a digraph has an out-branching whose (arc)-deletion leaves a connected graph (in the undirected sense).

Bioinformatics

[Fagerberg, Favrholdt, Merkle] Bioinformatics is a data-rich and technology-driven discipline. The rapid developments in genomic and other molecular research technologies together with developments in information technologies are the reason for a tremendous amount of information to be handled. Within this highly interdisciplinary research area, Ph.D. students will focus on how to reflect biological processes much more realistically. This will lead to a multitude of new algorithms and data structures.

In recent studies the group has, for example, applied gene cluster constraints to develop algorithms that lead to tremendous speedups when solving core problems for phylogenetic reconstruction. Algorithms based on these methods were recently applied to comprehensively analyze the phylogenetics of all known echinoderms and to identify undocumented events in mitogenomes. In these studies also so-called tandem duplication random loss (TDRL) operations for genomic rearrangement were considered, as there is strong support for this operation in biology. A TDRL duplicates a contiguous segment of genes, followed by the loss of one copy of each of the duplicated genes. This operation is even considered as ``being the most important rearrangement operation in vertebrates''.

The group has also strong expertise in reconstruction of the co-phylogenetic history of species. Co-evolutionary systems, such as hosts and their parasites or plants and insects that feed or breed on these plants, are a commonly used model systems for evolutionary studies. The core problem is to reconstruct the common history from the known phylogenies and the current relationships (for an overview see, for example, "Traversing the tangle: algorithms and applications for cophylogenetic studies", in Journal of Biomedical Informatics, by M.A. Charleston and S.L. Perkins). The current relations between the species from both groups (for example, which parasite lives on which host species) are often known from field studies or laboratory experiments. However, using relevant biological information such as divergence timing of the phylogenetic trees or geographic information of the species also from an algorithmic perspective has only rarely been investigated. The inclusion of such information will certainly open the door for very promising Ph.D. projects.

Due to the expert knowledge of the research group in the fields of computational biology, Ph.D. students will receive an education that emphasizes interdisciplinary and internationality. Several scientific meetings in the area of information technology and life sciences were organized by the research group already, where especially young academics were encouraged to participate (e.g. German Workshop on Artificial Life 2008 GWAL-8; Swarm Intelligence and Computational Biology at the IEEE Swarm Intelligence Symposium 2007; Molecular Docking, Complexity, and Optimization MDCO-07).

Data protection at SDU ▪ Databeskyttelse på SDU