Master Thesis / ISA Suggestions

Here you can find a list of proposed projects that you can conduct either as an ISA or, preferably, as an MSc thesis. Please feel free to contact me at any time. Furthermore, if you have an idea of your own and think it might fit the scope of our group, do not hesitate to contact us. The suggested projects can and will be adjusted to personal needs and preferences.

When? Where? How much? Machine learning for resource planning

Description

Are you interested in machine learning and want to work on a relevant topic in high demand while already collaborating with industry during your Master thesis? How about predicting a company's resource demand in order to enable efficient and precise planning? Depending on the business model, this resource and demand planning can have a huge impact on the economic success of the company, as prices may vary depending on the buying and selling strategy.

In this particular project, we will collaborate with a Danish company which will provide us with access to their actual business intelligence and data of the past decade. We will use this data to find relevant customer groups and learn their demand behavior in relation to external and predictable factors, e.g., the weather or the time of the year. In the end, we will try to provide an estimate of the company's resource demand in the near future based on the past data.

More concretely, this project is split into several parts, which can either be worked on by several students in a more thorough fashion or be covered as a proof of concept in a single thesis:

  1. Collect and familiarize yourself with the provided data
  2. Separate customers based on their past behavior by means of time-series clustering
  3. Apply rigorous machine learning techniques in order to build reliable predictors
  4. Test and evaluate the models thoroughly
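
The modelling and evaluation steps above can be illustrated with a minimal sketch. It assumes a single hypothetical external factor (daily temperature) and synthetic demand values; none of these numbers come from the company's data, and a plain least-squares line stands in for whatever predictor the thesis would actually use. The data is split in time order so that the model is evaluated only on observations that come after the ones it was fitted to.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical toy data (assumption): temperature in deg C and demand in
# arbitrary units, constructed so that demand = 100 - 2 * temperature.
temperature = [-2, 0, 3, 5, 8, 10, 12, 15, 18, 20]
demand = [104, 100, 94, 90, 84, 80, 76, 70, 64, 60]

# Time-ordered split: fit on the first 7 observations, test on the last 3.
split = 7
intercept, slope = fit_line(temperature[:split], demand[:split])
predictions = [intercept + slope * t for t in temperature[split:]]
mae = mean_absolute_error(demand[split:], predictions)

print(f"intercept={intercept:.2f} slope={slope:.2f} held-out MAE={mae:.4f}")
```

Because the toy demand is an exact linear function of temperature, the held-out error is essentially zero here; with real data, the same time-ordered split would indicate how well a model extrapolates into the near future.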

Machine Learning - Can we trust it?

Description

In recent years, we have seen tremendous growth in impressive examples of what machine learning can do: from Go-playing supercomputers to self-driving cars, the development has been breathtaking. Nevertheless, machine learning is not a one-size-fits-all procedure and requires a considerable amount of knowledge when it comes to applying the right hammer to the right problem. In many fields, standard machine learning techniques are used by non-experts in order to make predictions, learn models and infer knowledge. In biomedicine, machine learning is used in nearly every field, with consequences for research and patients. The question is: Can we trust the results? Do we discover a signal or rather spurious noise? How sensitive are these methods to outliers, parameter settings, etc.? Can we explain why and how the machine came up with a certain result?

In this project, we want to automate the execution and evaluation of machine learning approaches in order to facilitate a large-scale comparative analysis of common techniques on common tasks with respect to their performance, their sensitivity to noise, and their stability. The ultimate goal is to provide practitioners with solid criteria for choosing a method and its parameters for a concrete problem.

The project phases might be as follows:

  1. Familiarize yourself with the topic and select the tools and problems
  2. Design and implement the automation of machine learning tasks
  3. Design the exact parameters of the study
  4. Conduct and evaluate the study
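
As a rough illustration of what such an automated sensitivity study could look like, the sketch below measures how the accuracy of a 1-nearest-neighbour classifier degrades as Gaussian noise is added to the test features. The classifier, the synthetic one-dimensional dataset, the noise levels and all function names are assumptions made for this example, not part of the project specification.

```python
import random


def nearest_neighbour_predict(train, x):
    """Return the label of the training point closest to x (1-NN)."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def accuracy_under_noise(train, test, sigma, repeats=20, seed=0):
    """Mean accuracy over several runs with Gaussian noise on the test features."""
    rng = random.Random(seed)  # fixed seed keeps the study reproducible
    scores = []
    for _ in range(repeats):
        correct = 0
        for x, label in test:
            noisy_x = x + rng.gauss(0.0, sigma)
            correct += nearest_neighbour_predict(train, noisy_x) == label
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Synthetic data (assumption): class 0 lies in [0, 1), class 1 in [5, 6).
points = [(i / 10, 0) for i in range(10)] + [(5 + i / 10, 1) for i in range(10)]
train, test = points[::2], points[1::2]

# One row of the study: the same method evaluated at several noise levels.
report = {sigma: accuracy_under_noise(train, test, sigma) for sigma in (0.0, 1.0, 3.0)}
for sigma, accuracy in report.items():
    print(f"noise sigma={sigma}: mean accuracy={accuracy:.3f}")
```

A full study would repeat this loop over many methods, parameter settings and datasets, and would also vary the kind of perturbation (label noise, outliers, missing values).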

Parameter-free hierarchical clustering

Description

Clustering is a standard technique in unsupervised machine learning that partitions the objects in a dataset into groups of similar objects. It is widely applied in a plethora of different fields, from astronomy, social sciences and economics to bioinformatics. Even though clustering is a standard technique, the process itself is quite complicated and many mistakes can be made. In particular, as the ground truth is unknown, it is hard to decide whether a result is good or bad. The practitioner has to answer many different questions, for example how to pre-process the data or which tool to use. Even once a tool has been chosen, every clustering approach requires at least one parameter to be set, which defines whether we want many small or few large clusters. Furthermore, the clusters in a dataset might not all have the same properties, so that you would actually require multiple parameters for the same dataset.

In this project, we aim to extend our existing Transitivity Clustering to a parameter-free hierarchical clustering tool. The idea is to build a hierarchical cluster structure and to compare the quality of the clustering at each node to that of a clustering with similar properties but without structure, in order to define the quality of the split. This will allow for a dynamic tree-cut that derives the best possible clustering for the whole dataset without user intervention.

The project will be structured as follows:

  1. Familiarize yourself with Transitivity Clustering
  2. Extend Transitivity Clustering to a hierarchical approach
  3. Design and implement the relative quality measure
  4. Implement the dynamic tree-cut into the hierarchical clustering
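
The idea of accepting a split only when it beats a structureless reference can be sketched as follows. This is a deliberately crude stand-in and not Transitivity Clustering: it works on one-dimensional values, measures split quality as the fraction of squared error removed, and uses uniformly random points of the same size as the "clustering without structure". All of these choices, as well as the function names, the number of reference draws and the toy data, are assumptions made for illustration.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    centre = sum(values) / len(values)
    return sum((v - centre) ** 2 for v in values)


def best_split(values):
    """Return (quality, index) of the best binary split of sorted values.

    Quality is the fraction of the total squared error removed by the split.
    """
    total = sse(values)
    if len(values) < 2 or total == 0.0:
        return 0.0, 0
    return max(
        (1.0 - (sse(values[:i]) + sse(values[i:])) / total, i)
        for i in range(1, len(values))
    )


def reference_quality(n, rng, draws=200):
    """Mean best-split quality of n uniform random points (no cluster structure)."""
    qualities = [
        best_split(sorted(rng.random() for _ in range(n)))[0]
        for _ in range(draws)
    ]
    return sum(qualities) / draws


def cluster(values, rng):
    """Split recursively; stop at nodes whose split does not beat the reference."""
    values = sorted(values)
    quality, index = best_split(values)
    if len(values) < 2 or quality <= reference_quality(len(values), rng):
        return [values]  # dynamic tree-cut: this node becomes a final cluster
    return cluster(values[:index], rng) + cluster(values[index:], rng)


# Toy data (assumption): one isolated group and two groups that lie closer together.
data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2, 13.0, 13.1, 13.2]
clusters = cluster(data, random.Random(0))
print(clusters)
```

On this toy data the recursion first separates the isolated group, then splits the remaining values once more, and stops at the three tight groups, without any user-supplied threshold or cluster count. Because the quality measure is invariant to shifting and scaling, the reference can be drawn on the unit interval regardless of the range of the data.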

ClustEval Presentation Web-Interface

Description

ClustEval is an integrated framework which automates and aims to standardize cluster analyses. It consists of a backend implemented in Java and, additionally, a currently overly complex website implemented in Ruby on Rails. This project aims at designing and implementing the ClustEval Presentation Web-Interface which will be the address that scientists can visit to inspect clustering results generated using ClustEval. It should:

  1. be usable by any scientist, including those not working in the Computer Science field;
  2. be intuitively usable and not too complex;
  3. be based on appropriate scientific use-cases.

Project Plan

The overall course of the project should consist of the following phases:

  1. Familiarize yourself with the background of the scientific area in which the website will be used
  2. Identify and visualize use cases for users of the website
  3. Design and visualize workflows for the website that allow the scientist to carry out the use cases
  4. Develop design mock-ups
  5. Design and implement an appealing website using up-to-date technology (e.g. jQuery or MVC-based web frameworks such as Django or Ruby on Rails)

Clustering as a Service

Description

ClustEval is an integrated framework which automates and aims to standardize cluster analyses. It consists of a backend implemented in Java and, additionally, a currently overly complex website implemented in Ruby on Rails. This project aims at designing and implementing the ClustEval Clustering as a Service Web-Interface which will be the address that scientists can visit to design and perform cluster analyses using their own data, tools and quality measures. It should:

  1. be usable by scientists with advanced background knowledge of the clustering field
  2. be intuitively usable but show a high level of detail
  3. be based on appropriate scientific use-cases

Project Plan

The overall course of the project should consist of the following phases:

  1. Familiarize yourself with the background of the scientific area in which the website will be used
  2. Identify and visualize use cases for users of the website
  3. Design and visualize workflows for the website that allow the scientist to carry out the use cases
  4. Develop design mock-ups
  5. Design and implement an appealing website using up-to-date technology (e.g. jQuery or MVC-based web frameworks such as Django or Ruby on Rails)