Sebastian Schmidl
Software Engineer and Ph.D. Student
at the Hasso Plattner Institute in Potsdam.
Detecting anomalous subsequences in time series.
Time series datasets are particularly challenging for data engineering and analytics due to their size, recording speed, and complex nature. They are also a dominant form of data in statistics, the sciences, and engineering. I focus on advancing subsequence anomaly detection and anomaly clustering through more efficient and effective techniques. For this purpose, I investigate and evaluate the state of the art in time series research, design novel subsequence anomaly detection algorithms, and build automated, scalable systems for managing and analyzing massive corpora of time series data.
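To give a flavor of what subsequence anomaly detection means, here is a minimal, illustrative sketch (a brute-force discord search; the function name and the toy series are my own, and this is not one of the algorithms from my research): each length-w subsequence is scored by the Euclidean distance to its nearest non-overlapping neighbor, so a subsequence that is unlike all others receives the highest score.

```python
def discord_scores(series, w):
    """Score each length-w subsequence by the Euclidean distance to its
    nearest non-overlapping neighbor; the highest-scoring subsequence
    (the "discord") is the most anomalous one. O(n^2 * w) brute force,
    only suitable for tiny inputs."""
    n = len(series) - w + 1
    windows = [series[i:i + w] for i in range(n)]
    scores = []
    for i in range(n):
        best = min(
            sum((a - b) ** 2 for a, b in zip(windows[i], windows[j])) ** 0.5
            for j in range(n)
            if abs(i - j) >= w  # skip overlapping (trivial) matches
        )
        scores.append(best)
    return scores

# A repeating pattern with one spike: the windows covering the spike score highest.
scores = discord_scores([0, 1, 0, 1, 0, 1, 5, 1, 0, 1, 0, 1], w=3)
```

Practical algorithms avoid the quadratic all-pairs comparison, but the scoring idea — rank subsequences by how far they are from their best match — is the common core.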
Efficient and scalable time series analytics is the focus of my PhD thesis at the Hasso Plattner Institute in Potsdam, resulting in many publications and projects in this area.
Example projects: Comprehensive evaluation of time series anomaly detection algorithms, AutoTSAD, HYPEX, AKITA (with Rolls-Royce), DendroTime (with DLR, in development).
Solving computationally complex problems in distributed computing environments.
I investigate computationally complex problems and how they can be solved in distributed environments. Such problems are prevalent in both data engineering and data analytics, yet most existing solutions for data-centric problems lack efficiency, scalability, robustness, and elasticity. I believe these deficiencies can be addressed through distributed computing. I am especially interested in actor programming, which enables building fault-tolerant, elastic, and shared-nothing systems.
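To make the actor idea concrete, here is a minimal sketch in plain Python (standard library only, not Akka; the class and message names are my own): an actor owns its state privately, and the only way to affect that state is to send a message to its mailbox, which a single thread drains one message at a time — hence no locks and no shared memory.

```python
import queue
import threading

class CounterActor:
    """Minimal actor: private state plus a mailbox processed by a single
    thread, so the state is never shared between threads (shared-nothing)."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0                  # private state, touched only by _run()
        self._stopped = threading.Event()
        threading.Thread(target=self._run, daemon=True).start()

    def send(self, msg):
        self._mailbox.put(msg)           # the only way to interact with the actor

    def _run(self):
        while True:
            msg = self._mailbox.get()    # process one message at a time
            if msg == "stop":
                self._stopped.set()
                return
            self._count += 1             # no locks needed: single consumer

    def result(self):
        self._stopped.wait()
        return self._count

actor = CounterActor()
for msg in ("inc", "inc", "inc"):
    actor.send(msg)
actor.send("stop")
total = actor.result()  # 3
```

Frameworks such as Akka add the pieces this sketch omits — supervision hierarchies for fault tolerance, location transparency, and elastic placement of actors across machines.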
Distributed computing was the focus of my master's studies at the Hasso Plattner Institute in Potsdam, Germany. The courses I took at HPI included Distributed Data Management, Reliable Distributed Systems Engineering, Actor Database Systems, Methods of Cloud Computing and many others. For my master's thesis, I developed a distributed and reactive algorithm to discover bidirectional order dependencies in relational data, which was published in the VLDB Journal.
Developing efficient algorithms to extract metadata from relational datasets.
Profiling data to determine metadata about a given dataset is an important and frequent activity for IT professionals and researchers alike, and it is necessary for various use cases. Data profiling encompasses a vast array of methods to examine datasets and produce metadata; it is used, for example, to understand the structure of a dataset, identify and monitor data quality issues, prepare data for analysis, and optimize SQL queries in database management systems. Despite considerable research in this area, these application areas hardly use any modern data profiling techniques — the current state of data profiling research unfortunately fails to address practical application needs.
I am particularly interested in efficient and scalable methods to discover order dependencies, and in closing this disconnect between profiling research and its applications with a novel data profiling engine and query language (DPQL).
Example projects: DISTOD, Metaserve (DPQL)
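To illustrate the kind of metadata these projects discover, here is a brute-force check of an order dependency straight from its definition (the function and the toy table are my own illustration, not how DISTOD works internally): an OD X ↦ Y holds if ordering the rows by X also orders them by Y, i.e., for every pair of rows r, s with r[X] ≤ s[X], we also have r[Y] ≤ s[Y].

```python
def od_holds(rows, lhs, rhs):
    """Brute-force check of the order dependency lhs -> rhs over a list of
    dict rows. Discovery algorithms such as DISTOD avoid this O(n^2)
    pairwise check with pruning and efficient search strategies."""
    return all(
        r[rhs] <= s[rhs]
        for r in rows
        for s in rows
        if r[lhs] <= s[lhs]
    )

flights = [
    {"distance": 500,  "duration": 60},
    {"distance": 900,  "duration": 110},
    {"distance": 1500, "duration": 180},
]
holds = od_holds(flights, "distance", "duration")  # True: longer flights take longer

# One long but fast flight breaks the dependency:
delayed = flights + [{"distance": 2000, "duration": 150}]
violated = od_holds(delayed, "distance", "duration")  # False
```

The hard part, and the subject of discovery algorithms, is not checking a single candidate like this but finding all (bidirectional) ODs among the exponentially many attribute-list combinations of a table.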
I maintain TimeEval, an evaluation tool for time series anomaly detection algorithms. It bundles more than 70 anomaly detection algorithms and thousands of time series datasets. We used TimeEval in a project with Rolls-Royce to evaluate the state of the art in time series anomaly detection, which resulted in a comprehensive evaluation paper (pVLDB).
Python | Dask | Pytest | GitHub Actions | Docker | Bash | Linux
I developed the DISTOD data profiling algorithm, a distributed algorithm to discover bidirectional order dependencies from relational data. It combines efficient pruning techniques with a novel, reactive, and distributed search strategy, outperforming all existing baselines. The algorithm was published in the VLDB Journal.
Scala | Akka | Python | Bash | Linux
I am a core developer of aeon, a scikit-learn-compatible toolkit for all machine learning tasks on time series. I created and maintain its anomaly detection module, and I contribute to its large collection of elastic time series distance measures. We presented aeon in a tutorial at ECML PKDD.
Python | Sklearn | Numba | NumPy | Pytest | GitHub Actions
DendroTime is a progressive algorithm to compute a hierarchical agglomerative clustering for large collections of time series subsequence anomalies. Its anytime behavior allows the user to interrupt the clustering process early and still obtain a meaningful solution. I develop the algorithm in cooperation with the German Aerospace Center (DLR).
Scala | Akka | JavaScript | React | D3
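The anytime idea behind DendroTime can be sketched with a toy single-linkage clustering of 1-D points (my own simplified illustration, not the actual DendroTime algorithm): the merge loop can be stopped after any number of steps and still returns a valid, merely coarser, clustering.

```python
def anytime_single_linkage(points, max_merges=None):
    """Agglomerative clustering of 1-D points with single linkage.
    Stopping the merge loop early (max_merges) still yields a valid,
    just coarser, clustering -- the essence of anytime behavior."""
    clusters = [[p] for p in points]
    merges = 0
    while len(clusters) > 1 and (max_merges is None or merges < max_merges):
        # single linkage: cluster distance = minimum pairwise point distance
        _, i, j = min(
            (min(abs(a - b) for a in clusters[i] for b in clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] += clusters.pop(j)  # merge the closest pair of clusters
        merges += 1
    return clusters

points = [1.0, 1.1, 5.0, 5.2, 9.0]
partial = anytime_single_linkage(points, max_merges=2)  # interrupted early
full = anytime_single_linkage(points)                   # merged down to one cluster
```

Because the closest pairs are merged first, an early interruption has already captured the tightest structure — here the pairs {1.0, 1.1} and {5.0, 5.2} — which is what makes a progressive algorithm useful on large collections.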
During my PhD studies at HPI, I supervise students and teach Bachelor's and Master's courses in the fields of data profiling, time series analytics, distributed computing, and reproducibility in science: