RDKit UGM 2025

Day	Time	Title	Speaker	Abstract	Rooms	Type
Wed	08:30 - 09:00	Check-In/ Registration		Abstract ☕️	BI/entrance	☕️
Wed	09:00 - 10:00	Welcome and Greg and What's new 1		Abstract	BI	standard
Wed	10:00 - 10:30	MolTrack: A flexible system for chemical registration	Oleksandra Serhiienko	Abstract Following the footsteps of lwreg, we introduce MolTrack: an open-source, lightweight, flexible, hackable, and easily extendable FastAPI-based server for managing chemical compounds, batches, and assay results, with the RDKit cartridge-enabled Postgres for chemical intelligence. Suitable for labs, startups, and enterprises	BI	standard
Wed	10:30 - 11:30	Coffee + posters		Abstract ☕️	Respirium	☕️
Wed	11:30 - 11:50	Enhancing Drug Discovery with Nuclear Magnetic Resonance & Artificial Intelligence	Thomas Evangelidis	Abstract At AI|ffinity, we integrate nuclear magnetic resonance (NMR) spectroscopy with artificial intelligence (AI), cheminformatics, and computer-aided drug design into a proprietary platform – soon accessible on the web – aimed at accelerating drug discovery across diverse therapeutic targets. Our platform leverages NMR data at every stage of the discovery pipeline. Using 1D hyperpolarized NMR, we achieve high sensitivity and throughput - screening up to 1,500 compounds per day - and can detect weak intermolecular interactions in the millimolar range, enabling early hit identification. Crucially, ligand-observed NMR allows us to extract ligand epitopes without requiring prior knowledge of the receptor’s 3D structure. These epitope data are then utilized by deepHitMiner, our AI-driven virtual screening tool, to identify new binders, predict their binding modes, and propose attachment points for chemical synthesis and hit expansion. This approach is especially powerful for challenging targets, including intrinsically disordered and transmembrane proteins. In parallel, our 4D-GRAPHS software automates chemical shift assignment and NMR-based structure determination, enabling detailed mapping of ligand–protein interactions through protein-observed NMR. This comprehensive understanding supports structure-based hit-to-lead optimization, even for targets lacking stable tertiary structure. Finally, by implementing 4D NMR spectroscopy, we push the molecular weight limit of NMR beyond 25 kDa and enhance its applicability to study protein dynamics and interactions with greater precision, speed, and accuracy.	BI	standard
Wed	11:50 - 12:10	Cheminformatics Meets Biology: Hybrid Molecular Representations in Drug Discovery	Jonathan Bisson	Abstract Medicinal chemistry and drug discovery increasingly intersect with biopolymers such as peptides, protein therapeutics, and nucleic acids as therapeutic agents. However, traditional cheminformatics tools struggle to integrate small-molecule and macromolecule data, since chemical structures and biological sequences have historically been handled in separate systems and formats. We address this challenge with hybrid molecular representations, unifying small-molecule and sequence-based descriptions into a single framework. In particular, we utilize the V3000 Molfile Self-Contained Sequence Representation (SCSR) to encode large biomolecules (e.g., peptides, antibody-drug conjugates) in a chemically intuitive format. This approach bridges chemical and biological data structures, allowing chemists and biologists to interact with complex molecules from a single source of truth but through multiple representations. Our implementation combines several tools to support these hybrid representations. We worked with the Ketcher web-based molecule editor team to enhance the drawing and annotation of molecules with sequence information in an SCSR-enabled format. We then added extended SCSR support to the RDKit to process the resulting hybrid structures for property calculations and substructure searching, treating encoded biopolymer components on par with traditional small molecules. Data management and collaboration are facilitated by CDD Vault, which now supports storing and querying large macromolecular structures using SCSR. We demonstrate that this unified representation enables both substructure queries within large biopolymers and sequence-based searches, while also facilitating the calculation of physicochemical properties for modified bioconjugates. This work highlights the significance of hybrid molecular representations in modern medicinal chemistry workflows. By seamlessly integrating chemical and biological modalities, our approach improves the efficiency of analyzing complex therapeutics and paves the way for future drug discovery projects that involve increasingly large and hybrid molecular entities. We will also talk briefly about how the RDKit is central to almost everything chemistry at CDD from the frontend to the backend.	BI	standard
Wed	12:10 - 12:30	Automated machine learning for property and ADME Prediction using the RDKit	Ryan Greenhalgh	Abstract How we use the RDKit to build automated pipelines for molecular property and ADME prediction. This includes RDKit featurisation, model selection, and evaluation techniques integrated into our AutoML framework. We'll also share insights on model performance and challenges in generalisation.	BI	standard
Wed	12:30 - 13:30	Lunch + posters		Abstract ☕️	Respirium	☕️
Wed	13:30 - 13:35	EasyDock 1.0: customizable and scalable docking tool	Guzel Minibaeva	Abstract EasyDock 1.0 - an open-source and scalable Python-based tool for fully automated molecular docking. The current version supports popular docking programs, namely Autodock Vina, gnina, and smina. The tool automatically prepares ligands by removing salts, generating initial conformers and stereoisomers, using RDKit, and performing protonation with the open-source program MolGpKa. Ring sampling is implemented to improve docking of molecules containing saturated ring systems. All input data, settings, and results are stored in an SQLite database, enabling interrupted jobs to be resumed. EasyDock integrates Dask for distributed computation across multiple machines. A built-in model predicts docking times to optimize task scheduling and reduce total runtime. Special cases, such as boron-containing molecules, are handled by temporarily substituting boron with carbon during the docking process. The ProLIF package is integrated to calculate protein-ligand interactions. The current version is composed entirely of open-source modules.	BI	lightning
Wed	13:35 - 13:40	Open Source 2D Sketcher	Chris Von Bargen	Abstract Overview of an open source 2D sketcher using the RDKit as its underlying model	BI	lightning
Wed	13:40 - 13:45	Introduction to OpenBind	Ed Griffen	Abstract Poster showing what the OpenBind Consortium is hoping to deliver and how. https://www.gov.uk/government/news/uk-to-become-world-leader-in-drug-discovery-as-technology-secretary-heads-for-london-tech-week you might be able to access the data.	BI	lightning
Wed	13:45 - 13:50	ChemPatentizer: Transforming Chemical Patents into Actionable Scientific Data	Riccardo Fusco	Abstract Chemical patents contain valuable drug discovery data, but extracting it is a nightmare. These documents are often hundreds of pages of scanned images with inconsistent formats and poor-quality chemical structures. We developed ChemPatentizer, a semi-automated tool that finally makes this data usable. Instead of trying to fully automate everything (which is extremely complicated due to the heterogeneity of patents), our approach smartly combines human expertise with AI-powered structure recognition. Chemists guide the initial steps, then the pipeline automatically converts messy patent data into clean, analyzable tables. This means researchers can now efficiently extract structure-activity relationships from patents and use them for drug design. We've successfully tested ChemPatentizer on GLP-1 receptor patents, proving it can transform previously inaccessible patent data into a practical resource for drug discovery.	BI	lightning
Wed	13:50 - 13:55	OpenMMDL: Building, Simulating, and Analyzing Protein–Ligand Systems in OpenMM	Valerij Talagayev	Abstract The presentation would be about OpenMMDL, which is a workflow consisting of OpenMMDL Setup, a tool consisting of a web-based GUI to make the preparation of OpenMM protein-ligand simulation files easy for beginners, allowing a new user to either prepare the files in PDBFixer or Ambertools, with a step-by-step GUI allowing to have the optimal simulation settings, with default settings being in place as well. OpenMMDL simulation, which is the backend that performs the simulation and the postprocessing via MDTraj and MDAnalysis to deliver an output, that is directly ready to be used for OpenMMDL Analysis, which allows to track stable waters in the simulation, create protein-ligand interaction fingerprints and use those to generate binding modes, which are combinations of interactions, thus showing the most common combination of interactions between the ligand and protein. The lightning talk will focus on the quick introduction in each of the three parts of the workflow with additionally highlighting the implementation of ProLIF for the protein-ligand interaction as an additional option to the already present PLIP package in the newest release of OpenMMDL and briefly highlight the new additions to ProLIF, including the implementation of water-bridge mediated interactions, improvement of visualization and H-Bond interactions from implicit hydrogens with the latter two additions being performed as part of Google Summer of Code 2025 with both OpenMMDL and ProLIF using RDKit in various parts of the packages. https://summerofcode.withgoogle.com/programs/2025/projects/XWsglxQM https://summerofcode.withgoogle.com/programs/2025/projects/5Otkx8vp	BI	lightning
Wed	14:00 - 14:30	Improving Conformer Quality in ETKDG	Niels Maeder	Abstract Here we present several improvements to the RDKit's conformer generation workflow that ensure physically realistic conformers, with a special focus on bond lengths and angles. Conformer evaluation workflows that mainly focus on correct torsional states often miss more fundamental flaws in conformers, so we propose additional metrics to identify bad conformers. A detailed analysis of ETKDG-generated conformers, emphasizing physical validity, provides new insight into the performance and accuracy of the conformer generator. This understanding has enabled optimization of the RDKit's workflow and improvements to the quality of the resulting conformers.	BI	standard
Wed	14:30 - 15:00	Unraveling Torsion Preferences: Comparative Analysis of Torsion Motif Angle Distributions Across Different Environments	Jessica Braun	Abstract Comparing accumulated torsion profiles over different environments (crystal, vacuum, water, hexane) to answer the questions if there is in fact a difference, if we can understand where the difference is coming from and how we can make use of these differences in the context of conformer generation.	BI	standard
Wed	15:00 - 16:00	Coffee + posters		Abstract ☕️	Respirium	☕️
Wed	16:00 - 16:30	Can you find hits by screening only 100 molecules?	Jan Jensen	Abstract The answer is yes. And in my talk I'll explain what we found.	BI	standard
Wed	16:30 - 17:00	Freedom Space 4.0 - Creation of ultra-large chemical spaces using ML-based reagent filtering	Anna Kapeliukha	Abstract - Freedom Space introduction + custom space generation with ML models - Freedom Search Platform overview - Overview of the RDKit search that was used for the platform with some case studies that David and us will do for the paper.	BI	standard
Wed	17:00 - 17:30	GenCReM: de novo design guided by explainable docking	Aleksandra Ivanova	Abstract De novo generation methods represent a promising alternative to conventional virtual screening by enabling a more efficient exploration of the vast chemical space, which is too extensive for exhaustive enumeration and screening. Nonetheless, ensuring the synthetic feasibility of de novo-generated molecules remains a significant challenge for many current approaches. In our work, we employed the CReM [Polishchuk, P., CReM: Chemically reasonable mutations framework for structure generation, Journal of Cheminformatics, 2020, 12, 1, 1-18] method, which inherently accounts for the synthetic accessibility of generated compounds. This was combined with a genetic algorithm and molecular docking to traverse chemical space efficiently. The primary goal of our tool is structure optimization, scaffold decoration, and exploration of local chemical space around a lead compound. The main feature is traversing chemical space guided by explainable docking, preserving well-fitted 3D fragments while modifying suboptimal atoms, allowing the algorithm to converge faster than the regular procedure. The optimized objective function can incorporate parameters such as docking scores, physicochemical and drug-likeness properties, and ligand-protein interaction fingerprints to preserve crucial interactions, etc. To some extent, the tool can be applied for unrestricted de novo generation and exploration of wider chemical space.	BI	standard
Wed	19:30 - ???	Conference Dinner #1		Abstract ☕️	Masaryk Cafeteria	☕️
Thu	08:30 - 09:00	Check-In/ Registration		Abstract ☕️	BI/entrance	☕️
Thu	09:00 - 09:30	What's new 2 (Greg) + Monomer-based representations of molecules using RDKit (scheduled with Greg)	Rachel Walker	Abstract We are working on adding a monomer-based representation of molecules to RDKit that is compatible with HELM parsing and writing, along with translation methods between atom-based and monomer-based methods. A draft of this can be found here https://github.com/rdkit/rdkit/pull/8218 -- I am hoping this will be much further along by the UGM in September	BI	standard
Thu	09:30 - 10:00	From Sequences to Molecules: An open-source Monomer-Centric Toolkit	Davit Rizhinashvili	Abstract This talk covers working with sequences and custom monomers: building and managing libraries, handling different sequence formats (HELM, FASTA, custom), enumerating sequences using combinatorial libraries and chemical reactions, conversions to molecules, SAR analysis, and the integration with Datagrok.	BI	standard
Thu	10:00 - 10:30	ConfScale: An Open-Source Python Package for Scalable Conformer Generation and Filtration	Etienne Reboul	Abstract Confscale is a Python package for large-scale conformer processing, designed to work efficiently with RDKit in high-performance computing (HPC) environments. It supports parsing, generation, and filtering of conformers and molecular data, with a strong focus on scalability. One of the core goals of Confscale is to improve data handling in cheminformatics by moving away from traditional uncompressed text formats (e.g., .smi, .sdf) toward columnar, compressed formats like Parquet. These formats offer significant advantages in speed, size, and compatibility with distributed computing frameworks like Dask and PyArrow. The goal is to provide a simplified API to work with very large molecular libraries such as ZINC. Conformer generation is another focus: it's a computationally expensive but embarrassingly parallel task. Confscale uses Dask to distribute this workload across HPC clusters, enabling large-scale generation efficiently. Finally, Confscale introduces a novel method for conformer comparison and selection. It extends the concept of Torsion Angular Bin Strings (TABS) to create confp, a simulated count-based conformation fingerprint. This allows for conformer labeling and the computation of similarity metrics between conformers, providing a scalable solution for conformer diversity analysis and filtering.	BI	standard
Thu	10:30 - 11:30	Coffee + posters		Abstract ☕️	Respirium	☕️
Thu	11:30 - 11:50	SwiftPol: RDKit and cheminformatics for polymer building	Micaela Matta	Abstract In this talk, I will outline how our group has been leveraging RDKit to facilitate the building of polymer systems for molecular dynamics simulations using reaction SMARTS. Our SwiftPol package (https://doi.org/10.21105/joss.08053) was developed to build statistically representative polymer ensembles that could match experimentally relevant properties beyond Mw, such as polydispersity or % of residual monomer/oligomer. More recently, we have been testing the capability of RDKit (and thus SwiftPol) towards building more complex polymer topologies, such as polymer brushes and 3D polymer networks. The possibility of building large polymer networks using RDKit to perform cross-linking reactions is exciting (and, we think, unexplored): it provides a shortcut to realistic and ready-to-run polymer topologies that can be otherwise challenging to obtain, without ad-hoc software and/or a reactive MD scheme (and much longer simulation timescales).	BI	standard
Thu	11:50 - 12:10	The Evolution of Chemical Collections and Drug-Like Diversity: The Role of a Compound Aggregator – The Story Behind Molport’s 5-Million Compound Library	Andrea Altieri	Abstract The design and development of compound libraries have undergone a remarkable evolution, paralleling advances in medicinal chemistry and high-throughput screening technologies. From the early days of simple aggregations of available chemicals (sourced from academia or historical compound collections) to the era of combinatorial chemistry, the field has steadily shifted toward more purpose-driven collections. Modern libraries now emphasize drug-like properties, target-oriented design, natural product-inspired scaffolds, and structural diversity to increase the likelihood of identifying biologically relevant hits with better potential to progress into clinical candidates. This talk will trace this progression, highlighting how the philosophy behind library design has matured, from quantity-driven approaches to quality-focused strategies. As screening demands have grown, so too has the need to aggregate vast chemical inventories from multiple suppliers, ensuring both compound availability and consistency in data. This necessity gave rise to MolPort, a publicly available centralized platform built to unify global compound availability. With over 5 million unique small molecules sourced from numerous providers, MolPort represents a comprehensive, real-world compound database specifically geared for screening applications. We will explore how such large-scale aggregation not only facilitates efficient compound sourcing but also enables in-depth cheminformatics analyses. Using the MolPort database as a case study, we will examine its drug-like property distributions, structural diversity metrics, and chemical space coverage. These analyses demonstrate how aggregated libraries can be both broad and balanced, bridging synthetic accessibility, commercial availability, and medicinal chemistry relevance. Ultimately, this talk will underscore the strategic importance of well-curated compound libraries in modern drug discovery, and how platforms like MolPort are transforming the way screening campaigns are conceived, designed, and executed.	BI	standard
Thu	12:10 - 12:30	Integrating 19Focused Screening with Make-on-Demand Chemical Spaces for Enhanced Fragment Follow-Up	Patrick Penner	Abstract A core challenge of Fragment-based screening (FBS) is ensuring follow-up for fragment hits. Hits coming out of a fragment screen may not be in an affinity range that is measurable in standard assays, which can discourage further efforts. Lowering the threshold to follow-up could help unlock the potential of these hits. Since its inception, Fragment-based drug design (FBDD) has relied on robust biophysical methods. One of the most classical biophysical methods in FBDD is nuclear magnetic resonance (NMR). NMR using 19F in the screened fragments as its active isotope has significantly evolved as a platform for FBS to facilitate, for example, focused screening. Another transformative development in recent years has been the rise of massive make-on-demand chemical spaces. Their unprecedented size presents a challenge to established computer-aided drug design methodology, but also a great opportunity to rapidly explore chemistry. In this study we have integrated our established 19Focused screening platform with the Enamine REAL Space to enhance fragment selection and follow-up strategies. This begins by extracting fragments from the Enamine REAL Space using a few different approaches so that they can be followed up on. These fragments then undergo a 19Focused screen, which involves predicting the NMR signals. The prediction workflow itself must be able to deal with the heterogeneous chemistry it is expected to encounter. Finally, we also demonstrate the follow-up on the fragments found in the initial screening, showcasing the power of this integrated platform to overcome barriers in FBS while expanding the scope of chemical exploration.	BI	standard
Thu	12:30 - 13:30	Lunch + posters		Abstract ☕️	Respirium	☕️
Thu	13:30 - 13:35	pdChemChain – Interactive building of pandas chemistry pipelines	Esben Jannik Bjerrum	Abstract pdChemChain is a toolkit intended for interactive building of pandas dataframe processing with a chemical scope. A simplified and Pythonic API with three intuitive dogmas makes it easy to create processor pipelines in a few lines of code. Links can be created and tested in interactive sessions, such as Jupyter, and subsequently combined into chains. Special links provide branching and unions, and serial chunking and parallelism links make it possible to conserve memory and process millions of rows efficiently on modest hardware. Code line overhead in custom link creation has been minimized to enable easy expansion of the capabilities. Full specifications of the interactively built pipelines can be written to JSON or YAML, and loaded or reused from the command line utilities for backend and server deployment. This lightning talk would be of interest to people who need to process chemical information in pandas dataframes for exploratory data analysis in Jupyter sessions and who want a simple tool to build data processing pipelines. https://github.com/EBjerrum/pdchemchain	BI	lightning
Thu	13:35 - 13:40	Scikit-Mol Updates and New Features	Esben Jannik Bjerrum	Abstract Scikit-mol is a toolkit developed to make it easier to integrate and reuse chemical prediction models using RDKit featurization and Scikit-Learn models. It was started at an RDKit hackathon in 2022 and featured in a lightning talk in 2023. It has been developed by various volunteer contributors over the last two years. The lightning talk would give a brief overview of core features such as the scikit-learn pipeline compatible transformers as well as some of the updates, such as pandas output, safe inference mode and applicability domain estimators. This lightning talk is of interest to people who want a way to efficiently build simple to moderately complex, but fully documented and deployable QSAR models with scikit-learn and RDKit in a few lines of code. https://github.com/EBjerrum/scikit-mol	BI	lightning
Thu	13:40 - 13:45	Developing an Open Source and FAIR Ecosystem for Cheminformatics	Martin Sicho	Abstract This short talk presents QSPRpred, Spock, and DrugEx, an open-source cheminformatics ecosystem designed to accelerate drug discovery by integrating predictive modeling, molecular docking, and generative chemistry. QSPRpred is a modular Python framework for building robust QSPR/QSAR models with automated preprocessing and hyperparameter optimization. Spock serves as an automated molecular docking framework that generates and stores ligand-protein complexes, enabling structure-based virtual screening to identify promising ligands. Complementing these, DrugEx leverages both Spock pipelines and QSPRpred models to guide de novo molecular generation. Together, these tools facilitate streamlined workflows from property prediction and docking to compound design, improving reproducibility and reducing complexity through interoperable, version-controlled components. Case studies demonstrate how this ecosystem enhances virtual screening campaigns by combining data-driven modeling with structure-based docking, supported by open-source development and community collaboration.	BI	lightning
Thu	13:45 - 13:50	BLINCS: Breadth-first Line Notation for Chemical Structures	Wim Dehaen	Abstract The usage of line notations in chemical language models (CLMs) has been highly successful. Nonetheless, atoms that are topologically close in the chemical graph are not necessarily close in the SMILES string. A breadth-first traversal of the chemical graph would ensure this distance correspondence better. To investigate this effect, we propose a new line notation with a SMILES-like syntax but a breadth-first graph traversal. This notation is based on the insight that a degree sequence plus ring closure operations can encode any simple graph concisely. Thus, with the addition of atom symbols and bond symbols any chemical graph can be encoded. We compare the properties of this notation with other line notations, and apply it to CLMs. We provide an RDKit-based open-source implementation of a BLINCS parser that can convert molecular graphs from and to BLINCS.	BI	lightning
Thu	13:50 - 13:55	Synthony: Synthon-Based Novel Retrosynthesis Approach	Mher Matevosyan	Abstract We have developed a novel retrosynthesis approach that constructs the synthesis tree in reverse—not from the target molecule to buyable building blocks, but starting from the building blocks and working forward. This is achieved by enumerating the space of possible synthons and identifying combinations that can be assembled into the target molecule. By focusing solely on complete retrosynthetic routes, our method avoids spending computational resources on incomplete or unproductive pathways. On the USPTO-50K benchmark, our approach achieves a top-1 accuracy of 75.6%. It is important to note that the method implicitly restricts the search space to a predefined database of building blocks, rather than exploring all theoretically possible precursors.	BI	lightning
Thu	14:00 - 14:30	RandomNets Improve Neural Network Regression Performance via Implicit Ensembling	Esben Jannik Bjerrum	Abstract Artificial feed-forward neural networks have long been recognized as powerful machine learning models and are widely used in QSAR and QSPR modeling of molecular properties. Inspired by Random Forest models and the robust techniques of sample and feature bagging, the RandomNets model was developed as an efficient, vectorized solution for ensemble creation, training, and inference. The model adds an extra dimension to the tensors passing through the neural network, combined with input feature masking and optional subsampling of the dataset during training. This vectorized approach improves efficiency and simplifies training and inference of the implicit ensemble. Training a 25-member implicit ensemble requires only twice the time of a comparable baseline network but significantly improves prediction performance, as measured by R² and MSE on test sets from 133 bioactivity datasets, with an average performance increase of around 25%. Compared to the conceptually similar input masking technique using dropout, the implicit ensemble demonstrates reduced sensitivity to hyperparameter choices, similar or improved performance, and a fourfold reduction in training time. Additionally, the implicit ensemble provides the standard deviation of individual predictions, which can help identify uncertain predictions. This talk is of interest to people who want to build efficient QSAR models based on RDKit and feed-forward neural networks. https://chemrxiv.org/engage/chemrxiv/article-details/67656cfa81d2151a02603f48 https://github.com/EBjerrum/RandomNets/tree/main	BI	standard
Thu	14:30 - 15:00	Accelerating RDKit on GPUs for high throughput computational chemistry workflows	Kevin Boyd	Abstract RDKit is critical for preprocessing and data generation for computational chemistry workflows. Several RDKit functions can bottleneck high-throughput workloads. We present work to accelerate several key RDKit functions on GPUs, including conformer generation, MMFF relaxation, and pairwise Tanimoto Similarity. We also present several CPU optimizations contributed to RDKit where GPU ports are intractable.	BI	standard
Thu	15:00 - 15:30	Coffee + posters		Abstract ☕️	Respirium	☕️
Thu	15:30 - 16:00	What's new 3 (Greg) + Interactive atom-based property visualisation in the browser	Paolo Tosco	Abstract I will be presenting the implementation in the rdkit-structure-renderer package of interactive visualisation of atom-based properties in the browser. In particular, I will show how pKa values can be conveniently visualised on a molecule within a dashboard.	BI	standard
Thu	16:00 - 16:30	Improved MCSS coming to RDKit soon	Ed Griffen	Abstract We're working on faster maximum common structure searching, and will be porting it into RDKit. We'll talk about why this matters and what we've done.	BI	standard
Thu	16:30 - 17:00	Roger's talk		Abstract	BI	standard
Thu	17:00 - 17:30	Wrap up		Abstract	BI	standard
Thu	19:30 - ???	Conference Dinner #2		Abstract ☕️	Masaryk Cafeteria	☕️