Government Funding Graph RAG
In this article, I present my latest open-source project — Government Funding Graph. The inspiration for this project came from a desire to make better tooling for grant writing, namely to suggest research topics, funding bodies, research institutions, and researchers. I have made Innovate UK grant applications in the past, so I have had an interest in […]
Why Most Cyber Risk Models Fail Before They Begin
Cybersecurity leaders are being asked impossible questions. “What’s the likelihood of a breach this year?” “How much would it cost?” And “how much should we spend to stop it?” Yet most risk models used today are still built on guesswork, gut instinct, and colorful heatmaps, not data. In fact, PwC’s 2025 Global Digital Trust Insights […]
How to Benchmark DeepSeek-R1 Distilled Models on GPQA Using Ollama and OpenAI’s simple-evals
The recent launch of the DeepSeek-R1 model sent ripples across the global AI community. It delivered breakthroughs on par with the reasoning models from Meta and OpenAI, achieving this in a fraction of the time and at a significantly lower cost. Beyond the headlines and online buzz, how can we assess the model’s reasoning abilities […]
Data Science: From School to Work, Part IV
Let’s start with a simple example that will appeal to most of us. If you want to check if the blinkers of your car are working properly, you sit in the car, turn on the ignition and test a turn signal to see if the front and rear lights work. But if the lights […]
Enterprise AI: From Build-or-Buy to Partner-and-Grow
Not long ago, a cooperation partner casually approached me with an AI use case at their organization. They wanted to make their onboarding process for new staff more efficient by using AI to answer the repetitive questions of newcomers. I suggested a practical chat approach that would integrate their internal documentation, and off they went […]
Explained: How Does L1 Regularization Perform Feature Selection?
Feature selection is the process of selecting an optimal subset of features from a given set of features; an optimal feature subset is the one that maximizes the performance of the model on the given task. Feature selection can be a manual or rather explicit process when performed with filter or wrapper methods. In these […]
How to Get Performance Data from Power BI with DAX Studio
To put things straight: I will not discuss how to optimize DAX code today. More articles will follow, concentrating on common mistakes and how to avoid them. But before we can understand the performance metrics, we need to understand the architecture of the Tabular model in Power BI. The same architecture applies to Tabular models […]
Beginner’s Guide to Creating a S3 Storage on AWS
AWS is a well-known cloud provider whose primary goal is to allocate server resources for software engineers to deploy their applications. AWS offers many services, one of which is EC2, providing virtual machines for running software applications in the cloud. However, for data-intensive applications, storing data inside EC2 instances is not always the optimal […]
Building a Personal API for Your Data Projects with FastAPI
How many times have you had a messy Jupyter Notebook filled with copy-pasted code just to re-use some data wrangling logic? Whether you do it for passion or for work, if you code a lot, then you’ve probably answered something like “way too many”. You’re not alone. Maybe you tried to share data with colleagues […]
Beyond the Code: Unconventional Lessons from Empathetic Interviewing
Recently, I’ve been interviewing Computer Science students applying for data science and engineering internships with a 4-day turnaround from CV vetting to final decisions. With a small local office of 10 and no in-house HR, hiring managers handle the entire process. This article reflects on the lessons learned across CV reviews, technical interviews, and post-interview […]