Lecture 1 - Véronique Van Vlasselaer (Customs fraud detection)
1) What are the steps in an analytics lifecycle? Describe the four main parts, and how they relate
to the customs case, as seen by Véronique Van Vlasselaer.
The analytics lifecycle generally consists of four main parts: Model Development, Model Deployment,
Decisioning, and one overarching element, ModelOps.
Develop model => deploy model => monitor model => make decisions based on the model => get
results => get feedback => monitor performance => retrain the model => deploy the model
Model Development: (Preparing data, exploring data and developing the model)
This step involves the creation of analytical models using machine learning (ML) algorithms. It
includes data preparation, feature engineering, model training, and validation.
In the customs case, historical data about packages (e.g., consignor, consignee, country of origin,
contents description, compliant/non-compliant) is used to develop models that can predict the
likelihood of a package being non-compliant or suspicious.
The development process emphasizes both data-centric and model-centric approaches. The data-centric
approach involves improving data quality and creating meaningful features, while the model-centric
approach involves selecting and tuning the best-performing algorithms. => a hybrid of both is used.
Feature engineering => the process in which a set of meaningful features is derived from the raw data
set, improving data quality and machine learning model performance.
The importance of feature engineering lies in augmenting models with business expertise; it turned out
to be the secret ingredient for the success of all analytical models.
Two types: statistical feature engineering and business feature engineering.
Statistical feature engineering can be highly automated. => The Feature Machine automatically
assesses data quality issues and automatically generates new features by performing the appropriate
feature transformations to obtain the optimal feature set.
Example: creation of RFM (Recency, Frequency, Monetary) features.
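A minimal sketch of RFM-style feature engineering on a hypothetical shipment table (the column names consignor, ship_date and declared_value are assumptions for illustration, not taken from the lecture):

```python
import pandas as pd

# Hypothetical shipment-level data: one row per package.
shipments = pd.DataFrame({
    "consignor": ["A", "A", "B", "B", "B"],
    "ship_date": pd.to_datetime(["2024-01-05", "2024-03-01",
                                 "2024-02-10", "2024-02-20", "2024-03-02"]),
    "declared_value": [120.0, 80.0, 500.0, 450.0, 610.0],
})

snapshot = pd.Timestamp("2024-03-31")  # reference date for recency

# RFM features per consignor: Recency (days since last shipment),
# Frequency (number of shipments), Monetary (total declared value).
rfm = shipments.groupby("consignor").agg(
    recency=("ship_date", lambda d: (snapshot - d.max()).days),
    frequency=("ship_date", "count"),
    monetary=("declared_value", "sum"),
)
print(rfm)
```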
A challenge is data imbalance: a highly skewed class distribution, because only about 1% of the cases
are non-compliant. So, you want to use re-sampling techniques when your data is skewed.
1) Undersampling: randomly select compliant cases and remove them.
2) Oversampling: duplicate the non-compliant cases.
3) Hybrid approach: combination of under- and oversampling → they used this method.
But even with the over- and undersampling it wasn't enough → they applied the SMOTE technique:
focusing on the minority class, the nearest neighbours are derived and synthetic samples are created
between them.
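A minimal sketch of these re-sampling options using the imbalanced-learn package (the library choice, the toy data and the default sampling ratios are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import RandomOverSampler, SMOTE

# Toy data mimicking roughly 1% non-compliant packages.
X, y = make_classification(n_samples=10_000, weights=[0.99], random_state=0)

# 1) Undersampling: randomly drop compliant (majority) cases.
X_u, y_u = RandomUnderSampler(random_state=0).fit_resample(X, y)

# 2) Oversampling: duplicate non-compliant (minority) cases.
X_o, y_o = RandomOverSampler(random_state=0).fit_resample(X, y)

# 3) SMOTE: create synthetic minority samples between nearest neighbours.
X_s, y_s = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
```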
Model-centric approach: the better the algorithm, the better the results. The advice she gave was
that you should run multiple auto-tuned algorithms and choose the best one, not the most exotic
one. The best algorithm is defined as the combination of analytical accuracy and business
relevance and requirements. They use recall and precision as statistical fit statistics.
1. Recall: the percentage of all truly suspicious packages that the model detects.
2. Precision: of all packages the model flags as suspicious, the percentage that are truly
suspicious.
=> importance of linking evaluation metrics with business impact and requirements.
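A small worked sketch of these two metrics with scikit-learn (labels and predictions are made up for illustration):

```python
from sklearn.metrics import recall_score, precision_score

# 1 = suspicious / non-compliant, 0 = compliant (illustrative labels).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

# Recall: share of truly suspicious packages that the model detects (2 of 3).
print("recall:", recall_score(y_true, y_pred))
# Precision: share of flagged packages that are truly suspicious (2 of 3).
print("precision:", precision_score(y_true, y_pred))
```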
Legal considerations => interpretability of the model: global interpretability (how do the features
contribute to the model overall) and local interpretability (explain how the model comes to a certain
decision for an individual data observation).
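A hedged sketch of how global and local explanations could be produced, assuming a tree-based model and the shap package (not necessarily the tooling used in the customs case; the toy data stands in for the package features):

```python
import pandas as pd
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Toy stand-in for the package feature table.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(5)])
model = GradientBoostingClassifier().fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global interpretability: overall feature contributions across all packages.
shap.summary_plot(shap_values, X)

# Local interpretability: contribution breakdown for one individual package.
shap.force_plot(explainer.expected_value, shap_values[0], X.iloc[0], matplotlib=True)
```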
Model Deployment:
Once developed, models are deployed into production environments where they can start processing
real-time data.
This step is about deploying, monitoring and updating the model when needed.
It is important to measure and manage the performance of models over time, because as time passes
the performance of most models drops => their predictions become less accurate.
This means identifying issues such as data drift (changes in the input data over time) and concept
drift (changes in the distribution of the model outcomes) that could negatively impact model
performance. => relates to the expected model performance.
Customs authorities track the model's performance by comparing predicted outcomes with actual
inspection results, adjusting the models as necessary to maintain accuracy. => relates to the actual
model performance.
In the customs case this is done with automated monitoring and alerting systems that check the
different metrics and alert when issues arise. Additionally, they use champion and challenger models:
based on the ongoing model performance, a different champion or challenger can be chosen. Based on
this => re-train models or re-build models.
Update models and rules frequently.
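A minimal sketch of champion/challenger monitoring on newly labelled inspection results (the recall threshold, the function name and the promotion rule are illustrative assumptions):

```python
from sklearn.metrics import recall_score

RECALL_ALERT_THRESHOLD = 0.60  # assumed business requirement, not from the lecture

def monitor(champion, challenger, X_recent, y_recent):
    """Compare champion and challenger on recently inspected packages and alert/promote."""
    champ_recall = recall_score(y_recent, champion.predict(X_recent))
    chall_recall = recall_score(y_recent, challenger.predict(X_recent))

    if champ_recall < RECALL_ALERT_THRESHOLD:
        print(f"ALERT: champion recall dropped to {champ_recall:.2f} -> re-train or re-build")

    # Promote the challenger if it now outperforms the champion.
    return challenger if chall_recall > champ_recall else champion
```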
Decisioning:
The decisioning phase is about integrating the models and combining them with rules to measure the
results: rules → decisions → action → results.
AI MODELS ARE INTEGRATED INTO THE DECISIONING PROCESS
→ The decision logic consists of analytical insight + business rules + flow logic. This step is crucial for
effective fraud detection.
The final step focuses on using the insights generated by the models to make informed decisions.
This involves combining analytical insights with business rules and human expertise to drive
operational decisions.
In customs fraud detection, decisioning integrates predictive model outputs with business rules
(e.g., specific countries of origin triggering automatic inspections) and human expertise to decide
which packages to inspect.
This integration helps optimize workload and improves the efficiency of customs operations by
prioritizing high-risk packages for inspection.
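A hedged sketch of such decision logic combining a model score with business rules (the score threshold, the high-risk origin list and the action names are illustrative assumptions):

```python
HIGH_RISK_ORIGINS = {"XX", "YY"}   # illustrative country codes
SCORE_THRESHOLD = 0.80             # illustrative cut-off on the model score

def decide(package: dict, model_score: float) -> str:
    """Decision logic = analytical insight + business rules + flow logic."""
    # Business rule: certain countries of origin trigger automatic inspection.
    if package.get("country_of_origin") in HIGH_RISK_ORIGINS:
        return "inspect"
    # Analytical insight: high predicted risk of non-compliance.
    if model_score >= SCORE_THRESHOLD:
        return "inspect"
    # Flow logic: everything else is released, keeping the inspection workload focused.
    return "release"

# Example: a package from a low-risk origin but with a high model score is inspected.
print(decide({"country_of_origin": "BE"}, model_score=0.91))
```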
The ModelOps process:
− Model Lifecycle: Continuous monitoring and improvement are essential to ensure that the fraud
detection models remain effective over time. This involves regularly re-evaluating model
performance and making necessary adjustments.
− The analytics lifecycle in the customs case requires continuous monitoring to ensure that the
models remain accurate and relevant, reflecting the dynamic nature of fraudulent activities.
ModelOps is the approach used to go through the analytics lifecycle; many initiatives fail because they
are not operationalized.
These four steps form a continuous cycle, ensuring that models are not only accurate and effective at
the point of deployment but remain so through regular updates and refinements based on
performance monitoring and feedback from operational use. In the customs case, this lifecycle helps
maintain a high level of vigilance against fraud while streamlining the inspection process, thereby
enhancing overall operational efficiency.
2) Model performance monitoring:
• Indicate where it is situated in the analytics lifecycle model.
• What is measured, and how can it be measured. Make a distinction between post-hoc and
ex-ante evaluation techniques.
• What if the model performance is no longer sufficient?
Analytics lifecycle: Model development => Model deployment => Decisioning
Model performance monitoring is situated in the model deployment stage of the analytics lifecycle.
After deployment the performance of models is monitored to ensure ongoing effectiveness in real-
world scenarios.
Continuously monitor models => the predictive power of models drops off as time goes on; models
become less predictive and diverge from their true labels (malperformance).
WANT TO MEASURE MODEL PERFORMANCE OVER TIME
Ex ante and post hoc evaluation techniques:
• Ex ante evaluation techniques are used to assess expected model performance. At the time
of model deployment, the true outcome of the target variable is often unknown. These
metrics allow us to evaluate the expected model performance without the need for the
actual value of the target variable. => data drift, concept drift, FCI
• Post hoc evaluation techniques: these methods are used to assess the actual model
performance and require the true value of the target variable to be known. They evaluate the
model's performance based on actual outcomes compared to predictions. =>
ROC curve, AUC, lift, Gini, KS statistic
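A small sketch of such post-hoc metrics once inspection outcomes are known (the scores and labels are made up; Gini is computed as 2·AUC − 1):

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.metrics import roc_auc_score

# Illustrative true outcomes and model scores after inspection results come in.
y_true = np.array([1, 0, 1, 0, 0, 1, 0, 0, 0, 1])
y_score = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.8, 0.3, 0.2, 0.5, 0.6])

auc = roc_auc_score(y_true, y_score)
gini = 2 * auc - 1
# KS statistic: maximum distance between the score distributions of the two classes.
ks = ks_2samp(y_score[y_true == 1], y_score[y_true == 0]).statistic

print(f"AUC={auc:.2f}  Gini={gini:.2f}  KS={ks:.2f}")
```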
Data drift (input variable drift):
When the model is deployed into production, it faces real-world data. As the environment changes,
that data might differ from the data the model was trained on. Data drift refers to changes in the
distribution (statistical properties) of the input data over time (e.g. a change in the volume of
packages over time).
These shifts can point to significant changes in behaviour caused by changing external factors:
economic downturns/upturns, behavioural changes, regulation, etc.
This shift in input data distribution can lead to a decline in the model's performance. The reason is,
when you create a machine learning model, you can expect it to perform well on data similar to the
data used to train it.
We want to measure data stability over time. This can be done by calculating the shift between the
distribution of the data the model was trained on and a new, current dataset that is now being fed into
the model. => Data stability report
The following formula can be used to quantify this:
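A deviation index of the population-stability type matches this description; a sketch under that assumption, with A_i the actual (current) and E_i the expected (training) share of observations in bin i of the variable:

\[
\text{deviation index} = \sum_{i=1}^{B} (A_i - E_i)\,\ln\!\left(\frac{A_i}{E_i}\right)
\]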
This formula captures the divergence between the actual distribution of data and the expected
(original) distribution.
A low deviation index indicates that the distribution has remained stable, while a higher deviation
index suggests that the distribution has changed. If the training data set and the current data set
have identical distributions for a variable, the variable's deviation index is equal to 0.
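A minimal sketch of this data-stability check, assuming the deviation-index form above and ten equal-frequency bins derived from the training data:

```python
import numpy as np

def deviation_index(train_values, current_values, bins=10, eps=1e-6):
    """Deviation (stability) index between the training and current distribution of one variable."""
    # Bin edges derived from the training data (equal-frequency bins).
    edges = np.quantile(train_values, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf

    expected = np.histogram(train_values, bins=edges)[0] / len(train_values)
    actual = np.histogram(current_values, bins=edges)[0] / len(current_values)

    # eps avoids division by zero / log(0) for empty bins.
    expected = np.clip(expected, eps, None)
    actual = np.clip(actual, eps, None)
    return float(np.sum((actual - expected) * np.log(actual / expected)))

# Identical distributions give an index close to 0; a shifted distribution gives a higher value.
rng = np.random.default_rng(0)
train = rng.normal(0, 1, 10_000)
print(deviation_index(train, rng.normal(0, 1, 10_000)))  # ~0: stable
print(deviation_index(train, rng.normal(1, 1, 10_000)))  # clearly > 0: drift
```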