
Explainable AI (XAI): Transparency in Artificial Intelligence Systems

Conrad Rebello
  • AI drives decision-making across diverse sectors, but opaque models raise bias concerns and call for more understandable, ethical systems.

  • Explainable AI (XAI) is a field focused on making AI systems' decision-making processes transparent and understandable.

  • Explanations can be local or global in scope, generated post-hoc or built into the model, and produced by self-interpreting models or model-agnostic techniques.

  • LIME, SHAP, counterfactual explanations, decision trees, and linear regression are XAI techniques that help make models more interpretable.



Artificial Intelligence - Powering The Future 


Artificial intelligence (AI) has become a game-changer for decision-making across industries. While some AI models entertain or answer basic questions, others are tackling complex tasks in fields like finance and healthcare. However, a major hurdle has emerged: the lack of transparency in how some AI models reach their conclusions.




These opaque models, often called "black boxes," can make impactful decisions whose reasoning is unclear even to their creators and users. This raises concerns about bias. Since AI systems learn from the data they're trained on, they can perpetuate societal biases present in that data. The lack of explanation is particularly worrisome in critical areas like criminal justice, recruitment, and healthcare, where biased or inexplicable decisions can have serious consequences.


In the realm of AI technologies, particularly deep learning and neural networks, there lies a pivotal challenge: ensuring these advanced AI models are not only powerful but also understandable to users. To combat this challenge, the field of Explainable AI (XAI) has taken root. 


What is Explainable AI (XAI)? 


Explainable Artificial Intelligence (XAI) is a field that sheds light on how AI systems reach their decisions. It helps us understand the "why" behind the "what" of AI, ensuring fairness, building trust in its applications, and supporting responsible AI.



In the realm of machine learning, understanding how models arrive at their decisions is vital for developers and users. This builds trust and allows us to leverage their power more effectively. Fortunately, several approaches can shed light on this process.


Before we dive into the world of explainable machines, let us unpack a few key terms.


Types of Explanations:


Local Explanations: 


These explanations delve into the rationale behind individual predictions. They unveil the specific factors within the algorithm that influenced a model's decision for a particular input. This approach is invaluable for dissecting a model's behaviour in specific instances, but it may not illuminate its overall decision-making framework.


Global Explanations: 


Here, the objective is to gain a comprehensive understanding of the model's behaviour across the entire spectrum of potential inputs. Global explanations reveal the broader patterns and features that the model prioritizes when making predictions. While this approach provides valuable insight into the model's general logic, it might not offer granular detail for every prediction.


Timing and Approach to Explainability:


Post-hoc Explanations: 


This is the most prevalent approach in practice, consisting of techniques applied after a model has completed its training. These techniques aim to unveil how the model arrived at its conclusions. Notably, both local and global explanations can be generated through post-hoc methods.


In-built Explanations:  


Some models, particularly those designed with interpretability in mind, have built-in explanation capabilities. These explanations are generated during the training process itself, offering a more holistic understanding of how the model learns and makes decisions. 


Approaches to Explainable Models:


Self-interpreting Models: 


Interpretable models are inherently designed to be easy to understand. Their internal workings are relatively transparent, which allows for a clear understanding of their decision-making process for human users. Examples include decision trees and some simpler linear models.


Advantages: 


Self-interpreting models offer readily understandable explanations and shed light on the model's reasoning.


Disadvantages:


Achieving the same level of performance as complex models might be challenging with self-interpreting models.


Model-Agnostic Explainers:


These techniques offer versatility as they can be applied to any machine learning model, irrespective of its internal structure. They treat the model as a "black box" and focus on the relationship between the model's inputs and outputs.


Advantages:


Model-agnostic explainers provide a flexible solution that can be applied to various models.


Disadvantages: 


A deep understanding of the model's internal workings might be elusive with these techniques.


The optimal approach for understanding a model's explanations hinges on specific needs. If comprehending individual predictions is critical, local explanations or self-interpreting models are valuable tools. Conversely, for a broader understanding of the model's overall behaviour, global explanations or model-agnostic techniques are more suitable. 


Types of AI Explainability models:


With a solid understanding of Explainable AI (XAI) in mind, let's delve deeper and explore the various types of AI explainability models available.


LIME:


LIME, which stands for Local Interpretable Model-Agnostic Explanations, is a technique that sheds light on individual model predictions. For a given data point, LIME helps identify which features most heavily influenced the outcome. It constructs a simpler model that replicates the behaviour of the complex model, but only within the immediate vicinity of the specific prediction. To do this, LIME makes slight modifications to the data point's features and observes how these changes affect the complex model's predictions. Based on these perturbations, it fits a straightforward surrogate model, often linear, that approximates the reasoning of the complex model in that neighbourhood. This surrogate reveals which features had the greatest influence on the original prediction.

[Figure: LIME builds a local surrogate model around a single prediction]

Example: For a house price prediction, LIME might show that for a specific house, its location and size were the main reasons for its high predicted price.
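
To make this concrete, here is a rough, self-contained sketch of LIME applied to a toy house-price regressor using the lime Python package. The features, data, and model below are illustrative stand-ins, not a real housing dataset.

# A rough, self-contained sketch of LIME on a toy house-price model, using the
# lime Python package. The features, data, and model are illustrative stand-ins.
# Requires: pip install lime scikit-learn numpy
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(0)
feature_names = ["size_sqft", "bedrooms", "location_score", "age_years"]
X = rng.uniform([500, 1, 0, 0], [4000, 6, 10, 80], size=(500, 4))
y = 100 * X[:, 0] + 20_000 * X[:, 2] - 500 * X[:, 3] + rng.normal(0, 10_000, 500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)  # the "black box"

explainer = LimeTabularExplainer(X, feature_names=feature_names, mode="regression")

# Explain one prediction: LIME perturbs this row, queries the black-box model,
# and fits a local linear surrogate to rank the most influential features.
explanation = explainer.explain_instance(X[0], model.predict, num_features=3)
print(explanation.as_list())  # e.g. [("location_score > 7.45", 44120.7), ...]

Because the surrogate is only faithful near the chosen data point, a different house can receive a very different explanation.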


SHAP:


SHAP, which stands for SHapley Additive exPlanations, is another technique for understanding the inner workings of complex AI models. Like LIME, SHAP can explain individual predictions, but its results are also commonly aggregated to give a more global picture of the model. It applies Shapley values, a concept from cooperative game theory: SHAP considers how each feature influences predictions when included alongside different combinations of the other features. By calculating a Shapley value for each feature, it reveals the average impact that feature has on the model's prediction across all of these scenarios. This provides a clear picture of which features are generally most important and how they interact within the model.


[Figure: Shapley values assigning credit to features, as in cooperative game theory]

Example: In a spam email detection model, SHAP might highlight the presence of specific keywords and suspicious sender addresses as the most important features for identifying spam.
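
Below is a rough, self-contained sketch of SHAP on a toy spam classifier using the shap Python package. The features and data are illustrative stand-ins, not a real spam dataset.

# A rough, self-contained sketch of SHAP on a toy spam classifier, using the
# shap Python package. The features and data are illustrative stand-ins.
# Requires: pip install shap scikit-learn pandas numpy
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
import shap

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "has_spam_keyword": rng.integers(0, 2, 1000),   # e.g. "free", "winner"
    "suspicious_sender": rng.integers(0, 2, 1000),
    "num_links": rng.integers(0, 10, 1000),
    "email_length": rng.integers(20, 2000, 1000),
})
y = (((X["has_spam_keyword"] == 1) & (X["suspicious_sender"] == 1))
     | (X["num_links"] > 7)).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)  # the "black box"

# shap.Explainer picks a suitable algorithm for the model (a tree explainer here)
explainer = shap.Explainer(model, X)
shap_values = explainer(X)  # one attribution per feature per email

# Global view: the average absolute Shapley value of each feature
importance = np.abs(shap_values.values).mean(axis=0)
for name, score in sorted(zip(X.columns, importance), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")

Sorting features by their mean absolute Shapley value gives the kind of global ranking described above, while the per-email values can still explain any single prediction.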


Counterfactual Explanations:


Counterfactual explanations show what would need to change in order to get a different result from a model. They answer the question, "What would need to be different to get the desired outcome?" This is useful because it gives clear, actionable information. The approach works by taking the original data point and systematically modifying it to see how the prediction changes. By identifying the smallest adjustments needed for a different outcome, counterfactuals also pinpoint which features had the strongest influence on the original prediction. They are especially helpful when a user needs to understand how a specific goal can be achieved, and they offer a balance between interpretability and privacy, since they don't require revealing the full model or dataset.


[Figure: a counterfactual explanation with reject and accept regions; points in the accept region show alternative inputs that would change the outcome]

Example: For a credit card application that was denied, a counterfactual explanation might say: "If you had $2,000 less credit card debt, your application would have been approved."
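
Here is a naive, self-contained sketch of a counterfactual search on a toy credit model. Dedicated libraries (such as DiCE) are far more sophisticated; the data, features, and single-feature search strategy below are illustrative assumptions.

# A naive, self-contained sketch of a counterfactual search for a toy credit model.
# The data, features, and search strategy are illustrative assumptions.
# Requires: pip install scikit-learn numpy
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
feature_names = ["income", "credit_card_debt", "years_employed"]
X = np.column_stack([
    rng.uniform(20_000, 120_000, 2000),  # income
    rng.uniform(0, 30_000, 2000),        # credit card debt
    rng.uniform(0, 25, 2000),            # years employed
])
y = (X[:, 0] - 2 * X[:, 1] + 1_000 * X[:, 2] > 40_000).astype(int)  # 1 = approved

model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

applicant = np.array([55_000.0, 18_000.0, 3.0])
print("original decision:", "approved" if model.predict([applicant])[0] else "denied")

# Search for the smallest reduction in credit card debt that flips the decision,
# stepping $500 at a time and re-querying the model each time.
for reduction in np.arange(500, 20_001, 500):
    candidate = applicant.copy()
    candidate[1] -= reduction
    if model.predict([candidate])[0] == 1:
        print(f"counterfactual: with ${reduction:,} less credit card debt, "
              f"the application would likely have been approved")
        break

A full counterfactual method would vary several features at once and penalize unrealistic changes; this sketch only adjusts one feature to keep the idea visible.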


Decision Trees:


Decision trees are self-interpreting models that make predictions by following a series of if-then rules. These rules are organized in a tree-like structure, with each internal node representing a decision based on a feature, each branch representing the outcome of that decision, and each leaf node representing a final prediction or classification. The model's decision-making process can be easily visualized and understood by following the path from the root to a leaf node. Decision trees can handle both numerical and categorical data and are used for both classification and regression tasks. They're valued for their simplicity and interpretability but may struggle with complex relationships in data and can be prone to overfitting if not properly pruned or regularized.


[Figure: a decision tree]

Example: A decision tree for loan approval might first split on income, then credit score, and finally on employment history.
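
The sketch below trains a small, self-contained decision tree for a toy loan-approval task and prints its rules with scikit-learn. The data and thresholds are illustrative assumptions.

# A minimal, self-contained sketch of a self-interpreting decision tree for a
# toy loan-approval task; the data and thresholds are illustrative assumptions.
# Requires: pip install scikit-learn numpy
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
feature_names = ["income", "credit_score", "years_employed"]
X = np.column_stack([
    rng.uniform(20_000, 150_000, 1000),  # income
    rng.uniform(300, 850, 1000),         # credit score
    rng.uniform(0, 30, 1000),            # years employed
])
y = ((X[:, 0] > 50_000) & (X[:, 1] > 650) & (X[:, 2] > 1)).astype(int)  # 1 = approve

# Keep the tree shallow so its if-then rules stay readable
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# The whole decision process prints as nested if-then rules
print(export_text(tree, feature_names=feature_names))

The printed rules are the explanation itself, which is what makes the model self-interpreting.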


Linear Regression:


Linear regression is a method used to predict numerical outcomes by fitting a straight line through data points. It analyzes how various factors influence the prediction target. The goal is to determine a coefficient for each factor, indicating its impact on the prediction: positive coefficients increase the prediction, while negative ones decrease it. The model also includes an intercept, the baseline value predicted when all factors are zero. Its simplicity and interpretability have made linear regression widely adopted across many fields. However, its effectiveness relies on the assumption that relationships between variables are linear; when relationships are more complex, its performance may diminish.


[Figure: a regression line fitted through plotted data points]

Example: A company might use linear regression to predict sales from advertising spend, revealing how budget changes affect sales. This helps it optimize advertising strategies and forecast future sales more effectively.
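
The following self-contained sketch fits such a model with scikit-learn and reads off the coefficient and intercept; the spend and sales figures are illustrative assumptions.

# A minimal, self-contained sketch of interpreting a linear regression that
# predicts sales from advertising spend; the numbers are illustrative assumptions.
# Requires: pip install scikit-learn numpy
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
ad_spend = rng.uniform(1_000, 50_000, size=(200, 1))               # monthly ad budget ($)
sales = 5_000 + 3.2 * ad_spend[:, 0] + rng.normal(0, 8_000, 200)   # underlying relationship + noise

model = LinearRegression().fit(ad_spend, sales)

# The model is its own explanation: one coefficient per input plus an intercept
print(f"intercept (baseline sales): {model.intercept_:.0f}")
print(f"coefficient: each extra $1 of ad spend adds about {model.coef_[0]:.2f} in sales")
print(f"predicted sales at $20,000 spend: {model.predict([[20_000]])[0]:.0f}")

The coefficient and intercept printed here fully describe the model's behaviour, which is why linear regression counts as a self-interpreting model.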


In Conclusion


The future of explainable AI (XAI) holds immense potential for advancing AI technologies, particularly deep learning and neural networks. By making AI systems explainable, XAI aims to foster user trust and help ensure that AI makes sound decisions. As AI spreads across applications, understanding the data and machine learning algorithms behind it becomes crucial, and researchers are increasingly focused on responsible AI guidelines and governance to ensure ethical practice. Interpretable AI enhances transparency and accountability, and XAI research is key to unlocking these benefits across diverse sectors. By explaining how advanced AI models work, we can ensure that machine learning is both effective and trustworthy. As XAI continues to evolve, it will play a vital role in shaping the future of AI, enabling users and developers alike to trust that AI decisions are informed and reliable.


