An Internship Project at The Strategy Unit, NHS
Aug 14, 2025
This presentation will cover
The process for enhancing the artificial dataset followed these steps
The project began by generating a correlation matrix in Python to check for linear relationships between the numerical attributes. It showed no meaningful relationships, which provided the justification for the data enhancement.
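As an illustration of this first step, the sketch below builds a Pearson correlation matrix with pandas. The column names and the randomly generated values are hypothetical stand-ins, not the actual artificial HES dataset or its schema; only the general approach (a correlation matrix over the numerical columns) comes from the project description.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n_rows = 1_000

# Hypothetical stand-in data: each column is drawn independently,
# so no meaningful linear relationship is expected between them.
df = pd.DataFrame({
    "age": rng.integers(0, 100, size=n_rows),
    "length_of_stay": rng.integers(0, 30, size=n_rows),
    "n_procedures": rng.integers(0, 5, size=n_rows),
})

# Pearson correlation matrix over the numerical columns only.
corr_before = df.select_dtypes(include="number").corr(method="pearson")

# Largest absolute correlation, ignoring the diagonal (always 1.0).
off_diagonal = corr_before.where(~np.eye(len(corr_before), dtype=bool))
max_abs_corr = off_diagonal.abs().max().max()

print(corr_before.round(2))
print(f"largest off-diagonal |r|: {max_abs_corr:.2f}")
```

With independently generated columns, the off-diagonal values should sit close to zero, which is the pattern that motivates injecting relationships.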
The Data Before Data Injection
Based on the requirements of the NHP demand model, two key inter-column relationships were identified:
The final output is an enhanced artificial HES dataset. It was validated by generating a new correlation matrix, which confirmed that the newly created relationships are present. The data remains non-disclosive and suitable for the open-source NHP model.
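The rule-based enhancement and its validation could be sketched roughly as follows. This is a minimal example under stated assumptions: the helper `inject_linear_rule`, the column names, and the rule itself (length of stay rising with age) are hypothetical and chosen only to show the mechanics; the relationships actually used in the project are not reproduced here.

```python
import numpy as np
import pandas as pd


def inject_linear_rule(df, source, target, slope, noise_sd, rng):
    """Return a copy of df where `target` is rebuilt as slope * source + noise.

    Hypothetical helper for illustration; not the project's actual rule.
    """
    out = df.copy()
    noise = rng.normal(0.0, noise_sd, size=len(out))
    values = np.rint(slope * out[source] + noise)
    # Keep the rebuilt column non-negative and integer-valued.
    out[target] = np.clip(values, 0, None).astype(int)
    return out


rng = np.random.default_rng(seed=1)

# Hypothetical stand-in data with independent columns.
df_before = pd.DataFrame({
    "age": rng.integers(0, 100, size=2_000),
    "length_of_stay": rng.integers(0, 30, size=2_000),
})

df_after = inject_linear_rule(
    df_before, source="age", target="length_of_stay",
    slope=0.1, noise_sd=2.0, rng=rng,
)

# Validation: compare the correlation of the target pair before and after.
r_before = df_before["age"].corr(df_before["length_of_stay"])
r_after = df_after["age"].corr(df_after["length_of_stay"])
print(f"before: {r_before:.2f}, after: {r_after:.2f}")
```

Because the rebuilt column is derived from synthetic values plus random noise, no real patient record is involved at any point, which is in keeping with the non-disclosive requirement.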
The Data After Data Injection
Before:
The initial correlation matrix showed no meaningful relationships. This confirmed the need for data enhancement.
After:
The new correlation matrix confirms that the rules injected the intended relationships: the correlation values for the target columns have increased substantially.
While the rule-based approach was a practical and effective solution, future work could explore using Generative Adversarial Networks (GANs).
GANs can learn more complex, non-linear relationships that a rule-based system would miss, creating an even more realistic synthetic dataset.
The slides are available publicly at aiwithash.me