My team recently finished a paper on anomaly detection. In it, we propose LAG, a novel unsupervised anomaly detection model that combines LDA (Latent Dirichlet Allocation), an autoencoder, and a GMM (Gaussian Mixture Model). The model works on both structured and unstructured data and provides a comprehensive solution for anomaly detection tasks across many industries. In particular, we show how to apply it to financial transactions. On public benchmark datasets, LAG outperforms state-of-the-art anomaly detection models, with an F1 score improvement of more than 8%.
Our work makes three main contributions. First, we propose a tokenization scheme for transaction data that converts each transaction into a word and a batch of transactions into a document representing a customer's financial behavior. Second, we handle unstructured data with LDA, which maps text or any other discrete data into a low-dimensional topic vector that is useful for downstream tasks. Third, we combine LDA, an autoencoder, and a GMM into a single model for anomaly detection.
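As a minimal sketch of the tokenization idea, the Python below turns each transaction into a single word, groups each customer's words into a document, and converts the documents into bag-of-words counts that an LDA implementation can consume. The `Transaction` fields and the bucketing rules (order-of-magnitude amount buckets and three time-of-day bins) are hypothetical choices for illustration; the paper's actual tokenization scheme may differ.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical fields; a real transaction record will have its own schema.
    customer_id: str
    amount: float           # transaction amount
    merchant_category: str
    hour: int               # hour of day, 0-23


def amount_bucket(amount):
    # Order-of-magnitude bucket: 0 for < 10, 1 for 10-99, 2 for 100-999, ...
    return max(0, int(math.log10(max(amount, 1.0))))


def time_bucket(hour):
    # Three coarse bins: night (0-5), day (6-17), evening (18-23).
    return "night" if hour < 6 else "day" if hour < 18 else "evening"


def to_word(tx):
    """Encode one transaction as a single discrete token (a "word")."""
    return f"{tx.merchant_category}_amt{amount_bucket(tx.amount)}_{time_bucket(tx.hour)}"


def to_documents(transactions):
    """Group each customer's transaction words into one document."""
    docs = defaultdict(list)
    for tx in transactions:
        docs[tx.customer_id].append(to_word(tx))
    return dict(docs)


def bag_of_words(docs):
    """Turn documents into count vectors over a shared, sorted vocabulary."""
    vocab = sorted({w for words in docs.values() for w in words})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {}
    for cid, words in docs.items():
        row = [0] * len(vocab)
        for w, c in Counter(words).items():
            row[index[w]] = c
        vectors[cid] = row
    return vocab, vectors


txs = [
    Transaction("c1", 4.50, "coffee", 8),
    Transaction("c1", 5.20, "coffee", 9),
    Transaction("c1", 62.00, "grocery", 18),
    Transaction("c2", 1250.00, "electronics", 2),
]
docs = to_documents(txs)
vocab, vectors = bag_of_words(docs)
print(docs["c1"])     # ['coffee_amt0_day', 'coffee_amt0_day', 'grocery_amt1_evening']
print(vectors["c1"])  # [2, 0, 1]
```

Discretizing continuous fields is what makes the word analogy work: the two coffee purchases of 4.50 and 5.20 map to the same word, so LDA sees a repeated behavioral pattern rather than two unrelated numbers.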
The full paper is available as a PDF: https://frankworkshophome.files.wordpress.com/2021/01/df0c3-lag_pwc.pdf