My new paper: Anomaly Detection based on LDA, Autoencoder and GMM

My team recently finished a paper on anomaly detection. We propose LAG, a novel unsupervised anomaly detection model based on LDA (latent Dirichlet allocation), an autoencoder, and a GMM (Gaussian mixture model). The model works on both structured and unstructured data, so it can serve as a general solution for anomaly detection tasks across industries. In particular, we show how to apply it to financial transactions. On public benchmark datasets, our model outperforms state-of-the-art anomaly detection models by more than 8% in F1 score.

Our work makes three contributions. First, we propose a tokenization scheme for transaction data: each transaction is converted to a word, and a batch of transactions becomes a document that represents a customer's financial behavior. Second, we handle unstructured data with LDA, which maps text or any other discrete data into a low-dimensional space; the resulting topic vectors are useful features for downstream tasks. Third, we combine LDA, the autoencoder, and the GMM into a single model that performs anomaly detection.
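To make the tokenization idea concrete, here is a minimal sketch of how a transaction could become a word and a customer's transactions a document. The field names (`customer_id`, `type`, `channel`, `amount`), the amount buckets, and the helper functions are all hypothetical choices made for illustration; the scheme in the paper may differ.

```python
from collections import defaultdict

# Hypothetical amount buckets as (exclusive upper bound, label); not from the paper.
AMOUNT_BUCKETS = [(10, "tiny"), (100, "small"), (1000, "medium")]


def bucket_amount(amount):
    """Map a continuous amount to a coarse label so the vocabulary stays small."""
    for upper, label in AMOUNT_BUCKETS:
        if amount < upper:
            return label
    return "large"


def transaction_to_word(tx):
    """Encode one transaction as a single token built from its discrete attributes."""
    return f"{tx['type']}_{tx['channel']}_{bucket_amount(tx['amount'])}"


def build_documents(transactions):
    """Group tokens by customer: one space-separated document per customer."""
    docs = defaultdict(list)
    for tx in transactions:
        docs[tx["customer_id"]].append(transaction_to_word(tx))
    return {cid: " ".join(words) for cid, words in docs.items()}


# Toy data, invented for this example.
transactions = [
    {"customer_id": "c1", "type": "debit", "channel": "atm", "amount": 40.0},
    {"customer_id": "c1", "type": "debit", "channel": "online", "amount": 2500.0},
    {"customer_id": "c2", "type": "credit", "channel": "branch", "amount": 5.0},
]

documents = build_documents(transactions)
for customer, doc in documents.items():
    print(customer, "->", doc)
```

Documents in this form can be passed to any bag-of-words topic model, which is what makes LDA applicable to transaction data in the first place.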

[pdf-embedder url="https://frankworkshophome.files.wordpress.com/2021/01/df0c3-lag_pwc.pdf"]

 

Published by frank victor xu

I am a data science practitioner. I love math, artificial intelligence and big data. I am looking forward to sharing experience with all data science enthusiasts.
