Date
2017
Notes
Honolulu, June 2017
Dewey
General computer science
Subject
Email analysis; Word2vec; LSA; process mining; process modeling
Conference date
2017
Book title
2017 IEEE International Conference on Cognitive Computing (ICCC)
Publisher
IEEE - Institute of Electrical and Electronics Engineers
Publisher city
Piscataway, NJ
ISBN
978-1-5386-2007-6
Author
Jlailaty, Diana
Grigori, Daniela
Belhajjame, Khalid
Type
Communication / Conference
Item number of pages
112-119
Abstract (EN)
Because email is widely used in personal and, more importantly, professional contexts, it is a valuable source of information that can be harvested to understand, reengineer and repurpose the undocumented business processes of companies and institutions. Few researchers have investigated the problem of extracting and analyzing the process-oriented information contained in emails. In this paper, we advance in this direction by proposing a new method to discover business process activities from email logs. To this end, emails are first grouped according to the process model they belong to. The emails of each process model are then sub-grouped and labeled by business activity type. These tasks are carried out using an unsupervised mining technique combined with semantic similarity measurement methods. Two representative similarity measurement methods are examined: Latent Semantic Analysis (LSA) and Word2vec. The two methods are compared, showing that Word2vec performs better than LSA both in grouping emails by the process model they relate to and in discovering emails that belong to the same activity type. Experimental results are detailed to illustrate and support the contributions of our approach.