Our use of ordinary desktop applications (email, the Web, calendars, etc.) is often a manifestation of the activities with which we are engaged. Planning a conference trip involves visiting airline and hotel sites, filling out travel expense forms, etc. Renovating a kitchen involves sketches, product specifications, email with the architect, spreadsheets for tracking expenses, etc. Every enterprise has (often implicit) processes for managing customer queries, requesting maintenance, hiring someone, and so forth. Unfortunately, ordinary desktop applications know nothing about these activities. Within an enterprise, it may be possible to coerce people into using a specialized workflow system to execute some processes; for example, customer relationship management systems ensure that customer queries are handled properly. But most activities exist entirely in our heads, forcing us to rely on crude techniques such as manual search, file directories, and email folders and threads to remember which documents and data are associated with which activities.
In this talk, I describe techniques for automatically discovering and tracking activities in email. A common theme is the use of machine learning to generalize over previous activities. First, I discuss highly structured activities such as e-commerce transactions. A consumer purchasing an item may receive email messages confirming the order, warning of a delay, and then a shipment notification. Existing email clients do not understand this structure, so users must manage their transactions by sifting through lists of messages. As a step toward providing high-level support for structured activities, I consider the problem of automatically learning an activity's structure. I formalize activities as finite-state automata (where states correspond to the status of the process, and transitions represent messages sent between participants), and propose several unsupervised machine learning algorithms in this context.
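To make the finite-state formalization concrete, here is a minimal Python sketch of the purchase example above encoded as an automaton. The class name `ActivityAutomaton` and the state and message-type labels (`ordered`, `delay_warning`, and so on) are hypothetical. The sketch only represents a known automaton and replays a sequence of message types through it. It does not implement the unsupervised algorithms for learning such an automaton from email.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ActivityAutomaton:
    """Finite-state model of a structured activity (hypothetical sketch).

    States stand for the status of the process; each transition is labeled
    with the type of message that moves the process from one state to the next.
    """
    start: str
    accepting: set[str]
    transitions: dict[tuple[str, str], str] = field(default_factory=dict)

    def add(self, src: str, message_type: str, dst: str) -> None:
        """Allow a message of `message_type` to move the process from src to dst."""
        self.transitions[(src, message_type)] = dst

    def run(self, message_types: list[str]) -> str | None:
        """Return the state reached after the sequence, or None if any message is unexpected."""
        state = self.start
        for m in message_types:
            nxt = self.transitions.get((state, m))
            if nxt is None:
                return None
            state = nxt
        return state

    def accepts(self, message_types: list[str]) -> bool:
        """True if the sequence drives the process to a completed (accepting) state."""
        return self.run(message_types) in self.accepting


# Hypothetical e-commerce purchase: confirmation, optional delay, then shipment.
purchase = ActivityAutomaton(start="start", accepting={"shipped"})
purchase.add("start", "order_confirmation", "ordered")
purchase.add("ordered", "delay_warning", "delayed")
purchase.add("ordered", "shipment_notification", "shipped")
purchase.add("delayed", "shipment_notification", "shipped")

print(purchase.run(["order_confirmation", "delay_warning"]))  # delayed
print(purchase.accepts(["order_confirmation", "delay_warning", "shipment_notification"]))  # True
print(purchase.accepts(["shipment_notification"]))  # False
```

Under this representation, tracking a transaction amounts to remembering its current state: after an order confirmation and a delay warning, the purchase sits in `delayed`, and a sequence the automaton rejects (such as a shipment notice with no prior confirmation) is one a client could flag for the user.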
Second, I discuss less structured activities such as organizing meetings or collaboratively editing documents. I describe machine learning approaches to activity discovery (i.e., grouping messages according to activities) and semantic message analysis (i.e., extracting metadata about how messages within an activity relate to one another and to the activity's progress). Our key innovation relative to related work is that we exploit the relational structure between these two tasks. Instead of attacking the two problems separately, our synergistic collective classification approach uses activity identification to assist semantic analysis, and vice versa.
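The mutual reinforcement between the two tasks can be illustrated with a toy, loosely modeled on iterative collective classification. Everything in it is an assumption for illustration: the `Message` type, the keyword lists in `ACTIVITY_KEYWORDS`, the semantic labels (`request`, `response`, `other`), and the hand-written rules that stand in for learned classifiers. The point is only the control flow: each pass labels messages using the current activity assignments, then reassigns activities using those labels, until neither changes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    id: int
    text: str
    reply_to: int | None = None  # id of the message this one replies to


# Hypothetical activity vocabularies standing in for a learned activity model.
ACTIVITY_KEYWORDS = {
    "trip": {"flight", "hotel", "conference"},
    "kitchen": {"cabinet", "tile", "architect"},
}


def words(text: str) -> set[str]:
    """Lowercased word set with simple punctuation stripped."""
    for ch in "?.,!":
        text = text.replace(ch, " ")
    return set(text.lower().split())


def local_activity(m: Message) -> str | None:
    """Activity guess from message content alone (None if no keyword matches)."""
    w = words(m.text)
    scores = {a: len(w & kw) for a, kw in ACTIVITY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def local_label(m: Message) -> str:
    """Semantic label from message content alone: questions are requests."""
    return "request" if "?" in m.text else "other"


def collective_classify(messages: list[Message], max_iters: int = 10):
    activity = {m.id: local_activity(m) for m in messages}
    label = {m.id: local_label(m) for m in messages}
    for _ in range(max_iters):
        # Semantic analysis, using current activities: a reply to a request is a
        # response only if it is not assigned to a different activity.
        new_label = {}
        for m in messages:
            p = m.reply_to
            same_activity = p is not None and activity[m.id] in (None, activity[p])
            if same_activity and label[p] == "request":
                new_label[m.id] = "response"
            else:
                new_label[m.id] = local_label(m)
        # Activity identification, using the new labels: a response joins the
        # activity of the message it answers; other messages use their content.
        new_activity = {}
        for m in messages:
            if new_label[m.id] == "response":
                new_activity[m.id] = activity[m.reply_to]
            else:
                new_activity[m.id] = local_activity(m)
        if new_label == label and new_activity == activity:
            break  # converged: neither task changed this pass
        label, activity = new_label, new_activity
    return activity, label


msgs = [
    Message(1, "Can you book the flight and hotel for the conference?"),
    Message(2, "Done, booked it.", reply_to=1),
    Message(3, "Separate question, which tile for the cabinet?", reply_to=1),
    Message(4, "Yes, the white tile.", reply_to=3),
]
activity, label = collective_classify(msgs)
print(activity)  # {1: 'trip', 2: 'trip', 3: 'kitchen', 4: 'kitchen'}
print(label)     # {1: 'request', 2: 'response', 3: 'request', 4: 'response'}
```

In this trace, message 2 ("Done, booked it.") shares no keywords with either activity, but once it is recognized as a response to the trip request it inherits the trip activity. Conversely, message 3 replies to the trip request, yet its content places it in the kitchen activity, which keeps it from being mislabeled as a response to that request.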