COMPUTER SCIENCE DEPARTMENT

COLLOQUIUM, Fall 2020 – Spring 2021
Speaker                 Affiliation  Date          Time (ET)
Brendan O'Connor        UMASS        Oct 2, 2020   3:30–4:30pm
Wolfgang Gatterbauer    NEU          Oct 5, 2020   2:00–3:00pm
Eduard Hovy             CMU          Oct 9, 2020   5:00–6:00pm
Antonio Torralba        MIT          Oct 16, 2020  3:30–4:30pm
Jordan Boyd Graber      UMD          Oct 23, 2020  3:30–4:30pm
Dina Demner             NIH/NLM      Oct 30, 2020  3:30–4:30pm
Ted Pedersen            UMN          Nov 6, 2020   4:30–5:30pm
Kenneth Mandl           Harvard      Nov 13, 2020  3:30–4:30pm
Rongxing Lu             UNB          Nov 20, 2020  3:30–4:30pm
Marinka Zitnik          Harvard      Dec 4, 2020   2:00–3:00pm
Dan Roth                UPenn        Dec 10, 2020  3:30–4:30pm
Alexander Rush          Cornell      Dec 18, 2020  3:30–4:30pm
Heng Ji                 UIUC         Feb 19, 2021  3:30–4:30pm
Philip Resnik           UMD          Feb 26, 2021  3:30–4:30pm
Matthew Lease           UT Austin    Mar 5, 2021   2:00–3:00pm
Timothy Bickmore        NEU          Mar 12, 2021  3:30–4:30pm
Sameer Singh            UCI          Mar 19, 2021  3:30–4:30pm
Shuchin Aeron           Tufts        Mar 26, 2021  3:30–4:30pm
Noah Smith              UW           Apr 9, 2021   3:30–4:30pm
Jacob Andreas           MIT          Apr 16, 2021  3:30–4:30pm
He He                   NYU          Apr 23, 2021  3:30–4:30pm
Vanessa Frias-Martinez  UMD          Apr 30, 2021  3:30–4:30pm

Upcoming Talks

Matthew Lease 

Matthew Lease, University of Texas at Austin

Host: Hadi Amiri, UMASS, Lowell

Time: Mar 5, 2021 at 2:00–3:00pm ET

Location: https://uml.zoom.us/j/97264164573

Password: cstalks

Title: Adventures in Crowdsourcing: Toward Safer Content Moderation and Better Supporting Complex Annotation Tasks

Abstract
I'll begin the talk by discussing content moderation. While most user content posted on social media is benign, other content, such as violent or adult imagery, must be detected and blocked. Unfortunately, such detection is difficult to automate, due to high accuracy requirements, costs of errors, and nuanced rules for acceptable content. Consequently, social media platforms today rely on a vast workforce of human moderators. However, mounting evidence suggests that exposure to disturbing content can cause lasting psychological and emotional damage to some moderators. To mitigate such harm, we investigate a set of blur-based moderation interfaces for reducing exposure to disturbing content whilst preserving moderator ability to quickly and accurately flag it. We report experiments with Mechanical Turk workers to measure moderator accuracy, speed, and emotional well-being across six alternative designs. Our key findings show interactive blurring designs can reduce emotional impact without sacrificing moderation accuracy and speed. See our online demo at: http://ir.ischool.utexas.edu/CM/demo/. The second part of my talk will discuss aggregation modeling. Though many models have been proposed for binary or categorical labels, prior methods do not generalize to complex annotations (e.g., open-ended text, multivariate, or structured responses) without devising new models for each specific task. To obviate the need for task-specific modeling, we propose to model distances between labels, rather than the labels themselves. Our models are largely agnostic to the distance function; we leave it to the requesters to specify an appropriate distance function for their given annotation task. We propose three models of annotation quality, including a Bayesian hierarchical extension of multidimensional scaling which can be trained in an unsupervised or semi-supervised manner.
Results show the generality and effectiveness of our models across diverse complex annotation tasks: sequence labeling, translation, syntactic parsing, and ranking.

Bio
Matthew Lease is an Associate Professor in the School of Information at the University of Texas at Austin, where he is co-leading Good Systems (http://goodsystems.utexas.edu/), an eight-year Grand Challenge to design responsible AI technologies. In addition, Lease is an Amazon Scholar, working on Amazon Mechanical Turk, SageMaker Ground Truth and Augmented Artificial Intelligence (A2I). He also worked previously at CrowdFlower. Lease received the Best Paper award at the 2016 AAAI Human Computation and Crowdsourcing conference, as well as three early career awards for crowdsourcing (NSF, DARPA, IMLS). From 2011 to 2013, Lease co-organized the National Institute of Standards and Technology (NIST) Text Retrieval Conference (TREC) crowdsourcing track.

Timothy Bickmore 

Timothy Bickmore, Northeastern University

Host: Hadi Amiri, UMASS, Lowell

Time: Mar 12, 2021 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/92439134605

Password: cstalks

Title: Health Counseling Dialog Systems: Promise and Peril

Abstract
The current pandemic provides a compelling case for automated solutions to health behavior change, from mask wearing and social distancing, to behaviors that can prevent or treat psychological distress or substance misuse, to vaccination intent. Changing these behaviors can play a major role in how the pandemic is managed, and will help determine when it ends and the trajectory of our societal recovery. I will present a range of embodied conversational agents that have been used in medicine and public health to promote compliance with recommended healthcare regimens, and discuss how they could be used to help control COVID-19 and help us prepare for the next pandemic. I will discuss how conversational agents have been shown to be particularly effective at addressing health disparities for underserved populations, and why this is crucially important in pandemic response. I will also discuss some of the inherent risks in using natural language interfaces for medical counseling systems and outline some solutions to prevent users from harming themselves by following incorrect advice.

Bio
Dr. Timothy Bickmore is a Professor and Associate Dean for Research in the Khoury College of Computer Sciences at Northeastern University in Boston. The focus of his research is on the development and evaluation of embodied conversational agents, virtual and robotic, that emulate face-to-face interactions between health providers and patients. These agents have been used in automated health education and long-term health behavior change interventions, spanning preventive medicine and wellness promotion, chronic disease management, inpatient care, substance misuse screening and treatment, mental health treatment, and palliative care. His systems have been evaluated in multiple clinical trials with results published in medical journals including JAMA and The Lancet. Prior to Northeastern, Dr. Bickmore served as an Assistant Professor of Medicine at the Boston University School of Medicine. Dr. Bickmore received his Ph.D. from MIT, doing his dissertation work in the Media Lab studying interactions between people and embodied conversational agents in task contexts, such as healthcare, in which social-emotional behavior can be used to improve outcomes.

Sameer Singh 

Sameer Singh, University of California, Irvine

Host: Hadi Amiri, UMASS, Lowell

Time: Mar 19, 2021 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/93777620265

Password: cstalks

Title: Evaluating and Testing Natural Language Processing Models

Abstract
Current evaluation of the generalization of natural language processing (NLP) systems, and much of machine learning, primarily consists of measuring the accuracy on held-out instances of the dataset. Since the held-out instances are often gathered using a similar annotation process as the training data, they include the same biases that act as shortcuts for machine learning models, allowing them to achieve accurate results without requiring actual natural language understanding. Thus held-out accuracy is often a poor proxy for measuring generalization, and further, aggregate metrics have little to say about where the problem may lie. In this talk, I will introduce a number of approaches we are investigating to perform a more thorough evaluation of NLP systems. I will first provide an overview of automated techniques for perturbing instances in the dataset that identify loopholes and shortcuts in NLP models, including semantic adversaries and universal triggers. I will then describe recent work in creating comprehensive and thorough tests and evaluation benchmarks for NLP that aim to directly evaluate comprehension and understanding capabilities. The talk will cover a number of NLP tasks, including sentiment analysis, textual entailment, paraphrase detection, and question answering.

Bio
Sameer Singh is an Assistant Professor of Computer Science at the University of California, Irvine (UCI). He is working primarily on robustness and interpretability of machine learning algorithms, along with models that reason with text and structure for natural language processing. Sameer was a postdoctoral researcher at the University of Washington and received his PhD from the University of Massachusetts, Amherst, during which he also worked at Microsoft Research, Google Research, and Yahoo! Labs. He was selected as a DARPA Riser, and has been awarded the grand prize in the Yelp dataset challenge, the Yahoo! Key Scientific Challenges, UCI Mid-Career Excellence in research award, and recently received the Hellman and the Noyce Faculty fellowships. His group has received funding from Allen Institute for AI, Amazon, NSF, DARPA, Adobe Research, Base 11, and FICO. Sameer has published extensively at machine learning and natural language processing conferences and workshops, including paper awards at KDD 2016, ACL 2018, EMNLP 2019, AKBC 2020, and ACL 2020.

Shuchin Aeron 

Shuchin Aeron, Tufts University

Host: Anna Rumshisky, UMASS, Lowell

Time: Mar 26, 2021 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/95541491953

Password: cstalks

Title:

Abstract

Bio

Noah Smith 

Noah Smith, University of Washington

Host: Hadi Amiri, UMASS, Lowell

Time: Apr 9, 2021 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/97794620415

Password: cstalks

Title:

Abstract

Bio

Jacob Andreas 

Jacob Andreas, Massachusetts Institute of Technology

Host: Hadi Amiri, UMASS, Lowell

Time: Apr 16, 2021 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/92245469889

Password: cstalks

Title:

Abstract

Bio

He He 

He He, New York University

Host: Hadi Amiri, UMASS, Lowell

Time: Apr 23, 2021 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/96595540049

Password: cstalks

Title:

Abstract

Bio

Vanessa Frias-Martinez 

Vanessa Frias-Martinez, University of Maryland

Host: Hadi Amiri, UMASS, Lowell

Time: Apr 30, 2021 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/91315501257

Password: cstalks

Title:

Abstract

Bio



Previous Talks

Philip Resnik 

Philip Resnik, University of Maryland

Host: Hadi Amiri, UMASS, Lowell

Time: Feb 26, 2021 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/99309189281

Password: cstalks

Title: Computational Analysis of Language and the Assessment of Suicide Risk

Abstract
This talk, to be given remotely in the middle of a pandemic, will be about a problem that already existed long prior to COVID-19 as a kind of international pandemic in its own right. Suicide has a death toll approaching 800,000 people per year worldwide, and in the U.S. in 2016 it became the second leading cause of death among those aged 10-34. Now compounding these existing problems is an “echo pandemic” of suicide and mental illness emerging in the wake of COVID-19, as people struggle with isolation, stress, and sustained disruptions of day-to-day life. I'll talk about computational linguistics research related to the problem of suicide, raising issues connected with computational research on mental health more generally and including not only the technological angle but also questions of data access, ethical considerations, and the role of computational technologies in the mental health ecosystem.

Bio
Philip Resnik is a Professor at the University of Maryland in the Department of Linguistics and Institute for Advanced Computer Studies. He earned his bachelor's in Computer Science at Harvard and his PhD in Computer and Information Science at the University of Pennsylvania, and does research in computational linguistics. Prior to joining UMD, he was an associate scientist at BBN, a graduate summer intern at IBM T.J. Watson Research Center (subsequently awarded an IBM Graduate Fellowship) while at UPenn, and a research scientist at Sun Microsystems Laboratories. Resnik's most recent research focus has been in computational social science, with an emphasis on connecting the signal available in people's language use with underlying mental state – this has applications in computational political science, particularly in connection with ideology and framing, and in mental health, focusing on the ways that linguistic behavior may help to identify and monitor depression, suicidality, and schizophrenia. Outside his academic research, Resnik has been a technical co-founder of CodeRyte (NLP for electronic health records, acquired by 3M in 2012), and is an advisor to Converseon (social strategy and analytics), FiscalNote (machine learning and analytics for government relations), and SoloSegment (web site search and content optimization). He was named an ACL Fellow in 2020.

Heng Ji 

Heng Ji, University of Illinois at Urbana-Champaign

Host: Hadi Amiri, UMASS, Lowell

Time: Feb 19, 2021 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/94521966294

Password: cstalks

Title: How to Write a History Book?

Abstract
Understanding events and communicating about events are fundamental human activities. However, it's much more difficult to remember event-related information than entity-related information. For example, most people in the United States will be able to answer the question “Which city is Columbia University located in?”, but very few people can give a complete answer to “Who died from COVID-19?”. Human-written history books are often incomplete and highly biased because “History is written by the victors”. In this talk I will present a new research direction on event-centric knowledge base construction from multimedia multilingual sources, and then perform consistency checking and reasoning. Our minds represent events at various levels of granularity and abstraction, which allows us to quickly access and reason about old and new scenarios. Progress in natural language understanding and computer vision has helped automate some parts of event understanding but the current, first-generation, automated event understanding is overly simplistic since it is local, sequential and flat. Real events are hierarchical and probabilistic. Understanding them requires knowledge in the form of a repository of abstracted event schemas (complex event templates), understanding the progress of time, using background knowledge, and performing global inference. Our approach to second-generation event understanding builds on an incidental supervision approach to inducing an event schema repository that is probabilistic, hierarchically organized and semantically coherent. This facilitates inducing higher-level event representations analysts can interact with, and allows them to guide further reasoning and extract events by constructing a novel structured cross-media cross-lingual common semantic space.
When complex events unfold in an emergent and dynamic manner, the multimedia multilingual digital data from traditional news media and social media often convey conflicting information. To understand the many facets of such complex, dynamic situations, we have developed various novel methods to induce hierarchical narrative graph schemas and apply them to enhance end-to-end joint neural Information Extraction, event coreference resolution, and event time prediction.

Bio
Heng Ji is a professor in the Computer Science Department, and an affiliated faculty member in the Electrical and Computer Engineering Department, of the University of Illinois at Urbana-Champaign. She is an Amazon Scholar. She received her B.A. and M.A. in Computational Linguistics from Tsinghua University, and her M.S. and Ph.D. in Computer Science from New York University. Her research interests focus on Natural Language Processing, especially on Multimedia Multilingual Information Extraction, Knowledge Base Population and Knowledge-driven Generation. She was selected as “Young Scientist” and a member of the Global Future Council on the Future of Computing by the World Economic Forum in 2016 and 2017. The awards she received include “AI's 10 to Watch” Award by IEEE Intelligent Systems in 2013, NSF CAREER award in 2009, Google Research Award in 2009 and 2014, IBM Watson Faculty Award in 2012 and 2014, Bosch Research Award in 2014-2018, and ACL2020 Best Demo Paper award. She was invited by the Secretary of the U.S. Air Force and AFRL to join the Air Force Data Analytics Expert Panel to inform the Air Force Strategy 2030. She is the lead of many multi-institution projects and tasks, including the U.S. ARL projects on information fusion and knowledge networks construction, DARPA DEFT Tinker Bell team and DARPA KAIROS RESIN team. She has coordinated the NIST TAC Knowledge Base Population task since 2010. She has served as the Program Committee Co-Chair of many conferences including NAACL-HLT2018. She was elected secretary of the North American Chapter of the Association for Computational Linguistics (NAACL) for 2020-2021. Her research has been widely supported by the U.S. government agencies (DARPA, ARL, IARPA, NSF, AFRL, DHS) and industry (Amazon, Google, Bosch, IBM, Disney).

Alexander Rush 

Alexander Rush, Cornell University

Host: Hadi Amiri, UMASS, Lowell

Time: Dec 18, 2020 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/93871406559

Password: cstalks

Title: Deployable Language Systems

Abstract
Natural language models for translation and classification now work relatively well, and there is demand for widespread use in real systems. Models developed for research however do not naturally translate to deployment scenarios, particularly on resource constrained devices like mobile phones. In this talk I will discuss two axes that make it difficult to deploy NLP models in practice: a) Serial generation in translation models makes them difficult to optimize, and b) Fine-tuned parameter size in classification makes models difficult to deploy to end-users. I propose two approaches that aim to circumvent these issues, and discuss some practical work on deploying large NLP models on edge devices.

Bio
Alexander “Sasha” Rush is an Associate Professor at Cornell Tech in NYC. His group's research is in the intersection of natural language processing, deep learning, and structured prediction with applications in text generation and efficient inference. He contributes to several open-source projects in NLP and works part time on HuggingFace Transformers. He was recently senior Program Chair of ICLR and developed the MiniConf tool used to run ML/NLP virtual conferences. His work has received paper and demo awards at major NLP, visualization, and hardware conferences, an NSF Career Award, and several industrial faculty awards.

Dan Roth 

Dan Roth, University of Pennsylvania

Host: Hadi Amiri, UMASS, Lowell

Time: Dec 10, 2020 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/99389698582

Password: cstalks

Title: It's Time for Reasoning

Abstract
The fundamental issue underlying natural language understanding is that of semantics – there is a need to move toward understanding natural language at an appropriate level of abstraction in order to support natural language understanding and communication. Machine Learning has become ubiquitous in our attempt to induce semantic representations of natural language and support decisions that depend on it; however, while we have made significant progress over the last few years, it has focused on classification tasks for which we have large amounts of annotated data. Supporting high level decisions that depend on natural language understanding is still beyond our capabilities, partly since most of these tasks are very sparse and generating supervision signals for them does not scale. I will discuss some of the challenges underlying reasoning – making natural language understanding decisions that depend on multiple, interdependent, models, and exemplify it using the domain of Reasoning about Time, as it is expressed in natural language. If time suffices, I will touch upon other inference problems that challenge our ability to understand natural language, addressing issues in Information Pollution.

Bio
Dan Roth is the Eduardo D. Glandt Distinguished Professor at the Department of Computer and Information Science, University of Pennsylvania, and a Fellow of the AAAS, the ACM, AAAI, and the ACL. In 2017 Roth was awarded the John McCarthy Award, the highest award the AI community gives to mid-career AI researchers. Roth was recognized “for major conceptual and theoretical advances in the modeling of natural language understanding, machine learning, and reasoning.” Roth has published broadly in machine learning, natural language processing, knowledge representation and reasoning, and learning theory, and has developed advanced machine learning based tools for natural language applications that are being used widely. Until February 2017 Roth was the Editor-in-Chief of the Journal of Artificial Intelligence Research (JAIR). Roth has been involved in several startups; most recently he was a co-founder and chief scientist of NexLP, a startup that leverages the latest advances in Natural Language Processing (NLP), Cognitive Analytics, and Machine Learning in the legal and compliance domains. NexLP was sold to Reveal in 2020. Prof. Roth received his B.A. summa cum laude in Mathematics from the Technion, Israel, and his Ph.D. in Computer Science from Harvard University in 1995.

Marinka Zitnik 

Marinka Zitnik, Harvard University

Host: Hadi Amiri, UMASS, Lowell

Time: Dec 4, 2020 at 2:00–3:00pm ET

Location: https://uml.zoom.us/j/91465138533

Password: cstalks

Title: Graph Neural Networks for Biomedical Data

Abstract

The success of machine learning depends heavily on the choice of representations used for downstream tasks. Graph neural networks have emerged as a predominant choice for learning representations of networked data. Still, methods require abundant label information and focus either on nodes or entire graphs. In this talk, I describe our efforts to expand the scope and ease the applicability of graph representation learning. First, I outline SubGNN, the first subgraph neural network for learning disentangled subgraph representations. Second, I will describe G-Meta, a novel meta-learning approach for graphs. G-Meta uses subgraphs to generalize to completely new graphs and never-before-seen labels using only a handful of nodes or edges. G-Meta is theoretically justified and scales to orders of magnitude larger datasets than prior work. Finally, I will discuss applications in biology and medicine. The new methods have enabled the repurposing of drugs for new diseases, including COVID-19, where our predictions were experimentally verified in the wet laboratory. Further, the methods enabled discovering dozens of combinations of drugs safe for patients with considerably fewer unwanted side effects than today's treatments. The methods also allow for molecular phenotyping, much better than more complex algorithms. Lastly, I describe our efforts in learning actionable representations that allow users of our models to receive predictions that can be interpreted meaningfully.

Bio
Marinka Zitnik is an Assistant Professor at Harvard University with appointments in the Department of Biomedical Informatics, Blavatnik Institute, Broad Institute of MIT and Harvard, and Harvard Data Science. Dr. Zitnik is a computer scientist studying machine learning, focusing on challenges brought forward by data in science, medicine, and health. She has published extensively on representation learning, knowledge graphs, data fusion, graph ML (NeurIPS, JMLR, IEEE TPAMI, KDD, ICLR), and applications to biomedicine (Nature Methods, Nature Communications, PNAS). Her algorithms are used by major institutions, including Baylor College of Medicine, Karolinska Institute, Stanford Medical School, and Massachusetts General Hospital. Her work received several best paper, poster, and research awards from the International Society for Computational Biology. She has recently been named a Rising Star in Electrical Engineering and Computer Science (EECS) by MIT and also a Next Generation in Biomedicine by the Broad Institute, being the only young scientist who received such recognition in both EECS and Biomedicine.

Rongxing Lu 

Rongxing Lu, University of New Brunswick

Host: Xinwen Fu, UMASS, Lowell

Time: Nov 20, 2020 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/99726805928

Password: cstalks

Title: Privacy-Preserving Computation Offloading for Time-Series Activities Classification in eHealthcare

Abstract
The convergence of Internet of Things (IoT) and smart healthcare technologies has opened up various promising applications that can significantly improve the quality of healthcare services. Among those applications, predicting patients’ physical health based on their routine activities data collected from IoT devices is one of the most popular, where patients’ data are considered as time-series activities and patients’ physical health can be predicted by a classification model. Though many existing works have explored this application, they either impose the computational costs of the classification on the healthcare center (e.g., hospitals) or delegate the classification to the cloud without considering the privacy issues. However, since the healthcare center may not be powerful in computing and the cloud is not fully trusted, there is a high demand for offloading the computational cost of the healthcare center to the cloud while preserving the privacy of the classification result against the cloud. Aiming at this challenge, in this work, we present a novel privacy-preserving time-series activities classification algorithm using a hidden Markov model (HMM). Specifically, we first design a variant of the forward algorithm of HMM and further introduce a privacy-preserving variant of forward (PPVF) protocol for this variant. Then, based on the PPVF protocol, we propose our classification algorithm, which can offload the computational cost of the healthcare center to the cloud and preserve the privacy of the classification result. Finally, security analysis and performance evaluation show that our proposal is not only privacy-preserving but also efficient in terms of lower computational cost.

Bio
Rongxing Lu (S’99-M’11-SM’15) has been an associate professor at the Faculty of Computer Science (FCS), University of New Brunswick (UNB), Canada, since August 2016. Before that, he worked as an assistant professor at the School of Electrical and Electronic Engineering, Nanyang Technological University (NTU), Singapore from April 2013 to August 2016. Rongxing Lu worked as a Postdoctoral Fellow at the University of Waterloo from May 2012 to April 2013. He was awarded the most prestigious “Governor General’s Gold Medal” when he received his PhD degree from the Department of Electrical & Computer Engineering, University of Waterloo, Canada, in 2012, and won the 8th IEEE Communications Society (ComSoc) Asia Pacific (AP) Outstanding Young Researcher Award in 2013. He is presently a senior member of IEEE Communications Society. His research interests include applied cryptography, privacy enhancing technologies, and IoT-Big Data security and privacy. He has published extensively in his areas of expertise (with 20,700+ citations and an H-index of 71 from Google Scholar as of November 2020), and was the recipient of 9 best (student) paper awards from some reputable journals and conferences. Currently, Dr. Lu serves as the Vice-Chair (Conferences) of IEEE ComSoc CIS-TC (Communications and Information Security Technical Committee). Dr. Lu is the Winner of 2016-17 Excellence in Teaching Award, FCS, UNB.

Kenneth Mandl 

Kenneth Mandl, Harvard University

Host: Hadi Amiri, UMASS, Lowell

Time: Nov 13, 2020 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/91745228147

Password: cstalks

Title: Parsimonious Standards for Extraordinary Outcomes: a Universal, Regulated API for Healthcare

Bio
Mandl directs the Computational Health Informatics Program at Boston Children's Hospital and is the Donald A.B. Lindberg Professor of Pediatrics and Professor of Biomedical Informatics at Harvard Medical School. His work at the intersection of population and individual health has had a unique and sustained influence on the developing field of biomedical informatics. He was a pioneer of the first personally controlled health record systems, the first participatory surveillance system, and real time biosurveillance. Mandl co-developed SMART, a widely-adopted approach to enable a health app written once to access digital data and run anywhere in the healthcare system. The 21st Century Cures Act made SMART a universal property of the healthcare system, enabling innovators to rapidly reach market-scale and patients and doctors to access data and an “app store for health.” He applies open source inventions to lead EHR research networks and is a leader of the Genomics Research and Innovation Network. Mandl was advisor to two Directors of the CDC and chaired the Board of Scientific Counselors of the NIH's National Library of Medicine. He has been elected to multiple honor societies including the American Society for Clinical Investigation, Society for Pediatric Research, American College of Medical Informatics and American Pediatric Society. He received the Presidential Early Career Award for Scientists and Engineers and the Donald A.B. Lindberg Award for Innovation in Informatics.

Ted Pedersen 

Ted Pedersen, University of Minnesota in Duluth

Host: Hadi Amiri, UMASS, Lowell

Time: Nov 6, 2020 at 4:30–5:30pm ET

Location: https://uml.zoom.us/j/94850255401

Password: cstalks

Title: Automatically Identifying Islamophobia in Social Media

Abstract
Social media continues to grow in its scope, importance, and toxicity. Hate speech is ever-present in today’s social media, and causes or contributes to dangerous situations in the real world for those it targets. Anti-Muslim bias and hatred have escalated in both public life and social media in recent years. This talk will give an overview of a new and ongoing project in identifying Islamophobia in social media using techniques from Natural Language Processing. I will describe our methods of data collection and annotation, and discuss some of the challenges we have encountered thus far. In addition I’ll describe some of the pitfalls that exist for any effort attempting to identify hate speech (automatically or not).

Bio
Ted Pedersen is a Professor in the Department of Computer Science at the University of Minnesota, Duluth. His research interests are in Natural Language Processing and most recently are focused on computational humor and identifying hate speech. His research has previously been supported by the National Institutes of Health (NIH) and a National Science Foundation (NSF) CAREER award. More details are available at http://www.d.umn.edu/~tpederse.

Dina Demner 

Dina Demner, NIH, National Library of Medicine

Host: Hadi Amiri, UMASS, Lowell

Time: Oct 30, 2020 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/98198637523

Password: cstalks

Title: Looking for information and answers during a pandemic

Abstract
COVID-19 caused the first ever infodemic – an avalanche of scientific publications, as well as official and unofficial communications related to the disease caused by the novel coronavirus. Most of these publications intend to inform clinicians, researchers, policy makers, and patients about the health, socio-economic, and cultural consequences of the pandemic. Leveraging this stream of information is essential for developing policies, guidelines and strategies during the pandemic, for recovery after the COVID-19 pandemic, and for designing measures to prevent recurrence of similar threats. In collaboration with the National Institute of Standards and Technology (NIST), Ai2, and UTHealth and OHSU researchers, we have developed datasets for retrieval of COVID-19 information and automatic question answering. These datasets allowed us to (1) conduct community-wide evaluations of the information retrieval and question answering systems; (2) develop novel approaches to meeting information needs as they evolve during pandemics; and (3) automatically detect misinformation. I will discuss the resources and some of the lessons learned in the five rounds of the TREC-COVID evaluation, the ongoing Epidemic Question Answering Challenge (EPIC-QA), and our approaches to detecting misinformation about COVID-19 within the TREC 2020 Misinformation track evaluation.

Bio
Dr. Dina Demner-Fushman is an Investigator at the Lister Hill National Center for Biomedical Communications, NLM, NIH. Her group studies approaches to Information Extraction for Clinical Decision Support, Clinical Data Processing, and Image and Text Indexing for Clinical Decision Support and Education. The outgrowths of this research are the evidence-based decision support system in use at the NIH Clinical Center since 2009, an image retrieval engine, Open-i, launched in 2012, and an automatic question answering service CHiQA launched in 2018. Dina Demner-Fushman is a Fellow of the American College of Medical Informatics (ACMI), an Associate Editor of the Journal of the American Medical Informatics Association (JAMIA), and a founding member of the Association for Computational Linguistics Special Interest Group on biomedical natural language processing. As the secretary of this group, she has been an essential organizer of the yearly ACL BioNLP Workshop since 2007. Dr. Demner-Fushman has received sixteen staff recognition and special act NLM awards since 2002. She is a recipient of the 2012 NIH Award of Merit, a 2013 NLM Regents Award for Scholarship or Technical Achievement and a 2014 NIH Office of the Director Honor Award.

Jordan Boyd-Graber

Jordan Boyd-Graber, University of Maryland, College Park

Host: Hadi Amiri, UMASS, Lowell

Time: Oct 23, 2020 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/94630726928

Password: cstalks

Title: Artificial intelligence isn't a game show (but it should be)

Abstract
Artificial intelligence is viewed as a goal in science (let's build intelligent machines) and in education (let's train software engineers to build smart assistants). Despite the serious implications for the economy and society, the most widely accepted view of the end goal of Artificial Intelligence is a parlor game: a trivial “imitation game” (known today as the Turing Test). Likewise, many of the watersheds in the public understanding of AI progress have been in frivolous games like chess or Go. Sometimes, they're a literal game show like Jeopardy! After discussing why existing game show exhibitions have given an inaccurate impression of how well we're doing with question answering, I'll discuss how we can use the skills and strategies of high school trivia competitions to improve the science of AI, communicate the limitations of AI, and broaden participation in computer science and artificial intelligence.

Bio
Jordan Boyd-Graber is an associate professor in the University of Maryland's Computer Science Department, iSchool, UMIACS, and Language Science Center. Jordan's research focuses on applying machine learning and Bayesian probabilistic models to problems that help us better understand social interaction or the human cognitive process. He and his students have won “best of” awards at NIPS (2009, 2015), NAACL (2016), and CoNLL (2015), and Jordan won the British Computer Society's 2015 Karen Spärck Jones Award and a 2017 NSF CAREER award. His research has been funded by DARPA, IARPA, NSF, NCSES, ARL, NIH, and Lockheed Martin and has been featured by CNN, Huffington Post, New York Magazine, and the Wall Street Journal.

Antonio Torralba 

Antonio Torralba, Massachusetts Institute of Technology (MIT)

Host: Hadi Amiri, UMASS, Lowell

Time: Oct 16, 2020 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/93071258047

Password: cstalks

Title: Learning from vision, touch and audition

Abstract
Babies learn with very little supervision, and, even when supervision is present, it comes in the form of an unknown spoken language that also needs to be learned. How can kids make sense of the world? In this talk, I will discuss several ways in which one can discover meaningful representations without requiring manually annotated data. I will show that an agent that has access to multimodal data (like vision, audition, or touch) can use the correlation between images and sounds to discover objects in the world without supervision. I will show that ambient sounds can be used as a supervisory signal for learning to see and vice versa (the sound of crashing waves, the roar of fast-moving cars – sound conveys important information about the objects in our surroundings). I will describe an approach that learns, by watching videos without annotations, to locate image regions that produce sounds, and to separate the input sounds into a set of components that represents the sound from each pixel. I will also discuss our recent work on capturing tactile information. Finally, I will show how Generative Adversarial Networks (GANs) can learn meaningful internal representations without supervision.

Bio
Antonio Torralba is the Thomas and Gerd Perkins Professor and head of the AI+D faculty at the Department of Electrical Engineering and Computer Science (EECS) at the Massachusetts Institute of Technology (MIT). From 2017 to 2020, he was the MIT director of the MIT-IBM Watson AI Lab, and, from 2018 to 2020, the inaugural director of the MIT Quest for Intelligence, an MIT campus-wide initiative to discover the foundations of intelligence. He is also a member of CSAIL and the Center for Brains, Minds and Machines. He received the degree in telecommunications engineering from Telecom BCN, Spain, in 1994 and the Ph.D. degree in signal, image, and speech processing from the Institut National Polytechnique de Grenoble, France, in 2000. From 2000 to 2005, he completed postdoctoral training at the Brain and Cognitive Science Department and the Computer Science and Artificial Intelligence Laboratory, MIT, where he is now a professor. Prof. Torralba is an Associate Editor of the International Journal of Computer Vision, and served as program chair for the Computer Vision and Pattern Recognition conference in 2015. He received the 2008 National Science Foundation (NSF) CAREER award, the best student paper award at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) in 2009, and the 2010 J. K. Aggarwal Prize from the International Association for Pattern Recognition (IAPR). In 2017, he received the Frank Quick Faculty Research Innovation Fellowship and the Louis D. Smullin (’39) Award for Teaching Excellence. In 2020, he received the PAMI Mark Everingham Prize.

Eduard Hovy 

Eduard Hovy, Carnegie Mellon University (CMU)

Host: Hadi Amiri, UMASS, Lowell

Time: Oct 9, 2020 at 5:00–6:00pm ET

Location: https://uml.zoom.us/j/92887951351

Password: cstalks

Title: From Simple to Complex QA

Abstract
Recent automated QA systems achieve some strong results using a variety of techniques. How do complex/deep/neural QA approaches differ from simple/shallow ones? In early QA, pattern-learning and -matching techniques identified the appropriate factoid answer(s). In deep QA, neural architectures learn and apply more-flexible generalized word/type-sequence 'patterns'. However, many QA tasks require some sort of intermediate reasoning or other inference procedures that go beyond generalized patterns of words and phrases. One approach focuses on learning small access functions to locate the answer in structured resources like tables or databases. But much (or most) online information is not structured, and what to do in this case is unclear. Most current 'deep' QA research takes a one-size-fits-all approach based on the hope that a multi-layer neural architecture will somehow learn to encode inference steps automatically. The main problem facing this approach is the difficulty in determining exactly what reasoning is required, and what knowledge resources are needed in support. How should the QA community address this challenge? In this talk I outline the problem, define four levels of QA, and propose a general direction for future research.

Bio
Eduard Hovy is a research professor at the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University. He also holds adjunct professorships in CMU's Machine Learning Department and at USC (Los Angeles) and BUPT (Beijing). Dr. Hovy completed a Ph.D. in Computer Science (Artificial Intelligence) at Yale University in 1987, and was awarded honorary doctorates from the National Distance Education University (UNED) in Madrid in 2013 and the University of Antwerp in 2015. He is one of the initial 17 Fellows of the Association for Computational Linguistics (ACL) and is also a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI). Dr. Hovy’s research focuses on computational semantics of language, and addresses various areas in Natural Language Processing and Data Analytics, including in-depth machine reading of text, information extraction, automated text summarization, question answering, the semi-automated construction of large lexicons and ontologies, and machine translation. In late 2019 his Google h-index was 80, with over 30,000 citations. Dr. Hovy is the author or co-editor of six books and over 400 technical articles and is a popular invited speaker. From 2003 to 2015 he was co-Director of Research for the Department of Homeland Security’s Center of Excellence for Command, Control, and Interoperability Data Analytics, a distributed cooperation of 17 universities. In 2001 Dr. Hovy served as President of the Association for Computational Linguistics (ACL), in 2001–03 as President of the International Association for Machine Translation (IAMT), and in 2010–11 as President of the Digital Government Society (DGS). Dr. Hovy regularly co-teaches Ph.D.-level courses and has served on Advisory and Review Boards for both research institutes and funding organizations in Germany, Italy, the Netherlands, Ireland, Singapore, and the USA.

Wolfgang Gatterbauer 

Wolfgang Gatterbauer, Northeastern University

Host: Tingjian Ge, UMASS, Lowell

Time: Oct 5, 2020 at 2:00–3:00pm ET

Location: https://uml.zoom.us/j/95441518174

Password:

Title: Algebraic Amplification for Semi-Supervised Learning from Sparse Data

Abstract
Node classification is an important problem in graph data management. It is commonly solved by various label propagation methods that work iteratively starting from a few labeled seed nodes. For graphs with arbitrary compatibilities between classes, these methods crucially depend on knowing the compatibility matrix, which must be provided by either domain experts or heuristics. Can we instead directly estimate the correct compatibilities from a sparsely labeled graph in a principled and scalable way? We answer this question affirmatively and suggest a method called distant compatibility estimation that works even on extremely sparsely labeled graphs (e.g., 1 in 10,000 nodes is labeled) in a fraction of the time it later takes to label the remaining nodes. Our approach first creates multiple factorized graph representations (with size independent of the graph) and then performs estimation on these smaller graph sketches. We refer to algebraic amplification as the more general idea of leveraging algebraic properties of an algorithm's update equations to amplify sparse signals. We show that our estimator is orders of magnitude faster than an alternative approach and that the end-to-end classification accuracy is comparable to using gold standard compatibilities. This makes it a cheap preprocessing step for any existing label propagation method and removes the current dependence on heuristics.
VLDB 2015: Linearized and single-pass belief propagation
SIGMOD 2020: Factorized Graph Representations for Semi-Supervised Learning from Sparse Data
Code: https://github.com/northeastern-datalab/factorized-graphs/

Bio
Wolfgang Gatterbauer is an Associate Professor in the Khoury College of Computer Sciences at Northeastern University. Prior to joining Northeastern, he was a postdoctoral fellow in the database group at the University of Washington and an Assistant Professor in the Tepper School of Business at Carnegie Mellon University. One major focus of his research is to extend the capabilities of modern data management systems in generic ways and to allow them to support novel functionalities that seem hard at first. Examples of such functionalities are managing trust, provenance, explanations, and uncertain & inconsistent data. He is a recipient of the NSF CAREER award and “best-of-conference” mentions from VLDB 2015, SIGMOD 2017, and WALCOM 2017. In earlier times, he won a Bronze medal at the International Physics Olympiad, and worked in the steam turbine development department of ABB Alstom Power and in the German office of McKinsey & Company. https://db.khoury.northeastern.edu/

Brendan T. O'Connor 

Brendan T. O'Connor, UMASS, Amherst

Host: Hadi Amiri, UMASS, Lowell

Time: Oct 2, 2020 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/98102693544

Password: cstalks

Title: Social Factors in Natural Language Processing

Abstract
What can text analysis tell us about society? News, social media, and historical documents record events, beliefs, and culture. Natural language processing has the promise to quickly discover patterns and themes in large text collections. At the same time, findings from the social sciences can better inform the design of artificial intelligence. We conduct a case study of dialectal language in online conversational text by investigating African-American English (AAE) on Twitter, using a demographically supervised model to identify AAE-like language in geolocated public messages. We verify that this language follows well-known AAE linguistic phenomena – and furthermore, existing tools like language identification, part-of-speech tagging, and dependency parsing fail on this AAE-like language more often than on text associated with white speakers. We leverage our model to fix racial bias in some of these tools, and discuss future implications for fairness and artificial intelligence.

Bio
Brendan O'Connor (http://brenocon.com) is an associate professor in the College of Information and Computer Sciences at the University of Massachusetts Amherst. He works at the intersection of computational social science and natural language processing – studying how social factors influence language technologies, and how to better understand social trends with text analysis. For example, he has investigated racial bias in NLP technologies, political events reported in news, language in Twitter, and crowdsourcing foundations of NLP. He is a recipient of the NSF CAREER and Google Faculty Research awards, has received a best paper award, and his research has been cited thousands of times and featured in the media. At UMass Amherst, he is affiliated with the Computational Social Science Institute and Center for Data Science. He completed his PhD in 2014 at Carnegie Mellon University's Machine Learning Department, and he has previously been a Visiting Fellow at the Harvard Institute for Quantitative Social Science, and worked in the Facebook Data Science group and at the company Crowdflower; he started studying the intersection of AI and social science in Symbolic Systems (BS/MS) at Stanford University.

Contact