COMPUTER SCIENCE DEPARTMENT

COLLOQUIUM, FALL 2020
Speaker                 Affiliation   Date            Time (ET)
Brendan O'Connor        UMASS         Oct 2, 2020     3:30–4:30pm
Wolfgang Gatterbauer    NEU           Oct 5, 2020     2:00–3:00pm
Eduard Hovy             CMU           Oct 9, 2020     5:00–6:00pm
Antonio Torralba        MIT           Oct 16, 2020    3:30–4:30pm
Jordan Boyd-Graber      UMD           Oct 23, 2020    3:30–4:30pm
Dina Demner             NIH/NLM       Oct 30, 2020    3:30–4:30pm
Ted Pedersen            UMN           Nov 6, 2020     4:30–5:30pm
Kenneth Mandl           Harvard       Nov 13, 2020    3:30–4:30pm
Rongxing Lu             UNB           Nov 20, 2020    3:30–4:30pm
Marinka Zitnik          Harvard       Dec 4, 2020     2:00–3:00pm
Dan Roth                UPenn         Dec 10, 2020    3:30–4:30pm
Alexander Rush          Cornell       Dec 18, 2020    3:30–4:30pm

Upcoming Talks


Marinka Zitnik 

Marinka Zitnik, Harvard University

Host: Hadi Amiri, UMASS, Lowell

Time: Dec 4, 2020 at 2:00–3:00pm ET

Location: https://uml.zoom.us/j/91465138533

Password: cstalks

Title: Graph Neural Networks for Biomedical Data

Abstract

The success of machine learning depends heavily on the choice of representations used for downstream tasks. Graph neural networks have emerged as a predominant choice for learning representations of networked data. Still, existing methods require abundant label information and focus either on nodes or on entire graphs. In this talk, I will describe our efforts to expand the scope and ease the applicability of graph representation learning. First, I will outline SubGNN, the first subgraph neural network for learning disentangled subgraph representations. Second, I will describe G-Meta, a novel meta-learning approach for graphs. G-Meta uses subgraphs to generalize to completely new graphs and never-before-seen labels using only a handful of nodes or edges. G-Meta is theoretically justified and scales to orders of magnitude larger datasets than prior work. Finally, I will discuss applications in biology and medicine. The new methods have enabled the repurposing of drugs for new diseases, including COVID-19, where our predictions were experimentally verified in the wet laboratory. Further, the methods enabled discovering dozens of combinations of drugs safe for patients with considerably fewer unwanted side effects than today's treatments. The methods also allow for molecular phenotyping, much better than more complex algorithms. Lastly, I will describe our efforts in learning actionable representations that allow users of our models to receive predictions that can be interpreted meaningfully.

Bio
Marinka Zitnik is an Assistant Professor at Harvard University with appointments in the Department of Biomedical Informatics, Blavatnik Institute, Broad Institute of MIT and Harvard, and Harvard Data Science. Dr. Zitnik is a computer scientist studying machine learning, focusing on challenges brought forward by data in science, medicine, and health. She has published extensively on representation learning, knowledge graphs, data fusion, graph ML (NeurIPS, JMLR, IEEE TPAMI, KDD, ICLR), and applications to biomedicine (Nature Methods, Nature Communications, PNAS). Her algorithms are used by major institutions, including Baylor College of Medicine, Karolinska Institute, Stanford Medical School, and Massachusetts General Hospital. Her work received several best paper, poster, and research awards from the International Society for Computational Biology. She has recently been named a Rising Star in Electrical Engineering and Computer Science (EECS) by MIT and also a Next Generation in Biomedicine by the Broad Institute, being the only young scientist who received such recognition in both EECS and Biomedicine.

Dan Roth 

Dan Roth, University of Pennsylvania

Host: Hadi Amiri, UMASS, Lowell

Time: Dec 10, 2020 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/99389698582

Password: cstalks

Title:

Abstract

Bio

Alexander Rush 

Alexander Rush, Cornell University

Host: Hadi Amiri, UMASS, Lowell

Time: Dec 18, 2020 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/93871406559

Password: cstalks

Title:

Abstract

Bio



Previous Talks


Rongxing Lu 

Rongxing Lu, University of New Brunswick

Host: Xinwen Fu, UMASS, Lowell

Time: Nov 20, 2020 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/99726805928

Password: cstalks

Title: Privacy-Preserving Computation Offloading for Time-Series Activities Classification in eHealthcare

Abstract
The convergence of Internet of Things (IoT) and smart healthcare technologies has opened up various promising applications that can significantly improve the quality of healthcare services. Among those applications, predicting patients’ physical health based on their routine activities data collected from IoT devices is one of the most popular, where patients’ data are treated as time-series activities and patients’ physical health can be predicted by a classification model. Though many existing works have explored this application, they either impose the computational costs of the classification on the healthcare center (e.g., hospitals) or delegate the classification to the cloud without considering the privacy issues. However, since the healthcare center may not be powerful in computing and the cloud is not fully trusted, there is a high demand for offloading the computational cost of the healthcare center to the cloud while preserving the privacy of the classification result against the cloud. To address this challenge, we present a novel privacy-preserving time-series activities classification algorithm using the hidden Markov model (HMM). Specifically, we first design a variant of the forward algorithm of HMM and then introduce a privacy-preserving variant of forward (PPVF) protocol for it. Then, based on the PPVF protocol, we propose our classification algorithm, which can offload the computational cost of the healthcare center to the cloud and preserve the privacy of the classification result. Finally, security analysis and performance evaluation show that our proposal is not only privacy-preserving but also efficient in terms of computational cost.

Bio
Rongxing Lu (S’99-M’11-SM’15) has been an associate professor at the Faculty of Computer Science (FCS), University of New Brunswick (UNB), Canada, since August 2016. Before that, he worked as an assistant professor at the School of Electrical and Electronic Engineering, Nanyang Technological University (NTU), Singapore, from April 2013 to August 2016. Rongxing Lu worked as a Postdoctoral Fellow at the University of Waterloo from May 2012 to April 2013. He was awarded the most prestigious “Governor General’s Gold Medal” when he received his PhD degree from the Department of Electrical & Computer Engineering, University of Waterloo, Canada, in 2012, and won the 8th IEEE Communications Society (ComSoc) Asia Pacific (AP) Outstanding Young Researcher Award in 2013. He is presently a senior member of IEEE Communications Society. His research interests include applied cryptography, privacy enhancing technologies, and IoT-Big Data security and privacy. He has published extensively in his areas of expertise (with 20,700+ citations and an H-index of 71 from Google Scholar as of November 2020), and was the recipient of 9 best (student) paper awards from some reputable journals and conferences. Currently, Dr. Lu serves as the Vice-Chair (Conferences) of IEEE ComSoc CIS-TC (Communications and Information Security Technical Committee). Dr. Lu is the Winner of the 2016-17 Excellence in Teaching Award, FCS, UNB.

Kenneth Mandl 

Kenneth Mandl, Harvard University

Host: Hadi Amiri, UMASS, Lowell

Time: Nov 13, 2020 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/91745228147

Password: cstalks

Title: Parsimonious Standards for Extraordinary Outcomes: a Universal, Regulated API for Healthcare

Bio
Mandl directs the Computational Health Informatics Program at Boston Children's Hospital and is the Donald A.B. Lindberg Professor of Pediatrics and Professor of Biomedical Informatics at Harvard Medical School. His work at the intersection of population and individual health has had a unique and sustained influence on the developing field of biomedical informatics. He was a pioneer of the first personally controlled health record systems, the first participatory surveillance system, and real time biosurveillance. Mandl co-developed SMART, a widely-adopted approach to enable a health app written once to access digital data and run anywhere in the healthcare system. The 21st Century Cures Act made SMART a universal property of the healthcare system, enabling innovators to rapidly reach market-scale and patients and doctors to access data and an “app store for health.” He applies open source inventions to lead EHR research networks and is a leader of the Genomics Research and Innovation Network. Mandl was advisor to two Directors of the CDC and chaired the Board of Scientific Counselors of the NIH's National Library of Medicine. He has been elected to multiple honor societies including the American Society for Clinical Investigation, Society for Pediatric Research, American College of Medical Informatics and American Pediatric Society. He received the Presidential Early Career Award for Scientists and Engineers and the Donald A.B. Lindberg Award for Innovation in Informatics.

Ted Pedersen 

Ted Pedersen, University of Minnesota in Duluth

Host: Hadi Amiri, UMASS, Lowell

Time: Nov 6, 2020 at 4:30–5:30pm ET

Location: https://uml.zoom.us/j/94850255401

Password: cstalks

Title: Automatically Identifying Islamophobia in Social Media

Abstract
Social media continues to grow in its scope, importance, and toxicity. Hate speech is ever-present in today’s social media, and causes or contributes to dangerous situations in the real world for those it targets. Anti-Muslim bias and hatred have escalated in both public life and social media in recent years. This talk will give an overview of a new and ongoing project in identifying Islamophobia in social media using techniques from Natural Language Processing. I will describe our methods of data collection and annotation, and discuss some of the challenges we have encountered thus far. In addition, I’ll describe some of the pitfalls that exist for any effort attempting to identify hate speech (automatically or not).

Bio
Ted Pedersen is a Professor in the Department of Computer Science at the University of Minnesota, Duluth. His research interests are in Natural Language Processing and most recently are focused on computational humor and identifying hate speech. His research has previously been supported by the National Institutes of Health (NIH) and a National Science Foundation (NSF) CAREER award. More details are available at http://www.d.umn.edu/~tpederse.

Dina Demner 

Dina Demner, NIH, National Library of Medicine

Host: Hadi Amiri, UMASS, Lowell

Time: Oct 30, 2020 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/98198637523

Password: cstalks

Title: Looking for information and answers during a pandemic

Abstract
COVID-19 caused the first ever infodemic – an avalanche of scientific publications, as well as official and unofficial communications related to the disease caused by the novel coronavirus. Most of these publications intend to inform clinicians, researchers, policy makers, and patients about the health, socio-economic, and cultural consequences of the pandemic. Leveraging this stream of information is essential for developing policies, guidelines and strategies during the pandemic, for recovery after the COVID-19 pandemic, and for designing measures to prevent recurrence of similar threats. In collaboration with the National Institute of Standards and Technology (NIST), Ai2, UTHealth, and OHSU researchers, we have developed datasets for retrieval of COVID-19 information and automatic question answering. These datasets allowed us to (1) conduct community-wide evaluations of the information retrieval and question answering systems; (2) develop novel approaches to meeting information needs as they evolve during pandemics; and (3) automatically detect misinformation. I will discuss the resources and some of the lessons learned in the five rounds of the TREC-COVID evaluation, the ongoing Epidemic Question Answering Challenge (EPIC-QA), and our approaches to detecting misinformation about COVID-19 within the TREC 2020 Misinformation track evaluation.

Bio
Dr. Dina Demner-Fushman is an Investigator at the Lister Hill National Center for Biomedical Communications, NLM, NIH. Her group studies approaches to Information Extraction for Clinical Decision Support, Clinical Data Processing, and Image and Text Indexing for Clinical Decision Support and Education. The outgrowths of this research are the evidence-based decision support system in use at the NIH Clinical Center since 2009, an image retrieval engine, Open-i, launched in 2012, and an automatic question answering service CHiQA launched in 2018. Dina Demner-Fushman is a Fellow of the American College of Medical Informatics (ACMI), an Associate Editor of the Journal of the American Medical Informatics Association (JAMIA), and a founding member of the Association for Computational Linguistics Special Interest Group on biomedical natural language processing. As the secretary of this group, she has been an essential organizer of the yearly ACL BioNLP Workshop since 2007. Dr. Demner-Fushman has received sixteen staff recognition and special act NLM awards since 2002. She is a recipient of the 2012 NIH Award of Merit, a 2013 NLM Regents Award for Scholarship or Technical Achievement and a 2014 NIH Office of the Director Honor Award.

Jordan Boyd-Graber

Jordan Boyd-Graber, University of Maryland, College Park

Host: Hadi Amiri, UMASS, Lowell

Time: Oct 23, 2020 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/94630726928

Password: cstalks

Title: Artificial intelligence isn't a game show (but it should be)

Abstract
Artificial intelligence is viewed as a goal in science (let's build intelligent machines) and in education (let's train software engineers to build smart assistants). Despite the serious implications for the economy and society, the most widely-accepted view of the end goal of Artificial Intelligence is a parlor game: a trivial “imitation game” (known today as the Turing Test). Likewise, many of the watersheds in the public understanding of AI progress have been in frivolous games like chess or go. Sometimes, they're a literal game show like Jeopardy! After discussing why existing game show exhibitions have given an inaccurate impression of how well we're doing with question answering, I'll discuss how we can use the skills and strategies of high school trivia competitions to improve the science of AI, communicate the limitations of AI, and to broaden participation in computer science and artificial intelligence.

Bio
Jordan Boyd-Graber is an associate professor in the University of Maryland's Computer Science Department, iSchool, UMIACS, and Language Science Center. Jordan's research focus is in applying machine learning and Bayesian probabilistic models to problems that help us better understand social interaction or the human cognitive process. He and his students have won “best of” awards at NIPS (2009, 2015), NAACL (2016), and CoNLL (2015), and Jordan won the British Computer Society's 2015 Karen Spärck Jones Award and a 2017 NSF CAREER award. His research has been funded by DARPA, IARPA, NSF, NCSES, ARL, NIH, and Lockheed Martin and has been featured by CNN, Huffington Post, New York Magazine, and the Wall Street Journal.

Antonio Torralba 

Antonio Torralba, Massachusetts Institute of Technology (MIT)

Host: Hadi Amiri, UMASS, Lowell

Time: Oct 16, 2020 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/93071258047

Password: cstalks

Title: Learning from vision, touch and audition

Abstract
Babies learn with very little supervision, and, even when supervision is present, it comes in the form of an unknown spoken language that also needs to be learned. How can kids make sense of the world? In this talk, I will talk about several ways in which one can discover meaningful representations without requiring manually annotated data. I will show that an agent that has access to multimodal data (like vision, audition or touch) can use the correlation between images and sounds to discover objects in the world without supervision. I will show that ambient sounds can be used as a supervisory signal for learning to see and vice versa (the sound of crashing waves, the roar of fast-moving cars – sound conveys important information about the objects in our surroundings). I will describe an approach that learns, by watching videos without annotations, to locate image regions that produce sounds, and to separate the input sounds into a set of components that represents the sound from each pixel. I will also discuss our recent work on capturing tactile information. I will also show how Generative Adversarial Networks (GANs) can learn meaningful internal representations without supervision.

Bio
Antonio Torralba is the Thomas and Gerd Perkins Professor and head of the AI+D faculty at the Department of Electrical Engineering and Computer Science (EECS) at the Massachusetts Institute of Technology (MIT). From 2017 to 2020, he was the MIT director of the MIT-IBM Watson AI Lab, and, from 2018 to 2020, the inaugural director of the MIT Quest for Intelligence, an MIT campus-wide initiative to discover the foundations of intelligence. He is also a member of CSAIL and the Center for Brains, Minds and Machines. He received the degree in telecommunications engineering from Telecom BCN, Spain, in 1994 and the Ph.D. degree in signal, image, and speech processing from the Institut National Polytechnique de Grenoble, France, in 2000. From 2000 to 2005, he completed postdoctoral training at the Brain and Cognitive Science Department and the Computer Science and Artificial Intelligence Laboratory, MIT, where he is now a professor. Prof. Torralba is an Associate Editor of the International Journal of Computer Vision, and served as program chair for the Computer Vision and Pattern Recognition conference in 2015. He received the 2008 National Science Foundation (NSF) CAREER award, the best student paper award at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) in 2009, and the 2010 J. K. Aggarwal Prize from the International Association for Pattern Recognition (IAPR). In 2017, he received the Frank Quick Faculty Research Innovation Fellowship and the Louis D. Smullin (’39) Award for Teaching Excellence, and in 2020 the PAMI Mark Everingham Prize.

Eduard Hovy 

Eduard Hovy, Carnegie Mellon University (CMU)

Host: Hadi Amiri, UMASS, Lowell

Time: Oct 9, 2020 at 5:00–6:00pm ET

Location: https://uml.zoom.us/j/92887951351

Password: cstalks

Title: From Simple to Complex QA

Abstract
Recent automated QA systems achieve some strong results using a variety of techniques. How do complex/deep/neural QA approaches differ from simple/shallow ones? In early QA, pattern-learning and -matching techniques identified the appropriate factoid answer(s). In deep QA, neural architectures learn and apply more-flexible generalized word/type-sequence 'patterns'. However, many QA tasks require some sort of intermediate reasoning or other inference procedures that go beyond generalized patterns of words and phrases. One approach focuses on learning small access functions to locate the answer in structured resources like tables or databases. But much (or most) online information is not structured, and what to do in this case is unclear. Most current 'deep' QA research takes a one-size-fits-all approach based on the hope that a multi-layer neural architecture will somehow learn to encode inference steps automatically. The main problem facing this approach is the difficulty in determining exactly what reasoning is required, and what knowledge resources are needed in support. How should the QA community address this challenge? In this talk I outline the problem, define four levels of QA, and propose a general direction for future research.

Bio
Eduard Hovy is a research professor at the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University. He also holds adjunct professorships in CMU's Machine Learning Department and at USC (Los Angeles) and BUPT (Beijing). Dr. Hovy completed a Ph.D. in Computer Science (Artificial Intelligence) at Yale University in 1987, and was awarded honorary doctorates from the National Distance Education University (UNED) in Madrid in 2013 and the University of Antwerp in 2015. He is one of the initial 17 Fellows of the Association for Computational Linguistics (ACL) and is also a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI). Dr. Hovy’s research focuses on computational semantics of language, and addresses various areas in Natural Language Processing and Data Analytics, including in-depth machine reading of text, information extraction, automated text summarization, question answering, the semi-automated construction of large lexicons and ontologies, and machine translation. In late 2019 his Google h-index was 80, with over 30,000 citations. Dr. Hovy is the author or co-editor of six books and over 400 technical articles and is a popular invited speaker. From 2003 to 2015 he was co-Director of Research for the Department of Homeland Security’s Center of Excellence for Command, Control, and Interoperability Data Analytics, a distributed cooperation of 17 universities. In 2001 Dr. Hovy served as President of the international Association for Computational Linguistics (ACL), in 2001–03 as President of the International Association of Machine Translation (IAMT), and in 2010–11 as President of the Digital Government Society (DGS). Dr. Hovy regularly co-teaches Ph.D.-level courses and has served on Advisory and Review Boards for both research institutes and funding organizations in Germany, Italy, Netherlands, Ireland, Singapore, and the USA.

Wolfgang Gatterbauer 

Wolfgang Gatterbauer, Northeastern University

Host: Tingjian Ge, UMASS, Lowell

Time: Oct 5, 2020 at 2:00–3:00pm ET

Location: https://uml.zoom.us/j/95441518174

Password:

Title: Algebraic Amplification for Semi-Supervised Learning from Sparse Data

Abstract
Node classification is an important problem in graph data management. It is commonly solved by various label propagation methods that work iteratively starting from a few labeled seed nodes. For graphs with arbitrary compatibilities between classes, these methods crucially depend on knowing the compatibility matrix that must be provided by either domain experts or heuristics. Can we instead directly estimate the correct compatibilities from a sparsely labeled graph in a principled and scalable way? We answer this question affirmatively and suggest a method called distant compatibility estimation that works even on extremely sparsely labeled graphs (e.g., 1 in 10,000 nodes is labeled) in a fraction of the time it later takes to label the remaining nodes. Our approach first creates multiple factorized graph representations (with size independent of the graph) and then performs estimation on these smaller graph sketches. We refer to algebraic amplification as the more general idea of leveraging algebraic properties of an algorithm's update equations to amplify sparse signals. We show that our estimator is by orders of magnitude faster than an alternative approach and that the end-to-end classification accuracy is comparable to using gold standard compatibilities. This makes it a cheap preprocessing step for any existing label propagation method and removes the current dependence on heuristics.
VLDB 2015: Linearized and single-pass belief propagation
SIGMOD 2020: Factorized Graph Representations for Semi-Supervised Learning from Sparse Data
CODE https://github.com/northeastern-datalab/factorized-graphs/

Bio
Wolfgang Gatterbauer is an Associate Professor in the Khoury College of Computer Sciences at Northeastern University. Prior to joining Northeastern, he was a postdoctoral fellow in the database group at the University of Washington and an Assistant Professor in the Tepper School of Business at Carnegie Mellon University. One major focus of his research is to extend the capabilities of modern data management systems in generic ways and to allow them to support novel functionalities that seem hard at first. Examples of such functionalities are managing trust, provenance, explanations, and uncertain & inconsistent data. He is a recipient of the NSF CAREER award and “best-of-conference” mentions from VLDB 2015, SIGMOD 2017, and WALCOM 2017. In earlier times, he won a Bronze medal at the International Physics Olympiad, worked in the steam turbine development department of ABB Alstom Power, and in the German office of McKinsey & Company. https://db.khoury.northeastern.edu/

Brendan T. O'Connor 

Brendan T. O'Connor, UMASS, Amherst

Host: Hadi Amiri, UMASS, Lowell

Time: Oct 2, 2020 at 3:30–4:30pm ET

Location: https://uml.zoom.us/j/98102693544

Password: cstalks

Title: Social Factors in Natural Language Processing

Abstract
What can text analysis tell us about society? News, social media, and historical documents record events, beliefs, and culture. Natural language processing has the promise to quickly discover patterns and themes in large text collections. At the same time, findings from the social sciences can better inform the design of artificial intelligence. We conduct a case study of dialectal language in online conversational text by investigating African-American English (AAE) on Twitter, using a demographically supervised model to identify AAE-like language in geolocated public messages. We verify that this language follows well-known AAE linguistic phenomena – and furthermore, existing tools like language identification, part-of-speech tagging, and dependency parsing fail on this AAE-like language more often than on text associated with white speakers. We leverage our model to fix racial bias in some of these tools, and discuss future implications for fairness and artificial intelligence.

Bio
Brendan O'Connor (http://brenocon.com) is an associate professor in the College of Information and Computer Sciences at the University of Massachusetts Amherst, who works at the intersection of computational social science and natural language processing – studying how social factors influence language technologies, and how to better understand social trends with text analysis. For example, he has investigated racial bias in NLP technologies, political events reported in news, language in Twitter, and crowdsourcing foundations of NLP. He is a recipient of the NSF CAREER and Google Faculty Research awards, has received a best paper award, and his research has been cited thousands of times and been featured in the media. At UMass Amherst, he is affiliated with the Computational Social Science Institute and Center for Data Science. He completed his PhD in 2014 in Carnegie Mellon University's Machine Learning Department, and he has previously been a Visiting Fellow at the Harvard Institute for Quantitative Social Science, and worked in the Facebook Data Science group and at the company Crowdflower; he started studying the intersection of AI and social science in Symbolic Systems (BS/MS) at Stanford University.
