Selected Presentations

2017

Geometries of Word Embeddings,

Abstract: Real-valued word vectors have transformed NLP applications; popular examples are word2vec and GloVe, recognized for their ability to capture linguistic regularities via simple geometrical operations. In this talk, we demonstrate further striking geometrical properties of the word vectors. First we show that a very simple, and yet counter-intuitive, postprocessing technique, which makes the vectors "more isotropic", renders off-the-shelf vectors even stronger. Second, we show that a sentence containing a target word is well represented by a low rank subspace; subspaces associated with a particular sense of the target word tend to intersect over a line (one-dimensional subspace). We harness this Grassmannian geometry to disambiguate (in an unsupervised way) multiple senses of words, specifically so on the most promiscuously polysemous of all words: prepositions. A surprising finding is that rare senses, including idiomatic/sarcastic/metaphorical usages, are efficiently captured. Our algorithms are all unsupervised and rely on no linguistic resources; we validate them by presenting new state-of-the-art results on a variety of multilingual benchmark datasets.
Talks at: MIT (April, 2017), Berkeley, Stanford and Princeton (May 2017) and EPFL (June 2017).
References:
1. Geometry of Compositionality, AAAI '17, https://arxiv.org/abs/1611.09799
2. Geometry of Polysemy, ICLR, '17, https://arxiv.org/abs/1610.07569
3. Representing Sentences as Low-rank subspaces, ACL '17, https://arxiv.org/abs/1704.05358
4. Prepositions in Context, preprint, https://arxiv.org/abs/1702.01466

  
Anonymity in the Bitcoin Peer-to-Peer Network,

Abstract: Bitcoin enjoys a public perception of being a privacy-preserving financial system. In reality, Bitcoin has a number of privacy vulnerabilities, including the well-studied fact that transactions can be linked through the public blockchain. More recently, researchers have demonstrated deanonymization attacks that exploit a lower-layer weakness: the Bitcoin peer-to-peer (P2P) networking stack. In particular, the P2P network currently forwards content in a structured way that allows observers to deanonymize users by linking their transactions to the originating IP addresses. In this work, we first demonstrate that current protocols exhibit poor anonymity guarantees, both theoretically and in practice. Then, we consider a first-principles redesign of the P2P network, with the goal of providing strong, provable anonymity guarantees. We propose a simple networking policy called Dandelion, which achieves nearly-optimal anonymity guarantees at minimal cost to the network's utility. We present a practical implementation of Dandelion, whose source code is available at https://github.com/gfanti/bitcoin.
References:
1. Dandelion: Redesigning the Bitcoin Network for Anonymity, Sigmetrics '17, https://arxiv.org/abs/1701.04439
2. Anonymity Properties of the Bitcoin P2P Network, preprint, https://arxiv.org/abs/1703.08761

  
Tutorial: Information Limits on Finding and Hiding Message Sources on Networks: Social Media and Cryptocurrencies,

Abstract: Broadcasting is a common theme in networks, from the spread of diseases to the propagation of viral tweets. There is a long history of modeling and analyzing the spread of epidemics, where network and spreading models are typically derived from the real world. Problems of interest include analyzing the speed of contagion, computing a disease’s probability of extinction, and identifying the patient zero for an epidemic. Renewed interest has emerged in this topic with the advent of online networks like social media and cryptocurrencies, which ask users to broadcast content over a computer network. However, these new applications differ from epidemics in one key respect: they require the source of broadcasted messages to remain anonymous. Reasons for anonymity range from freedom of speech to financial privacy. With this added constraint comes a key advantage: network designers are often able to control the network structure and/or propagation models in computer networks. This has led to a complementary line of work that attempts to design networking stacks that inherently protect the anonymity of a message source.
In this tutorial, we present recent results on source identification and hiding in the context of applications that broadcast information over networks. We discuss models and results studied by the information theory, physics, and computer science communities. We then provide a gentle introduction to methods for source identification, including the theoretical tools needed to study these problems. We conclude by discussing recent methods for designing networks that inherently hide the source of a message. The tutorial will bring together tools from different areas like branching processes, measure concentration, and extreme value theory, all in the context of cutting-edge technologies like cryptocurrencies and anonymous social media.
Tutorial at IEEE International Symposium on Information Theory (ISIT) 2017.