Dan Muriello on security, culture, and technology at Facebook
"One type of attack that doesn't fall into either of those is something we call 'likejacking', or 'clickjacking' where you're sent to some random site, sexybartenders.com or something like that."
David: The date today is May 31st. We are at Facebook headquarters in Menlo Park, California and our guest today is Dan Muriello. Dan, how are you doing?
Dan Muriello: Doing pretty good, David. How are you doing?
David: Doing really well. So, Dan and I met when we were undergraduates at the University of Illinois, what were the exact circumstances, I think we were taking a class together, right?
Dan: Yeah, I believe we took CS 173 together my sophomore year.
David: Yeah exactly and so, the first question that we always ask people is like, how did you get into engineering as a kid, what was it about it that led you down this path initially?
Dan: I think, my mom was a big early adopter. We had a cable modem in our house since they were first available, and my mom likes to say that she bought her first Apple computer the day or the week that I was born.
David: The week you were born, OK.
Dan: Yes. So back in '86.
Dan: I'm dating myself.
David: So you had a...Yeah, so you were born in '86, you said. We've interviewed some people that're older, in their 30s, you know and they have all these varied stories about lasers and all kinds of other, kind of random things like that but, yeah, so, it was in your family, obviously, you had a Mac and a cable modem. I started with a 14.4 modem back in the day, a long time ago but how did you, like what about you in particular inspired you to get into computer science. I think that's what you studied in college, right?
Dan: I started actually as a computer engineer. I didn't really know what computer science or computer engineering was, when I got into it, to be quite frank. In high school, I worked for the IT department, so I did lots of troubleshooting stuff. I thought that that would launch me on my career in tech and realized eventually that IT is very different from computer engineering, which is very different from computer science.
Ravi: So at the time you joined college, you thought computer science was pretty much doing IT work?
Dan: Not really, but sort of. I was not really in the know. But at UIUC you get in the know really fast. You have to. It's definitely a sink or swim environment at UIUC. It's a very, very difficult program.
David: You did this from a really young age. You went to Oak Park River Forest I think? Or which high school?
David: They must have supported you pretty well. You said you started off in computer engineering. Did the chips repel you, or what happened?
Dan: Most of the reason why I ended up switching to computer science was some of the subtleties in the culture between the two programs at UIUC. The computer engineering department had been around for a longer period of time, they had professors who were not quite as new.
David: They're older on the whole, I would say.
Dan: They're older on the whole. That had something to do with it, but the main thing was that in the computer engineering program, it seemed like students were heavily incentivized not to help one another. The curves were very tough, and it was the, "If you have to ask, you'll never know, no you can't work in my study group," sort of feeling. At least that's what I had.
David: Did you attribute that to the maturity of the field, do you think?
Dan: You know, I don't really know. But what I can say is that my first computer science classes, the culture was totally different. Everybody wanted to help out everybody else. Things were very difficult and tricky and it just ended up feeling a lot more warm and supportive. The professors were much younger, those at the just new CS building. Everything was sort of new. And everything was sort of comforting.
Ravi: The technologies you learned were newer, as well, since the professors who were teaching you are younger, were more in the know with what's going on...
Dan: Yeah, the CS department at UIUC I think is pretty cutting edge.
David: Yeah, OK. So, did you do any internships then, when you were there?
Dan: No, actually I didn't. When I was an undergrad, this is also on the theme of me not really knowing what I'm doing, at least historically. Definitely historically. [laughter]
I was sort of against the whole idea of going into industry. I didn't really want to work for the man, I didn't want to work on a deadline, I didn't want to be part of that culture. When I was an undergrad, I was constantly thinking of going to grad school, getting a Ph.D., becoming a professor, because I really admired my professors when I was an undergrad.
So I didn't do any internships and I focused on summer research programs. I did a few of those with Professor Dan Roth, mostly in the fields of artificial intelligence and information extraction and machine learning.
Ravi: I see. And did you end up working for Dan Roth? I think you did, for Professor Roth.
Dan: Yes. The year after graduation, I worked for Dan Roth as a research programmer in his cognitive computation group.
The project I was working on was called "People Search" and it was this sort of information extraction, web-crawling sort of project where you take the university phone book and you take a little bit of information that you have about a specific person and you go try to crawl the web, try to classify what pages, classify information, bring it together, sort of automatically construct this corpus of knowledge about a person, sort of going from a little bit to a lot, a useful set of information about people who are at the university.
So for people who are grad students and professors, useful information includes not just what classes they teach or have taught and materials from those classes, but also papers they have written, talks they have given or are going to give, just trying to construct this dataset from nothing.
David: I see. Like I said, coming after classification, you use some machine-learning techniques in this, it sounds like. Right? So, did you do some kind of clustering, or how did it work?
Dan: That's a good question.
David: I imagine it's been a while.
Dan: It's been a while, and the project didn't really get to the point where we were really doing the nitty-gritty in getting it good. I did use this framework which was popular in that group, called Learning Based Java. What was I using at the time? You know what, I can't even remember.
David: I see. But the basic process of this was that, did you spider things?
Dan: Yeah. Yeah. I bootstrapped off of Google. So, given somebody's phone book entry, I generated a list of queries I wanted to do about that person.
Using transformations of their name (Dan, Danny, Daniel, stuff like that) and injecting certain subsets of things that I already knew about the person. Like the fact that they work at University of Illinois, the fact that they're in this department or that, that they use this email or phone number, even. So, crawl the Google results, which was a challenge in and of itself, because they don't really want you to do that. [laughs]
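As a rough illustration of the query-generation step Dan describes, here is a minimal Python sketch. The nickname table, the example facts, and the `build_queries` helper are all hypothetical and invented for this note; the interview does not show how the original project did it.

```python
from itertools import product

# Hypothetical nickname table; a real system would need a much larger one.
NAME_VARIANTS = {
    "daniel": ["Daniel", "Dan", "Danny"],
}


def build_queries(first_name, last_name, known_facts):
    """Combine name variants with facts already known about the person."""
    variants = NAME_VARIANTS.get(first_name.lower(), [first_name])
    queries = []
    for variant, fact in product(variants, known_facts):
        # Quote the name so a search engine treats it as a phrase.
        queries.append(f'"{variant} {last_name}" {fact}')
    return queries


queries = build_queries(
    "Daniel",
    "Example",
    ["University of Illinois", "Computer Science"],
)
```

Each name variant is paired with each known fact, so three variants and two facts yield six candidate queries.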
David: I see. So, like using wget to spider Google's results, essentially?
Dan: Yes. Yes. And then from there, packaging up those Web...getting those documents down, and then putting them together and extracting information from them in an interesting way.
But really, to be honest, the project sort of stopped at the point of, "now I have these documents, and I figured out which ones I care about". Actually getting information out of the documents was still a TBD.
David: It's a lot harder. So, you could find the pages, but in terms of understanding the semantics of the pages, that was really hard, you're saying?
Dan: Yeah. That was definitely going to be very hard. But that was the direction that the project would have gone. Interesting, this is kind of ancient history for me.
David: Yeah. You probably haven't talked about it in a while. Is it still in use, do you know?
Dan: Oh, no. It never launched.
David: OK, it never launched. So what was the merit of it, in terms of the scientific or research value?
Dan: The idea of demonstrating that you can programmatically, using the Internet and existing search technologies, go from a little bit of information about a person to a lot of information about a person, was very valuable.
David: In a completely automated way, you're saying?
Dan: In a completely automated way.
David: I see. And when was this done, this was 2008?
Dan: This was 2008.
David: OK. So, I'm just trying to put this in the perspective of what else had been out then. Facebook had launched then, so there was some, it wasn't as popular as it is now, or as ubiquitous.
Ravi: What about Wolfram, did they do something like this?
David: Wolfram Alpha, I think? That was much later, wasn't it? I think that was 2010, maybe. Something like that.
David: So, you worked for a while for Professor Dan Roth, you said. What's his field?
Dan: Professor Roth, his field is artificial intelligence and machine learning. And specifically, natural-language processing.
David: OK. So did you do any natural-language processing in this project?
Dan: That was where it would have ended up but no.
David: So you didn't actually do any natural-language processing. That's unfortunate. OK.
Dan: Yeah. Yeah. I mean, I used out-of-the-box things like named entity recognizers. But I wasn't pushing the cutting edge of that sort of stuff. My project was more of an applied, more of a, what can you do that's useful and new with existing technologies.
David: I see, like an experience report, maybe. So, you're at Facebook now, but you joined Facebook directly out of that research group. So, what was the process there? How did they find you?
Dan: So, I found them.
So after a year working in that group on this project, I decided to see what opportunities were open. So I applied to a few different grad schools, and I applied to a few different companies.
David: Ph.D. programs, I imagine?
Dan: Yeah, Ph.D. programs. I was definitely cold-calling companies, more or less, just submitting my resume to /careers. Facebook, Google, Apple, Microsoft, Amazon, Yahoo.
Ravi: The usual suspects.
Dan: The usual suspects.
David: A lot of these are west coast companies, so did you think you were going to move at the time?
Dan: Yeah. I was up for moving. I had been in Champaign-Urbana for five years, and I would have been happy staying there. I still have lots of really wonderful memories about the area, but it kind of felt like time to move on to the next big thing.
David: Try something new. You talked about joining Facebook. This was 2009 when you were coming on?
Dan: August 17, 2009 was my first day.
David: Was your first day. What was the interview process like for that whole thing?
Dan: The interview process ended up being three phone interviews, I should back up a second. A couple of emails back and forth with the recruiter. A phone call with the recruiter to make sure that I wasn't a serial killer. [laughter]
A couple of technical phone interviews, another technical phone interview and then they flew me out to 1601 California in Palo Alto which they had just moved into at the time.
David: This was the first Palo Alto office, or that was the second one?
Dan: Well, first is a nebulous term, because they started in one office in downtown and then they ended up having many offices in downtown Palo Alto, In different buildings, and then they consolidated into 1601 California Avenue, and then from there, it sort of expanded into several neighboring buildings. And then now we're over here in Menlo Park.
David: Was 1601 the big warehouse that you were talking about?
Dan: Yes. 1601 South California Avenue is the building that used to be an HP manufacturing building.
David: Palantir used to be in that building as well, right?
Dan: I don't think so. Palantir has been in the place that they are on at Forest and Loma for a while now.
David: Maybe I'm mistaken about that.
Dan: I don't know who was in 1601 California before Facebook. It's just this big warehouse building that was just full of computer programmers.
David: I see. You said three phone interviews and then you had a flyout?
Dan: Yes. Then I had a flyout.
Ravi: Did Facebook not come to the Illinois campus at that time?
Dan: They had. I don't think they had that particular year. And now they do all the time. As I said earlier, I was not going to career fairs.
Ravi: Oh, I see.
Dan: I wasn't really interested in playing the game, going to internships and things like that.
Ravi: You were going to work on your Ph.D.
Dan: I just wanted to be an academic. Part of the whole reason I steered away from academia was through the experience of this year of doing programming for a research project, I found that what was more satisfying for me was building something that was useful, and building something that you could touch.
David: Not just coming with an idea but actually shipping something.
Dan: Yes, actually shipping something.
Ravi: Having customers use it.
Dan: Putting something together. Having people use it, having it be real. Spending weeks reading papers to write a few lines of code to try to push the cutting edge a millimeter further, I think is satisfying for a lot of people, but I found that, doing these things at the same time, that actually building was more satisfying for me.
David: And that's interesting. We've talked to a number of people that work in academia or have spent time in graduate school. It certainly is a spectrum with one end being pure implementation, and the other being pure research. People do move back and forth during their careers, though, maybe that is something you can do later.
Dan: Yes, you know. It's kind of tough to think about what I will be doing in 10 years.
David: I see. You are still pretty young. So am I. When you got to the company you said that it was a "giant warehouse full of computer programmers". Tell me a little about that. That sounds really exciting. People at tables running around, what happened?
Dan: It was just really huge open spaces.
David: No offices, right?
Dan: There were a few offices for people who really didn't feel like they were productive in an open air environment.
Ravi: Conference rooms.
Dan: Conference rooms to meet and stuff and like that, but just, like, really big open spaces, you could throw a football and not hit a wall. That contributed to a feeling of togetherness, a feeling of camaraderie or something like that.
We still try to have as open of spaces as possible here in Menlo Park, but simply because we're moving into buildings that were built before us, I mean we knocked down tons of walls, but it's not quite the same as it used to be. We have as much of that as we can muster.
David: I see. I remember in the old space there were flags hanging from the ceiling I think, for all the different countries?
Dan: Yeah, that was for, the growth team had that, and they still have that actually, just a floor above us.
Ravi: I noticed that through the window actually when I was driving by.
Dan: Yeah, the growth team, I mean they're excellent about keeping a global mindset.
David: I see. As regards languages, font displays I imagine, and other things too, right?
Dan: I mean, these days, the big story in growth is mobile.
David: Obviously, right.
Dan: Yeah. Yeah, and especially internationally, because in a lot of countries in the world, people have much easier access to a mobile phone, not an iPhone but more of a feature phone, than they do to a desktop computer.
David: So you said big open warehouse space. How many people were at the company when you joined? I think probably about 800 maybe?
Dan: Maybe a little bit more, maybe around 1,200. There were 300 engineers, was really the figure that I remember from when I joined. I was 305, something like that.
David: Something like that. OK. Sure. Did they have the boot camp at the time when you joined?
Dan: Oh yeah, yeah. Definitely. Definitely went through boot camp with Boz as my boot camp manager at the time.
David: Who's Boz?
Dan: Oh Boz. He is now a director of engineering. He was a very early Facebook engineer who worked on News Feed. He was sort of, one of the guys that did News Feed right when it was first done.
David: I see. So what did you do in boot camp? Like was it hard...
Ravi: Tell us a bit about it because we've heard it's incredibly hard and sometimes people drop out of it.
Dan: Yes. Yes. That is very true. Not very many, but you can only tell so much from a technical interview. Reversing linked lists is great and all, but that's not where the rubber hits the road. The rubber hits the road at boot camp where you get dropped into this really gigantic code base -- unless you're coming from some other place with a gigantic code base which I was not. You just have to pick up skills like, finding the thing you're looking for.
Ravi: Now, do all engineers go through it? Like whether they're experienced hires, or coming straight out of college?
Dan: Absolutely. Absolutely. We've recently started putting people who are not going to be engineers through boot camp. If they're going to be working with engineers a lot or if they're going to be doing a lot of log analysis, it ends up being very useful to see the code that's writing that log line, to know exactly what it means.
David: I see. So it sounds like a bit of a probationary period.
Dan: I wouldn't put it that way. Boot camp is a way to get ramped up really fast. The only way to like, really start working at Facebook is to hit the ground running. I think for some people it doesn't work out because there's something about the pace, there's something about the rapidity, the rapidness of the experience.
Ravi: Do you guys actually work with production code at boot camp or is it more...
Ravi: Oh OK. So it's not like you're learning C++, or PHP or something. You're actually saying, all right here are the tools you have. You have Google with you, build us the feature for this particular...
Dan: When you're in boot camp, you are sitting in an area of people who are in boot camp.
David: How many people, 10?
Dan: These days we've got boot camp classes of 40 or 50, which is intense.
David: It's six weeks, right?
Dan: Yeah, it's six weeks. I mean, I learned PHP while I was in boot camp. Now, I write in PHP all the time, like, I had not written a line of it. I just learned off of php.net tutorials and the guy sitting next to me.
David: [laughs] That's fun. The other thing I wanted to ask about was when you join the company a lot of the culture that you always hear about from the outside, sounds like it was already in place. In particular I remember Mike Schroepfer, Schroep I guess you call him, coming to Seattle in 2009 I think for the engineering roadshow. He talked about a lot of this stuff and I really admired it. In particular the whole "done is better than perfect" thing, Ravi, you saw that on the way in on an iPad holder. It's on the walls around here.
Ravi: It's the mantra here. I think everyone is...
Dan: Yeah. That, I think, comes out more out of just philosophies around how to write software. How we need to write our software in order to make it good.
David: It's important. I guess on that topic of scale...so infrastructure. Let's talk about infrastructure because we're at a company that is probably second or first biggest in the world in terms of the deployed infrastructure that it has. Just for the listeners to put this in perspective.
There are a lot of very hard problems here. Let's say you have hundreds of millions of users. Maybe 500 million, maybe 800 million, however many hundreds of millions of users you have. So how do you figure out, given two people, who their mutual friends are? Because that's something you do in advance, right?
Dan: No, you just fetch the two friends' lists and intersect them.
David: OK, so is this done in real time then?
Ravi: The question is doing it fast, right?
Dan: The question is doing it fast. I believe we might have a couple of fun optimizations that I don't know. But it would be fast enough to just fetch a user's friend list in real time and intersect them.
David: So let's say there are 5,000 people on this list. So you're saying just to do the set intersection, just to sort them and then just merge them? How does that work?
Dan: Well, the thing is, if you use them as keys in a PHP array, you end up with two hash sets. So you don't end up really needing to sort them explicitly. You're not comparing two arrays, you're comparing...
David: Two associative arrays.
Dan: ...two associative arrays.
David: I see.
Dan: And one of the key reasons it ends up not being very much data to throw around is that we do a lot of lazy loading of things. Like when I say "get your friend list" in our PHP code, we're not getting the names of all of your friends. We're not getting the birthdays of all of your friends, we're not getting their genders, we're not getting their profile pic at all.
Ravi: We're just getting some sort of IDs.
Dan: We're just getting their user IDs. That's all we're getting.
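The approach Dan outlines, fetching only user IDs and intersecting them through hash lookups, can be sketched as follows. He describes PHP arrays keyed by user ID; this sketch uses Python sets as an analogous structure, and the IDs and the `mutual_friends` name are made up for illustration.

```python
def mutual_friends(friend_ids_a, friend_ids_b):
    """Return the user IDs that appear in both friend lists."""
    # Building a set from each list gives hash-based membership tests,
    # so no explicit sort is needed before intersecting.
    return set(friend_ids_a) & set(friend_ids_b)


alice_friends = [101, 102, 103, 104]
bob_friends = [103, 104, 105]
common = mutual_friends(alice_friends, bob_friends)
```

Because only integer IDs are involved, even two lists of 5,000 friends amount to a small amount of data, and names or photos would be loaded later only for the IDs in the result.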
David: I see. So you're doing exactly as much work as necessary. But even still, how long does this take to load a friend list, like 50 milliseconds or 100 or 120?
Dan: Oh much, much, much, much less than that. That's actually probably a good question for somebody else other than me.
David: So the other thing was data integrity. You work in site integrity, so can you tell us a little bit about that, like, what is it?
Dan: Sure. So site integrity is a team whose responsibility is maintaining and enhancing the trust between a user of Facebook and Facebook as a system. That's all about fighting spam, that's all about keeping users in control of their accounts and keeping control of the user's account out of the hands of a hacker.
And so we detect and respond to spam attacks. We also detect and respond to phishing attacks, malware attacks. And we also maintain and push forward the user-facing security features, like login notifications, where you get a notification whenever a new device logs into your account, login approvals which is our two-factor authentication system, as well as the login system in general, and the password reset system which is a very, very critical piece of code.
David: Let's talk about one of those. The password reset system, what's complicated about this?
Dan: The password reset system ends up being a handshake of several different things. It's sort of a series of handshakes, and at each step we're trying to protect different things. The very first part of the password reset process is identifying what account you want to perform the password reset on, and so we provide a few pieces of search functionality. You can search by email, by phone, by vanity URL (mine is dan.muriello, for example), or with your name and a friend's name, if you can't remember the email you had on your account, you didn't have a phone on your account, you never set up a vanity URL, and you don't have a user name.
And so at this point, what we're trying to protect is the mapping between an email address and the fact that there is a Facebook account for that email address; this is something we do not want to leak en masse.
David: Just to recap it's the fact that there's a Facebook account for that email address, so even that piece of information is considered confidential, is what you're saying.
Dan: Not confidential exactly, but something that we don't want other people to scrape.
David: Oh, I see.
Dan: I mean it's something that we obviously are revealing to the user who's doing the password reset.
Ravi: That, hey, there is, this email account is valid, for Facebook.
Dan: Right, so we need to let people use that search functionality, but we need to have brute force protections in place, and so we use a sort of step-up system where we start throwing captchas at you and then worse, like actual blocks...
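A step-up scheme of the kind Dan mentions could look roughly like the Python sketch below. The thresholds, the `check_lookup` name, and the choice to key on a client IP address are assumptions made for illustration; the interview gives no real numbers or implementation details.

```python
from collections import defaultdict

# Hypothetical thresholds; the interview does not give real numbers.
CAPTCHA_AFTER = 3
BLOCK_AFTER = 10

attempts = defaultdict(int)


def check_lookup(client_key):
    """Count one account lookup and return the response level for it."""
    attempts[client_key] += 1
    count = attempts[client_key]
    if count > BLOCK_AFTER:
        return "block"
    if count > CAPTCHA_AFTER:
        return "captcha"
    return "allow"


# Twelve lookups from the same (documentation-range) address.
results = [check_lookup("198.51.100.7") for _ in range(12)]
```

The first few lookups pass untouched, later ones require a captcha, and persistent lookups are blocked outright, which makes scraping the email-to-account mapping at scale expensive while leaving ordinary users mostly unaffected.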
Ravi: I remember going into one of Facebook's captchas actually when I had forgotten my password and one of those is identify your friends.
Dan: That's actually not something we use for the password reset system. That is part of a couple of our other flows, mainly our anti-phishing flows. So if you're coming in from a totally different place that you've never been to before, and you're using a device that you've never used before...
Ravi: Or maybe from a different, unrelated IP address from another country.
Dan: Or maybe your ISP shifted around their IP blocks last night, I mean, and you're on a new computer because we sort of keep track of a bit of device history. So if you take your laptop on vacation, we'll hold back. But basically one of the ways we let you prove that you are who you say you are in that flow, is passing something that we call "friend photo captcha" where we show you photos of your friends and we put a little box around somebody's face and we say, "Who is that"? And we give you six choices, of the same gender, obviously, then you go and say "Oh, that's Dave and that's Ravi and that's Dan".
David: Right, so I guess more generally, what types of threats do you see against the site? Obviously, having a third party take over an account is an obvious one, but what are some other ones?
Dan: Fake accounts are a big problem.
David: So, somebody wants to create multiple identities on Facebook?
Dan: I mean people who want to spam. It depends on exactly what they're trying to do. It might be slightly easier for them to make a new account and send friend request to people as that account.
David: I see. So you protect against that by how, captchas?
Dan: So we have a complex and in-depth classification service which happens at the point of registration, and reruns at the point of friend request time.
And this looks at a lot of different signals and tries to determine whether or not the account is fake. And we always use a mixture of handwritten rules by looking, like actual people and engineers and user operations analysts, looking at tons of fake accounts and divining patterns, and writing rules to detect those patterns and automated machine learning classifiers.
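One simple way to combine handwritten rules with a learned classifier is sketched below in Python. The signal names, the two rules, the 0.9 threshold, and the `toy_score` stand-in for a trained model are all hypothetical; the interview does not describe the actual signals or how the two kinds of detection are combined.

```python
def looks_fake(signals, rules, classifier_score, threshold=0.9):
    """Flag an account if any handwritten rule fires or the model score is high."""
    if any(rule(signals) for rule in rules):
        return True
    return classifier_score(signals) >= threshold


# Hypothetical rules and signals, for illustration only.
rules = [
    lambda s: s["friend_requests_last_hour"] > 100,
    lambda s: s["account_age_minutes"] < 5 and s["messages_sent"] > 20,
]


def toy_score(signals):
    # Stand-in for a trained model: returns a made-up probability.
    return 0.95 if signals["rejected_request_ratio"] > 0.8 else 0.1


normal = {
    "friend_requests_last_hour": 2,
    "account_age_minutes": 50000,
    "messages_sent": 3,
    "rejected_request_ratio": 0.1,
}
burst = dict(normal, friend_requests_last_hour=500)
rejected = dict(normal, rejected_request_ratio=0.9)
```

Here `burst` is caught by a rule and `rejected` by the model score, while `normal` passes both checks.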
David: I see. What else? So fake accounts, taking control of accounts, what about friend spam?
Dan: Friend spam. That was probably one of the biggest problems on the site a few years ago.
David: Which is amazing because it's almost invisible today; I don't get any friend spam at all on Facebook.
Dan: You're welcome. [laughter]
David: Thank you, Dan.
Ravi: On the contrary, I've gotten friend spam on other social networks.
Dan: Yeah, friend spam is one of the biggest things that site integrity does.
David: To be clear, this is when someone adds you as a friend, that you don't know?
Dan: Yes. A few years ago, we simply did not block any friend request other than by rate-limiting, and the rate was not very small.
David: So somebody did more than a certain number...
Dan: So we didn't look at any statistics about who are you, who is this user you're sending a friend request to, do we think that there's some sort of link or have you been sending a lot of friend requests that have been rejected, things like this. So we keep statistics about how often a given user's friend requests are hidden by the "not now" button, how often they're sort of marked as spam by the user clicking "no" to the follow-up question of "Do you know this person outside of Facebook?"
So now we keep statistics, and now we have a very nuanced system of user education. The very first thing that'll start happening if you're sending bad friend requests, "bad" meaning to people that you don't know, we'll start throwing up a dialog right after you click the add friend button saying, "Just a reminder Facebook is a place to connect with the people you know in real life. Do you really want to send this friend request?" And you can say yes.
David: Right, because people might not realize that.
Dan: Yeah. I think a lot of people might not realize that. That's a big part of this user education that we do is just reminding people of that, reinforcing that. If you click through several of these dialogues, warnings we call them, and those friend requests are still reacted to negatively, then we'll temporarily kick you out of your account and then when you log back in, we'll take you through something called checkpoint flow.
We'll remind you you've been sending bad friend requests, a little bit more content, and sort of a reiteration. As you become a repeat offender, we lock you out of sending friend requests for gradually telescoping periods of time. At first, you would be locked out for a day and maybe next one would be three or four days, then seven days, then 30 days.
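The telescoping lockout periods can be sketched as a simple lookup in Python. The schedule loosely follows the figures Dan gives (one day, three or four days, seven days, 30 days); capping at 30 days for further offenses and the `lockout_days` name are assumptions made for this illustration.

```python
# Lockout lengths in days, loosely following the figures Dan mentions.
LOCKOUT_DAYS = [1, 3, 7, 30]


def lockout_days(prior_offenses):
    """Return the lockout length for a user with this many prior offenses."""
    # Assumption: after the last step, the lockout stays at the maximum.
    index = min(prior_offenses, len(LOCKOUT_DAYS) - 1)
    return LOCKOUT_DAYS[index]


schedule = [lockout_days(n) for n in range(6)]
```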
David: There are other threats, too. What about spidering the site?
Dan: Scraping, yes.
David: I'm some guy and I want to collect a bunch of marketing data so maybe I log into the site or obtain a credential or token somehow, and I want to download all of my friends' information. What about that?
Dan: We have scraping protection, now, down at the point of data access; logged in scraping protection, as well as logged out scraping detection. So you can't do that. [laughter] We keep counters in cache that increase every time you look at somebody's profile, increase every time you perform a search.
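Counters of the kind Dan describes might be sketched like this in Python. The in-memory dictionary stands in for a shared cache, and the limit, the window length, and the `ViewCounter` class are hypothetical; the interview does not say how the real counters expire or what limits apply.

```python
import time


class ViewCounter:
    """Per-user counter that resets after a fixed window (stand-in for a cache)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self._counts = {}  # user_id -> (window_start, count)

    def record(self, user_id, now=None):
        """Count one profile view or search; return True if still under the limit."""
        now = time.time() if now is None else now
        start, count = self._counts.get(user_id, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # window expired, start a fresh one
        count += 1
        self._counts[user_id] = (start, count)
        return count <= self.limit


counter = ViewCounter(limit=3, window_seconds=60)
# Timestamps are passed explicitly so the example is deterministic.
allowed = [counter.record("user-1", now=t) for t in (0, 1, 2, 3, 61)]
```

The fourth view inside the window exceeds the limit, and the view at second 61 is allowed again because a new window has started.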
David: Message spam, obviously, is another thing, instant message spam or messages spam.
Dan: Chat spam?
Dan: That's definitely one of the places where we've got people fighting attacks over that channel very actively.
David: There are lots of things you could do maliciously over that. Try to get money from your friends, or try to solicit commercial interests, all kinds of things.
Dan: Also there's what we call a "419 attack" which is another word for advance fee fraud.
David: The Nigerian 419 scam.
Dan: The Nigerian 419 scam. In this case, someone phishes your login credentials.
David: I was just telling my mother about this the other day, actually, the existence of this scam. It wasn't obvious to her immediately that any of this stuff would be possible let alone done, in scale.
Dan: Yeah. I read an article not too long ago. We should give the listeners a little bit of background about exactly what this is. Maybe, perhaps, we should ground this in the specific context of Facebook. Imagine one of your friends sends you a chat message saying I was on vacation in London, I lost my wallet, I got mugged, thank goodness I still have my passport, I just need money for a plane ticket home.
Ravi: This is your friend telling you?
Dan: This is your friend telling you. Imagine someone has stolen your friend's login credentials and logged in as your friend and now is chatting with you. They say things like, "I'm at an Internet cafe in Devonshire or something like this, and I really just need you to wire me some money" -- $400, $500, sometimes thousands of dollars. This is one of your friends talking to you so there is a lot of trust there.
This actually does happen at a bit of scale, enough, the thing is it's not massive because there actually is another person on the other end of the line chatting with you; it's not a program. Yet, the damage to a user who falls for this is very high.
David: There's a lot of damage to the brand and the company if something happens, right?
Dan: Extreme amounts of damage. If you lose a couple grand because of this, that's terrible! Really, really bad. This is an attack that we fight through a lot of different ways. Like I said before, typically when we fight attacks like this it's a combination of rules and machine learning classifiers.
David: I see. Is it code that's on the other side of these attacks a lot of times, or is it mostly manual efforts? What do you find?
Dan: It's manual efforts. There's actually some person in an Internet cafe in a country on the other side of the world that's actually chatting with the victims.
David: I see. For this particular type of scam. What about the other ones, though? It's code sometimes. Spidering would be code, right?
Dan: Yeah. If you think about scraping, that's for sure done with a program. High volume spam attacks are definitely done with a program. One type of attack that doesn't fall into either of those is something we call "likejacking" or "clickjacking" where you're sent to some random site, sexybartenders.com or something like that. You have to click on something in order to see something.
If the author of that site had hidden a Facebook "Like" button, like one of our iframes for our social plug-ins underneath the thing that you have to click on...
David: When you say underneath you mean in the CSS so that it's hidden?
Dan: It's underneath. Like, literally, you can't see it in the web page, yet it is below what you're clicking on, so your click event travels through the thing you think you're clicking on into the Facebook iframe.
David: The goal here would be to get a lot of likes?
Dan: Yes. Then once it happens, it gets posted to your profile and to your friends' News Feeds and it just spreads virally from there.
David: The incentives here, the person who's doing this type of fraud wants to get a lot of likes. Presumably, they want to spread their website or something?
Dan: Yeah, spread their website. Spread whatever they're trying to sell. Free iPads, recently it's been a lot of free iPads.
David: I've seen some of these. I don't think I've ever seen clickjacking, though, but that's the whole point; you don't "see" it.
Dan: Typically you don't know that that's what it is. All you see on Facebook is spam in your news feed.
David: How in the heck do you discern a clickjacked click, versus a legitimate like?
Dan: At this point is when we really leverage user feedback. We look at what domain is being shared, what URL is being shared, how often it's been shared, and basically the statistics of the feedback that have been given to those shares.
Dan: This is where we use like comment counts, and very, very specifically "mark as spam" clicks. In News Feed, if you hover over a story, you can click a little menu and say, "report this as spam". If it's a fairly new domain, that has an inordinate number of spam reports, then the soft enforcement will be that, when you click on a "Like" social plug-in on the domain we won't just let the action happen, we'll actually bust out of the iframe.
David: And show the user more specific prompting about whether they want to do the like?
Dan: Yes, which would be really bad if we were doing it for The New York Times, but on joeevil.biz, it makes a lot of sense.
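The soft-enforcement rule Dan describes (a fairly new domain with an inordinate number of spam reports gets an extra confirmation step before a Like goes through) might be sketched roughly as below. This is only an illustration: the thresholds, the `DomainStats` fields, and the function name are assumptions, not anything stated in the interview.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
NEW_DOMAIN_MAX_AGE_DAYS = 30
MIN_SHARES_FOR_DECISION = 100
SPAM_REPORT_RATE_THRESHOLD = 0.05


@dataclass
class DomainStats:
    domain: str
    age_days: int      # days since the domain was first seen being shared
    shares: int        # total shares of URLs on this domain
    spam_reports: int  # "mark as spam" clicks on those shares


def needs_like_confirmation(stats: DomainStats) -> bool:
    """Return True if a Like on this domain should require explicit confirmation."""
    if stats.shares < MIN_SHARES_FOR_DECISION:
        return False  # not enough user feedback to judge yet
    is_new = stats.age_days <= NEW_DOMAIN_MAX_AGE_DAYS
    report_rate = stats.spam_reports / stats.shares
    return is_new and report_rate >= SPAM_REPORT_RATE_THRESHOLD
```

With these made-up numbers, a three-day-old domain where a fifth of the shares were reported would trigger the confirmation prompt, while a long-established domain with the same number of reports spread over far more shares would not.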
David: I guess a couple of general patterns I'm seeing here are, number one, you assume that the user is acting benign. You never tell them, hey, stop doing this, or hey, stop being an idiot. You don't hit them over the head. You kind of gradually educate them towards what proper behavior is.
Dan: Yes, absolutely. Even if you're doing something as simple as trying to share a link that we know is abusive, the messaging is, "we can't let you post this" and we show you the part of the message you're trying to post. This domain right here is abusive so we can't let you.
David: It's very respectful.
Dan: Arturo Bejar, the Director of Engineering who oversees site integrity, is our moral compass in a lot of ways. A recent article called him the "Compassion Czar" of facebook. He really sets down this idea that we really need to assume that the user is benign, the user is benevolent, the user doesn't understand what they're doing. They've been tricked into doing something that we know that we can't allow by somebody who is malicious. The user is never malicious.
David: I see. Do you find that there are cultural differences around these things? I know that different cultures have different norms of interruption or what would be considered acceptable to post in a news feed versus spam? How do you handle that?
Dan: Yeah, that's definitely an issue. One specific issue I recall having to deal with when I was working on the friend spam problem is that in some countries in the world it's actually very socially acceptable to send a friend request to somebody that you don't know and for them to reject that friend request. I think that in some cultures, the receiver of that friend request doesn't really consider it to be spam, but more of a compliment like, hey, your profile photo looks really nice.
David: So maybe they can reject it. How do you handle that? That seems really hard.
Dan: If you imagine using what country the user is in as a feature in the classifier we can definitely...
Dan: We can definitely discern. We can separate out the behavior and the dynamics of these situations based on country because we know what country the user's coming from based on their IP and if they state that I live in Great Britain.
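One simple way to picture "country as a feature" is to judge a sender's friend-request rejection rate against the norm for their own country rather than against a single global norm. The sketch below assumes that framing; the function names, the input shape, and the idea of feeding an "excess rejection" value to a classifier are all illustrative assumptions, not a description of Facebook's classifier.

```python
from collections import defaultdict


def country_baselines(history):
    """Compute the friend-request rejection rate per country.

    history: iterable of (country, requests_sent, requests_rejected) tuples,
    one per sender. This input shape is hypothetical.
    """
    sent = defaultdict(int)
    rejected = defaultdict(int)
    for country, s, r in history:
        sent[country] += s
        rejected[country] += r
    return {c: rejected[c] / sent[c] for c in sent if sent[c] > 0}


def excess_rejection(country, sent, rejected, baselines, default=0.0):
    """How far a sender's rejection rate sits above the norm for their country."""
    if sent == 0:
        return 0.0
    return rejected / sent - baselines.get(country, default)
```

A sender with a 50% rejection rate would look unremarkable in a country where that is the norm, and suspicious in one where the baseline is 10%.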
David: I guess just a couple of technical things, we've discussed some of this stuff before, but how do deployments work here? If you want to push code out, obviously, there are a lot of different servers that are all running this code at the same time. What's the overall process there?
Dan: It's an interesting question, and for the people who are really interested, we have published a couple of videos on this, by our Grand Master Pusher, Chuck Rossi.
David: Grand Master Pusher? [laughs]
Dan: The Push Master. Manager of the release engineering team. Chuck Rossi, or chuckr, as he's referred to around here, has published some pretty interesting videos on that. But I can give a brief overview. Basically, every week, we push our main repository, our web tier, the tier that you're talking to when you go to facebook.com, which is a very, very large number of computers, running in several different regions: West Coast, Oregon, now North Carolina and Ashburn, Virginia. We've got all these different tiers, and then we push. We push them every day, but the really big one is Tuesday evening.
David: Every week though?
Dan: Every week. Basically, if you commit a revision to "trunk" (we use subversion terminology).
David: Even though you use git now?
Dan: Everybody uses, at least for the main code repository, it's like a git/svn hybrid. Every developer is running a local git repository, but instead of pushing to origin/master, you're committing to trunk. The actual origin is a subversion server. If I commit a rev to trunk, by default, if I don't do anything else, it gets pushed the following Tuesday.
Ravi: You get to see results within a week. It's pretty awesome.
Dan: But if I need it to go out sooner, I can put in a merge request, and then it will go with the daily push.
David: So it's a little more out of band. Do you run the site locally when you're developing? How does that work?
Dan: Yes, when we're developing features, when we're debugging, we have development servers. I've got a private development server, and I run my little copy of facebook there.
David: The entire site?
Dan: Well, the site is made up of a lot of different pieces. I'm running the main web server process. I don't have copies of all of the databases on my dev server.
David: That's why I ask, because this must be difficult. It's not like you can just boot facebook. Does this connect to all of the production services, then? How does that work?
Dan: It connects to all of the other production services. It will go and talk to a production memcache tier.
David: That's interesting. When people are doing development, it seems like the philosophy is you develop the piece that you're working on, and then you use the rest of the deployed or production pieces, is what you're saying.
David: In particular, this is really hard for data, because there's so much production data that obviously you can't run a copy of that on your local computer. The standard strategy is just to interface with the production services all the time.
Dan: Yes, but it makes testing difficult. Testing is something that we continue to iterate on, because...
David: Yeah, you're absolutely at the forefront of this. There's no doubt.
Dan: If a particular database is down, if a production database is down, you don't want your unit test to start failing. A few years ago, that was the case. A DB would go down, tons of tests would start failing, and all of a sudden, we're blind to potential bugs in a large portion of our code.
David: Do you have really complicated mocking frameworks or something?
Dan: They're becoming complicated, I don't know if complicated is the right word. But we use SQL shimming technology; what we do is, write a unit test that talks to the production data services. Then we will run the test in what we call a record mode. We will write down, basically, in a local sqlite file, all of the data that flies back between my unit test and the production data services.
Then on subsequent runs of that test, instead of going and talking to production data services, it talks to the local sqlite file, which contains all of that data that we would have seen.
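The record-and-replay idea can be sketched in a few lines. This is a minimal illustration using Python's standard `sqlite3` module, not Facebook's shim: the `RecordingShim` class, the `live_query` callable, and keying recordings by the raw query string are all assumptions made for the example.

```python
import json
import sqlite3


class RecordingShim:
    """Wraps a query function; records results to SQLite or replays them."""

    def __init__(self, db_path, live_query=None, record=False):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS recordings "
            "(query TEXT PRIMARY KEY, result TEXT)"
        )
        self.live_query = live_query  # hypothetical callable hitting the real service
        self.record = record

    def query(self, sql):
        if self.record:
            # Record mode: talk to the real service and write down the answer.
            result = self.live_query(sql)
            self.conn.execute(
                "INSERT OR REPLACE INTO recordings VALUES (?, ?)",
                (sql, json.dumps(result)),
            )
            self.conn.commit()
            return result
        # Replay mode: answer only from the local file.
        row = self.conn.execute(
            "SELECT result FROM recordings WHERE query = ?", (sql,)
        ).fetchone()
        if row is None:
            raise KeyError(f"no recording for query: {sql}")
        return json.loads(row[0])
```

A test would be run once with `record=True` against the live service, and afterwards with `record=False`, so a production outage no longer makes it fail.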
David: I see. There's a ruby gem called VCR that does something similar to this for doing API stuff.
Dan: Yeah. It's an incredibly important thing if you're testing anything that isn't trivial in a place like this. But even then, it becomes difficult because our data access patterns will change. All of a sudden, if somebody adds a new column to a table, and that column wasn't present when you recorded your shimmed sqlite file, all of a sudden, your non-record mode version of your test won't have access to that data; you might start failing.
Nowadays, or just a few days ago, our test engineering guys implemented this feature where, if your test starts failing in non-record mode, where you're reading from the local file, yet it starts passing if you rerun it in record mode (you basically go and refetch and rerecord the transaction with the production data services), then it will automatically go do that and update the local file. It used to be that, you know, you'd have to go and do that and submit a revision for the new, basically, transaction file.
David: I guess and...
Dan: That's automatic.
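The automatic re-record behavior amounts to a retry wrapper around the test run. A rough sketch, assuming a hypothetical `rerecord` callable that refreshes the local recording file from the live services:

```python
def run_with_auto_rerecord(test_fn, rerecord):
    """Run test_fn against recorded data; on failure, re-record once and retry.

    test_fn:  callable that raises on failure (reads from the local recording)
    rerecord: hypothetical callable that refetches from the live services
              and rewrites the local recording file
    """
    try:
        test_fn()
        return "passed"
    except Exception:
        rerecord()
        # If the test still fails on fresh data, the failure is genuine
        # and propagates to the caller.
        test_fn()
        return "passed after re-record"
```

The design choice worth noting is that only one retry is attempted: a test that fails even on freshly recorded data is reported as a real failure rather than masked.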
David: Around testing, do you usually write tests before your code or do you write them after, or how do you do that? I imagine it varies between groups.
Dan: It varies between groups. It varies a lot between groups. A friend of mine recently joined the ads team and he's told me that when you're going to push something to the ads service, you have to test it a lot because it could very well, I mean very directly affect the company's revenue.
David: Obviously yeah.
Dan: So they do a lot of testing, and, my team works on login services and these security critical services. So for us it's very important to test.
David: You have a little bit more of a conservative engineering culture.
Dan: A little bit more of a conservative engineering culture, yes. At the same time, it's not part of, at least, my personal development cycle to code the test along with the feature, which I think is what some people do. To sort of let the test and your feature develop...
Dan: In parallel.
David: Yeah, that's how I develop.
David: I know a lot of people don't do that. I just find that when you have complicated interlocking systems I find that it can often be easier to co-develop only because there are so many touch points with other stuff that it's very hard to test after the fact.
Dan: Yeah, so with the touch points with other things...
David: Like services and API's and databases and this, right?
Dan: Yeah so, we have, our test framework allows us to stub out functions, mock out functions. The new hotness in testing here at Facebook is dependency injection.
Where instead of, all of a sudden out of nowhere fetching something from the database, your service, or your class, or your function takes as an argument a connection to the database, and then talks through that. So when your code is being run on tests, instead of having to go behind the scenes and mock out these sort of global symbols, all that you have to do is pass in a mock of that database connection.
David: Yeah we've done a lot of that too. Like passing in an instance of the data model, I did that a lot at Wishery, so it's easier because you can inject your mock. Right.
Dan: Yup. Yup, and that's sort of the new hotness that a guy named Aaron Donahue around here is really championing, and I can't wait to see it like really happen, because testing is a sort of a constant fight.
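Dependency injection as described here, where the code takes its database connection as an argument and a test passes in a mock, looks roughly like the following. The interview is about a PHP codebase; this sketch uses Python's standard `unittest.mock`, and `LoginService` and `fetch_one` are hypothetical names invented for the example.

```python
from unittest.mock import Mock


class LoginService:
    def __init__(self, db):
        # The connection is passed in rather than fetched from a global.
        self.db = db

    def is_known_user(self, user_id):
        # fetch_one is a hypothetical method on the injected connection.
        row = self.db.fetch_one("SELECT id FROM users WHERE id = ?", (user_id,))
        return row is not None


def test_is_known_user():
    fake_db = Mock()

    fake_db.fetch_one.return_value = (42,)
    assert LoginService(fake_db).is_known_user(42)

    fake_db.fetch_one.return_value = None
    assert not LoginService(fake_db).is_known_user(7)
```

Because the service never reaches for a global connection, the test needs no behind-the-scenes patching: it just hands in the mock.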
David: Yeah, it's an evolving discipline too. So I guess you know on that topic, you guys do a lot of open source contributions, hphp is a big one.
David: Like what are some of the other ones? I use a lot of the tools, like we're talking about, but I use Jasmine a lot which is done by Pivotal Labs, and like a lot of open source mocking tools, but you talked a bit about the frameworks used at facebook, as if that was distinct.
Dan: Yeah, so we actually use a lot of open source technologies for our testing framework. I'm more talking about phpunit and then things that we've developed on top of phpunit. I don't know, and I doubt that they're open sourced, but as far as things that we do open source, the big one is hphp. Another big one is what we call the Open Compute Project. Where we...
David: Right, the PHP compiler.
Dan: So hphp is the PHP compiler. The Open Compute Project is where we open source the designs for our servers.
David: Yeah, and I actually wanted to talk about that. That's on the list of things here, like they're standardizing data center design.
David: And that's the whole point and I think that the original stated purpose of this was to try and get the cost of data center deployments down, if I'm not mistaken?
Dan: Yeah, so the idea is that if we, you know put a good amount of effort into designing a very inexpensive, very efficient server, and then we open source that design and other companies go take that, improve upon it, and build with very similar hardware, the whole community, obviously, this is the whole open source idea, is that the whole community benefits from any one person's contribution. If a lot of people are using very similar pieces of hardware then the cost will also go down, benefiting the whole community.
David: Right and when you talk about open sourcing servers, you're not talking about operating system images or processors, you're talking about pieces of metal. Right?
Dan: I'm talking about, yeah, I'm talking about like the motherboard, pieces of metal, hard drives, processors, fans. Stuff like this. I have not touched any of that stuff so I'm not a great person to talk about specifics.
David: Yeah, and just to be clear, like a lot of other Internet startups here, you guys don't host your servers here, like, they're all in datacenters around the world, right?
Dan: Yep. A lot in Santa Clara, here or very close to here. Those were our first data centers. Next was the east coast, in Ashburn, Virginia, and then after that the one that we built, which was in Prineville, and then we built another one in North Carolina, and I believe we...
Ravi: And you mentioned Oregon right?
Dan: Yeah Oregon, Prineville Oregon. "Region three"
David: And why these—region three, I see—and why these locations? I think Oregon has cheap power, maybe?
Dan: Oregon has cheap power and actually I should mention for those that are interested in a lot of background knowledge, there are some really wonderful videos about, that we've published, about the Prineville data center, about its various things that it does to increase efficiency, stay green and specific things about why it was built there. But one thing I do know, that power is very cheap but also it is very cool and very dry. And so the air that we pull into the data center is of an advantageous condition.
David: Yeah and pretty much year-round, too?
Ravi: Pretty much year-round.
David: Right. So it doesn't get super-cold, or super-hot.
David: I don't know why anybody would ever build a datacenter in Chicago, but I think so many have. It gets really hot and really cool. That's interesting so that's called the open compute project you said.
David: Open compute and then hphp.
David: Do you guys use GitHub at all here?
Dan: We do for some of our open source projects. I believe, like, the Facebook Connect code that you can download and use in, like, your third-party software in order to perform Facebook Connect actions.
David: A couple of other things I wanted to discuss. One thing is, like, what database technologies do you guys use here? I think, it was MySQL for a long time? I think it probably still is, to do a lot of the graph storage.
Dan: At the base of a lot of things, it's still MySQL.
David: Yeah, which is kind of notable, because you hear about all the NoSQL stuff that's going on around, like, HBase, and what else? MongoDB.
David: But it's still MySQL at the base, you said.
Dan: Yeah. Yeah. I'm not a great person to talk to about that. But I should say, one of the, actually, one of the notable exceptions to the MySQL rule is Facebook messages, which is now stored in HBase.
David: HBase? OK. And can you talk a little bit about Facebook Chat? I mean, again, it's just fascinating to get a sense of the broad variety of tools that are used here, you know. And you have thousands of engineers all working together. So, Facebook Chat was in erlang, if I remember correctly.
Dan: Yeah, I believe the chat servers, or many of the chat servers are still in erlang. The sort of, handling that part of the transaction. I've never worked on it, so I don't know a lot of the specifics.
David: Yeah. It's very polyglot. I think a lot of that, Google started, I would say. But I mean, you just said you have production code running in erlang, also in php, and some of the services are in other languages, I imagine?
But, yeah. And I mean, obviously, people who work on the Android app write in Java. People who work on iOS apps write in Objective-C.
David: All here.
Dan: And Java, yes, all here.
David: Wow, OK. And then, for storage technologies, you mention HBase, MySQL. Do you know of anything else that's in use? You're not really sure?
Dan: No, I'm not really sure. There might be something, but I'm not sure.
David: OK. That's incredible.
Ravi: The whole gamut.
David: Yeah, pretty much.
Dan: Yeah, I mean, our photo storage is not in any of those things. We use something called Haystack to store photos, but.
David: OK. And I think that's based on flat file storage?
Dan: I believe so, yeah.
David: Yeah. So, actually, when Schroep came out to Seattle, he talked about this...
Ravi: And he did talk about Haystack, actually.
David: Yeah, about, like, just serving photos out of Haystack and then, eventually, I think, memcached, there was a time when memcached went down and they didn't even notice. Which I thought was pretty incredible, just to have your cache go down.
Dan: I hope that we would notice something like that.
David: Yeah, but it was so performant that they didn't. That was what was incredible about it. I think that was back in like, '08.
So, I guess, wrapping things up a little bit, the IPO was just last week. I would be remiss if I didn't mention that. So, the NASDAQ button hack. This got covered by the news a lot. Like, what happened there?
Dan: As far as I know, one of our engineers put a sensor on the bell. When Zuck hit the bell, the sensor would trip and that triggered an open graph action on Zuck's timeline. We've got a lot of open graph actions which read, "Dan read this article," "Dan listened to this on Spotify." Zuck's got this one on his timeline saying, "Zuck listed Facebook on NASDAQ."
David: That's awesome. It was a good photo op, too. You do have some people here that know how to hack hardware a little bit, pull apart boxes, and do things like that.
Dan: We've definitely got some folks who take a lot of joy in that.
David: That's pretty great.
Dan: That's pretty cool.
David: There's that, and then there was the IPO Hackathon as well. Can you comment on the project you did during that?
Dan: Sure. I was just trying to put together a new iOS app. I've been doing a lot of work on our iOS apps recently to bake in security features into those apps. Basically I was trying to put together something that had to do with a digital photo frame.
If you imagine a photo frame you buy from Walgreen's, it's really crappy. Maybe it's got an SD card. It's got a really low res display. Really the best thing you could do is use an iPad. If you could imagine saying, "OK, I want to maybe rotate through photos of my friend Dave and have it pull photos through Facebook." For me, my really killer use case is the fact that my wife sends me a photo of our three month old daughter every day at some point in the afternoon.
She sends it to me using a Facebook messages thread. What I would love is to have this photo frame automatically show the most recent photo of my baby girl. With a lot of features on the main site, you can really get a lot done in one night. On iOS, I think it tends to be sort of a different story. Just because of a lot of what's involved there and how you have to tell a larger story in the software sense.
David: I get that too. It feels a little heavier weight. I think that also the testing tools aren't quite as mature too. Because it's compiled there's a longer cycle time.
Dan: There's a longer cycle time. When you're launching an iOS app, you need to bake in a login functionality. If you're building something for the facebook main website, people just go to it. You're already logged in. There's already all of this stuff happening.
David: Yeah, and it's probably worth mentioning, too, the main site. It's a website, so it gets sent down to the user every single time they load it. When you have a packaged mobile app like that, you do have to replicate a lot of the features. But it's funny how, like, a lot of development is moving back, almost, from web, like, to native client stuff now. Has that touched you at all?
Dan: Yeah, yeah. So, I think that the direction has been, like, we love mobile web, but in a lot of cases, it's not quite there yet. It's not quite, like, if you think about something as simple as scrolling performance. Doing, like, scrolling through an HTML view, is much more processor intensive than scrolling through a native view.
David: A native view, of course.
Dan: So, we feel that very deeply. And we seek to really push the performance of our mobile technologies across all platforms and try to really provide the best user experience possible.
Dan: I can't talk about it.
David: So, I guess one other thing is around attracting and retaining talent. So, now that the IPO has passed, have you hired many people since the IPO, your team? Or how have they grown?
Dan: I actually don't think I can talk about that, either.
David: All right, well. Nice to talk to you today. [laughter]
Dan: Sorry to like kind of end on a sour note.
David: That's OK. Nice to talk to you, Dan. Thanks for having us today and I'll see you soon.
Ravi: Thanks a lot, I appreciate it. Thank you.