EIGHTH INTERNET GOVERNANCE FORUM
BALI
BUILDING BRIDGES ‑ ENHANCING MULTI‑STAKEHOLDER COOPERATION FOR GROWTH AND SUSTAINABLE DEVELOPMENT
OCTOBER 24, 2013
2:30 P.M.
SESSION WS 81
MULTI‑STAKEHOLDER DIALOGUE: BIG DATA, SOCIAL GOOD AND PRIVACY
The following is the output of the real-time captioning taken during the Eighth Meeting of the IGF, in Bali, Indonesia. Although it is largely accurate, in some cases it may be incomplete or inaccurate due to inaudible passages or transcription errors. It is posted as an aid to understanding the proceedings at the session, but should not be treated as an authoritative record.
>> Robert here testing, one, two, three, four.
>> Robert, can you hear us?
>> ROBERT KIRKPATRICK: Testing, testing, testing.
>> We're sorry about this, everybody. I will put that on mute. Mute. How can I put the microphone on mute?
>> Guys, can you hear us?
>> We can hear you.
>> Is this Robert? Is this Robert?
>> ROBERT KIRKPATRICK: This is Robert.
>> CHRISTIAN REIMSBACH-KOUNATZE: This is Christian and I can hear you clearly.
>> Okay so now they can hear us.
>> DAVID GROSS: Okay I think we may have now sorted out our technical difficulties. The Internet is a wonderful thing, but it is not a perfect thing. My name is David Gross and I have the great honor of moderating this panel. We have both here in person and online from around the world an extraordinary group of panelists to talk about a set of issues that I know is much on the minds and lips of people here at the IGF and elsewhere around the world.
We will focus on the issues of big data and privacy and related issues.
We're going to try to go in a somewhat geographic order. We will go remote first, in part ‑‑ I probably shouldn't say it this way, but I'm more concerned about losing our remote panelists technically. I'm not concerned about losing our panelists here live. Let us start. Robert, if you can hear us, let us start with a brief presentation by Robert. We will then try to go to Karan and then to Christian. I have asked all of our panelists to be very brief in their presentations so that we can have as much dialogue as possible, both amongst the panelists and, importantly, with all of you in the audience.
Obviously, as you have learned over the past few days, audiences here are equal participants with the panelists in terms of the interaction and the like.
With that, Robert, could you start us off?
>> ROBERT KIRKPATRICK: Absolutely. So my name is Robert Kirkpatrick. I direct the Global Pulse initiative in the Secretary General's office. Global Pulse came out of the global financial crisis, going back to 2009, and it was at that time that there was a recognition among world leaders, and the Secretary General was hearing very broadly, that they knew the crisis was going to be affecting vulnerable populations in their countries, but all of the household level information that they had on well being, on health, on livelihoods, on food security, was in the form of low frequency statistics that predated the onset of the crisis.
So there was a recognition that in today's policy making landscape, we need realtime information on what's happening that can allow us to act quickly enough to change the outcomes. At the same time, there was a recognition that while policymakers are still struggling with traditional 20th century tools for data collection, essentially using 3‑year‑old data to make 5‑year plans, people are creating oceans of data around us. As they buy and sell goods, as they search for information on the Internet, as they transfer money over mobile phones and share their lives on social media, they are producing an immense amount of information that the private sector has for some time been learning how to master in order to understand their customers in realtime, track market trends in realtime and monitor their own operations.
So the question arose: how can we adapt these new forms of data, the new technologies that allow you to analyze this information as it's produced, and a new field of data science that allows you to mine this information for meaningful patterns, to the challenges of fighting hunger, poverty and disease?
As the initiative launched, we realized there are essentially two kinds of information out there, most broadly, that fit into the category of big data. There's what people say, which is what they are talking about online, through social media, but that can also include online news and online retail advertising by organisations. At the same time, there's a lot of information about what people do. So financial transactions, communication patterns through mobile phones, how people call and move about cities. This information is very different. It's being collected behind corporate firewalls by businesses and being used as a foundation for competition. It's not being shared, both out of fear of how it would be used by one's competitors and out of genuine concerns over data privacy and how one could use that information without creating new vulnerabilities in the population you want to help.
So we see a tremendous opportunity in all of this new data to understand, you know, how people are being affected by crises, to understand simply what's happening on the ground as it unfolds, as well as to measure the impact of programs and policy decisions.
There's one caveat, of course, which is that we see in big data a tremendous opportunity for development, unless we fail to protect privacy in the process. If we can't pull together the partnerships and the expertise to find safe and responsible ways to analyze this information, we are creating new risks: a risk not only to privacy but, by extension, a significant risk to human rights as well.
There's a lot of information out there we could talk about. I want to be brief. Let me just give you a couple of examples of some of the information that we see as valuable. In social media, of course, there's a great deal of content.
If you look at a network like Twitter: we opened our first pulse lab in Jakarta in October of last year, in partnership with the government of Indonesia, and this is a country that is, depending on the day of the week, either number four or five on both Facebook and on Twitter. Jakarta is the number one city in the world in terms of the use of Twitter. They are ahead of Tokyo. And when you filter out the conversations around sports and celebrities, what you find is a great deal of discussion of the affordability of employment and food, and whether government programs are helping people.
In one project we are working on with UNICEF and the World Health Organization, we found that a lot of parents in Indonesia are sharing through Twitter the fact that they don't intend to vaccinate their children and they talk about why.
Being able to detect, in different parts of the country, increases in misperceptions about the risks of vaccination would be helpful before those misperceptions spread and kids go unvaccinated.
There's lots of work that's been done on social media analysis.
Another type of information that we are very interested in is the information that's generated by mobile phone networks. So as people move through cities, making calls and sending text messages, mobile phone networks are able to collect the records of that behavior. We are not talking about recording the calls or the content of the text messages. A mobile carrier can see the population of a country moving around on a map in realtime, which no one has been able to do before. They can see spending patterns on air time. It turns out that someone who buys 20 cents of airtime credit every three days is earning a lot less money than someone who spends $10 every two weeks. You can essentially construct something beginning to approach a realtime map of poverty. You can get a sense of the income of a population and the disparities simply by looking at airtime purchase behavior. And you can monitor for changes to understand when a population might be under stress due to natural disaster.
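To make the airtime example concrete, here is a minimal sketch of how anonymized top-up records might be aggregated into a district-level income proxy, echoing the 20-cents-versus-$10 contrast above. This is not Global Pulse's actual pipeline; the record format, field names and amounts are all hypothetical.

```python
# A minimal sketch, not Global Pulse's actual pipeline: hypothetical field
# names and figures, showing how aggregated airtime top-ups could proxy income.
from collections import defaultdict
from statistics import mean

# Each record: (anonymized_subscriber_id, district, topup_amount_usd)
topups = [
    ("a1", "district_A", 0.20), ("a1", "district_A", 0.20),
    ("b2", "district_A", 0.20), ("c3", "district_B", 10.00),
    ("d4", "district_B", 5.00), ("c3", "district_B", 10.00),
]

# Average top-up size per subscriber, then aggregate by district so no
# individual-level figure ever leaves the analysis.
per_subscriber = defaultdict(list)
district_of = {}
for sub, district, amount in topups:
    per_subscriber[sub].append(amount)
    district_of[sub] = district

district_avgs = defaultdict(list)
for sub, amounts in per_subscriber.items():
    district_avgs[district_of[sub]].append(mean(amounts))

for district, avgs in sorted(district_avgs.items()):
    print(district, round(mean(avgs), 2))  # small avg top-up ~ lower income
```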
There are ways to model the spread of malaria and other diseases. So, of course, if we are looking at East Africa, which is where we are opening our second lab in Uganda, we are interested in mobile money transfers, where inbound money transfers over SMS could be a powerful tool for triggering an intervention.
So in sum, we think there's a tremendous public good here if we can move this conversation beyond a polarized discussion between privacy advocates and private sector companies pushing the envelope on profitability, and recognize a pole in this discussion, which is that big data is a raw public good, but only if we can figure out ways to use it safely and responsibly. We have been engaging with the private sector around a concept we call data philanthropy. We are asking: look, can we find a way for you to share some of the data that powers your business, in a way that's aggregated and anonymized to protect your own competitiveness, but that can give us the realtime early signals of a population in crisis, or a new evidence base that our policies are or are not working?
I will stop there. Thank you.
>> DAVID GROSS: Well, thank you very much, Robert. I appreciate it.
I would normally have cut somebody off a little bit sooner, although for two reasons I did not cut off Robert. One, it was, of course, fascinating, the points he's making about the relationship between big data, development, public health and the like, but also, since it's 3:00 in the morning where Robert is, in New York, I'm just so grateful that he's here and I want to make sure he stays awake.
We will go to Karan who I think is in Malaysia at this point. Thank you. Karan, are you with us?
Karan, I understand that we electronically think you are still there. So don't give up. Christian, can we go to you next?
>> CHRISTIAN REIMSBACH-KOUNATZE: Okay. Thank you and hello, everyone. My name is Christian Reimsbach, with the OECD. I work with the Directorate for Science, Technology and Industry, which is responsible for innovation more broadly and, in particular, the digital economy, the Internet economy.
So in 2010, we started a project on the economics of personal data, which looks at the role of personal data as a currency in the Internet economy and its role in innovation, and which also led to the review of the OECD privacy guidelines that were released in September this year.
And another kind of input to that work on the privacy guidelines, which I would like to inform you about, is the work on data as a new source of growth, which is the reason why I'm talking here. It looks at the role of data, not only personal data, I have to say, but also industrial data, weather data and environmental data and so on, as a source of productivity growth and social well being.
So the work we are doing here looks at how we can use data to increase transparency in the public sector, how we can use data as a new source for research and science, and also how we can use data to make healthcare, for example, smarter.
And there are a couple of themes that cut across all the different topics that we are addressing. One is obviously privacy, and in the big data world it won't go away, so we are working hard on this topic.
Another one is employment. We believe big data raises a lot of issues in terms of skills requirements, which is also very important for developing countries, but it also raises a lot of issues in terms of employment more broadly, because many of the big data applications that we are looking at regarding productivity can have a negative impact as well that we need to consider.
What I would like to raise now, for the last minute that I have, is the theme of open data. It's an important theme for us because, and I think it's related to what Robert just told us, of the idea that data is a public good. If it's a public good, you want to think about how it benefits society at large. Thinking about open data is much easier in the context of the public sector: people are paying taxes, so if they fund the collection of that data, shouldn't they have access to it?
That's why in the area of the public sector, the idea is much more developed. But we are also thinking about the concept of open data much more broadly, in the context of the private sector and also in the context of others. I would like to end with two examples. One is the idea that mobile companies are sitting on large data sets, and some of those companies have seen the potential of those data sets for their commercial interests; some of them are selling those data to navigation companies, and there is also, as Robert said, the idea that we could use it in the development context. So it's also related to that.
And the other idea that I would like to highlight is the notion of providing people access to their own data, so that they can decide whether and with whom they want to share it. This is an example we heard of in the UK, where consumers are granted the possibility to go to the companies holding their data and to obtain that data, so that they can share it with other companies as well, and with society at large.
So these are the topics that we are really interested in. And I think I would like to end here and everything else can be discussed during the discussion. Thank you very much.
>> DAVID GROSS: Thank you very much, Christian. That was terrific.
Let me ask if Karan is on the line. I'm ‑‑ I see from an email from him that he can hear us fine, but he ‑‑ are you there?
The email says that we can't hear you and I certainly cannot hear him right now. Karan?
All right. We will try again. I think we are getting closer.
Alex, can you fill in for Karan for a moment?
>> ALEXANDRINE PIRLOT DE CORBION: Yeah, I can fill in, although I don't think we have the same perspective on the issue. So just a quick note introducing Privacy International, for those who don't know the organisation, so you can understand the context and what our mission is.
So Privacy International is the first organisation to work only and specifically on the right to privacy. Through different actors and different thematic areas, we are trying to push for national, regional and international frameworks to respect and promote the right to privacy and data protection. So that's what we do in a nutshell. What we wanted to raise is some of the concerns about the increased use of technology in the development and humanitarian sectors, with well intended, you know, objectives of helping people and for social good purposes. We don't discredit the advantages that have emerged from using these technologies, especially in difficult environments and conflict, post conflict and post natural disaster contexts.
They have been very useful in overcoming the practical challenges of reaching out to people and providing people access to aid, health care and food. But within that, we want to raise four concerns about the use of technology, and particularly mobile technology, in these different programs.
One is related to legislative frameworks: in most countries where these technologies and these programs are being run, there is no legislative framework to protect the data of data owners. There's often a right to privacy included at the constitutional level, but there's no enforceability of this right to privacy, and this can lead to serious human rights violations, not only of the right to privacy but also, in turn, violations of the right to freedom of expression and freedom of movement, if people such as human rights defenders or journalists are able to be reidentified and arrested. Through this use of technology, you have an accumulation of huge amounts of data that can be aggregated and used for unintended purposes such as reidentification.
And this is within a legal void. So that's one of the concerns we have.
Another one is linked to the multiple actors that are involved in the deployment of these programs. So you often have the host government itself that's involved, but you also have the private sector, who provide access to these services, such as telecom companies or Internet providers, and, especially in the context of the development sector, you also have the donor community, who can play a role, and also international NGOs, who are the ones maybe pushing for these programs through programs like e‑transfers or cash learning partnership programs.
And again, it's linked to the lack of a framework for understanding who is responsible for the data collected, in terms of ownership and also accountability in case of violations.
So, you know, this raises quite a few issues about the individuals themselves, in terms of the information they receive when they are part of these programs, because I think in certain circumstances, people look at the social good of having access to health care, food, and other basic essential services, without looking at the impact it can have in the longer term on their privacy and what is being done with their data.
Another element, and it's linked to the context in which these programs are being deployed, is the vulnerability of the beneficiaries, and the link to what I was saying just now. Are people in these programs informed that the data is collected, and that there's a possibility it can be shared? That's the danger of this mass collection of their data and the centralized databases. There are issues concerning the security of the data collected. So these are all concerns we wanted to raise. One last one is linked to mobile systems, and it was brought forward by Robert as a positive side of using mobile technology: being able to identify trends and movements of people for social good purposes. But another side is being able to monitor people's movements, and that's linked to surveillance, knowing where people are and monitoring their movements every step of the way, because mobile devices have to be in constant connection with cell sites. In certain circumstances that can have very tragic effects on people's human rights. So there's definitely something to take into account as we develop this going forward.
Data has definitely been seen as a new kind of resource, as a one size fits all approach to solve, you know, poverty and poor access to social services, but there's definitely a need to take into account privacy, which can have impacts in the short term and the long term.
>> DAVID GROSS: Thank you very much, Alex. Let's try one more time to see if Karan is back on the line.
>> KARAN HENRIK PONNUDURAI: Hi, can you hear me?
>> DAVID GROSS: Yes, we can. We are relieved. Go ahead, Karan.
>> KARAN HENRIK PONNUDURAI: Great. Hi, everyone. I'm the chief information officer of a regional mobile telephone operator. We have operations throughout South Asia. I'm responsible for all digital products, for big data and mobile advertising. So I have a slight interest in these issues from a professional perspective. But, you know, apart from that, I have a personal interest. Not many of you would remember, a few years ago in Israel, there was a case of a few terrorists being caught because the intelligence agencies had been doing semantic analysis on email threads.
And through a variety of means, they were able to uncover a plot against a particular bus at a particular time, and this was of interest to me because later on, when the evidence came out in court, it turned out that I was scheduled to be at that particular place at that particular time. And so if someone hadn't had the capability to analyze that data and address those privacy issues, I wouldn't be here to share my thoughts with you today.
So I do have a particular interest in the role of privacy and the role of privacy from a very personal experience.
Coming back to our perspective as a mobile operator: I think Robert made a good case about the fact that mobile operators have a lot of information, and also the context of that information, which is highly unique in the world. To be fair, most of us are struggling very much with this issue, and the rise of Internet players like Facebook and Google is really pushing these issues, which are very new to us. And so conferences such as this really enable us, not to say that we have very good viewpoints, but to understand how to approach this.
I think there are three major issues that I would like to put forward, not as an expert, but really as a way to try to stimulate the debate around security, as well as the rights on data. I think we heard a lot about the right to personal privacy, and that is an important perspective. However, data does not exist on a purely individual basis. A lot of issues arise as this information is linked and shared: the fact that I do something and someone else does something provides a new data point, and each interaction creates a new data point. So there are the issues of rights, of shared rights, and of how you address something that is shared with not only one other person, but many millions of others.
And the third issue that I hope to be able to discuss is obviously one of enforcement.
The second principle I would like to talk about is the fact that there are many different perspectives we can take on big data privacy. One of these is the individual perspective, but how do we manage the individual stakeholder perspective as well as that of organisations? All of us will have different viewpoints about the same topic. So I think that's a way of opening this.
>> DAVID GROSS: Well, thank you very much. That was very helpful, and the fact that you could communicate with us is almost remarkable.
Rohan, we heard about development. We heard about the economic uses of data. Rohan, you are an expert on all of these things. Enlighten us.
>> ROHAN SAMARAJIVA: Thank you.
I work with an organisation called LIRNEasia, which works across a number of developing Asian countries, and we actually were working on big data before the term became popular, in that we were taking symptomatic information, using mobile phones, putting it into a database and running software from Carnegie Mellon on the data to identify the patterns. Now, this is biomedical research, of course, and we went through all the privacy clearances and ethics clearances and so on.
The issue with this kind of big data is that it's very costly to produce and as soon as the project funding ends, the data production also ends.
So I was looking at a subset of big data which was originally called transaction generated information by Tom McManus in a piece that he did. I used the term TGI, the term that Tom came up with, but it's better described as a subset of big data called transaction generated data, because this doesn't actually cost any additional resources to produce. It is produced as a byproduct of some other activity.
So that's where we come to the mobile data sets that people have been talking about and which is, of course, attracting a great deal of interest.
Now, as an organisation that seeks to do research that is pro market and pro poor, we are interested in big data that will have the greatest coverage of poor people.
So, for example, if we are looking at bank information or supermarket information, which is available in computer readable form, we would be looking at very small subsets of rich people in the countries that we work in. In actual fact, there's only one data set that has comprehensive coverage of everybody, and that is mobile data. I would even argue that fixed telephone data, which some people have used for research, for example in the UK, will not do the trick in our countries, since a minuscule minority uses fixed phones.
Now, when it comes to this data, we actually have obtained these data sets from multiple operators. So we are not talking about the subject in the abstract. We are getting our hands dirty, working on the legal niceties of getting access to the data, and of storing it. Just for example, recently I was on an aircraft with two 5‑terabyte hard disks in my suitcase, and I didn't even know what a terabyte was until a few years ago, and now I'm hauling around 10 terabytes of storage, because that's the kind of volume that's coming in.
What are the kinds of questions that mobile data allows us to address? Quite a number of development agencies and development thinkers are beginning to see cities as the engines of growth in the 21st century and there's a great interest in unclogging our cities. I think if you have been to any of our cities, Bangkok, Jakarta, you understand that we are almost immobile by now.
So we need to unclog our cities so that they can be the engines of growth. So there's a lot of interest, and we wanted to focus on the size of cities, the transportation patterns, perhaps even, for example, the divides between different parts of cities: do the rich parts and the poor parts actually communicate, and so on.
Now, for this purpose, when we negotiated access to the data, we asked for historical and anonymized data. I want to emphasize, and Robert mentioned realtime, that I do not want at this stage of the game to touch realtime data with a barge pole, because there are many serious implications that flow from realtime data, and because too many government people have watched Minority Report and they have a certain interest in predicting the future.
So my preference is, of course, to work with historical and anonymized data, and from the get‑go, that's all we are working with. What is, I think, unique about our work, compared to a lot of the big data research that's coming out, is that we are not dealing with a single phone company's data. We are dealing with multiple phone companies' data. And to conclude, I would say that a lot of things that we talk about in the abstract, once you get into the data, you find are not necessarily the case.
So, for example, I assumed that we would have the mobility data from the base stations as a SIM moves from place to place, that we would be able to get this data. The majority of the operators that we talked to flush this data. They don't keep it. The different companies do not collect the same information because, you must remember, storage costs money. And these people are scrambling to store the information, to have the data analysts to work on it, and so on.
So, as I think you gathered from Karan's comments, they are also flying blind to some extent. They are not exactly sure what they are doing, and they want to do the right thing, I believe. They want to do what is good for their companies, and they are doing some work that is, for example, helping them to identify the customers who are about to leave, and so on and so forth.
The last point I want to make: you see, you cannot come at this problem with old mind‑sets and western frames. So, for example, in the west, it's relatively easy to think about a mobile connection or a number as being associated with a particular human being. In our countries, it is common for people to go around with five SIMs. It's known that some people have 70, 80 SIMs illegally registered in their names by the people who sell SIMs. I can tell you how these things happen during discussion time.
So there's no relationship between a SIM and an actual human being for the most part, and most of our people, 95% to 99%, are on prepaid accounts, not on postpaid. The whole idea of the classic informed consent framework being applied to big data, I think, is quite problematic, and as somebody who was present at the founding of Privacy International and has written quite a bit on privacy, I think this is an area where we have to be careful not to bring old thinking into a qualitatively different problem area. Thank you.
>> DAVID GROSS: Thank you so much, Rohan.
Before turning to Pat in a moment, two things. One is to remind you that after Pat finishes his opening remarks, we will be going to all of you, and by all of you, I mean those who are online as well as those of you in the room. And second, on behalf, I think, of everyone here, I would like to thank the GSMA for organizing today's extraordinarily interesting panel. Pat?
>> PAT WALSHE: Thank you for your patience in waiting to get the technical issues sorted out. My name is Pat Walshe. I work with the GSMA. The GSMA is a global trade body that represents 800 mobile phone operators around the world. I'm an ex‑chief privacy officer, and I ran a team that handled government interception.
There are a couple of things that were said that struck me. We talk about big data in the abstract; actually, to use Robert's term, it's an ocean. Well, for me it's rivulets of little bits of data, little bits of data that reveal very private aspects of people's behavior. It's increasingly rich and contextualized. What do I mean? When I hold this thing, it imparts data. In the world that Rohan described, people are moving to smartphones in different environments, and location might be not just where I am; it may also be where I usually am not, which says something about me. It's about the direction of travel. It's increasingly about the people and the things that I'm also connected to.
And when we talk about mobile data, I don't think we are quite clear. Mobile data can be many things. It can be the data on this device. I have relationships with the operating system; it collects data about me. The device manufacturer collects data about me, the mobile operator collects data about me, the app that I'm using collects data about me and shares data with third parties for advertising. So it's quite complex. In the developing environment, I think what Rohan is talking about is the use of call data records, which are produced and generated by mobile phone networks. That is extremely rich, and that is the only single view you will get of location movements in a country, to be honest, and that's why it's unique and that's why it's powerful.
I'm very passionate about this because I believe mobile technology has demonstrated to date the ability to transform people's lives, to empower people, and I think now data has the same ability to empower and transform: to empower individuals to be aware, to manage their health, to transform societies that are suffering from air pollution, et cetera.
But I think we've got some real challenges, and Alex referred to some of those, which is that while this is a globally connected world, you know, we have a patchwork of geographically bound laws that try to deal with privacy, where data flows between multiple parties in realtime, simultaneously.
I think we also have the issue, when we talk about mobile data, of course, that mobile operators are regulated in ways other entities are not. So where an Internet based company may collect your location data, for example, they wouldn't be regulated in the same way.
Some of that regulation has created some unique privacy protections, I think. As an organisation, many of our members operate in countries where there are no formal data protection legal frameworks, and that's why we established mobile privacy principles to help our members establish a baseline. We have app guidelines to help them design apps with privacy in mind. But when it comes to big data, I think Rohan talked about it: how many of you remember Haiti and the earthquake in Haiti?
Mobile data helped to understand where people had moved to. When we knew where people had moved to, the agencies were able to target and drop food and shelter, et cetera. I think also it helps to understand the spread of disease in some countries too. It's been very powerful, but for me, I think the urban planning environment, the need to reduce air pollution, is very striking because that raises the very real question about the rights of society and the rights of individuals and how to achieve a balance in a way that respects privacy and protects privacy.
So what do I mean by that? You know, Rohan alluded to Bangkok. I was in Malaysia recently, and I noticed China issued an air pollution alert in Beijing. And how many of you commute into cities, and at the end of the day, you leave? You pollute and leave. You leave the problem behind. Mobile data can help understand and manage more intelligently those flows of traffic. It can help to reduce air pollution and noise pollution for the communities and the individuals who live in those communities.
I went to a smart city event in Brussels, where it was said that it costs 1 billion euros a month to deal with the external costs of air pollution, because 70% of those problems are respiratory problems. So mobile‑driven big data can help to reduce some of these external costs. It can help to transform people's lives. Let's not focus obsessively, as I seem to hear, on the fact that somebody knows what DVD and what video I have watched online. Let's focus on some of the broader, big societal issues, and let's ask the question: are we able to use the data in wise and responsible ways to address some of the most pressing problems that societies face?
Is it a human right to have access to clean water? Many people at this conference would say yes. Is it a human right to have access to clean air? Many people at this conference would say yes. Is it a right to have access to healthcare? Absolutely. Data can help with these. I think we have to find a different way to do this.
I will wrap up quickly. What do we need? We talked about anonymization. In this agile world, I don't think the current privacy impact methodologies can understand these risks and mitigate them in an appropriate way. Anonymization: what's the latency, how long is data kept in a non‑anonymized form versus an anonymized form, and will anonymization remove the other risks associated with a huge database of people? I don't think it will. We need to be careful of that, because it creates security risks.
So, you know, in many countries, as Rohan was referring to, people live in distinct groups, and so data can help you understand where they have moved to, but of course, in the wrong hands, that might be an issue. How do we create the right algorithms and the right encryption? At the moment, there are different approaches to this.
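Pat's caution that anonymization won't remove the risks of a huge database of people can be illustrated with a toy re-identification check. This is a minimal sketch with invented pseudonyms and (cell, hour) points, not any operator's real data:

```python
# A minimal sketch of why pseudonymized location traces resist anonymization:
# even with names removed, a few (cell, hour) points can pin down one user.
# All identifiers and data points here are invented for illustration.
traces = {
    "pseudonym_1": {("cellA", 8), ("cellB", 12), ("cellC", 18)},
    "pseudonym_2": {("cellA", 8), ("cellD", 12), ("cellG", 18)},
    "pseudonym_3": {("cellE", 9), ("cellB", 12), ("cellF", 19)},
}

# Suppose an attacker knows where a target was at just two times of day.
known_points = {("cellA", 8), ("cellC", 18)}

matches = [user for user, trace in traces.items() if known_points <= trace]
print(matches)  # ['pseudonym_1'] -> two observations already re-identify
```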
I think we also need to understand, therefore, that it's not just about PII, and in this big data space, I don't see any common taxonomy. Everybody in this room probably has a different point of view on that.
I think a code of conduct is also necessary, partly because laws don't exist in many of these regions.
And therefore, something has to be done and that's for the stakeholders to come together, I think.
That's it for me.
>> DAVID GROSS: I hope that's not it for you. I hope we will have a lively conversation. You have been provocative.
I would like to turn to the rest of you now and see if we have other comments or questions.
>> PARTICIPANT: A month ago I was in rural Tanzania, and I was staying in a village where the villagers live on average on about $1 a day. You are talking about mud huts, no running water, and aid projects coming in to bring clean water. There was one mobile phone in the village, and a separate person had a generator to charge that mobile phone. No one else could afford one. There's no electricity. So you are making a big deal about the policy conclusions that can be drawn from mobile phone data in helping the poor.
In my experience there, it wouldn't necessarily be that useful for the poorest of the poor because they don't have the mobile phones to be used to get that data, to do that analysis.
So it will help down to some level of the poor, but the poorest of the poor will still be missing in that data. So that was just a comment I was making, and whether that was being taken into account.
>> DAVID GROSS: Thank you very much. Would either Robert, Karan or Christian like to comment? I can't see you. Usually people nod at me when they want to comment. Let me go there first, and then I have a bunch of people nodding here.
>> ROBERT KIRKPATRICK: This is Robert. There are certainly concerns around this data. I would certainly say, if you are talking about a poor community where there are no mobile phones and no electricity, then you are not producing data and you are not represented in big data, and therefore there's going to be bias. You certainly wouldn't want decisions made supposedly in your interest based on an analysis of which you are not part.
You know, our view is that this is a brand new field, right? When people lose their jobs, when they get sick, when they begin to struggle economically, we don't know what the signature of these changes in well being looks like in the data yet. It will take us several years to develop the partnerships, the privacy protection methodologies, the technologies and the analytical understanding of what to look for in the data, and by that time, we are expecting the private sector to have found ways to close the digital divide further, so that more and more people have phones.
We are trying to think about how we can be operating from a policy perspective in five years. I would note that big data is also creating a new digital divide, because data centers, even if everything is free and open source, have a huge air conditioning bill, and this kind of analysis is putting very different kinds of power in the hands of a very small number of elites today.
>> DAVID GROSS: Thank you very much, Robert. Alex?
>> ALEXANDRINE PIRLOT DE CORBION: I mean, we completely agree, and it's something I brought up yesterday on big data, development and privacy. Big data, because of its nature and the way it works, is discriminatory and exclusionary. It excludes people that don't use the Internet, that don't use Facebook, that don't shop online. You know, all of these things that interconnect, where the data is collected from different sources, don't include the poorest of the poor. And that's why we started working on this aspect of development programs and humanitarian aid, because if the purpose is to help, as they say, the poorest of the poor and those who really need the help, and if they are not included in the decision making processes and the data used to develop these policies, then those policies will be inefficient and inadequate and they won't solve the roots of the problem.
In Africa, less than 10% of the population is connected to the Internet and that's where a lot of programs are being deployed. So that's definitely a concern.
>> DAVID GROSS: Rohan?
>> ROHAN SAMARAJIVA: One must always be careful about drawing conclusions from one observation.
We do surveys. We have been doing large sample surveys, 10,000 samples, 12 languages, across the bottom of the pyramid, including in Bangladesh. When we ask if people have used a telephone in the last three months, 99% have used one. Not one that they own, but one they have used. I think the data for Tanzania are collected by some of our colleagues.
The data that's available, everybody knows that nobody has perfect data. You are talking about the best available data and the best available is not perfect, never is.
The other issue, of course, the more interesting question, is that of collective privacy. I have some interesting stories that I can tell about collective privacy. That is, you can never tell what an individual person in that village is doing, because the phone is being used by everybody in the village. I think the people who theorize about this have no idea about collective privacy rights, and one of these days, when I get a bit of time, I will work on it.
>> DAVID GROSS: Thank you, Rohan. Pat?
>> PAT WALSHE: I think you raised an exceptionally valid point, and our association is busy working around the world, including in Africa, to build out more networks, more capacity, and I think that really is important. You do have a section of society excluded because of technology, and more and more, our relationships with governments and corporations are determined by our access to technology.
I just wanted to say one other thing on this: the other thing that I see that's consistent across this space, because people are enmeshed in different relationships, is the lack of consistency in the approach to privacy, and I think, you know, individuals deserve for stakeholders to come together to find some common approach.
>> DAVID GROSS: Thank you very much. We will take a couple of questions together because I know there's a lot of audience who would like to participate. And then we'll seek responses.
Mike, you had one?
>> PARTICIPANT: Real quick. Mike Nelson with Microsoft, and also I teach Internet studies at Georgetown. Alexandrine, you are outnumbered here. These people are passionate about big data and you are passionate about privacy. I wanted to give you an opportunity to share more examples. We heard about cell phones fighting epidemics and helping to clean up after the Haitian earthquake. I would like to hear an example where cell phone data was clearly linked to some terrible outcome, either for an individual or for a group of people. That often can motivate good policy; even a bad example can motivate good policy. And I would be particularly interested in whether you think we need to do what Pat said and rethink the way we classify different types of data. He said that the old PII model may not work when some of this data isn't thought of as PII data but could be used to extract very personal information.
>> DAVID GROSS: Thank you, Mike.
And while we give Alex more time to think of the perfect couple of examples, we will keep going.
>> PARTICIPANT: 30 years in the data protection authority in Europe. First, I would say that it's true that for any study that is needed and of public interest, you don't always have the data. Even in Japan, with the terrible problem they had, some of the data did not exist, and everybody had been collecting the data they needed; one example was the radiation level.
Okay. So you need the right data for the right problem. Now, I completely agree that there might be a public interest to derogate from the secrecy of communication, which is accepted all over the world, hmm? Secrecy of correspondence. It means who is calling whom. Okay, and even if a phone is used by many, you have to think about personal data. And as to anonymization, give me the example and I will tell you whether it is anonymized and why it is not. It's a case‑by‑case question.
Now, we agree that there might be derogation in the name of public order: who decides that for management of traffic on the highway you can use mobile traces? Hmm? I agree. Who decides? How long do you keep the data? Hmm? Who has access to the data? And so forth.
I am sure that for public interest, first we have to know who decides. Hmm? Not hidden. Who decides, in a democratic way. And secondly, how the data processing is operated, and this must be public. Thank you.
>> DAVID GROSS: Next. Thank you.
>> PARTICIPANT: I think one of the very important things, if you are trying to get the right balance between the use of big data and privacy, is to frame the question you ask people correctly. If we take the example you have just given of collecting mobile data: if you frame the question as, are you happy for your mobile data to be collected as you go along in your car on your vacation, and you say we want to do it because we are going to try and reduce air pollution, or we want to do it because we're going to reroute you round the fastest route, people will all say yes.
But if you say we are doing it because we want to make sure that no one is speeding, then they will all say no. And so, I mean, that's maybe a rather trivial example, but it's incredibly important to frame the questions for people in an appropriate fashion to get the balance you are seeking.
>> DAVID GROSS: I know we have others, but let's take that as one block. We will go to Christian first, because I understand he's trying to come in, if he's still online.
>> CHRISTIAN REIMSBACH-KOUNATZE: Thank you very much. I would like to respond to the last point, because I think it's related to another point that I wanted to make, which is the democratization of big data. A lot of the examples we have heard so far are basically still about citizens or individuals being, if you want, the subject of big data, and then you have a small group of users, like researchers or big companies or governments, using the data for, well, I mean, well understandable reasons, like preventing natural disasters or dealing with natural disasters. I think this is one aspect of big data, but we shouldn't restrict ourselves to those examples. Now, to link to what the last speaker just said, I think we need to let people really actively participate in the big data discussion and exchange. So it's not only about asking people and explaining to people what we want to do with the data. Yes, many people would agree: if you use the data to prevent disease outbreaks and so on, or to reduce pollution, most people would agree that, okay, I'm okay with sharing my data. I think this is one aspect, and we don't do it very well so far.
The other aspect is really also letting people themselves use the data. Obviously, not everyone has the skills to use the data and do big data analytics.
Also, I would like to argue that nowadays we have tools in place that make it easier for even non‑data scientists to use the data.
And we should also look at that aspect, which we don't do so far. So I would like to end here with that.
>> DAVID GROSS: Thank you very much. Alex?
>> ALEXANDRINE PIRLOT DE CORBION: I do have a couple of examples, actually, that were part of my presentation. So it's great.
The first one I will mention is the data for development initiative from Orange Telecom in Cote d'Ivoire. From that data, researchers were able to draw conclusions about social divisions and segregation based on language, religion and political persuasion, political stance, which, in the context of Cote d'Ivoire, can have an impact on the elections.
In the aftermath of the antigovernment food protests in 2008 in Egypt, the authorities used collated text message data to identify protesters and convict them.
The initiative at the beginning is a good initiative but then it was used for, you know, a wrong purpose.
In terms of the personal data, I don't know if I understood the question correctly.
>> PARTICIPANT: Well, I was following up on what Pat said. He said the old idea is that we have PII and we have other data, and that model may no longer apply; we may need a finer grained system or another way ‑‑
>> PAT WALSHE: So there's new context, and focusing on the fact that it's Pat Walshe doesn't really do it. There are other contexts and data that could create security risks for me, and it's the context that we should be looking at, rather than just a set of data.
>> ALEXANDRINE PIRLOT DE CORBION: Yeah, just on that note, I mean, there are key examples. For example, we are working with a partner in Kenya, working with HIV/AIDS patients, and Kenya is putting in place a lot of e‑health services so that HIV patients get reminded about their appointments and about getting their medication, and they are using mobile technology for that. But it means somewhere down the line this information can be accessed by the government and by religious groups. And there's a high stigma, unfortunately, for HIV patients. So maybe that's one of the examples on that, the dangers of, you know, the assumptions that can be made in a personal context, and not only in developing countries; even in the west that can have an impact.
>> PAT WALSHE: Well, I would like to address the two points made by the lady in the corner. They raise some fundamental issues. One is, you talk about European law and mobile calls, so traffic data, which is regulated by the privacy directive in Europe. But for me, that doesn't do it, because we need to be looking at the context: for example, the traffic data generated by mobile networks is regulated by the privacy directive, but the equivalent data processed by Internet players, VoIP and WhatsApp messaging, is not regulated.
If we don't see the same privacy protections emerging in that other space, to what degree does the regulation create good privacy protections? We have to look at what risks we are trying to address now, because the risks are not the same as when the laws were introduced. That's a fundamental question. And then there was a really good point from the corner about the way that people feel and how you put the question to them. But another question is, you know, about choice and the limits to choice, and that same regulation says that those that are subject to it must have erased or anonymized the data on termination of the communication. So if you are complying with the law and you have anonymous data sets, is there a problem using that data set to understand the movement of traffic?
>> DAVID GROSS: Pat as always is provocative.
We will do another couple of questions together because we are starting to run out of time.
>> PARTICIPANT: Hi, Lynette Taylor, Oxford Internet Institute. This is a good and substantial discussion. Thank you all. I wanted to ask a question about the configurations of research that are optimal for this kind of work in developing countries, because the matters that Pat Walshe brought up, the problems of clean air, clean water, access to health care, these are excellent examples of things which don't have an engineering solution, where you need political will and you need human rights and you need country level understanding, capacity, enforcement.
So how do we not treat big data as the solution, but as a symptom of the problem and as a guide to how to address it? Who else should get involved along the way, and how do we stop it from being only the data scientists who receive the data and the information? And the other question is back to the GSMA: I think this is problematic, because there's no advance in anonymization which has not been followed by an advance in the ability to hack and to reidentify people.
So how should we conceptualize anonymization in terms of the data that we are thinking about here, and in terms of the collective privacy that Rohan just mentioned, which I think is hugely important? Thank you.
>> DAVID GROSS: Excellent. Over here.
>> PARTICIPANT: Thank you, John Dupre from Northwestern University. We are hearing from Christian and Robert, and the previous commenter, about data scientists, which really, when you come down to it, means we're talking about statisticians. We are in a panel about big data, and could I ask the room for a show of hands: how many people in this room actually have a working knowledge of statistics?
That you actually know what a p-value is, if that makes any sense to you. For the discussion of big data, that's in some part essential for your understanding of the complexity of the problems involved. And, I mean, at some level, you know, that's one of the capacities that you have to build among policymakers for any discussion of big data, both the strengths and the weaknesses, what it can do and what it can't do, because if you don't know the underlying rules of statistics and how statistics work, you really don't know what you are talking about in some sense. You don't know the limits.
So I just wanted to throw that out at the panel, about the need for statistical literacy. Thank you.
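For readers following along: a p-value is the probability of observing an effect at least as extreme as the one measured, if chance alone were at work. A minimal, self-contained sketch with made-up numbers, using a permutation test rather than any particular statistics library:

```python
# A minimal sketch of what a p-value is: a permutation test on two small
# (made-up) samples. Counts how often random relabeling reproduces a gap
# at least as large as the observed one.
import random

group_a = [2.1, 2.5, 2.8, 3.0, 2.6]   # e.g. metric under condition A
group_b = [3.1, 3.4, 2.9, 3.6, 3.3]   # e.g. metric under condition B
observed = abs(sum(group_b)/len(group_b) - sum(group_a)/len(group_a))

pooled = group_a + group_b
n_extreme, trials = 0, 10_000
random.seed(0)
for _ in range(trials):
    random.shuffle(pooled)
    a, b = pooled[:5], pooled[5:]
    if abs(sum(b)/len(b) - sum(a)/len(a)) >= observed:
        n_extreme += 1

# p-value: how often chance alone reproduces a difference this large.
print("p ~", n_extreme / trials)
```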
>> DAVID GROSS: Joe, I know that you always know what you are talking about. So why don't you have a question?
>> PARTICIPANT: Well, I think when we have been talking about big data, in most of the cases we have been talking about the idea that the more data you throw at a question, the better your answer may become, and whether that's true or not is a significant question. But we really haven't been talking about big data in terms of the potential for correlation. One of the best examples is a hospital in Canada that specializes in premature births. Instead of just realtime tracking of the information for the nurses, they started to capture the information. And they came up with a finding that was counterintuitive: 24 hours before a baby spikes a fever, it turns out all the baby's levels stabilize, which would make one think that the baby is doing better. They don't know why the levels stabilize, but they figured out that in 24 hours the baby would spike a fever that could be fatal. So they started treating the baby at the time the levels stabilized. They decreased infant mortality and improved patient outcomes. That's all based on a correlation, not causation. They don't know why it happens. They guess, but they don't know. And so part of the issue that comes in as a challenge to privacy is the fact that privacy, which is consent based, requires a specific permission for a use of information.
But the problem is, if you are looking at correlations of data, you can't ask for a specific permission in advance. One of the things that the OECD is looking at is, if you have a previously collected data set that was authorized for use in one case, is there a possible use case that you could develop, in which you talk about the controls, the access limitations and the validity of the use, but where you might be able to use that information for a purpose that was not identified at the time of the collection of the information. And it reminds me: the OECD had a panel on big data, and one of the closing comments was made by one of the French delegates, a researcher and entrepreneur, who reminded everyone that they had to be responsible for their use of big data, but that they equally had to be responsible for their failure to use big data. So both sides of that equation are important, and we have to remember that we need to control and address the risk, but we shouldn't forgo the opportunity, because those are both important concepts.
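The premature-infant story above is a correlation turned into an early-warning rule. Here is a minimal sketch of that idea, with synthetic vitals and an invented stability threshold, not the hospital's actual method:

```python
# A minimal sketch of the correlation-based early-warning idea described
# above; the readings and the threshold are invented for illustration only.
from statistics import pstdev

# Hourly heart-rate readings for one infant (synthetic data).
heart_rate = [148, 155, 143, 158, 150, 146, 152, 151, 150, 150, 150, 150]

WINDOW, THRESHOLD = 4, 1.0  # hours; variability floor that triggers an alert

for t in range(WINDOW, len(heart_rate) + 1):
    window = heart_rate[t - WINDOW:t]
    if pstdev(window) < THRESHOLD:  # suspiciously stable vital signs
        print(f"hour {t}: variability {pstdev(window):.2f} -> early alert")
        break
```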
>> DAVID GROSS: Thank you. There was a question over here.
Thank you.
>> PARTICIPANT: Thank you very much. My question to the panel really relates to the question of trust. Because I have heard some terrific uses of big data, and I certainly think it's a very valuable thing, but we're at a point where, over the last couple of months, a lot of our trust has been very severely eroded. It relates to the surveillance revelations, and it relates to the collection and use of my personal data for marketing purposes, without my permission, by large corporations. My reaction over the last few months is that I have moved my server out of the United States. I stopped using Google Plus. I stopped using Gmail, I have locked down my browser, and I am looking for an alternative to Chrome. And if I could find another way to keep in touch with family, I would get off Facebook. I'm changing my whole pattern of behavior on the Internet. What measures do you suggest would be good ones to restore trust? Thank you.
>> DAVID GROSS: We have now had a series of interesting statements and questions. We'll go for one more, but we are running out of time, and I know we need to go to the phones, as they like to say.
Yes, ma'am?
>> PARTICIPANT: Keitha Booth, New Zealand. Is there big data that governments hold that should be released for social gain? We have been focusing on personal data and the obviously critical privacy implications. But is there big data that isn't out there, that is public data but not used, that should be?
>> DAVID GROSS: Very good.
Let me ask Robert or Christian who may be on the line if they have comments before we go to those here in the room.
>> ROBERT KIRKPATRICK: Robert here. I could very briefly note, you know, our interest in analysis is not individuals, because our interest is in policy action. As I mentioned, we don't care whether someone lost his job and had to sell the family cow. We are interested in knowing that there was a 300% increase in the propensity to sell livestock in a particular district of a country at a suspiciously unusual time of the year. We are looking at aggregate analysis of the information, and we are looking at a justifiable policy response. For that reason, we never receive personally identifiable information from any of the partners who share data with us. It's never confidential at the time it's shared. That's a good starting point.
Pat was alluding to this: it's becoming clear that all the data that is produced by using digital services is produced in unique ways, because we use the services in unique ways, which means anonymization is probably impossible, particularly with individual behavioral records, even if you don't have phone numbers and other information included.
So this notion of moving towards a risk‑based approach, I think, makes sense. It's not clear what that would look like. But, you know, if you asked people whether they would be willing to share their data, even if you were dealing with PII, and there were a public good or a personal benefit to them, they would probably, in many cases, be likely to do it.
When we are talking about data that isn't even PII, but where there's some nonzero risk of reidentification and potential misuse, we need to be having a public discussion that recognizes that any scenario where there is a public good is still going to involve a kind of tradeoff, and we need to move towards having a public and policy framework that's educated about making those kinds of decisions. We are not there yet.
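A minimal sketch of the kind of aggregate-only analysis Robert describes: flagging a district where an indicator spikes against its own baseline, with no individual records involved. The district names, counts and alert threshold are invented for illustration.

```python
# A minimal sketch of aggregate, non-PII anomaly detection: flag a district
# where a livestock-sale indicator spikes versus its own baseline.
from statistics import mean

weekly_sales_signals = {           # counts of an aggregated indicator
    "district_A": [10, 12, 11, 13, 12, 11, 12, 40],  # last week spikes
    "district_B": [20, 22, 19, 21, 20, 23, 21, 22],
}

for district, counts in weekly_sales_signals.items():
    baseline = mean(counts[:-1])   # all weeks except the latest
    latest = counts[-1]
    change = (latest - baseline) / baseline * 100
    if change >= 200:              # e.g. a "300% of baseline"-style alert
        print(f"{district}: +{change:.0f}% vs baseline -> investigate")
```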
>> CHRISTIAN REIMSBACH-KOUNATZE: I would like to add two points. One is that the example that Patrick just highlighted shows that in many cases you don't really need personally identifiable data in order to gain the benefits. It's true that we tend to focus a lot on, call it the micro aspect of big data, which is targeted at individuals. Yes, it's true, that's probably the type of data set that the marketing companies are using, because they want to have personalized advertisement. But in many cases, when we are talking about the social benefits, one doesn't need to be at the micro level. I think it's perfectly useful to work at the macro level. That is one point.
The other point is, we also tend to think about big data as big personal data, and it's also true that many of the data sets out there that can bring a lot of benefits are not personal. I'm talking about weather data, business activity data. I'm talking about public sector activity data that can also bring a lot of benefits. Obviously, we tend to focus on the privacy aspect because it's the most challenging part, but we shouldn't restrict the discussion about big data to those aspects.
>> DAVID GROSS: Thank you very much. Alex, do you have any comments? And I should note that these are all also in the form of closing comments because we are already at our closing time.
>> ALEXANDRINE PIRLOT DE CORBION: Just to bounce back on some of the comments about consent: it's definitely something that's a problem, because of the nature of big data. There's no way of finding out what the sources are, what the origin of the data is, and identifying it. And that's linked to the person being unidentifiable. Even if you are using sources that give data that can't be identified to a specific person, there's still an issue of consent.
The other one was the point just now about, again, identifiable data for social good: the mere fact of correlating this data in one database, just the existence of this database in one form or another, can impact privacy without, at that moment in time, appearing to challenge privacy. So it's just thinking about the short and the long term of that.
>> DAVID GROSS: Thank you very much. Rohan?
>> ROHAN SAMARAJIVA: I think in my comments, I said that we cannot and we should not impose simplistic regulation on this notion of big data. I can see that a mobile company would need your data to plan and manage its network.
It needs your purchasing behavior, your credit behavior, et cetera, in order to manage the financial aspects of its relations with its customers.
So if these concepts are imposed mechanically on the phone companies, what will happen is that they will still be able to use the data. They will be able to predict your behaviors, because that will all be covered by conventional consent agreements. The only thing is that we will not be able to use it for traffic management, for figuring out how to develop our cities, or to remove the constraints that affect our development. That is all that will happen, because the marketing aspects, the fine tuning, all that will happen inside the companies, because that can be covered by generic informed consent rules.
I think this is like in the early days of the motor car, when there were some regulators, some governments, that asked people to walk in front of the motor car with a bell, I believe, because that was their concept of how that particular new technology should be regulated. So I think we are seeing a bit of a repeat of that, and I think we need to be a little more imaginative than that.
And I would also say to our colleague who talked about the knowledge of statistics: I think one of the things that I have understood in the big data field is that you cannot have one person doing all the work. It is done by teams, and we do have statisticians, data people, all kinds of people in our teams, and we don't do this without that knowledge. Thank you.
>> DAVID GROSS: Pat?
>> PAT WALSHE: Thank you. Well, I think for me, it is about risks and managing risks, and coming up with a framework to ensure accountabilities are assigned across this distributed ecosystem. I think we do need a new PII methodology for identifying the risks in what is an agile context. We don't have that. We need technology solutions, and whether that's from a data scientist's point of view, for example, and we have such people in the room, you know, how do we have privacy‑preserving and protective algorithms? People are pushing at the door, and that's fantastic. And I think codes of conduct.
And I think the other thing, to take the point that was mentioned earlier: yes, there are risks that emerge from the data being stored, even in anonymized ways. So how do you establish clear legal frameworks that regulate this by law, exactly: it's set in law, it's transparent, it's justified and it's proportionate? How do you do that in the global context? That's another challenge.
>> DAVID GROSS: Well, time has left us, as it often does when these things are so interesting. I think this is actually a remarkably easy panel and session to summarize: it's confusing.
(Laughter).
It's constantly changing. And it's incredibly important. With that, I think we should thank our extraordinary panelists, both here and remotely. Thank you very much.
(Applause).
(end of session 16:05)