The following are the outputs of the real-time captioning taken during the Tenth Annual Meeting of the Internet Governance Forum (IGF) in João Pessoa, Brazil, from 10 to 13 November 2015. Although it is largely accurate, in some cases it may be incomplete or inaccurate due to inaudible passages or transcription errors. It is posted as an aid to understanding the proceedings at the event, but should not be treated as an authoritative record.
***
>> GEORGE FERNANDEZ: Hello. Hello, good morning, everyone. We will start today's session; it's about 30 minutes long, and it is ten minutes past 9:00. I suppose my presentation will take maybe 20‑25 minutes, and then I will give you ten minutes to ask questions and raise doubts. I hope so, because I speak a lot; if I end up speaking through your ten minutes too, I apologize.
My name is George Fernandez. I come from the Ministry of Education and Science of Portugal; more precisely, from a public institute, the (speaking foreign language), where we have a unit on accessibility to digital information. We have had this unit in the Portuguese public administration since 1999, and I think you will see why we have it.
I'm here to present some information about our web accessibility observatory: how it works, what kind of algorithm we use, and some of the data we have.
So Portugal was the first Member State of the European Union to adopt the Web Content Accessibility Guidelines. As you may or may not know, the final W3C recommendation dates from 5 May 1999, and Portugal adopted it on 26 August 1999. It was an interesting movement because it started as a civil movement, with a petition, the first electronic petition to the Portuguese Parliament, on 3 December 1998. 3 December is the International Day of Persons with Disabilities. So the movement started in 1998, and the government responded with a law in 1999.
Nowadays we are already on version 2: since 2013 we have been using version 2 of the Web Content Accessibility Guidelines. And at the moment there is also a strong movement in Europe working on a directive on web accessibility. This directive is the near future of web accessibility in Europe.
If we take a look at some data from our observatory, from 2000 until 2011, we see in this picture that we have two lines. The blue one represents conformity with level A, which is the minimum level of web content accessibility, measured on the home pages of the websites of the Portuguese public administration.
You see another line, the red one, I think. Let me check. No: the red one is the conformity, and the blue one is the symbol of web accessibility, the presence of the symbol of web accessibility on the home page.
And as you see, in 2011 we had 75% of web pages conforming with level A, and 89% of those pages displayed the symbol of web accessibility on the home page.
You can see in this graphic that from 2006 until 2008 there was a big improvement. I don't have time to explain why, but you can see that there was a big improvement in the conformity levels; strong political commitment makes this possible.
This is only the first page, the home page, and you could say the first page alone is an easier task. But we analyzed more pages, with this information collected per ministry; we have 15 ministries, and we have on the left the data collected in 2006 and on the right the data collected in 2011. And you can see that we have a shift, a big shift, in the right-hand graphic. This is made with a score: we gave each ministry a score from 1 to 10 that summarizes all the accessibility practices, the good practices and the bad practices, that the ministry is following.
So even when we look at more pages, we see an improvement. This is internal information. But when we look, for example, at international benchmarking, like the last United Nations study in 2011, we see that Portugal ranked second among 192 countries. So this corresponds to the information we have internally, and that is a good thing for us, of course.
I'm sorry, in any case that was the scenario until 2011, and here is my best conclusion from the data we collected in 2015, this year. And this year we see that the picture is worse than what we had in 2011. Our best finding is that on only 2% of the Portuguese public administration sites do 50% of the pages pass the Access Monitor test at level A. So I cannot take all the pages; I have to shrink it to the best 50% of the pages, and even then only 2% of the sites pass the level A test. So the picture really is concerning nowadays.
We are speaking in this case about more than 12,000 pages analyzed in the public administration, with a sample of 200 websites. To collect this information we have two ways of doing it: we can do it automatically or we can do it manually. In this case we are doing it automatically, using an algorithm, and this is the algorithm I would like to show you, and how it works.
It ranks all the pages of the websites with a score from 1 to 10, where 10 represents good practice. In this case, 5.4 is the average of the public administration in 2015. So if you scored 5.4 on a test, of course you would think: well, I need to study a little more to reach a higher score. As you see, it is a middling score of practices.
I have here some particularities of our algorithm. One of them is that we don't use the W3C levels, level A, AA and AAA, to guide the tests. We are using instead the concept of personas: people with limitations, limitations in seeing, limitations in hearing, limitations in controlling hyperlinks, also problems with hedging (phonetic). And this classification comes from the International Classification of Functioning, Disability and Health, from the World Health Organization.
This is a particularity of our tool when we compare it with other tools on the market. So we have a picture of practices, not strictly of errors: we see what is a bad practice and what is a good practice, and which practices need a manual check. We use colors for that: red for the bad findings, yellow for findings that need manual checking, and green for the things that are, in principle, good. So we have excellent, good, regular, bad, and very bad practices, and we have organized the information like that for each page.
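To make that structure concrete, here is a minimal sketch in Python of how findings could be triaged into those three colors and how a 1‑10 score could map to a qualitative band; the cut-offs and the finding format are hypothetical illustrations, not Access Monitor's actual internals.

```python
def practice_band(score: float) -> str:
    """Map a 1-10 practice score to a qualitative band.
    The cut-offs are hypothetical, chosen only to illustrate the
    excellent/good/regular/bad/very bad scale from the talk."""
    if score >= 9.0:
        return "excellent"
    if score >= 7.0:
        return "good"
    if score >= 5.0:
        return "regular"
    if score >= 3.0:
        return "bad"
    return "very bad"

def triage(findings):
    """Sort (outcome, description) findings into the report's three colors:
    red = bad finding, yellow = needs manual check, green = good in principle."""
    colors = {"red": [], "yellow": [], "green": []}
    for outcome, description in findings:  # outcome: 'fail' | 'warn' | 'pass'
        colors[{"fail": "red", "warn": "yellow", "pass": "green"}[outcome]].append(description)
    return colors
```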
So I will now show you a demo of our validation algorithm, which you can use to produce an evaluation report for a web page. To do that, we go to our website, this is the site of our unit, and if you scroll down a little you will find our validator. Obtaining a report is simple: for example, we enter the URL of this event and click review. We wait a little bit, and we already have a report. This report was produced by three servers at the same time, two of which belong to the W3C; those are the servers that analyze the HTML and analyze the styles of the pages, the colors, the typefaces and things like that. Cascading Style Sheets is the technical name.
So we have a number for this page, in this case 6.1, and it tells me that we have six errors, four of them at level A. When we scroll a little further down the page, we see the report organized by elements. On a page you can put text, you can put images, you can put forms, and all of these elements need to follow certain guidelines.
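As a rough illustration of that chaining to W3C services, here is a minimal Python sketch that asks the W3C Nu HTML Checker to fetch and validate a page and counts the errors it reports. The endpoint and JSON output are the checker's public interface; the surrounding scaffolding is an assumption, not Access Monitor's code.

```python
import requests

NU_CHECKER = "https://validator.w3.org/nu/"  # W3C's HTML checker

def count_html_errors(page_url: str) -> int:
    """Ask the W3C Nu HTML Checker to fetch and validate page_url,
    then count the messages it classifies as errors."""
    resp = requests.get(
        NU_CHECKER,
        params={"doc": page_url, "out": "json"},
        headers={"User-Agent": "observatory-sketch"},  # be polite to the service
        timeout=30,
    )
    resp.raise_for_status()
    messages = resp.json().get("messages", [])
    return sum(1 for m in messages if m.get("type") == "error")
```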
For example, images need alternative text. We see that we have one error. We click, and we find five images on the home page of the event. We see the image without text, so we go and look at what it is. And we have... could you read me what you see in this image?
>> (Speaker off microphone).
>> GEORGE FERNANDEZ: Right. And as you see, this is a snippet of the code, and we don't have any text alternative for this image. So, for example, assistive technology like screen readers, when they pass over the image, don't see anything; they only see empty space. And if you ask Google to index it, it also doesn't see this kind of picture, so it doesn't index that information.
Let's take a closer look, because automatic evaluations sometimes produce false positives and false negatives. We know that we have five images and that one of them has no alternative text, but we also know that some of them do have alternative text. So we take a look, for example, at this image that represents the name of the event; as you see in the picture, it is IGF 2015. And when we look at the alternative text, we find that the alternative text here, as you see, is empty. So for the technology it is impossible to index this information, to give these images some meaning, some semantics, to be useful to users. This is one of the tests. And I only want to show you one more, the one about links.
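A check like this is straightforward to reproduce. The following is a minimal sketch, assuming the page is fetched with requests and parsed with BeautifulSoup; it is an illustration, not Access Monitor's actual implementation.

```python
import requests
from bs4 import BeautifulSoup

def audit_images(page_url: str) -> dict:
    """Classify every <img> on a page: alt attribute missing entirely,
    alt present but empty, or alt carrying real text."""
    html = requests.get(page_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    report = {"missing_alt": [], "empty_alt": [], "has_alt": []}
    for img in soup.find_all("img"):
        src = img.get("src", "?")
        if not img.has_attr("alt"):
            report["missing_alt"].append(src)  # screen readers get nothing useful
        elif not img["alt"].strip():
            report["empty_alt"].append(src)    # fine if decorative, a bug otherwise
        else:
            report["has_alt"].append(src)
    return report
```

Note that an empty alt="" is legitimate for purely decorative images, which is exactly why such cases belong in the yellow, check-manually category rather than being flagged outright.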
So this page has 77 links. And Access Monitor found that... let me see. This is the first link. Second. Sorry. I think it's not that one. It's that one, sorry.
77 links. And two links contain only an image, an image without alternative text. Let me see...
I'm trying to find where the button is. Yeah, it's this one over here. We have two links that are composed of just an image, and that image has an empty text alternative; so it's like a link with blank text. You can see this is an image inside a link. This is the most problematic, the most concerning, the most dangerous, how to say it, accessibility problem: a link that contains only an image, where the image doesn't have an alternative text. A user who encounters that link never knows where clicking will take them, because to them it is an empty link.
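Detecting that pattern is also easy to sketch; again this is an illustrative check with BeautifulSoup under the same assumptions as above, not the tool's own code.

```python
from bs4 import BeautifulSoup

def find_empty_links(html: str) -> list:
    """Find <a> elements whose only content is one or more images
    with no usable alt text: 'empty links' for a screen reader user."""
    soup = BeautifulSoup(html, "html.parser")
    empty = []
    for a in soup.find_all("a", href=True):
        if a.get_text(strip=True):
            continue  # the link already carries readable text
        imgs = a.find_all("img")
        if imgs and all(not img.get("alt", "").strip() for img in imgs):
            empty.append(a["href"])  # nothing for assistive technology to announce
    return empty
```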
Those are two examples. You have more in our report; for example, the case of headings. In this case we have 14 heading elements. You can see the whole report and do this exercise with your own home page. Okay.
So the use of Access Monitor is completely free; you can use it to analyze this information. The tool is only in Portuguese. And with this algorithm we are also building an observatory internally, with a lot of information not only about a single web page but about sectors and ministries. For example, I have here the collection of errors that we found across the more than 2,000 pages we are analyzing in the public administration. And we see, for example, that HTML errors are the most frequent error in the sample; you need to clean up all the HTML errors to have a good page.
We have problems, for example, with images with an empty alternative text. We have links where the only element is an image. And we have forms where the labels aren't connected with the fields of the form. So for things like name and address, the label "name" is not connected with its field, and "address" is not connected with its respective field.
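That label-to-field association is programmatic, so it can be checked mechanically. Here is a minimal sketch under the same BeautifulSoup assumption; the heuristics are illustrative, not the observatory's real rules.

```python
from bs4 import BeautifulSoup

def find_unlabelled_fields(html: str) -> list:
    """Find form controls with no programmatic label, i.e. no <label for=...>,
    no wrapping <label>, and no aria-label/aria-labelledby."""
    soup = BeautifulSoup(html, "html.parser")
    labelled_ids = {lab["for"] for lab in soup.find_all("label") if lab.has_attr("for")}
    problems = []
    for field in soup.find_all(["input", "select", "textarea"]):
        if field.get("type") in ("hidden", "submit", "button"):
            continue  # these controls don't need a visible label
        has_label = (
            field.get("id") in labelled_ids
            or field.has_attr("aria-label")
            or field.has_attr("aria-labelledby")
            or field.find_parent("label") is not None  # implicit labelling
        )
        if not has_label:
            problems.append(field.get("name", str(field)[:60]))
    return problems
```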
So we are organizing our observatory with this algorithm: we collect the information and organize it, for example, with tags, which in our case are the various ministries we have. These work like tags; we aggregate the information by tag. And we have a synthesis of the errors we found, organized according to the W3C levels.
If we go deeper into this directory, we have also organized the directories. You can see the ranking of websites; for example, in the Ministry of Science and Technology, or Education and Science, we see that our organization is the second one listed here.
And then we have it by organization. In this case, for the data from the (?), we have a short summary of information here: the number of pages and also the ranking, the score it has. In these links we have global statistics for the website, and a list of pages organized by the level of practices found.
And then, when we enter a page to see where the errors are, the information is also organized by good and bad practices, as I showed you before. Then you can go into the code and correct it.
So, one last thing about the evaluation and the way we are doing it. As you see, almost all international benchmarks are collected with an automatic tool. We are thinking of introducing a checklist, introducing manual checking to verify some of the information, and merging it with the automatic results. So in the near future our reports will be collected both automatically and manually.
These ten cases we have here are based on the functional side of accessibility. That means: what are the ten biggest concerns of users regarding web accessibility? Users don't look strictly at images, forms, data tables and so on; they look at elements like, for example, the main menu. Can you use the main menu of your site with and without a mouse, using only the keyboard? Is it possible to navigate through the options and sub-options? So you see, these are elements that respond to the needs of the users. At the moment we are working on that, using our tool to make something like a search engine: for example, the experts say that on this page the main menu starts here and ends here, and we can ask our tool to find this element in the page samples we are collecting.
So we will also train the tool, teach the tool how to do that observation. And that's it. I don't know if you have questions or doubts; I think we have five minutes for them. I don't know if we have remote participation. We have some questions from ‑‑
>> Thank you for asking. We have a question from Reinaldo. He asks: I've heard about Access Monitor on some websites. It performs validation automatically on website pages, something like the W3C validator. How does it work?
>> GEORGE FERNANDEZ: Sorry, can you repeat that?
>> No problem. The question is: I've heard about Access Monitor on some websites; it performs validation automatically on websites and pages, something like the W3C validator. How does it work?
>> GEORGE FERNANDEZ: Well, our tool also collects some information from the W3C HTML validator. We send it the URL to be analyzed, we collect the number of HTML errors, and from those errors we compute a score that classifies the scale of the problem.
If you have one HTML error, you get one score; if you have ten errors or 100 errors of HTML, the score is lower. We collect this for one URL at a time: we put one URL into our algorithm, and that page needs to be publicly available on the web. So with the version we have online it is not possible, for example, to analyze pages behind a log‑in. But we have tools that we use internally to evaluate pages that require a log‑in, so we can supply the log‑in, analyze the page, and collect the information.
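The exact formula was not shown, so the following is only a hedged sketch of the idea, that more errors must yield a lower score; the bands are made up for illustration.

```python
def html_error_score(error_count: int) -> float:
    """Map a W3C HTML error count to a 1-10 score.
    The bands are hypothetical; the talk only states the monotone rule
    that more errors should produce a lower score."""
    if error_count == 0:
        return 10.0
    if error_count <= 5:
        return 8.0
    if error_count <= 20:
        return 6.0
    if error_count <= 100:
        return 4.0
    return 1.0
```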
So the procedure is very similar to what we have today with the W3C HTML validator.
Any more questions?
>> AUDIENCE: Okay. Thank you for your presentation; it's quite interesting. I work for W3C Brazil, and we also do some evaluation and assessment of web pages for accessibility.
I'm curious about how you establish those indicators for web accessibility on government pages in Portugal. We have some problems defining indicators for web accessibility because it's very difficult to define the universe of web pages: not all web pages are linked to each other, so we don't know the size of the universe. If you want to establish something statistical, a proportion or whatever you want to establish, it's very hard, because there are a lot of web pages not connected to each other. If you follow the links within the websites, sometimes it's impossible to collect all the pages. And if you just collect some sample, we realize that the sample is not a representative sample of the universe. So how do you define this? How do you establish the methodology for that?
>> GEORGE FERNANDEZ: Thank you for the question. It's a good one, the question of how to collect a sample. In these ten years of collecting samples we have already tried a lot of approaches. We tried putting a crawler on a website, taking, for example, 10,000 pages of the website, then feeding them to our evaluator and running the tests. And what we realized is that among those 10,000 URLs we found some that are not complete pages of the website; they are little fragments of pages.
So if we collect 10,000 pages of a website, we have no sense of what kind of pages are in such a huge sample; it's impossible to control it. And I think that when you are using an automatic evaluator, it is very, very important to have a real sense of the sample you are working on. So we tried another kind of strategy: well, Google is indexing all this information, so why not ask Google to give me the 100 pages with the best PageRank, the first results that Google shows us? And we made a script to collect the first 100 pages from Google.
But even with this kind of script we have the same problem. Sometimes Google indexes not a complete page but only a snippet, something it indexes on its own. And when we put it into our validator, we have the same problem again: not a page but a little piece of a page, which does a bad job in our tools.
So another way of collecting, and the one we have now established to control the sample, is to take the first page, the home page, as part of the sample, and then collect all the pages belonging to the domain that are connected to that first one.
So for every website our sample is the home page plus all the pages connected to it, only the first level. And we know that this is only a sample, and it may not be representative of the whole website. But this tool is also for the owner of the website, to give a first picture and to start correcting the information. And then we have another mechanism: when the owner of the site enters new pages into the validator, those new pages grow the sample we have collected from the website.
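That sampling rule is simple enough to sketch. Here is an illustrative Python version, again assuming requests plus BeautifulSoup; the observatory's real crawler will differ in the details.

```python
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def first_level_sample(home_url: str) -> set:
    """Build the sample: the home page plus every same-domain page it links to."""
    domain = urlparse(home_url).netloc
    html = requests.get(home_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    sample = {home_url}
    for a in soup.find_all("a", href=True):
        target = urljoin(home_url, a["href"].split("#")[0])  # resolve relative links
        if urlparse(target).netloc == domain:  # stay on the same domain
            sample.add(target)
    return sample
```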
So we have a starting point, the home page and all the pages connected to it, and then people get information even about the new pages, which allows them to do the job of monitoring the website. All new content can be monitored with Access Monitor. This information is collected and organized in these directories, and the owner of the site can access them to see the overall picture; even if a lot of people are entering information on the website, the owner can keep track of all the editors.
I don't know if I answered that... yeah. One more thing: we have an element of control that is very, very important, which is the number of HTML elements included in the analysis of the page. In this case we have 352 HTML elements. Our experience shows that when a page has more than 100 HTML elements, the analysis is robust. So what we have is, in principle, a good sample of elements.
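As a last illustrative fragment, that sanity check is nearly a one-liner; the threshold of 100 comes from the talk, and the rest is an assumption.

```python
from bs4 import BeautifulSoup

ROBUSTNESS_THRESHOLD = 100  # per the talk: >100 HTML elements => robust analysis

def is_robust(html: str) -> bool:
    """Check whether a page carries enough HTML elements for the
    automatic analysis to be considered robust."""
    element_count = len(BeautifulSoup(html, "html.parser").find_all(True))
    return element_count > ROBUSTNESS_THRESHOLD
```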
So, no more questions or doubts, and nothing more from remote participation? Okay. Thank you very much for your participation. I hope you enjoyed it; go and produce more accessible pages, follow the guidelines, and use our tool. Thank you.
(Applause)
(Session concluded at 9:52)