USER UNCERTAINTIES WITH TABULAR STATISTICAL DATA: IDENTIFICATION AND RESOLUTION

 

Carol A. Hert and Naybell Hernández

Syracuse University

 

 October 1, 2001

Final Report for Purchase Order #B9J03235

 

1.         EXECUTIVE SUMMARY

1.1. Study Objectives

United States government services are increasingly becoming Web-based, creating opportunities to make potentially useful, even vital, information and services more easily accessible to the citizens than in the past. This opportunity has challenged Federal agencies as they work to provide information and services that are easy to use and understandable to an extremely diverse constituency. Federal mandates requiring agencies to provide "universally usable" information and services have added further impetus to resolving the challenges.

The statistical agencies have been addressing these issues via a variety of strategies and approaches. The FedStats website and its related planning and research development activities have been one venue. National Science Foundation (NSF) funding for projects associated with statistical digital government has been another. The work reported on here was conducted in conjunction with an NSF-funded project investigating statistical information in tabular format.

Enabling universal access and usability of statistical tables can be modeled as a process in which a user with an information need comes to a system in order to locate and then use a table or tables of interest. The NSF project developed an integrated approach to that process. Several specific technologies were developed to support this process, each of which was designed to incorporate a rich understanding of user behavior that the project has developed. The specific piece funded under this BLS purchase order concerned user understanding of tables and the extent to which metadata could be used to support enhanced understanding.

Specifically, this project addressed the following questions:

·         What questions and uncertainties do users have when investigating the statistical tables used in the NSF project?

·         What are the answers to these questions?

·         To what extent is metadata available to answer the questions?

·         How do the questions, question types, and answers map to the XML DTD developed by the NSF project to support the Table Browser?

As a result of the investigation of these questions, a number of issues related to metadata creation and use were identified and a set of recommendations developed.

1.2. Findings and Recommendations

Users had a variety of uncertainties when investigating tables. The majority of these related to definitions of terms, categories of variables, etc. A second important class of uncertainties was that concerning rationales for why certain things were done, reported in certain ways, etc. Other uncertainties related to the structure of the tables and lack of information on various aspects of the tables. Users also provided a wide variety of suggestions and complaints about the tables.

Answers were found for all user uncertainties by searching relevant documentation and asking experts. Questions concerning rationales were difficult to resolve through existing documentation while other answers were found in the documentation.

The uncertainty categorization scheme developed in the project can serve to categorize questions in future studies in which the goal is to map to metadata sources and specify tool implementations.

One important implication of this study for metadata design is that it offers some guidance on how to translate users’ uncertainties into metadata, and metadata into functional features of an information system. To scale the results of this project, it will be necessary to understand the processes by which a user uncertainty can be mapped to a potential answer and then presented via the interface tools. A number of issues related to both the uncertainties and currently available metadata were identified. These include the potential uniqueness of answers needed to respond to user uncertainties, the specificity of answers provided, and the difficulty of retrieving information from documentation (due to lack of encoding within documentation).

One of the obstacles experienced during the project was the incomplete development of existing DTDs and their lack of compatibility with project needs. This project suggests approaches to further development work.

This work might be furthered with the following additional research:

·         Expand the identification and coding of user uncertainties to additional tables in order to further validate the coding scheme and, potentially, begin to determine the relative frequencies of uncertainty types.

·         Test the extent to which the Table Browser or other tools that incorporate relevant metadata are able to resolve user uncertainties.

·         Conduct document analyses to determine the effort involved in resolving user uncertainties with existing documentation.

The most obvious area for further development is metadata encoding and DTD development. As the statistical community continues to disseminate its information electronically, it will become ever more critical for the metadata behind the data to be easily available to users and applications. The most logical approach would be to encode it in structured and standardized formats. Some metadata already exists in this form (such as data dictionaries) but technical documentation does not. XML has also become the standard of choice for encoding information. Thus the following recommendations for development and for agency action seem relevant:

·         Continue efforts to develop metadata standards.

·         Build relevant XML DTDs for agency information.

·         Investigate mechanisms for ensuring compatibility of DTDs across document types and agencies.

 

 2.        PROJECT OVERVIEW

United States government services are increasingly becoming Web-based, creating opportunities to make potentially useful, even vital, information and services more easily accessible to the citizens than in the past. This opportunity has challenged Federal agencies as they work to provide information and services that are easy to use and understandable to an extremely diverse constituency. Federal mandates requiring agencies to provide "universally usable" information and services have added further impetus to resolving the challenges.

The statistical agencies have been addressing these issues via a variety of strategies and approaches. The FedStats website and its related planning and research development activities have been one venue. National Science Foundation (NSF) funding for projects associated with statistical digital government has been another. The work reported on here was conducted in conjunction with an NSF-funded project investigating statistical information in tabular format.

Enabling universal access and usability of statistical tables can be modeled as a process in which a user with an information need comes to a system in order to locate and then use a table or tables of interest. The NSF project developed an integrated approach to that process. Several specific technologies were developed to support this process, each of which was designed to incorporate a rich understanding of user behavior that the project has developed. Figure 2.1 represents the larger project. The specific piece funded under this BLS purchase order concerned user understanding of tables and the extent to which metadata could be used to support enhanced understanding. In Figure 2.1, the work of this project is contained within the component at the far right, entitled the Table Browser.

FIGURE 2.1: INTEGRATION

Specifically, this project addressed the following questions:

·         What questions and uncertainties do users have when investigating the statistical tables used in the NSF project?

·         What are the answers to these questions?

·         To what extent is metadata available to answer the questions?

·         How do the questions, question types, and answers map to the XML DTD developed by the NSF project to support the Table Browser?

As a result of the investigation of these questions, a number of issues related to metadata creation and use were identified and a set of recommendations developed.

2.1.  Universal Usability And The Role Of User Understanding

Shneiderman (2000, pp. 85-86) has framed the universal usability challenge as having three components: 1) the need to support a diverse technology base, 2) the need to provide access to diverse users with diverse skills and tasks, and 3) the need to bridge user knowledge gaps. It is the second and third aspects of universal usability that are the focus of this project, as appropriate technological solutions rest, to some extent, on the characteristics of users and their needs.

The world of Federal statistical information is a challenging one for most users, who must navigate a labyrinth of agencies (over 70 at the Federal level), interpret very distilled information (numbers, often presented in formats such as tables that are difficult to use), and who, to use the information appropriately, may need to understand very specific details of the data collection and analysis that generated the numbers. Statistics and statistical information are not easy for the layperson to use. Most of us are not taught in school how to read or work with statistics, resulting in low statistical literacy for the general population (Moore, 1997). Statistics are often highly distilled (as a specific statistic, a table or a time-series of statistics), have been produced through complex statistical and mathematical procedures (such as sampling design, weighting), and utilize specific and sometimes arcane definitions of concepts and variables (with associated jargon). All these represent potential sources of misunderstanding and barriers to use.

The work reported here is focused on tables. There are several rationales for this focus. Although there is a substantial effort given to graphical representations of data (e.g., Carr, 1998; Wainer, 1997; Wilkinson, 1999), tabular displays are treated minimally at best (e.g., Hall, 1943; Walker and Durost, 1936). Tables are a common conceptual and presentational structure by which statistical data are stored and represented. Data in tabular form are often the starting point for additional depictions (such as graphics or analytical reports) and contextualize specific numbers. Tables are, however, difficult to find, interpret and use. Most commercial search engines neither index the contents of tables nor retrieve that information, and often do not even identify the existence of tables within a text. Once a table is found, users face succinct labels and highly distilled numbers and may wish to perform comparisons and calculations that are difficult in static tables. The ubiquity of tables along with the associated challenges suggests that research into improvement of table retrieval, interpretation, and use has the potential to significantly improve access to data produced by statistical agencies.

Providing universal access to tables is both a technical challenge and a challenge concerning user understanding and the modeling of that understanding. In this project, we first identified user questions, then investigated how to model them as metadata, specifically in terms of metadata elements available within the NSF project’s DTD.

2.2. The Role Of Metadata In Location And Understanding

Metadata is an often ambiguous and nebulous term and is used variously in different communities. Dempsey and Heery (1998) define metadata as information that enables one to manage and use the data/information to which they refer. This definition highlights two key points, that metadata are defined within a context (there is no one set of metadata associated with a set of data), and that they are information that supports usage. Some of the purposes which metadata may support are information/resource discovery, administrative uses such as tracking terms and conditions of use, the context of creation, and unique identification of objects (see Bearman (1996) for a discussion).

Within the statistical domain, metadata may include subject heading schemes to support resource discovery (such as the list of headings employed by the American Statistical Index (published by the Congressional Information Service) and HASSAT (from the University of Essex)), codebook information, survey instruments and related documentation, as well as reports and other documentation produced by survey methodologists about data collection strategies, analysis of past survey efforts, etc. (Dippo and Gilman, 1999).

2.2.1 Past empirical work on metadata use

The study of user interaction with metadata is not completely unknown. Within the traditional library and information science domain, there is a thread of research most commonly known as relevance judgment research that investigates how users make judgments on the relevance (variously defined and operationalized) or potential relevance of information units. Traditionally those information units have been articles and books, and users examine representations of those units (such as citations, which represent the metadata in this case) and indicate those they consider relevant or non-relevant. Users are asked about the criteria they are using in the judgments and how they make those judgments. The intent of this line of work has been to understand the phenomenon of relevance judgment, provide typologies of relevance criteria, and in some cases to suggest enhancements to the representations of the information units (See for example, Park (1993) and Barry (1994).) For example, if users indicate that having information on the chapter titles in a book is helpful, it may be suggested that such information be added to the description of the book.

The vast majority of work of this type has looked at books (using information on records in online library catalogs) or articles (using periodical databases with or without abstracts). Users may be asked to examine different representations of the same item such as a citation, a citation with an abstract, or the item itself. Only recently have other types of information entities such as maps (Gluck, 1996) and meteorological data (Schamber, 1991) been considered.

In the domain of statistical information seeking, Hert and Bosley (as reported in Hert, 1999) have been investigating how experts and other users employ metadata within codebooks (in this case, from the Current Population Survey) as they choose variables for analysis. He and Gey (1996) allude to the value of the codebook data in choosing variables in a paper that discusses a system that might facilitate browsing of such data.

In general, these studies have worked from existing metadata associated with information entities back to user behavior with that metadata. Such an approach limits our ability to see what metadata might actually resolve user uncertainties since we have not begun with those uncertainties. Thus in this project we began with identifying these uncertainties then moved on to the potential of metadata to resolve them.

2.3.  Metadata and XML

To make metadata accessible in an automated environment, it needs to be represented and encoded so that software can identify appropriate metadata components and retrieve them. In the last several years, there have been a variety of efforts to encode statistical metadata. The International Organization for Standardization (ISO) has developed a standard, ISO/IEC 11179. The Inter-university Consortium for Political and Social Research (ICPSR) has a program entitled the Data Documentation Initiative (DDI), and a UN/ECE Work Session on Statistical Metadata (see for example: http://www.unece.org/stats/documents/2000.11.metis.htm) has been actively engaged in discussions.

For this project, the DDI encoding was used to encode tables and metadata. This choice was made because the DDI has a specific encoding designed to encode tables and project personnel had expertise with this encoding. However, the DDI encoding was not fully compatible with project needs and some modification was done. Details on the encoding of tables and metadata using the DTD can be found in Marchionini and Mu (2001).
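As a minimal sketch of what "encoded so that software can identify appropriate metadata components and retrieve them" might look like in practice, the Python fragment below looks up the definition attached to a table label in a small XML document. The element and attribute names (`table`, `label`, `term`, `definition`) are hypothetical and are not taken from the DDI DTD or from the project's modified DTD, and the definition text is a placeholder rather than agency wording.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these element and attribute names are illustrative
# only and are NOT taken from the DDI DTD or the project's modified DTD.
SAMPLE = """
<table id="example">
  <label term="seasonally adjusted">
    <definition>Placeholder: definition supplied by the producing agency.</definition>
  </label>
  <label term="civilian labor force">
    <definition>Placeholder: definition supplied by the producing agency.</definition>
  </label>
</table>
"""


def find_definition(xml_text, term):
    """Return the definition text for a label term, or None if absent."""
    root = ET.fromstring(xml_text.strip())
    for label in root.iter("label"):
        # Compare case-insensitively so "Seasonally Adjusted" still matches.
        if label.get("term", "").lower() == term.lower():
            definition = label.find("definition")
            if definition is not None and definition.text:
                return definition.text.strip()
    return None


if __name__ == "__main__":
    print(find_definition(SAMPLE, "Seasonally Adjusted"))
    print(find_definition(SAMPLE, "PADD"))  # no such label in SAMPLE
```

The point of the sketch is only that a definition stored as a tagged element can be fetched by a tool at the moment a user hovers over or asks about a label; a definition buried in unencoded technical documentation cannot be retrieved this way.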

 

3.     METHODOLOGY

3.1.  Investigating User Uncertainties

3.1.1 Overview

The investigation of user uncertainties involved several different activities. First, a set of respondents interacted with specific tables. The research team then mined transcripts of their sessions for uncertainties, questions, complaints, and suggestions. Answers to all questions were found by the research team. Questions and complaints were categorized.

Eleven people participated in the study. Each participant viewed a total of three tables in a mix of electronic and paper formats. After an initial unstructured period in which each participant was instructed to examine the tables, the researchers asked a series of questions about the participant’s understanding of each table. Demographic information on each respondent was gathered via a self-administered questionnaire at the beginning of the interview.

The team created records of each participant’s comments, responses to interview questions and other data (such as which component of a table was the focus of the comments). Analysts reviewed the records, and extracted uncertainties, suggestions, and complaints. The team coded the resultant lists using the schemes described below.

The team also searched for answers to the specific individual questions (rather than for the derived categories of questions). Answers were sought within the actual table and accompanying text (e.g., footnotes), related documentation (in both electronic and paper format), and in some cases, by consulting experts within the agencies that produced the tables.

3.1.2 Data collection

3.1.2.1 Table Selection

For the study, the team selected four tables from a set of tables nominated by agency partners in the project (tables available in Appendix 1). The four selected differed in their content, complexity, size, and formatting styles. The intent was to provide sufficient variety while still assuring that the researchers could provide the tables in multiple formats as well as be able to find answers to user questions. While this has implications for the generalizability of the results, consensus on what constitutes important differences in table format is generally lacking even among experts on statistical presentation. Additionally, it was important to show users real instances of tables, rather than artificial constructions, in order to identify actual questions.

All participants reviewed a set of three assigned tables about which they would answer questions. Some of these tables were presented in paper format and others in electronic format, according to a set of combinations of the four tables and the two formats that the researchers defined in advance. Every combination included at least one table in each of the two formats (paper and electronic) to account for any difference that might occur when using different presentation media. One table was only available in electronic format.
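The constraints described above (three of four tables per participant, at least one table in each format, one table available only electronically) can be enumerated mechanically. The following Python sketch lists every assignment that satisfies them. It uses the table codes from Table 3.2 and treats AAG as the electronic-only table, as stated in that table's legend. It enumerates candidate combinations only; the report does not say which of these the researchers actually used.

```python
from itertools import combinations, product

TABLES = ["AAG", "T14", "LE", "Gas"]   # table codes used in Table 3.2
FORMATS = ["paper", "electronic"]
ELECTRONIC_ONLY = {"AAG"}              # the one table with no paper version


def candidate_assignments():
    """Yield every valid way to give a participant three tables with formats."""
    for chosen in combinations(TABLES, 3):
        for formats in product(FORMATS, repeat=3):
            pairs = list(zip(chosen, formats))
            # Skip assignments that would need a paper copy of AAG.
            if any(t in ELECTRONIC_ONLY and f == "paper" for t, f in pairs):
                continue
            # Require at least one table in each presentation format.
            if set(formats) != set(FORMATS):
                continue
            yield pairs


if __name__ == "__main__":
    options = list(candidate_assignments())
    print(len(options))
    for option in options[:3]:
        print(option)
```

Under these assumptions there are 15 candidate assignments: 9 that include AAG and 6 that do not.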

3.1.2.2 Selection of Participants

Study participants were solicited through calls for participation posted in the university library’s government documents section. The researchers assumed that visitors to this section of the library would be more likely to be interested in and potentially knowledgeable about government information and statistics. Potential participants were screened for previous use of government statistical data. The study had a total of eleven participants (three males and eight females). Characteristics of the participants are indicated in Table 3.1. Each person was paid 25 dollars upon completion of participation.

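TABLE 3.1. Demographic characteristics (N=11)

Characteristics | Measurement | # Participants
Level of Education | 1. High School | 0
Level of Education | 2. College | 9
Level of Education | 3. Graduate | 1
Level of Education | 4. Post-Master | 0
Level of Education | 5. Ph.D. | 0
Level of Education | 6. Did not answer | 1
Gender | F - Female | 3
Gender | M - Male | 8
Computer Uses | 01 - Email | 11
Computer Uses | 02 - Word Processing | 11
Computer Uses | 03 - Web surfing | 11
Computer Uses | 04 - Games | 7
Computer Uses | 05 - Database mgmt. | 4
Computer Uses | 06 - Multimedia | 5
Web Searching Experience, Novice (1) – Expert (10) | 1 - 4 | 0
Web Searching Experience, Novice (1) – Expert (10) | 5 | 3
Web Searching Experience, Novice (1) – Expert (10) | 6 | 1
Web Searching Experience, Novice (1) – Expert (10) | 7 | 2
Web Searching Experience, Novice (1) – Expert (10) | 8 | 2
Web Searching Experience, Novice (1) – Expert (10) | 9 | 3
Web Searching Experience, Novice (1) – Expert (10) | 10 | 0
Frequency of Table Use on the Web | 1. Never | 2
Frequency of Table Use on the Web | 2. Occasionally | 8
Frequency of Table Use on the Web | 3. Monthly | 1
Frequency of Table Use on the Web | 4. Weekly | 0
Frequency of Table Use on the Web | 5. Daily | 0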

Most of the participants were undergraduate students at Syracuse University. All of them reported using computers on a daily basis and being highly experienced web searchers. One participant also reported having been exposed to statistics and having used tables from government websites at least occasionally.

Potential participants were recruited throughout the data collection period until the researchers determined that theoretical saturation on the uncertainties was achieved for each table (no matter in what format the table was presented). Theoretical saturation is reached, among other things, when no more relevant data seem to emerge regarding a category or variable (Glaser and Strauss, 1967). In this case, interviewing stopped when no new uncertainties were elicited for a given table. Eleven interviews is a reasonable sample size; as pointed out by Schamber (2000), as few as ten interviews can be expected to provide representative results when eliciting cognitive perceptions purely for exploratory purposes.
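The stopping rule can be sketched in a few lines of Python. This is a simplification under stated assumptions: the report does not say how many interviews without new uncertainties were required before stopping, so the function below (a hypothetical helper, not the researchers' procedure) stops at the first interview that adds nothing new for a given table. The example data are invented for illustration, although the labels are borrowed from Table 3.2.

```python
def interviews_until_saturation(interviews):
    """Return the number of the first interview that added no new uncertainty.

    `interviews` is an ordered list of sets, each holding the uncertainty
    labels elicited in one interview for a given table. Returns None if
    every interview contributed at least one new uncertainty.
    """
    seen = set()
    for count, uncertainties in enumerate(interviews, start=1):
        new_items = uncertainties - seen
        if count > 1 and not new_items:
            return count          # this interview surfaced nothing new
        seen |= new_items
    return None                   # saturation not yet reached


if __name__ == "__main__":
    # Invented ordering of interviews for one table (illustration only).
    example = [{"PADD", "OPRG"}, {"PADD", "RFG"}, {"PADD"}]
    print(interviews_until_saturation(example))
```

With the invented data, the third interview repeats only an uncertainty already recorded, so the function reports saturation at interview 3.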

3.1.2.3 The Interview

Once a preliminary questionnaire was developed, we pretested it with a total of six respondents (whose data were not included in the analysis). After each interview the questionnaire was revised. The final questionnaire consisted of two sections. The first section contained eleven demographic questions that asked participants about their frequency of computer use, web searching experience, statistical background, statistical packages used, as well as frequency of use of some specific statistical tables. Factual questions allowed the researchers to verify the appropriateness of each participant for the study as well as to collect data to classify each of them based on background information. The second section (the loop section) contained twelve general questions intended to elicit what questions/uncertainties participants faced during exploration of government statistical tables. The underlying purpose of these questions was to determine what kind of metadata and its content would need to be accessible during table usage so that users of these types of tables could better understand the meaning and significance of the data presented. Appendix 2 presents the interview guide.

The choice of these particular questions was supported by defined characteristics of good tables such as the ones described by UN/ECE (1992); by the standards on the sources, methods and procedures of statistics as defined in Walker and Durost (1936); as well as by the researchers’ own evaluations of each of the tables to be used in the investigation. Some of these standards point out the need for titles to be constructed as an aid to the reader in understanding the facts, for the source of the data to be indicated, and for the unit of measure and the methods used to compute the data to be indicated. It is based on these and other standards that our specific questions emerged. Examples of such questions include but are not limited to: ‘Does the title help you to understand the facts on the table?’, ‘Is there anything in the way the table, its rows or columns are organized that makes the table more difficult to understand?’, ‘Can you tell from the information in the table how any of the statistical measures were calculated?’.

A member of the team conducted interviews in person at a time convenient to each participant, over a period of three weeks. All the interviews were limited to ninety minutes since during pre-test sessions this amount of time proved to be sufficient for coverage of three tables and not overly tiring.

The interviews followed the interview guide, but the interviewer exercised some flexibility in order to retain more control of the situation. This control allowed the interviewer to clarify terms that were unclear to the participants and to probe for additional information (Frankfort-Nachmias & Nachmias, 1996). All the interviews were tape-recorded, and the transcripts were used as the main source for the subsequent content analysis.

3.1.3 Data Analysis

Data analysis had two components. In one component, specific answers to each user question were found. These answers were then forwarded to the project’s system design team for inclusion in tools designed to support manipulation and usage of the tables. The second component was to categorize the questions in order to better understand users’ uncertainties and how they could be resolved.

3.1.3.1 Finding Answers to Questions

Table 3.2 lists all questions asked (by table), and the frequency of asking. Due to the length of the answers, the full table is presented as Appendix 3. Researchers searched for answers to each question in a variety of paper and online sources. They first examined the table itself for answers (e.g., the footnotes in a table), then examined associated technical documentation. For online tables, links present within the table were also followed. The researchers did not do general searches on the respective websites, as the assumption was that users of tables would not be likely to do so. If no answer was found, a member of the team contacted the experts on the table at the government agencies that were working with the research team. These experts also confirmed the answers that had already been found.
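For readers who wish to re-tabulate the frequencies reported in Table 3.2, a minimal Python sketch of the tally by table and format follows. The rows shown are a small subset copied from Table 3.2, not the full table, and the function name `tally` is an illustrative choice rather than part of any project tool.

```python
from collections import Counter

# A small subset of rows copied from Table 3.2:
# (question, table code, format, frequency).
ROWS = [
    ("What PADD means?", "Gas", "paper", 11),
    ("What OPRG means? What is this abbreviation?", "Gas", "paper", 5),
    ("What are carbon monoxide areas?", "Gas", "electronic", 2),
    ("What does death registration states mean?", "LE", "paper", 5),
    ("What does non-farm mean?", "AAG", "electronic", 4),
]


def tally(rows):
    """Sum question frequencies for each (table, format) pair."""
    totals = Counter()
    for _question, table, fmt, freq in rows:
        totals[(table, fmt)] += freq
    return totals


if __name__ == "__main__":
    for (table, fmt), total in sorted(tally(ROWS).items()):
        print(f"{table:4} {fmt:10} {total}")
```

Applied to every row of Table 3.2 rather than this subset, the same tally should reproduce the column totals given in the table's final row.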

 

TABLE 3.2. Questions Asked by Users and Their Frequency by Table and Table Format

Questions Asked | Table | Freq. (Paper) | Freq. (Elec.)
What is the meaning of "seasonally adjusted"? | AAG | | 1
How is "unemployment rate" calculated? | AAG | | 1
What is "change in payroll employment"? | AAG | | 1
Who is classified as "production, non-supervisory workers"? | AAG | | 1
In Note 4, why does 1982-84=100? | AAG | | 1
In Note 5, what is meant by "finished goods"? | AAG | | 1
In Note 5, why does 1982=100 | AAG | | 1
In Note 6, why are the imports not seasonally adjusted? | AAG | | 1
Clarification of Note 7 | AAG | | 1
Clarification of Note 8 | AAG | | 1
Preliminary- when will the current data become available? | AAG | | 1
R- does this mean revised? If so when were they revised and how? | AAG | | 1
What is meant by "civilian labor force"? | AAG | | 3
What is the difference between "employed" and "unemployed"? | AAG | | 1
What are the definitions of the job categories? | AAG | | 3
Why is the Construction and Mining category not seasonally adjusted? | AAG | | 1
What is included in the Syracuse metropolitan area? | AAG | | 1
What is the difference between CPI-U and CPI-W? | AAG | | 1
Note 5- Who is a clerical worker? | AAG | | 1
Why is there a difference in the information given for different metropolitan areas CPI? i.e. for Syracuse there is annual % change, for LA Orange County there is the % change and the actual numbers, and in Arkansas there is no CPI data given. | AAG | | 1
Why doesn't the title say more specifically what the table is about? | AAG | | 8
Does that include the subway, infrastructure, trains, etc.? | AAG | | 1
Why is non-farm wage on the titles with the table and not listed with other jobs? | AAG | | 1
What do TXT and PDF mean? | AAG | | 1
What does T&PV mean? | AAG | | 1
I don't understand what the numbers are about. Do they mean people in the civilian labor force or something else? | AAG | | 1
Does employment include civilian and armed forces labor force? | AAG | | 1
What does non-farm mean? | AAG | | 4
What is T&P? | AAG | | 1
What does preliminary mean? | AAG | | 1
What do they mean by 12-month % change? | AAG | | 3
What is salary employment | AAG | | 1
How are employment and unemployment rates different? | AAG | | 1
What is the definition of services? | AAG | | 1
What do the dinosaurs do? | AAG | | 1
What do the different colors mean? | AAG | | 1
Why do all of the links change color, when I only click on one of them? | AAG | | 1
Why is the P on each number for October? | AAG | | 1
What am I supposed to find when I click this link to another page? | AAG | | 1
Why are the news releases first when I click this link? | AAG | | 1
Where did the get the information for these tables? | AAG | | 1
What are these numbers about? | AAG | | 1
Why can't I get the information directly when I click on this link? | AAG | | 1
What are the units? | AAG | | 1
What is meant by "enumerated population"? | T14 | 1 |
What is the "median"? | T14 | 1 |
In Note 1- why haven't specific group numbers been revised and how does that affect the totals vs. breakdown? | T14 | 1 |
Note 2 is confusing | T14 | 1 |
What is the difference between Note 1 and Note 2 and therefore what happened in 1980 vs. 1990? | T14 | 1 |
How is the population estimated for the in between years? | T14 | 1 |
Why is there a second breakdown of school-age children? | T14 | 2 |
What are the implications of suddenly switching to 10-year segments of the population after doing the rest in 5-year blocks? And what about 85 and over? | T14 | 1 |
"Excludes Armed Forces overseas" -why are they excluded and how long do they have to be overseas? | T14 | 2 |
What are the implications of calculating the numbers from April 1 in 1980 and 1990, and July 1 in the interim years? | T14 | 1 |
In the title it mentions population, but are they talking about US population? Why aren't they more specific? | T14 | 3 |
Residents of where? | T14 | 1 |
What are count resolution corrections? | T14 | 1 |
What are the texts on the side for? | T14 | 1 |
Percentage of what, the respondents? | T14 | | 1
Why do they have the male and female breakdown for only 1980, 1990, and 1997 and not for the other years? | T14 | 1 |
I'm not sure if this means that these 3 years are based on the census and others are projected. The answer might be in the notes, but they are really difficult to understand. | T14 | 1 |
Why do they only have 1997 in bold? | T14 | 1 |
What are those 3 columns before the mean? Why did they group them together? | T14 | | 1
Why were those places picked? | T14 | | 1
Why are some things in purple and not others? | T14 | | 1
They don't tell the total number of people who weren't surveyed and they should at least give a general idea. | T14 | | 1
It doesn't give enough information about the area that the population is from. | T14 | | 1
What is the point of the count? Did they double count? | T14 | 1 |
It is confusing. What do they mean by in thousands? | T14 | 1 |
What does death registration states mean? | LE | 5 |
What do they mean by whites? I am not sure what they include | LE | 1 |
Does this refer to people who are citizen or not? | LE | 1 |
Does black mean people who was born African-American or people who are black that live here. | LE | 1 |
Why is area in the column, I don’t understand that? | LE | | 1
What does con mean? | LE | | 1
What all others include or refer to? | LE | | 4
Then it says total, is it the total of all other races? | LE | | 2
What the --- are? Does this mean they don’t have data collected or what? | LE | | 1
Where did they get the numbers | LE | 1 |
Who is classified as white? | LE | | 1
Who is classified as black? | LE | | 1
Do the data points represent more years to live? | LE | | 1
How is this number calculated? | LE | | 1
What about data for after age 85? | LE | | 1
Is it possible to include a mouse over calculator to figure out the age of death? | LE | | 1
The table is difficult to read due to a lot of data columns and rows. Gridlines might help. | LE | | 1
How does this table relate to other years? I was 20 in 1996 and it says I should live for 60.4 more years, does this mean in subsequent year's tables I will always live to 80.4? | LE | | 1
What areas mean? This is pretty vague | Gas | 1 |
What PADD means? | Gas | 11 |
What OPRG means? What is this abbreviation? | Gas | 5 |
Why ozone-non-attainment is abbreviated RFG? What does that mean? | Gas | 8 |
What the subcategories of PADDs 1, 1A, etc means? | Gas | 8 |
Why are they comparing RFG areas with OPRG areas? | Gas | 1 |
What originated areas are? | Gas | 1 |
What are the different gasoline categories? | Gas | 2 |
What attainment conventional areas or oxygenated or carbon monoxide areas are | Gas | | 5
What are carbon monoxide areas? | Gas | | 2
What are Oxygenated areas are? | Gas | | 3
Why do they choose the dates they close? What is the significance of those dates? | Gas | | 1
Why is there not consistent data between the regions? i.e. Why is there no OPRG in PADD 1C and no Oxygenated in the PADD 1's? | Gas | | 1
Is there a way that we can compare between rows and columns? | Gas | | 1
Is there a graphing capacity, to see more clearly the historical changes? | Gas | | 1
There are a lot of data points and the column headings get lost. | Gas | | 1
How is this data collected? | Gas | | 1
It is illegal in NJ to have self service gas stations, so how can these be the "self service prices per gallon" for the country? | Gas | | 1
I know that state gas tax can vary from state to state, how is that handled in this comparison between states? | Gas | | 1
TOTAL | | 69 | 101

T14 = No. 14. Resident Population, by Age and Sex: 1980 to 1997.

AAG = Economy at a Glance NY (presented only in electronic format).

LE = Table 5. Estimated average length of life in years, by race and sex: Death-registration states, 1900-28, and United States, 1929-96.

Gas = Retail Gasoline: (Self Service Prices per Gallon, Including Taxes).

 

3.1.3.2. Categorizing Questions

Inductive open coding (Krippendorff, 1980; Strauss & Corbin, 1990) was used to develop the final coding scheme for questions/uncertainties. With this technique, researchers derive the topics or categories in the data by identifying key issues. Researchers used data from a previous study (Hert, unpublished) to start the process of inductive open coding. The data from the study provided a preliminary set of categories by which to categorize questions. Our set of categories was put together by identifying instances where users had a: 1) direct question about the data, 2) concern about the clarity and completeness of information supporting the table, and 3) confusion about the meaning of terms, data, formatting style of the table and other issues.

To assess the reliability of the coding scheme, two researchers coded the list of users’ questions/uncertainties independently of each other and then calculated a simple percent agreement (the total number of agreements between the coders divided by the total number of possible agreements). In our study, agreement reached 91%, which, as suggested by Krippendorff (1980), is an acceptable level.
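The percent-agreement calculation described above can be sketched as follows. This is a minimal illustration, not the procedure the coders actually used: the function name and the category labels are assumptions, and the five coded items are hypothetical rather than data from the study.

```python
def percent_agreement(codes_a, codes_b):
    """Share of items to which two coders assigned the same category."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must code the same list of items")
    if not codes_a:
        raise ValueError("at least one coded item is required")
    # An agreement is an item that received the same code from both coders.
    agreements = sum(1 for a, b in zip(codes_a, codes_b) if a == b)
    return agreements / len(codes_a)


# Hypothetical codes for five questions (not data from the study).
coder_1 = ["definition", "rationale", "structure", "definition", "lack_of_info"]
coder_2 = ["definition", "rationale", "definition", "definition", "lack_of_info"]

agreement = percent_agreement(coder_1, coder_2)
print(f"{agreement:.0%}")  # 4 of the 5 hypothetical items match
```

Simple percent agreement does not correct for agreement that would occur by chance, which is why it is described in the text as a "simple" measure.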

The resulting categories constructed from the data relate to the uncertainties encountered by users during exploration of statistical tables. These categories let the researchers classify and cluster the users’ questions/uncertainties into different levels of metadata to be provided by the table browser.

3.1.3.3. Categorizing Complaints and Suggestions

In the process of identifying users’ questions from the interview data, the researchers realized that there were other forms of statements which users used to express their uncertainties in the process of understanding the statistical tables presented to them. Some of these statements were suggestions about how users thought information would best be presented, what functionality could be added to the electronic forms, or where the supporting information should be located, among other issues. These comments include ‘Maybe if the columns, rows and numbers were more spaced it would be easier to read’, ‘I would add to the title what area is the population from’, ‘I would change some of the colors that are difficult to read’, etc.

The other form of statements made by users expressed the same sort of concerns previously mentioned but in the form of a complaint or dislike. These different forms of expressions might be the result of the way in which researcher questions were presented but they all captured the issues that concerned users in the process of understanding the particular statistical tables shown to them during the investigation. Some of the complaints expressed by users include: ‘There are too many numbers and that is confusing’, ‘You would have to pick apart notes underneath to understand what they are saying’, ‘Table does not explain how things are calculated’, ‘The page have a lot of related stuff besides what they explain about the data’, etc.

The researchers chose to tabulate and code complaints and suggestions separately from the uncertainties/questions.

  1. MAPPING TO THE XML DTD

The intent behind gathering and categorizing user uncertainties was to use this knowledge to provide design specifications for the Table Browser developed in the larger NSF project. Specifically, we were trying to 1) determine which user questions could be mapped to specific elements in the DDI DTD, 2) provide details on the specific answer that might be displayed in the Table Browser to a user with a question, and 3) understand the issues in building such a mapping. For the larger project, the ultimate goal would be to automatically identify relevant metadata that resolved user questions, tag it appropriately within the DDI DTD, and port it into the Table Browser.

The methodology employed to pursue these objectives is provided in Figure 3.1.

Figure 3.1. METHODOLOGY FOR METADATA MAPPING TO TABLE STRUCTURE

INPUT: TABLE EXAMPLES FROM AGENCIES

 

 

Results from this process were to feed into a table such as that depicted by Figure 4.2.

Figure 4.2 TABLE MERGING RESULTS FROM XML MARKUP AND USER QUESTIONS

Question Asked (From SU work) | SOURCE OF INFO: DDI (From UNC) | SOURCE OF INFO: Content from Table (identify which item) (From UNC) | SOURCE OF INFO: Content from source other than table (identified as specifically as possible) (UNC and SU)

As the teams began this process, difficulties were quickly discovered. Elements in the DDI DTD tended to be somewhat "structural" in nature. That is, they were able to encode information if it could be determined which structural element contained that information within a given table. Thus, a question such as "What does the P mean in a given cell [in the At-a-Glance tables from BLS]?" resulted in an answer ("it indicates preliminary data") that could be easily mapped to the DDI element which represented a footnote within a table. Other questions that were easily mapped included definitions of terms, headers for tables, columns, rows, and cells, and units of measurement. Questions concerning rationales and questions of a more specific nature were extremely difficult to map from uncertainties to the DTD.
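To illustrate why "structural" questions map easily, the sketch below resolves the "What does the P mean?" question by following a cell's footnote marker to the footnote text. The tag and attribute names (table, cell, footnote, note), the row and column labels, and the function name are all hypothetical and are not taken from the DDI DTD; only the footnote meaning (preliminary data) comes from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these tag and attribute names are illustrative only
# and are NOT the element names defined by the DDI DTD.
TABLE_XML = """
<table id="at-a-glance">
  <cell row="unemployment-rate" col="latest" note="P">(value)</cell>
  <cell row="unemployment-rate" col="previous">(value)</cell>
  <footnote id="P">Preliminary data</footnote>
</table>
"""


def footnote_for_cell(root, row, col):
    """Return the footnote text attached to a cell, or None if there is none."""
    for cell in root.iter("cell"):
        if cell.get("row") == row and cell.get("col") == col:
            note_id = cell.get("note")
            if note_id is None:
                return None  # the cell carries no footnote marker
            for note in root.iter("footnote"):
                if note.get("id") == note_id:
                    return note.text
    return None


root = ET.fromstring(TABLE_XML)
print(footnote_for_cell(root, "unemployment-rate", "latest"))
```

A "why" question has no comparable structural anchor in the table, so a lookup of this kind has nothing to follow, which is consistent with the difficulty the teams report for rationale questions.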

Given these difficulties, the teams briefly considered building an entirely new DTD for the project that would incorporate elements from the ISO metadata standard along with aspects of the DDI. This proved to be beyond the capabilities of the project, so a decision was made to move away from attempting the specific types of mapping above, to begin encoding the tables in the project using the DDI DTD (as further development work on the Table Browser depended on having the tables encoded), and to address issues associated with incorporating the user uncertainties separately. More detailed reports on the DTD and its use are available in Marchionini and Mu (2001) and other NSF-project reports (some of which are available directly from Gary Marchionini and some of which are on the project website: http://istweb.syr.edu/~tables).

Since most of the encoding work that followed these investigations was done by the University of North Carolina team, results will not be reported in this document, though the discussion and recommendations section of the document includes further information.

4. RESEARCH FINDINGS

4.1. User Uncertainties

Table 3.2 provided the complete list of questions/uncertainties for each table. The questions with their answers are presented in Appendix 3. A total of 170 questions was identified. As stated earlier, the team continued collecting data until new questions were not being identified from the new participants in the study. While this was done in advance of the final coding scheme for questions, the redundancy of questions was assessed against preliminary versions of the scheme as well as the researchers’ sense that little new data were being provided by new respondents. Thus, some categories in the scheme do not have large frequencies, as the other categories were saturated and data collection stopped.

A review of the users’ questions/uncertainties in terms of the type of medium in which tables were presented (paper, electronic) shows that, in general, participants asked the same or similar questions regardless of the format in which the tables were presented to them.

4.1.1 Categorization of Uncertainties

Table 4.1 shows the result of the categorization of questions/uncertainties. The categorization scheme had four main categories, ‘Definitional needs’, ‘Rationale of Information’, ‘Table Structure’, and ‘Lack of Information’, plus a category called ‘User uncertainty is not clear’ for user statements that were not clear. The category ‘Definitional needs’ is concerned with the users’ need to have definitions of key terms, data categories and other elements available to them. Another important category that emerged from the data was ‘Rationale of Information’. This category covers user questions as to why some things were computed or reported in a particular way in the tables studied. ‘Table Structure’, another major category, includes all the issues referring to the layout and organization of the table. The last of the four major categories, ‘Lack of Information on’, refers to the users’ need for explanation of data collection procedures, sources of data, and computational methods, among other issues, that allows them to evaluate the credibility and reliability of the data presented in the tables.

Most of these categories contain a set of subcategories intended to cluster more precisely the different uncertainties expressed by the users who participated in the research experiment. The most frequent type of user uncertainty was ‘definitional needs’ (specifically, definitions of the meaning of terms), followed by uncertainties about the ‘rationale of the information’.

  

TABLE 4.1. Categories of Users’ Uncertainties during Statistical Tables Exploration


Category | Subcategory | Definition/Example | Freq.
Definitional needs | | Users ask about the meaning of something in the table. | 98
 | Meaning of Terms | What does seasonally adjusted means? | 47
 | Meaning of Data | I’m not sure what the data cell refers to. | 5
 | Meaning of Categories | User is uncertain of what belongs to a particular category. Ex. What does Non-Farm wage include? | 18
 | Meaning of Abbreviations | What does T&P means? | 17
 | Population Universe | User is uncertain of the population/universe to which data can be generalized. Ex. Is it the population of the US? | 9
 | Unit of measurement | Is this in number of persons or number of jobs? | 1
Rationale of information | | User is uncertain of the reasons why something was done, reported, computed, etc. in a particular way. Ex. Why the numbers are reported differently for NY & LA? | 28
Table structure | | The way the table is organized and formatted makes the user uncertain about the meaning of data. | 24
 | Formatting, layout, and components | I don’t understand why the numbers are in purple. | 10
 | Meaning of Labels | I don’t understand this label. | 11
 | Organization of the links in webpage | I wouldn't expect to find Press releases first when you click on the link | 3
Lack of information on | | User is uncertain of how data were collected, computed, etc. | 17
 | Data collection procedures | How was data collected? What method were used? | 2
 | Sources of data | From where was data collected? | 2
 | Computation methods | How were rates computed? | 4
 | Comparability/Relationship of Info. | What is the difference between CPI-U & overall CPI? | 6
 | Tool Functionality | Can I make a graph right now? | 2
 | Updates to information | When was the information updated? | 1
User uncertainty is not clear | | The user did not clearly explain his/her uncertainty. | 4
TOTAL | | | 170

The most common class of questions was Definitional Needs (98 questions, 58% of total questions), specifically definitions of terms (47 questions, 28% of the total questions). Rationale of Information and Table Structure, with 28 and 24 questions respectively, were the next most frequent. Lack of Information accounted for 17 questions.
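The percentages quoted above are each category's count divided by the 170 questions identified. A minimal sketch of that arithmetic follows; the counts are those reported in the paragraph above, while the variable and function names are illustrative assumptions.

```python
# Counts as quoted in the text; names are illustrative.
TOTAL_QUESTIONS = 170
reported_counts = {
    "Definitional needs": 98,
    "Definitional needs: meaning of terms": 47,
    "Rationale of information": 28,
    "Table structure": 24,
    "Lack of information": 17,
}


def share(count, total=TOTAL_QUESTIONS):
    """Percentage of all questions, rounded to the nearest whole number."""
    return round(100 * count / total)


for name, count in reported_counts.items():
    print(f"{name}: {count} questions ({share(count)}% of {TOTAL_QUESTIONS})")
```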

It is important to note that this data represents only questions/uncertainties that are expressed. Users may have other questions that the interview protocol used in this study was not able to elicit. We can assert that users have questions in these categories, but they may also have additional questions and the relative frequencies with which questions might be asked might also change. Steps were taken to ensure elicitation of all questions through the structure and pretesting of the instrument and through the continuation of data collection until redundancy was reached.

 

 4.2.  User Suggestions and Complaints

Another significant finding that resulted from the content analysis of the interviews with users was a categorization of comments/suggestions and complaints. This categorization reflects users’ concerns about different aspects of the tables. Specifically, the ‘comments/suggestions’ reflect some of the actions that users think would increase table understanding. The ‘complaints’, on the other hand, show user dissatisfaction with several issues about the tables, as shown in Table 4.2.

 

TABLE 4.2. Categories of Users’ Comments/Suggestions & Complaints during Statistical Tables Exploration

Category | Subcategory | Definition/Example | Freq.
Comment/Suggestion about: | | User gives his/her opinion about issues that could improve table understanding. | 150
 | Adjusting table formatting/layout | Have the numbers in bold in a bigger print. | 81
 | Facilitating understanding | They should at least give a general idea. | 11
 | More specific labels | It would help if they added "Total Resident Population by Age & Sex" | 6
 | Changes in location of information | I would put the dates a little closer to the data. | 13
 | Added tool functionality | It would help to be able to do splits of the table. | 13
 | Additional/more specific information | It would help to know what TXT and PDF are. | 22
 | Nice feature of the table | I think the history button is great because it tells you 10 years worth of information without putting too much information together in the table. | 1
 | Irrelevant Information | Calling them would be the last thing I would do. | 3
Complaint about: | | User is not satisfied for different reasons. | 32
 | Excessive amount of information | There are a lot of data points. | 12
 | Insufficient amount of information | Titles don’t say anything. Clarification of note 7. | 14
 | Not specific information | I’d like X information to be spelled out. | 2
User’s statement is not clear | | The comment/suggestion or complaint stated by the user is not clear. | 4
TOTAL | | | 182

 

4.3. Answers to User Questions

The researchers found answers to all user questions (Appendix 3 reports all answers). They searched a variety of sources: online information within the table (such as footnotes), links on online tables, and associated technical documentation (in both paper and electronic formats). In some instances, when no answers were found through searching, they asked table experts.

Table 4.3 contains an overview of some of the questions users asked during their exploration of statistical tables. An answer to each question is presented, along with an indication of where it was found: either in a document provided with the table or as a response from an expert. It was necessary to consult experts for questions involving requests for rationales. Definitions and clarifications of other terms could be resolved with information in documents.

  

 

TABLE 4.3. Overview of questions asked by subjects, their answers, and where the answers were found.

Question

  • Answer

Location of Answer

What is the meaning of "seasonally adjusted"?

  • Normal seasonal fluctuations are smoothed out by a statistical process

Document

How is "unemployment rate" calculated?

  • Persons are classified as unemployed if they do not have a job, have actively looked for work in the prior 4 weeks, and are currently available for work.

Document

Who is classified as "production, non-supervisory workers"?

  • Employees who are not owners or who are not primarily employed to direct, supervise, or plan the work of others. Production workers in mining & manufacturing, & construction workers in construct.

Document

In Note 4, why does 1982-84=100?

  • Most of the specific CPI indexes have a 1982-84 reference base. That is, BLS sets the average index level (representing the average price level)--for the 36-month period covering the years 1982, 1983, and 1984--equal to 100.

Document

Why is there a second breakdown of school-age children?

  • School age breakdown - for convenience. Those are popular aggregations. NOTE - this may clear up further questions - the national estimates are produced and available for single years of age, by sex, race and Hispanic origin. The figures that appear in the table are put there as space allows and in an attempt to please as many users as possible.

Expert

What are the implications of suddenly switching to 10-year segments of the population after doing the rest in 5-year blocks? And what about 85 and over?

  • NOTE - this may clear up further questions - the national estimates are produced and available for single years of age, by sex, race and Hispanic origin. The figures that appear in the table are put there as space allows and in an attempt to please as many users as possible.

Expert

How is this data collected?

  • We don't currently have a link on the site for that. We should for the new one but it is still missing from the test site. The data are collected using computer-assisted telephone interviews from a statistically selected sample of approximately 800 retail gasoline stations each week. The prices are collected every Monday morning and the data released by 5 p.m. every Monday night, except on government holidays the data are released on Tuesday (but still represent Monday's price).

Expert

It is illegal in NJ to have self service gas stations, so how can these be the "self service prices per gallon" for the country?

  • Yes, some states, NJ for one, do not allow self-serve. In those cases, the prices represent the only service of gasoline provided in that state. Our analysis has always shown, that this is not a big price effect in those states as compared to states allowing self-serve have higher prices for full-serve vs. self serve. I had even heard NJ monitors the impact of their law to help justify it to state resident's as not contributing to higher prices because it is required. I have nothing in writing on any of this though, it is all anecdotal. The industry doesn't make an issue of it nor do we. Some states have other laws such as refiners can't operate gas stations (MD for one), and we don't note them either as non-refiner state stations or anything.

Expert

What exactly do the job categories like transportation and public utilities entail?

  • Establishments reporting on the schedule (form BLS 790) are classified into industries based on their principal product or activity determined from information on annual sales volume. This industry classification, based on the 1987 Standard Industrial Classification Manual, is collected on a supplement to the quarterly unemployment insurance tax reports filed by each employer. For an establishment making more than one product, the entire employment is included under the industry of the principal product or activity. http://www.bls.gov/790faq2.htm#q6

Document

  

5. DISCUSSION AND RECOMMENDATIONS

5.1.  User Uncertainties

This study has demonstrated that users have a variety of questions, some of which have the potential to be easily resolved with available electronic documentation (see next section for discussion of issues). The preponderance of definitional questions have fairly easy resolutions, and in fact, definitions of variables, categories of variables, etc. are already well documented and considered within existing metadata systems. This makes answers easy to retrieve.

Some uncertainties are much more complex, however, in particular those relating to rationales. Answers to these questions seem to require richer domain knowledge that might be difficult to retrieve. For example, a question such as "Why is there no OPRG in PADD 1C and no Oxygenated in the PADD 1's?", asked about a gasoline table, could not be resolved with simple definitions. Answering it requires an additional source (a map in another document) and knowledge of how the gasoline formulations and their reporting are changing (Armstrong, personal interview with Paula Weir, 12-15-00).

The categorization scheme developed in the project can serve to categorize questions in future studies in which the goal is to map to metadata sources and specify tool implementations. Table 5.1 provides a demonstration of this utility. In the NSF project, preliminary mappings were made and Marchionini and Mu (2001) can be consulted for those mappings.

Table 5.1 Potential Mappings between User Uncertainty Categories and Tool Designs

Uncertainty Category | Definition of Category | Possible Design Options
Definitional Needs | Users ask about the meaning of something in the table | Mouse-over (at appropriate point, the cell, the row, the column, etc.) with definition, links to technical documentation explaining concepts, variables, variable categories as necessary
Rationale of Information | User is uncertain of the reasons why something was done, reported, computed, etc. in a particular way | Link to online question form (to be submitted to expert) or interactive help
Table Structure | The way the table is organized and formatted makes user uncertain about meaning | Mouse-overs, possible "About the format of the table" help option, pull-down menu with available manipulation options displayed
Lack of information | User is uncertain how data were collected, computed, etc. | Links to technical documentation
User uncertainty is unclear | | This might be resolved by a long description of the object of concern or by parsing the content for definitions and providing those definitions
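One way a tool might use the mappings in Table 5.1 is as a simple lookup from a coded uncertainty category to candidate design options. The sketch below is an assumption about how such a lookup could be organized, not a description of the Table Browser; the dictionary keys and the function name are hypothetical, and the option text is abbreviated from Table 5.1.

```python
# Design options abbreviated from Table 5.1; the keys and the function name
# are illustrative and are not part of the Table Browser.
DESIGN_OPTIONS = {
    "definitional_needs": [
        "mouse-over with definition",
        "link to technical documentation",
    ],
    "rationale_of_information": [
        "link to online question form (submitted to an expert)",
        "interactive help",
    ],
    "table_structure": [
        "mouse-over",
        '"About the format of the table" help option',
        "pull-down menu of available manipulation options",
    ],
    "lack_of_information": ["link to technical documentation"],
}

# Used when the uncertainty could not be categorized clearly.
FALLBACK_OPTIONS = [
    "long description of the object of concern",
    "definitions parsed from the content",
]


def suggest_design_options(category):
    """Return candidate interface responses for a coded uncertainty category."""
    return DESIGN_OPTIONS.get(category, FALLBACK_OPTIONS)


print(suggest_design_options("rationale_of_information"))
print(suggest_design_options("unclear"))
```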

 

5.2 Mapping User Uncertainties to Metadata

Perhaps one of the important implications of this study for metadata design will be the provision of some notions of how to translate users’ uncertainties into metadata and metadata into functionality features of an information system. In order to scale the results of this project, it will be necessary to understand the processes by which a user uncertainty can be mapped to a potential answer and then potentially presented via the interface tools. A number of issues related to both the uncertainties and currently available metadata were identified.

The first issue is the question of whether a user is provided with a somewhat generic answer to his or her question or one that specifically resolves the uncertainty. For example, one user had the question: Why are the imports not seasonally adjusted (from the BLS At-A-Glance tables)? There is a very specific answer to this question but more generically, this might be considered a question that concerns a definition and a user could be provided with the definition of seasonal adjustment and import. Thus if definitions of terms were coded as such in related documentation, it would be a straightforward process to retrieve it for a user once the user’s uncertainty had been categorized as such. However, it is clear that providing the definitions is only one component to assist user understanding. One might envision a set of tools that would analyze user questions perhaps in terms of facets of the question (e.g., a why question concerning the co-joining of two definitions) which might be further assessed in terms of a user’s history (e.g., level of statistical expertise) to provide an answer to the user which could then be modified via a feedback mechanism, and also stored for use in later, similar queries.

A second issue is that not all answers (generic or otherwise) are easily found. The team has found that answers may not be in electronic format at all (though they may be available in a paper document or in a human expert’s head), or may be buried within a large document (in one instance a document of 92 pages), making them difficult to retrieve. It might be helpful to develop a companion XML DTD for documentation associated with tables and statistical data so that information relevant to user uncertainties can be quickly (and automatically) found in the documentation and ported into a tool such as a Table Browser.

A third issue identified is that some answers are consistent across tables, while others might only be relevant to one specific instance of a table. A question such as "Why is the 1998 statistic for urban unemployment so high in relationship to the other 1998 numbers?" would relate only to one specific cell on one specific table, while a question such as "what is the definition of seasonally-adjusted" is likely to be at least consistent at the agency level. Knowing the "uniqueness" of an answer would provide insight into strategies for metadata storage and implementation in tools. Currently, it is difficult to assess the uniqueness/consistency of information without expert knowledge. Metadata repositories (such as that being developed by the Census Bureau) can be used to determine the level of consistency.
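The "uniqueness" of an answer suggests storing metadata at different scopes and searching from the most specific scope outward. The sketch below is one possible arrangement under that assumption; the store, its keys, the identifiers, and the placeholder answers are all hypothetical and do not come from any agency system.

```python
from typing import Optional

# Hypothetical metadata store keyed by (scope, identifier, question topic).
# Identifiers and answer strings are placeholders for illustration.
ANSWERS = {
    ("agency", "BLS", "seasonally adjusted"): "agency-wide definition",
    ("cell", "table-1/urban-unemployment/1998", "why so high"): "cell-specific note",
}


def resolve(topic, cell=None, table=None, agency=None) -> Optional[str]:
    """Look for the most specific answer first, then widen the scope."""
    for scope, identifier in (("cell", cell), ("table", table), ("agency", agency)):
        if identifier is not None and (scope, identifier, topic) in ANSWERS:
            return ANSWERS[(scope, identifier, topic)]
    return None  # no stored answer; the question may need an expert


# A definition stored once at the agency level answers the question from any cell.
print(resolve("seasonally adjusted",
              cell="table-1/urban-unemployment/1998",
              table="table-1",
              agency="BLS"))
```

Under this arrangement a definition that is consistent across an agency is stored once, while an answer tied to a single cell is stored only with that cell.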

A point that needs further study is the extent to which the Table Browser should provide specific answers or point to general types of information that users might need when exploring statistical tables. While users often have uncertainties that are highly contextual and related to their specific situation and experience, it is difficult to anticipate those in advance and provide previously encoded solutions. Finding the balance between completely contextualized and general answers needs further exploration.

 

5.3. XML and DTD’s for Statistical Information

One of the obstacles experienced during the project was the incomplete development of existing DTD’s and their lack of compatibility with project needs. Most approaches to DTD development start with repositories of information and model them, not from a user’s perspective, but from a more conceptual or structural perspective. As a result, existing DTD’s don’t encode available information to support all the user uncertainties identified in the study. Starting with users, as was done here, may be another approach to developing DTD’s worth considering.

 

5.4. Recommendations

5.4.1. Future Research

This work might be furthered with the following additional research:

o        Expand the identification and coding of user uncertainties to additional tables in order to further validate the coding scheme and potentially begin to determine relative frequencies of uncertainty types.

o        Test the extent to which the Table Browser or other tools that incorporate relevant metadata are able to resolve user uncertainties.

o        Conduct document analyses to determine the effort involved in resolving user uncertainties with existing documentation.

5.4.2. Further Applications Development Work

The most obvious area for further applications development is metadata encoding and DTD development. As the statistical community continues to disseminate its information electronically, it will become ever more critical for the metadata behind the data to be easily available to users and applications. The most logical approach would be to encode it in structured and standardized formats. Some metadata already exists in this form (such as data dictionaries) but technical documentation does not. XML has also become the standard of choice for encoding information. Thus the following recommendations for development and for agency action seem relevant:

o        Continue efforts to develop metadata standards.

o        Build relevant XML DTD’s for agency information.

o        Investigate mechanisms for ensuring compatibility of DTD’s across document types and agencies.

 

 6. REFERENCES

Barry, C.L. (1994). User-defined relevance criteria: an Exploratory study. Journal of the American Society for Information Science, 45(3):149-159.

Bearman, D. (1996). Developments in metadata management frameworks. Archives and Museum Informatics 10(2):185-188.

Carr, D. B. 1998. "Multivariate Graphics," Encyclopedia of Biostatistics, Eds. P. Armitage and T. Colton, Vol. 4, pp. 2864-2886.

Dempsey, L. and Heery, R. (1998). Metadata: A Current view of practice and agreements. Journal of Documentation 54(2):145-172.

Dippo, C.S. and Gillman, D.W. (1999). The Role of Metadata in Statistics. Working Paper UN/ECE Work Session on Statistical Metadata, Geneva, Switzerland, Feb. 1999.

Frankfort-Nachmias, C. & Nachmias D. (1996). Research Methods in the Social Sciences, Fifth edition, New York: St. Martin’s Press.

Gluck, M. (1996). Exploring the relationship between user satisfaction and relevance in information systems. Information Processing and Management. 32(1):89-104.

Hall, R. (1943). Handbook of tabular presentation: How to design and edit statistical tables, a style manual and case book. NY: The Ronald Press Co.

He, J. & Gey, F. (1996) Online codebook browsing and conversational survey analysis. Social Science Computer Review 14(2): 181-186.

Hert, C.A. (1999). Federal Statistical Website Users And Their Tasks: Investigations Of Avenues To Facilitate Access: Final Report to the United States Bureau of Labor Statistics. Available at: http://istweb.syr.edu/~hert/BLSphase3.PDF

Krippendorff, K. (1980). Content Analysis. An Introduction to Its Methodology. Newbury Park, CA: Sage publications.

Marchionini, G. and Mu, X. (2001). User Studies Informing E-Table Theory and Interfaces. Submitted to the ACM SIGCHI –02 conference. Available from the authors.

Moore, D.S. (1997). New pedagogy and new content: The Case of statistics. International Statistical Review. 65 (2):123-165.

Park, T. (1993). The Nature of relevance in information retrieval: An Empirical study. Library Quarterly, 63:318-351.

Schamber, L. (1991). Users’ criteria for evaluation in a multimedia environment. ASIS Proceedings 1991, pp. 126-133.

Schamber, L. (2000). "Time-line Interviews and Inductive Content Analysis: Their Effectiveness for Exploring Cognitive Behaviors". Journal of the American Society for Information Science, 51(8): 734-744.

Shneiderman, B. (2000). Universal Usability. CACM. 43(5): 84-

Strauss, A. & Corbin, J. (1990). Basics of Qualitative Research. Grounded Theory Procedures and Techniques. Newbury Park, CA: Sage publications.

Wainer, H. (1997). Visual revelations: Graphical tales of fate and deception from Napoleon Bonaparte to Ross Perot. NY: Copernicus Books.

Wilkinson, L. (1999). The Grammar of Graphics. New York: Springer-Verlag.

UN/ECE, C(47) (1992). "The Fundamental Principles of Official Statistics in the region of the Economic Commission for Europe". Adopted at the 47th session of the ECE, Geneva, Switzerland, 1992. http://www.nso.magnet.mt/principles/principles.htm

Walker, H., & Durost, W. (1936). Statistical Tables. Their structure and use. Bureau of Publications Teachers College, Columbia University.

 

7.   ACKNOWLEDGEMENTS

The authors acknowledge the work of Kristen Armstrong and Hala Annabi at Syracuse University on this project. The University of North Carolina team included Gary Marchionini, Zhen-Zhen Deng, and Xiaming Mu. The expertise of Fred Gey and Dan Gillman was invaluable in understanding existing metadata standards. Cathryn Dippo and Fred Conrad are also to be thanked for their support and ongoing assistance.

  

APPENDIX 1

INSERT PRINTED VERSIONS OF TABLES HERE (some of the electronic versions on the websites have changed since we interviewed users).

These will be included in the paper version of the document.


 

APPENDIX 2: INTERVIEW GUIDE

 

Subject ID _________

Date: ____/____/____                                                Starting time:_________

 

Demographic Questions

  1. Highest level of education completed?

______________________________ In which field? _____________________________

 

  2. Sex: ___ F ___M

 

  3. How often do you use a computer?

____Never ____Occasionally ____Monthly ____Weekly ____Daily

 

  4. What applications do you use? (Please check all that apply)

____Email ____Word processing ____Web surfing ____Games

____Database ____Multimedia ____Programming ____Other

 

  5. Experience in web searching (1-10, novice to expert):____

 

  6. Have you ever taken a statistical course? ____Yes ____No

If yes, choose all that apply:

____High school ____College ____Graduate study ____Professional training

 

  7. Please select any statistical package(s) that you have used:

____Excel or other spread sheet ____SAS ____SPSS ____Others

 

 

We’d like to know how often you use statistical tables. Please check the response that best represents your experience.

  8. Please tell us how many times (ever) you have used the following tables (including both paper and electronic formats)

Stock market tables/listings

____None ____1-5 ____6-15 ____>15

Time schedule tables

____None ____1-5 ____6-15 ____>15

Consumer information tables (e.g. cost comparison tables)

____None ____1-5 ____6-15 ____>15

Nutritional labels (e.g. cereal box)

____None ____1-5 ____6-15 ____>15

Research results in articles

____None ____1-5 ____6-15 ____>15

Government statistics on the web (e.g. health, demographic tables)

____None ____1-5 ____6-15 ____>15

Tax tables

____None ____1-5 ____6-15 ____>15

 

  9. Have you ever used Fedstats (www.fedstats.gov)? ____Yes ____No

  

  10. How often have you used data in tables from government Web sites?

____Never ____Occasionally ____Monthly ____Weekly ____Daily

 

  11. How often have you used statistical tables on the web?

____Never ____Occasionally ____Monthly ____Weekly ____Daily

 ---------------------------------------------------------------------------------------------------------------

Subject ID _________

Date: ____/____/____                                                Starting time:_________

Table name:______________________                                       Format used: ___________________

 

Please take a few minutes to familiarize yourself with the table before I ask you some questions. Please pay attention to anything that is not clear to you in the table, including terms, the way rows and columns are organized, categorized, and displayed, the amount of information in the table, and the location of supporting information.

 

  12. Would it be helpful for you to know the author’s intended purpose for the table?

____Yes ____No

If so, where do you think it would be best to display that information?

___________________________________________________________________________

___________________________________________________________________________

 

If not, Why not?

___________________________________________________________________________

___________________________________________________________________________

 

 

  13. Does the title help you to understand the facts in the table? ____Yes ____No

If not, what kind of information about the title would make it easier for you to understand the content of the table?

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

 

 

14.     Could you mention all the terms that are not clear for you in this table?

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

 

 

15.     Is there anything in the way the table, its rows or columns are organized that makes the table more

difficult to understand? ____Yes ____No

If yes, Can you mention what those table issues are?

__________________________________________________________________________

__________________________________________________________________________

__________________________________________________________________________

__________________________________________________________________________

__________________________________________________________________________

 

16.     What adjustments to table appearance do you think would make the table easier to understand?

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

 

17.     If you do not understand something in the table, where would you expect to find that information displayed?

__________________________________________________________________________

__________________________________________________________________________

__________________________________________________________________________

__________________________________________________________________________

 

18.     What kind of supporting information* would be useful for you if you want to do comparisons between

rows or columns? *(information to help understand data)

_________________________________________________________________________

_________________________________________________________________________

_________________________________________________________________________

 

19.     Was the amount of supporting information about the table:

______deficient ______ sufficient ______excessive ?

Can you describe in what way the information was (deficient/sufficient/excessive)?

________________________________________________________________________

________________________________________________________________________

 

20.     Can you tell from the information in the table how any of the statistical measures were

calculated? ____Yes ____No

Would having that information available be useful to you? ____Yes ____No

 

21.     Can you tell from the information displayed what is the unit of measure for the cells in

this table? _____Yes _____No

If not, would that information be useful to you? ____Yes ____No

If yes, Where would you like to see that information displayed?

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

 

22.     Would you trust this data? ____Yes ____No

Why or Why not?

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

 

23.     Is there anything else that you can tell us about what could make this table easier for you to use?

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

 

24.     In terms of how difficult it was for you to understand this table, where would you put yourself on a

scale from 1-7 (1 did not understand - 7 understood everything)? ______

 

Would it help to have supporting information according to your level of understanding?

_____Yes _____No

 

Finish time: _______

Thanks a lot for your participation.

 


APPENDIX 3: Questions Asked by Users & Research Team and the Answers Found

 

No.

QUESTION ASKED

ANSWERS FOUND IN DOCUMENTATION

ANSWERS FROM EXPERTS

1

What is the meaning of "seasonally adjusted"?

Normal seasonal fluctuations are smoothed out by a statistical process.

 

2

How is "unemployment rate" calculated?

Persons are classified as unemployed if they do not have a job, have actively looked for work in the prior 4 weeks, and are currently available for work.

 

3

What is "change in payroll employment"?

 

Change from previous month in the level of total nonagricultural employment as measured by the Current Employment Statistics program.

4

Who is classified as "production, non-supervisory workers"?

Employees who are not owners or who are not primarily employed to direct, supervise, or plan the work of others. Production workers in mining and manufacturing, and construction workers in construction.

 

5

In Note 4, why does 1982-84=100?

Most of the specific CPI indexes have a 1982-84 reference base. That is, BLS sets the average index level (representing the average price level)--for the 36-month period covering the years 1982, 1983, and 1984--equal to 100.

 

6

In Note 5, what is meant by "finished goods"?

The three stages of processing include Finished Goods; Intermediate Materials, Supplies, and Components; and Crude Materials for Further Processing.

 

7

In Note 5, why does 1982=100?

Movements are measured with respect to the base period, when the index is set to 100. Currently, most PPI's have an index base set at 1982 = 100.

 

8

In Note 6, why are the imports not seasonally adjusted?

 

 

9

Clarification of Note 7

Employer Costs for Employee Compensation (cost levels). The cost levels program produces average costs per hour worked for wages and salaries and specific benefits.

 

10

Clarification of Note 8

The survey covers establishments of all sizes in private industry (excluding farms and households) and the public sector (excluding federal government) for all of the United States.

 

11

Preliminary- when will the current data become available?

 

For the CES (rows 2-3) the data are preliminary following their initial release and may be revised in one or both months. For the PPI, data are preliminary for four months and are then revised once.

12

R- does this mean revised? If so when were they revised and how?

 

 

 

 

 

 

 

This notation is no longer used. It was used to indicate a correction to the CPI as a result of improved procedures for measuring housing rents being deployed.

13

What is meant by "civilian labor force"?

Civilian labor force. Included are all persons in the civilian noninstitutional population classified as either employed or unemployed.

 

14

What is the difference between "employed" and "unemployed"?

Employed persons. Employed persons are all persons who, during the reference week (week including the twelfth day of the month), (a) did any work as paid employees, worked in their own business or profession or on their own farm, or worked 15 hours or more as unpaid workers in an enterprise operated by a member of their family, or (b) were not working but who had jobs from which they were temporarily absent. Each employed person is counted only once, even if he or she holds more than one job.

Unemployed persons. All persons who had no employment during the reference week, were available for work, except for temporary illness, and had made specific efforts to find employment some time during the 4 week-period ending with the reference week. Persons who were waiting to be recalled to a job from which they had been laid off need not have been looking for work to be classified as unemployed.

 

15

What are the definitions of the job categories?

A sample establishment in the CES survey is an economic unit, such as a factory, which produces goods or services. It is generally at a single location and engaged predominantly in one type of economic activity. Establishments reporting on the schedule (form BLS 790) are classified into industries based on their principal product or activity determined from information on annual sales volume. This industry classification, based on the 1987 Standard Industrial Classification Manual, is collected on a supplement to the quarterly unemployment insurance tax reports filed by each employer. For an establishment making more than one product, the entire employment is included under the industry of the principal product or activity.

 

16

Why is the Construction and Mining category not seasonally adjusted?

There are a few anomalous situations in the Economy at a Glance tables. For example, the State and Area portion of the Current Employment Statistics program does not produce seasonally adjusted data for mining for a few states. These anomalies are flagged in the Economy at a Glance table through footnotes and by highlighting those rows in a different color.

 

17

What is included in the Syracuse metropolitan area?

 

They use the OMB definition of metropolitan areas.

18

What is the difference between CPI-U and CPI-W?

All Urban Consumers (CPI-U) and Urban Wage Earners and Clerical Workers (CPI-W). The CPI-U represents about 87 percent of the total U.S. population. It is based on the expenditures of almost all residents of urban or metropolitan areas, including professionals, the self-employed, the poor, the unemployed, and retired persons as well as urban wage earners and clerical workers. The CPI-W is based on the expenditures of households that are included in the CPI-U definition that also meet two requirements: More than one-half of the household's income must come from clerical or wage occupations and at least one of the household's earners must have been employed for at least 37 weeks during the previous 12 months.

 

19

Note 5- who is a clerical worker?

clerical or wage occupations

 

20

Why is there a difference in the information given for different metropolitan areas CPI? i.e. for Syracuse there is annual percent change, for LA Orange County there is the percent change and the actual numbers, and in Arkansas there is no CPI data given.

 

He wasn't sure, he is checking and getting back to me.

21

What is meant by "enumerated population"?

 

Enumerated - - "Counted", as in the census. Most any Census volume will give all the nuances to this, like residency rules and who is actually counted (e.g. diplomatic personnel aren't, with exceptions). There are very subtle definitions, but unless you're examining Census procedures or some things like that, it shouldn't make much difference.

22

What is the "median"?

 

The median is the standard statistical measure. Basically the point at which 50% of the values are above and 50% are below.

23

In Note 1- why haven't specific group numbers been revised and how does that affect the totals vs. breakdown?

However, these estimates and projections by race have been modified and are not comparable to the census race categories.

 

24

Note 2 is confusing

Data for the population by age for April 1, 1990, (shown in Tables 14, 21, and 23) are modified counts. The review of detailed 1990 information indicated that respondents tended to provide their age as of the date of completion of the questionnaire, not their age as of April 1, 1990.

Note 2 - There are different ways the pop figures from the Census could change - one is through CQR - count question resolution. These are usually things where somebody finds that a number of people are in the wrong place, e.g. they were counted outside a place, they should have been inside. Then there's adjustment due to undercount. That was not included.

25

What is the difference between Note 1 and Note 2 and therefore what happened in 1980 vs. 1990?

 

The process for revising the 1980 base was different than that for 1990. I think all that is documented somewhere.

26

How is the population estimated for the in between years?

This is a monthly nationwide survey of a scientifically selected sample representing the noninstitutional civilian population. The sample is located in 754 areas comprising 2,121 counties, independent cities, and minor civil divisions with coverage in every state and the District of Columbia and is subject to sampling error.

Estimates are produced using components: Births, Deaths, and International Migration. You can find out more looking in the documentation under "National Estimates" under "Estimates", on http://www.census.gov

27

Why is there a second breakdown of school-age children?

 

School age breakdown - for convenience. Those are popular aggregations. NOTE - this may clear up further questions - the national estimates are produced and available for single years of age, by sex, race and Hispanic origin. The figures that appear in the table are put there as space allows and in an attempt to please as many users as possible.

28

What are the implications of suddenly switching to 10 year segments of the population after doing the rest in 5 year blocks? And what about 85 and over?

 

NOTE - this may clear up further questions - the national estimates are produced and available for single years of age, by sex, race and Hispanic origin. The figures that appear in the table are put there as space allows and in an attempt to please as many users as possible.

29

"Excludes Armed Forces overseas" - how long do they have to be overseas?

 

Armed Forces Overseas - that's whoever is overseas as of the estimate date. Doesn't matter how long.

30

What are the implications of calculating the numbers from April 1 in 1980 and 1990, and July 1 in the interim years?

 

The people that do the estimates adjust for the 3-month interval.

31

Who is classified as white?

 

race is controlled by OMB, next of kin provides race on the death certificate

32

Who is classified as black?

 

race is controlled by OMB, next of kin provides race on the death certificate

33

Who is included in "all other"?

 

race is controlled by OMB, next of kin provides race on the death certificate

34

Do the data points represent more years to live?

The most frequently used life table statistic is life expectancy, which is the average number of years of life remaining for persons who have attained a given age (x). Life expectancy and other life table values at specified ages in 1996 are shown.

 

35

How is this number calculated?

The average remaining lifetime (also called expectation of life) at any given age is the average number of years remaining to be lived by those surviving to that age on the basis of a given set of age-specific rates of dying.

 

36

What about data for after age 85?

 

will be doing 85+, in some publications they do have details after 85, it just depends on the publication and what they have data for.

37

Is it possible to include a mouse over calculator to figure out the age of death?

 

 

38

The table is difficult to read due to a lot of data columns and rows, grid lines might help.

 

 

39

How does this table relate to other years? I was 20 in 1996 and it says I should live for 60.4 more years, does this mean in subsequent years' tables I will always live to 80.4?

 

Based on mortality experience for a year or group of years, every year when life tables are redone the figures are recalculated.

40

What is meant by and where are ozone non-attainment (RFG) areas?

Environmental programs - Some areas of the country are required to use special gasolines. Environmental programs, aimed at reducing carbon monoxide, smog, and air toxics, include the Federal and/or State-required oxygenated, reformulated, and low-volatility (evaporating more slowly) gasolines. Other environmental programs put restrictions on transportation and storage. The reformulated gasolines required in some urban areas and in California add three and five cents, respectively, to the price of conventional gasoline served elsewhere.

Reformulated Gasoline: Finished motor gasoline formulated for use in motor vehicles, the composition and properties of which meet the requirements of the reformulated gasoline regulations promulgated by the U.S. Environmental Protection Agency under Section 211(k) of the Clean Air Act. This category includes Oxygenated Fuels Program reformulated gasoline (OPRG) but excludes reformulated gasoline blend stock for oxygenate blending (RBOB). Click to view a list of the areas that require sales of reformulated gasoline. OPRG is included in reformulated totals in the motor gasoline survey results. http://www.doeombuds.org/Formulation_map.jpg

41

What is meant by and where are carbon monoxide non-attainment (Oxygenated) areas?

See above

they are getting rid of the four designations and shifting to only two- reformulated and conventional

42

What is meant by and where are ozone and carbon monoxide non-attainment (OPRG) areas?

See above

they are getting rid of the four designations and shifting to only two- reformulated and conventional

43

What is meant by and where are attainment (Conventional) areas?

See above

Conventional Gasoline: Finished motor gasoline not included in the oxygenated or reformulated gasoline categories. http://www.doeombuds.org/Formulation_map.jpg

44

What is included in 'All Grades' of gasoline?

There are three main grades of gasoline: regular, midgrade, and premium. Each grade has a different octane level. Price levels vary by grade, but the price differential between grades is generally constant.

The classification of gasoline by octane ratings. Each type of gasoline (conventional, oxygenated, and reformulated) is classified by three grades - Regular, Midgrade, and Premium. Note: Gasoline sales are reported by grade in accordance with their classification at the time of sale. In general, automotive octane requirements are lower at high altitudes. Therefore, in some areas of the United States, such as the Rocky Mountain States, the octane ratings for the gasoline grades may be 2 or more octane points lower. http://www.doeombuds.org/Gasoline_Grades.html

45

What is the octane of 'Regular' gasoline?

 

Regular Gasoline: Gasoline having an antiknock index, i.e., octane rating, greater than or equal to 85 and less than 88. Note: Octane requirements may vary by altitude. http://www.doeombuds.org/Gasoline_Grades.html

46

What is the octane of 'Midgrade' gasoline?

 

Midgrade Gasoline: Gasoline having an antiknock index, i.e., octane rating, greater than or equal to 88 and less than or equal to 90. Note: Octane requirements may vary by altitude. http://www.doeombuds.org/Gasoline_Grades.html

47

What is the octane of 'Premium' gasoline?

 

Premium Gasoline: Gasoline having an antiknock index, i.e., octane rating, greater than 90. Note: Octane requirements may vary by altitude. http://www.doeombuds.org/Gasoline_Grades.html

48

What is meant by these PADD designations?

PADD: Petroleum Administration for Defense Districts

PAD District 1 (East Coast) is composed of the following three subdistricts:

Subdistrict 1A (New England): Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont.

Subdistrict 1B (Central Atlantic): Delaware, District of Columbia, Maryland, New Jersey, New York, Pennsylvania.

Subdistrict 1C (Lower Atlantic): Florida, Georgia, North Carolina, South Carolina, Virginia, West Virginia.

PAD District 2 (Midwest): Illinois, Indiana, Iowa, Kansas, Kentucky, Michigan, Minnesota, Missouri, Nebraska, North Dakota, South Dakota, Ohio, Oklahoma, Tennessee, Wisconsin.

PAD District 3 (Gulf Coast): Alabama, Arkansas, Louisiana, Mississippi, New Mexico, Texas.

PAD District 4 (Rocky Mountain): Colorado, Idaho, Montana, Utah, Wyoming.

PAD District 5 (West Coast): Alaska, Arizona, California, Hawaii, Nevada, Oregon, Washington.

The new site http://www.doeombuds.org/ breaks out the PADD designations including a nice map, so this question may be irrelevant.

49

Why is there not consistent data between the regions? i.e. Why is there no OPRG in PADD 1C and no Oxygenated in the PADD 1's?

 

The reason there are not consistent data between regions (such as no OPRG in PADD 1C) is that the gasoline data are shown in the table according to the attainment and non-attainment areas specified by EPA. Those areas are displayed in the black and white map on p. 41 of the Weekly Petroleum Status Report at the url: http://www.eia.doe.gov/pub/oil_gas/petroleum/data_publications/weekly_petroleum_status_report/current/pdf/appendix.pdf and shows the four formulations for which prices are currently being released, or at the test site (http://www.doeombuds.org) by clicking on the tab saying map of formulations (currently on the left, but soon to be moved to the right to meet an EIA overall format/navigation). The latter only shows two formulations because we are soon going to be releasing the data showing only two formulations--conventional (which will include oxygenated), and reformulated (which will include OPRG). Looking at these maps you will see that not every formulation exists in every PADD (for example, there are no requirements for OPRG anywhere in PADD 1C). Even though some of that formulation may spill over from a nearby PADD, it isn't classified as such if it is not required in that area by EPA.

50

Is there a way that we can compare between rows and columns?

 

I'm not sure what is meant about a way to compare between rows and columns. From the main table at the test site (regular grade gasoline by PADD for the last three weeks), you could compare between rows to get area differences in prices or between columns to get changes in prices over time. From the test site you can click on the tab for detailed reports, and then click on the tab for spreadsheets. The intent will be to allow you to load all the data into an Excel spreadsheet and let you rearrange, calculate differences or percentages, or manipulate the data whatever way you like. You will then have the ability to graph more or different data points as you like from the Excel spreadsheet option.

51

Is there a graphing capacity, to see more clearly the historical changes?

 

see above

52

There are a lot of data points and the column headings get lost.

 

I think the comment "there are a lot of data points and the column headings get lost" was particularly true of the current site and one main reason for the redesign. The test site has a simple first page with the most commonly sought info (regular gasoline prices by PADD) with tab to detailed reports to then select other data according to your hierarchy of comparison (they all contain the same data; it is just "served up" different). The time series is more limited, just eight weeks, but the spreadsheets will have longer series for those that want it. Supposedly, by end of the year, we will have a system also in place that we can link to that allows the customer to choose the cuts of data they want through drop down boxes. That self-serve system includes much more than gasoline and we are not allowed in the mogas page to duplicate its effort so the spreadsheet is as far as we are allowed to go and then we will have a link to the new self-serve application.

53

How is this data collected?

 

How are the data collected? We don't currently have a link on the site for that. We should for the new one but it is still missing from the test site. The data are collected using computer-assisted telephone interviews from a statistically selected sample of approximately 800 retail gasoline stations each week. The prices are collected every Monday morning and the data released by 5 p.m. every Monday night, except on government holidays the data are released on Tuesday (but still represent Monday's price).

54

It is illegal in NJ to have self service gas stations, so how can these be the "self service prices per gallon" for the country?

 

Yes, some states, NJ for one, do not allow self-serve. In those cases, the prices represent the only service of gasoline provided in that state. Our analysis has always shown that this is not a big price effect in those states as compared to states allowing self-serve have higher prices for full-serve vs. self-serve. I had even heard NJ monitors the impact of their law to help justify it to state residents as not contributing to higher prices because it is required. I have nothing in writing on any of this though, it is all anecdotal. The industry doesn't make an issue of it nor do we. Some states have other laws such as refiners can't operate gas stations (MD for one), and we don't note them either as non-refiner state stations or anything.

55

I know that state gas tax can vary from state to state, how is that handled in this comparison between states?

Taxes (not including county and local taxes) account for approximately 36 percent of the cost of a gallon of gasoline. Within this national average, Federal excise taxes are 18.4 cents per gallon and State excise taxes average 19.96 cents per gallon. Also, seven States levy additional State sales taxes, some of which are applied to the Federal and State excise taxes. Additional local county and city taxes can have a significant impact on the price of gasoline.

State taxes on the other hand are a big effect in price differences and usually should be included in any comparisons. We do include them in the prices because the customer pays them at the pump (and we are showing pump prices). The test site has a tab for state taxes and the user can see how much of the price difference may be due to taxes or subtract out taxes themselves for other analytic purposes (like comparing to our monthly prices which exclude taxes) if they desire.

56

In the title it mentions population, but are they talking about US population?

yes

 

57

Residents of where?

US

 

58

Why aren't they more specific in the title about specific years that are being included in the population count?

?

 

59

Why don't they say in the title where specifically this population is from?

Since it is part of a whole packet of information, it is assumed.

 

60

What are count resolution corrections?

 

See documentation on Count Resolutions, again look in www.census.gov

61

What are the texts on the side for?

?

 

62

Percentage of what, the respondents?

of total population

 

63

Why do they have the male and female breakdown for only 1980, 1990, and 1997 and not for the other years?

 

Male -Female - to fit in the space allowed. This is the discretion of the Stat Abstract Table designers.

64

I'm not sure if this means that these 3 years are based on the census and others are projected. The answer might be in the notes, but they are really difficult to understand.

 

 

65

Why do they only have 1997 in bold?

 

Design

66

What are those 3 columns before the mean? Why did they group them together?

 

Convenience.

67

Why were those places picked?

 

Design, to highlight what some designer thinks is useful.

68

Why are some things in purple and not others?

 

Design, to highlight what some designer thinks is useful.

69

They don't tell the total number of people who weren't surveyed and they should at least give a general idea.

 

 

70

It doesn't give enough information about the area that the population is from and why it excludes the Armed Forces.

 

 

71

What is the point of the last 3 categories?

 

Convenience.

72

What is the point of the count? Did they double count?

Census is a count every 10 years.

 

73

It is confusing. What do they mean by in thousands?

To read the numbers, you must add 000 to the end of each one (for example, 1,234 means 1,234,000).

 

74

Why doesn't the title say more specifically what the table is about?

?

 

75

What exactly do the job categories like transportation and public utilities entail?

Establishments reporting on the schedule (form BLS 790) are classified into industries based on their principal product or activity determined from information on annual sales volume. This industry classification, based on the 1987 Standard Industrial Classification Manual, is collected on a supplement to the quarterly unemployment insurance tax reports filed by each employer. For an establishment making more than one product, the entire employment is included under the industry of the principal product or activity. http://www.bls.gov/790faq2.htm#q6

 

76

Does that include the subway, infrastructure, and trains?

 

 

77

Why is non-farm wage on the titles within the table and not listed with other jobs?

They are all non-farm.

 

78

What do TXT and PDF mean?

formats for the documents

 

79

What does T&PV mean?

?

 

80

I don't understand what the numbers are about. Do they mean people in the civilian labor force or something else?

 

 

81

Does employment include civilian and armed forces labor force?

Employment, except for national Federal Government estimates, is the total number of persons on establishment payrolls employed full or part time who received pay for any part of the pay period which includes the 12th day of the month. Temporary and intermittent employees are included, as are any workers who are on paid sick leave, on paid holiday, or who work during only part of the specified pay period. A striking worker who only works a small portion of the survey period, and is paid, would be included as employed under the CES definitions. Persons on the payroll of more than one establishment are counted in each establishment. Data exclude proprietors, self-employed, unpaid family or volunteer workers, farm workers, and domestic workers. Persons on layoff the entire pay period, on leave without pay, on strike for the entire period or who have not yet reported for work are not counted as employed. Government employment covers only civilian workers. http://www.bls.gov/cescope.htm#3

 

82

What does non-farm mean?

anyone who is working, but not on a farm

 

83

What is T&P?

 

 

84

What does preliminary mean?

 

For the CES (rows 2-3) the data are preliminary following their initial release and may be revised in one or both months. For the PPI, data are preliminary for four months and are then revised once.

85

What does non-farm wage mean?

Anyone who is working, but not on a farm

 

86

What do they mean by 12-month % change?

The change over the past year

 

87

What is the difference between civilian labor and non-farm?

Employment, except for national Federal Government estimates, is the total number of persons on establishment payrolls employed full or part time who received pay for any part of the pay period which includes the 12th day of the month. Temporary and intermittent employees are included, as are any workers who are on paid sick leave, on paid holiday, or who work during only part of the specified pay period. A striking worker who only works a small portion of the survey period, and is paid, would be included as employed under the CES definitions. Persons on the payroll of more than one establishment are counted in each establishment. Data exclude proprietors, self-employed, unpaid family or volunteer workers, farm workers, and domestic workers. Persons on layoff the entire pay period, on leave without pay, on strike for the entire period or who have not yet reported for work are not counted as employed. Government employment covers only civilian workers. http://www.bls.gov/cescope.htm#3

 

88

What do they mean by 12-month % change?

The change over the past year

 

89

What is non-farm wage?

Anyone who is working, but not on a farm

 

90

What is salary employment?

Employment, except for national Federal Government estimates, is the total number of persons on establishment payrolls employed full or part time who received pay for any part of the pay period which includes the 12th day of the month. Temporary and intermittent employees are included, as are any workers who are on paid sick leave, on paid holiday, or who work during only part of the specified pay period. A striking worker who only works a small portion of the survey period, and is paid, would be included as employed under the CES definitions. Persons on the payroll of more than one establishment are counted in each establishment. Data exclude proprietors, self-employed, unpaid family or volunteer workers, farm workers, and domestic workers. Persons on layoff the entire pay period, on leave without pay, on strike for the entire period or who have not yet reported for work are not counted as employed. Government employment covers only civilian workers. http://www.bls.gov/cescope.htm#3

 

91

How are employment and unemployment rates different?

Civilian noninstitutional population. Included are persons 16 years of age and older residing in the 50 states and the District of Columbia, who are not inmates of institutions (e.g., penal and mental facilities, homes for the aged), and who are not on active duty in the Armed Forces. Civilian labor force. Included are all persons in the civilian noninstitutional population classified as either employed or unemployed (see the definitions below). Employed persons. Employed persons are all persons who, during the reference week (week including the twelfth day of the month), (a) did any work as paid employees, worked in their own business or profession or on their own farm, or worked 15 hours or more as unpaid workers in an enterprise operated by a member of their family, or (b) were not working but who had jobs from which they were temporarily absent. Each employed person is counted only once, even if he or she holds more than one job. Unemployed persons. All persons who had no employment during the reference

 

92

What is the definition of services?

see above

 

93

How is civilian force defined?

see above

 

94

How do they calculate 12-month % change?

see above

 

95

What do the dinosaurs do?

Give a graphic history

 

96

What do the different colors mean?

They indicate links

 

97

Why do all of the links change color, when I only click on one of them?

Because all of the links go to the same place

 

98

Why is the P on each number for October?

They are all preliminary

 

99

What am I supposed to find when I click this link to another page?

?

 

100

Why are the news releases first when I click this link?

?

 

101

Who is involved in each of these job categories?

see above

 

102

Where did they get the information for these tables?

http://www.bls.gov/eag/abouteag.htm

 

103

What are these numbers about?

?

 

104

Why can't I get the information directly when I click on this link?

?

 

105

What are the units?

Thousands of people.

 

106

What does death registration states mean?

 

From the death certificate, the state where the person died

107

What do they mean by whites? I am not sure what they include

 

Race is controlled by OMB, next of kin provides race on the death certificate

108

Does this refer to people who are citizens or not?

 

 

 

 

 

109

Does black mean people who were born African-American or people who are black that live here?

 

Race is controlled by OMB, next of kin provides race on the death certificate

110

Why is area in the column, I don’t understand that?

 

?

111

What does con mean?

 

?

112

What does all others include or refer to?

 

Race is controlled by OMB, next of kin provides race on the death certificate

113

Then it says total, is it the total of all other races?

 

Yes

114

What are the ---? Does this mean they don’t have data collected, or what?

 

?

115

Where did they get the numbers?

 

 

116

What do areas mean? This is pretty vague.

 

see above

117

What does PADD mean?

 

see above

118

What does OPRG mean? What is this abbreviation?

 

see above

119

Why is ozone-non-attainment abbreviated RFG? What does that mean?

 

see above

120

What do the subcategories of PADDs 1, 1A, etc. mean?

 

see above

121

Why are they comparing RFG areas with OPRG areas?

 

 

122

What are originated areas?

 

see above

123

What does Convent. Area mean?

 

see above

124

What are the different gasoline categories?

 

see above

125

What are attainment conventional areas, or oxygenated or carbon monoxide areas?

 

see above

126

What are carbon monoxide areas?

 

see above

127

What are Oxygenated areas?

 

see above

128

Why do they choose the dates they close? What is the significance of those dates?

 

 

 

 


APPENDIX 4

List of Users’ Comments/Suggestions and Complaints

Comments/Suggestions

Table

Freq.

It would help if they added "Total Resident Population by Age & Sex".

T14

1

It would help if they were more specific about the years being included in the population count.

T14

1

I would add to the title what area the population is from.

T14

1

They could have labeled the last group of ranges as a different age range such as adolescents, adults.

T14

2

Maybe if the columns, rows and numbers were more spaced out it would be easier to read.

T14

4

A pop over window would be very useful for other years to indicate they are estimates.

T14

1

Maybe the 1997 should be put in italics or something.

T14

1

It would make more sense if the last 3 columns were displayed in a more chronological order.

T14

2

Have the numbers in bold in a bigger print.

T14

1

Maybe they could put the unit of measure next to the word population located on the right side.

T14

1

People don’t always look at the small print right below the title.

T14

1

It would help if you made the table smaller so it is easier to see on one page.

T14

2

I would, if possible, put the citation at the bottom instead of on the side.

T14

1

I would put the dates a little closer to the data.

T14

1

It would help if you could run the mouse over it and something told you this range is 4 or 5 years.

T14

1

They could have a graph/chart showing the size of the population with the different years.

T14

1

They should at least give a general idea of the total number of people who weren’t surveyed.

T14

1

It would help to be able to do splits of the table.

T14

1

It would be helpful if I could highlight the columns.

T14

1

I would split the footnotes into different lines so it doesn’t seem like that much information.

T14

1

Definitely a pie chart would be a lot easier.

T14

1

Maybe if they had an example of "in thousands" it would be less confusing.

T14

1

Move the citations to the bottom.

T14

1

If they could reprint the column titles that are on the left side on the right side as well, it would be easier to read without having to scroll.

T14

1

It seems unnecessary to have the word population on the right side.

T14

1

It would be helpful if the title were more specific, saying something about Employment Rates or the kind of jobs people have, or something that describes what is in the table.

AAG

8

Non-Farm Wage should be listed with the other jobs. That’s silly to have that in the title.

AAG

1

It would help to know what TXT and PDF are.

AAG

1

It would be helpful to keep the table either based on persons or jobs.

AAG

1

Make all the individual types of jobs separate tables but on the same page so you can go back and forth.

AAG

1

The title should be centered with the actual table and NY should be in it instead of floating.

AAG

1

The unit of measure shouldn’t be in the footnotes. It should be next to the category so you can see it right away instead of having to scroll down.

AAG

6

I think they should separate the raw numbers from the percentages.

AAG

1

The two different sections should really be two separate tables.

AAG

3

I would get rid of the (P) since it applies to all data in October and put it next to the month’s title.

AAG

2

The table could be centered more.

AAG

1

The column titles like Labor Force Data should be bigger.

AAG

1

The subheading titles should be bigger, not in the same size as all other things.

AAG

2

Being able to run the mouse over and see what construction is or includes would be interesting, but I don’t think there will be any difference either way.

AAG

2

When I click a term I would expect to have that part of the table by itself with more information.

AAG

1

Change some of the colors that are difficult to read.

AAG

2

The dinosaurs’ icons should be bigger.

AAG

1

Maybe they could use different gradations of blue or use different colors for things that are not related.

AAG

2

I would repeat the titles of the columns in the Non-Farm wage section of the table.

AAG

1

Explain the titles better by being more specific.

AAG

1

I would put the "Total" at the end of the table.

AAG

1

It would be nice if they gave you the differences between each month.

AAG

1

Maybe if they could show if there was a big jump in the data or something like that.

AAG

1

I think the history button is great because it tells you 10 years’ worth of information without putting too much information together in the table.

AAG

1

It would help to have that page organized in a different more meaningful way.

AAG

1

Make the numbers wider.

AAG

1

The pink and gray colors are very difficult to read.

AAG

2

The column header was not aligned with the icons.

AAG

1

They have a text version of the table but this is terrible.

AAG

1

Footnotes like this are not a good idea.

AAG

1

The title should be more concise. It should say something about the different grades.

Gas

4

It would help if there were an explanation of the different gasoline categories.

Gas

1

They should explain what the form at the end is or at least they should have a link to it.

Gas

1

Since there is a lot of information, if they could use some lines, especially horizontally, to break information up a little more it might help.

Gas

6

The column title "Date" is like hanging. It should be closer to the values.

Gas

1

There are a lot of numbers that seem to be the same so it would help if they were highlighted or the columns more spaced.

Gas

3

Put each grade in different pages instead of cutting them up as they did.

Gas

1

Define the terms that are not clear.

Gas

1

Space out the rows a little more.

Gas

1

Bold the titles so they stand out more for people.

Gas

4

Making the titles and numbers different sizes.

Gas

1

Putting more space between the numbers.

Gas

1

Have the first part of this table on a page and then each of the grades broken down as links.

Gas

3

They should have a description following each table.

Gas

1

Maybe adding some color would help to differentiate.

Gas

2

Before you read something it should be defined.

Gas

1

I would prefer to have a Website so I can go back to them.

Gas

1

Calling them would be the last thing I would do.

Gas

2

If they highlighted the most and least expensive gasoline prices in the columns.

Gas

1

Maybe they could have something like the Microsoft question mark "?" where you click on what you want to know.

Gas

1

Maybe if they explained what the abbreviations are.

Gas

1

I would rather have this information written down than in a mouse click because this is sometimes annoying and more of a hassle.

Gas

1

It would help to know from where they got the information.

Gas

2

The title is very long and that makes it harder to understand.

Gas

1

The fact that the table doesn’t have gridlines makes it a little harder to follow the rows.

Gas

1

So many decimals make it confusing.

Gas

1

There are no lines to separate the numbers.

Gas

2

The total should be explained more.

LE

1

Move the years close to the data and get rid of the dots.

LE

3

The title for both sexes should be simpler.

LE

1

Preferably it should not just be like "all others"; it should go through each race.

LE

1

I would separate the numbers from the different groups by bolding the column lines.

LE

2

Put term definitions close or below the title, everything else should be footnoted at the end.

LE

2

Making the number a little bigger and centered.

LE

1

Maybe underline the rows or separate the rows somehow, like by alternating colors.

LE

2

Maybe if they told you how they got the estimates.

LE

1

If they could define those terms it would be better.

LE

1

They should probably put the source at the bottom.

LE

1

They should indicate who specifically put the information together.

LE

1

Maybe they could put a more detailed explanation of the footnotes.

LE

1

The column title "All others" is a little vague.

LE

1

I don’t like how it is "All races", "White", and then black as a subcategory of "All Others"

LE

2

The dates on the left are too far from the data columns. It’s easy to get lost.

LE

2

Dividing the years better.

LE

1

 

 

 

Complaints

Table

Freq.

You would have to pick apart the notes underneath to understand what they are saying.

T14

1

The numbers are too close. It is hard to read them.

T14

1

The table is kind of large.

T14

1

The table is hard to read across, and you have to keep scrolling over.

T14

1

They don’t tell the total number of people who weren’t surveyed.

T14

1

State At a Glance NY is kind of vague. It doesn’t really tell you what you will be looking at.

AAG

4

I wouldn’t expect to find news releases first when I click on a link.

AAG

1

Although you can click on the links, they still aren’t clearly telling you what the numbers are about.

AAG

1

You might spend an hour trying to find an answer and you might not find any.

AAG

1

The page has a lot of related stuff besides what they explain about the data.

AAG

1

When you go to the history they don’t give you any explanation of what the numbers are.

AAG

1

I would click on a term and would expect to find a definition, more details about the term, number of people surveyed, or the number of respondents instead of so many other links.

AAG

7

There are too many numbers and that is confusing.

LE

1

It is hard to separate the numbers of "All races" from "Both sexes"

LE

1

Table does not explain how things are calculated.

LE

1

There were terms that were not defined.

Gas

4

They did not discuss what classifies specific oils into one grade.

Gas

1

There were no links that you could click on the page and get more information.

Gas

1

They don’t really give you any supporting information except for that contact number.

Gas

1