Microdata Use

 

 

What is Microdata?

Research libraries and data archives are now bursting with data ready to be analyzed. Roughly speaking, numeric or statistical data collections hold two kinds of data resources, distinguished by how the data content is structured and processed. One is aggregated data; the other is unaggregated data, usually called microdata or machine-readable raw data.

Microdata consists mostly of original data containing a record for every individual unit (e.g., a person or company) in a survey or research sample. Public Use Microdata Sample (PUMS) files, mainly from U.S. censuses and other government surveys, have so far accounted for the majority of the microdata used by researchers and other data users. Microdata files are usually far too large to publish in print, so working with them depends heavily on computerized data storage and retrieval systems, as well as on file-format issues tied to various statistical software packages and operating systems. This has made data services one of the most challenging areas of library service at academic and research libraries.

This site provides a guide for research data users and librarians who are, or may become, involved in seeking, acquiring, using, and managing microdata files. It gives a brief explanation of the content structure of microdata so that you can decide whether microdata is what you are looking for, and it lists and annotates resources on where to get microdata and how to obtain and use it. The guide is a gateway to microdata resources, intended to help users acquire and use the microdata they need and to serve as a reference tool for librarians who work in data services.

 

Major Sources of Microdata

Comprehensive Sources of Microdata

This category lists and annotates a number of comprehensive microdata sources, which together hold microdata of all kinds. If you are not certain what kind of microdata you are looking for, are unsure whether the data are available, or just want a general sense of what microdata is like, this category is a good starting point.

Inter-University Consortium for Political and Social Research (ICPSR)

The Inter-University Consortium for Political and Social Research (ICPSR), established in 1962, is an integral part of the infrastructure of social science research, with perhaps the richest collection of microdata files of any data archive. A unit within the Institute for Social Research at the University of Michigan, ICPSR is a membership-based organization with over 400 member colleges and universities around the world. ICPSR maintains and provides access to a vast archive of social science data for research and instruction, offers training in quantitative methods to facilitate effective data use, and provides user support to help researchers identify relevant data for analysis and carry out their research projects.

National Archives and Records Administration – Center for Electronic Records

The Center for Electronic Records of the National Archives and Records Administration (NARA) appraises, accessions, preserves and provides access to U.S. Federal Government electronic records of continuing value, enabling researchers to use Federal records designed for computer processing. The Center provides access to records of the U.S. Federal Government that have been accessioned into NARA in electronic format. These records may have been created with, and be readable by, any type of computer application, but they do not include digitized versions of accessioned paper or audiovisual records. If you are looking for historical microdata records (including census PUMS files) relevant to Federal Government organizations and their missions, you are strongly encouraged to explore the table of contents of the Center’s collection.

The UK Data Archive

The UK Data Archive at the University of Essex houses the largest collection of accessible computer-readable data in the social sciences and humanities in the United Kingdom. It is funded by the Economic and Social Research Council (ESRC), the Joint Information Systems Committee (JISC) of the Higher Education Funding Councils, and the University of Essex. Founded in 1967, it holds several thousand datasets of interest to researchers in all sectors and many different disciplines. It also houses the History Data Service, part of the Arts and Humanities Data Service funded by the Arts and Humanities Research Board (AHRB) and JISC. To use the services, users must first register; registration is a simple online procedure that needs to be done only once. Once registered, users are free to order data online in a variety of formats and media or, for the most popular datasets, to access them online via the NESSTAR system. Some users may be charged administrative fees, but for the vast majority the only cost is the price of the medium (CD or disk) and postage and packing.

Social Science Data Archives at the Australian National University

The Social Science Data Archives (SSDA), located in the Research School of Social Sciences at the Australian National University, was established in 1981. Its mission is to collect and preserve computer-readable data relating to social, political and economic affairs and to make the data available for further analysis. In addition to its acquisition and distribution activities, the SSDA has built a library of reference sources on data collection activities in Australia, including those of Commonwealth and State government agencies, major polling organisations and individual Australian researchers.

Data Services of the Odum Institute for Research in Social Science (IRSS)

The Odum Institute, located at the University of North Carolina at Chapel Hill (UNC), maintains one of the oldest and largest comprehensive archives of machine-readable statistical data in the U.S. Its Louis Harris Data Center is the exclusive national repository for Louis Harris public opinion data. The IRSS Data Library holds an extensive collection of U.S. Census data and a variety of public opinion, economic, health, and education data. It is the only archive of the raw data from the Computer-Administered Panel Study (CAPS), which collected personality and social psychological microdata from annual samples of UNC undergraduates from 1983 to 1988. The collection’s strengths are public opinion microdata and North Carolina state data.

Data and Program Library Service (DPLS)

DPLS at the University of Wisconsin – Madison is the central repository of data collections used by the university’s social science research community. DPLS acquires, preserves and facilitates access to social science data resources, provides reference and technical services to researchers, and assists in archiving locally produced data. Although its collections cover a comprehensive range of topics, DPLS is especially strong in longitudinal data files, all of which are microdata.

 

U.S. Census Microdata

If you are interested in finding, using, or browsing U.S. Census microdata, the data sources in this category are right for you. U.S. Census microdata files are often referred to as PUMS and IPUMS. They consist of samples (for example, 1 or 5 percent) of long-form questionnaire records, stripped of all personal identifiers and grouped into special geographic units of at least 100,000 people to protect confidentiality. Users can treat these census microdata as raw data and create their own customized tables that are not available in published census resources.
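To make the difference between microdata and published tables concrete, here is a minimal Python sketch of tallying person-level records into a custom cross-tabulation. The records, field names, and codes are invented for illustration and do not follow any real PUMS layout, and the counts are unweighted; real files may carry sampling weights described in their documentation.

```python
from collections import Counter

# Hypothetical person-level microdata: one dict per respondent.
# Field names and codes are invented for this illustration.
records = [
    {"state": "VA", "sex": "F", "age": 34},
    {"state": "VA", "sex": "M", "age": 51},
    {"state": "NC", "sex": "F", "age": 27},
    {"state": "NC", "sex": "F", "age": 62},
    {"state": "VA", "sex": "F", "age": 45},
]

def crosstab(rows, row_var, col_var, keep=lambda r: True):
    """Count records for each (row_var, col_var) pair, after an optional filter.

    Counts are unweighted; real microdata files may require sampling weights.
    """
    return Counter((r[row_var], r[col_var]) for r in rows if keep(r))

# A custom table a printed report might not contain:
# respondents aged 30 and over, by state and sex.
table = crosstab(records, "state", "sex", keep=lambda r: r["age"] >= 30)
for (state, sex), n in sorted(table.items()):
    print(state, sex, n)
```

Because the individual records are available, any combination of variables and any filter can be tabulated; with aggregated data you are limited to the tables the publisher chose to release.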

U. S. Census Bureau: PUMS Files

Brief descriptions and ordering information for the microdata (PUMS) files, on disk, CD-ROM, or tape, produced and released by the U.S. Census Bureau, including microdata files from the American Community Survey (ACS) and the Decennial Census. The ACS files come from selected samples of the surveys conducted during 1996-1998, and the Decennial Census PUMS files are from the 1990, 1980, 1970, 1960, 1950, and 1940 Censuses. American Community Survey files are available for download and on CD-ROM, and also through American FactFinder. All these files are archived by and available at ICPSR.

IPUMS-USA, Minnesota Population Center, University of Minnesota

The IPUMS (Integrated Public Use Microdata Series), produced by the Minnesota Population Center at the University of Minnesota, is a series of high-precision samples of historical census microdata with comprehensive documentation. IPUMS-USA is a coherent census database for the U.S. Censuses from 1850 to 1990. It is free to access and use. Once registered, users can use the IPUMS Data Extraction System to obtain census microdata to their own specifications, choosing sample size, census year, variables, and so on.

1940 – 1990 Census PUMS Files, available at the Consortium for International Earth Science Information Network (CIESIN)

CIESIN, located at Columbia University in New York, is a private, government-sponsored research organization dedicated to disseminating information on global environmental change and demographics. It has archived some of the U.S. Census PUMS files and made them available to users free of charge via FTP, as compressed ASCII files. The PUMS data files include sample data from 1940 (1 percent, national file), 1950 (1 percent, national file), 1960 (1 in 1000, national file), 1970 (1 in 1000, national files), 1980 (1 percent and 5 percent files by state), and 1990 (1 percent and 5 percent files by state); an additional file contains the 15 percent sample for 1970. If you are using a PC, simply click the FTP link here to get the microdata files; on Unix, refer to CIESIN’s User Guide for transferring the files by FTP.

1990 U. S. Census PUMS Utilities, the Geospatial and Statistical Data Center at University of Virginia

With these three applications, users can subset PUMS data files to make their own data sets, run descriptive statistics, and create cross-tabulation tables. Click the links to try them out!
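Raw PUMS files, such as the ASCII files described above, are commonly laid out as fixed-width records, with each variable’s column positions listed in the codebook. Assuming that kind of layout, the Python sketch below shows what subsetting and a simple descriptive statistic involve. The layout, codes, and values are all hypothetical, and the sketch does not reproduce the University of Virginia utilities.

```python
from statistics import mean

# Hypothetical codebook layout: variable -> (first column, last column),
# 1-based and inclusive, the way codebooks usually print column positions.
LAYOUT = {"state": (1, 2), "sex": (3, 3), "age": (4, 5), "income": (6, 11)}

# Invented fixed-width records (11 characters each); not real PUMS data.
raw_lines = [
    "51134035000",
    "51251052000",
    "37227021000",
    "51245047000",
]

def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of string fields."""
    return {name: line[first - 1:last].strip() for name, (first, last) in layout.items()}

# Subset: keep records with hypothetical state code "51" and sex code "2".
subset = [
    rec for rec in map(parse_record, raw_lines)
    if rec["state"] == "51" and rec["sex"] == "2"
]

# Descriptive statistics on the subset.
incomes = [int(rec["income"]) for rec in subset]
print("records:", len(subset))
print("mean income:", mean(incomes))
```

Real files add complications this sketch skips, such as separate household and person record types and sampling weights; the codebook for each file spells these out.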

2000 U. S. Census Release Schedule for PUMS Files

For 1-percent sample: 2002
For 5-percent sample: 2003

 

Canadian Census Microdata

Microdata services in Canada have developed rapidly. Canadian Census microdata, whose core is the Public Use Microdata Files (PUMFs), can readily be accessed from the following data sources.

Statistics Canada

The Online Catalogue contains about 20 microdata files from the 1986 – 1996 Canadian Censuses, with descriptive information (such as abstract, price, notes, contact information, computer requirements, microdata specifications, and keywords) for each PUMF. The following links lead to the catalogue descriptions of the PUMFs from the 1996 and 1986 Censuses.

Data Liberation Initiative Web Site (DLI)

DLI is a cooperative effort among various academic organizations, Statistics Canada, and other government departments through which subscribing Canadian academic institutions gain access to a large collection of Statistics Canada’s electronic data files on CD-ROM, via FTP, or on the Web. With the Data Liberation Initiative, Canadian universities no longer need to purchase Statistics Canada data file by file. Instead, participating universities pay an annual subscription fee that gives their faculty and students unlimited access to DLI microdata, databases and geographic files. Canadian Census microdata, i.e., PUMFs, are now mostly distributed under this cooperative program.

Data Library Service – University of Toronto

Codebooks and data sets for the Public Use Microdata Files (PUMFs) are available for the 1996 Census and censuses back to 1971, under the provision of the Data Liberation Initiative.

The Data Centre – Carleton University

Like Toronto and other universities, Carleton provides access to all the Canadian Census microdata files. In addition, it has made many of its data holdings available in NSDstat+ format, including the Census Public Use Microdata Individual Files from the 1971 to 1991 Censuses. Developed for use in high schools in Norway, NSDstat+ is a new, powerful and easy-to-use statistics package. Its ability to process data quickly and generate easy-to-understand, highly visual output makes it user-friendly even for the most statistics-anxious users.

 

Australian Census Microdata

Microdata files were first released for the 1981 Census, containing individual, family, and household information in a hierarchical structure for a 1 percent sample of households. Starting with the 1991 Census, a Household Sample File (HSF) has been produced for public use and has become the core of Australian Census microdata. Australian Census microdata are usually accessible through the following sites.

Australian Bureau of Statistics

The Australian Bureau of Statistics is Australia’s official statistical organization. Click the link to see its catalogue entry (including abstract, media type, etc.) for the Census Household Sample File, a microdata product available from 1996 to the present.

Social Science Data Archive (ASSDA) – Australian National University

The SSDA was set up in 1981 with a brief to collect and preserve computer-readable data relating to social, political and economic affairs and to make the data available for further analysis. SSDA charges include a standard service fee of $60 per request plus other possible fees depending on the type of request. Its Historic Census Data holdings include Sample Files from 1976 onward.

 

British Census Microdata

The Samples of Anonymised Records (SARs) extracted from the British censuses make up the majority of microdata produced and released in the U.K. This category lists and annotates the major data sources through which you can access, or find information about, these British Census microdata.

SARs Website, by Cathie Marsh Centre for Census and Survey Research, University of Manchester

The website presents detailed information about the SARs and how to access them. The SARs are primary microdata and can be used in many different applications. The site gives examples of the sort of research you can do with them, along with a list of publications based on SAR research.

Office for National Statistics, UK
A product document for the 1991 Census Samples of Anonymised Records

1991 Census Samples of Anonymised Records (SARs), as catalogued by The UK Data Archive

SN 3243 – 1991 Census Sample of Anonymised Records (SARs) : Individual
SN 3244 – 1991 Census Sample of Anonymised Records (SARs) : Household

The Sample of Anonymised Records (SARs Data)

A wonderful piece of documentation that explains the content and format of the SARs and provides access information in great detail. It was originally written for data users at the University of Warwick in Britain but is very valuable to data users in general.

 

Longitudinal Demographic Microdata

Inter-University Consortium for Political and Social Research (ICPSR)

ICPSR has extensive coverage of longitudinal microdata from both governmental and non-governmental data sources. It has so far archived and made available more than 40 longitudinal microdata files, most of which are relevant to population studies. A keyword search of the ICPSR catalogue of holdings is always a good starting point for finding longitudinal demographic microdata files, although many longitudinal data sets are archived not at ICPSR but exclusively at other data organizations.

Data FERRET System, U.S. Census Bureau

A tool developed and supported by the U.S. Bureau of the Census in collaboration with the Bureau of Labor Statistics and other statistical agencies. Longitudinal microdata files from the Survey of Income and Program Participation (SIPP, 1984 – 1993) can now be extracted free of charge from the FERRET site.

 

Other Longitudinal Microdata Sources from U. S. Census Bureau

Wisconsin Longitudinal Study (Center for Demography and Ecology, University of Wisconsin-Madison)

The Wisconsin Longitudinal Study (WLS) is a 35-year study of the social and economic life course among 10,317 men and women who graduated from Wisconsin high schools in 1957, and who have been followed up at ages 25, 36, and 53-54. Data from the original respondents or their parents from 1957 to 1975 cover social background, youthful and adult aspirations, schooling, military service, family formation, labor market experience, and social participation. The 1992-93 surveys cover occupational histories; income, assets, and economic transfers; social and economic characteristics of parents, siblings, and children; and mental and physical health and well-being. Parallel interviews have been carried out with siblings in 1977 and 1993-94. WLS data and codebooks are available on its homepage.

Panel Study of Income Dynamics (Institute for Social Research, University of Michigan)

Having run for over 30 years, the Panel Study of Income Dynamics (PSID) is a longitudinal survey of a representative sample of U.S. men, women, and children and the families in which they reside. High-quality data on employment, income, wealth, health, housing, and food expenditures, transfer income, and marital and fertility behavior have been collected since 1968: annually through 1997 and biennially starting in 1999. The data files contain the full span of information collected over the course of the study. PSID data can be used for cross-sectional, longitudinal, and intergenerational analysis and for studying both individuals and families. The most recent versions of all PSID data and supplements can be downloaded from this site.

 

National Longitudinal Study of Adolescent Health (Add Health) (Carolina Population Center, University of North Carolina)

This longitudinal study provides research data on key questions about adolescent health and health behaviors. A national sample of students in grades 7 through 12 completed 90,000 in-school questionnaires during the 1994-1995 school year. Twenty thousand students and a parent were interviewed in their homes during the summer of 1995; fourteen thousand of these adolescents were re-interviewed during the summer of 1996. Add Health data are available in two forms: a public-use dataset and a restricted-access contractual dataset. Strict protection of respondent confidentiality is a central concern of the Add Health study. Thus the public-use data include only a subset of respondents; the restricted-use data are distributed only to certified researchers who commit themselves to maintaining limited access; and in no case are the identification numbers of persons nominated by respondents made available to outside researchers.

Children and Young Adults of the National Longitudinal Survey of Youth (Center for Human Resource Research, Ohio State University)

The Children of the National Longitudinal Survey of Youth (NLSY79) is a longitudinal data set that profiles the development and achievement of the children of the mothers in the NLSY79, focusing on their cognitive, socio-emotional, and physiological development. Started in 1986 and repeated biennially, the NLSY79 Child/YA uses mother reports and direct assessments to gauge the children’s growth, abilities, problems, school progress, and home environment. Since 1994, children aged 15 and older have been interviewed, much as their mothers were, about schooling, employment, training, family experiences, health, and attitudes. The Child/YA sample ranges in age from birth to the mid-twenties and contains significant numbers of black, Hispanic and (through 1990) economically disadvantaged white respondents. The Center for Human Resource Research (CHRR) issues the maternal and child data and documentation on CD-ROM at a nominal cost.

National Longitudinal Survey of Children and Youth, Canada

“The National Longitudinal Survey of Children and Youth (NLSCY), developed jointly by Human Resources Development Canada and Statistics Canada, is a long term research project that will track a large sample (22,831) of children (ages 0-11 yrs.) over many years, enabling researchers to monitor children’s well-being and development. Longitudinal data are central to discovering developmental changes in children over time, and studying the impacts of social environment of the child and various family related factors.” The NLSCY microdata samples are available, within the constraints of the DLI license, to users of Canadian university libraries. The files are zipped for quick downloading and must be unzipped on the local hard drive. SPSS command files are also supplied for customized use.
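For users who prefer to script the unzipping step, Python’s standard zipfile module can extract the downloaded archives. This is a minimal sketch; the function name unzip_to and any file names used with it are hypothetical and not part of the NLSCY distribution.

```python
import zipfile
from pathlib import Path

def unzip_to(archive_path, dest_dir):
    """Extract every member of a .zip archive into dest_dir.

    Creates dest_dir if needed and returns the paths of the extracted members.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
        return [dest / name for name in zf.namelist()]
```

For example, unzip_to("nlscy_download.zip", "nlscy") would unpack a downloaded archive into a local nlscy folder, where the supplied SPSS command files can then be run. The zipfile documentation advises against extracting archives from untrusted sources without inspecting them first.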

Data Library Service (University of Toronto) Canadian Longitudinal Data Files

The DLS of the University of Toronto holds more than 40 longitudinal microdata files. Besides those distributed by ICPSR, it holds the following files, distributed exclusively by Canadian data sources such as Statistics Canada and the National Archives of Canada. Select a link to see the full title, description, distributor, and access information for each file.

Links and contents:
Aging in Manitoba longitudinal study (AIM)
Canadian out-of-employment survey, 1995
Canadian study of health and aging, 1991-1996-2001
Labour market activity survey, 1986-1990
Longitudinal administrative database
Longitudinal immigration database
Longitudinal study of immigrants, 1969-1970, 1971 
Longitudinal survey of immigrants to Canada, 2001
National longitudinal survey of children and youth (aka ‘the Kids survey’), 2008/2009 
National Population Health Survey – Household Component – Longitudinal (NPHS), 1994/1995 
National survey of 1990 graduates, 1992 and Follow-up survey, 1995
Ontario child health survey, 1983- [ongoing]
School leavers survey, 1991-1995
Self-sufficiency project, 1994/95-2001
Social change in Canada, 1977-1981
Survey of labour and income dynamics, 1993 – [ongoing]
Survey on smoking in Canada, 1994-1995
Workplace and employee survey, 1999 – [ongoing]
Youth in transition survey, 2000 – [ongoing]

Economic and Social Data Service Longitudinal (ESDS)

ESDS Longitudinal is a joint effort of the UK Data Archive (UKDA) and the ESRC United Kingdom Longitudinal Studies Centre (ULSC). The service currently provides a web-based download service; specialist user support, linked with the specialist support provided by the Centre for Longitudinal Studies (CLS); training and workshops; and a range of value-added data enhancements for the following longitudinal data collections:

  • 1970 British Cohort Study (BCS70)
  • British Household Panel Survey (BHPS)
  • English Longitudinal Study of Ageing (ELSA)
  • Families and Children Study (FACS)
  • Longitudinal Study of Young People in England (LSYPE)
  • Millennium Cohort Study (MCS)
  • National Child Development Study (NCDS)

South Africa Data Archive, National Research Foundation, South Africa

The following are some notable longitudinal microdata files produced by longitudinal survey research in South Africa.

 

Social, Election, and Political Microdata

For data users looking for microdata in this category, the ICPSR catalogue should again be the first place to explore; in fact, these are the subject areas on which ICPSR’s data collection is built and where it is strongest. In this section, therefore, I highlight only a few of the most frequently used data sources, along with international data sources, in social and political research.

General Social Survey (GSS)

The General Social Survey has long been one of the most popular sources of microdata for social science instruction and research because it is freely accessible. The GSS is an almost annual, “omnibus,” personal-interview survey of U.S. households conducted by the National Opinion Research Center (NORC), with James A. Davis, Tom W. Smith, and Peter V. Marsden as principal investigators (PIs). The first survey took place in 1972, and since then more than 38,000 respondents have answered over 3,260 different questions. Key features of the GSS are its broad coverage, its use of replication, its cross-national perspective, and its attention to data quality. These special features follow from its unique origin as the first, and perhaps only, social science microdata set designed to be analyzed by “users” rather than by the PIs and project staff. Click here to see the GSS 1998 Codebook, which shows what variables are included in the data sets and how the values of those variables are coded.

Roper Center for Public Opinion Research

Elmo Roper founded the Roper Center just after World War II, and he and George Gallup played leading roles in its subsequent development. By constantly adding to its domestic and international collections of survey data, the Roper Center maintains what is by far the most complete collection of public opinion information in existence. Anyone interested in public opinion polls can use the center’s services. A cost estimate is provided for each request that involves staff time. Pricing information for datasets and Roper Center publications, including Public Perspective, is available online. There are significant cost savings for those wishing to obtain datasets: a membership, which entitles the member to up to 50 datasets per year, costs the equivalent of 10 datasets purchased on an ad hoc basis.

Data Archive of the Henry A. Murray Research Center (Radcliffe College)

Dedicated to the study of lives over time, the Henry A. Murray Research Center promotes the use of existing social science data to explore human development in the context of social change. The Murray Center’s staff and visiting fellows conduct and support research using longitudinal studies, qualitative data, and secondary analysis, and provide a scholarly forum for the study of lives. The center’s microdata collection focuses on human development across the life span, social change, and the lives of women. Data sets within the collection are available for reanalysis, replication, and longitudinal follow-up. The center has over 230 such data sets on a wide variety of topics.

 

British Social Attitudes Survey (National Centre for Social Research of U. K.)

The National Centre’s British Social Attitudes (BSA) series has followed, charted and interpreted the ebbs and flows in the nation’s social values during the 1980s and 1990s. Each annual survey consists of an hour-long interview and a self-completion supplement, administered to around 3,500 randomly selected adults nationwide. The survey focuses mainly on people’s attitudes but also collects details of their behavior patterns, household circumstances and work. Each survey generates an edited book that analyses and tries to explain movements in the British public’s beliefs and values. These BSA microdata are well archived by, and accessible at, the U.K. Data Archive – British Social Attitudes.

ISSP – The International Social Survey Programme

The International Social Survey Programme (ISSP) is a continuing annual programme of cross-national collaboration on surveys covering topics important for social science research. ISSP started late in 1983 when SCPR, London, secured funds from the Nuffield Foundation to hold meetings to further international collaboration between four existing surveys: the General Social Survey (GSS), conducted by NORC in the USA; the British Social Attitudes Survey (BSA), conducted by SCPR in Great Britain; the Allgemeine Bevölkerungsumfrage der Sozialwissenschaften (ALLBUS), conducted by ZUMA in West Germany; and the National Social Science Survey (NSSS), conducted by ANU in Australia. Since 1983 it has brought together pre-existing social science projects and coordinated research goals, thereby adding a cross-national, cross-cultural perspective to the individual national studies.

American National Election Study (NES)

NES conducts national surveys of the American electorate in presidential and midterm election years and carries out research and development work through pilot studies in odd-numbered years. The NES time series now encompasses 23 biennial election studies spanning five decades. The longevity of the NES time series greatly enhances the utility of the data, since measures can be pooled over time and both long-term trends and the political impact of historical events can be identified. The NES Data Archive site allows non-ICPSR members (and members who can’t wait for someone official to download full data sets or create a subset for them) to download data sets or create subsets from the latest NES or the cumulative NES. Data are available in SPSS portable and ASCII formats, along with searchable codebooks and the ability to run descriptive statistics online. The NES Guide to Public Opinion and Electoral Behavior gives political observers, policy makers, journalists, teachers, students, and social scientists immediate access to tables and graphs that display the ebb and flow of public opinion and electoral behavior and choice in American politics since 1952.

Canadian Election Study

The main objective of the Canadian Election Study (CES) is to explain what makes people decide to vote (or not to vote), what makes those who vote support a given party or candidate, and why parties gain or lose ground from one election to the next. From the “Surveys” and “Publications” sections of the CES site, you can retrieve the data sets (in SPSS format) and codebooks (technical documentation) for the 2000 and 1997 studies.

Lijphart Elections Archive

The Lijphart Elections Archive (LEA), housed at the University of California at San Diego, is a research collection of district-level election results for approximately 350 national legislative elections in dozens of countries: Argentina, Australia, Austria, Bangladesh, Belgium, Belize, Bolivia, Bosnia, Brazil, Bulgaria, Canada, Chile, Colombia, Costa Rica, Cyprus, the Czech Republic, Denmark, the Dominican Republic, Ecuador, Finland, France, Germany, Greece, Hungary, Iceland, India, Ireland, Israel, Italy, Jamaica, Japan, Liechtenstein, Lithuania, Luxembourg, Malta, Mexico, the Netherlands, New Zealand, Nicaragua, Norway, Peru, Poland, Portugal, Romania, Russia, Slovakia, Slovenia, South Africa, Spain, Sri Lanka, St. Vincent and the Grenadines, Sweden, Switzerland, Taiwan, Thailand, Trinidad and Tobago, Turkey, the United Kingdom, the United States, Uruguay, and Venezuela. The LEA attempts to collect election results down to the level of the individual election districts in which votes are converted into seats. The LEA also contains information about the constitutions of many of these countries and a detailed description of each country’s electoral system. The LEA originally acquired print copies of the data for 14 countries and is now focusing on online data for those 14 plus 48 additional countries. The LEA microdata are also available at ICPSR.

Other Sources of Election Studies

 

Education Microdata

Statistical Resources on the Web – Education

A webliographic site by the Document Center of the University of Michigan Libraries. It lists a considerable number of education-related microdata sites.

National Center for Education Statistics

Part of the U.S. Department of Education, the National Center for Education Statistics (NCES) is the primary federal entity for collecting and analyzing data that are related to education in the United States and other nations.
NCES has developed an information program that provides the users of education statistics with access to a wide range of data. Statistical information is provided through the NCES Electronic Catalog, the National Education Data Resource Center, the National Library of Education, the Resource Sharing and Cooperation Division, and ED Pubs.

The following are some of the frequently used microdata resources distributed by NCES.

Major Microdata Sources in Education: Surveys from Statistics Canada

As Canada's national statistical agency, Statistics Canada has administered a variety of educational surveys that collect microdata. The following links lead to some notable ones.

International Archive of Education Data (ICPSR)

The International Archive of Education Data (IAED) is a project sponsored by the National Center for Education Statistics (NCES), the primary federal entity for collecting and analyzing data related to education in the United States and other nations. Over a period of several years, the Archive will acquire, process, document, and disseminate data collected by national, state or provincial, local, and private organizations, pertaining to all levels of education in countries for which data can be made available. The data stored in this new Archive are intended to support a wide variety of comparative and longitudinal research through the preservation and sharing of data resources. The Archive is housed in and operated by the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan. Data files, documentation, and reports are downloadable from the website in public-use format. The website features an online data analysis system (DAS) that allows users to conduct analyses on selected datasets within the Archive.

UK Data Archive – Education Data

More than 300 microdata files have been archived and made available by the UK Data Archive for public use, including the following:

  • SN 3697 – UNESCO Education Database : Secondary Education by Grade, 1960-1995
  • SN 3699 – UNESCO Education Database : Pre-primary Education Statistics, 1960-1995
  • SN 3700 – UNESCO Education Database : Primary Education by Grade, 1960-1995
  • SN 3701 – UNESCO Education Database : Tertiary Education Statistics, 1960-1994
  • SN 66040 – Provision of Adult Education and Education Leisure; Adult
  • SN 66041 – Provision of Adult Education and Education Leisure; Student
  • SN 66042 – Provision of Adult Education and Education Leisure; Staff
  • SN 983 – Attitudes of Parents of Primary School Children : National Survey of Parents of Primary School Children, 1964
  • SN 984 – Parental Attitudes of Secondary School Pupils : Follow-Up Survey of Plowden National Sample : Secondary School Parents, 1967-1968
  • SN 3760 – Teaching and Learning Processes in Inner City Infant Schools, 1992
  • SN 3815 – National Adult Learning Survey (NALS), 1997
  • SN 1096 – Pre-School Education and the Family : Relative Responsibilities of Local Authority Departments
  • SN 1285 – Pupils Interests, Abilities and Future Progress at School and Work
  • SN 1354 – Attitudes of Students at the London School of Economics, February 1980
  • SN 1514 – Children’s Difficulties on Starting Infant School
  • SN 1605 – Department of Education and Science Form 7 Schools Data, 1978; Middle Schools
  • SN 1658 – Education and Career Choices of Fifth Form Pupils
  • SN 1965 – Longitudinal Study from Middle School to 14+ of Some Factors Affecting the Development and Stability of English Pupils’ Interest in Science and Science Subject Choices
  • SN 199 – Effect of Local Education Authority Resources and Policies on Educational Attainment
  • SN 2068 – Attitudes of Students at the London School of Economics, January 1983
  • SN 2088 – Attitudes of Students at the London School of Economics, January – February, 1985
  • SN 2092 – Pre-Retirement Education, 1979-1981
  • SN 2144 – Recent Developments in the Transition from School to Work, 1981-1984
  • SN 2296 – Structure and Process of Initial Teacher Education Within Universities in England and Wales : Staff Data
  • SN 665 – Scottish Education Time Series Data, 1962, 1970, 1972
  • SN 66027 – Study of Postgraduate Education; Survey of Advanced Course Students
  • SN 4111 – Survey on School Competition, 1997
  • SN 4039 – Civil Rights in Schools : School Students’ Views, 1997-1998
  • SN 3781 – Health Education Monitoring Survey (HEMS), 1996
  • SN 3562 – Health Education Monitoring Survey (HEMS), 1995
  • SN 3488 – National Data on Rates of School Exclusion, Socio-Economic and Educational Circumstances of Some Local Education Authorities, 1992-1993
  • SN 3467 – Higher Education in Northern Ireland : Participation and the Graduate Labour Market, 1991-1992
  • SN 3296 – Scottish School-Leavers Survey, 1992 Leavers

Higher Education Statistics Agency Data (UK)

HESA is the central agency responsible for collecting statistical data from all UK universities. Results and aggregate data on students, institutions, and finances are distributed in both print and digital form. Academic users can get access to the raw data (microdata) without charge via the Society for Research into Higher Education with an individual or institutional membership. Ten research datapacks are available by theme: ethnicity; entry qualifications in Higher Education; course results; first destinations of graduates; disability; overseas students; regional issues; ethnicity; academic staff; non-credit-bearing courses.

 

Health and Medicine Microdata

Statistical Resources on the Web – Health

A webliographic site on health by the Document Center of the University of Michigan Libraries. It lists a considerable number of microdata sites related to health.

National Center for Health Statistics (NCHS) – Microdata for Research

NCHS, part of the Centers for Disease Control and Prevention, U.S. Department of Health and Human Services, is the Federal Government's principal vital and health statistics agency. Since 1960, when the National Office of Vital Statistics and the National Health Survey merged to form NCHS, the agency has provided a wide variety of data with which to monitor the Nation's health. NCHS data systems include data on vital events as well as information on health status, lifestyle and exposure to unhealthy influences, the onset and diagnosis of illness and disability, and the use of health care. The following are the microdata files from NCHS:

 

National Data Archive on Child Abuse and Neglect (NDACAN)

The mission of the National Data Archive on Child Abuse and Neglect (NDACAN) is to facilitate the secondary analysis of research data relevant to the study of child abuse and neglect. Its primary activity is the acquisition, preservation, and dissemination of high-quality datasets in this field. Data are available for purchase at a reasonable cost. NDACAN distributes data in SPSS or SAS format on floppy disk or CD-ROM, depending on the file size. A data order comes complete with documentation and installation instructions, and NDACAN staff will assist you with any technical problems you encounter while using the data. Documentation for most datasets includes a user's guide, codebook, original instruments, and references to relevant publications.

 

Medical Expenditure Panel Survey (MEPS)

The Medical Expenditure Panel Survey, or MEPS as it is commonly called, is the third (and most recent) in a series of national probability surveys conducted by the Agency for Healthcare Research and Quality (AHRQ) on the financing and utilization of medical care in the United States. MEPS consists of four components: the Household Component (HC), the Medical Provider Component (MPC), the Nursing Home Component (NHC), and the Insurance Component (IC). MEPS data releases are available for public use on diskettes, CD-ROMs, and the Internet at the MEPS Web site.

Health and Retirement Study (HRS) and Asset and Health Dynamics Among the Oldest Old (AHEAD)

These two microdata files, available online, are nationally representative longitudinal data collections that examine retirement and the aging of society. The Health and Retirement Study (HRS) is a national panel study of people near or in their retirement years. The baseline consists of interviews in 7,600 households in 1992 (respondents aged 51 to 61, along with their spouses), with follow-ups every two years for 12 years. The data contain a wealth of economic, demographic, and health information related to retirement issues. The Asset and Health Dynamics Among the Oldest Old (AHEAD) is an HRS Auxiliary Study, also known as "Aging and Health in America." It centers on "data to address a broad range of scientific questions focused on the interplay of resources and late life health transitions." It consists of "8,224 respondents aged 70+, including about 2,560 aged 80 and over."

Demographic and Health Surveys (DHS)

The DHS Data Archive is a computerized archive of survey data collected from countries in Africa, Asia, and Latin America. Data are currently available for 51 countries, and more are being added as additional surveys are completed. For each country several datasets are usually available: individual women's data (the standard DHS survey), household data, male or husband's data (for some countries), couple's data (some countries), and children's data (some countries). There are presently two ways to access DHS data: (1) direct FTP from the DHS site, and (2) through data archive consortia such as ICPSR.

National Population Health Survey (NPHS), Statistics Canada

The National Population Health Survey, a longitudinal survey conducted by Statistics Canada, re-interviews a group of Canadians every two years. It is designed to measure the health status of Canadians and to expand knowledge about the determinants of health. The initial wave of data collection, cycle 1, took place from June 1994 to June 1995; data for cycle 2 were collected from June 1996 to August 1997. Topics covered include general health, children's health, chronic disease incidence, activity limitations, depression, repetitive strain and other injuries, alcohol dependence, smoking, physical activity, and medical checkups. In all, 20,000 households were surveyed, with a minimum of 1,200 per province/territory. Microdata files for 1994-1995, 1996-1997, and 1998-1999 are available for ordering from Statistics Canada or through the Data Liberation Initiative.

Substance Abuse & Mental Health Data Archive (SAMHDA)

Housed at ICPSR, this archive is a free, publicly available repository of current datasets (data files and their documentation). Over two dozen studies are currently available. An online subsetting mechanism is available for the following major studies: Monitoring the Future, the National Household Survey on Drug Abuse, the Treatment Episode Data Set, and the Washington D.C. Metropolitan Area Drug Study.

Health Survey for England

The Health Survey for England (HSE) is part of a wider programme of surveys commissioned by the Department of Health, Great Britain, and is designed to monitor trends in the nation's health. The full (anonymized) survey datasets for the Health Survey for England are made available for online access at MIMAS or the UK Data Archive for academic research or teaching. To access the Health Survey for England via MIMAS, you will need to be a registered user with an account on the MIMAS Unix server (see How to register for Irwell, MIMAS Server). Alternatively, you may apply to the UK Data Archive at the University of Essex for permission to access each year of the HSE that you require for a specific project.

 

Economic, Business and Financial Microdata

DataFERRET (U.S. Bureau of the Census and Bureau of Labor Statistics)

The Federal Electronic Research and Review Extraction Tool (FERRET) provides access to microdata from the Current Population Survey, the Survey of Income and Program Participation, and other surveys.

Consumer Expenditure Survey (Bureau of Labor Statistics Microdata)

The survey collects information from the Nation's households and families on their buying habits (expenditures), income, and characteristics. The strength of the survey is that it allows data users to relate the expenditures and income of consumers to the characteristics of those consumers. The survey consists of two components, a quarterly Interview Survey and a weekly Diary Survey, each with its own questionnaire and sample. The Bureau of Labor Statistics sells CD-ROMs and computer tapes of Consumer Expenditure (CE) Survey public use microdata. Depending on the year, the CD-ROMs and tapes contain Interview Survey data, Diary Survey data, EXPN files, and tabulated data, either separately or in various combinations. For years prior to 1996 the microdata are available only in ASCII text format; beginning in 1996 they are available either in ASCII text format or as PC SAS data sets. Using programs such as SAS or SPSS, users can extract and manipulate the data.
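Relating expenditures to consumer characteristics usually comes down to summarizing an expenditure variable within groups defined by a characteristic. The sketch below does this in Python rather than SAS or SPSS; the records and field names (`family_size`, `food_exp`) are invented for illustration and do not follow the actual CE Survey file layout. It computes unweighted means, whereas real analyses of survey microdata normally apply the sample weights supplied with the file.

```python
from collections import defaultdict

# Hypothetical consumer-unit records: field names and values are invented
# for illustration and do not follow the actual CE Survey file layout.
records = [
    {"cu_id": 1, "family_size": 1, "income": 28000, "food_exp": 3100},
    {"cu_id": 2, "family_size": 2, "income": 52000, "food_exp": 5400},
    {"cu_id": 3, "family_size": 2, "income": 61000, "food_exp": 6200},
    {"cu_id": 4, "family_size": 4, "income": 75000, "food_exp": 9800},
]


def mean_by(rows, group_field, value_field):
    """Unweighted mean of value_field within each level of group_field."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = row[group_field]
        totals[key] += row[value_field]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Average food expenditure by family size.
food_by_size = mean_by(records, "family_size", "food_exp")
print(food_by_size)  # {1: 3100.0, 2: 5800.0, 4: 9800.0}
```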

Center for Economic Studies (U. S. Census Bureau)

The Center for Economic Studies (CES) is a research unit of the Office of the Chief Economist, U.S. Bureau of the Census, established to encourage and support the analytic needs of researchers and policy makers throughout government, academia, and business. CES has developed a data site that lists core data available for research at CES and its Research Data Centers, and provides limited documentation for these data. Most of these data are longitudinal microdata sets or time series datafiles constructed from business establishment and firm-level data collected by the Census Bureau.

American Housing Survey (AHS) (U.S. Bureau of the Census)

AHS provides housing microdata for metropolitan areas on CD-ROM. The data files are also available at ICPSR and many other data archives.

 

SESTAT (National Science Foundation)

SESTAT is a comprehensive and integrated system of information about the employment, educational, and demographic characteristics of scientists and engineers (S&E) in the United States. In concept it covers those with a bachelor's degree or higher who either work in or are educated in science or engineering, although some data on the non-S&E population are also included. The public use data files cover over 100,000 college graduates with an education and/or occupation in a natural science, social science, or engineering field, who currently represent about 12 million scientists and engineers in the United States.

CANSIM (Statistics Canada)

The acronym CANSIM comes from CANadian Socio-economic Information Management System, Statistics Canada's largest computerized time series database, in which the information is arranged by successive dates and formatted into structures known as matrices. Containing approximately 740,000 time series, CANSIM is the key socio-economic database profiling Canada's economy and industries and a detailed source of Canadian economic data. CANSIM is recommended for research that requires longitudinal and detailed economic data, although it also contains some social data. CANSIM is updated weekly. The CANSIM time series database is available through Statistics Canada's data distributors and also through the Data Liberation Initiative.

Survey of Family Expenditures (Statistics Canada)

This survey provides microdata on household expenditures, as well as household budgets for the year, including all expenditures, income, and changes in assets and debts in Canada. Topics include composition of households, characteristics of dwelling, shelter expenses, furnishings and equipment, running the home, food and alcohol, clothing, medical and health care, travel and transportation, recreation and education, tobacco, and miscellaneous expenses. Public use microdata files from the survey are available for 1969, 1978, 1982, 1984, 1986, 1990, 1992, and 1996, through Data Liberation Initiative member institutions or by ordering the CDs directly from Statistics Canada.

Econometrics Laboratory Software Archive (ELSA) (University of California at Berkeley)

Microdata available here include Lorna Greening's Integrated Consumer Expenditure Survey data files for 1980-1994. Data are available in either ASCII or SAS format, and variables are labeled consistently across all fourteen years. Also available are Bronwyn Hall's datasets, including patents-R&D data; patents data at the individual firm level; header information for all the firms in the 1991 manufacturing sector master file; firms that exited from the 1991 manufacturing sector master file (and reasons for exit); data for productivity estimation; and data for market value estimation.

Luxembourg Employment Study (LES) (Grand Duchy of Luxembourg and Center for Population, Poverty and Policy Studies (CEPS), Syracuse University)

The Luxembourg Employment Study, a project associated with the Luxembourg Income Study (see below), began in 1994. Its aim is to "construct a databank containing Labour Force Surveys from the early nineties from countries with quite different labour market structures. These surveys provide detailed information on areas like job search, employment characteristics, comparable occupations, investment in education, migration, etc. The LES team has harmonized and standardized the micro data from the labour force surveys in order to facilitate comparative research." After registering, users may submit statistical program jobs to the LES in order to analyze the data. The "User Information" section explains this process, and the "Using the Database" section provides links to the electronic documentation needed to set up program statements. LES can process SAS, SPSS, or STATA jobs via email. Note that the service is freely available only to researchers in LES member countries.

Luxembourg Income Study (LIS) (Grand Duchy of Luxembourg and Center for Population, Poverty and Policy Studies (CEPS), Syracuse University)

The Luxembourg Income Study, begun in 1983, is a database of "social and economic household survey microdata" from 25 countries in Europe, North America, the Far East, and Australia. Data are taken directly from household surveys or administrative records in the countries involved, then standardized before becoming part of the database. Researchers in member countries have access to the data after registration. LIS can process SAS, SPSS, or STATA jobs via email. Available datasets and documentation can be found at the site.

Wharton Research Data Services (University of Pennsylvania)

Wharton Research Data Services, or WRDS, is an Internet-based business data service from the Wharton School that has become a standard tool for academic research at leading business schools. It provides easy access to major financial databases, drawing on sources in accounting, banking, economics, finance, insurance, management, marketing, public policy, risk management, and statistics. Its point-and-click interface offers a menu of variables for researching hundreds of firms simultaneously. Telnet access allows users to run existing C, FORTRAN, SAS, and other statistical analysis programs on WRDS with little or no modification, and entire databases can be downloaded to a PC for further analysis. Financial data such as CRSP, Compustat, and TAQ are available from WRDS. To use WRDS, you must create an account; all students, faculty, and staff of Princeton University are eligible. If you are a member of an institution that does not subscribe to WRDS, you can request a trial account.

Economagic.com

This site is intended as a comprehensive source of free, easily available economic time series data useful for economic research, in particular economic forecasting. Raw data, table data, and custom charts for more than 100,000 time series can be retrieved as text or Excel files. Most of the data are U.S. data, including macroeconomic data and employment data by local area: state, county, MSA, and many cities and towns.

Panel Study of Income Dynamics (University of Michigan)

The Panel Study of Income Dynamics (PSID) is a longitudinal survey of a representative sample of U.S. individuals and the families in which they reside. It has been ongoing since 1968. The data are collected annually, and the data files contain the full span of information collected over the course of the study. PSID data can be used for cross-sectional, longitudinal, and intergenerational analysis and for studying both individuals and families. Documentation such as codebooks, computer-assisted interview documentation, and questionnaires is available online at the PSID site. The PSID Data Center provides a user-friendly system for creating custom subsets from the PSID Public Release I and Public Release II datasets. The Data Files page allows you to download entire datasets for Public Release I, Public Release II, and the Supplemental datasets, and also offers information on ordering the 1968-1992 CD-ROMs.
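For longitudinal analysis, a common first step is to turn a "wide" file (one row per person, one column per wave) into a "long" file with one row per person-year. The minimal sketch below assumes a hypothetical naming scheme `income_YEAR`; actual PSID variable names are not organized by year, so a real extract would first need a mapping from each wave's variable to its year. Missing waves (here `None`) are simply skipped.

```python
# Hypothetical wide-format panel: one row per person, one income column per
# wave. Real PSID variable names differ; "income_YEAR" is an assumed scheme.
wide = [
    {"person_id": 101, "income_1968": 5200, "income_1969": 5600, "income_1970": None},
    {"person_id": 102, "income_1968": 3900, "income_1969": 4100, "income_1970": 4500},
]


def wide_to_long(rows, id_field, prefix):
    """Turn prefix+YEAR columns into one (id, year, value) record per wave.

    Waves with a missing value (None) produce no record.
    """
    value_name = prefix.rstrip("_")
    long_rows = []
    for row in rows:
        for field, value in row.items():
            if field.startswith(prefix) and value is not None:
                year = int(field[len(prefix):])
                long_rows.append({id_field: row[id_field], "year": year, value_name: value})
    return long_rows


person_years = wide_to_long(wide, "person_id", "income_")
for record in person_years:
    print(record)
```

The long layout makes per-year summaries (cross-sectional use) and per-person trajectories (longitudinal use) equally easy to compute from the same records.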

Federal Reserve Board – Time Series Data and Microdata Files

  • Statistics: Releases and Historical Data
    Daily, weekly, monthly, quarterly, and annual historical (time series) data released by the Federal Reserve Board are available here.
  • National Survey of Small Business Finances (1987, 1993)
    Survey of the financial affairs of small businesses, conducted by the Board of Governors and the U.S. Small Business Administration in 1987 and 1993. Includes characteristics of firms and owners, the firms’ use of financial services and financial service suppliers, and income and balance sheet items. The survey data available here are contained in either a SAS transport file or an ASCII flat (rectangular) file.
  • Survey of Consumer Finances (1962 – 2000)
    Triennial survey by the Board of Governors of selected demographic characteristics of U.S. families, including their income, balance sheets, and use of financial services. Data are available in transportable types of files and sometimes also ASCII files, both of which may be used with the current versions of SAS.

Historical Market (Price) Data

The Chicago Board of Trade – Quotes and Data

The Daily Price File contains daily open, high, low, close, volume, and open interest data for each commodity for at least the previous 3 months. Historical data from 1972 to the present are available for purchase from the Market Information department at 1-312-435-3633 or via e-mail at cbot_historical_data@cbot.com.

Chicago Mercantile Exchange Datafiles Available via FTP

The Chicago Mercantile Exchange's Web site offers a variety of free price data to help track the markets. The CME also makes historical datafiles available each day via its FTP site on the Internet.

 

Tips for Extracting Microdata

Introduction to Data Handling

This is a handout provided by the University of Chicago's data archivists. It introduces the basics required to extract data and convert "raw" data into a dataset that a statistical application, specifically SPSS or SAS, can use. It covers topics such as reading a codebook, identifying data structures, and developing programs for reading raw data into a statistical application.
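Reading raw data with a codebook typically means translating the codebook's column positions into field boundaries for each fixed-width record. The sketch below shows that step in Python rather than SPSS or SAS; the codebook entries, variable names, and raw lines are all invented, and it assumes the common codebook convention of 1-based, inclusive start and end columns.

```python
# Hypothetical codebook excerpt: (variable name, first column, last column),
# using 1-based inclusive column positions as printed in many codebooks.
CODEBOOK = [
    ("hh_id", 1, 5),
    ("age", 6, 7),
    ("sex", 8, 8),
    ("income", 9, 14),
]

# Two invented raw records in a fixed-width ASCII layout.
RAW_LINES = [
    "00001341042000",
    "00002672018500",
]


def parse_record(line, codebook):
    """Slice one fixed-width line into a dict of integers using the codebook."""
    record = {}
    for name, first, last in codebook:
        field = line[first - 1:last]  # 1-based inclusive -> Python slice
        # Treat an all-blank field as missing rather than failing on int("").
        record[name] = int(field) if field.strip() else None
    return record


records = [parse_record(line, CODEBOOK) for line in RAW_LINES]
print(records[0])  # {'hh_id': 1, 'age': 34, 'sex': 1, 'income': 42000}
```

Keeping the layout in a single table, as a codebook does, means a change in column positions between survey years only requires editing the table, not the parsing logic.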

FAQ with Data FERRET System

Tips for extracting data via FERRET of the U.S. Census Bureau. Answers are provided for questions such as: 1) How do you convert a SAS transport dataset back to a regular SAS dataset? and 2) I am trying to extract eight items from the CPS Microdata. But every time I hit the enter button on the second item, the first item gets unhighlighted. Is there something wrong with my machine, or can I only get one item at a time? (I am using Netscape on my PC).

Extracting Data in the Harvard-MIT Data Center

This tutorial is intended to help you take a dataset in some known but inconvenient format and convert it to a more usable format, extracting only the variables and cases that interest you. For most data extraction, the program DBMS/Copy is recommended, although a few tasks require SAS or SPSS.
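The core of any extraction is two filters applied together: one on columns (variables) and one on rows (cases). The sketch below illustrates the idea in Python with the standard `csv` module; it is not the DBMS/Copy, SAS, or SPSS procedure the tutorial recommends, and the data and variable names are invented.

```python
import csv
import io

# Invented comma-delimited data standing in for a larger source file.
SOURCE = """id,state,age,sex,wage,hours
1,MI,34,F,18.50,40
2,OH,19,M,9.25,20
3,MI,52,M,27.00,45
4,MI,16,F,7.50,12
"""


def extract(text, keep_vars, keep_case):
    """Return only the chosen variables for the cases where keep_case is true."""
    reader = csv.DictReader(io.StringIO(text))
    return [{var: row[var] for var in keep_vars} for row in reader if keep_case(row)]


# Keep id, age, and wage for Michigan respondents aged 18 or older.
subset = extract(
    SOURCE,
    ["id", "age", "wage"],
    lambda row: row["state"] == "MI" and int(row["age"]) >= 18,
)
print(subset)
```

`csv.DictReader` returns every value as a string, so numeric conditions such as the age test convert explicitly with `int()`.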

Tips with the Enhanced Extract System of IPUMS, University of Minnesota

On Friday, August 18, 2000, the old IPUMS extract system was replaced by a new system incorporating enhanced features requested by users. One of the key features of the new system is the ability to modify and resubmit previous jobs. This page describes the most common technical aspects of the extract conversion that might affect data users, along with strategies for dealing with them.

Using SETS to Extract NHIS and Other NCHS Data on CD-ROM

This is a tip from the Electronic Data Services at Columbia University. It illustrates how NHIS data on CD-ROMs can be accessed by using NCHS’s Statistical Export and Tabulation System (SETS) software. This technical note on the SETS User Interface will guide you through the process of extracting a subset of the NHIS data from the CD-ROM for your analysis.

CPS Utilities, Unicon Research Corporation

Despite their importance to the research community, the Current Population Survey (CPS) files distributed by the Census Bureau are inconvenient to use in several ways, particularly for the novice but even for those experienced in the use of these data. The CPS Utilities, consisting of CDs containing data, documentation, and Windows software, can help researchers easily find and extract data from the U.S. Census Bureau's Current Population Survey. Extractions are formatted as Stata datasets, or as raw ASCII files with SAS and SPSS input code. Each extraction is documented in a report file. Options allow selection of observations, renaming of variables, smart recoding, and more. Powerful search utilities allow variables of interest to be identified quickly. Documentation for each variable is consolidated on a single page covering all survey years, and is hyperlinked to documentation for related variables and appendices.
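Renaming and recoding are the usual clean-up steps after an extraction. The minimal Python sketch below illustrates those generic operations; it is not how the CPS Utilities implement them. The raw variable names (`V_SEX`, `V_AGE`), their codes, and the age groups are hypothetical and are not taken from the actual CPS record layout.

```python
# Hypothetical raw extract: variable names and codes are invented for
# illustration, not taken from the actual CPS record layout.
raw = [
    {"V_SEX": 1, "V_AGE": 23},
    {"V_SEX": 2, "V_AGE": 47},
    {"V_SEX": 2, "V_AGE": 68},
]

# Old name -> new, more readable name.
RENAME = {"V_SEX": "sex", "V_AGE": "age"}
# Numeric code -> label (assumed coding scheme).
SEX_LABELS = {1: "male", 2: "female"}


def age_group(age):
    """Collapse single years of age into three broad groups."""
    if age < 25:
        return "under 25"
    if age < 65:
        return "25-64"
    return "65+"


def clean(row):
    """Rename variables, label the sex code, and add a derived age group."""
    renamed = {RENAME.get(name, name): value for name, value in row.items()}
    renamed["sex"] = SEX_LABELS.get(renamed["sex"], "unknown")
    renamed["age_group"] = age_group(renamed["age"])
    return renamed


cleaned = [clean(row) for row in raw]
for row in cleaned:
    print(row)
```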

 

Introduction to the Use of Microdata

Uses of Microdata, from the IPUMS site at the University of Minnesota.

Illustrative Use of the Panel Study of Income Dynamics (PSID).

1990 PUMS Utilities, created by the Geospatial and Statistical Data Center, University of Virginia.

Analysis of the SARs using SPSS for Windows

A technical handout on using British Census microdata, provided by the Cathie Marsh Centre for Census and Survey Research, University of Manchester.

Analysis of the SARs using Stata

A handout on using British Census microdata, provided by the Cathie Marsh Centre for Census and Survey Research, University of Manchester.

 

Exemplary Uses of Microdata

Using National Survey Data to Analyze Children's Health Insurance Coverage: An Assessment of Issues, by John L. Czajka and Kimball Lewis, Mathematica Policy Research, Inc. May 21, 1999

Use of Census Data in Transportation Planning: San Francisco Bay Area Case Study, by Charles L. Purvis, Metropolitan Transportation Commission, Oakland, California

How Do the Elderly in Taiwan Fare Cross-Nationally? Evidence from the Luxembourg Income Study Project, by Peter Saunders and Timothy M. Smeeding. Social Policy Research Centre, Discussion Paper, No.81

Understanding Productivity: Lessons from Longitudinal Microdata, by Eric J. Bartelsman, Free University, Amsterdam, and Mark Doms, Federal Reserve Board of Governors.

Jobs, Skills, Location and Discrimination: An Analysis of Milwaukee’s Inner City and Metro Areas, by Robert Drago, Department of Economics, University of Wisconsin-Milwaukee, April 1994

The Research Potential of the Samples of Anonymized Records (SARs), as demonstrated by the Cathie Marsh Centre for Census and Survey Research, University of Manchester.

 

Manipulating, Managing and Extracting Microdata

SAS Learning Modules for Data Management & Analysis

A SAS data management tutorial by the Academic Technology Service, UCLA.

SAS: Data Transformations and Data Manipulation

A guide to data transformation and data manipulation with SAS, by the Academic Technology Service, UCLA.

SPSS: Displaying Data

This document, created at the University of Texas – Austin, describes the use of SPSS to create and modify tables that can be exported to other applications. Graphical displays of data are also discussed, including bar graphs and scatterplots, as well as how to modify graphs using the SPSS Chart Editor and Interactive Graphs.

SPSS: Data Manipulation

This module, created at the University of Texas – Austin, describes the use of SPSS for advanced data manipulation. The first section covers splitting files for analyses, merging two files, aggregating datasets, and combining multiple tables in a database into an SPSS dataset. The second section covers several advanced topics, including the use of SPSS syntax, the SPSS Visual Basic editor, and SPSS Macros.
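Merging and aggregating often go together with hierarchical microdata, where a household file and a person file share an identifier. The sketch below shows the two operations in Python rather than SPSS: a many-to-one merge that attaches household variables to each person, followed by a count-and-sum aggregation. The files, the linking variable `hh_id`, and all values are hypothetical.

```python
from collections import defaultdict

# Invented hierarchical microdata: a household file and a person file linked
# by a shared household identifier ("hh_id" is a hypothetical name).
households = {
    10: {"region": "North"},
    11: {"region": "South"},
}
persons = [
    {"hh_id": 10, "person": 1, "income": 30000},
    {"hh_id": 10, "person": 2, "income": 12000},
    {"hh_id": 11, "person": 1, "income": 25000},
]


def merge(person_rows, household_table):
    """Attach household variables to each person (many-to-one).

    Persons whose hh_id has no matching household are dropped.
    """
    return [
        {**p, **household_table[p["hh_id"]]}
        for p in person_rows
        if p["hh_id"] in household_table
    ]


def aggregate(rows, key, value):
    """Count records and sum value within each level of key."""
    out = defaultdict(lambda: {"n": 0, "total": 0})
    for row in rows:
        out[row[key]]["n"] += 1
        out[row[key]]["total"] += row[value]
    return dict(out)


merged = merge(persons, households)
by_household = aggregate(merged, "hh_id", "income")
by_region = aggregate(merged, "region", "income")
print(by_household)  # {10: {'n': 2, 'total': 42000}, 11: {'n': 1, 'total': 25000}}
print(by_region)     # {'North': {'n': 2, 'total': 42000}, 'South': {'n': 1, 'total': 25000}}
```

Whether unmatched persons should be dropped (as here) or kept with missing household values is a design choice that depends on the analysis; check the survey documentation for how the files are meant to be linked.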