DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS



STATISTICAL COMPUTING: MINITAB

  1. AGENDA:
    1. Internet and MINITAB demonstration
    2. Data types and structures
    3. Reading:
      1. The Student Edition of MINITAB for Windows, Tutorial pages 18 to 33.
        1. Try second tutorial.
      2. Agresti and Finlay, Statistical Methods, pages 12 to 17.


  2. SUBSTANTIVE PROBLEM:
    1. Racial differences in educational achievement.
      1. Is there a gap between white and black rates of high school graduation? What is the trend in this difference over the last 30 years or so?
    2. Strategy:
      1. Look for sources of data.
      2. After finding the information, copy it from the source to the desktop.
        1. If there is a considerable amount of information, we want to avoid having to transcribe the data by hand. So we will copy and store the table(s) on our disks and then use a word processor to...
      3. Convert the data so they can be used in a statistical program like SPSS or MINITAB.
      4. Use the software to create the variable of interest and help answer the questions.


  3. THE INTERNET:
    1. Use a browser to "visit" the Census Bureau's web site.
      1. In particular, look for the Statistical Abstract of the United States.
      2. Find "Educational Attainment" table.
        1. Among other things it gives graduation rates for whites and blacks for various years
      3. Use your browser's "Save As" feature to copy the table to the hard drive.
        1. See Figure 1.
        2. See Figure 2.
        3. See Figure 3.
        4. See Figure 4.


  4. "DATA CLEANING":
    1. Statistical programs cannot process textual information, so use a word processor to strip out non-numeric information.
      1. Cut and delete operations
      2. See Figure 5.
    2. An aside for key terms
      1. Units of analysis: the entities being "measured"
      2. Variables: attributes or characteristics of units of analysis.
        1. Types of variables:
          1. Quantitative or numerical measures
          2. Qualitative: categories or labels
      3. Missing values: measurements or classifications are frequently not available for all units. They are assigned missing value "codes."
        1. The codes indicate that the case or unit should not be included in the statistical analysis.
        2. Codes must be converted to a symbol the particular program "understands."
        3. MINITAB recognizes the asterisk as a missing value: any observation or case that has a "*" in a data field will be excluded from the analysis.
          1. Note: SPSS uses a period as its missing value symbol, or the user can designate some value(s) as missing.
          2. Be aware of missing values and how a program handles them.
        4. Hence convert textual indicators of missing information to asterisks.
        5. See Figure 6.
    3. Use a word processor's search (or find) and replace operations to change text to the missing value indicator.
    4. Save the data as a plain text file.
      1. SPSS and MINITAB only "read" (i.e., process) numeric information.
      2. So numbers must be stored in a format that these programs can recognize.
        1. No statistical program will recognize a file of information that is stored in, say, Word or WordPerfect format.
      3. Consequently, after removing text and changing missing data characters to the missing value code, store the file as a DOS text file.
        1. See Figure 7.
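
The cleaning steps in this section can also be scripted instead of done by hand in a word processor. What follows is only a minimal Python sketch under stated assumptions: the missing-value markers "(NA)" and "NA", the function names, and the sample rows are all hypothetical and are not taken from the Census table. It keeps rows made up entirely of numbers and missing markers, changes the markers to asterisks, and can write the result as a plain text file.

```python
import re

# Assumption: these are the textual missing-value markers in the saved table.
# Check the actual file and adjust; the notes do not list them.
MISSING_MARKERS = {"(NA)", "NA"}

# A field counts as numeric if it looks like 12, 12.5, or -3.
NUMBER = re.compile(r"-?\d+(\.\d+)?")


def clean_lines(lines):
    """Keep only data rows, replacing missing markers with MINITAB's '*'."""
    cleaned = []
    for line in lines:
        row = []
        for field in line.split():
            if field in MISSING_MARKERS:
                row.append("*")
            elif NUMBER.fullmatch(field):
                row.append(field)
            else:
                # Any other text (titles, labels, footnotes): drop the line.
                row = []
                break
        # Skip blank lines and rows that contain no numbers at all.
        if any(field != "*" for field in row):
            cleaned.append(" ".join(row))
    return cleaned


def save_plain_text(path, rows):
    """Write one row per line as a plain ASCII text file."""
    with open(path, "w", encoding="ascii") as handle:
        for row in rows:
            handle.write(row + "\n")


# Placeholder rows for illustration only, not values from the Census table.
raw = [
    "Year  White  Black",
    "1  10.0  (NA)",
    "2  20.0  5.0",
    "",
    "Source note",
]
print(clean_lines(raw))
```

Note that this sketch drops any row that mixes text with numbers, so row labels would have to be removed first or the rule relaxed.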


  5. READING PLAIN OR RAW DATA INTO MINITAB:
    1. Two methods for entering data into MINITAB (or SPSS):
      1. Type numbers directly into data window, as discussed last time.
      2. "Read" previously stored data file.
    2. Folders and files
      1. Data on a hard drive or diskette are stored in folders and sub-folders under specific file names that indicate the content and type of data.
    3. Start MINITAB
      1. Click File and then Other Files
      2. Click Import Special Text
      3. NOTE: WITH STUDENT VERSION OF MINITAB CLICK ON IMPORT ASCII FILES
        1. See Figure 8.
    4. Import dialogue box.
      1. Data are stored in MINITAB in columns.
      2. You must therefore know how many columns will be "imported" or read from the disk.
        1. The example data set consists of 9 columns of data.
        2. So in Store data in column(s): type c1-c9
      3. Click OK
        1. See Figure 9.
      4. Next tell the program where the data are stored. That is, indicate the disk, folder(s), and name.
        1. See Figure 10.
        2. In this case the data are stored on the hard drive (c:) in a folder called Data under the name mydata.txt
        3. After identifying the location click OPEN
    5. The Data window
      1. See Figure 11 which shows how data are stored.
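
For readers who want to see what "reading" a plain text file into columns amounts to, here is a rough Python sketch. It is not how MINITAB itself is implemented. The function name parse_columns and the sample rows are hypothetical; the column names c1, c2, and so on follow the notes, and an asterisk is treated as a missing value and stored as None.

```python
def parse_columns(lines, n_columns):
    """Split whitespace-separated rows into columns named c1, c2, ...

    An asterisk marks a missing value and is stored as None.
    """
    columns = {f"c{i}": [] for i in range(1, n_columns + 1)}
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != n_columns:
            raise ValueError(
                f"line {line_number}: expected {n_columns} fields, "
                f"found {len(fields)}"
            )
        for name, field in zip(columns, fields):
            columns[name].append(None if field == "*" else float(field))
    return columns


# Placeholder rows for illustration only, not values from the Census table.
sample = ["1 10.0 *", "2 20.0 5.0"]
print(parse_columns(sample, 3))
```

Because an open file yields its lines one at a time, the same function could be handed a file object opened on the text file saved earlier, with n_columns set to 9 for the example data set.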


  6. CREATING A NEW VARIABLE:
    1. We want to examine the difference between white and black high school graduation rates for different time periods. But the original data set does not contain a measure of the difference in rates. So we need to use the calculation features of MINITAB (or the transformation procedures in SPSS) to create a variable:

         Differ = white graduation rate - black graduation rate

      1. Given the present data set, this translates to

         Differ = c2 - c3

    2. Click Calc and then Calculator
    3. NOTE: WITH STUDENT VERSION CLICK ON CALC AND MATHEMATICAL EXPRESSIONS
      1. See Figure 12.
    4. The new variable, Differ, needs to be stored somewhere, so in the Store result in variable box type: c10
      1. Or type a short name and the program will automatically assign a storage location for the new variable.
      2. See Figure 13.
    5. In the Expression box type the equation.
      1. Manually type the expression or use the pointer (arrow) and mouse to "click" in the equation. That is, highlight c2 by moving the pointer to it, then click Select; click the minus sign; highlight c3 and click Select.
        1. See Figure 13 again.
      2. Click OK
    6. Look at the data window. The new variable has been stored in column 10.
      1. I previously (off screen) added the label "Differ."
      2. See Figure 14.
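
The calculation just described, subtracting c3 from c2 row by row, can be sketched in Python as follows. The function name subtract_columns and the sample values are hypothetical. The sketch assumes that a row with a missing value in either column should get a missing result, stored here as None.

```python
def subtract_columns(first, second):
    """Return first - second row by row; a missing operand gives None."""
    if len(first) != len(second):
        raise ValueError("columns must have the same number of rows")
    return [
        None if a is None or b is None else a - b
        for a, b in zip(first, second)
    ]


# Placeholder values for illustration only, not Census figures.
c2 = [10.0, None, 20.5]
c3 = [4.0, 5.0, 10.25]
differ = subtract_columns(c2, c3)  # plays the role of column c10
print(differ)
```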


  7. DATA ANALYSIS:
    1. Time series data
      1. The "difference scores" pertain to time periods. That is, the unit of analysis is "year."
      2. Measurements collected at different time periods are called time series.
    2. Time series can easily be plotted with the Time Series Plot
      1. See Figure 15.
    3. Select Differ (or c10) from the variable list and click OK.
      1. See Figure 16.
    4. The time series plot appears in Figure 17.
      1. What is your interpretation?
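
To make the idea of a time series plot concrete without MINITAB, here is a crude text-only Python sketch. It is not MINITAB's Time Series Plot: time runs down the page rather than across it, and the function name text_time_series, the width parameter, and the sample values are all hypothetical. Each row shows a time label and an "o" placed further right for larger values; missing values are shown as an asterisk.

```python
def text_time_series(labels, values, width=40):
    """Return one text row per time period, with 'o' marking the value."""
    present = [v for v in values if v is not None]
    if not present:
        return []
    low, high = min(present), max(present)
    span = (high - low) or 1.0  # avoid dividing by zero for a flat series
    rows = []
    for label, value in zip(labels, values):
        if value is None:
            rows.append(f"{label} *")
        else:
            # Scale the value to a column between 0 and width - 1.
            position = round((value - low) / span * (width - 1))
            rows.append(f"{label} " + " " * position + "o")
    return rows


# Placeholder series for illustration only, not Census figures.
for row in text_time_series([1, 2, 3], [0.0, None, 10.0], width=11):
    print(row)
```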


  8. DATA STRUCTURES AND VARIABLES:
    1. First, recall that "unit of analysis" refers to the level or kind of "thing" being measured or studied.
      1. Research might involve individuals or perhaps counties or provinces or countries.
      2. In these examples the units of analysis are "individuals," "counties," "provinces," and "countries" respectively.
    2. Variables are properties or attributes of units of analysis.
      1. The percent of children who watch 5 or more hours of television daily, for instance, is a variable because different units of analysis (i.e., countries) have different levels or amounts of this "property."
      2. Similarly, the "homicide rate" is a variable because it too is a characteristic of an area like an American state.
    3. Many of the problems discussed in Class 1 involve assertions or statements about variables.
    4. Note: a variable is an attribute or characteristic or trait that different units exhibit in different amounts or degrees or levels. Do not confuse a variable with a specific amount. Age is a variable; 25 is a particular value of the variable. Per capita income is a variable; $10,000 is a particular value of that variable.
    5. Types of variables:
      1. Numeric (quantitative)
        1. Natural numbers with or without a meaningful zero point
        2. Examples: age, income, temperature, number of highway fatalities per year per 1 million car miles driven; percent of children who watch more than 5 hours of television; years of education.
      2. Categorical (qualitative):
        1. Nominal: measurement consists of assigning units or cases to categories.
          1. Examples: Gender, political party affiliation, occupation, region of country
          2. "Values" of the variable are merely labels that do not indicate amount or quantity.
        2. Ordinal: categories can be "ranked" but "distances" between levels or categories are not fixed or even known.
          1. Example: level of interest in government categorized as "high," "medium," and "low."
          2. A person in the "medium" category is not necessarily twice as interested as someone ranked "low."
      3. Variables can also be termed "dependent" or "independent"
        1. Mathematics test scores would probably be called dependent because we think of them as depending on or being caused by other factors such as amount of time spent studying or family background.
        2. On the other hand, per capita income or the level of development of nations could be called independent or "explanatory" because we might hypothesize that it "explains," "causes," or in some other sense accounts for variation in a dependent variable.
      4. Sometimes the distinction between dependent and independent variables is arbitrary.
    6. Variation: The fundamental concept of statistics is variation: the amount or magnitude of differences on a variable.
      1. Variation may be large or small.
      2. A MAJOR GOAL OF SOCIAL SCIENCE IS TO EXPLAIN VARIATION
        1. This is normally done by using one or more independent variables to explain a dependent variable.
        2. There is thus a connection between statistical "explanation" and "substantive understanding" and answers to problems.
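
One way to see what "amount of variation" means is to compute a summary of how much the values of a variable differ. The notes do not name a particular measure at this point, so the range and the sample variance used in this Python sketch are choices made here for illustration only. The function name describe_variation and the sample values are hypothetical, and missing values (None) are left out of the calculation.

```python
import statistics


def describe_variation(values):
    """Summarize how much the non-missing values differ from one another."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        raise ValueError("need at least two non-missing values")
    return {
        "n": len(present),
        "range": max(present) - min(present),
        "variance": statistics.variance(present),  # sample variance
        "std_dev": statistics.stdev(present),      # its square root
    }


# Placeholder values for illustration only.
print(describe_variation([1.0, 2.0, 3.0, None]))
print(describe_variation([5.0, 5.0, 5.0]))  # no variation at all
```

A variable whose values are all the same has a range and variance of zero, so there is nothing for an independent variable to explain.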


  9. NEXT TIME:
    1. Data structures
    2. Tools for summarizing and displaying data



Copyright © 1997 H. T. Reynolds