| Type: | Package | 
| Title: | Collection of Data Sets for Teaching Purposes | 
| Version: | 1.0.4 | 
| Date: | 2025-01-10 | 
| Author: | Jared Studyvin [aut, cre] | 
| Depends: | R (≥ 3.6.0) | 
| Maintainer: | Jared Studyvin <studyvinstat@gmail.com> | 
| Description: | Collection (syllogi in Greek) of real and fictitious data sets for teaching purposes. The datasets were manually entered by the author from the respective references listed in the individual dataset documentation. The fictitious datasets were created by the author, who has found them useful for teaching statistics. | 
| License: | Apache License (≥ 2) | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.2 | 
| NeedsCompilation: | no | 
| Packaged: | 2025-01-10 16:39:35 UTC; jaredstudyvin | 
| Repository: | CRAN | 
| Date/Publication: | 2025-01-10 17:00:02 UTC | 
Study of Diets in Alligators
Description
Data.frame
Usage
data(alligatorDiet)
Format
The data frame has 16 rows and 8 variables:
- lake
- Florida lake where the alligator was captured. 
- gender
- Female (F) or Male (M). 
- size
- small (<=2.3m) or big (> 2.3m). 
- fish
- Number of alligators whose primary stomach contents were fish. 
- invertabrate
- Number of alligators whose primary stomach contents were invertebrates. 
- reptile
- Number of alligators whose primary stomach contents were reptiles. 
- bird
- Number of alligators whose primary stomach contents were birds. 
- other
- Number of alligators whose primary stomach contents were something else. 
Details
A study done at four lakes in Florida captured 219 alligators. The primary food type found in each alligator's stomach was recorded, along with the gender, lake of capture, and size of the alligator.
References
Agresti, A. (2013) Categorical Data Analysis. 3rd Edition, John Wiley & Sons, Hoboken, New Jersey.
Examples
data("alligatorDiet", package='syllogi')
str(alligatorDiet)
Study of Diets in Alligators at Lake George, Florida
Description
Data.frame
Usage
data(alligatorLength)
Format
The data frame has 63 rows and 3 variables:
- sex
- Female (F) or Male (M). 
- length
- Length of the alligator in meters. Subadults are shorter than 1.83 meters and adults are longer than 1.83 meters. 
- foodChoice
- Primary stomach contents of the alligator. 
Details
A study in Lake George, Florida caught 63 alligators. Each alligator's stomach contents were classified as fish, invertebrate, or other. The sex and the length of the alligator were also recorded.
References
Agresti, A. (2013) Categorical Data Analysis. 3rd Edition, John Wiley & Sons, Hoboken, New Jersey.
Examples
data("alligatorLength", package='syllogi')
str(alligatorLength)
Fictitious Data Set of Annual Sales
Description
Data.frame
Usage
data(annualSales)
Format
The data frame has 12 rows and 3 variables:
- sales
- Annual gross sales in thousands of dollars. 
- advert
- Annual cost of advertising in thousands of dollars. 
- quality
- Quality of the store's typical product: 0 = very poor quality to 25 = exceptional quality. 
Details
You are hired as a statistical consultant. Twelve stores in the Fort Collins, CO area have asked you to develop a prediction model for their annual gross sales (sales; measured in thousands of dollars). They would like to know if it is possible to predict the amount of their sales by knowing how much they spend annually on advertising (advert; measured in thousands of dollars) and the quality of their store's typical product (quality; measured on a scale from 0 = very poor quality to 25 = exceptional quality).
References
fictitious data set
Examples
data("annualSales", package='syllogi')
str(annualSales)
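The prediction model described in the scenario can be sketched as a multiple linear regression. This is only an illustration: the data frame below is a simulated stand-in that reuses the documented column names, not the values shipped in the package, and the simulation coefficients and the `newStore` values are arbitrary assumptions.

```r
# Simulated stand-in with the documented column names (NOT the package data).
# To use the real data instead: data("annualSales", package = "syllogi")
set.seed(1)
annualSales <- data.frame(
  advert  = runif(12, min = 5, max = 50),
  quality = sample(0:25, 12, replace = TRUE)
)
# Arbitrary assumed relationship, used only to generate example values
annualSales$sales <- 100 + 3 * annualSales$advert +
  4 * annualSales$quality + rnorm(12, sd = 10)

# Multiple linear regression of sales on advertising and quality
fit <- lm(sales ~ advert + quality, data = annualSales)
summary(fit)

# Predicted sales (in thousands of dollars) for a hypothetical store
newStore <- data.frame(advert = 20, quality = 15)
pred <- predict(fit, newdata = newStore, interval = "prediction")
pred
```

With only twelve stores, the prediction interval is likely to be wide, which is itself a useful teaching point.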
Beer
Description
Data.frame
Usage
data(beer)
Format
The data frame has 86 rows and 5 variables:
- brand
- Brand name of the beverage 
- brewery
- Brewery of the beverage 
- percentAlcohol
- Percent alcohol by volume 
- calories
- Total calories 
- carbohydrates
- Total carbohydrates 
Details
Does a beer with more carbohydrates tend to have more alcohol? To answer this question the carbohydrates and percent alcohol from several different beer brands were measured.
Examples
data("beer", package='syllogi')
str(beer)
Bighorn Sheep
Description
Bighorn Sheep data
Usage
data(bighornSheep)
Format
The data frame has 8000 rows (a geographic sample unit) and 15 variables:
- sampleUnit
- Sample unit ID; 150 m circles randomly overlaid across the study area 
- count
- Count of use by bighorn sheep. 
- slope
- Average slope (degrees) within the sampling unit 
- elev
- Average elevation (m) within the sampling unit 
- distBurn
- Distance (m) from the sampling unit center to the nearest burned habitat edge, calculated after the fire event 
- distRoad
- Distance (m) from the sampling unit center to the nearest road 
- distEscp
- Distance (m) from the sampling unit center to the nearest escape terrain (slope > 27 degrees) 
- distWater
- Distance (m) from the sampling unit center to the nearest perennial water source 
- aspect
- Dominant cardinal direction within each sampling unit 
- fire
- 1 = after fire, 0 = before fire 
- season
- Season, summer or winter 
Details
Twelve female bighorn sheep were radio collared and tracked. Their use locations were recorded before and after a forest fire.
References
Clapp, J.G., & Beck, J.L. (2016). Short-Term Impacts of Fire-Mediated Habitat Alterations on an Isolated Bighorn Sheep Population. Fire Ecology 12, 80–98. https://doi.org/10.4996/fireecology.1203080
Examples
data('bighornSheep', package='syllogi')
str(bighornSheep)
Study of Recurrence of Bladder Cancer
Description
Data.frame
Usage
data(bladderCancer)
Format
The data frame has 31 rows and 3 variables:
- Size
- 0 = small primary tumor (< 3 cm) and 1 = large primary tumor (> 3 cm). 
- Tumors
- Number of tumors. 
- Time
- Follow up time in months. 
Details
Study of tumor recurrence in bladder cancer patients. Each patient had previously received surgery to remove a primary tumor. The size of the removed primary tumor was recorded. After different follow-up times, the number of recurring tumors was recorded.
References
Bilder, C.R., & Loughin, T.M. (2014). Analysis of Categorical Data with R (1st ed.). Chapman and Hall/CRC. https://doi.org/10.1201/b17211
Examples
data("bladderCancer", package='syllogi')
str(bladderCancer)
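Because patients were followed for different lengths of time, one common way to compare recurrence by tumor size is a Poisson regression with follow-up time as an offset. The sketch below assumes that approach; it runs on a simulated stand-in with the documented column names, not on the package data, and the simulated rate is an arbitrary assumption.

```r
# Simulated stand-in with the documented column names (NOT the package data).
# To use the real data instead: data("bladderCancer", package = "syllogi")
set.seed(2)
bladderCancer <- data.frame(
  Size = rep(c(0, 1), length.out = 31),
  Time = sample(2:30, 31, replace = TRUE)
)
# Arbitrary assumed rate of 0.1 tumors per month of follow-up
bladderCancer$Tumors <- rpois(31, lambda = 0.1 * bladderCancer$Time)

# Poisson regression; offset(log(Time)) models tumors per month of follow-up
fit <- glm(Tumors ~ Size + offset(log(Time)),
           family = poisson, data = bladderCancer)
summary(fit)

# Estimated ratio of recurrence rates, large versus small primary tumor
rateRatio <- exp(coef(fit)[["Size"]])
rateRatio
```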
Fictitious Data Set of Butterfly Counts
Description
Data.frame
Usage
data(butterflyPlot)
Format
The data frame has 40 rows and 2 variables:
- area
- Plot area size in hectares. 
- numSpecies
- Count of number of unique species. 
Details
Plots ranging in size from 1 ha to 1000 ha were left uncut in a larger landscape of logged tropical rainforest. In each plot, the number of unique butterfly species was recorded. What is the relationship between plot size and unique species count?
References
fictitious data set
Examples
data("butterflyPlot", package='syllogi')
str(butterflyPlot)
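One way to explore the question posed in the details is a log-log regression, which corresponds to a power-law species-area relationship. That model is an assumption made here for illustration, not something the dataset documentation prescribes, and the data below are a simulated stand-in rather than the package values.

```r
# Simulated stand-in with the documented column names (NOT the package data).
# To use the real data instead: data("butterflyPlot", package = "syllogi")
set.seed(3)
area <- exp(runif(40, min = log(1), max = log(1000)))  # 1 ha to 1000 ha
butterflyPlot <- data.frame(
  area = area,
  # Arbitrary assumed power law; the "1 +" keeps every count positive
  numSpecies = 1 + rpois(40, lambda = 8 * area^0.25)
)

# Fit log(numSpecies) = log(c) + z * log(area)
fit <- lm(log(numSpecies) ~ log(area), data = butterflyPlot)
z <- coef(fit)[["log(area)"]]  # estimated power-law exponent
z
confint(fit)
```

Plotting `numSpecies` against `area` on both the raw and log scales is a quick check on whether the power-law form is reasonable.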
Self Reported Depression
Description
Self reported level of depression and other associated metrics.
Usage
data(depression)
Format
An object of class data.frame with 50 rows and 13 columns.
Details
This is a fictitious dataset useful for teaching how to use and interpret linear statistical models. The variables are:
- educate
- Level of Education: (1) professional degree (non-college), (2) 2 years of college, (3) 2+ years of college, but not a BS degree, (4) BS degree, (5) MS degree 
- income
- Annual Income: 1 = $10,000 to $19,999; 2 = $20,000 to $29,999; ... 9 = $90,000 to $99,999; 10 = $100,000 or more 
- trauma
- Experience of Trauma; Percent of Life Events Viewed as Traumatic: 0 = 0%, 1 = 10%, 2= 20%, ..., 9 = 90%, 10 = 100% 
- satisfac
- Satisfied with your Life: 0 = No, 1 = Yes 
- control
- Feeling of Control; How much do you feel in control: 0 = Not at all, 1 = A Little, 2 = Some, 3 = A Lot, 4 = Completely 
- history
- Family History of Depression: 0 = No, 1 = Yes 
- exercise
- Weekly Amount of Exercise: 0 = None, 1 = 1 Hour, 2 = 2 Hours, 3 = 3 Hours, 4 = 4 Hours, 5 = 5 or more Hours 
- mhpg
- 3-methoxy-4-hydroxyphenylethyleneglycol, Depression Related Chemical Secreted in Urine; milligrams secreted per 24 hour period (mg/24h): 0 = 0 mg/24h, 1 = 100 mg/24h, ..., 9 = 900 mg/24h, 10 = 1000+ mg/24h
- sleep
- Amount of Sleep Problems: 0 = None, 1 = 10% of the time, ... , 9 = 90% of the time, 10 = 100% of the time 
- depress
- Perceived Level of Depression: 0 = None, 1 = 10% of the time, ... , 9 = 90% of the time, 10 = 100% of the time 
- depressYes
- Do I consider myself depressed: 0 = No, 1 = Yes 
- welbeing
- Feeling of Well Being; how often do you feel good about yourself: 0 = None, 1 = 10% of the time, ... , 9 = 90% of the time, 10 = 100% of the time 
- gender
- Your Sex: 0 = Male, 1 = Female 
References
fictitious data set
Fictitious Data Set Comparing Dog Food Brands
Description
Data.frame
Usage
data(dogFood)
Format
The data frame has 25 rows and 2 variables:
- type
- The type of dog food: our dog food or one of the four top sellers. 
- gain
- The percent weight gain. 
Details
You are hired as a statistical consultant for a dog food manufacturing company. The engineers who designed the company's dog food would like to know how it compares to the current top-selling dog food brands on the market. To answer this question, 25 puppies of the same breed and age (within a week of each other) were chosen for this study. Five puppies were assigned to each dog food type. After 4 weeks, the percent of weight gained by each puppy was determined.
References
fictitious data set
Examples
data("dogFood", package='syllogi')
str(dogFood)
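The comparison in the scenario can be approached with a one-way analysis of variance, followed by contrasts of each competitor against the company's own food. The sketch below is illustrative only: the data are simulated, and the level labels (`ours`, `brandA`, and so on) are hypothetical because the documentation does not list the actual labels.

```r
# Simulated stand-in with the documented column names (NOT the package data).
# To use the real data instead: data("dogFood", package = "syllogi")
set.seed(4)
dogFood <- data.frame(
  # Hypothetical level labels; check levels(dogFood$type) in the real data
  type = factor(rep(c("ours", "brandA", "brandB", "brandC", "brandD"),
                    each = 5)),
  gain = rnorm(25, mean = 30, sd = 4)
)

# One-way ANOVA: does mean percent weight gain differ among food types?
fit <- aov(gain ~ type, data = dogFood)
anovaTable <- summary(fit)[[1]]
anovaTable

# Make "ours" the reference level so each coefficient compares a
# competitor with the company's own food
dogFood$type <- relevel(dogFood$type, ref = "ours")
contrasts <- coef(summary(lm(gain ~ type, data = dogFood)))
contrasts
```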
Federalist Papers
Description
List of the Federalist Papers
Usage
data(federalistPapers)
Format
The list has 86 elements; each element is a list with 2 elements. The paper element is the text of the paper. The meta element is a data frame with the following columns:
- number
- Paper number. 
- author
- Author of the paper. 
- title
- Title of the paper. 
- journal
- Newspaper that published the paper. 
- date
- Date of publication. 
Details
The Project Gutenberg version of the Federalist Papers attributes paper No. 58 to Madison, but Mosteller and Wallace consider this paper to have disputed authorship. Thus, this version considers No. 58 authorship to be disputed.
Project Gutenberg has two slightly different versions of No. 70; both are included.
References
https://www.gutenberg.org/ebooks/18
Mosteller, F. and Wallace, D. L. (1964) Inference and Disputed Authorship: The Federalist. Reading, MA.
Examples
data("federalistPapers", package='syllogi')
str(federalistPapers)
Generic Data Set
Description
Generic data set with four ratio predictors (X1,X2,X3,X4), two categorical predictors (A,B) and one ratio response variable (Y).
Usage
data(genericData)
Format
An object of class data.frame with 60 rows and 7 columns.
Details
This is a fictitious dataset useful for teaching how to use and interpret linear statistical models.
References
fictitious data set
Examples
data("genericData", package='syllogi')
str(genericData)
Golfing
Description
Data.frame
Usage
data(golf)
Format
The data frame has 18 rows and 3 variables:
- clubs
- clubs used for that round of golf 
- course
- course for the round of golf 
- score
- score or strokes for 18 holes 
Details
I purchased new golf clubs last summer, which I believe will significantly improve my game. I recorded my score after three rounds of golf with my new clubs and my old clubs. I also played at three different courses.
References
fictitious data set
Examples
data("golf", package='syllogi')
str(golf)
Nutrition Cancer Study
Description
Data.frame
Usage
data(nutritionCancer)
Format
The data frame has 50 rows and 6 variables:
- id
- ID number of each patient. 
- age
- The age of the patient in years. 
- length
- The duration or time in months the patient has had breast cancer. 
- serving
- The number of servings the patient eats of fruits and vegetables in a typical day. 
- familyHistory
- Whether any blood relative (e.g., mother, grandmother, aunt) has or had breast cancer. 
- stage
- The stage of the cancer: 0-non-invasive to IV-very invasive or "advanced" cancer. 
Details
Fictitious data set for teaching purposes. The fictitious scenario:
The purpose of a medical study is to examine the relationship between eating fruits and vegetables and breast cancer. To study the relationship, 1500 Caucasian women with breast cancer were randomly selected from the list of cancer patients in the U.S. The first 50 patients have been measured.
References
Fictitious data set
Examples
data("nutritionCancer", package='syllogi')
str(nutritionCancer)
Study of Nonmetastatic Osteosarcoma
Description
Data.frame
Usage
data(osteosarcoma)
Format
The data frame has 8 rows and 5 variables:
- lymphocyticInfiltration
- Patient has high or low lymphocytic infiltration. 
- gender
- Female (F) or Male (M). 
- osteoblasticPathology
- Patient has osteoblastic pathology yes or no. 
- diseaseFreeYes
- Number of patients that are disease free after three years. 
- diseaseFreeNo
- Number of patients that are not disease free after three years. 
Details
A study of nonmetastatic osteosarcoma recorded whether each patient was disease free after three years, along with the patient's gender, level of lymphocytic infiltration, and whether osteoblastic pathology was present. Can the probability of being disease free after 3 years be predicted?
References
Goorin, A. M., Perez-Atayde, A., Gebhardt, M., Andersen, J. W., Wilkinson, R. H., Delorey, M. J., Watts, H., Link, M., Jaffe, N., and Frei, E., 3rd (1987). Journal of Clinical Oncology, 5(8), 1178-1184.
Agresti, A. (2002) Categorical Data Analysis. 2nd Edition, John Wiley & Sons, Inc., New York, 320-332. http://dx.doi.org/10.1002/0471249688
Examples
data("osteosarcoma", package='syllogi')
str(osteosarcoma)
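Since each row holds counts of patients who were and were not disease free, the question in the details can be explored with a logistic regression on grouped binomial data. The sketch below assumes that approach. The counts are simulated, and the factor labels (`low`/`high`, `no`/`yes`) are hypothetical stand-ins for whatever coding the real data use.

```r
# Simulated stand-in with the documented column names (NOT the package data).
# To use the real data instead: data("osteosarcoma", package = "syllogi")
set.seed(5)
osteosarcoma <- expand.grid(
  lymphocyticInfiltration = c("low", "high"),  # hypothetical labels
  gender = c("F", "M"),
  osteoblasticPathology = c("no", "yes")       # hypothetical labels
)
nPatients <- 10  # arbitrary assumed number of patients per row
osteosarcoma$diseaseFreeYes <- rbinom(8, size = nPatients, prob = 0.6)
osteosarcoma$diseaseFreeNo <- nPatients - osteosarcoma$diseaseFreeYes

# Grouped-data logistic regression: the response is a two-column
# matrix of (successes, failures)
fit <- glm(cbind(diseaseFreeYes, diseaseFreeNo) ~
             lymphocyticInfiltration + gender + osteoblasticPathology,
           family = binomial, data = osteosarcoma)
summary(fit)

# Estimated probability of being disease free for each covariate pattern
osteosarcoma$pHat <- fitted(fit)
osteosarcoma
```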
Patient Satisfaction
Description
Data.frame
Usage
data(patientSatisfaction)
Format
The data frame has 46 rows and 4 variables:
- satisfaction
- Patient's level of satisfaction, higher value means more satisfied. 
- age
- Patient's age in years. 
- severityIllness
- Patient's severity of illness, higher value means more severe. 
- anxietyLevel
- Patient's anxiety level, higher value means more severe. 
Details
A hospital administrator wants to predict patients' satisfaction using their age, severity of illness, and anxiety level. Forty-six patients were selected for the study.
References
Kutner, M. H., Nachtsheim, C., Neter, J., & Li, W. (2005). Applied linear statistical models (5th ed.). McGraw-Hill Irwin.
Examples
data("patientSatisfaction", package='syllogi')
str(patientSatisfaction)
Political Ideology
Description
Data.frame
Usage
data(politicalIdeology)
Format
The data frame has 20 rows and 4 variables:
- gender
- Female (F) or Male (M). 
- party
- Democrat (D) or Republican (R) 
- ideol
- Very liberal (VL), Slightly Liberal (SL), Moderate (M), Slightly conservative (SC), or Very conservative (VC). 
- count
- Count of people. 
Details
Data from the 1991 U.S. General Social Survey, cross-classifying people according to gender, political party, and political ideology.
References
Bilder, C.R., & Loughin, T.M. (2014). Analysis of Categorical Data with R (1st ed.). Chapman and Hall/CRC. https://doi.org/10.1201/b17211
Examples
data("politicalIdeology", package='syllogi')
str(politicalIdeology)
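Because the data frame stores one row per cell with a `count` column, a contingency table has to be built from the counts before most categorical analyses. The sketch below shows one way to do that with `xtabs()`; it uses simulated counts in the documented layout, not the survey values.

```r
# Simulated stand-in with the documented column names (NOT the package data).
# To use the real data instead: data("politicalIdeology", package = "syllogi")
set.seed(7)
politicalIdeology <- expand.grid(
  ideol = c("VL", "SL", "M", "SC", "VC"),
  party = c("D", "R"),
  gender = c("F", "M")
)
politicalIdeology$count <- rpois(20, lambda = 40)  # arbitrary counts

# Collapse over gender into a party-by-ideology contingency table
tab <- xtabs(count ~ party + ideol, data = politicalIdeology)
tab

# Pearson chi-squared test of independence between party and ideology
test <- chisq.test(tab)
test
```

The chi-squared test ignores the ordering of the ideology categories; methods for ordinal responses may be more appropriate for the real data.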
High School and Beyond Survey
Description
A survey of high school seniors conducted by the National Center for Education Statistics.
Usage
data(schoolProgram)
Format
The data frame has 200 rows (a student) and 11 variables:
- id
- Student ID. 
- gender
- Student's gender. 
- race
- Student's race. 
- ses
- Socio economic status of the student's family, with levels low, middle, and high. 
- schtype
- Type of school: public or private. 
- prog
- Type of program the student wants to attend after high school. 
- read
- Student's standardized reading score. 
- write
- Student's standardized writing score. 
- math
- Student's standardized math score. 
- science
- Student's standardized science score 
- scost
- Student's standardized social studies score 
Details
Two hundred students were randomly selected from the whole cohort in the survey.
References
https://www.openintro.org/data/index.php?data=hsb2
UCLA Institute for Digital Research & Education - Statistical Consulting.
Examples
data("schoolProgram", package='syllogi')
str(schoolProgram)
Wave Damage of Ships
Description
Data.frame
Usage
data(shipDamage)
Format
The data frame has 20 rows and 5 variables:
- shipType
- Type of ship 
- constYear
- Year of construction 
- operation
- Period of operation 
- months
- Aggregate months of service 
- incidents
- Number of damage incidents 
Details
Wave damage to the forward section of cargo-carrying vessels. These data cover only the period of operation 1975 to 1979.
References
McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models. 2nd Edition, Chapman and Hall, London.
Examples
data("shipDamage", package='syllogi')
str(shipDamage)
Ships and Gold
Description
Data.frame
Usage
data(shipGold)
Format
The data frame has 20 rows (a ship) and 2 variables:
- shipSize
- Size of the ship measured in inches on the horizon. 
- gold
- Amount of gold pieces on the ship. 
Details
Fictitious data set for teaching purposes. The fictitious scenario:
Captain Buck Tooth has taken you prisoner aboard his pirate ship, the Lucky Lemon. He sees from your college transcripts you have taken a couple of statistics courses. Captain Buck Tooth wants you to predict the amount of gold a ship is carrying based on the size of the ship. Specifically, he thinks bigger ships carry more gold. For the last several ships he has looted he measured the height in inches when the ship was still way off on the horizon. The captain also has a good memory and remembers how much gold was taken from each ship in number of pieces.
References
Fictitious data set
Examples
data("shipGold", package='syllogi')
str(shipGold)
Ski Resort
Description
Data.frame
Usage
data(ski)
Format
The data frame has 9 rows and 4 variables:
- miles
- miles of skiable terrain 
- capacity
- number of visitors that could be taken per hour from the base of the mountain to the top via the resort's various lifts 
- vistors
- number of tickets sold per week 
- resort
- resort ID number 
Details
Information from local ski resorts in the region. The research question is: can weekly visitors be predicted from miles of skiable terrain and/or capacity of lifts?
References
fictitious data set
Examples
data("ski", package='syllogi')
str(ski)
Weight Loss Study
Description
Data.frame
Usage
data(weightLoss)
Format
The data frame has 60 rows and 2 variables:
- drug
- Which weight loss drug the participant took for 6 weeks. 
- loss
- Percent of weight loss after the 6 weeks. 
Details
Fictitious data set for teaching purposes. The fictitious scenario:
You are a statistical consultant. A client comes to you asking for help with their analysis. The client is from a drug company. Their new drug is supposed to help people lose weight. They conducted an experiment with their drug (drug A) and the two best-selling weight loss drugs (B and C). Male participants aged 50 to 60 were used in the study. Each participant took one of the drugs for 6 weeks and the percent of weight loss was recorded.
References
Fictitious data set
Examples
data("weightLoss", package='syllogi')
str(weightLoss)
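A natural analysis for this scenario is a one-way ANOVA followed by pairwise comparisons of the three drugs. The sketch below uses Tukey's honest significant difference, which is one reasonable choice and not the only one. The data are simulated with arbitrary assumed group means, not taken from the package.

```r
# Simulated stand-in with the documented column names (NOT the package data).
# To use the real data instead: data("weightLoss", package = "syllogi")
set.seed(6)
weightLoss <- data.frame(
  drug = factor(rep(c("A", "B", "C"), each = 20)),
  # Arbitrary assumed mean percent loss for drugs A, B, and C
  loss = rnorm(60, mean = rep(c(8, 6, 5), each = 20), sd = 2)
)

# One-way ANOVA of percent weight loss by drug
fit <- aov(loss ~ drug, data = weightLoss)
summary(fit)

# All pairwise differences with family-wise 95% confidence intervals
tk <- TukeyHSD(fit)
tk$drug
```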
Wheat Kernels
Description
Data.frame
Usage
data(wheat)
Format
The data frame has 275 rows and 7 variables:
- class
- hrw = hard red winter wheat and srw = soft red winter wheat. 
- density
- Density of a kernel. 
- hardness
- Hardness of a kernel. 
- size
- Size of a kernel. 
- weight
- Weight of a kernel. 
- moisture
- Moisture content of a kernel. 
- type
- Kernel's condition: Healthy, Sprout (sprouted prematurely), or Scab (infected with a fungus). 
Details
A study on kernels of wheat was done. There are two classes of wheat: hard and soft red winter wheat. Each kernel was measured for density, hardness, size, weight, and moisture content. Each kernel was classified by visual inspection as healthy, sprouted, or scab. A row in the data frame represents a kernel of wheat.
References
Bilder, C.R., & Loughin, T.M. (2014). Analysis of Categorical Data with R (1st ed.). Chapman and Hall/CRC. https://doi.org/10.1201/b17211
Martin, C., Herrman, T.J., Loughin, T. and Oentong, S. (1998), Micropycnometer Measurement of Single-Kernel Density of Healthy, Sprouted, and Scab-Damaged Wheats. Cereal Chemistry, 75: 177-180. https://doi.org/10.1094/CCHEM.1998.75.2.177
Examples
data("wheat", package='syllogi')
str(wheat)