6302_6356 Labs
Welcome to the homepage for the 6302_6356 Labs! This course focuses on data collection methods and data visualization techniques. Below, you will find links to each lab assignment, along with a brief description of what each lab covers.
1 Lab01: Basic Commands and Matrix Operations in R
In this lab, you will get hands-on experience with basic commands and operations in R. You will learn how to create objects using the assignment operator, explore matrix operations, and perform simple descriptive statistics. Additionally, you will be introduced to R’s base graphics system to create simple plots.
1.1 Key Learning Objectives:
- Using R functions to perform basic operations on vectors and matrices.
- Performing simple mathematical operations like addition, subtraction, and matrix manipulation.
- Generating random numbers using rnorm() and computing the correlation between variables.
- Performing basic statistical calculations such as mean, variance, and standard deviation.
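The short sketch below illustrates the kinds of commands covered in this lab: object assignment, matrix arithmetic, random number generation with rnorm(), correlation, and basic descriptive statistics. The specific values and plot are illustrative only.

```r
# Create objects with the assignment operator
x <- c(2, 4, 6, 8, 10)

# Build a 2 x 3 matrix, transpose it, and multiply
m <- matrix(1:6, nrow = 2)
t(m)          # transpose
m %*% t(m)    # matrix multiplication

# Generate random numbers and compute a correlation
set.seed(1)                    # for reproducibility
a <- rnorm(100)
b <- a + rnorm(100, sd = 0.5)
cor(a, b)

# Basic descriptive statistics
mean(a)
var(a)
sd(a)

# A simple plot with R's base graphics system
plot(a, b, main = "Scatterplot of a vs. b")
```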
2 Lab02: Text Mining with R – Wordcloud Creation
In this lab, you will learn how to perform text mining by downloading text data from the web, preprocessing the text, and visualizing the most frequent words using a word cloud in R. You will use various R packages for natural language processing (NLP), including tm, wordcloud, and quanteda. By the end of this lab, you will have the skills to create word clouds to represent text data visually.
2.1 Key Learning Objectives:
- Downloading text data from websites and converting it into a usable format for analysis.
- Preprocessing text data by converting it to lowercase and removing punctuation, numbers, and stopwords.
- Creating a Term Document Matrix (TDM) to count word frequencies in the text data.
- Generating word clouds to visualize the most frequent words, and customizing the appearance of the word cloud.
2.2 Tools:
- R packages: tm, wordcloud, RColorBrewer, NLP, quanteda.
- Data Source: You’ll be working with famous speeches, such as Martin Luther King’s “I Have a Dream” and Winston Churchill’s “Finest Hour”.
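A minimal sketch of the preprocessing and word-cloud pipeline is shown below. The URL is a placeholder for whichever speech transcript you download; the actual sources are given in the lab instructions.

```r
library(tm)
library(wordcloud)
library(RColorBrewer)

# Read raw text from the web (placeholder URL)
raw_text <- readLines("https://example.com/i_have_a_dream.txt")

# Build a corpus and preprocess it
corp <- Corpus(VectorSource(raw_text))
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, removeNumbers)
corp <- tm_map(corp, removeWords, stopwords("english"))
corp <- tm_map(corp, stripWhitespace)

# Term Document Matrix and word frequencies
tdm  <- TermDocumentMatrix(corp)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

# Word cloud of the most frequent words
set.seed(2024)
wordcloud(words = names(freq), freq = freq, min.freq = 2,
          max.words = 100, colors = brewer.pal(8, "Dark2"))
```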
3 Lab03: Quanteda Text Analysis
In this lab, you will learn how to perform text modeling and analysis using the quanteda package in R, working with textual data from multiple sources, including US presidential inaugural addresses and tweets from the Biden-Xi summit. You will explore advanced text mining techniques like Latent Semantic Analysis (LSA), hashtag analysis, and keyness analysis, allowing you to extract meaningful insights from text data.
3.1 Key Learning Objectives:
- Text Mining: Learn how to tokenize and preprocess text data using the quanteda package.
- Latent Semantic Analysis (LSA): Reduce the dimensionality of your text data to reveal hidden patterns and topics.
- Hashtag and Term Frequency Analysis: Analyze and visualize the most frequent hashtags and words.
- Keyword in Context (KWIC): Explore how specific terms are used in context, and perform keyword frequency analysis across speeches.
- Keyness Analysis: Compare the linguistic features of different presidential speeches to identify key terms.
- Wordscores Model: Estimate word positions and document scores based on pre-determined reference texts.
3.2 Tools:
- R packages: quanteda, quanteda.textmodels, quanteda.textplots, ggplot2.
- Data Sources: US presidential inaugural addresses and tweets from the Biden-Xi summit.
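The sketch below shows the general quanteda workflow using the data_corpus_inaugural corpus bundled with the package: tokenization, a document-feature matrix, keyword-in-context, and an LSA fit. The tweet data and the keyness and Wordscores steps from the lab are not reproduced here.

```r
library(quanteda)
library(quanteda.textmodels)

# The US inaugural addresses ship with quanteda
corp <- data_corpus_inaugural

# Tokenize and build a document-feature matrix
toks  <- tokens(corp, remove_punct = TRUE, remove_numbers = TRUE)
toks  <- tokens_remove(toks, stopwords("en"))
dfmat <- dfm(toks)

# Most frequent terms
topfeatures(dfmat, 20)

# Keyword in context: how "freedom" is used across speeches
kwic(toks, pattern = "freedom", window = 5)

# Latent Semantic Analysis on the document-feature matrix
lsa_fit <- textmodel_lsa(dfmat, nd = 10)
head(lsa_fit$docs)
```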
4 Lab04: Collecting and Mapping Census Data Using API – State Data and Maps
In this lab, you will learn how to use APIs to collect census data and visualize it on maps. You will fetch census data, such as income levels, and map it at the state and county levels using R packages like tidycensus, tmap, and mapview. You’ll also explore interactive maps, allowing for dynamic visualization of census data.
4.1 Key Learning Objectives:
- Using APIs for Data Collection: Learn how to collect census data programmatically using the tidycensus package.
- Geospatial Visualization: Map census data geographically at the state and county levels using ggplot2, tmap, and mapview.
- Interactive Mapping: Create interactive maps that allow users to zoom in and explore geographic data dynamically.
- Real-world Application: Apply these techniques to Texas and Dallas County income data, visualizing the distribution of income levels by census tract.
4.2 Tools:
- R packages: tidycensus, tmap, mapview, ggplot2.
- Data Sources: US Census Bureau’s API for census income data.
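A minimal sketch of the tidycensus workflow is shown below. It assumes you have requested and installed a Census API key, and it uses ACS variable B19013_001 (median household income) as an example; the variables and geographies used in the lab may differ.

```r
library(tidycensus)
library(ggplot2)
library(mapview)

# A Census API key is required (run once after requesting a key)
# census_api_key("YOUR_KEY_HERE", install = TRUE)

# Median household income by Texas county, with geometry for mapping
tx_income <- get_acs(geography = "county",
                     variables = "B19013_001",
                     state = "TX",
                     geometry = TRUE)

# Static choropleth with ggplot2
ggplot(tx_income, aes(fill = estimate)) +
  geom_sf(color = NA) +
  scale_fill_viridis_c(labels = scales::dollar) +
  labs(title = "Median household income by Texas county",
       fill = "Estimate")

# Interactive map with mapview
mapview(tx_income, zcol = "estimate")
```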
5 Lab05: Visualizing Federal Housing Price Trends Using Shiny
In this lab, you will build an interactive Shiny app to visualize housing price trends across different regions in the US using data from the Federal Housing Finance Agency (FHFA). The FHFA Housing Price Index (HPI) is a weighted measure of house price changes in repeat sales or refinancings on single-family homes. This lab will guide you through the process of data wrangling, creating static and interactive plots, and building a fully functional Shiny app to explore housing price trends over time.
5.1 Key Learning Objectives:
- Data Collection and Wrangling: Fetch and preprocess the Federal Housing Finance Agency Housing Price Index (FHFA HPI) data using R.
- Static and Dynamic Visualizations: Create line plots, polar plots, and other visualizations to explore HPI trends across different regions and time periods.
- Building a Shiny App: Develop an interactive web-based application using Shiny to allow users to explore housing price data for specific years and regions dynamically.
- User Input and Visualization: Use radio buttons, sliders, and conditional panels to control the output of the Shiny app based on user selections.
5.2 Tools:
- R packages: shiny, ggplot2, reshape2, openxlsx.
- Data Source: FHFA HPI data, available through an Excel file hosted on GitHub.
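The sketch below outlines the basic shape of such a Shiny app. The spreadsheet URL and the column names (year, region, index) are placeholders; the lab’s actual workbook, wrangling steps, polar plots, and conditional panels are more involved.

```r
library(shiny)
library(ggplot2)
library(openxlsx)

# Read the HPI workbook (placeholder URL; assumed columns: year, region, index)
hpi <- read.xlsx("https://raw.githubusercontent.com/<user>/<repo>/main/hpi.xlsx")

ui <- fluidPage(
  titlePanel("FHFA Housing Price Index"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("years", "Year range:",
                  min = min(hpi$year), max = max(hpi$year),
                  value = range(hpi$year), sep = ""),
      radioButtons("region", "Region:", choices = unique(hpi$region))
    ),
    mainPanel(plotOutput("trend"))
  )
)

server <- function(input, output) {
  output$trend <- renderPlot({
    # Filter to the selected region and year range
    d <- subset(hpi, region == input$region &
                     year >= input$years[1] & year <= input$years[2])
    ggplot(d, aes(x = year, y = index)) +
      geom_line() +
      labs(title = paste("HPI trend:", input$region),
           x = "Year", y = "Housing Price Index")
  })
}

shinyApp(ui = ui, server = server)
```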
Each of these labs will help you develop essential skills in data science, focusing on data collection and visualization. Click the links to access detailed instructions and resources for each lab.