Fig. 1: Japan’s Industrial Diamond Trade (Export and Import Values) Over Time (1962-2021)

This line chart shows trends in Japan’s industrial diamond trade. The x-axis shows the year (axis range 1960 to 2025), and the y-axis shows trade value in million USD. The blue line shows exports, representing Japan’s role as a supplier of industrial diamonds, while the orange line shows imports. Data source: AEC Precious Metals and Gems Dataset.

Fig. 2: Cumulative Sewage Spill Quantities (in gallons) in New York State Water Bodies

This bar chart shows the total reported sewage spill quantity (in gallons) for the ten most affected water bodies in New York State, using data from the New York State Spill Incidents Database. East Chester Creek accounts for the highest total, likely due to urban wastewater overflow, stormwater drainage issues, and aging infrastructure in the NYC metro area.

The log scale is used to capture the large variation in spill quantities, as some water bodies experience significantly higher contamination than others.
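
As a rough illustration of why the log scale helps, the sketch below uses made-up quantities (they are not values from the Spill Incidents Database, and the name `example_quantity` is hypothetical). It compares the spread of the same values on a linear axis and on a log10 axis.

```r
# Hypothetical spill totals in gallons (NOT taken from the dataset)
example_quantity <- c(5e2, 2e4, 7e5, 3e8)

# On a linear axis, the largest value is 600,000 times the smallest
linear_ratio <- max(example_quantity) / min(example_quantity)

# On a log10 axis, the same values span fewer than six units
log_span <- diff(range(log10(example_quantity)))

print(linear_ratio)
print(log_span)
```

With a spread like this, the smaller bars would be invisible on a linear axis, which is why a log scale can be the more readable choice.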

Fig. 3: Proportion of Total Sewage Spills by Decade in New York State

This pie chart shows the distribution of reported sewage spills in New York State, grouped by decade. The 2000s account for the largest share of spills (40.3%), followed by the 2010s (29.3%). Data for the 2020s is incomplete, as it only includes spills reported through 2023, and the 1970s have no spills. Together, the 2000s and 2010s account for about 70% of reported spills, likely due to aging infrastructure and population growth. Data source: New York State Spill Incidents Database.

Fig. 4: Occupational Fatalities by Industry and Cause (2020-2023)

This Sankey diagram visualizes the distribution of occupational fatalities across various industries and their leading causes from 2020 onwards. The left side represents different industry sectors (color applies to industries), while the right side categorizes the major causes of workplace fatalities (causes are shown in grey). The thickness of the connections (links) between the industries and causes reflects the number of fatalities attributed to each combination.

decade	Total Export (Million $)	Mean Export (Million $)	Median Export (Million $)
1960s	1,046.64	0.09	0.00
1970s	24,375.57	1.13	0.01
1980s	163,884.79	5.12	0.05
1990s	419,981.70	9.20	0.07
2000s	930,122.62	15.26	0.09
2010s	4,252,803.00	58.85	0.13
2020s-2021	1,152,624.71	73.86	0.11

Table: Gold Export Value Over Time (1960-2021)

Trends in global gold export values (inflation-adjusted, in million USD), grouped by decade. The table summarizes total, mean, and median export values for each decade from the 1960s to the 2010s, plus 2020-2021, showing how global gold trade has evolved. The 2010s (highlighted in red) saw the highest total exports, while the 1960s (highlighted in sky blue) saw the lowest.

Table note:

Gold exports have increased significantly over time, with the largest growth occurring in the 2010s.
The mean export value rises in every period; the median rises through the 2010s, then dips slightly in 2020-2021 (from 0.13 to 0.11).
The dataset is inflation-adjusted.
The data comes from the Atlas of Economic Complexity (SITC Revision 2) and has been pre-filtered to include only gold and diamonds.
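
The growth claim in the note can be checked directly from the totals in the table. The sketch below is only an illustration: the data frame `gold_totals` and its column names are made up for this example, and the values are copied from the table above. The 2020-2021 row is left out because it covers only two years and is not comparable to a full decade.

```r
# Decade totals copied from the table above (million USD).
# `gold_totals` is a hypothetical name used only for this illustration.
gold_totals <- data.frame(
  decade = c("1960s", "1970s", "1980s", "1990s", "2000s", "2010s"),
  total_export = c(1046.64, 24375.57, 163884.79, 419981.70, 930122.62, 4252803.00)
)

# Absolute change from the previous decade (NA for the first row)
gold_totals$change <- c(NA, diff(gold_totals$total_export))

# Decade with the largest absolute increase (which.max skips the NA)
largest_growth <- as.character(gold_totals$decade[which.max(gold_totals$change)])
print(largest_growth)
```

This compares absolute changes; a ratio-based comparison (each decade divided by the previous one) would answer a different question and may rank the decades differently.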

Code Output

library(dplyr)
library(reshape2)
library(ggplot2)
library(stringr)
library(knitr)
library(ggrepel)
library(gt)
library(networkD3)
library(ggsankey)
options(scipen = 9999)

# Fig. 1: Japan's diamond trade (exports and imports) over time
aec <- read.csv("D:/Projects/INFO526/Assignment/Assignment3/AEC Precious Metals and Gems.csv")

aec_jp <- aec %>%
  filter(country_name == "Japan",
         str_detect(product_name, "diamonds")) %>% 
  mutate(across(c(export_value, import_value), ~ .x / 1e6))
aec_jp_agg <- aec_jp %>%
  group_by(year) %>%
  summarise(
    total_export = sum(export_value, na.rm = TRUE),
    total_import = sum(import_value, na.rm = TRUE)
  )
ggplot(aec_jp_agg, aes(x = year)) +
  geom_line(aes(y = total_export, color = "Export"), linewidth = 1) +
  geom_line(aes(y = total_import, color = "Import"), linewidth = 1) +
  scale_color_manual(values = c("Export" = "#0072B2", "Import" = "#D55E00")) +
  labs(
    x = "Year",
    y = "Trade Value (Million USD)",
    color = "Trade Type"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom", plot.caption = element_text(hjust = 0))

# Fig. 2: cumulative sewage spill quantity by water body
Sewage_Spills <- read.csv("D:/Projects/INFO526/Assignment/Assignment2/Sewage_Spills.csv")
# Aggregate total spill quantity by water body
spill_by_waterbody <- Sewage_Spills %>%
  filter(!is.na(Waterbody), Waterbody != "", !is.na(Quantity), Quantity > 0) %>%
  mutate(Waterbody = str_to_title(Waterbody)) %>%
  group_by(Waterbody) %>%
  summarise(Total_Quantity = sum(Quantity)) %>%
  arrange(desc(Total_Quantity)) %>%
  slice_head(n = 10)  # Show only the top 10 most affected water bodies

# Horizontal bar chart of total spill quantity by water body (log scale)
ggplot(spill_by_waterbody, aes(x = reorder(Waterbody, Total_Quantity), y = Total_Quantity, fill = Waterbody)) +
  geom_col(alpha = 0.8, show.legend = FALSE) +
  scale_fill_manual(values = scales::hue_pal()(nrow(spill_by_waterbody))) + 
  scale_y_log10() +
  coord_flip() +
  labs(
    x = "Water Body",
    y = "Total Sewage Spill Quantity (Gallons, Log Scale)"
  ) + 
  theme_minimal()

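# Fig. 3: share of spill incidents by decade
# Parse spill dates (month/day/year) so the year can be extracted
Sewage_Spills <- Sewage_Spills %>%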
  mutate(Spill.Date = as.Date(Spill.Date, format = "%m/%d/%Y"))

actual_spill <- Sewage_Spills %>%
  mutate(Spill.Year = as.numeric(format(Spill.Date, "%Y"))) %>%
  filter(!duplicated(Spill.Number)) %>%
  group_by(Spill.Year) %>%
  tally()

spill_decade <- actual_spill %>%
  mutate(Decade = floor(Spill.Year / 10) * 10) %>%
  filter(Decade != 1970) %>%
  mutate(Decade = factor(as.character(Decade))) %>%
  group_by(Decade) %>%
  summarize(Total = sum(n)) %>%
  mutate(Percentage = Total / sum(Total) * 100)


ggplot(spill_decade, aes(x = "", y = Total, fill = Decade)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar(theta = "y", direction = -1) +
  theme_void() +
  labs(fill = "Decade") +  
  # Percentage labels centered in each slice
  geom_text(
    aes(label = paste0(round(Percentage, 1), "%")), 
    position = position_stack(vjust = 0.5),
    color = "black",
    size = 5
  ) +
  scale_fill_manual(
    values = c("1980" = "#ffffb3", 
               "1990" = "#bebada", "2000" = "#fb8072", 
               "2010" = "#80b1d3", "2020" = "#fdb462"),
    labels = c("1980s", "1990s", "2000s", "2010s", "2020-2023")
  ) +
  theme(
    legend.position = "bottom",
    legend.text = element_text(size = 12),
    legend.title = element_text(size = 14)
  )

# Fig. 4: occupational fatalities by industry and cause (2020 onward)
jobs <- read.csv("D:/Projects/INFO526/Assignment/Assignment5/Dangerous Jobs.csv")

jobs_clean <- jobs %>%
  filter(!is.na(Fatalities)) %>%
  filter(Year >= 2020) %>%
  mutate(Cause = tolower(Cause)) %>%
  mutate(Cause = str_to_sentence(Cause)) %>% 
  filter(MajorGroup != "") %>%
  filter(Cause != "Total.Fatalities" & Cause != "Total.fatalities")

jobs_agg <- jobs_clean %>%
  group_by(MajorGroup, Cause, Year) %>%
  summarize(Total_Fatalities = sum(Fatalities, na.rm = TRUE)) %>%
  ungroup()


top_causes <- jobs_agg %>%
  group_by(Cause) %>%
  summarize(Total_Fatalities = sum(Total_Fatalities)) %>%
  arrange(desc(Total_Fatalities)) %>%
  head(5)  

top_industries <- jobs_agg %>%
  group_by(MajorGroup) %>%
  summarize(Total_Fatalities = sum(Total_Fatalities)) %>%
  arrange(desc(Total_Fatalities)) %>%
  head(5) 

sankey_data <- jobs_agg %>%
  filter(Cause %in% top_causes$Cause & MajorGroup %in% top_industries$MajorGroup) %>%
  make_long(MajorGroup, Cause, value = Total_Fatalities)

industry_colors <- c(
  "Educational Services" = "#1f78b4",
  "Heavy And Civil Engineering Construction" = "#33a02c",
  "Justice, Public Order, And Safety Activities" = "#e31a1c",
  "Truck Transportation" = "#ff7f00",
  "Administrative And Support Services" = "#6a3d9a"
)

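# Assign colors: industries use predefined colors, causes = grey
# as.character() guards against indexing by factor codes; unname() drops the lookup names
sankey_data <- sankey_data %>%
  mutate(node_color = ifelse(x == "MajorGroup", 
                            unname(industry_colors[as.character(node)]), 
                            "grey80"))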

ggplot(sankey_data, aes(x = x, next_x = next_x, node = node, next_node = next_node,
                        fill = I(node_color),
                        value = value)) +
  geom_sankey(flow.alpha = 0.5, node.color = "black") +
  geom_sankey_text(aes(label = node), size = 4, color = "black", hjust = 0.5) +
  theme_void()

# Table: gold export value by decade (positive export values only)
gold_exports <- read.csv("D:/Projects/INFO526/Assignment/Assignment4/AEC Precious Metals and Gems.csv") %>%
  filter(str_detect(product_name, "gold")) %>%
  select(year, country_name, export_value, country_continent) %>%
  filter(!is.na(export_value) & export_value > 0)

gold_exports <- gold_exports %>%
  mutate(export_value = export_value / 1e6) 

gold_exports_decade <- gold_exports %>%
  mutate(period = case_when(
    year >= 2020 ~ "2020s-2021",
    TRUE ~ paste0((year %/% 10) * 10, "s")
  )) %>%
  group_by(period) %>%
  summarise(
    total_export = sum(export_value, na.rm = TRUE),
    mean_export = mean(export_value, na.rm = TRUE),
    median_export = median(export_value, na.rm = TRUE),
    .groups = 'drop'
  )

gold_exports_decade %>%
  gt() %>%
  cols_label(
    period = "decade",
    total_export = "Total Export (Million $)",
    mean_export = "Mean Export (Million $)",
    median_export = "Median Export (Million $)"
  ) %>%
  fmt_number(
    columns = c(total_export, mean_export, median_export),
    decimals = 2
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  ) %>%
  tab_style(
    style = cell_fill(color = "skyblue"),
    locations = cells_body(rows = period == "1960s")
  ) %>%
  tab_style(
    style = cell_fill(color = "#f12346"), 
    locations = cells_body(rows = period == "2010s")
  )

Data Visualization Portfolio
