Administrative and Statistical Divisions in Brazil

In this post I present the main administrative and statistical subdivisions of the Brazilian territory. All information comes from IBGE, the official statistical bureau. I also show how to easily interpolate data from these shapefiles to custom statistical grids such as Uber’s H3 hexagons.
brazil
maps
data-science
data-visualization
ggplot2
english
Author

Vinicius Oike

Published

March 19, 2024

Overview

Brazilian geographical hierarchy comprises several fundamental concepts. From an administrative standpoint, the smallest delineations are either zip code areas or neighborhoods. While zip codes offer greater precision, their accessibility in terms of shapefiles is limited. Therefore, our focus primarily lies on neighborhoods.

Moving up the hierarchy, we encounter cities (also referred to as municipalities), metropolitan regions, states (including the Distrito Federal), regional divisions, and the entirety of the country. As of 2022, Brazil is composed of 5,568 cities, 77 metropolitan regions plus 1 RIDE, 26 states, and 5 regional divisions.

From a statistical perspective, the Brazilian Institute of Geography and Statistics (IBGE) delineates three other crucial spatial units: census tracts (setores censitários), weighting areas (áreas de ponderação), and the statistical grid. Finally, there are also other sets of cities, that are more specially defined than typical metropolitan regions:

  • Population arrangements (arranjos populacionais): a grouping of two or more municipalities where there is a strong population integration due to commuting for work or study, or due to contiguity between the main urbanized areas.

  • Urban Concentration areas (concentracoes urbanas): isolated cities or population arrangements with over 100,000 inhabitants.

  • Meso Regions and Micro Regions:

  • Intermediate Regions and Imediate Regions:

Additionally, there are other noteworthy spatial delineations, such as the Unidades de Desenvolvimento Humano, introduced by the Institute for Applied Economic Research (IPEA) in a 2013 study, and the Origin-Destination Zones, utilized by Metro to segment the São Paulo metropolitan region.

In summary,

Administrative Divsions

  • Macro Regions

  • States

  • Metropolitan Regions

  • Cities

  • Neighborhood

  • Districts and subdistricts

  • Zip-code area

Statistical Division

  • Weighting Areas

  • Census Tracts

  • Statistical Grid

Administrative Divisions

To illustrate the above shapes consider the city of Curitiba. Curitiba is the capital city of the state of Paraná and the 8th most populous capital in Brazil and also the 8th smallest capital which helps keep the maps smaller. The state of Paraná is also relatively small.

Importing these shapefiles into R is made very easy thanks to the excellent geobr package.

Code
library(geobr)
library(censobr)
library(sidrar)
library(sf)
library(leaflet)
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
library(ggplot2)
library(patchwork)
library(showtext)
library(MetBrewer)
library(h3)
library(areal)

code_cur <- 4106902

# Region borders
region <- read_region(showProgress = FALSE)
# State borders
state <- read_state(showProgress = FALSE)
# Metropolitan regions borders
metro <- read_metro_area(showProgress = FALSE)
# City border
border <- read_municipality(code_cur, simplified = FALSE, showProgress = FALSE)
# Neighborhoods
nb <- read_neighborhood(showProgress = FALSE)

# Weighting areas 
wa <- read_weighting_area(code_cur, showProgress = FALSE, simplified = FALSE)
# Census tracts
ct <- read_census_tract(code_cur, showProgress = FALSE, simplified = FALSE)

Regions

The map below shows where Paraná is in Brazil. Paraná is in the South region. As mentioned, there are 5 regions in Brazil: North, Northeast, Midwest, Southeast, and South.

Code
#> Define a factor variable to signal the South region
region <- region |> 
  mutate(south = factor(if_else(name_region == "Sul", 1L, 0L)))

#> Get the centroid of the city of Curitiba
curitiba <- st_centroid(border)

translate <- c(
  "Norte" = "North",
  "Nordeste" = "Northeast",
  "Centro Oeste" = "Midwest",
  "Sul" = "South",
  "Sudeste" = "Southeast"
  )

region_label <- region %>%
  st_centroid() %>%
  mutate(label = stringr::str_replace_all(name_region, translate))

ggplot() +
  geom_sf(data = region, aes(fill = name_region), color = "white", lwd = 0.15) +
  geom_sf_label(
    data = region_label,
    aes(label = label),
    family = "Lato",
    label.padding = unit(0.15, "lines")
    ) +
  geom_sf(data = curitiba) +
  scale_fill_met_d("Hokusai1") +
  labs(title = "Regions in Brazil") +
  theme_vini

States

There are 26 states (equivalent to provinces in some countries). Distrito Federal is an importatn exception. Although it is equivalent to a state, it doesn’t contain cities but administrative regions (regiões administrativas). The capital of Brazil, Brasília, is located within the Distrito Federal.

Code
state <- state |> 
  mutate(parana = factor(if_else(code_state == 41, 1L, 0L)))

parana_label <- state |> 
  filter(code_state == 41) |> 
  st_centroid()

ggplot() +
  geom_sf(
    data = state,
    aes(fill = parana),
    color = "white"
    ) +
  geom_sf(data = curitiba) +
  geom_sf_label(
    data = parana_label,
    aes(label = name_state),
    family = "Lato",
    nudge_x = -1,
    size = 3
    ) +
  scale_fill_manual(values = met.brewer("Hokusai1", 3)[c(2, 3)]) +
  guides(fill = "none") +
  theme_vini

Metropolitan Regions

As shown above, there exist several groupings of cities in Brazil, each with a distinct goal. To keep this post simple, I will focus only on metropolitan regions. As of 2018, the metropolitan region of Curitiba contained 29 cities.

Code
ggplot() +
  geom_sf(
    data = cities_metro,
    aes(fill = curitiba),
    alpha = 0.85,
    color = "white") +
  guides(fill = "none") +
  scale_fill_manual(values = met.brewer("Hokusai1", 3)[c(2, 3)]) +
  # scale_fill_manual(values = pal[c(2, 4)]) +
  labs(title = "Curitiba Metro Region") +
  theme_vini

The second map below highlights the name of each city.

Code
cities_metro <- cities_metro %>%
  mutate(
    name_label = stringr::str_wrap(name_muni, width = 8)
  )

ggplot() +
  geom_sf(data = cities_metro, aes(fill = name_muni)) +
  geom_sf_label(
    data = cities_metro,
    aes(label = name_label),
    size = 4) +
  scale_fill_manual(values = MetBrewer::met.brewer("Hokusai1", nrow(cities_metro))) +
  theme_vini

Cities

The map below shows only the city of Curitiba.

Code
ggplot() +
  geom_sf(data = border) +
  labs(title = "The City of Curitiba") +
  theme_vini

Neighborhoods

Finally, there is an official definition of neighborhoods provided by IBGE for only some cities. Most notably, the city of São Paulo does not have an official definition of its neighborhoods. There are also districts and subdistricts but we ignore both of these for the purpose of this exposition.

There are 78 neighborhoods in Curitiba. The map below highlights all of them.

Code
nb <- nb |> 
  filter(code_muni == code_cur)

ggplot() +
  geom_sf(data = nb, lwd = 0.15, aes(fill = name_neighborhood), color = "white") +
  scale_fill_manual(values = met.brewer("Hokusai1", nrow(nb))) +
  labs(title = "Neighborhoods of Curitiba") +
  theme_vini

Statistical Subdivisions

Code
cur_sg <- qs::qread(
  here::here("static/data/statistical_grid_cur.qs")
)

Statistical grid

The smallest statistical subdivision available is the statistical grid. This is a varying size squared grid that offers a 200 x 200 m resolution in urban areas. It contains minimal information and is most useful as an intermediate shape in dasymetric interpolations. More specifically, it contains a population and household counts. This grid splits Curitiba into 11259 equally sized quadrants.

The map below shows the population counts at the statistical grid level. Since the area of all quadrants is the same, this data can be interpreted as the population density at the 200 x 200 m level.

Code
ggplot(cur_sg) +
  geom_sf(aes(fill = sqrt(POP), color = sqrt(POP))) +
  scale_fill_distiller(palette = "BuPu", direction = 1) +
  scale_color_distiller(palette = "BuPu", direction = 1) +
  labs(title = "Population count") +
  theme_vini

Census Tracts

Census tracts are the strata used by IBGE in their decennial Census. The shape of each census tract usually respects administrative borders, land barriers, public spaces (parks, beaches, etc.), and follows the shape of the city blocks. Census tracts also exhibit relatively homogeneous socioeconomic and demographic characteristics. This makes census tracts a very useful statistical tool in regression analysis and classification.

The map below shows the 2395 census tracts in Curitiba.

Code
ggplot(ct) +
  geom_sf(lwd = 0.15) +
  labs(title = "Census Tracts") +
  theme_vini

It is important to note that these shapes are not temporally consistent, meaning that they changed from 1991 to 2010 and then again from 2010 to 2020. Academics have devised methods to make the census tracts compatible over time and IBGE has also announced that they will facilitate backwards compatibility.

All census tracts contain basic information on population and households (age, race, sex, family configuration, household type, income group) and some information of the infrastructure of the region (garbage, sewage, trees, etc.). Their shape tries to create areas that share similar socioeconomic and demographic characteristics.

Weighting Areas

The weighting areas are an aggregation of census tracts for which there are much more detailed information. For instance, one can estimate the number of 3-bedroom apartments that are rented in a specific weighting area. Information on income, education, and housing quality are also available.

There are 55 weighting areas in Curitiba.

Code
ggplot() +
  geom_sf(data = wa) +
  labs(title = "Weighting Areas") +
  theme_vini

To illustrate how one can aggregate census tracts into weighting areas consider the map below that overlaps both. For this map I only take a subset of weighting areas near the city’s CBD.

Code
#> Create a bounding box around the city's Central Business District (CBD)
cbd_bbox <- st_bbox(
  c(ymin = -25.442283, ymax = -25.421747, xmin = -49.285610, xmax = -49.256937),
  crs = 4326)
#> Create an identifier to filter after joins
cbd_bbox <- st_as_sfc(cbd_bbox)
cbd_bbox <- st_as_sf(cbd_bbox)
cbd_bbox$gid <- 1L

center <- st_coordinates(st_centroid(cbd_bbox))

cbd_wa <- wa |> 
  st_transform(crs = 4326) |> 
  st_join(cbd_bbox) |> 
  filter(!is.na(gid))

ct_inside <- ct %>%
  st_centroid() %>%
  st_transform(crs = 4326) %>%
  st_join(cbd_wa) %>%
  filter(!is.na(code_weighting)) |> 
  pull(code_tract) |> 
  unique()

cbd_ct <- filter(ct, code_tract %in% ct_inside)

m1 <- ggplot() +
  geom_sf(data = cbd_wa, fill = NA, color = "#35978f", lwd = 1) +
  labs(title = "Weighting Areas") +
  theme_vini

m2 <- ggplot() +
  geom_sf(data = cbd_ct, fill = NA, color = "#bf812d") +
  labs(title = "Census Tracts") +
  theme_vini

m3 <- ggplot() +
  geom_sf(data = cbd_wa, fill = NA, color = "#35978f", lwd = 1) +
  geom_sf(data = cbd_ct, fill = NA, color = "#bf812d") +
  labs(title = "Overlap") +
  theme_vini

library(patchwork)

m1 | m2 | m3

Other statistical subdivisions

There are many possible ways to bin the world into equal geometric shapes. Hexagons, squares, and triangles are the most commonly used. Below I show Uber’s H3 hexagons and a native st_make_grid() together with the official statistical grid shown previously.

Code
#> Join statistical grid with CBD
cbd_sg <- cur_sg |> 
  st_transform(crs = 4326) |> 
  st_join(cbd_bbox) |> 
  filter(!is.na(gid))

hex_id <- polyfill(cbd_bbox, res = 9)
h3_grid <- h3_to_geo_boundary_sf(hex_id)

grid_500 <- cbd_bbox |> 
  st_transform(crs = 32722) |> 
  st_make_grid(500, square = FALSE, flat_topped = FALSE) |> 
  st_as_sf() |> 
  st_transform(crs = 4326)

m1 <- ggplot(cbd_sg) +
  geom_sf() +
  ggtitle("Statistical Grid") +
  theme_vini

m2 <- ggplot(h3_grid) +
  geom_sf() +
  ggtitle("Uber H3 (res. 9)") +
  theme_vini

m3 <- ggplot(grid_500) +
  geom_sf() +
  ggtitle("st_make_grid()") +
  theme_vini

m1 | m2 | m3

The interactive map below illustrates the size of each hexagon in relation to the city. For simplicity I show only hexagons near the city center.

Code
leaflet() %>%
  addTiles() %>%
  addPolygons(data = h3_grid, weight = 1) %>%
  addProviderTiles(providers$CartoDB)

Highlight Information

Shapefiles by themselves don’t convey much information. We can combine these shapefiles with other geographic data to make informative chroplethic maps. The main advantage of working with official administrative or statistical shapefiles is that information is readily available.

Neighborhoods by population density

We will make a simple map that shows the population and population density of each neighborhood in Curitiba. The first step is to get the population data from SIDRA using the sidrar package.

Code
# Import population table from SIDRA
pop_nb <- sidrar::get_sidra(
  x = 1378,
  variable = 93,
  classific = "c2",
  geo = "Neighborhood",
  geo.filter = list("City" = code_cur)
)

# Clean table and make it wide
pop_nb <- pop_nb |> 
  as_tibble() |> 
  janitor::clean_names() |> 
  select(code_neighborhood = bairro_codigo, sex = sexo, count = valor) |> 
  mutate(
    sex = str_replace(sex, "Homens", "Male"),
    sex = str_replace(sex, "Mulheres", "Female")
    ) |> 
  pivot_wider(
    id_cols = "code_neighborhood",
    names_from = "sex",
    values_from = "count"
    )

# Make neighborhood codes compatible
pop_nb <- pop_nb |> 
  mutate(code_neighborhood = str_c(
    str_sub(code_neighborhood, 1, 7), "05", str_sub(code_neighborhood, 8, 10))
    )
# Join census table with shapefile
cur_nb <- left_join(nb, pop_nb, by = "code_neighborhood")

# Calculate population density
cur_nb <- cur_nb %>%
  st_transform(crs = 32722) %>%
  mutate(
    area = st_area(.),
    area = as.numeric(area) / 1e5,
    pop_dens = Total / area,
    pop_ntile = ntile(pop_dens, 5)
    )
# Convert back to 4326 for leaflet
cur_nb <- st_transform(cur_nb, crs = 4326)

After cleaning and merging the datasets we can make an interactive map using leaflet.

Code
# Color palette and bins
bins <- quantile(cur_nb$pop_dens, probs = seq(0.2, 0.8, 0.2))
bins <- c(0, bins, max(cur_nb$pop_dens))
pal <- colorBin(
  palette = as.character(met.brewer("Hokusai2", 5)),
  domain = cur_nb$pop_dens,
  bins = bins)

# Labels
labels <- sprintf(
  "<strong>%s</strong><br/> %s people <br/> %g people / ha<sup>2</sup>",
  cur_nb$name_neighborhood,
  format(cur_nb$Total, big.mark = "."),
  round(cur_nb$pop_dens, 1)
  )

labels <- lapply(labels, htmltools::HTML)

# Center of the map for zoom
center <- st_coordinates(st_centroid(border))

leaflet(cur_nb) %>%
  addTiles() %>%
  addPolygons(
    weight = 2,
    color = "white",
    fillColor = ~pal(pop_dens),
    fillOpacity = 0.8,
    highlightOptions = highlightOptions(
      color = "gray20",
      weight = 10,
      fillOpacity = 0.8,
      bringToFront = TRUE),
    label = labels,
    labelOptions = labelOptions(
      style = list("font-weight" = "normal", "font-family" = "Fira Code")
    )
  ) %>%
  addLegend(
    pal = pal,
    values = ~pop_dens,
    title = "Densidade Pop.",
    position = "bottomright"
  ) %>%
  addProviderTiles(providers$CartoDB) %>%
  setView(lng = center[1], lat = center[2], zoom = 11)

Interpolation

When using different shapefiles it’s necessary to interpolate data from the “source” shapefile to the desired “target” shapefile. For instance, we might have the total number of apartments by census tract (source) but desire to have the same number by H3 hexagons (target). In this example, the number of apartments would be our target variable that should be interpolated from one shape to another.

For simplicity sake, I show how to make a simple areal interpolation from Curitiba’s census tracts to a hexagonal H3 grid.

Import data

I import census tract level socioeconomic data using censobr. I retrieve the total number of houses, apartments, and some information on housing ownership (i.e. rented, owned, etc.).

Code
library(censobr)

hh <- read_tracts(dataset = "Domicilio", showProgress = FALSE)

cur_domicilios <- hh |> 
  filter(code_muni == code_cur) |> 
  select(code_tract, domicilio01_V002:domicilio01_V011) |> 
  collect()

col_names <- c(
  "hh_total", "hh_house", "hh_cndm", "hh_apt",
  "hh_owned1", "hh_owned2", "hh_rented", "hh_cedido1", "hh_cedido2", "hh_other"
)

names(cur_domicilios)[-1] <- col_names

cur_census <- left_join(ct, cur_domicilios, by = "code_tract")

The map below shows the total number of apartments in Curitiba by census tract.

Code
ggplot(cur_census) +
  geom_sf(aes(fill = sqrt(hh_apt), color = sqrt(hh_apt))) +
  theme_vini

Interpolation

To interpolate we use the areal package. It requires the shapefile to be inputed in a planar projection.

Code
index <- h3::polyfill(border, res = 9)
h3_grid <- h3::h3_to_geo_boundary_sf(index)

h3_grid <- st_transform(h3_grid, crs = 31984)
cur_census <- st_transform(cur_census, crs = 31984)

#> Compute areal interpolation
interp <- aw_interpolate(
  h3_grid,
  tid = h3_index,
  source = cur_census,
  sid = "code_tract",
  weight = "sum",
  extensive = "hh_apt"
)

interp <- st_transform(interp, crs = 4326)

Results

Finally, the panel below shows the result of the interpolation.

Code
m1 <- ggplot(cur_census) +
  geom_sf(aes(fill = sqrt(hh_apt), color = sqrt(hh_apt))) +
  ggtitle("Census Tract") +
  scale_fill_distiller(
    name = "Apartaments (total)",
    palette = "BuPu",
    breaks = c(5, 10, 15, 20, 25),
    labels = c(5, 10, 15, 20, 25)^2,
    direction = 1) +
  scale_color_distiller(
    name = "Apartaments (total)",
    palette = "BuPu",
    breaks = c(5, 10, 15, 20, 25),
    labels = c(5, 10, 15, 20, 25)^2,
    direction = 1) +
  ggthemes::theme_map(base_family = "Lato")

m2 <- ggplot(h3_grid) +
  ggtitle("H3 (res. 9)") +
  geom_sf(lwd = 0.05) +
  theme_vini

m3 <- ggplot(interp) +
  geom_sf(aes(fill = sqrt(hh_apt), color = sqrt(hh_apt))) +
  scale_fill_distiller(
    name = "Apartaments (total)",
    palette = "BuPu",
    breaks = c(5, 10, 15, 20, 25),
    labels = c(5, 10, 15, 20, 25)^2,
    direction = 1) +
  scale_color_distiller(
    name = "Apartaments (total)",
    palette = "BuPu",
    breaks = c(5, 10, 15, 20, 25),
    labels = c(5, 10, 15, 20, 25)^2,
    direction = 1) +
  ggtitle("Interpolated") +
  ggthemes::theme_map(base_family = "Lato")

panel <- m1 | m2 | m3

panel + plot_layout(guides = "collect") &
  theme(
    legend.position = "bottom",
    legend.justification = 0.5,
    legend.key.size = unit(0.75, "cm"),
    legend.key.width = unit(1, "cm"),
    legend.text = element_text(size = 12),
    text = element_text(family = "Lato")
    )