Census tracts are the smallest administrative divisions that provide socio-economic and demographic data. They are highly useful for statistical and spatial analysis, offering a robust and reliable set of information at a fine geographical scale.
census, brazil, sao-paulo, data-science, data-visualization, maps, leaflet, ggplot2
Author: Vinicius Oike
Published: June 14, 2024
Understanding Census Tracts in Brazil
Census tracts are the smallest administrative areas for which socioeconomic and demographic data are available. Broadly speaking, they are small areas that exhibit similar socioeconomic and demographic patterns. In another post, I presented all of the Brazilian administrative and statistical subdivisions.
Census tracts are the strata used by the Brazilian Institute of Geography and Statistics (IBGE) in its decennial Census. The shape of each census tract usually respects administrative borders, land barriers, and public spaces (e.g. parks and beaches), and follows the shape of roads, highways, or city blocks.
Though they are typically small, census tracts vary in size. Large plots of uninhabited land are commonly grouped into a single tract, while in dense urban areas census tracts are very small.
Census tracts exhibit relatively homogeneous socioeconomic and demographic characteristics. This makes census tracts a very useful statistical tool in regression analysis and classification.
The map below shows the 2022 census tracts in Curitiba, a major city in the Southern part of Brazil.
The geobr package (available for both Python and R) provides a convenient way to import census tracts directly into a session. The example code below shows how to import the most recent shapefile for São Paulo’s census tracts [1].

    # Import census tracts for São Paulo (IBGE municipality code 3550308)
    spo_tract <- geobr::read_census_tract(3550308, year = 2022)
Using census tracts
Census tracts come with a large variety of socioeconomic and demographic data. At the time of writing, the 2022 Census offers only a limited set of variables, which includes total population and the total number of households. More data should be released in the future.
The 2010 census tracts offer a much richer set of variables, including demographic information on age, sex, and race, as well as income and education. There is a clear trade-off, however, as this data is almost 15 years old.
Demographic data
The maps below show basic demographic information from the 2022 Census highlighting the central district in Curitiba. For simplicity, I omit the color legend but darker shades of blue represent higher values, while lighter/whiter shades of blue represent lower values. Technically, these are quintile maps, where the underlying numerical data was ordered and binned into five equal-sized groups.
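To make the quintile binning concrete, here is a minimal base R sketch. The population values are made up for illustration and are not taken from the Census; the same idea is what `dplyr::ntile()` does in the decile map further down.

```r
# Hypothetical population counts for ten census tracts (illustrative values only)
pop <- c(120, 340, 560, 610, 700, 820, 910, 1050, 1300, 2100)

# Quintile break points: the 0%, 20%, ..., 100% quantiles of the data
breaks <- quantile(pop, probs = seq(0, 1, by = 0.2))

# Assign each tract to one of five bins (1 = lowest fifth, 5 = highest fifth)
quintile <- cut(pop, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Each bin ends up with the same number of tracts
table(quintile)
```

Because the bins are defined by rank rather than by value, the map colors show each tract's relative position in the distribution, not the size of the gap between tracts.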
More sophisticated analysis can be made using census tracts but, currently, this is only possible using 2010 data. The map below shows household income data at the census tract level for the entire city of Curitiba. Income is higher in the city center and lower in the city’s periphery.
While the absolute values of income are certainly out of date, the overall spatial distribution of the data might still be similar. Since this map uses deciles of income, instead of the actual income value, it can still communicate valuable information. This is, of course, a strong assumption that might be more or less valid in different contexts.
Code
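    library(dplyr)
    library(ggplot2)
    library(sf)

    # `curitiba_income` is assumed to be an sf object, built earlier,
    # with one row per tract and a column `income_pc`.

    # Bin household income per capita into deciles (1 = bottom 10%, 10 = top 10%)
    curitiba_income <- curitiba_income |>
      mutate(decile = ntile(income_pc, 10))

    # Map the deciles; fill and border colors use the same binned scale
    ggplot(curitiba_income) +
      geom_sf(aes(fill = decile, color = decile)) +
      scale_fill_fermenter(
        name = "HH Income per capita (deciles)",
        palette = "Spectral",
        breaks = 1:10,
        labels = c("Bottom 10%", rep("", 8), "Top 10%"),
        na.value = "gray50",
        direction = 1
      ) +
      scale_color_fermenter(
        name = "HH Income per capita (deciles)",
        palette = "Spectral",
        breaks = 1:10,
        labels = c("Bottom 10%", rep("", 8), "Top 10%"),
        na.value = "gray50",
        direction = 1
      ) +
      labs(title = "Curitiba: Household Income per capita (deciles)") +
      ggthemes::theme_map() +
      theme(
        plot.title = element_text(size = 18, hjust = 0.5),
        legend.position = c(0.9, 0.05)
      )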
The number of census tracts in Brazil increased significantly between the 2010 and 2022 censuses. While the total number of cities barely changed, the number of census tracts grew by more than 40%, reaching over 454 thousand.
The table below summarizes the changes over the last three editions of the Census.
| Year | Num. Cities | Num. Census Tracts | Population  |
|------|-------------|--------------------|-------------|
| 2000 | 1,058       | 254,855            | 169,590,693 |
| 2010 | 5,565       | 316,545            | 190,755,799 |
| 2022 | 5,571       | 454,047            | 203,080,756 |
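The growth figures can be checked directly from the numbers in the table. The short sketch below recomputes the percent change in the number of tracts and, as an extra illustration that is not in the table, the average number of residents per tract.

```r
# Figures copied from the table above
census <- data.frame(
  year       = c(2000, 2010, 2022),
  tracts     = c(254855, 316545, 454047),
  population = c(169590693, 190755799, 203080756)
)

# Percent change in the number of tracts relative to the previous Census
census$tract_growth_pct <- c(
  NA,
  diff(census$tracts) / head(census$tracts, -1) * 100
)

# Average number of residents per tract (illustrative derived measure)
census$pop_per_tract <- census$population / census$tracts

print(census)
```

Between 2010 and 2022 the number of tracts grows by roughly 43%, while the average population per tract falls, which is consistent with the finer spatial resolution of the 2022 tracts.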
Higher Resolution
An obvious improvement over the 2010 census tracts is the gain in spatial resolution. The maps below show the central area of Curitiba, where the total number of census tracts increased by 33%.
Important considerations
There are a few important considerations to keep in mind when working with census tracts:
Shape is not time consistent.
Number of census tracts varies.
Area of census tracts can be very different.
Census tracts follow the evolution of cities. As cities grow in size and complexity, the shape and number of their tracts change as well. In the 2010 Census, São Paulo had nearly 19,000 census tracts; in the most recent Census it had 27,600, a 45% increase. Note that the borders of the city remained unchanged during this period.
New census tracts aren’t necessarily subsets of previously existing tracts. This means that comparing census tracts through time is not so straightforward. A way to standardize census tracts is to “dissolve” them into a statistical grid and either (1) use this higher resolution grid directly; or (2) use the grid as an intermediary step to interpolate different census data into a common standard.
Census tracts can vary a lot in size. It’s common practice to group large chunks of uninhabited land (e.g. public parks) into a single huge census tract. This means that creating “intensive” variables such as population density can be tricky in some cases.
While the size of census tracts varies, the underlying demographic data can be much better behaved. The histograms below show the distribution of (1) total population and (2) the average number of persons per household among all census tracts in Curitiba. Population data is roughly bell-shaped, with a right skew; note that the distribution also has a small spike at zero, since there are several unpopulated census tracts. To improve this visualization, I remove outliers from the second plot [2].
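A small numerical sketch shows why density is tricky. The three tracts below are hypothetical (names, populations, and areas are invented for illustration): two small urban tracts and one large park-like tract.

```r
# Hypothetical tracts: two dense urban tracts and one large, mostly empty tract
tracts <- data.frame(
  tract      = c("urban_a", "urban_b", "park"),
  population = c(800, 650, 50),
  area_km2   = c(0.05, 0.04, 2.50)
)

# Density is an intensive variable: a ratio, not a count
tracts$density <- tracts$population / tracts$area_km2
# roughly 16000, 16250 and 20 people per km2

# Averaging the tract densities ignores how different the areas are
mean_of_densities <- mean(tracts$density)

# Density of the combined area must be recomputed from the totals
combined_density <- sum(tracts$population) / sum(tracts$area_km2)

c(mean_of_densities = mean_of_densities, combined_density = combined_density)
```

The two summaries differ by more than an order of magnitude, so any aggregation of tracts should sum the counts and areas first and only then compute the ratio.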
Dealing with spatial inconsistency
Spatial interpolation
A simple strategy to deal with the spatial inconsistency of census tracts is to define a common spatial grid. The usual choices are a grid of (1) triangles, (2) squares, or (3) hexagons. In this example, I choose a simple square grid.
The process of converting data stored in one set of shapes into another set of shapes is called spatial interpolation. When both the source and the target are polygons, as is the case here, it is usually referred to as areal interpolation. Dasymetric interpolation is a closely related approach that uses ancillary data (e.g. land use) to refine the estimates.
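As a rough sketch of how such grids can be built, `sf::st_make_grid()` covers the square and hexagonal cases. The study area below is a hypothetical 2 km x 2 km box in UTM zone 22S (EPSG:32722); the coordinates are illustrative and are not taken from the post's data.

```r
library(sf)

# Hypothetical 2 km x 2 km study area in UTM zone 22S (units are metres).
# Coordinates are only roughly in the Curitiba region and serve as an example.
study_area <- st_as_sfc(
  st_bbox(
    c(xmin = 673000, ymin = 7184000, xmax = 675000, ymax = 7186000),
    crs = st_crs(32722)
  )
)

# Square grid with 500 m cells
square_grid <- st_make_grid(study_area, cellsize = 500, square = TRUE)

# Hexagonal grid over the same area
hex_grid <- st_make_grid(study_area, cellsize = 500, square = FALSE)

length(square_grid)
length(hex_grid)
```

Building the grid in a projected CRS keeps the cell size in metres; with geographic coordinates, `cellsize` would be interpreted in degrees.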
Essentially, the problem we are trying to solve is this: we have some data stored in a (big) shape and we wish to estimate the same data in another (smaller) shape. In this case, we will dissolve population count data, stored in the shapes of the 2010 and 2022 census tracts, into a finer square grid.
To convert the data from one shape to the other, we implicitly assume that the variable (population) is uniformly distributed over the shape’s area. That is, we assume that people are spread evenly across each census tract. This assumption works well in small, densely populated tracts, but doesn’t hold as well in larger tracts.
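The uniformity assumption boils down to simple arithmetic: each grid cell receives a share of the tract's count equal to the share of the tract's area that it covers. The numbers below are hypothetical and only illustrate the calculation for a single tract.

```r
# One hypothetical tract with 1,000 residents and an area of 200,000 m2
tract_pop  <- 1000
tract_area <- 200000

# Area of the tract that falls inside each of three grid cells (in m2)
overlap_area <- c(cell_1 = 100000, cell_2 = 60000, cell_3 = 40000)

# Area weights: share of the tract covered by each cell
weights <- overlap_area / tract_area

# Under the uniform-distribution assumption, counts are split by area share
cell_pop <- tract_pop * weights
cell_pop
# cell_1 = 500, cell_2 = 300, cell_3 = 200

# For an extensive variable (a count), the total is preserved
sum(cell_pop)
```

When a grid cell overlaps several tracts, the contributions from each tract are summed. This is the logic behind the `extensive = TRUE` argument used with `st_interpolate_aw()` later in the post.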
Spatial grid for Census Data
The maps below show the same Census household data on a 500 m x 500 m square grid. Although I omit the color legends, the plots share the same scale, so they are directly comparable; darker shades of green indicate higher values and lighter shades indicate lower values.
The 2010 Census data was imported directly with the censobr R package. For the areal interpolation, the code below loads the areal package, though the interpolation step itself calls st_interpolate_aw(), which comes from sf. The process involves a few intermediary steps, such as choosing a suitable UTM CRS.
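For the UTM step, a common rule of thumb is to derive the zone from the longitude and pick the WGS 84 UTM code for the right hemisphere. The helper below is a hypothetical function written for this illustration (it is not part of sf, censobr, or areal) and it ignores the special zones around Norway and Svalbard.

```r
# Hypothetical helper: EPSG code of the WGS 84 / UTM zone containing a point.
# lon and lat are in decimal degrees.
utm_epsg <- function(lon, lat) {
  zone <- floor((lon + 180) / 6) + 1          # UTM zones are 6 degrees wide
  base <- if (lat >= 0) 32600 else 32700      # 326xx = north, 327xx = south
  base + zone
}

# Approximate coordinates of Curitiba
utm_epsg(lon = -49.27, lat = -25.43)
# 32722, i.e. WGS 84 / UTM zone 22S

# Approximate coordinates of São Paulo
utm_epsg(lon = -46.63, lat = -23.55)
# 32723, i.e. WGS 84 / UTM zone 23S
```

The result for Curitiba matches the `crs = 32722` used in the post's code when building the grid.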
The final result shows the number of households in 2010 and 2022 in a common spatial grid. This process can be replicated across the entire city to allow an easier comparison of the data.
Code
    library(censobr)
    library(areal)
    library(dplyr)
    library(sf)
    library(ggplot2)

    # Assumes `setores_centro` (2022 tracts, households in `dom_prt`) and
    # `setores_centro_antigo` (2010 tracts) were created earlier as sf objects.

    # Get demographic data for the 2010 census
    basico <- censobr::read_tracts(dataset = "Basico")

    # Get number of (private) households for Curitiba
    basico <- basico %>%
      filter(code_muni == 4106902) %>%
      select(code_tract, dom = V001) %>%
      collect()

    # Join with census tract shapes
    setores_centro_antigo <- left_join(setores_centro_antigo, basico, by = "code_tract")

    # Create a 500 m grid: project to UTM 22S, build the grid, return to SIRGAS 2000
    grid_centro <- setores_centro %>%
      st_union() %>%
      st_make_valid() %>%
      st_transform(crs = 32722) %>%
      st_make_grid(cellsize = 500) %>%
      st_as_sf() %>%
      st_transform(crs = 4674) %>%
      mutate(gid = row_number())

    # Area-weighted interpolation of 2010 households onto the grid
    grid_2010 <- setores_centro_antigo %>%
      select(dom) %>%
      st_interpolate_aw(grid_centro, extensive = TRUE)

    # Same for the 2022 census (object names keep the original "2020" suffix)
    grid_2020 <- setores_centro %>%
      select(dom_prt) %>%
      st_interpolate_aw(grid_centro, extensive = TRUE)

    # Recover the grid id of each interpolated cell via its centroid
    d1 <- grid_2010 %>%
      st_centroid() %>%
      st_join(grid_centro) %>%
      st_drop_geometry() %>%
      as_tibble() %>%
      rename(dom_2010 = dom)

    d2 <- grid_2020 %>%
      st_centroid() %>%
      st_join(grid_centro) %>%
      st_drop_geometry() %>%
      as_tibble() %>%
      rename(dom_2020 = dom_prt)

    # Both periods on the common grid
    full_grid <- grid_centro %>%
      left_join(d1, by = "gid") %>%
      left_join(d2, by = "gid")

    # Bounding box of the study area with a 100 m buffer
    bbox <- setores_centro |>
      st_union() %>%
      st_transform(crs = 32722) %>%
      st_buffer(dist = 100) %>%
      st_transform(crs = 4674) %>%
      st_bbox()

    # Percent change in households between the two censuses, binned and mapped
    m3 <- full_grid %>%
      mutate(
        chg = (dom_2020 / dom_2010 - 1) * 100,
        chg_bin = cut(chg, breaks = c(-Inf, 0, 50, 100, Inf))
      ) %>%
      st_make_valid() %>%
      filter(!is.na(chg_bin)) %>%
      ggplot() +
      geom_sf(aes(fill = chg_bin), color = "white") +
      scale_fill_brewer(palette = "Greens") +
      guides(fill = "none") +
      theme_void() +
      theme(plot.title = element_text(hjust = 0.5))
Conclusion
Census tracts are small statistical areas that exhibit similar socioeconomic and demographic patterns. They are useful for statistical and spatial analysis, offering a robust and reliable set of information at a fine geographical scale.
Census tracts change over time, which makes them difficult to compare. A workaround is to dissolve them into a common spatial grid to facilitate comparisons.