An Empirical Taxonomy of Common Curb Zoning Configurations in Seattle

URL: https://doi.org/10.32866/001c.32446

Publication: Findings

Publication Date: 2022

Summary:

We use an unsupervised learning algorithm called $k$-modes clustering (Huang 1998), which is similar to the better-known $k$-means method (Hartigan and Wong 1979) but uses a dissimilarity measure designed for categorical variables (Cao et al. 2012). The measure was originally developed for analyzing sequential categorical data such as gene sequences (Goodall 1966), but is also amenable to curb zoning types. For a specified $k$, the $k$-modes algorithm finds the top $k$ vectors that minimize a distance to all sample vectors in the training dataset. The resulting top $k$ modes are representative of distinct clusters of sample vectors, with cluster membership determined by the closest mode. The parameter $k$ is chosen through cross-validation by holding out portions of the available training data and finding the smallest $k$ that largely minimizes the within-cluster variation in this hold-out set (also called the “elbow method”). We use basic matching dissimilarity, as implemented by Vos (2015). For two vectors $x$ and $y$ of length $n$, where each element attains categorical values, matching dissimilarity is defined as $d(x, y) = \sum_{i=1}^{n} \mathbb{1}[x_i \neq y_i]$, where $\mathbb{1}[\cdot]$ denotes the indicator, with value 1 where the bracketed condition is true and 0 otherwise. We chose this measure of dissimilarity between two sets of categorical variables for three reasons: 1) its simplicity, 2) its successful use in categorical data clustering (Goodall 1966), and 3) its sensitivity to the ordering of values when vectors $x$ and $y$ are ordered, which is specific to how we have chosen to represent curb zoning data.
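The matching dissimilarity and the mode-based clustering described above can be illustrated with a minimal Python sketch. It is not the implementation cited in the summary: the function names, the single-letter zoning codes ("P", "L", "N", "B"), and the sample block faces are hypothetical, and the clustering loop is a simplified assign-then-update scheme with random initialization, not a tuned algorithm.

```python
from collections import Counter
import random


def matching_dissimilarity(x, y):
    """Count the positions at which two equal-length categorical vectors differ."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(1 for a, b in zip(x, y) if a != b)


def column_modes(rows):
    """Return the most frequent category at each position across rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def assign(data, modes):
    """Label each vector with the index of its closest mode."""
    return [
        min(range(len(modes)), key=lambda j: matching_dissimilarity(x, modes[j]))
        for x in data
    ]


def k_modes(data, k, n_iter=20, seed=0):
    """Simplified k-modes: assign to the nearest mode, then recompute modes."""
    rng = random.Random(seed)
    distinct = list(dict.fromkeys(data))  # distinct vectors, order preserved
    modes = rng.sample(distinct, k)       # assumption: random initial modes
    for _ in range(n_iter):
        labels = assign(data, modes)
        new_modes = []
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            # Keep the previous mode if a cluster ends up empty.
            new_modes.append(column_modes(members) if members else modes[j])
        if new_modes == modes:
            break
        modes = new_modes
    return modes, assign(data, modes)


def total_cost(data, modes, labels):
    """Within-cluster variation: summed dissimilarity to the assigned mode."""
    return sum(matching_dissimilarity(x, modes[lab]) for x, lab in zip(data, labels))


# Hypothetical block faces, each an ordered tuple of curb zoning codes.
block_faces = [
    ("P", "P", "L", "N"),
    ("P", "P", "L", "B"),
    ("P", "P", "P", "N"),
    ("N", "B", "N", "N"),
    ("N", "B", "L", "N"),
    ("N", "B", "N", "P"),
]

# Two of the four positions differ.
print(matching_dissimilarity(("P", "P", "L", "N"), ("P", "L", "L", "B")))  # 2

modes, labels = k_modes(block_faces, k=2)
print(modes, labels, total_cost(block_faces, modes, labels))
```

Because the measure compares vectors position by position, ("P", "L") and ("L", "P") have dissimilarity 2 even though they contain the same categories, which is the ordering sensitivity noted in the summary. Evaluating `total_cost` on held-out data for increasing `k` is one way to look for the elbow described above.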

Authors: Thomas Maxner, Dr. Andisheh Ranjbari, Chase Dowling

Recommended Citation:
Dowling, Chase P., Thomas Maxner, and Andisheh Ranjbari. 2022. “An Empirical Taxonomy of Common Curb Zoning Configurations in Seattle.” Findings, February. https://doi.org/10.32866/001c.32446