A Denver Case Study: Where to open a new coffee shop?

Ata Özarslan
8 min readMay 4, 2024

As a nation, we’re obsessed with coffee. Our entire day seems to revolve around the beverage, with many of us unable to get out of bed in the morning without the warm embrace of a cup of coffee.

  • When we get up in the morning,
  • When we want to have something with us while working
  • When we hang out with friends outside
  • When we want to focus on a task and so on…
Seems Like This :)

If you love coffee as much as we do, you might be curious about the coffee statistics and how this obsession translates into numbers.

According to National Coffee Association survey, the percentage of people who say they drank coffee in the past day has increased by nearly %20 percent from 2004 to 2024.

As we can see in the graph above, this rapid increase in coffee consumption in our daily lives is also affecting the rate at which coffee shops are opening. At this point,

Our study aims to identify the optimal locations for a new coffee shop in Denver to maximize profits.

For this purpose, we will use 3 different datasets.

You can find the data dictionary of these datasets below.

Starbucks Locations in Denver, Colorado

  • “StoreNumber” — Store Number as assigned by Starbucks
  • “Name” — Name identifier for the store
  • “PhoneNumber” — Phone number for the store
  • “Street 1, 2, and 3” — Address for the store
  • “PostalCode” — Zip code of the store
  • “Longitude, Latitude” — Coordinates of the store

Neighborhood’s Geographical Information

  • “NBHD_ID” — Neighborhood ID (matches the census information)
  • “NBHD_NAME” — Name of the statistical neighborhood
  • “Geometry” — Polygon that defines the neighborhood

Demographic Information

  • “NBHD_ID” — Neighborhood ID (matches the geographical information)
  • “NBHD_NAME’ — Neighborhood name
  • “POPULATION_2010' — Population in 2010
  • “AGE_ “ — Number of people in each age bracket (< 18, 18–34, 35–65, and > 65)
  • “NUM_HOUSEHOLDS” — Number of households in the neighborhood
  • “FAMILIES” — Number of families in the neighborhood
  • “NUM_HHLD_100K+” — Number of households with income above 100 thousand USD per year

Let’s start our study.

NOTE: You can find all the data and the Jupyter Notebook file used in this project at this GitHub link.

1. Understanding & Reorganizing Data

Let’s start by getting to know our datasets.

import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

denver = pd.read_csv('data/denver.csv')
neighborhoods = gpd.read_file('data/neighborhoods.shp')
census = pd.read_csv('data/census.csv')

We see 2 different data formats in our datasets, csv and shp.

  • CSV is a general-purpose format for tabular data,
  • SHP is a specialized format for storing spatial data with associated attributes.
denver.csv
neighborhoods.shp
census.csv

In denver.csv file, it contains longitude and latitude information but it is a csv file. Therefore we must convert it to geospatial data. We can do this withshapley.

from shapely.geometry import Point

# Obtaining Longitude and Latitude information to create point data
denver_geometry = [Point(xy) for xy in zip(denver['Longitude'], denver['Latitude'])]

# Adding point data to our GeoDataFrame
denver = gpd.GeoDataFrame(denver, crs='EPSG:4326', geometry=denver_geometry)
denver.drop(columns=['Longitude','Latitude'], inplace=True)
denver

It will look like this.

Denver Data with Geographical Information

If we look more carefully, we can realize that our neighborhoods and census data contains information about the neighborhoods of Denver and both of them contains NBHD_ID column. So, we can merge these data.

neighborhoods = pd.merge(neighborhoods, census, on='NBHD_ID')
neighborhoods
Merged DataFrame

Now, let’s check their coordinate reference system (CRS). When working with geospatial data, it is so important that their coordinate reference systems are correctly defined and the same.

denver.crs

When we check the CRS of our denver GeoDataFrame, we will see look like this.

<Geographic 2D CRS: EPSG:4326>

Name: WGS 84

Axis Info [ellipsoidal]:

— Lat[north]: Geodetic latitude (degree)

— Lon[east]: Geodetic longitude (degree)

Area of Use:

— name: World.

— bounds: (-180.0, -90.0, 180.0, 90.0)

Datum: World Geodetic System 1984 ensemble

— Ellipsoid: WGS 84

— Prime Meridian: Greenwich

It says that our GeoDataFrame is in EPSG:4326 coordinate system and its geographic location is defined as Latitude and Longitude variables.

For more information about the EPSG:4326 coordinate system, you can check this link.

What about neighborhoods data?

neighborhoods.crs

But it can not return any value. It means that our neighborhoods data does not have a CRS. So, we must define it ourselves!

neighborhoods.set_crs('EPSG:4326', inplace=True)

Our project area focused on Denver, therefore we must work with only Starbucks Coffee Shop in Denver. We should check to see if there’s a coffee shop outside Denver in our datasets.

For this process, we will use clip function.

denver = denver.clip(neighborhoods)
denver

If we check our GeoDataFrame after that process, we will see that our data count is down from 78 to 74. It means that there are 4 coffee locations outside of Denver in our GeoDataFrame.

2. Obtaining Results

Finally, we can start to prepare our maps to see our findings. Firstly, let’s try to create a basic map that shows the Starbucks locations in Denver.

denver.plot();

It looks like so meaningfulness. So, let’s add a basemap for a better visual. To do this, we will use contextily.

import contextily as cx

denver_plot = denver.plot(color='red', markersize=10)

plt.title('Starbucks Coffee Shops in Denver', weight='bold')

cx.add_basemap(denver_plot, crs='EPSG:4326', source=cx.providers.OpenStreetMap.Mapnik, alpha=0.5);

It looks like more meaningful than our other map. But, we can make it more beautiful.

fig, ax = plt.subplots(figsize=[12,6])

# Creating neighborhood boundaries
neighborhoods_names = neighborhoods.geometry.boundary.plot(ax=ax, color='black', linewidth=0.5)

# Creating neighborhood areas
neighborhoods_plot = neighborhoods.plot(ax=ax, alpha=0.5)

# Creating coffee locations
denver_plot = denver.plot(ax=ax, color='red', markersize=10)

plt.title('Starbucks Coffee Shop Distribution Map in Denver', weight='bold')

cx.add_basemap(neighborhoods_plot, crs='EPSG:4326', source=cx.providers.OpenStreetMap.Mapnik, alpha=0.5);

Alright! But we can not see clearly how many coffees it has in different neighborhoods in Denver. Let’s try to obtain that information.

# Finding the coffee count in Denver
joined = gpd.sjoin(denver, neighborhoods, how="inner", predicate="within")

# Grouping our findings
coffee_count = pd.DataFrame(joined.groupby('NBHD_ID').size(), columns=['Coffee_Count']).reset_index()
coffee_count.T
Number of Coffees in Different Neighborhoods

And we can merge our findings with our original GeoDataFrame.

neighborhoods = pd.merge(neighborhoods, coffee_count, on='NBHD_ID', how='left')
neighborhoods['Coffee_Count'] = neighborhoods['Coffee_Count'].fillna(0)
neighborhoods

Now, we can enhance our last map with our new feature Coffee_Count.

import contextily as cx

fig, ax = plt.subplots(figsize=[12,6])
neighborhoods_names = neighborhoods.geometry.boundary.plot(ax=ax, color='black', linewidth=0.5)
# Creating neighborhood areas with Coffee Count
neighborhoods_plot = neighborhoods.plot('Coffee_Count', ax=ax, alpha=0.7, legend=True, cmap='Reds')
denver_plot = denver.plot(ax=ax, color='blue', markersize=10)
plt.title('Starbucks Coffee Shop Density Map in Denver', weight='bold')

cx.add_basemap(neighborhoods_plot, crs='EPSG:4326', source=cx.providers.OpenStreetMap.Mapnik, alpha=0.5);

More easily interpreted map!

In addition to these results, we all know that people in 18–34 are more likely to go to a coffee than other age groups.

What about the age distribution?

fig, ax = plt.subplots(figsize=[12,6])
neighborhoods_boundary = neighborhoods.geometry.boundary.plot(ax=ax, color='black', linewidth=0.5)
# Creating neighborhood areas with Age Between 18-34
neighborhoods_age_plot = neighborhoods.plot('AGE_18_TO_34', ax=ax, legend=True, cmap='inferno_r')
plt.title('Number of People Aged 18-34 in Denver', weight='bold')

cx.add_basemap(neighborhoods_age_plot, crs='EPSG:4326', source=cx.providers.OpenStreetMap.Mapnik, alpha=0.6);

To easily see the coffee density across neighborhoods, we can divide the number of coffee shops by the number of people between 18 and 34.

neighborhoods['Coffee_Density'] = neighborhoods['Coffee_Count'] / neighborhoods['AGE_18_TO_34']
neighborhoods

If you look at the Coffee_Density column in our GeoDataFrame, you will see its value from 0 to 0.006974. These results can be hard to interpret. So, we can normalize them in the range 0–1.

def min_max_normalize(data):
min_val = min(data)
max_val = max(data)
normalized_data = [(x - min_val) / (max_val - min_val) for x in data]
return normalized_data

normalized_density = min_max_normalize(neighborhoods['Coffee_Density'])
neighborhoods['Coffee_Density'] = normalized_density
neighborhoods

Now, let’s create another map with our Coffee_Density feature.

fig, ax = plt.subplots(figsize=[12,6])
neighborhoods_boundary = neighborhoods.geometry.boundary.plot(ax=ax, color='black', linewidth=0.5)
# Creating neighborhood areas with Coffee Density
neighborhoods_density_plot = neighborhoods.plot('Coffee_Density', ax=ax, legend=True, cmap='coolwarm')
plt.title('Starbucks Coffee Shop Density Among 18-34 Years Old People in Denver', weight='bold')

cx.add_basemap(neighborhoods_density_plot, crs='EPSG:4326', source=cx.providers.OpenStreetMap.Mapnik, alpha=0.6);

Lastly, let’s put all this information together.

  • We have lots of neighborhoods with no Starbucks Coffee Shops. When you want to open a new coffee shop, it would make the most sense to consider these neighborhoods first.
  • Also, neighborhoods with the highest number of people between the ages of 18 and 34 should be considered as the first choice to appeal to as large a customer base as possible.

Let’s visualize them.

# Filtering the neighborhoods with no Coffee Shops and most people aged 18-34
non_coffee_18_34 = neighborhoods[neighborhoods['Coffee_Density']==0].sort_values('AGE_18_TO_34', ascending=False)

fig, ax = plt.subplots(figsize=[12,6])
non_coffee_18_34_boundary = non_coffee_18_34.head().geometry.boundary.plot(ax=ax, color='black', linewidth=0.5)
# Finding the first 5 neighborhoods matching our filter
non_coffee_18_34_plot = non_coffee_18_34.head().plot(ax=ax, color='red', alpha=0.7)
non_coffee_18_34_centroid = non_coffee_18_34.head().geometry.centroid.plot(ax=ax, markersize=10, color='black')
plt.title('Neighborhoods without a Starbucks Coffee with the Highest Density of 18-34 Year Olds in Denver', weight='bold')

# Writing the names of neighborhoods on the map
for x, y, label in zip(non_coffee_18_34.head().geometry.centroid.x, non_coffee_18_34.head().geometry.centroid.y, non_coffee_18_34.head()['NBHD_NAME_x']):
non_coffee_18_34_boundary.annotate(label, xy=(x, y), xytext=(-20, 5), textcoords="offset points", fontsize=10)

cx.add_basemap(non_coffee_18_34_plot, crs='EPSG:4326', source=cx.providers.OpenStreetMap.Mapnik, alpha=0.6);

Let us do the same analysis for the neighborhoods with the highest total population density.

# Filtering the neighborhoods with no Coffee Shops and the most people living
non_coffee_population = neighborhoods[neighborhoods['Coffee_Density']==0].sort_values('POPULATION_2010', ascending=False)

fig, ax = plt.subplots(figsize=[12,6])
non_coffee_population_boundary = non_coffee_population.head().geometry.boundary.plot(ax=ax, color='black', linewidth=0.5)
# Finding the first 5 neighborhoods matching our filter
non_coffee_population_plot = non_coffee_population.head().plot(ax=ax, color='red', alpha=0.7)
non_coffee_population_centroid = non_coffee_population.head().geometry.centroid.plot(ax=ax, markersize=10, color='black')
plt.title('Neighborhoods without a Starbucks Coffee with the Highest Density of Population in Denver', weight='bold')

for x, y, label in zip(non_coffee_population.head().geometry.centroid.x, non_coffee_population.head().geometry.centroid.y, non_coffee_population.head()['NBHD_NAME_x']):
non_coffee_population_boundary.annotate(label, xy=(x, y), xytext=(-20, 5), textcoords="offset points", fontsize=10)

cx.add_basemap(non_coffee_population_plot, crs='EPSG:4326', source=cx.providers.OpenStreetMap.Mapnik, alpha=0.6);

We are done! I hope you like my article. To reach all of these codes in the project, you can visit my GitHub repository.

Thanks for giving time 😊

--

--