A Denver Case Study: Where to open a new coffee shop?
As a nation, we’re obsessed with coffee. Our entire day seems to revolve around the beverage, with many of us unable to get out of bed in the morning without the warm embrace of a cup of coffee.
- When we get up in the morning,
- When we want to have something with us while working
- When we hang out with friends outside
- When we want to focus on a task and so on…
If you love coffee as much as we do, you might be curious about the coffee statistics and how this obsession translates into numbers.
According to National Coffee Association survey, the percentage of people who say they drank coffee in the past day has increased by nearly %20 percent from 2004 to 2024.
As we can see in the graph above, this rapid increase in coffee consumption in our daily lives is also affecting the rate at which coffee shops are opening. At this point,
Our study aims to identify the optimal locations for a new coffee shop in Denver to maximize profits.
For this purpose, we will use 3 different datasets.
- Starbucks locations were scrapped from the Starbucks store locator webpage by Chris Meller.
- Statistical Neighborhood information from the City of Denver Open Data Catalog, CC BY 3.0 license.
- Census information from the United States Census Bureau. Publicly available information.
You can find the data dictionary of these datasets below.
Starbucks Locations in Denver, Colorado
- “StoreNumber” — Store Number as assigned by Starbucks
- “Name” — Name identifier for the store
- “PhoneNumber” — Phone number for the store
- “Street 1, 2, and 3” — Address for the store
- “PostalCode” — Zip code of the store
- “Longitude, Latitude” — Coordinates of the store
Neighborhood’s Geographical Information
- “NBHD_ID” — Neighborhood ID (matches the census information)
- “NBHD_NAME” — Name of the statistical neighborhood
- “Geometry” — Polygon that defines the neighborhood
Demographic Information
- “NBHD_ID” — Neighborhood ID (matches the geographical information)
- “NBHD_NAME’ — Neighborhood name
- “POPULATION_2010' — Population in 2010
- “AGE_ “ — Number of people in each age bracket (< 18, 18–34, 35–65, and > 65)
- “NUM_HOUSEHOLDS” — Number of households in the neighborhood
- “FAMILIES” — Number of families in the neighborhood
- “NUM_HHLD_100K+” — Number of households with income above 100 thousand USD per year
Let’s start our study.
NOTE: You can find all the data and the Jupyter Notebook file used in this project at this GitHub link.
1. Understanding & Reorganizing Data
Let’s start by getting to know our datasets.
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
denver = pd.read_csv('data/denver.csv')
neighborhoods = gpd.read_file('data/neighborhoods.shp')
census = pd.read_csv('data/census.csv')
We see 2 different data formats in our datasets, csv
and shp
.
- CSV is a general-purpose format for tabular data,
- SHP is a specialized format for storing spatial data with associated attributes.
In denver.csv
file, it contains longitude
and latitude
information but it is a csv file. Therefore we must convert it to geospatial data. We can do this withshapley
.
from shapely.geometry import Point
# Obtaining Longitude and Latitude information to create point data
denver_geometry = [Point(xy) for xy in zip(denver['Longitude'], denver['Latitude'])]
# Adding point data to our GeoDataFrame
denver = gpd.GeoDataFrame(denver, crs='EPSG:4326', geometry=denver_geometry)
denver.drop(columns=['Longitude','Latitude'], inplace=True)
denver
It will look like this.
If we look more carefully, we can realize that our neighborhoods
and census
data contains information about the neighborhoods of Denver and both of them contains NBHD_ID
column. So, we can merge these data.
neighborhoods = pd.merge(neighborhoods, census, on='NBHD_ID')
neighborhoods
Now, let’s check their coordinate reference system (CRS). When working with geospatial data, it is so important that their coordinate reference systems are correctly defined and the same.
denver.crs
When we check the CRS of our denver
GeoDataFrame, we will see look like this.
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
— Lat[north]: Geodetic latitude (degree)
— Lon[east]: Geodetic longitude (degree)
Area of Use:
— name: World.
— bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
— Ellipsoid: WGS 84
— Prime Meridian: Greenwich
It says that our GeoDataFrame is in EPSG:4326
coordinate system and its geographic location is defined as Latitude
and Longitude
variables.
For more information about the EPSG:4326 coordinate system, you can check this link.
What about neighborhoods
data?
neighborhoods.crs
But it can not return any value. It means that our neighborhoods
data does not have a CRS. So, we must define it ourselves!
neighborhoods.set_crs('EPSG:4326', inplace=True)
Our project area focused on Denver, therefore we must work with only Starbucks Coffee Shop in Denver. We should check to see if there’s a coffee shop outside Denver in our datasets.
For this process, we will use clip
function.
denver = denver.clip(neighborhoods)
denver
If we check our GeoDataFrame after that process, we will see that our data count is down from 78 to 74. It means that there are 4 coffee locations outside of Denver in our GeoDataFrame.
2. Obtaining Results
Finally, we can start to prepare our maps to see our findings. Firstly, let’s try to create a basic map that shows the Starbucks locations in Denver.
denver.plot();
It looks like so meaningfulness. So, let’s add a basemap for a better visual. To do this, we will use contextily
.
import contextily as cx
denver_plot = denver.plot(color='red', markersize=10)
plt.title('Starbucks Coffee Shops in Denver', weight='bold')
cx.add_basemap(denver_plot, crs='EPSG:4326', source=cx.providers.OpenStreetMap.Mapnik, alpha=0.5);
It looks like more meaningful than our other map. But, we can make it more beautiful.
fig, ax = plt.subplots(figsize=[12,6])
# Creating neighborhood boundaries
neighborhoods_names = neighborhoods.geometry.boundary.plot(ax=ax, color='black', linewidth=0.5)
# Creating neighborhood areas
neighborhoods_plot = neighborhoods.plot(ax=ax, alpha=0.5)
# Creating coffee locations
denver_plot = denver.plot(ax=ax, color='red', markersize=10)
plt.title('Starbucks Coffee Shop Distribution Map in Denver', weight='bold')
cx.add_basemap(neighborhoods_plot, crs='EPSG:4326', source=cx.providers.OpenStreetMap.Mapnik, alpha=0.5);
Alright! But we can not see clearly how many coffees it has in different neighborhoods in Denver. Let’s try to obtain that information.
# Finding the coffee count in Denver
joined = gpd.sjoin(denver, neighborhoods, how="inner", predicate="within")
# Grouping our findings
coffee_count = pd.DataFrame(joined.groupby('NBHD_ID').size(), columns=['Coffee_Count']).reset_index()
coffee_count.T
And we can merge our findings with our original GeoDataFrame.
neighborhoods = pd.merge(neighborhoods, coffee_count, on='NBHD_ID', how='left')
neighborhoods['Coffee_Count'] = neighborhoods['Coffee_Count'].fillna(0)
neighborhoods
Now, we can enhance our last map with our new feature Coffee_Count
.
import contextily as cx
fig, ax = plt.subplots(figsize=[12,6])
neighborhoods_names = neighborhoods.geometry.boundary.plot(ax=ax, color='black', linewidth=0.5)
# Creating neighborhood areas with Coffee Count
neighborhoods_plot = neighborhoods.plot('Coffee_Count', ax=ax, alpha=0.7, legend=True, cmap='Reds')
denver_plot = denver.plot(ax=ax, color='blue', markersize=10)
plt.title('Starbucks Coffee Shop Density Map in Denver', weight='bold')
cx.add_basemap(neighborhoods_plot, crs='EPSG:4326', source=cx.providers.OpenStreetMap.Mapnik, alpha=0.5);
More easily interpreted map!
In addition to these results, we all know that people in 18–34 are more likely to go to a coffee than other age groups.
What about the age distribution?
fig, ax = plt.subplots(figsize=[12,6])
neighborhoods_boundary = neighborhoods.geometry.boundary.plot(ax=ax, color='black', linewidth=0.5)
# Creating neighborhood areas with Age Between 18-34
neighborhoods_age_plot = neighborhoods.plot('AGE_18_TO_34', ax=ax, legend=True, cmap='inferno_r')
plt.title('Number of People Aged 18-34 in Denver', weight='bold')
cx.add_basemap(neighborhoods_age_plot, crs='EPSG:4326', source=cx.providers.OpenStreetMap.Mapnik, alpha=0.6);
To easily see the coffee density across neighborhoods, we can divide the number of coffee shops by the number of people between 18 and 34.
neighborhoods['Coffee_Density'] = neighborhoods['Coffee_Count'] / neighborhoods['AGE_18_TO_34']
neighborhoods
If you look at the Coffee_Density
column in our GeoDataFrame, you will see its value from 0 to 0.006974. These results can be hard to interpret. So, we can normalize them in the range 0–1.
def min_max_normalize(data):
min_val = min(data)
max_val = max(data)
normalized_data = [(x - min_val) / (max_val - min_val) for x in data]
return normalized_data
normalized_density = min_max_normalize(neighborhoods['Coffee_Density'])
neighborhoods['Coffee_Density'] = normalized_density
neighborhoods
Now, let’s create another map with our Coffee_Density
feature.
fig, ax = plt.subplots(figsize=[12,6])
neighborhoods_boundary = neighborhoods.geometry.boundary.plot(ax=ax, color='black', linewidth=0.5)
# Creating neighborhood areas with Coffee Density
neighborhoods_density_plot = neighborhoods.plot('Coffee_Density', ax=ax, legend=True, cmap='coolwarm')
plt.title('Starbucks Coffee Shop Density Among 18-34 Years Old People in Denver', weight='bold')
cx.add_basemap(neighborhoods_density_plot, crs='EPSG:4326', source=cx.providers.OpenStreetMap.Mapnik, alpha=0.6);
Lastly, let’s put all this information together.
- We have lots of neighborhoods with no Starbucks Coffee Shops. When you want to open a new coffee shop, it would make the most sense to consider these neighborhoods first.
- Also, neighborhoods with the highest number of people between the ages of 18 and 34 should be considered as the first choice to appeal to as large a customer base as possible.
Let’s visualize them.
# Filtering the neighborhoods with no Coffee Shops and most people aged 18-34
non_coffee_18_34 = neighborhoods[neighborhoods['Coffee_Density']==0].sort_values('AGE_18_TO_34', ascending=False)
fig, ax = plt.subplots(figsize=[12,6])
non_coffee_18_34_boundary = non_coffee_18_34.head().geometry.boundary.plot(ax=ax, color='black', linewidth=0.5)
# Finding the first 5 neighborhoods matching our filter
non_coffee_18_34_plot = non_coffee_18_34.head().plot(ax=ax, color='red', alpha=0.7)
non_coffee_18_34_centroid = non_coffee_18_34.head().geometry.centroid.plot(ax=ax, markersize=10, color='black')
plt.title('Neighborhoods without a Starbucks Coffee with the Highest Density of 18-34 Year Olds in Denver', weight='bold')
# Writing the names of neighborhoods on the map
for x, y, label in zip(non_coffee_18_34.head().geometry.centroid.x, non_coffee_18_34.head().geometry.centroid.y, non_coffee_18_34.head()['NBHD_NAME_x']):
non_coffee_18_34_boundary.annotate(label, xy=(x, y), xytext=(-20, 5), textcoords="offset points", fontsize=10)
cx.add_basemap(non_coffee_18_34_plot, crs='EPSG:4326', source=cx.providers.OpenStreetMap.Mapnik, alpha=0.6);
Let us do the same analysis for the neighborhoods with the highest total population density.
# Filtering the neighborhoods with no Coffee Shops and the most people living
non_coffee_population = neighborhoods[neighborhoods['Coffee_Density']==0].sort_values('POPULATION_2010', ascending=False)
fig, ax = plt.subplots(figsize=[12,6])
non_coffee_population_boundary = non_coffee_population.head().geometry.boundary.plot(ax=ax, color='black', linewidth=0.5)
# Finding the first 5 neighborhoods matching our filter
non_coffee_population_plot = non_coffee_population.head().plot(ax=ax, color='red', alpha=0.7)
non_coffee_population_centroid = non_coffee_population.head().geometry.centroid.plot(ax=ax, markersize=10, color='black')
plt.title('Neighborhoods without a Starbucks Coffee with the Highest Density of Population in Denver', weight='bold')
for x, y, label in zip(non_coffee_population.head().geometry.centroid.x, non_coffee_population.head().geometry.centroid.y, non_coffee_population.head()['NBHD_NAME_x']):
non_coffee_population_boundary.annotate(label, xy=(x, y), xytext=(-20, 5), textcoords="offset points", fontsize=10)
cx.add_basemap(non_coffee_population_plot, crs='EPSG:4326', source=cx.providers.OpenStreetMap.Mapnik, alpha=0.6);
We are done! I hope you like my article. To reach all of these codes in the project, you can visit my GitHub repository.
Thanks for giving time 😊