Predicting prices in the Sneaker Aftermarket
By: Derek Aborde
Introduction
In this notebook we will look at data obtained from Stockx, an online marketplace for shoes. We will focus on transaction data for Adidas Yeezy sneakers, all to hopefully gain some insight into the sneaker aftermarket.
The Sneaker Aftermarket
What is the Sneaker Aftermarket?
Well, let's take a look at the primary market or the retail market first. The primary market is where sneakers are first released for retail to the general public, usually through major retailers like Nike stores, Adidas stores, Footlocker, and many more.
For popular sneakers, though, you can expect a very limited supply that usually sells out almost immediately. This leaves many shoppers wanting sneakers that are sold out and no longer available in the primary market.
Enter the aftermarket or the secondary market, which is where these shoes are sold once again, except now at a premium. Here, individual sellers who were able to purchase highly coveted sneakers from the primary market can list their sneakers for sale, usually at prices much higher than retail, and resell them to buyers.
Sites like Stockx and FlightClub are major platforms of the sneaker aftermarket.
Motivation
Stockx prides itself on being the 'Stock Market of Things', which got me thinking. We've seen data science applied to stock market analysis and prediction, so could the same be done for shoes? Stockx keeps a lot of transaction data, and I want to see whether it can be used to predict prices on the aftermarket.
Yeezy
The shoes we will focus on in this notebook are Yeezys. A collaboration between Adidas and Kanye West, this line has grown to be one of the most coveted in sneakers, which has made it very popular on the aftermarket, where some models sell for upwards of 200-300% above retail.
Goal
Train a model to predict the price that a Yeezy sneaker will sell for on the aftermarket.
1. Data Collection
The code below contains everything needed to get transaction data for every men's Yeezy model. Feel free to modify how much data you want by changing some of the parameters in 'params'. Let's skip this part and take a look at the CSV file that the code below produces.
import requests
from bs4 import BeautifulSoup
import json
import time
import pandas as pd

headers = {'user-agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'}
params = {'state':'480','currency':'USD','limit':'10000','page':'1','order':'DESC','country':'US'}

# Gets some quantitative and qualitative data about every men's Yeezy shoe
identifying_data = pd.DataFrame()
for i in range(1, 5):
    url = 'https://stockx.com/api/browse?_tags=yeezy,adidas&productCategory=sneakers&gender=men&page={}'.format(i)
    try:
        r = requests.get(url, headers=headers)
    except requests.exceptions.RequestException as e:
        raise SystemExit(e)
    soup = BeautifulSoup(r.text, 'html.parser')
    data = json.loads(str(soup))
    identifying_data = identifying_data.append(pd.json_normalize(data['Products']))
    # sleep so you don't get rate-limited
    time.sleep(5)
identifying_data = identifying_data.dropna(subset=['market.deadstockSold'])
# Function that gets a good number to slice with
def get_split(x):
    y = x / 2500
    if y <= 1:
        return 1
    else:
        return int(y)
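# For example, a model with 10,000 recorded sales gives get_split(10000) == 4,
# so the [::x] slice below would keep roughly every 4th transaction.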
# Gets actual transaction data, most importantly, how much and when a yeezy model sold for
df = pd.DataFrame()
count = 0
for i in identifying_data['urlKey']:
    url = 'https://stockx.com/api/products/{}/activity'.format(i)
    try:
        r = requests.get(url, params=params, headers=headers)
    except requests.exceptions.RequestException as e:
        raise SystemExit(e)
    soup = BeautifulSoup(r.text, 'html.parser')
    data = json.loads(soup.string)
    x = get_split(identifying_data.iloc[count, 62])
    df1 = pd.json_normalize(data['ProductActivity'][::x])
    df1['Color'] = identifying_data.iloc[count, 7]
    df1['Release Date'] = identifying_data.iloc[count, 19]
    df1['Retail'] = identifying_data.iloc[count, 22]
    df1['Model'] = identifying_data.iloc[count, 23]
    df1['Name'] = identifying_data.iloc[count, 27]
    df1['Annual High'] = identifying_data.iloc[count, 57]
    df1['Annual Low'] = identifying_data.iloc[count, 58]
    df1['Volatility'] = identifying_data.iloc[count, 61]
    df1['Total Sold'] = identifying_data.iloc[count, 62]
    df1['Total Dollars'] = identifying_data.iloc[count, 71]
    df = df.append(df1)
    count += 1
    # sleep so you don't get rate-limited
    time.sleep(5)

# Save the collected transaction data to the CSV file read in below
df.to_csv('stockx_yeezy_data.csv')
Let's first read the CSV file into a pandas dataframe and take a look at our dataset.
# Reading in the data
import pandas as pd
df = pd.read_csv('stockx_yeezy_data.csv')
df
From our dataset, you can see we have 491,235 rows and 21 columns.
That means we have data on 491,235 different Yeezy sales.
Each transaction has 21 columns, but we will only focus on the ones used later in this notebook: the sale price (localAmount), the sale date (createdAt), Color, Release Date, Retail, Model, Name, Annual High, Annual Low, Volatility, Total Sold, and Total Dollars.
2. Data Cleaning
Taking a quick look at the data, you may notice that some cleaning needs to be done.
# Gets rid of the columns that we do not want
df = df.drop(columns=['Unnamed: 0','chainId', 'amount', 'productId', 'skuUuid', 'state', 'customerId', 'localCurrency',])
# Converts columns to a datetime
df['createdAt'] = pd.to_datetime(df['createdAt'])
df['Release Date'] = pd.to_datetime(df['Release Date'])
# Check which columns have missing values and how many are missing
pd.isnull(df).sum()
# Gets rid of the shoes that have not released yet
df = df.dropna(subset=['Release Date'])
# Replaces NaN elements with 0
df = df.fillna(0)
3. Feature Engineering
Here we will create more influential variables from the existing raw data by creating new columns. This will help us categorize our dataset, and potentially help us gain insight into some good predictors for our training model.
Let's first do some manual one-hot encodings, which represent categorical variables as binary values.
I believe that whether or not a shoe is white or black, whether it is a 350 model (the most popular model), and whether it is a Static version (a special colorway of a model with reflective accents, much more limited than its non-reflective counterpart) will all be useful predictors of resale prices.
# Makes a new column checking if White is found in the Color column
df['is_white'] = [1 if 'White' in x else 0 for x in df['Color']]
# Makes a new column checking if Black is found in the Color column
df['is_black'] = [1 if 'Black' in x else 0 for x in df['Color']]
# Makes a new column checking if 350 is found in the Name column
df['350'] = [1 if '350' in x else 0 for x in df['Name']]
# Makes a new column checking if Static is found in the Name column
df['Static'] = [1 if 'Static' in x else 0 for x in df['Name']]
Now let's quantify some columns for our training models to interact with:
# Makes new column by subtracting the two dates, keeping only the number of elapsed days
df['days_until_sold'] = (df['createdAt'].sub(df['Release Date'])).dt.days
# Makes new column by dividing the two prices
df['price_ratio'] = (df['localAmount'])/(df['Retail'])
df.head()
4. Data Analysis
Now let's take a look at our dataset and examine some interesting relationships within it.
Let's first take a look at our target variable, price_ratio. From the histogram below, we see that it is right skewed. Clearly there are some really big outliers within the dataset.
import matplotlib.pyplot as plt
# Histogram plot of our target variable
plt.hist(x=df['price_ratio'], bins=100)
plt.xlabel('Price ratio (sale price / retail)')
plt.ylabel('Number of Sales')
plt.title('Distribution of Sales')
Now we will look at one of our categorical variables, 'Model', and its relation to our target variable. You may notice that 350 models, on average, tend to have some of the highest price ratios.
# Group our dataframe by each Model
grouped = df.groupby('Model')
x = []
y = []
# Go through each Model and record the mean of its price ratio
for model, group in grouped:
    x.append(model)
    y.append(group['price_ratio'].mean())
# Set labels
plt.xticks(rotation=45, ha='right')
plt.title('Price ratios for each model')
plt.xlabel('Model')
plt.ylabel('Price Ratio')
plt.bar(x,y)
Now let's look at our numerical variables and one hot encodings, and their relation to our target variable.
We can visualize this relationship by creating a correlation matrix.
import numpy as np
# Make a new dataframe, consisting of only columns that are numerical
numerical = df.select_dtypes(include=[np.number])
# Creates matrix for each predictor and its relation to other variables
correlation_matrix = numerical.corr()
correlation_matrix
The heatmap below plots the correlation data between our variables.
You may notice that Annual High, Annual Low, is_black, 350, and days_until_sold have some of the strongest positive correlation with our target variable.
localAmount has a positive correlation, but must be disregarded since it is directly related to price_ratio and thus not a good predictor.
import seaborn as sns
# Creates heatmap
sns.heatmap(correlation_matrix, square=True)
5. Modeling
After all our data cleaning, feature engineering, and data exploration, we can now start our predictive modeling.
We will train a multiple linear regression model to predict our target variable, price_ratio. Multiple linear regression is like simple linear regression, except that it can handle more than one feature.
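Concretely, with features $x_1, \dots, x_p$ standing in for whichever columns we keep as predictors, the model learns coefficients $\beta_0, \dots, \beta_p$ such that

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p,$$

where $\hat{y}$ is the predicted price_ratio.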
However, because it can handle more than one feature, it is important that the features not be strongly correlated with each other. So, to avoid multicollinearity, we must first drop 'localAmount' and 'Retail', which are directly related to our target variable. In any case, price_ratio already captures the information in these two columns.
'createdAt' and 'Release Date' should be dropped because we already have a good measure of these two values in our 'days_until_sold' column.
'Total Sold' and 'Total Dollars' are also correlated with each other, so we will drop one of them. I decided to keep 'Total Dollars' as a predictor.
Lastly, 'Model', 'Name', and 'Color' will be dropped since we have one-hot encoded columns in their place.
df = df.drop(columns=['localAmount','Retail', 'createdAt','Release Date','Total Sold','Model','Name','Color'])
df.head()
With that done, we can start by splitting our dataset into training and testing data.
from sklearn.model_selection import train_test_split
# Separate the features and target
X = df.drop(columns=['price_ratio'])
y = df['price_ratio']
# Do an 80/20 split of our data
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.20)
Here we make our linear regression object and train it with our training data.
Then we plot our predictions and our actual test data to compare results. A summary of our model is also shown to highlight our R2 score.
from sklearn import linear_model
from regressors import stats as stats_reg
# Define the multiple linear regression model
lr = linear_model.LinearRegression()
# Fitting the model
lr.fit(X_train,y_train)
# Predict on the test data
y_pred = lr.predict(X_test)
# Make scatter plot of actual vs predictions
plt.scatter(y_test, y_pred)
plt.title('Actual vs predictions')
plt.xlabel('Actual')
plt.ylabel('Predicted')
# Get summary of our model
stats_reg.summary(lr, X_train, y_train, X_train.columns)
So our R2 value is mediocre, and our graph confirms that our model can definitely be improved.
But before we do that, let's first check whether or not our model violates the linear regression assumptions.
So let's use statsmodels to do a normality test on the residuals.
Looking at the resulting Q-Q plot, the S-shape indicates significant non-normality, so the model fails the normality test.
import statsmodels.api as sm
# Get residual
Residual = y_pred - y_test
# Plotting the residuals
sm.qqplot(Residual,line="r");
5.1 Modeling with a Log-Transformed Target
Perhaps we can solve this violation of normality by performing a nonlinear transformation of variables.
Let's see what a natural log transformation might do by comparing our original target variable to its natural log.
You may notice in the histogram below that the frequencies and spread change drastically after natural log transformation.
# Histogram plot of the original target
plt.hist(df["price_ratio"], bins=100);
# Histogram plot of the log-transformed target
plt.hist(np.log(df["price_ratio"]),bins=100);
plt.legend(["price ratio", "log transformed price ratio"]);
plt.show()
Now let's fit our model again, this time with a log-transformed price_ratio.
X_train_l = X_train
X_test_l = X_test
y_train_l = np.log(y_train)
y_test_l = np.log(y_test)
# Define the multiple linear regression model
lr = linear_model.LinearRegression()
# Fitting the model
lr.fit(X_train_l,y_train_l)
# Predict on the test data
y_pred_l = lr.predict(X_test_l)
# Make scatter plot of actual vs predictions
plt.scatter(y_test_l, y_pred_l)
plt.title('Actual vs predictions')
plt.xlabel('Actual')
plt.ylabel('Predicted')
# Get summary of our model
stats_reg.summary(lr, X_train_l, y_train_l, X_train_l.columns)
# Get residual
Residual_l = y_pred_l - y_test_l
# Plotting the residuals
sm.qqplot(Residual_l,line="r");
Although log transforming our target variable helps improve our model, as shown by a higher R2 score, it fails to pass the normality test once again.
A good next step would be to train a different model.
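As a minimal sketch of what that next step could look like, assuming the same X_train/y_train and X_test/y_test splits from above and that scikit-learn is available, a tree-based model such as a random forest does not rely on the linearity and normality assumptions we just violated:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
# Fit a random forest on the same training features and target
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
# Evaluate on the held-out test set
rf_pred = rf.predict(X_test)
print('R2 on test set:', r2_score(y_test, rf_pred))
This is only a starting point, but it sidesteps the distributional assumptions of ordinary least squares while still using the same features.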
Conclusions
Although we were only able to train a relatively mediocre multiple linear regression model, and were unable to satisfy the linear regression assumptions, we still gained some insight into the sneaker aftermarket. Most importantly, we identified the features that drive resale prices on Yeezys, like annual highs, the time between release date and sale date, and even color.
I also believe that the number of pairs released of a given shoe is the ultimate driver of aftermarket prices. This article explains how Nike controls the sneaker aftermarket by deliberately limiting supply to increase demand, encouraging a cult-like customer devotion to Jordans, much like Adidas has done with Yeezys. With limited supply, you can only expect aftermarket prices to rise to match that demand. The number of pairs released of each Yeezy model would be difficult to obtain (most are unknown), but if it were attainable, it would greatly improve our model as another feature.
Without this, the best next course would be to train another model; here is a great article if you would like to learn more about alternatives to multiple regression: https://www.quality-control-plan.com/StatGuide/mulreg_alts.htm