Calculating Support & Resistance in Python using K-Means Clustering

Are you an algo trader? Integrating support and resistance levels into your predictive model can help better anticipate price reversals. Here we take a deep dive into how KMeans clustering might be applied to achieve such a goal.

Support and resistance levels are popular measures in technical analysis for stock trading. Support levels reflect price ranges below which a stock's price tends not to fall, while resistance levels reflect price ranges the price has trouble exceeding.

Programmatic calculation of these levels is of paramount importance to algorithmic trading, technical analysis, and high-frequency trading. The K-means clustering algorithm is one means via which pricing levels can be identified such that support and resistance levels can be discovered.

Highlights

In this article, we discuss an approach for calculating support and resistance levels using the K-Means clustering algorithm. This is a partitioning algorithm by which different “groups” of prices can be discovered in ways that the boundaries between groups can be used as resistance levels. By the end, we’ll cover the following:

  1. Basic concepts of support and resistance levels.
  2. How Support & Resistance levels are used among traders and technical analysts.
  3. Real-world examples of support & resistance levels.
  4. Calculating support & resistance using the Scikit-Learn package for Python.
  5. Considerations for selecting cluster count.
  6. Visualizing clusters, support, and resistance levels with Plotly.
  7. Assessing the validity and utility of KMeans clustering in the context of calculating support and resistance levels.

TL;DR

There’s no TL;DR here — this is a deep dive. KMeans is a great clustering algorithm but, as we see here, falls short of being a useful means of calculating high-timeframe support and resistance levels (at least for Bitcoin).

Intro: Support & Resistance Levels 101

Support and resistance levels are used in technical analysis to predict reversals in price trends. A falling price might be likelier to stop falling when it nears a support level. Conversely, a rising stock price might be likelier to stop increasing when it nears a resistance level. Support and resistance levels are not infallible and, as we will see later, determining such price ranges is no simple task.

Support levels become resistance levels once broken and likewise resistance levels become support levels when they are broken. As such, the same price levels can be either support or resistance depending on price action. According to Richard Schabacker’s classic Technical Analysis & Stock Market Profits such key levels can be described as follows:

  • Support describes levels where downward trends are halted
  • Resistance describes levels where upward trends are halted
  • Support levels can also be called ‘demand areas’
  • Resistance levels can be called ‘supply areas’
  • When support levels are broken they become new resistance levels
  • When resistance levels are broken they become new support levels
  • Market psychology is a factor and price trends “remember” previous support and resistance levels.

This last one is important — certain numbers that stand out tend to become support and resistance levels. For example, $100, $500, or $1,000 may represent concrete areas of support and resistance. However, smaller whole-dollar amounts such as $15, $25, and $30 may also act as support and resistance depending on the dynamics of the underlying asset, price history, and time period.

Figure 1: A rising price breaks through initial resistance which later becomes support.

In figure 1 we see a rising price “breaking through” a previous level to find a new range of resistance, after which the previous resistance level becomes a new support level. This figure illustrates how support and resistance levels can predict price reversals.

Real-World Example

Figure 1 illustrates a very contrived example where support and resistance levels are clearly defined. In practice, technical analysts rarely see such well-defined levels. Let’s consider the daily chart for Amazon from 2018 to midway through 2022.

Figure 2: Daily price history for Amazon from 2018 through mid-2022 showing a sharp fall in price that “finds support” at a previous level of resistance.

In 2018, Amazon’s price rose through the bottom line, briefly pulled back to retest it, then rose further to the upper line where it met resistance at the red circle marked 2. The price then fell, rose, and fell again within this range before eventually moving up into a new “range.”

Towards the start of 2022, the price began to fall and, in March of 2022, began to fall sharply. The price then “found support” at the point on the upper line marked 5. This is the same line at which, at the points marked 2 & 4, Amazon’s price had previously met resistance.

Identifying Support & Resistance Levels

There is much debate regarding how one identifies support and resistance levels. Below are some common points to consider:

  • Horizontal vs. Diagonal
  • Intraday vs. Long Term
  • Major vs. Minor levels
  • Multiple Re-tests

How one chooses to incorporate each of these considerations greatly influences the nature of how support and resistance levels are calculated. For example, diagonal support and resistance can be powerful in helping predict small pullbacks during an uptrend. In this case, “breaking support” can be identified as a potential trend reversal.

Figure 3: Weekly price chart for the S&P 500 ETF showing ascending diagonal support and resistance levels.

In this weekly chart of the S&P 500 ETF, ascending diagonal support and resistance lines are established starting around November 2021 and running through September 2022. As Figure 3 illustrates, these “trendlines” aren’t infallible and there are areas in which the price periodically trades above or below them.

Around September 2022, the first weekly close below this line is noted. The price recovers over the next few weeks, all the way up to the resistance line again, before entering a downtrend in the weeks that follow. The trendlines drawn here seem incredibly useful but have the benefit of hindsight. Calculating support and resistance levels in real time is never as easy.

Drawing lines on this chart is much easier given the history available to guide one’s pencil. This phenomenon, known as hindsight bias (a.k.a. the “I knew it all along” effect), typically makes drawing trendlines after the fact seem easy [1]. Plotting these lines in real time accurately enough to make trading decisions is much more difficult. For that, one needs some solid rules to fall back on.

Calculating Support, Resistance, & Trendlines

There are many ways to calculate support and resistance. Here we will focus on two primary means:

  1. Long-term support & resistance levels, drawn as horizontal lines.
  2. Shorter-term trendlines, drawn as either diagonal or horizontal lines.

Long-term levels are used to help predict large price reversals marking the start and completion of price movements on longer timelines such as the daily or weekly charts. Trendlines are more useful to predict intraday movements or shorter daily movements.

Here we use K-Means clustering to identify long-term support and resistance levels. For trendlines, a combination of linear regression and minima-maxima calculation is used. Each offers different benefits but, as with many technical indicators, they are more powerful when used together.

Support & Resistance via K-Means Clustering

This isn’t an article on K-Means clustering itself, so we only gloss over the technical details. Essentially, K-Means clustering is an algorithmic way to identify subsets within a larger set of values. See here for more details.
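
For a bit of intuition first, here is a minimal, self-contained sketch (using made-up prices) of what K-Means does with one-dimensional data: it partitions the values into groups and reports each group’s center.

import numpy as np
from sklearn.cluster import KMeans

# A handful of made-up prices that visually form two groups
prices = np.array([10.2, 10.8, 11.1, 48.9, 50.3, 51.0])

# KMeans expects a 2-D array, so reshape the 1-D prices to (n_samples, 1)
kmeans = KMeans(n_clusters=2, n_init=10).fit(prices.reshape(-1, 1))

print(kmeans.labels_)           # which cluster each price joined, e.g. [1 1 1 0 0 0]
print(kmeans.cluster_centers_)  # the mean price of each cluster, e.g. [[50.07], [10.7]]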

Here we apply K-Means clustering to identify long-term support and resistance levels in Python using the scikit-learn library and five years’ worth of historical weekly Bitcoin pricing data. To get started, complete the following actions:

  1. Install the Scikit-Learn package: pip install scikit-learn
  2. Install the Pandas library: pip install pandas
  3. Install the Plotly library: pip install plotly
  4. Download the historical data from GitHub (courtesy of Yahoo Finance), or fetch it programmatically as sketched below.
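
If you would rather pull the data programmatically instead of downloading the CSV, a sketch using the yfinance package (an extra dependency not used elsewhere in this walkthrough) should produce a comparable weekly dataset:

import yfinance as yf

# Pull ~5 years of weekly BTC-USD candles directly from Yahoo Finance
btc = yf.download("BTC-USD", start="2017-06-28", end="2022-06-28", interval="1wk")

# Save locally so the rest of the walkthrough can read the same CSV file
btc.to_csv("BTC-USD.06282017-06282022.csv")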

With these dependencies, we can load our data, perform a KMeans clustering analysis, and then plot the results for visual inspection. The last part isn’t strictly necessary but will help assess the usefulness of our results.

Load Data as DataFrame

To get started, we’ll load the data as a DataFrame object:

import pandas as pd

# Load local CSV file and parse into dataframe object
btc = pd.read_csv('BTC-USD.06282017-06282022.csv')

# Parse date as DateTime
btc['Date'] = pd.to_datetime(btc['Date'])

# Set the date as the index
btc.set_index(['Date'], inplace=True)

# View the results
print(btc)

# Output
                    Open          High  ...     Adj Close        Volume
Date                                    ...                            
2017-06-26   2567.560059   2588.830078  ...   2506.469971    3393913024
2017-07-03   2498.560059   2916.139893  ...   2518.439941    5831748992
2017-07-10   2525.250000   2537.159912  ...   1929.819946    7453121024
2017-07-17   1932.619995   2900.699951  ...   2730.399902    9947990080
2017-07-24   2732.699951   2897.449951  ...   2757.179932    6942860928
...                  ...           ...  ...           ...           ...
2022-06-06  29910.283203  31693.291016  ...  26762.648438  215929645934
2022-06-13  26737.578125  26795.589844  ...  20553.271484  309685915250
2022-06-20  20553.371094  21783.724609  ...  21027.294922  175909056122
2022-06-27  21028.238281  21478.089844  ...  20280.634766   42347230868
2022-06-29  20291.271484  20360.972656  ...  20055.185547   24722872320

[263 rows x 6 columns]

Note that we have slightly more data here than expected. For 5 years each containing 52.1428 weeks, we should have a total of ~260.7 periods in our data. A quick glance shows the last weekly period covers less than a full week, which means we’re only off by ~2 periods. For our application here, that’s close enough.
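
As a quick sanity check, that expectation can be computed directly (assuming the btc DataFrame loaded above):

# 365 days / 7 ≈ 52.1428 weeks per year, over 5 years
expected_periods = 5 * 365 / 7
print(f"Expected ~{expected_periods:.1f} weekly periods; got {len(btc)}")
# Expected ~260.7 weekly periods; got 263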

Perform a KMeans Cluster Analysis

This step involves two actions: performing the cluster analysis and assigning each price to a cluster.

import numpy as np
from sklearn.cluster import KMeans

# Convert adjusted closing price to numpy array
btc_prices = np.array(btc["Adj Close"])
print("BTC Prices:\n", btc_prices)

# Perform cluster analysis with K clusters
K = 6
kmeans = KMeans(n_clusters=K).fit(btc_prices.reshape(-1, 1))

# predict which cluster each price is in
clusters = kmeans.predict(btc_prices.reshape(-1, 1))
print("Clusters:\n", clusters)

# View output
BTC Prices:
 [ 2506.469971  2518.439941  1929.819946  2730.399902  2757.179932
  3213.939941  4073.26001   4087.659912  4382.879883  4582.959961

  ... lots of extra data rows here ...

 39716.953125 39469.292969 38469.09375  34059.265625 31305.113281
 30323.722656 29445.957031 29906.662109 26762.648438 20553.271484
 21027.294922 20280.634766 20055.185547]

Clusters:
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 3 3 5 5 3 5 5 5 5 5 5 5 5 5
 5 5 0 0 5 5 5 5 5 5 0 5 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 5 5 5 5 5 5 5 5 5 5
 5 5 5 5 5 5 5 5 5 5 5 5 5 5 0 0 0 0 0 0 0 5 5 5 5 5 5 5 5 5 0 0 0 0 0 0 5
 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 3 3 3 3 3 3 3 3 2 2
 2 2 2 2 1 4 1 1 4 4 4 4 4 4 1 4 4 1 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1
 1 4 4 4 4 4 4 4 4 1 1 1 1 1 1 1 2 2 1 1 2 2 2 2 1 1 1 1 2 2 2 2 2 2 2 2 3
 3 3 3 3]

Many of the actual prices have been manually removed from the printout here to save space. However, the full printout of the clusters variable indicates that each price has been assigned to one of the 6 total clusters. Let’s visualize this via Plotly as a scatter plot where each price has a unique color assigned based on which group it is in.

Visualizing Clusters via Plotly

Plotly integrates conveniently with DataFrame objects. One need only set pandas’ plotting backend option (pd.options.plotting.backend) to 'plotly'. With this configured, the following approach is taken to assign each historical price data point a unique color based on the cluster assigned by our KMeans algorithm:

import plotly.graph_objects as go

# Assign plotly as the pandas visualization engine
pd.options.plotting.backend = 'plotly'

# Arbitrarily choose 6 colors for our 6 clusters
colors = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo']

# Create Scatter plot, assigning each point a color based
# on its grouping, where group_number == index of color.
fig = btc.plot.scatter(
    x=btc.index,
    y="Adj Close",
    color=[colors[i] for i in clusters],
)

# Configure some styles
layout = go.Layout(
    plot_bgcolor='#efefef',
    showlegend=False,
    # Font Families
    font_family='Monospace',
    font_color='#000000',
    font_size=20,
    xaxis=dict(
        rangeslider=dict(
            visible=False
        ))
)
fig.update_layout(layout)

# Display plot in local browser window
fig.show()

With this code executed, an instance of the Plotly viewer will launch in the default web browser of the local system:

Figure 4: Colorized plot of KMeans-clustered data points, where each point is assigned a color based on its assigned cluster.

Here we see each marker, representing a single historical price, given a unique color based on the cluster to which it was assigned. From this data, we can extract our long-term support and resistance values by finding the boundaries between the clusters.

Find Cluster Minimum & Maximum Values

Figure 4 illustrates our clusters separated by color. Conceptually, there would exist a line between each cluster as well — these are the bounding regions that will represent our support and resistance lines. To find these, we need to calculate the minimum and maximum values from each cluster, then decide how we will interpret them.

# Create list to hold values, initialized with infinite values
min_max_values = []

# init for each cluster group
for i in range(6):

    # Add values for which no price could be greater or less
    min_max_values.append([np.inf, -np.inf])

# Print initial values
print(min_max_values)

# Get min/max for each cluster
for i in range(len(btc_prices)):

    # Get cluster assigned to price
    cluster = clusters[i]

    # Compare for min value
    if btc_prices[i] < min_max_values[cluster][0]:
        min_max_values[cluster][0] = btc_prices[i]

    # Compare for max value
    if btc_prices[i] > min_max_values[cluster][1]:
        min_max_values[cluster][1] = btc_prices[i]

# Print resulting values
print(min_max_values)

# Output
[[inf, -inf], [inf, -inf], [inf, -inf], [inf, -inf], [inf, -inf], [inf, -inf]]
[[15455.400391, 26762.648438], [41247.824219, 51753.410156], [29445.957031, 39974.894531], [1929.819946, 7564.345215], [54771.578125, 65466.839844], [7679.867188, 14156.400391]]
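
As an aside, the same per-cluster extrema can be computed more concisely with pandas. This sketch, reusing btc_prices and clusters from above, is equivalent to the loop:

# Group each price by its assigned cluster and take the min/max per group
extrema = pd.Series(btc_prices).groupby(clusters).agg(['min', 'max'])
min_max_values = extrema.values.tolist()
print(min_max_values)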

Here we see each cluster has been assigned a unique min, max pair of prices. This is the smallest and largest price value for each uniquely-colored cluster of prices in Figure 4. These can be visualized as horizontal lines on our chart. The following code adds these lines, along with a trace of the price itself for added visual clarity:

import plotly.graph_objects as go

# Again, assign an arbitrary color to each of the 6 clusters
colors = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo']

# Create Scatter plot, assigning each point a color where
# point group = color index.
fig = btc.plot.scatter(
    x=btc.index,
    y="Adj Close",
    color=[colors[i] for i in clusters],
)

# Add horizontal lines
for cluster_min, cluster_max in min_max_values:
    fig.add_hline(y=cluster_min, line_width=1, line_color="blue")
    fig.add_hline(y=cluster_max, line_width=1, line_color="blue")

# Add a trace of the price for better clarity
fig.add_trace(go.Scatter(
    x=btc.index,
    y=btc['Adj Close'],
    line_color="black",
    line_width=1
))

# Make it pretty
layout = go.Layout(
    plot_bgcolor='#efefef',
    showlegend=False,
    # Font Families
    font_family='Monospace',
    font_color='#000000',
    font_size=20,
    xaxis=dict(
        rangeslider=dict(
            visible=False
        ))
)
fig.update_layout(layout)
fig.show()

This code produces the following chart — again launched in the local web browser:

Figure 5: Pricing data plotted by cluster, with horizontal lines overlaid at each cluster’s minimum and maximum values.

These lines represent our support and resistance lines. However, there are some issues — both conceptual and practical — that we need to address:

  • Can we combine lines between groups into a single line?
  • Are the absolute minimum and absolute maximum good s/r lines?
  • Did we choose an appropriate number of clusters?

Let’s address each of these in order.

Consolidating Boundary Lines

Figure 5 shows a horizontal line plotted for both the minimum and maximum of each cluster. However, it makes conceptual sense that a single line could separate each pair of adjacent clusters, such that the maximum value of cluster 1 (containing smaller price values) could be combined with the minimum value of cluster 2 (containing larger price values) as such:

Figure 6: The minimum and maximum prices from adjacent clusters can be combined to form a single averaged line representing support & resistance levels.

Here the maximum value from cluster 1 is combined (averaged) with the minimum value from cluster 2 to form a single boundary between each cluster. Conceptually, this single line would represent a resistance line for prices in cluster 1 and a support line for prices in cluster 2.

To apply this concept to our data we need to first ensure that the data is sorted such that the first min, max pair reflects the bottom-most cluster (cluster 1) on our chart and the last min, max pair reflects the top-most cluster (cluster 6) on our chart. This will ensure that the combination of lines happens only between neighboring clusters. The following code will combine the lines appropriately, first sorting to ensure accurate adjacency:

print("Initial Min/Max Values:\n", min_max_values)

# Create container for combined values
output = []

# Sort based on cluster minimum
s = sorted(min_max_values, key=lambda x: x[0])

# For each cluster, average its max with the next cluster's min
for i, (_min, _max) in enumerate(s):

    # Append min from first cluster
    if i == 0:
        output.append(_min)

    # Append max from last cluster
    if i == len(min_max_values) - 1:
        output.append(_max)

    # Append the average of this cluster's max and the next cluster's min for all others
    else:
        output.append(sum([_max, s[i+1][0]]) / 2)

print("Sorted Min/Max Values:\n", output)

# Resulting output
Initial Min/Max Values:
[[1929.819946, 7564.345215], [41247.824219, 51753.410156], [7679.867188, 14156.400391], [29445.957031, 39974.894531], [54771.578125, 65466.839844], [15455.400391, 26762.648438]]

Sorted Min/Max Values:
[1929.819946, 7622.106201500001, 14805.900391, 28104.3027345, 40611.359375, 53262.4941405, 65466.839844]

To assess our results we will again plot these values as before, except now we use the values from output and call add_hline only once per boundary:

# Add horizontal lines
for cluster_avg in output:
    fig.add_hline(y=cluster_avg, line_width=1, line_color="blue")

This produces the following chart:

Figure 7: Combined minima/maxima per cluster reflecting a single boundary between each cluster as support/resistance.

These lines look much cleaner and more conceptually appropriate. Now we can assess whether using the absolute minimum and maximum is useful.

Assessing Absolute Minima/Maxima

The bottommost and topmost horizontal blue lines represent the smallest and largest recorded prices, respectively. These prices do not provide a useful measure of support and resistance. Arguably, the maximum value could play a role in the future but the price would need to re-enter that “cluster” for it to be relevant. To drop the first and last lines, we will alter our previous code to the following:

# Add horizontal lines 
for cluster_avg in output[1:-1]:
    fig.add_hline(y=cluster_avg, line_width=1, line_color="blue")

Re-running the code from our original chart creation using this updated add_hline snippet, we produce the following chart:

Figure 8: The absolute maximum and absolute minimum horizontal lines were removed.

This chart looks much more akin to those reflecting support and resistance lines. At first glance, however, these lines don’t appear to be placed in entirely useful locations. To illustrate this point, consider the same figure with manually placed support and resistance lines:

Figure 9: Overlay of manually-placed support and resistance lines.

These are by no means perfect but seem to reflect areas in which price action has reversed on several occasions. Finding more appropriate boundary lines can be approached by using different cluster counts.

Determining the Right Number of Clusters

There are many ways in which one can approach optimizing the number of clusters chosen from a dataset. We will use the elbow method — but know there are plenty of other options. The elbow method involves calculating a KMeans clustering for a series of different k values.

For each iteration, the sum of the squares of the distance from each cluster point to the cluster center is calculated. Fortunately, this measure is provided by the scikit-learn KMeans class as the inertia_ property. We calculate each of these values as such:

# create a list to contain output values
values = []

# Define a range of cluster values to assess
K = range(1, 10)

# Perform a clustering using each value, saving the inertia_ value from each
for k in K:
    kmeans_n = KMeans(n_clusters=k)
    kmeans_n.fit(btc_prices.reshape(-1, 1))
    values.append(kmeans_n.inertia_)

# Output
[80453452183.20088, 10385720870.392435, 4899414919.598458, 3200341434.3973494, 2038101574.9904993, 1230938654.2410448, 913656416.2350059, 711089387.2186545, 573668652.8908113]

This provides a list of really big numbers from which we can approach our optimization. As described by the scikit-learn documentation for the KMeans class, the inertia_ property is:

[The] Sum of squared distances of samples to their closest cluster center, weighted by the sample weights if provided.
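
Written out, and ignoring sample weights, that quantity for prices x_i and cluster centers \mu_j is:

\text{inertia} = \sum_{i=1}^{n} \min_{\mu_j} \lVert x_i - \mu_j \rVert^2

A lower inertia therefore means each price sits closer to the center of the cluster it was assigned to.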

Let’s now plot this new data to assess how we might begin approaching a method to programmatically select an optimal number of KMeans clusters. We’ll use a different graphing method here, still using Plotly:

import plotly.graph_objects as go

# Create initial figure
fig = go.Figure()

# Add line plot of inertia values
fig.add_trace(go.Scatter(
    x=list(K),
    y=values,
    line_color="black",
    line_width=1
))

# Make it pretty
layout = go.Layout(
    plot_bgcolor='#efefef',
    showlegend=False,
    # Font Families
    font_family='Monospace',
    font_color='#000000',
    font_size=20,
    xaxis=dict(
        rangeslider=dict(
            visible=False
        ))
)
fig.update_layout(layout)
fig.show()

Using our list of k values as the x-axis and the resulting inertia_ values as the y, we get the following graph:

Figure 10: An “elbow” visualized via the sum-of-squares (inertia_) values, located at the second cluster.

This suggests that we should use a k-value of 2 — though that hardly seems accurate. Nonetheless, let’s have a look:

Figure 11: A plot of a support and resistance line located using a k-value of 2.

This seems to be practically useless, producing only a single line somewhere in the $26k range. Clearly, the elbow method isn’t the best approach here. Rather than spraying and praying, let’s take a more manual approach and view the results of each of our potential k values ranging from 2 to 9 (a k of 1 being totally useless).

Figure 12: Resulting HTF support/resistance lines from k values ranging from 2-9.

None of these are great. Notice that the lines representing cluster boundaries aren’t dispersed evenly — they tend to land nearest areas in which price has been less volatile over time. In other words, vertical regions in which there is a greater density of recorded points.
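
For completeness, here is a sketch of the loop that could generate charts like those above, repeating the fit, per-cluster min/max extraction, and boundary averaging from earlier for each candidate k (btc, btc_prices, colors, and the Plotly pandas backend are assumed from previous sections):

# Sketch: repeat the fit / extract / average steps for each candidate k
for k in range(2, 10):

    # Fit and label each weekly price
    kmeans_k = KMeans(n_clusters=k).fit(btc_prices.reshape(-1, 1))
    clusters_k = kmeans_k.predict(btc_prices.reshape(-1, 1))

    # Min/max per cluster, sorted bottom-to-top by cluster minimum
    extrema = sorted(pd.Series(btc_prices).groupby(clusters_k).agg(['min', 'max']).values.tolist(),
                     key=lambda x: x[0])

    # Average each cluster's max with the next cluster's min to get the interior boundaries
    boundaries = [(extrema[i][1] + extrema[i + 1][0]) / 2 for i in range(len(extrema) - 1)]

    # Plot the prices colored by cluster (colors repeat past 6) and overlay the boundary lines
    fig = btc.plot.scatter(x=btc.index, y="Adj Close",
                           color=[colors[i % len(colors)] for i in clusters_k])
    for level in boundaries:
        fig.add_hline(y=level, line_width=1, line_color="blue")
    fig.show()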

Discussion

We’ve covered a fair amount of ground in this article so let’s recap. The following points of discussion have been addressed:

  • Understanding support and resistance lines.
  • Understanding trendlines.
  • Using K-Means clustering to identify price levels at which support and resistance levels can be automatically calculated.
  • Considering optimal ways in which to select the number of clusters.
  • Visualizing results with Plotly.

K-Means is a versatile clustering algorithm but lacks a clear directive for selecting an optimal number of clusters. As mentioned earlier, there are several common approaches by which such an optimization can be done. We tried the elbow approach here, but other methods include the silhouette coefficient, the gap statistic, and criteria such as the Davies-Bouldin index.

Each of these approaches has its merits and may, or may not, be appropriate given a specific dataset. The elbow approach is a popular method of optimizing clustering, which is why it was selected here. However, it is by no means meant to be interpreted as the “best” k-means optimization technique.
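
As one example, here is a sketch of scoring each candidate k with the silhouette coefficient (higher is better) using scikit-learn, reusing the same btc_prices array from earlier:

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Reshape prices into the (n_samples, 1) shape scikit-learn expects
X = btc_prices.reshape(-1, 1)

# Silhouette is undefined for a single cluster, so start at k=2
for k in range(2, 10):
    labels = KMeans(n_clusters=k).fit_predict(X)
    print(f"k={k}: silhouette={silhouette_score(X, labels):.3f}")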

Final Thoughts

K-means clustering can help distill structure from seemingly random data. It has its limits and, as was the case here, doesn’t always produce jaw-dropping results. This type of partitioning can help group data into segments that represent novel behaviors or values but doesn’t always hit the mark. In the context of high timeframe support and resistance calculations, it seems that K-Means is not a great solution.

There are many technical indicators used for algorithmic trading. These include the moving average convergence divergence (MACD), the stochastic oscillator, simple moving averages, and Bollinger Bands. Beyond single indicators, methods such as linear regression can be used to help predict stock prices as well. Conceivably, especially in the context of multivariate linear regression, support and resistance levels could be incorporated as meaningful features into a predictive model.
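
As a rough, hypothetical illustration of that last point, one could derive a “distance to nearest level” feature from the boundary values computed earlier (the output list) and hand it to a regression model alongside other indicators. This is only a sketch under the assumption that those levels are worth using as features:

import numpy as np

# Interior boundary levels computed earlier (hypothetical feature source)
levels = np.array(output[1:-1])

def distance_to_nearest_level(price: float) -> float:
    """Signed gap between a price and its closest support/resistance level."""
    return float(price - levels[np.argmin(np.abs(levels - price))])

# Add the feature column; a model could then learn from it alongside MACD, SMAs, etc.
btc['dist_to_level'] = btc['Adj Close'].apply(distance_to_nearest_level)
print(btc[['Adj Close', 'dist_to_level']].tail())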

References

  1. Bernstein, Daniel M et al. “Hindsight bias and developing theories of mind.” Child development vol. 78,4 (2007): 1374-94. doi:10.1111/j.1467-8624.2007.01071.x