Support and resistance levels are popular measures in technical analysis for stock trading. Support levels are price ranges below which a stock’s price tends not to fall, while resistance levels are those a stock has trouble exceeding.
Programmatic calculation of these levels is of paramount importance to algorithmic trading, technical analysis, and high-frequency trading. The K-means clustering algorithm is one means via which pricing levels can be identified such that support and resistance levels can be discovered.
Highlights
In this article, we discuss an approach for calculating support and resistance levels using the K-Means clustering algorithm. This is a partitioning algorithm that discovers distinct “groups” of prices such that the boundaries between groups can be used as support and resistance levels. By the end, we’ll have covered the following:
- Basic concepts of support and resistance levels
- How support & resistance levels are used among traders and technical analysts.
- Real-world examples of support & resistance levels.
- Calculating support & resistance using the Scikit-Learn package for Python.
- Considerations for selecting cluster count.
- Visualizing clusters, support, and resistance levels with Plotly.
- Assessing the validity and utility of KMeans clustering in the context of calculating support and resistance levels.
TL;DR
There’s no TL;DR here; this is a deep dive. K-Means is a great clustering algorithm but, as we see here, it falls short of being a useful means of calculating high-timeframe support and resistance levels (at least for Bitcoin).
Intro: Support & Resistance Levels 101
Support and resistance levels are used in technical analysis to predict reversals in price trends. A falling price might be likelier to stop falling when it nears a support level. Conversely, a rising stock price might be likelier to stop increasing when it nears a resistance level. Support and resistance levels are not infallible and, as we will see later, determining such price ranges is no simple task.
Support levels become resistance levels once broken and likewise resistance levels become support levels when they are broken. As such, the same price levels can be either support or resistance depending on price action. According to Richard Schabacker’s classic Technical Analysis & Stock Market Profits such key levels can be described as follows:
- Support describes levels where downward trends are halted
- Resistance describes levels where upward trends are halted
- Support levels can also be called ‘demand areas’
- Resistance levels can be called ‘supply areas’
- When support levels are broken they become new resistance levels;
- When resistance levels are broken they become new support levels;
- Market psychology is a factor and price trends “remember” previous support and resistance levels.
This last one is important — certain numbers that stand out tend to become support and resistance levels. For example, $100, $500, or $1,000 may represent concrete areas of support and resistance. However, smaller whole-dollar amounts such as $15, $25, and $30 may also act as support and resistance depending on the dynamics of the underlying asset, price history, and time period.
In figure 1 we see a rising price “breaking through” a previous level to find a new range of resistance, after which the previous resistance level becomes a new support level. This figure illustrates how support and resistance levels can predict price reversals.
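The role reversal described above can be written as a simple rule: a level below the current price acts as support, and a level above it acts as resistance. The sketch below is purely illustrative; the function name `level_role` and the example prices are assumptions introduced here.

```python
def level_role(price: float, level: float) -> str:
    """Classify a horizontal level relative to the current price."""
    if price > level:
        return "support"      # price sits above the level
    if price < level:
        return "resistance"   # price sits below the level
    return "at level"


# A $100 level caps a $95 price, then props up the price after a move to $105.
print(level_role(95.0, 100.0))
print(level_role(105.0, 100.0))
```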
Real-World Example
Figure 1 illustrates a very contrived example where support and resistance levels are clearly defined. In practice, technical analysts rarely see such well-defined levels. Let’s consider the daily chart for Amazon from 2018 to midway through 2022.
In 2018, Amazon’s price rose through the bottom line, briefly pulled back and retested, before rising further to the upper line where it met at the red circle marked 2. The price then fell, rose, and fell again within this range before eventually moving up again into a new “range.”
Towards the start of 2022, the price began to fall and, in March of 2022, began to fall sharply. The price then “found support” at the point on the upper line marked 5. This is the same point on the line where, at the points marked 2 & 4, Amazon’s price had previously shown resistance.
Identifying Support & Resistance Levels
There is much debate regarding how one identifies support and resistance levels. Below are some common points to consider:
- Horizontal vs. Diagonal
- Intraday vs. Long Term
- Major vs. Minor levels
- Multiple Re-tests
How one chooses to incorporate each of these considerations greatly influences the nature of how support and resistance levels are calculated. For example, diagonal support and resistance can be powerful in helping predict small pullbacks during an uptrend. In this case, “breaking support” can be identified as a potential trend reversal.
In this weekly chart of the S&P 500 ETF, an ascending diagonal support and resistance level is established starting around November 2021 and moving through September 2022. As Figure 3 illustrates, these “trendlines” aren’t infallible and there are areas in which the price periodically rises above or below these lines.
Around September 2022, the first weekly close below this line is noted. The price recovers over the next few weeks, all the way up to the resistance line again, before entering a downtrend in the following weeks after that. The trendlines drawn here seem incredibly useful but have the benefit of hindsight. Calculating support and resistance levels in real-time is never as easy.
Drawing lines on this chart is much easier given the history available to guide one’s pencil. This phenomenon, known as hindsight bias or the “I knew it all along” effect, makes drawing trendlines after the fact seem easy [1]. Plotting these lines in real time accurately enough to make trading decisions is much more difficult. For that, one needs some solid rules to fall back on.
Calculating Support, Resistance, & Trendlines
There are many ways to calculate support and resistance. Here we will focus on two primary means:
- Long-term support & resistance levels, drawn as horizontal lines.
- Shorter-term trendlines, drawn as either diagonal or horizontal lines.
Long-term levels are used to help predict large price reversals marking the start and completion of price movements on longer timelines such as the daily or weekly charts. Trendlines are more useful to predict intraday movements or shorter daily movements.
Here we use K-Means clustering to identify long-term support and resistance levels. For trendlines, a combination of linear regression and minima-maxima calculation is used. Each offers different benefits but, as with many technical indicators, they are more powerful when used together.
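The trendline approach is only named here, so the following is a rough sketch of one way it could work rather than the method used elsewhere in this article: find local minima, then fit a straight line through them with NumPy's `polyfit`. The helper names `local_minima` and `support_trendline` and the price series are made up for illustration.

```python
import numpy as np


def local_minima(prices: np.ndarray) -> np.ndarray:
    """Return indices of points strictly lower than both neighbours."""
    inner = prices[1:-1]
    mask = (inner < prices[:-2]) & (inner < prices[2:])
    return np.flatnonzero(mask) + 1


def support_trendline(prices: np.ndarray) -> tuple:
    """Fit a straight line (slope, intercept) through the local minima."""
    idx = local_minima(prices)
    if len(idx) < 2:
        raise ValueError("need at least two local minima to fit a line")
    slope, intercept = np.polyfit(idx, prices[idx], deg=1)
    return float(slope), float(intercept)


# Synthetic zig-zag uptrend (made-up numbers, not market data)
prices = np.array([10, 12, 11, 14, 12, 15, 13, 17, 14, 18], dtype=float)
slope, intercept = support_trendline(prices)
print(local_minima(prices), slope, intercept)
```

A resistance trendline could be sketched the same way by fitting through local maxima instead.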
Support & Resistance via K-Means Clustering
This isn’t an article on K-Means clustering, so it glosses over the technical details. Essentially, K-Means clustering is an algorithmic way to identify subsets within a larger set of values.
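For readers who want a feel for what happens under the hood, here is a minimal sketch of the classic assign-then-update loop (Lloyd's algorithm) on one-dimensional data. It is a teaching aid under simplifying assumptions, not scikit-learn's implementation: the function name `kmeans_1d` is hypothetical, and the initial centers are spread over quantiles for determinism, whereas scikit-learn defaults to k-means++ initialization.

```python
import numpy as np


def kmeans_1d(values: np.ndarray, k: int, n_iter: int = 100) -> tuple:
    """Minimal Lloyd's algorithm for 1-D data; returns (centers, labels)."""
    # Simplified deterministic start: centers spread over the quantiles
    centers = np.quantile(values, np.linspace(0, 1, k))
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every value
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Update step: each center moves to the mean of its members
        new_centers = np.array([
            values[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers
    labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
    return centers, labels


# Two obvious groups of made-up values
values = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
centers, labels = kmeans_1d(values, k=2)
print(centers, labels)
```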
Here we apply K-Means clustering to identify long-term support and resistance levels in Python using the scikit-learn library and five years’ worth of historical weekly Bitcoin pricing data. To get started, complete the following actions:
- Install the Scikit-Learn package:
pip install scikit-learn
- Install the Pandas library:
pip install pandas
- Install the plotly library:
pip install plotly
- Download the historical data from GitHub, courtesy of Yahoo Finance.
With these dependencies, we can load our data, perform a KMeans clustering analysis, and then plot the results for visual inspection. The last part isn’t strictly necessary but will help assess the usefulness of our results.
Load Data as DataFrame
To get started, we’ll load the data as a DataFrame object:
import pandas as pd

# Load local CSV file and parse into a DataFrame object
btc = pd.read_csv('BTC-USD.06282017-06282022.csv')

# Parse date as DateTime
btc['Date'] = pd.to_datetime(btc['Date'])

# Set the date as the index
btc.set_index(['Date'], inplace=True)

# View the results
print(btc)

# Output
                    Open          High  ...     Adj Close        Volume
Date                                    ...
2017-06-26   2567.560059   2588.830078  ...   2506.469971    3393913024
2017-07-03   2498.560059   2916.139893  ...   2518.439941    5831748992
2017-07-10   2525.250000   2537.159912  ...   1929.819946    7453121024
2017-07-17   1932.619995   2900.699951  ...   2730.399902    9947990080
2017-07-24   2732.699951   2897.449951  ...   2757.179932    6942860928
...                  ...           ...  ...           ...           ...
2022-06-06  29910.283203  31693.291016  ...  26762.648438  215929645934
2022-06-13  26737.578125  26795.589844  ...  20553.271484  309685915250
2022-06-20  20553.371094  21783.724609  ...  21027.294922  175909056122
2022-06-27  21028.238281  21478.089844  ...  20280.634766   42347230868
2022-06-29  20291.271484  20360.972656  ...  20055.185547   24722872320

[263 rows x 6 columns]
Note that we have slightly more rows than expected. Five years of roughly 52.1428 weeks each should yield about 260.714 periods, yet the DataFrame holds 263 rows. A quick glance shows that the last weekly period covers less than a full week, which leaves us off by only ~2 periods. For our application here, that’s close enough.
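The arithmetic and the partial final week can be checked with a few lines of standard-library Python. This sketch hard-codes the last five index dates from the printed DataFrame above rather than reading the CSV; the variable names are illustrative.

```python
from datetime import date

# Last few index dates from the printed DataFrame above
dates = [date(2022, 6, 6), date(2022, 6, 13), date(2022, 6, 20),
         date(2022, 6, 27), date(2022, 6, 29)]

# Expected number of weekly periods in five 365-day years
expected_periods = 5 * 365 / 7

# Day gaps between consecutive rows; anything other than 7 is irregular
gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
partial = [d for d, g in zip(dates[1:], gaps) if g != 7]

print(round(expected_periods, 3))
print(gaps)
print(partial)
```

Only the final row, dated 2022-06-29, follows its predecessor by fewer than seven days.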
Perform a KMeans Cluster Analysis
This step involves two actions: performing the analysis and assigning each price to a cluster.
import numpy as np
from sklearn.cluster import KMeans

# Convert adjusted closing price to numpy array
btc_prices = np.array(btc["Adj Close"])
print("BTC Prices:\n", btc_prices)

# Perform cluster analysis
K = 6
kmeans = KMeans(n_clusters=K).fit(btc_prices.reshape(-1, 1))

# Predict which cluster each price is in
clusters = kmeans.predict(btc_prices.reshape(-1, 1))
print("Clusters:\n", clusters)

# Output
BTC Prices:
 [ 2506.469971  2518.439941  1929.819946  2730.399902  2757.179932
   3213.939941  4073.26001   4087.659912  4382.879883  4582.959961
 ... lots of extra data rows here ...
  39716.953125 39469.292969 38469.09375  34059.265625 31305.113281
  30323.722656 29445.957031 29906.662109 26762.648438 20553.271484
  21027.294922 20280.634766 20055.185547]
Clusters:
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 3 3 5 5 3 5 5 5 5 5 5 5 5 5 5 5 0 0 5 5 5 5 5 5 0 5 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 0 0 0 0 0 0 0 5 5 5 5 5 5 5 5 5 0 0 0 0 0 0 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 3 3 3 3 3 3 3 3 2 2 2 2 2 2 1 4 1 1 4 4 4 4 4 4 1 4 4 1 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 1 1 1 1 1 1 1 2 2 1 1 2 2 2 2 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3]
Many of the actual prices have been manually removed from the printout here to save space. However, the full printout of the clusters variable indicates that each price has been assigned to one of the 6 total clusters. Let’s visualize this via Plotly as a scatter plot where each price is colored according to the group it is in.
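Keep in mind that the cluster numbers are arbitrary labels: they do not follow price order, and they can change from run to run unless a `random_state` is fixed. If ordered labels are more convenient, clusters can be renumbered by the value of their centers. The sketch below assumes the centers come from `kmeans.cluster_centers_`; the helper name `relabel_by_center` and the sample values are made up.

```python
import numpy as np


def relabel_by_center(centers: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Map arbitrary cluster ids to ranks ordered by center value."""
    order = np.argsort(centers.ravel())      # cluster ids, lowest center first
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))      # rank[old_id] -> new_id
    return rank[labels]


# Made-up stand-ins for kmeans.cluster_centers_ and the clusters array
centers = np.array([[20000.0], [45000.0], [5000.0]])
labels = np.array([2, 2, 0, 1, 0])
print(relabel_by_center(centers, labels))
```

After relabeling, cluster 0 always holds the lowest prices and the highest-numbered cluster the highest.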
Visualizing Clusters via Plotly
Plotly integrates conveniently with DataFrame objects. One need only specify that Plotly is to be used via the pd.options.plotting.backend setting. With this configured, the following approach is taken to assign each historical price data point a color based on the cluster assigned by our KMeans algorithm:
import plotly.graph_objects as go

# Assign plotly as the visualization engine
pd.options.plotting.backend = 'plotly'

# Arbitrarily pick 6 colors for our 6 clusters
colors = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo']

# Create scatter plot, assigning each point a color based
# on its grouping where group_number == index of color.
fig = btc.plot.scatter(
    x=btc.index,
    y="Adj Close",
    color=[colors[i] for i in clusters],
)

# Configure some styles
layout = go.Layout(
    plot_bgcolor='#efefef',
    showlegend=False,
    # Font Families
    font_family='Monospace',
    font_color='#000000',
    font_size=20,
    xaxis=dict(
        rangeslider=dict(
            visible=False
        ))
)
fig.update_layout(layout)

# Display plot in local browser window
fig.show()
With this code executed, an instance of the Plotly viewer will launch in the default web browser of the local system:
Here we see each marker, representing a single historical price, assigned a unique color based on which cluster it was assigned. From this data, we can extract our long-term support and resistance values by finding the boundaries between the clusters.
Find Cluster Minimum & Maximum Values
Figure 4 illustrates our clusters separated by color. Conceptually, there would exist a line between each cluster as well — these are the bounding regions that will represent our support and resistance lines. To find these, we need to calculate the minimum and maximum values from each cluster, then decide how we will interpret them.
# Create list to hold values, initialized with infinite values
min_max_values = []

# Init for each cluster group
for i in range(6):

    # Add values for which no price could be greater or less
    min_max_values.append([np.inf, -np.inf])

# Print initial values
print(min_max_values)

# Get min/max for each cluster
for i in range(len(btc_prices)):

    # Get cluster assigned to price
    cluster = clusters[i]

    # Compare for min value
    if btc_prices[i] < min_max_values[cluster][0]:
        min_max_values[cluster][0] = btc_prices[i]

    # Compare for max value
    if btc_prices[i] > min_max_values[cluster][1]:
        min_max_values[cluster][1] = btc_prices[i]

# Print resulting values
print(min_max_values)

# Output
[[inf, -inf], [inf, -inf], [inf, -inf], [inf, -inf], [inf, -inf], [inf, -inf]]
[[15455.400391, 26762.648438], [41247.824219, 51753.410156], [29445.957031, 39974.894531], [1929.819946, 7564.345215], [54771.578125, 65466.839844], [7679.867188, 14156.400391]]
Here we see each cluster has been assigned a unique min, max pair of prices. This is the smallest and largest price value for each uniquely-colored cluster of prices in Figure 4. These can be visualized as horizontal lines on our chart as follows. The following code adds these lines and adds a line to plot the price for added visual clarity:
import plotly.graph_objects as go

# Again, assign an arbitrary color to each of the 6 clusters
colors = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo']

# Create scatter plot, assigning each point a color where
# point group = color index.
fig = btc.plot.scatter(
    x=btc.index,
    y="Adj Close",
    color=[colors[i] for i in clusters],
)

# Add horizontal lines
for cluster_min, cluster_max in min_max_values:
    fig.add_hline(y=cluster_min, line_width=1, line_color="blue")
    fig.add_hline(y=cluster_max, line_width=1, line_color="blue")

# Add a trace of the price for better clarity
# (go.Scatter replaces the deprecated go.Trace)
fig.add_trace(go.Scatter(
    x=btc.index,
    y=btc['Adj Close'],
    mode="lines",
    line_color="black",
    line_width=1
))

# Make it pretty
layout = go.Layout(
    plot_bgcolor='#efefef',
    showlegend=False,
    # Font Families
    font_family='Monospace',
    font_color='#000000',
    font_size=20,
    xaxis=dict(
        rangeslider=dict(
            visible=False
        ))
)
fig.update_layout(layout)
fig.show()
This code produces the following chart — again launched in the local web browser:
These lines represent our support and resistance lines. However, there are some issues — both conceptual and practical — that we need to address:
- Can we combine lines between groups into a single line?
- Are the absolute minimum and absolute maximum good support and resistance lines?
- Did we choose an appropriate number of clusters?
Let’s address each of these in order.
Consolidating Boundary Lines
Figure 5 shows a horizontal line plotted for both the minimum and maximum of each cluster. However, it makes conceptual sense that a single line could separate each cluster such that the maximum value of cluster 1 (containing smaller price values) could be combined with cluster 2 (containing larger price values) as such:
Here the maximum value from cluster 1 is combined (averaged) with the minimum value from cluster 2 to form a single boundary between each cluster. Conceptually, this one would represent a resistance line for prices in cluster 1 and a support line for prices in cluster 2.
To apply this concept to our data we need to first ensure that the data is sorted such that the first min, max pair reflects the bottom-most cluster (cluster 1) on our chart and the last min, max pair reflects the top-most cluster (cluster 6) on our chart. This will ensure that the combination of lines happens only between neighboring clusters. The following code will combine the lines appropriately, first sorting to ensure accurate adjacency:
print("Initial Min/Max Values:\n", min_max_values)

# Create container for combined values
output = []

# Sort based on cluster minimum
s = sorted(min_max_values, key=lambda x: x[0])

# For each cluster, average its max with the next cluster's min
for i, (_min, _max) in enumerate(s):

    # Append min from first cluster
    if i == 0:
        output.append(_min)

    # Append max from last cluster
    if i == len(min_max_values) - 1:
        output.append(_max)

    # Append average from cluster and adjacent for all others
    else:
        output.append(sum([_max, s[i+1][0]]) / 2)

print("Sorted Min/Max Values:\n", output)

# Resulting output
Initial Min/Max Values:
 [[1929.819946, 7564.345215], [41247.824219, 51753.410156], [7679.867188, 14156.400391], [29445.957031, 39974.894531], [54771.578125, 65466.839844], [15455.400391, 26762.648438]]
Sorted Min/Max Values:
 [1929.819946, 7622.106201500001, 14805.900391, 28104.3027345, 40611.359375, 53262.4941405, 65466.839844]
To assess our results we will again plot these values as before, except we use our values from output and only call a single add_hline per group:
# Add horizontal lines
for cluster_avg in output:
    fig.add_hline(y=cluster_avg, line_width=1, line_color="blue")
This produces the following chart:
These lines look much cleaner and more conceptually appropriate. Now we can assess whether the absolute minima and maxima are useful.
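As an aside, averaging a cluster's maximum with its neighbor's minimum is not the only way to place a boundary. K-Means assigns each point to its nearest center, so in one dimension the dividing point between two adjacent clusters is the midpoint between their centers, and once the algorithm has converged that midpoint also falls in the gap between the two clusters. The sketch below shows this alternative. It is not the approach taken in this article, and the helper name `centroid_boundaries` and the sample centers are invented.

```python
import numpy as np


def centroid_boundaries(centers: np.ndarray) -> np.ndarray:
    """Midpoints between adjacent sorted centers of a 1-D clustering."""
    c = np.sort(np.asarray(centers, dtype=float).ravel())
    return (c[:-1] + c[1:]) / 2


# Made-up centers; in practice one could pass kmeans.cluster_centers_
print(centroid_boundaries(np.array([[9000.0], [4000.0], [20000.0]])))
```

This variant yields only the interior boundaries, so no absolute minimum or maximum line is produced.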
Assessing Absolute Minima/Maxima
The bottommost and topmost horizontal blue lines represent the smallest and largest recorded prices, respectively. These prices do not provide a useful measure of support and resistance. Arguably, the maximum value could play a role in the future but the price would need to re-enter that “cluster” for it to be relevant. To drop the first and last lines, we will alter our previous code to the following:
# Add horizontal lines
for cluster_avg in output[1:-1]:
    fig.add_hline(y=cluster_avg, line_width=1, line_color="blue")
Re-running the code from our original chart creation using this updated add_hline snippet, we produce the following chart:
This chart looks much more akin to those reflecting support and resistance lines. At first glance, however, these lines don’t appear to be placed in entirely useful locations. To illustrate this point, consider the same figure with manually placed support and resistance lines:
These are by no means perfect but seem to reflect areas in which price action has reversed on several occasions. Finding more appropriate boundary lines can be approached by using different cluster values.
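The notion that a good level is one where price has reversed several times can be turned into a rough score. The sketch below counts local highs and lows that occur within a tolerance band around a candidate level. It is a hypothetical heuristic introduced here for illustration: the function name `count_reversals_near`, the 3% tolerance, and the price path are all assumptions.

```python
def count_reversals_near(prices, level, tolerance=0.03):
    """Count local highs/lows that fall within `tolerance` of `level`.

    `tolerance` is a fraction of the level (0.03 = 3%), an arbitrary choice.
    """
    count = 0
    for prev, cur, nxt in zip(prices, prices[1:], prices[2:]):
        is_peak = cur > prev and cur > nxt
        is_trough = cur < prev and cur < nxt
        if (is_peak or is_trough) and abs(cur - level) <= tolerance * level:
            count += 1
    return count


# Made-up price path that turns around near 100 three times
prices = [90, 99, 95, 101, 96, 110, 100.5, 120]
print(count_reversals_near(prices, level=100))
```

Comparing this count for the K-Means boundaries against manually placed lines would give a crude, quantitative version of the visual comparison above.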
Determining the Right Number of Clusters
There are many ways in which one can approach optimizing the number of clusters chosen from a dataset. We will use the elbow method — but know there are plenty of other options. The elbow method involves calculating a KMeans clustering for a series of different k values.
For each iteration, the sum of the squares of the distance from each cluster point to the cluster center is calculated. Fortunately, this measure is provided by the scikit-learn KMeans class as the inertia_ property. We calculate each of these values as such:
# Create a list to contain output values
values = []

# Define a range of cluster values to assess
K = range(1, 10)

# Perform a clustering using each value, save inertia_ value from each
for k in K:
    kmeans_n = KMeans(n_clusters=k)
    kmeans_n.fit(btc_prices.reshape(-1, 1))
    values.append(kmeans_n.inertia_)

print(values)

# Output
[80453452183.20088, 10385720870.392435, 4899414919.598458,
 3200341434.3973494, 2038101574.9904993, 1230938654.2410448,
 913656416.2350059, 711089387.2186545, 573668652.8908113]
This provides a list of really big numbers from which we can approach our optimization. As described by the scikit-learn documentation for the KMeans class, the inertia_ property is:
[The] Sum of squared distances of samples to their closest cluster center, weighted by the sample weights if provided.
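To make that definition concrete, the unweighted case can be computed by hand in a few lines of NumPy. This is only an illustration of the formula on made-up numbers; the function name `inertia` is hypothetical and sample weights are ignored.

```python
import numpy as np


def inertia(values: np.ndarray, centers: np.ndarray) -> float:
    """Sum of squared distances from each value to its nearest center."""
    distances = np.abs(values[:, None] - centers[None, :])
    return float((distances.min(axis=1) ** 2).sum())


values = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
print(inertia(values, np.array([6.5])))        # one cluster
print(inertia(values, np.array([2.0, 11.0])))  # two clusters
```

Adding a second center collapses the total from 125.5 to 4.0, the same kind of steep early drop seen in the Bitcoin inertia values.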
Let’s now plot this new data to assess how we might begin approaching a method to programmatically select an optimal number of KMeans clusters. We’ll use a different graphing method here, still using Plotly:
import plotly.graph_objects as go

# Create initial figure
fig = go.Figure()

# Add line plot of inertia values
# (go.Scatter replaces the deprecated go.Trace)
fig.add_trace(go.Scatter(
    x=list(K),
    y=values,
    mode="lines",
    line_color="black",
    line_width=1
))

# Make it pretty
layout = go.Layout(
    plot_bgcolor='#efefef',
    showlegend=False,
    # Font Families
    font_family='Monospace',
    font_color='#000000',
    font_size=20,
    xaxis=dict(
        rangeslider=dict(
            visible=False
        ))
)
fig.update_layout(layout)
fig.show()
Using our list of k values as the x-axis and the resulting inertia_ values as the y, we get the following graph:
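The elbow can also be located numerically rather than by eye. One simple heuristic, among several, is to pick the k at which the curve bends most sharply, measured by the largest second difference of the inertia values. The sketch below applies that heuristic to the (rounded) inertia values printed earlier; the function name `elbow_k` is made up, and a fresh run would produce somewhat different inertia values.

```python
def elbow_k(ks, inertias):
    """Return the k where the inertia curve bends most sharply.

    Uses the largest second difference, one simple heuristic among several.
    """
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(inertias) - 1):
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


# Inertia values printed earlier, rounded, for k = 1..9
ks = list(range(1, 10))
inertias = [80453452183, 10385720870, 4899414920, 3200341434, 2038101575,
            1230938654, 913656416, 711089387, 573668653]
print(elbow_k(ks, inertias))
```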
This suggests that we should use a k-value of 2, though that hardly seems accurate. Nonetheless, let’s have a look:
This seems to be practically useless, producing only a single line somewhere in the $26k range. Clearly, the elbow method isn’t the best approach here. Rather than spraying and praying, let’s take a more manual approach and view the results of each of our potential k values ranging from 2 to 10 (1 being totally useless).
None of these are great. Notice that the lines representing cluster boundaries aren’t dispersed evenly; they tend to be added near areas in which less price volatility has been recorded over time. In other words, they land in price regions containing a greater density of recorded points.
Discussion
We’ve covered a fair amount of ground in this article so let’s recap. The following points of discussion have been addressed:
- Understanding support and resistance lines.
- Understanding trendlines.
- Using K-Means clustering to identify price levels at which support and resistance levels can be automatically calculated.
- Considering optimal ways in which to select the number of clusters.
- Visualizing results with Plotly
K-Means is a versatile clustering algorithm but lacks a clear directive for selecting an optimal number of clusters. As mentioned earlier, there are several common approaches to this optimization; the elbow approach tried here is only one of them.
Each approach has its merits and may or may not be appropriate for a given dataset. The elbow approach is a popular method of optimizing clustering, which is why it was selected here. However, it is by no means meant to be interpreted as the “best” k-means optimization technique.
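One well-known alternative, not used in this article, is the silhouette score, which rewards clusters that are tight and well separated. The sketch below uses scikit-learn's `silhouette_score` on synthetic data with three obvious groups; the helper name `best_k_by_silhouette`, the candidate range of k, and the data are assumptions, and nothing here implies it would fare better than the elbow method on the Bitcoin series.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(values: np.ndarray, k_range=range(2, 7)) -> int:
    """Return the k in `k_range` with the highest mean silhouette score."""
    X = values.reshape(-1, 1)
    scores = {}
    for k in k_range:
        # Fixed random_state so repeated runs give the same labels
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get)


# Synthetic prices with three obvious groups (not market data)
rng = np.random.default_rng(0)
values = np.concatenate([
    rng.normal(100, 1, 30),
    rng.normal(200, 1, 30),
    rng.normal(300, 1, 30),
])
print(best_k_by_silhouette(values))
```

Note that the silhouette score is undefined for a single cluster, so the search starts at k = 2.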
Final Thoughts
K-means clustering can help distill structure from seemingly random data. It has its limits and, as was the case here, doesn’t always produce jaw-dropping results. This type of partitioning can help group data into segments that represent novel behaviors or values but doesn’t always hit the mark. In the context of high-timeframe support and resistance calculations, it seems that K-Means is not a great solution.
There are many technical indicators used for algorithmic trading. These include the moving average convergence divergence (MACD), the stochastic oscillator, simple moving averages, and Bollinger Bands. Beyond single indicators, methods such as linear regression can be used to help predict stock prices as well. Conceivably, especially in the context of multivariate linear regression, support and resistance levels could be incorporated as meaningful features into a predictive model.
References
1. Bernstein, Daniel M., et al. “Hindsight bias and developing theories of mind.” Child Development, vol. 78, no. 4 (2007): 1374-94. doi:10.1111/j.1467-8624.2007.01071.x