Buying a garage as an investment

This project was born out of a conversation with friends about investing in real estate. We discussed how profitable it is to buy an apartment, a parking space or a storage unit to rent out, and whether it is profitable at all.

I decided to analyze the market for the sale and rental of garages and parking spaces in my city. Apartments are too expensive as investment objects, whereas garages and parking spaces have a much lower entry threshold, and there always seems to be demand for renting them.

When evaluating an investment, two parameters always come to the fore: the amount invested (the cost of the object) and the rent. Knowing them, you can calculate the return on investment. There are, of course, other factors, such as the quality of the object, the level of rental demand and how the object's value changes over time, but it is the purchase price and the rental income that determine whether an object is worth considering at all.

Among the ads for sale, I wanted to find, on the one hand, objects whose price is somewhat underestimated for some reason: for example, the owner did not look at neighboring offers when listing the object, or needs to sell it urgently. On the other hand, I was interested in objects located in areas of high rental demand, where rental rates are high.

To solve this problem, I downloaded the ads for the sale and rent of garages and parking spaces in Yekaterinburg posted over the past month, parsed from several popular real estate sites. This produced two dataframes: objects for sale and objects for rent. Among other data, the datasets contained the following fields:

  • Name – the title assigned automatically by the ad platform when the ad is published

  • Price ('Цена') – the price set by the owner when submitting the ad

  • date – the date the ad was published

  • Description ('Описание') – the seller's free-form description of the object

  • lat – latitude of the object's location, in degrees

  • lng – longitude of the object's location, in degrees

  • Additional parameters ('Доп.параметры') – characteristics of the object compiled automatically by the ad platform from the data the seller enters when submitting the ad

What problems did I encounter during data processing?

1. The datasets contained many duplicates, for example because garage owners posted the same ad on several sites at once. Since they use the same description and set the same price, such ads can be recognized when both of these fields match. Duplicates are removed from the dataset:

# drop ads repeated across platforms: same description and same price
df_sel = df_sel.drop_duplicates(subset=['Описание', 'Цена'])

2. Some objects are double or even triple places, that is, they can accommodate two cars and, accordingly, can be rented to two tenants. The problem was that none of the object attributes indicated whether a place was single or double. The additional parameters did, however, include the area, so I had to extract it from there and store it in a separate column.

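# keep the additional-parameters text only where it mentions the area ('Площадь'); missing text counts as no match
df_sel['Доп.параметры'] = df_sel['Доп.параметры'].where(df_sel['Доп.параметры'].str.contains('Площадь', na=False), other=np.nan)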
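The line above only blanks out parameter strings that do not mention the area; the number itself still has to be pulled into its own column. A minimal sketch, assuming the parameter text looks something like 'Площадь: 18 м²' (the exact platform format and the new column name area are assumptions, and the dataframe here is a toy stand-in):

```python
import numpy as np
import pandas as pd

# toy stand-in for the sale dataframe; the strings imitate an assumed platform format
df_sel = pd.DataFrame({
    'Доп.параметры': ['Площадь: 18 м²', 'Тип: гараж', None, 'Площадь: 32,5 м²'],
})

# keep the text only where it mentions the area; na=False treats missing text as "no match"
params = df_sel['Доп.параметры'].where(
    df_sel['Доп.параметры'].str.contains('Площадь', na=False), other=np.nan)

# take the first number after 'Площадь' (comma or dot as decimal separator) as a float
df_sel['area'] = (params.str.extract(r'Площадь\D*(\d+(?:[.,]\d+)?)')[0]
                  .str.replace(',', '.', regex=False)
                  .astype(float))
print(df_sel['area'].tolist())
```

Ads without an area end up as NaN in the new column, which is exactly the group the next steps have to deal with.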

However, some of the advertisements did not have information about the area.

Although objects with an unknown area are relatively few, about 5%, we cannot simply discard them. Their presence or absence matters little for the market as a whole, but locally, within their microdistricts, including or excluding them can give misleading price signals. So we still need to determine which type (single or double) these objects belong to.

I assumed that a place is double if its area is greater than or equal to 30 m², and single if its area is less than 30 m². Let's see how the prices of single places, double places and places with an unknown area are distributed.
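The 30 m² rule can be expressed as a three-way label that keeps ads without an area separate, so they can be resolved later. A sketch on toy data; the column names area and slot_type are illustrative, not taken from the original notebook:

```python
import numpy as np
import pandas as pd

DOUBLE_AREA_M2 = 30  # threshold from the text: 30 m² or more counts as a double place

df = pd.DataFrame({'area': [18.0, 30.0, np.nan, 45.5, 29.9]})

# 'double' if area >= 30, 'single' if area < 30; NaN fails both tests and becomes 'unknown'
df['slot_type'] = np.select(
    [df['area'] >= DOUBLE_AREA_M2, df['area'] < DOUBLE_AREA_M2],
    ['double', 'single'],
    default='unknown',
)
print(df['slot_type'].tolist())  # ['single', 'double', 'unknown', 'double', 'single']
```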

As the graph shows, the price distribution for double places is shifted to the right, which is to be expected, since they are usually more expensive than single ones. The distribution of objects with an unknown area looks more like that of single objects. However, we cannot conclude that all objects with an unknown area are single places, because the distribution shows that there are some very expensive objects among them.

Therefore, double places among the objects with an unknown area have to be identified in a different way. Let's try to find the words typical of single and double objects in the owners' descriptions. For this we use the Counter class from the collections module and display the top 30 words the owners use to describe their objects, sorted in descending order of frequency.
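A minimal version of the word count with collections.Counter, which can be run separately over the descriptions of single and double places. The tokenization (lowercasing and splitting into runs of word characters) is an assumption, since the post does not say how the text was cleaned, and the two descriptions are made up:

```python
import re
from collections import Counter

def top_words(descriptions, n=30):
    """Return the n most frequent words across a list of ad descriptions."""
    counter = Counter()
    for text in descriptions:
        # lowercase and keep runs of word characters; \w also matches Cyrillic in Python 3
        counter.update(re.findall(r'\w+', text.lower()))
    return counter.most_common(n)

double_descriptions = [
    'Продам 2 машиноместа, есть смотровая яма',
    'Два места рядом, смотровая яма',
]
print(top_words(double_descriptions, n=3))
```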

This method does not separate objects by the number of places unambiguously, but the results are as expected. For example, "2" and "two" are the first and second most frequent words among double objects. Interestingly, third place goes to the word "inspection" (as in "inspection pit"), and inspection pits are indeed more common in large garages. For single objects, "inspection" is only in 27th place.

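Now let's compile a list of words and phrases that are most likely to appear in the descriptions of double places. I chose the following (in Russian, as they appear in the ads):

'2 машиноместа', '2 паркинга', '2 гаража', 'двойное', 'семейное', 'семейный', 'два', 'две', 'Два', 'места'

that is, "2 parking spaces", "2 parking lots", "2 garages", "double", "family" (two grammatical forms), "two" (three forms, including a capitalized one) and "places".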

Ads with an unknown area are then split into single and double ones by the following rule: if the description contains any word from the list above, the object is double; otherwise it is single. We then check whether the price distribution of ads with the newly assigned type matches that of objects whose area was known.
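The rule can be written as a substring check against the keyword list, applied only to ads whose type is still unknown. A sketch that reuses the list verbatim; the hypothetical slot_type column holds 'single', 'double' or 'unknown', and the plain case-sensitive substring match is an assumption (it would explain why both 'два' and 'Два' are in the list):

```python
import pandas as pd

# keyword list from the text; a description containing any of them marks a double place
DOUBLE_KEYWORDS = ['2 машиноместа', '2 паркинга', '2 гаража', 'двойное', 'семейное',
                   'семейный', 'два', 'две', 'Два', 'места']

def is_double(description: str) -> bool:
    """True if any double-place keyword occurs in the description (case-sensitive)."""
    return any(word in description for word in DOUBLE_KEYWORDS)

df = pd.DataFrame({
    'Описание': ['Семейное машиноместо, два въезда', 'Место в паркинге', 'Гараж 18 м²'],
    'slot_type': ['unknown', 'unknown', 'single'],
})

# only ads with an unknown area are re-labelled; the others keep their area-based type
unknown = df['slot_type'] == 'unknown'
df.loc[unknown, 'slot_type'] = df.loc[unknown, 'Описание'].map(
    lambda text: 'double' if is_double(text) else 'single')
print(df['slot_type'].tolist())
```

A plain substring test is crude (for example, 'два' also occurs inside longer words), which is why comparing the resulting price distributions with those of ads of known area is a useful sanity check.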

The graphs show that the distributions of ads with the newly assigned type and of ads whose type was already known are similar.

Based on the single/double attribute, we split all sale ads into two dataframes: single and double parking places.

As a result of this processing, we obtained the following statistics for the objects offered for sale:

We then apply the same steps to the rental ads and obtain the following statistics for them.

Search for investment objects

The task of identifying the most attractive objects for investment is as follows. Among all objects for sale, we need to find those located in areas of increased rental demand, i.e. where the highest rental rates are formed. If the ratio of an object's sale price to the average (or median) rent in its zone ensures a return on investment within a given timeframe, the object is a promising candidate for buying and renting out.

To solve this problem, we need to form rental zones (clusters) on a territorial basis. To build them, we take the dataframe of single places for rent and use the latitude and longitude of each object. The partitioning into clusters is done with the KMeans algorithm.

We choose the number of clusters that gives the most effective partitioning, judged by the silhouette_score metric. We try 50, 100, 150, 200, 250, 300, 350, 400, 450 and 500 clusters in a loop, and for each partition we compute silhouette_score and the cross-validation error best_cv_err. The best parameters can be found with the GridSearchCV class, which searches for the parameter set that minimizes the cross-validation error.

After each partition, we check the result on a validation set. As the model we use the k-nearest neighbors classifier KNeighborsClassifier, first selecting its best n_neighbors parameter, i.e. the number of neighbors the model uses to decide which cluster an object belongs to. From the validation results we compute R² and accuracy.
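The selection loop can be sketched with scikit-learn roughly as follows. This is a reconstruction under assumptions: synthetic coordinates stand in for the rental ads, the cluster counts and the n_neighbors grid are shortened so the example runs quickly, and the 5-fold cross-validation and 80/20 split are illustrative choices rather than settings confirmed by the post:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score, silhouette_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# synthetic stand-in for the (lat, lng) of rental ads around Yekaterinburg
X = np.column_stack([rng.normal(56.84, 0.05, 600), rng.normal(60.61, 0.08, 600)])

results = []
for n_clusters in [10, 20, 30]:  # the post scans 50..500 in steps of 50
    labels = KMeans(n_clusters=n_clusters, random_state=0, n_init=10).fit_predict(X)
    sil = silhouette_score(X, labels)

    X_train, X_val, y_train, y_val = train_test_split(X, labels, test_size=0.2, random_state=0)
    # choose n_neighbors by cross-validation; best_cv_err = 1 - best mean CV accuracy
    grid = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': [1, 3, 5]}, cv=5)
    grid.fit(X_train, y_train)
    best_cv_err = 1 - grid.best_score_
    val_acc = accuracy_score(y_val, grid.predict(X_val))

    results.append((n_clusters, sil, best_cv_err, grid.best_params_['n_neighbors'], val_acc))

for row in results:
    print('k={} silhouette={:.3f} cv_err={:.3f} n_neighbors={} val_acc={:.3f}'.format(*row))
```

Collecting the rows in a list makes it easy to turn them into the summary table and plot them against the number of clusters.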

The results are summarized in a table and displayed on a graph.

The most effective clustering is achieved with 150 clusters: both accuracy and silhouette_score are close to their maximum at this value. We divide the single rental places into 150 clusters.

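from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# X_sc: standardized lat/lng of the rental ads (assumed; the scaling step is not shown in the original)
X_sc = StandardScaler().fit_transform(df_rent_single_slot[['lat', 'lng']])
km = KMeans(n_clusters=150, random_state=0)  # 150 clusters; a fixed random_state makes the result reproducible
labels = km.fit_predict(X_sc)  # fit the algorithm and get the cluster label of each object
# store the cluster labels in a column of our dataset
df_rent_single_slot['rent_cluster'] = labels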

We plot the clusters on the map, colored by the average rent within each cluster. For the map of Yekaterinburg we use a shapefile and the GeoPandas library. The geographic center of each cluster is the mean latitude and longitude of the objects in it. The objects themselves, parking places for rent, are shown as red dots.
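The cluster table behind the map (center as the mean latitude and longitude, color as the mean rent) comes down to a single groupby. A sketch on a toy rental dataframe; the rent column is assumed to be 'Цена', as in the sale data, and the output column names are illustrative:

```python
import pandas as pd

df_rent_single_slot = pd.DataFrame({
    'lat': [56.80, 56.82, 56.90, 56.92],
    'lng': [60.60, 60.62, 60.50, 60.54],
    'Цена': [3000, 4000, 2000, 2500],
    'rent_cluster': [0, 0, 1, 1],
})

# cluster center = mean lat/lng of its objects; the mean rent drives the color scale
clusters = df_rent_single_slot.groupby('rent_cluster').agg(
    center_lat=('lat', 'mean'),
    center_lng=('lng', 'mean'),
    mean_rent=('Цена', 'mean'),
    n_ads=('Цена', 'size'),
)
print(clusters)
```

The resulting centers and mean rents can then be drawn over the city shapefile with GeoPandas.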

Now that the clusters are defined, let's build a model that determines, from an object's coordinates, which cluster it belongs to.

We again use the k-nearest neighbors classifier KNeighborsClassifier. The n_neighbors parameter is set to 1, the value that gives the smallest cross-validation error. Let's check the error on the test sample for this value:

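from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# build the training and validation sets from the coordinates and the cluster labels
X_train, X_test, y_train, y_test = train_test_split(df_rent_single_slot[['lat', 'lng']],
                                                    df_rent_single_slot['rent_cluster'],
                                                    test_size=0.2)
knn = KNeighborsClassifier(n_neighbors=1)  # one neighbor gave the lowest cross-validation error
knn.fit(X_train, y_train)
validates = knn.predict(X_test)
# R² treats cluster IDs as numbers; accuracy is the metric that matters for class labels
print('R2: {:.2f}'.format(r2_score(y_test, validates)))
print('Accuracy: {:.2f}'.format(accuracy_score(y_test, validates)))
R2: 0.90
Accuracy: 0.96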

Given an object's coordinates, the model assigns it to the correct cluster with 96% accuracy on the validation set.

For each object in the df_sel_single_slot dataset (single parking places for sale), we determine which cluster it belongs to, and do the same for double places. We then compute the payback period in months for each object for sale: its price divided by the average rent in its cluster (for double places, the price is divided by twice the average rent in the cluster). The distribution of objects by payback period is as follows.
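A sketch of the payback calculation, following the same idea as the classifier above: a nearest-neighbor model assigns each sale object to a rental cluster, and the price is divided by the cluster's mean rent (twice the mean rent for double places). The toy data and the column names slot_type and payback_months are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# toy rental data: coordinates, monthly rent and the cluster each ad was assigned to
rent = pd.DataFrame({'lat': [56.80, 56.81, 56.90, 56.91],
                     'lng': [60.60, 60.61, 60.50, 60.51],
                     'Цена': [3000, 4000, 2000, 2000],
                     'rent_cluster': [0, 0, 1, 1]})
knn = KNeighborsClassifier(n_neighbors=1).fit(rent[['lat', 'lng']], rent['rent_cluster'])
mean_rent = rent.groupby('rent_cluster')['Цена'].mean()  # average monthly rent per cluster

# toy sale data: asking price and single/double type
sale = pd.DataFrame({'lat': [56.805, 56.905], 'lng': [60.605, 60.505],
                     'Цена': [210000, 300000], 'slot_type': ['single', 'double']})

# assign each sale object to a rental cluster by its coordinates
sale['rent_cluster'] = knn.predict(sale[['lat', 'lng']])
# a double place is assumed to bring in two single rents
places = np.where(sale['slot_type'] == 'double', 2, 1)
sale['payback_months'] = sale['Цена'] / (sale['rent_cluster'].map(mean_rent) * places)
print(sale[['rent_cluster', 'payback_months']])
```

Here the single place pays off in 210000 / 3500 = 60 months and the double one in 300000 / (2 × 2000) = 75 months.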

Now that we know how the rental objects are distributed across clusters, we compile a table of cluster characteristics. We are interested in each cluster's investment attractiveness. Let us assume that a cluster is more attractive the higher the rent in it and the shorter the time rental ads stay listed there. The reasoning is that if rents in a cluster are high and objects are rented out quickly, demand for parking rentals there is elevated.

We assign a rating to each cluster. To calculate it, the clusters' average rents are normalized to the range from 0 to 1, where 0 is the lowest rent and 1 the highest. The median listing times are normalized the same way, with 0 for the longest listing time and 1 for the shortest. The two values are summed to form the rating. Since the rating lies between 0 and 2, we consider the "best" clusters to be those with a rating above 1.
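The rating can be computed with min-max scaling. In this sketch the listing time of each rental ad is assumed to be available as days_listed, a hypothetical column (the post does not say how listing time was measured). The mean rent and the median listing time per cluster are scaled to 0..1, with the time axis inverted so that faster rentals score higher:

```python
import pandas as pd

def min_max(s: pd.Series) -> pd.Series:
    """Scale a series linearly so that its minimum becomes 0 and its maximum becomes 1."""
    return (s - s.min()) / (s.max() - s.min())

rent = pd.DataFrame({'rent_cluster': [0, 0, 1, 1, 2, 2],
                     'Цена': [4000, 4000, 3000, 3000, 2000, 2000],
                     'days_listed': [10, 14, 30, 30, 60, 40]})  # hypothetical listing times

clusters = rent.groupby('rent_cluster').agg(mean_rent=('Цена', 'mean'),
                                            median_days=('days_listed', 'median'))
# 1 = highest rent; 1 = shortest listing time (hence 1 minus the scaled value)
clusters['rating'] = min_max(clusters['mean_rent']) + (1 - min_max(clusters['median_days']))
best_clusters = clusters[clusters['rating'] > 1].index.tolist()
print(clusters)
print(best_clusters)
```

Min-max scaling divides by the spread of the values, so it assumes that not all clusters share the same mean rent or the same median listing time.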

Let's exclude objects such as vegetable stores and metal garages from the candidates, since they are hard to rent out. We then select objects with a payback period of no more than 60 months and sort them in descending order of cluster rating.
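The final filter and ordering might look like this. The exclusion substrings for vegetable stores and metal garages are hypothetical examples (the post does not list the exact words used), and payback_months and cluster_rating are illustrative column names:

```python
import pandas as pd

MAX_PAYBACK_MONTHS = 60
# hypothetical substrings marking hard-to-rent objects (vegetable stores, metal garages)
EXCLUDE = ['овощ', 'металлическ']

sale = pd.DataFrame({
    'Описание': ['Машиноместо в паркинге', 'Овощная яма', 'Металлический гараж',
                 'Гараж кирпичный', 'Бокс в ГСК'],
    'payback_months': [48, 20, 30, 75, 55],
    'cluster_rating': [1.2, 1.5, 1.8, 1.9, 1.6],
})

# case-insensitive check of each description against the exclusion list
pattern = '|'.join(EXCLUDE)
rentable = ~sale['Описание'].str.lower().str.contains(pattern, regex=True)
candidates = (sale[rentable & (sale['payback_months'] <= MAX_PAYBACK_MONTHS)]
              .sort_values('cluster_rating', ascending=False))
print(candidates)
```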

As a result, we obtained the most promising objects for investment: 44 single and 9 double ones. Let's show them on the map together with the best clusters.

Analysis of individual objects

Now we can take any of the selected objects and analyze it separately. For example, consider object No. 42. In its cluster, 5 other objects are for sale and 3 are offered for rent; the latter are potential competitors.

Let’s consider these objects on a map fragment.

In addition, let's see how the object of interest compares with its neighbors in terms of payback period and area.

If we decide to invest in this object, we assume we will rent it out at the average price in the cluster (the dotted line on the graph). Let's see how it compares with competitors in terms of price and area.

Conclusions

As a result of this work, 53 objects of interest for investment were selected out of 2,200 ads for the sale of garages and parking places. This greatly simplifies choosing an object to invest in, given that the selection favors objects that pay off as quickly as possible and are in demand on the rental market.

In addition to the main task, we identified areas of the city where the supply of parking for rent is short and rental prices are high. This information could be useful to developers in setting the sale price of their properties.

In addition, we received a tool for a brief visual analysis of a particular object.
