A Study of Static and Dynamic Significance Weighting Multipliers on the Pearson Correlation for Collaborative Filtering



Okyay S., Aygün S.

European Journal of Science and Technology, no. Special Issue, pp. 270-275, 2020 (Peer-Reviewed Journal)

Abstract

Recommender systems, a field of data mining and knowledge discovery, have a tremendous impact on movie recommendation platforms. How well recommendations suit an audience, given user profiles, is measurable. Statistical analyses can be performed by inferring linear relationships in numerical data such as user ratings, and an item such as a movie is then recommended or not. The computed correlation, namely the similarity weight, should be adjusted before prediction by a further multiplier so that user similarities carry more effect. This method, called significance weighting, adds one more step to stress the impact of similarities. The affinity between users can simply be the total number of co-rated items, or it can be inferred through more complex computations. In this work, significance weighting for the Pearson correlation is examined comparatively. Both the ML100K and ML1M releases of the MovieLens dataset are used in the experiments. k-fold cross-validation is applied in a shifting fashion to increase the number of tests. After the Pearson correlation coefficients for user-user similarities are computed, the weights are adjusted for significance using three different approaches. Then, neighbors are sorted to choose the top-N closest users for the user under test. Based on the experimental results, an explicit method that uses only the co-rated item count is preferred over the other two techniques for its simplicity and performance. The plots in the experimental results section present accuracy and error metrics for the three significance weighting approaches. For the ML100K dataset in particular, the simple weighting method outperforms the others in terms of the error metrics.
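The pipeline described in the abstract (Pearson correlation over co-rated items, a significance multiplier, then top-N neighbor selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `cutoff` parameter and its default of 50, and the `min(n, cutoff) / cutoff` form of the multiplier are assumptions chosen to show one simple weighting based only on the co-rated item count.

```python
from math import sqrt

def pearson(ratings_a, ratings_b):
    """Pearson correlation over co-rated items. Returns (r, n_common)."""
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0, n  # correlation is undefined with fewer than 2 points
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0, n  # a constant rating vector has no defined correlation
    return cov / sqrt(var_x * var_y), n

def significance_weighted(r, n_common, cutoff=50):
    """Scale r by a multiplier that grows with the co-rated item count.

    The cutoff value and this particular form are illustrative assumptions.
    """
    return r * min(n_common, cutoff) / cutoff

def top_n_neighbors(target, ratings, n=2, cutoff=50):
    """Return the n users with the highest weighted similarity to target."""
    scored = []
    for user, items in ratings.items():
        if user == target:
            continue
        r, n_common = pearson(ratings[target], items)
        scored.append((significance_weighted(r, n_common, cutoff), user))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]

# Hypothetical toy ratings: {user: {item_id: rating}}
toy = {
    "u": {1: 5, 2: 3, 3: 1},
    "v": {1: 5, 2: 4, 3: 3},
    "w": {1: 1, 2: 3, 3: 5},
}
neighbors = top_n_neighbors("u", toy, n=1, cutoff=3)
```

With such a multiplier, two users who agree perfectly on only three movies receive a much smaller weight than two users who agree on fifty, which is the intuition behind stressing similarities backed by more shared evidence.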