Received 03.09.2024, Revised 26.11.2024, Accepted 26.12.2024

Optimising fuzzy hash function parameters for ensuring compliance with Open Data Regulations

Leonid Maidanevych, Natalia Kondratenko, Vitaliy Kazmirevsky

This study investigated the parameters of a fuzzy hash function with the aim of making the detection of similar text fragments across web resources more efficient and accurate when monitoring official government websites for compliance with the Regulation on Open Data. Three key parameters were assessed: block size, prime number base, and modulus. A series of experiments generated hash values for text data under different combinations of these parameters. The results showed which combinations give the best balance between accuracy, completeness, F-measure, and execution time. Specific configurations significantly improved the accuracy of the algorithm while minimising computational cost, which is particularly important for real-time data analysis. Optimising the parameters was found to reduce false positives and false negatives, both common problems in similarity detection, so that text fragments are compared more precisely and in less time. This makes the fuzzy hashing algorithm well suited to automated systems that monitor government websites for compliance with open data regulations. Parameter optimisation also decreased the number of duplicate records, which is especially relevant for ensuring that open data meets legislative requirements. The conclusions can be applied to the development of software tools that efficiently identify deficiencies and improve transparency and legal compliance. The findings can also inform further optimisation of fuzzy hash function algorithms and thereby advance data monitoring technologies for regulatory compliance. Overall, the study shows that careful selection of fuzzy hash function parameters can substantially improve the efficiency and reliability of open data analysis.
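The abstract names block size, prime number base, and modulus as the tuned parameters but does not state the hash function itself. As a minimal sketch, assuming a polynomial (Rabin-Karp style) rolling hash over fixed-size character blocks and a Jaccard comparison of the resulting hash sets, the roles of the three parameters could look as follows. The function names, the Jaccard measure, and the F-measure helper are illustrative assumptions, not the authors' implementation.

```python
def block_hashes(text: str, block_size: int, base: int, modulus: int) -> set[int]:
    """Polynomial rolling hash of every block_size-character window (assumed scheme)."""
    codes = [ord(ch) for ch in text]
    if block_size <= 0 or len(codes) < block_size:
        return set()
    # Weight of the character leaving the window: base^(block_size - 1) mod modulus.
    high = pow(base, block_size - 1, modulus)
    h = 0
    for c in codes[:block_size]:
        h = (h * base + c) % modulus
    hashes = {h}
    for i in range(block_size, len(codes)):
        # Drop the oldest character, shift by one position, append the new character.
        h = ((h - codes[i - block_size] * high) * base + codes[i]) % modulus
        hashes.add(h)
    return hashes


def similarity(a: str, b: str, block_size: int, base: int, modulus: int) -> float:
    """Jaccard similarity of the two block-hash sets, in the range [0, 1]."""
    ha = block_hashes(a, block_size, base, modulus)
    hb = block_hashes(b, block_size, base, modulus)
    if not ha and not hb:
        return 0.0
    return len(ha & hb) / len(ha | hb)


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, computed from match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

In this sketch a small modulus makes unrelated blocks collide more often, which would show up as false positives, while a large block size makes the comparison sensitive to minor edits, which would show up as false negatives. A parameter study of the kind described above could score each (block size, base, modulus) combination with `f_measure` on labelled text pairs and record the run time alongside it.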

fuzzy hash function parameters; website monitoring; government electronic resources; algorithm accuracy; optimisation parameters; similarity detection; violation of provisions
65-76
Maidanevych, L., Kondratenko, N., & Kazmirevsky, V. (2024). Optimising fuzzy hash function parameters for ensuring compliance with Open Data Regulations. Information Technologies and Computer Engineering, 21(3), 65-76. https://doi.org/10.63341/vitce/3.2024.65
