Method of multi-purpose term search in the terminology database

Andrii Yarovyi; Dmytro Kudriavtsev

https://doi.org/10.63341/vitce/3.2024.20

From Vol. 21, No. 3, 2024

Received 02.09.2024, Revised 22.11.2024, Accepted 26.12.2024

The article examines a method of multi-purpose term search in a terminological knowledge base, built on semantic analysis and modern natural language processing methods. It considers the key factors that affect search efficiency: the structure of data organisation, the format and parameters of the data, and the sample size. Particular attention is paid to semantic similarity between terms, which improves search accuracy through vector representations and the Louvain algorithm. The article also describes the use of cosine similarity to quantify the similarity between terms. In addition, the search process is optimised by filtering relevant databases and dynamically identifying relevant terms with the modularity metric. A comparative analysis of existing term search methods was carried out against the identified factors. The advantages and disadvantages of the Louvain algorithm compared with search algorithms for graph data structures are noted. A series of experiments was run on data samples covering dictionary, graph, and network data structures. The use of logistic constraints for search in network data structures is analysed, and the possibility of optimisation through uniform and dynamic data distribution is noted. The experimental results showed the effectiveness of combining the Louvain algorithm with network data structures in terminological knowledge bases. Examples of application areas of the method in information technologies for text search and processing are given. A software architecture scheme was developed that uses an application programming interface and can be integrated into web applications as a package or library. The proposed approach proved effective in the context of intelligent decision support systems and automated chatbots, which makes it particularly useful in fields where access to precise domain-specific terms is critical.
A basic version of the application programming interface was developed so that the method can be used in information technologies for data search and analysis, including search engines.
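
The roles of cosine similarity and the modularity metric mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the term vectors are invented toy values, the function names (`cosine_similarity`, `similarity_graph`, `modularity`) and the 0.3 threshold are hypothetical, and the Louvain algorithm itself is not run. The code only computes the modularity score that Louvain tries to maximise, for two hand-picked groupings of terms.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_graph(vectors, threshold):
    """Weighted undirected edges between terms whose similarity reaches the threshold."""
    edges = {}
    for a, b in combinations(sorted(vectors), 2):
        weight = cosine_similarity(vectors[a], vectors[b])
        if weight >= threshold:
            edges[(a, b)] = weight
    return edges


def modularity(edges, communities):
    """Newman modularity: sum over communities of (w_in / m) - (deg / 2m)^2."""
    m = sum(edges.values())  # total edge weight in the graph
    if m == 0.0:
        return 0.0
    label = {term: i for i, group in enumerate(communities) for term in group}
    internal = [0.0] * len(communities)  # edge weight inside each community
    degree = [0.0] * len(communities)    # summed weighted degree per community
    for (a, b), weight in edges.items():
        degree[label[a]] += weight
        degree[label[b]] += weight
        if label[a] == label[b]:
            internal[label[a]] += weight
    return sum(w / m - (d / (2.0 * m)) ** 2 for w, d in zip(internal, degree))


# Assumption: made-up 2-D embeddings standing in for real term vectors.
TERM_VECTORS = {
    "graph": (1.0, 0.1),
    "network": (0.9, 0.2),
    "verb": (0.1, 1.0),
    "noun": (0.2, 0.9),
}

edges = similarity_graph(TERM_VECTORS, threshold=0.3)
thematic = [{"graph", "network"}, {"verb", "noun"}]  # terms grouped by topic
mixed = [{"graph", "verb"}, {"network", "noun"}]     # topics deliberately mixed

print("thematic grouping:", round(modularity(edges, thematic), 3))
print("mixed grouping:   ", round(modularity(edges, mixed), 3))
```

The grouping that keeps semantically close terms together scores higher, which is the signal a modularity-maximising procedure such as Louvain would follow when it clusters terms.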

terminological knowledge base; semantic similarity; Louvain algorithm; vector representations; natural language processing

Pages 20-28

Yarovyi, A., & Kudriavtsev, D. (2024). Method of multi-purpose term search in the terminology database. Information Technologies and Computer Engineering, 21(3), 20-28. https://doi.org/10.63341/vitce/3.2024.20
