Algorithms for searching and analysing information from open sources in the context of cyber threats
Oksana Onyshchuk, Lyudmila Hlynchuk
The article presented the development of an algorithm for searching and analysing information from open sources in the context of cyber threats. The proposed algorithm is a tool for detecting, monitoring, assessing and neutralising threats in the digital environment. The work described the main stages of the algorithm: quick access to relevant information, assessment of data reliability, trend analysis, identification of connections between objects, and prediction of potential threats. Developing the algorithm involved searching for and analysing information from open sources; filtering noise; applying contextual analysis and cross-checking to improve the reliability of results; and constructing relationship graphs to identify dependencies between objects and determine their potential danger. These tasks were implemented by collecting data through APIs (Application Programming Interfaces), web scraping (BeautifulSoup) and search operators, processing the data with NLP (Natural Language Processing) tools, and classifying it with machine learning models and regular expressions. The article analysed the information obtained using relationship graphs, identified key objects and evaluated the reliability of sources. Compared to manual methods, the developed algorithm reduced the time required for searching and analysing information, increased the relevance and accuracy of the data obtained, and provided effective support for cybersecurity decisions. As an example, the algorithm was applied to monitor suspicious job postings on the LinkedIn platform, where phishing ads containing invalid links or false information were detected. The use of the LinkedIn API and web scraping made it possible to automate the collection of job postings and compare them with a database of known phishing websites.
The implementation of such solutions helps prevent cyber threats and ensure security in the digital environment. The algorithm significantly improves the efficiency of working with open sources by automating the collection, processing, and analysis of data for further threat assessment.
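The job-posting example can be illustrated with a minimal sketch. It is not the authors' implementation: the blocklist, the regular expressions, the sample postings and the function names are all hypothetical, and Python's standard-library HTML parser stands in for BeautifulSoup so that the sketch has no external dependencies. It extracts link domains from posting HTML, compares them with a set of known phishing domains, applies regular expressions to the posting text, and builds a simple domain-to-posting relationship graph.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical blocklist standing in for a database of known phishing websites.
KNOWN_PHISHING_DOMAINS = {"secure-login-careers.example", "hr-verify.example"}

# Illustrative patterns for wording that might indicate a fraudulent posting (assumption).
SUSPICIOUS_PATTERNS = [
    re.compile(r"\bregistration fee\b", re.IGNORECASE),
    re.compile(r"\bsend (?:your )?(?:passport|bank) details\b", re.IGNORECASE),
]


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag and the visible text of the page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def analyse_posting(posting_id, html):
    """Return a report with linked domains, blocklist hits and regex hits."""
    parser = LinkExtractor()
    parser.feed(html)
    text = " ".join(parser.text_parts)
    # Relative links have no hostname and are dropped.
    domains = {urlparse(link).hostname for link in parser.links}
    domains.discard(None)
    phishing = sorted(d for d in domains if d in KNOWN_PHISHING_DOMAINS)
    hits = [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(text)]
    return {
        "id": posting_id,
        "domains": sorted(domains),
        "phishing_domains": phishing,
        "pattern_hits": hits,
        "suspicious": bool(phishing or hits),
    }


def build_domain_graph(reports):
    """Map each domain to the set of postings that link to it."""
    graph = defaultdict(set)
    for report in reports:
        for domain in report["domains"]:
            graph[domain].add(report["id"])
    return graph


# Made-up sample postings; in practice these would come from an API or a scraper.
POSTINGS = {
    "job-1": '<p>Data analyst role. <a href="https://hr-verify.example/apply">Apply</a></p>',
    "job-2": '<p>Remote assistant. A registration fee is required. '
             '<a href="https://hr-verify.example/form">Form</a></p>',
    "job-3": '<p>Backend developer. <a href="https://careers.example.org/42">Details</a></p>',
}

reports = [analyse_posting(pid, html) for pid, html in POSTINGS.items()]
graph = build_domain_graph(reports)

for report in reports:
    print(report["id"], "suspicious" if report["suspicious"] else "ok")
```

In this sketch a domain shared by several flagged postings shows up as a node with many neighbours in the graph, which is one simple way to surface key objects. The exact-match domain check is deliberately naive; it would not catch subdomains or look-alike domains.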