DEVELOPMENT OF A PROGRAM FOR COLLECTING DATA ON THE STRUCTURE OF WEBSITES
References
Hyperlink. Wikipedia. URL: https://en.wikipedia.org/wiki/Hyperlink (accessed: 15.04.2016).
Gorbunov A. L. Markovskie modeli poseschaemosti veb-saitov [Markov models of website visitation]. Internet-matematika 2007: sb. rabot uchastnikov konkursa nauchnyh proektov po informacionnomu poisku [Internet Mathematics 2007: Proceedings of the contest of scientific projects in information retrieval]. 2007. P. 65–73.
Levitin A. V. Algoritmy. Vvedenie v razrabotku i analiz [Algorithms. An introduction to design and analysis]. Moscow: Vil’ams, 2006. 576 p.
Pechnikov A. A., Chernobrovkin D. I. Ob issledovanijah web-grafa saita [On research into the web graph of a site]. Upravlenie v tehnicheskih, ergaticheskih, organizacionnyh i setevyh sistemah: materialy konferencii [Control in technical, ergatic, organizational and network systems: Conference proceedings]. 2012. P. 1069–1072.
An Extended Standard for Robot Exclusion. URL: http://www.conman.org/people/spc/robots2.html (accessed: 16.04.2016).
Boost Multi-index Containers Library. URL: http://www.boost.org/doc/libs/1_58_0/libs/multi_index/doc (accessed: 17.04.2016).
Baeza-Yates R., Castillo C. Crawling the Infinite Web: Five Levels are Enough. Lecture Notes in Computer Science. Algorithms and Models for the Web-Graph, Third International Workshop. 2004. Vol. 3243. P. 156–167.
ComparseR – specialized software. URL: http://parser.alaev.info (accessed: 15.04.2016).
Gephi – The Open Graph Viz Platform. URL: https://gephi.org (accessed: 14.04.2016).
HTTP/1.1: Header field definitions. URL: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html (accessed: 17.04.2016).
HTTP 300 Status Codes. URL: http://developer.att.com/application-resource-optimizer/docs/best-practices/http-300-status-codes (accessed: 25.04.2016).
Liu Y., Ma Z. M., Zhou C. Web Markov Skeleton Processes and Their Applications. Tohoku Mathematical Journal. 2011. No. 63. P. 665–695.
Pant G., Srinivasan P., Menczer F. Crawling the Web. In Web Dynamics. M. Levene and A. Poulovassilis, eds. Springer, 2004. P. 153–178.
Qt – Home. URL: http://www.qt.io (accessed: 04.2016).
RFC 1738 – Uniform Resource Locators. URL: https://tools.ietf.org/html/rfc1738 (accessed: 25.04.2016).
RFC 3986 – Uniform Resource Identifier. URL: https://tools.ietf.org/html/rfc3986 (accessed: 16.04.2016).
Schonfeld U., Bar-Yossef Z., Keidar I. Do not crawl in the dust: different URLs with similar text. ACM Transactions on the Web. 2009. Vol. 3, no. 1. P. 111–131.
Screaming Frog. URL: https://www.screamingfrog.co.uk/seo-spider (accessed: 25.04.2016).
Status codes in HTTP. URL: https://www.w3.org/Protocols/HTTP/HTRESP.html (accessed: 04.2016).
Web crawler. Open-source crawlers. URL: https://en.wikipedia.org/wiki/Web_crawler#Open-source_crawlers (accessed: 14.04.2016).
DOI: http://dx.doi.org/10.17076/mat381
© Transactions of the Karelian Research Centre of the Russian Academy of Sciences, 2014-2019