<html><head><meta content="text/html; charset=UTF-8" http-equiv="content-type"><style type="text/css">ol{margin:0;padding:0}table td,table th{padding:0}.c1{padding-top:0pt;padding-bottom:0pt;line-height:1.5;orphans:2;widows:2;text-align:left;height:11pt}.c0{color:#000000;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:11pt;font-family:"Arial";font-style:normal}.c8{padding-top:0pt;padding-bottom:0pt;line-height:1.5;orphans:2;widows:2;text-align:center;height:11pt}.c10{color:#000000;font-weight:700;text-decoration:none;vertical-align:baseline;font-size:12pt;font-family:"Arial";font-style:normal}.c9{padding-top:0pt;padding-bottom:0pt;line-height:1.5;orphans:2;widows:2;text-align:center}.c4{padding-top:0pt;padding-bottom:0pt;line-height:1.5;orphans:2;widows:2;text-align:left}.c2{padding-top:0pt;padding-bottom:0pt;line-height:1.15;orphans:2;widows:2;text-align:left}.c6{color:#000000;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:11pt;font-family:"Arial"}.c7{text-decoration-skip-ink:none;-webkit-text-decoration-skip:none;color:#1155cc;text-decoration:underline}.c3{background-color:#f3cfb5;max-width:468pt;padding:72pt 72pt 72pt 
72pt}.c11{color:inherit;text-decoration:inherit}.c5{font-style:italic}.title{padding-top:0pt;color:#000000;font-size:26pt;padding-bottom:3pt;font-family:"Arial";line-height:1.15;page-break-after:avoid;orphans:2;widows:2;text-align:left}.subtitle{padding-top:0pt;color:#666666;font-size:15pt;padding-bottom:16pt;font-family:"Arial";line-height:1.15;page-break-after:avoid;orphans:2;widows:2;text-align:left}li{color:#000000;font-size:11pt;font-family:"Arial"}p{margin:0;color:#000000;font-size:11pt;font-family:"Arial"}h1{padding-top:20pt;color:#000000;font-size:20pt;padding-bottom:6pt;font-family:"Arial";line-height:1.15;page-break-after:avoid;orphans:2;widows:2;text-align:left}h2{padding-top:18pt;color:#000000;font-size:16pt;padding-bottom:6pt;font-family:"Arial";line-height:1.15;page-break-after:avoid;orphans:2;widows:2;text-align:left}h3{padding-top:16pt;color:#434343;font-size:14pt;padding-bottom:4pt;font-family:"Arial";line-height:1.15;page-break-after:avoid;orphans:2;widows:2;text-align:left}h4{padding-top:14pt;color:#666666;font-size:12pt;padding-bottom:4pt;font-family:"Arial";line-height:1.15;page-break-after:avoid;orphans:2;widows:2;text-align:left}h5{padding-top:12pt;color:#666666;font-size:11pt;padding-bottom:4pt;font-family:"Arial";line-height:1.15;page-break-after:avoid;orphans:2;widows:2;text-align:left}h6{padding-top:12pt;color:#666666;font-size:11pt;padding-bottom:4pt;font-family:"Arial";line-height:1.15;page-break-after:avoid;font-style:italic;orphans:2;widows:2;text-align:left}</style></head><body class="c3 doc-content"><p class="c2"><span class="c7"><a class="c11" href="https://www.google.com/url?q=https://diogo-code.github.io/&sa=D&source=editors&ust=1681469973674303&usg=AOvVaw2kReRVSHBsnGZo8T_IW79_">Return to main page</a></span></p><p class="c1"><span class="c0"></span></p><p class="c9"><span class="c10">Final Design Implementation</span></p><p class="c4"><span class="c5">Overcoming Search Limitations:</span><span class="c0"> To overcome search 
limitations, we split the web scraping tasks across two separate domains with different IP addresses to avoid triggering search limits. This reduces the chance of being blocked or restricted by Google.</span></p><p class="c1"><span class="c0"></span></p><p class="c4"><span class="c5">Writing output errors into a log file: </span><span class="c0">The log file provides a documented record of the errors encountered during the web scraping process. This helps us demonstrate accountability to 6Grain, who require insight into the performance and reliability of our web scraper. Additionally, websites often update their structure, which can break the web scraper. The error log helps us identify when a website's structure has changed, so we can quickly update the code to match the new structure and continue scraping data.</span></p><p class="c1"><span class="c0"></span></p><p class="c4"><span class="c5">Saving Scraped Websites:</span><span class="c0"> Saving the websites we have already scraped lets us avoid redundant scraping operations, reducing the time and resources spent on extracting the same information. 
This makes the web scraping process more efficient and cost-effective.</span></p><p class="c8"><span class="c10"></span></p><p class="c9"><span class="c10">Document Chapter 5</span></p><p class="c4"><span class="c5">Challenges: </span><span class="c0">The main challenges we faced with the code were handling duplicates, getting around HTTPS limits, and filtering which documents to download.</span></p><p class="c1"><span class="c0"></span></p><p class="c4"><span class="c5">Future Work: </span><span class="c0">Adding alternative languages to the search tags could broaden the data collection by helping the web scraper gather a wider range of data from diverse sources.</span></p><p class="c1"><span class="c5 c6"></span></p><p class="c4"><span class="c5">Project Importance: </span><span class="c0">The project supports data-driven decision-making, supply chain efficiency, and climate change adaptation. By harnessing this data, farmers can make better-informed decisions and promote more sustainable and productive agricultural practices.</span></p><p class="c1"><span class="c0"></span></p></body></html>