Question: I am checking the response status of a number of websites and exporting the results to a CSV file. A couple of the websites fail with DNSLookupError or NO WEBSITE FOUND, and nothing is stored in the CSV file for them. How can I ...
Question: I’m trying to scrape a site that partially renders its content using JS. I went ahead and found this project: https://github.com/scrapinghub/sample-projects/tree/master/splash_smart_proxy_manager_example, which explains quite neatly how to set things up. Here’s what I have right now: Docker compose: spider: And ...
Question: Is there any way to get around this error? I am using Splash to grab the HTML, but the returned response.body gives me an access denied error. I can view the data in the Chrome developer tools, but the HTML is ...
Question: I am trying to scrape products on Amazon’s Best Seller 100 for a particular category (for example, https://www.amazon.com/Best-Sellers-Home-Kitchen/zgbs/home-garden/ref=zg_bs_nav_0). The 100 products are divided into two pages with 50 products on each page. Earlier, the page was static and ...
Question: While learning Splash, I tried to crawl https://www.lazada.com.my/ using Splash with Scrapy and received a 504 gateway error. Could you help me, please? Splash is running in a Docker container on port 8050. spider ...
Question: I am trying to run a Scrapy script with Splash, as I want to scrape a JavaScript-based webpage, but I get no results. When I execute this script with the python command, I get this error: crochet._eventloop.TimeoutError. In addition, the ...