Channel: Active questions tagged web-scraping - Stack Overflow
Browsing latest articles
Browse All 969 View Live

Web scraping of the disciplinary assignment of university courses

I would like to scrape a university-course catalog with R. My code is already quite good, but the assignment of courses to disciplines and subdisciplines does not yet work the way I want it to. This is...




Web Scraping with Selenium and Python

I have been struggling with web scraping using Selenium. The target website contains a responsive table from which I have to gather the data. The HTML looks like this (please forgive me for...


How to bypass Cloudflare verification in scrapy-selenium?

I am trying to scrape professional numbers from a French website, but I get a 403 error and am blocked by Cloudflare. I use Selenium and Scrapy. I added the Scrapy Cloudflare middleware, but it still does...


How to disable autocorrect in the Google Search Python API

I am trying to search certain prompts on Google using its Python library. However, I am getting a response after autocorrect. Is there any way to disable autocorrect while using the Google Search API like...


Is there a way to see if a website is static or dynamic programmatically [closed]

I'm currently deep in the development of a scraping API, and just yesterday, I encountered a rather perplexing dilemma. As I delved into the intricacies of Puppeteer, I noticed that certain requests...


More elements with the same id, how to scrape

I'm scraping for a uni project and want to scrape the genres of TV shows from IMDb. The image shows the HTML. I used the following code: url1 =...


Puppeteer hangs on waitForSelector on production (Google Maps)

I'm using Puppeteer to pull data from Google Maps. It works perfectly on my local machine; however, after deploying to production I ended up with the following error: Error during scraping: ProtocolError:...


Python requests-html is not downloading Chromium

import requests
from bs4 import BeautifulSoup
from requests_html import HTMLSession
url="https://dmarket.com/ingame-items/item-list/csgo-skins?title=recoil%20case"
sesion = HTMLSession()
response =...



Unable to locate element when the webpage navigates from one section to...

<ul><li class="first current"><a class="jump"> Crop</a></li><li class=""><a class="jump"> Soil</a></li><li class=""><a class="jump">...



Amazon Product API: Obtaining PPC Keywords by ASIN

I am trying to find PPC keywords by ASIN. Some tools do this, and when I tried one, it worked for every ASIN. These tools give details like the following: PPC keyword, History highest...


Updating foreign keys MySQL

I am currently working on a web scraper that takes Premier League data and imports it into a MySQL database, but I am having issues updating foreign keys in some of the tables. For some context, I have...


How can I fix this problem in my Instagram API app?

I'm trying to make an Instagram scraper
import httpx
url = "https://i.instagram.com/api/v1/users/search/?timezone_offset=10800&q=cris&count=50"
headers = {"X-Pigeon-Session-Id":...


pyinstaller ModuleNotFoundError: 'fake_useragent.data'

First of all, thanks for coming. I'm doing web scraping and packaging it into an .exe file. The code works well as a script, but not as an .exe file.
What I want: a working .exe file that avoids bot detection.
What I'm doing: I'm making...



Trying to extract article text from a list of urls

import requests
from bs4 import BeautifulSoup
urls = df.loc[:,"news_url"].tolist()
for url_index in len(urls):
    req = requests.get(urls[url_index])
    soup =...


Puppeteer: cannot get response from popup

I'm using Puppeteer to scrape PDFs from a website. When I find a certain selector, I have to .click() on it so that the PDF is somehow generated and displayed in a popup window. The first problem is that...



How to send a POST request with headers and payload in Scrapy

I am trying to send a POST request to the Graph API and have succeeded, but I want to send the same request in Scrapy and I don't know how to send a POST request in Scrapy with headers and...


How to use Selenium for Opta data?

I'm trying to get Opta event data from a football (soccer) match. It has a link address, but when I try to access it I just get a 10300 error code. The only place I can view the data is under 'Sources'...



Scrape Google Knowledge Graph with RSelenium

I am trying to access the elements on the right-hand side of the Google search results, sometimes called the knowledge graph. In particular I am interested in the short bio (normally a Wikipedia snippet) and the external...


Avoid CHALLENGE Url from LinkedIn Voyager API when using Google Cloud Run...

Context: For a university term project, I want users to be able to pass in a URL to their LinkedIn profile and then have my application retrieve all the data on their profile, which is then used later in...


How to Read Large Files from Servers in Chunks Using Python?

I am working on a hobby project and trying to develop a GPS/GNSS receiver. The first module of the receiver has two different options, i.e. ask the user to connect an SDR or, if the user doesn't have an...


Scrapy web without nodes / href attributes

I trust you are doing well! I need your support, please. I'm trying to scrape this web page: https://servicio.mapa.gob.es/regfiweb# Once you enter, you must go to: Buscadores. Productos. I'd like to download...



While scraping the data from the website, every listing has a button and...

The popup opens but does not close; please help me close the popup. The only issue is that the popup is not closed. color="accent" class="mat-focus-indicator mat-button mat-button-base mat-accent">... is not...



How to write heading along with paragraph after scraping a website when all...

I am trying to scrape this link using Selenium, "https://library.municode.com/az/avondale/codes/code_of_ordinances?nodeId=CD_ORD_CH1GEPR", for practice. But I am not able to write paragraphs specific to...


XPath HTML Scraping doesn't return text / numerical

I am scraping the usefulness scores of reviews using XPath and lxml.
#%% Step 1: Import all of the extensions and packages.
from lxml import html
from urllib import request
import requests
from datetime...


Puppeteer - Protocol error (Page.navigate): Target closed

As you can see with the sample code below, I'm using Puppeteer with a cluster of workers in Node to run multiple requests for website screenshots by a given URL:
const cluster = require('cluster');
const...



Scraping text by clicking on a button with selenium

I want to scrape some text data with Selenium. I have no problem scraping the page itself, but I need to click a button to extract the full article, of which I have just the title from the main...


In Power Automate Desktop, the password is being entered in the name ID slot

I want to log in with an ID and password and then download a file from a website using Power Automate Desktop, but when I play the recording it enters my name ID in the correct slot but then puts my password in...


ReferenceError: browser is not defined in Railway

I am using Railway for the first time to host a Puppeteer project without Docker. I tried using nixpacks.toml configs and process.env.BROWSER_WS_ENDPOINT. Here's the stack trace:
> npm WARN config...


How do I web-scrape td-class elements from a webpage with multiple embedded tabs

I have been having trouble running the code below in a Python web-scraping program; it seems to print text from multiple tabs instead of just the single "enhanced form" tab found on the webpage....




Error with updated xpath using selenium webdriver and python

from selenium.webdriver.common.by import By
from selenium.common.exceptions import
for entry in articles:
    try:
        title_div = entry.find_element(By.XPATH, "//div[@id='ul']//ul[@class='resultItems']")...
