Active questions tagged web-scraping - Stack Overflow

Web scraping of the disciplinary assignment of university courses

April 23, 2024, 2:13 pm

I would like to scrape a university course catalog with R. My code is already quite good, but the assignment of courses to disciplines and subdisciplines does not yet work the way I want it to. This is...

Web Scraping with Selenium and Python

April 24, 2024, 10:04 pm

I have been struggling with web scraping using Selenium. The target website contains a responsive table from which I have to gather the data. The HTML code looks like this (please forgive me for...

How to bypass Cloudflare verification in scrapy-selenium?

April 25, 2024, 4:33 am

I am trying to scrape professional numbers from a French website, but I get a 403 error and am blocked by Cloudflare. I use Selenium and Scrapy. I added the Scrapy Cloudflare middleware, but it still does...

How to disable autocorrect in the Google Search Python API

April 25, 2024, 5:01 am

I am trying to search for certain prompts on Google using its Python library. However, the results I get are for the autocorrected query. Is there any way to disable autocorrect while using the Google Search API, like...

Is there a way to check programmatically whether a website is static or dynamic? [closed]

April 25, 2024, 5:12 am

I'm currently deep in the development of a scraping API, and just yesterday, I encountered a rather perplexing dilemma. As I delved into the intricacies of Puppeteer, I noticed that certain requests...

Multiple elements with the same id: how to scrape them?

April 25, 2024, 5:58 am

I'm scraping for a uni project and want to scrape the genres of TV shows from IMDb. The HTML is shown in the image. I used the following code: url1 =...
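
HTML ids are meant to be unique, but real pages sometimes repeat them, and a lookup that returns only the first match (such as BeautifulSoup's `find(id=...)`) then misses the rest; BeautifulSoup's `find_all(id=...)` returns every element with that id. Because the question's code is truncated, the following is only an illustrative sketch under assumptions: it swaps BeautifulSoup for the standard library's `html.parser`, the `TextById` and `texts_by_id` names are hypothetical, and the `genre` id in the usage note is made up rather than taken from IMDb's markup. It collects the whitespace-normalized text of every element whose `id` matches and assumes non-void elements are properly closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text of every element whose id equals target_id (hypothetical helper)."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.matches: list[str] = []
        self._depth = 0                      # nesting depth of the current element
        self._starts: list[int] = []         # depths at which open matches began
        self._buffers: list[list[str]] = []  # text collected for each open match

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self._depth += 1
        if dict(attrs).get("id") == self.target_id:
            self._starts.append(self._depth)
            self._buffers.append([])

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._starts and self._starts[-1] == self._depth:
            # The innermost open match ends here: store its normalized text.
            self._starts.pop()
            text = "".join(self._buffers.pop())
            self.matches.append(" ".join(text.split()))
        self._depth -= 1

    def handle_data(self, data):
        for buf in self._buffers:  # nested matches each receive the text
            buf.append(data)


def texts_by_id(html: str, target_id: str) -> list[str]:
    """Return the text of all matching elements, in the order they close."""
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    return parser.matches
```

For example, `texts_by_id('<div><span id="genre">Drama</span><span id="genre">Crime</span></div>', "genre")` returns `["Drama", "Crime"]`, where a first-match lookup would return only `"Drama"`.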

Puppeteer hangs on waitForSelector in production (Google Maps)

April 25, 2024, 6:05 am

I'm using Puppeteer to pull data from Google Maps. It works perfectly on my local machine; however, after deploying to production I get the following error: Error during scraping: ProtocolError:...

Python requests-html is not downloading Chromium

April 25, 2024, 6:48 am

import requests
from bs4 import BeautifulSoup
from requests_html import HTMLSession
url="https://dmarket.com/ingame-items/item-list/csgo-skins?title=recoil%20case"
sesion = HTMLSession()
response =...

Unable to locate element when the webpage navigates from one section to...

April 25, 2024, 8:40 am

<ul><li class="first current"><a class="jump"> Crop</a></li><li class=""><a class="jump"> Soil</a></li><li class=""><a class="jump">...

Amazon Product API: Obtaining PPC Keywords by ASIN

April 25, 2024, 11:08 am

I am trying to find PPC keywords by ASIN. There are some tools that do this, and when I tried one of them, it worked for every ASIN. These tools give details like the following: PPC keyword, History highest...

Updating foreign keys in MySQL

April 25, 2024, 1:41 pm

I am currently working on a web scraper that takes Premier League data and imports it into a MySQL database, but I am having issues updating foreign keys in some of the tables. For some context, I have...

How can I fix this problem with my Instagram API app?

April 25, 2024, 2:31 pm

I'm trying to make an Instagram scraper:
import httpx
url = "https://i.instagram.com/api/v1/users/search/?timezone_offset=10800&q=cris&count=50"
headers = {"X-Pigeon-Session-Id":...

pyinstaller ModuleNotFoundError: 'fake_useragent.data'

April 25, 2024, 7:12 pm

First of all, thanks for coming. I'm doing web scraping and packaging it into an .exe file. The code works well, but not as an .exe file. What I want: a working .exe file that avoids bot detection. What I'm doing: I'm making...

Trying to extract article text from a list of urls

April 25, 2024, 11:10 pm

import requests
from bs4 import BeautifulSoup
urls = df.loc[:,"news_url"].tolist()
for url_index in len(urls):
    req = requests.get(urls[url_index])
    soup =...
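
In the loop above, `len(urls)` returns an integer, and an integer is not iterable, so `for url_index in len(urls):` raises `TypeError: 'int' object is not iterable` before any request is made; iterating over `urls` directly (or over `range(len(urls))`) avoids that. The sketch below shows the corrected loop. To stay self-contained and offline, it swaps BeautifulSoup for the standard library's `html.parser` and takes a hypothetical `fetch` callable in place of `requests.get(url).text`; treating "article text" as the text of `<p>` elements is an assumption, and the `ParagraphText` and `article_texts` names are illustrative.

```python
from html.parser import HTMLParser
from typing import Callable, Iterable


class ParagraphText(HTMLParser):
    """Collects the text of <p> elements (a stand-in for soup.find_all("p"))."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: list[str] = []
        self._in_p = False
        self._buf: list[str] = []

    def _flush(self) -> None:
        # Store the paragraph collected so far, if any, and reset the buffer.
        if self._in_p:
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # a new <p> implicitly closes an unclosed one
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)

    def close(self) -> None:
        super().close()
        self._flush()  # keep a trailing paragraph that was never closed


def article_texts(urls: Iterable[str], fetch: Callable[[str], str]) -> list[str]:
    """Return one string per URL, joining its paragraphs with newlines."""
    texts = []
    for url in urls:  # iterate over the URLs themselves, not over len(urls)
        parser = ParagraphText()
        parser.feed(fetch(url))
        parser.close()
        texts.append("\n".join(parser.paragraphs))
    return texts
```

With requests installed, something like `article_texts(urls, lambda u: requests.get(u, timeout=10).text)` would plug in real downloads; passing the fetch function in keeps the parsing logic testable without network access.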

Puppeteer: cannot get response from popup

April 25, 2024, 11:32 pm

I'm using Puppeteer to scrape PDFs from a website. When I find a certain selector, I have to .click() on it so that the PDF is somehow generated and displayed in a popup window. The first problem is that...

How to send a POST request with headers and payload in Scrapy

April 26, 2024, 1:32 am

I am trying to send a POST request to the Graph API, and it works, but I want to send the same request in Scrapy and I don't know how to send a POST request in Scrapy with headers and...

How to use Selenium for Opta data?

April 26, 2024, 2:33 am

I'm trying to get Opta event data from a football (soccer) match. It has a link address, but when I try to access it I just get a 10300 error code. The only place I can view the data is under 'Sources'...

Scrape the Google Knowledge Graph with RSelenium

April 26, 2024, 2:40 am

I am trying to access the elements on the right-hand side of Google search results, sometimes called the Knowledge Graph. In particular, I am interested in the short bio (normally a Wikipedia snippet) and the external...

Avoid CHALLENGE URL from LinkedIn Voyager API when using Google Cloud Run...

April 26, 2024, 4:59 am

Context: For a university term project, I want users to be able to pass in the URL of their LinkedIn profile and then have my application retrieve all the data on their profile, which is then used later in...

How to Read Large Files from Servers in Chunks Using Python?

April 26, 2024, 6:14 am

I am working on a hobby project and trying to develop a GPS/GNSS receiver. The first module of the receiver has two options, i.e. ask the user to connect an SDR or, if the user doesn't have an...
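
For the download step, a common pattern is to stream the response and read a fixed number of bytes at a time instead of loading the whole file into memory. The sketch below does this with the standard library's `urllib.request.urlopen`; the `iter_chunks` and `download` names, the 1 MiB default chunk size and the 30-second timeout are illustrative assumptions, not details from the question. Each `read(chunk_size)` call returns at most `chunk_size` bytes, and an empty bytes object at the end of the stream ends the loop.

```python
import urllib.request
from pathlib import Path
from typing import Iterator


def iter_chunks(url: str, chunk_size: int = 1 << 20,
                timeout: float = 30.0) -> Iterator[bytes]:
    """Yield the body of `url` in pieces of at most chunk_size bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # read() returns b"" once the stream is exhausted, which stops the loop.
        while chunk := resp.read(chunk_size):
            yield chunk


def download(url: str, dest: Path, chunk_size: int = 1 << 20) -> int:
    """Stream `url` to `dest` without holding the whole file in memory."""
    written = 0
    with open(dest, "wb") as out:
        for chunk in iter_chunks(url, chunk_size):
            out.write(chunk)
            written += len(chunk)
    return written
```

Because `iter_chunks` is a generator, each chunk could also be handed straight to a processing step (for example, decoding raw receiver samples) instead of being written to disk; only one chunk is held in memory at a time.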

Scraping a website without nodes / href attributes

April 26, 2024, 6:15 am

I trust you are doing well! I need your support, please. I'm trying to scrape this web page: https://servicio.mapa.gob.es/regfiweb# Once you enter, you must go to Buscadores, then Productos. I'd like to download...

While scraping data from the website, every listing has a button and...

April 26, 2024, 7:56 am

The popup opens but does not close; please help me close it. Closing the popup is the only issue. color="accent" class="mat-focus-indicator mat-button mat-button-base mat-accent">... is not...

How to write headings along with paragraphs after scraping a website when all...

April 26, 2024, 8:11 am

I am trying to scrape this link using Selenium for practice: "https://library.municode.com/az/avondale/codes/code_of_ordinances?nodeId=CD_ORD_CH1GEPR". But I am not able to write paragraphs specific to...

XPath HTML scraping doesn't return text / numerical values

April 26, 2024, 8:31 am

I am scraping the usefulness scores of reviews using XPath and lxml.
#%% Step 1: Import all of the extensions and packages.
from lxml import html
from urllib import request
import requests
from datetime...

Puppeteer - Protocol error (Page.navigate): Target closed

April 26, 2024, 8:48 am

As you can see in the sample code below, I'm using Puppeteer with a cluster of workers in Node to run multiple website-screenshot requests for a given URL:
const cluster = require('cluster');
const...

Scraping text by clicking on a button with Selenium

April 26, 2024, 9:11 am

I want to scrape some text data with Selenium. I have no problem scraping the page itself, but I need to click on a button to extract the full article, of which I have only the title from the main...

In Power Automate Desktop, the password is being entered in the user ID field

April 26, 2024, 9:21 am

I want to log in with my ID and password and then download a file from a website using Power Automate Desktop, but when I play the recording it enters my user ID in the correct field and then puts my password in...

ReferenceError: browser is not defined in Railway

April 26, 2024, 9:30 am

I am using Railway for the first time to host a Puppeteer project without Docker. I tried using nixpacks.toml configs and process.env.BROWSER_WS_ENDPOINT. Here's the stack trace:
> npm WARN config...

How do I web-scrape td-class elements from a webpage with multiple embedded tabs

April 26, 2024, 11:11 am

I have been having trouble running the code below in a Python web-scraping program: it prints text from multiple tabs instead of just the single "enhanced form" tab found on the webpage....

Error with updated XPath using Selenium WebDriver and Python

April 26, 2024, 11:25 am

from selenium.webdriver.common.by import By
from selenium.common.exceptions import
for entry in articles:
    try:
        title_div = entry.find_element(By.XPATH, "//div[@id='ul']//ul[@class='resultItems']")...
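
One general Selenium behavior is relevant to this loop, although the truncated snippet does not show whether it causes this particular error: an XPath passed to `entry.find_element` that starts with `//` is evaluated from the document root, not from `entry`, so every iteration finds the same first match; starting the expression with `.//` restricts the search to descendants of `entry`. The sketch below is not Selenium code. It illustrates the same per-entry relative lookup with the standard library's `xml.etree.ElementTree`, whose XPath subset likewise uses `.//` to search beneath the current element, on a made-up, well-formed sample in which the `article` wrapper and the `titles_per_entry` name are assumptions.

```python
import xml.etree.ElementTree as ET

# Made-up sample: two result entries, each containing its own title list.
SAMPLE = """
<results>
  <article><div id="ul"><ul class="resultItems"><li>First title</li></ul></div></article>
  <article><div id="ul"><ul class="resultItems"><li>Second title</li></ul></div></article>
</results>
"""


def titles_per_entry(markup: str) -> list[str]:
    """Return the first <li> text found beneath each entry (hypothetical helper)."""
    root = ET.fromstring(markup.strip())
    titles = []
    for entry in root.findall("article"):
        # ".//" searches only beneath `entry`. In Selenium, the same expression
        # without the leading "." would restart at the document root each time.
        ul = entry.find(".//div[@id='ul']//ul[@class='resultItems']")
        titles.append(ul.findtext("li", default="") if ul is not None else "")
    return titles
```

Each entry yields its own title here; in the Selenium loop, the corresponding change would be to pass `".//div[@id='ul']//ul[@class='resultItems']"` to `entry.find_element`.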
