ScrapeBox Forum
Problem With Email Scraper Custom Crawler - Printable Version


Problem With Email Scraper Custom Crawler - WebPlanDesign - 06-17-2020

My issue is that I'd like to use wildcards in the markers, if that's possible. Basically, I have text like this:
Code:
<div class="AAA" data-name="BBB">
    <div class="CCC">

        <h4 class="DDD-title">United States, NY</h4>
        <strong>New York<br>Brooklyn</strong>
        
        <p>
            EEE<br>

I need to get EEE scraped, but I can't figure out how. Is there any way to use multiple markers? I would like to do something like this:

- Start it with class="AAA"
- Then Go to <p>
- Then end with <br>

Is there a way to add a wildcard that covers all the text in between, from AAA to the <p>?


RE: Problem With Email Scraper Custom Crawler - loopline - 06-18-2020

Just put in AAA as the before marker and </p> as the after marker and let it scrape everything in between. You would have to post-process the result to get rid of anything you do not want.
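
For anyone doing that post-processing outside ScrapeBox, here is a rough Python sketch of the idea. It is only an illustration, not how ScrapeBox works internally: the helper names between_markers and last_text_line are made up for this example, and the closing </p> and </div> tags are an assumption, since the snippet in the first post is cut off after the <br>.

```python
import re

# Sample markup from the first post (placeholders AAA..EEE kept as-is).
# ASSUMPTION: the closing tags are added here; the original snippet is truncated.
HTML = """<div class="AAA" data-name="BBB">
    <div class="CCC">

        <h4 class="DDD-title">United States, NY</h4>
        <strong>New York<br>Brooklyn</strong>

        <p>
            EEE<br>
        </p>
    </div>
</div>"""


def between_markers(text, before, after):
    """Return the text between the first `before` and the next `after`, or None."""
    start = text.find(before)
    if start == -1:
        return None
    start += len(before)
    end = text.find(after, start)
    if end == -1:
        return None
    return text[start:end]


def last_text_line(fragment):
    """Post-process: replace tags with newlines, return the last non-empty line."""
    no_tags = re.sub(r"<[^>]+>", "\n", fragment)
    lines = [line.strip() for line in no_tags.splitlines() if line.strip()]
    return lines[-1] if lines else None


# Scrape everything between the two markers, then keep only the wanted line.
raw = between_markers(HTML, 'class="AAA"', "</p>")
print(last_text_line(raw))
```

Taking the last non-empty line only works because the wanted text sits right before the </p> in this layout; a page with a different structure would need a different clean-up rule.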

However there is no way to use multiple markers.

Otherwise, you could try to find different markers that do not pick up the extra text.

The only other way is regex, which may work better, although I'm not a regex expert.
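
As a starting point for the regex route, here is a minimal sketch tested with Python's re module rather than inside ScrapeBox, so treat the exact syntax as an assumption and check which regex flavor and options the custom crawler accepts. The lazy ".*?" acts as the wildcard from class="AAA" to the <p>, and the capture group takes the text up to the next <br>. The names PATTERN and scrape_first_paragraph_line are made up for this example.

```python
import re

# Markup exactly as posted at the top of the thread (it is cut off after the <br>).
SNIPPET = """<div class="AAA" data-name="BBB">
    <div class="CCC">

        <h4 class="DDD-title">United States, NY</h4>
        <strong>New York<br>Brooklyn</strong>

        <p>
            EEE<br>
"""

# class="AAA" ... (anything, as little as possible) ... <p> ... (capture) ... <br>
# re.DOTALL lets "." match newlines, so the wildcard can span several lines.
PATTERN = re.compile(r'class="AAA".*?<p>\s*(.*?)\s*<br>', re.DOTALL)


def scrape_first_paragraph_line(html):
    """Return the text between the first <p> after class="AAA" and the next <br>."""
    match = PATTERN.search(html)
    return match.group(1) if match else None


print(scrape_first_paragraph_line(SNIPPET))
```

The earlier <br> inside the <strong> tag is skipped because the pattern requires a <p> before the capture starts.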


RE: Problem With Email Scraper Custom Crawler - hummersport - 03-03-2021

Hello!

Please tell me:
    - how to limit the email search to one section of a site (for example, so that I get only the emails from the "Jobs" section).
    - how to get the emails from that section ("Jobs") together with the URLs where they were found (for example, as an "Email | URL" table or an "email, url" list).

As a result, I need a list or table containing only the emails from the "Jobs" section, along with the URL where each email is located.