
Introducing Octoparse 7.1 – Web scraping for dummies is official!

Throughout its years of working in the data industry, the Octoparse team has maintained a steady pace in making data more accessible and readily available to everyone. This is rooted in our belief that in the era of big data, anyone should be able to collect data and harness its power.

This November, the Octoparse team is releasing Version 7.1, which includes one of its most revolutionary changes in years: Template Mode Scraping.

What makes Template Mode Scraping so special?

Have you ever wondered how much technical proficiency is required to build a web scraper? With the newly launched Template Mode Scraping, the answer is “none”. More specifically, there are now dozens of built-in templates within the program, all ready to fetch data instantly with nearly zero learning curve!

Many popular sites, including Amazon, Indeed, Booking, TripAdvisor, Twitter, YouTube, Yellow Pages, Walmart, Zillow and Realtor, are already covered. What’s more, if you feel that a website should be added to the list, you simply need to get in touch with the Octoparse team, and they’ll consider creating a template for the site.

Who is this for?

Anyone! Yes, anyone who wants fast and easy data extraction. If Octoparse already has the template you need, that’s great! If not, let the team know!

Template Mode Scraping can be especially valuable to anyone who needs to extract data from some of the most popular websites out there, and to those who would prefer to skip the learning and do not require a high level of data customization.

How is it different from the old Wizard Mode Scraping?

If you are not new to Octoparse, you may have already tried our old Wizard Mode scrapers. The new Template Mode Scraping and the old Wizard Mode Scraping are completely different. Wizard Mode works for a few specific page structures, while the Template Mode scrapers are pre-built scrapers that extract pre-defined data fields from specific websites. In Wizard Mode, users are required to correctly identify the web page structure and tell Octoparse which data fields need to be captured. The Template Mode scrapers, by contrast, take over all the heavy lifting, so all you have to do is tell Octoparse your search criteria (for example, “restaurant in New York”) and then click “start” to get the data.

How to use it?

  • Select “Task Templates” from the home screen
  • Pick a template
  • Check the pre-defined data fields and parameters
  • Select “Use Template”
  • Enter a value for each parameter, such as “iPhone” for the search keyword
  • Save the template and run

And there are more upgrades…

In keeping with Octoparse’s commitment to large-scale scraping of even the most complex and difficult websites, the new release also includes features focused on more efficient, effective and powerful data scraping.

Million-level URL Input

Previously, you could input only 20,000 URLs into any crawling task. Now you can add up to 1 million URLs to any task. Better yet, you can import the list of URLs from local files (txt, csv or xls) or directly from another task. You can even link two running tasks, with the first extracting the URLs and the second fetching additional data from each extracted URL, so there is no need to manually “transfer” the URLs from one task to the other.
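If you prepare a URL list yourself before importing it, a small script can tidy it up first. The following is only a rough sketch in Python, not part of Octoparse: the function name `load_urls` and the `MAX_URLS` constant are made up for illustration, and it handles txt and csv files only (xls would need a third-party library).

```python
import csv
from pathlib import Path

# Per-task limit quoted in the post; the constant name is our own.
MAX_URLS = 1_000_000


def load_urls(path, column=0):
    """Read URLs from a .txt file (one per line) or a .csv file (one column)."""
    path = Path(path)
    urls = []
    with path.open(newline="", encoding="utf-8") as handle:
        if path.suffix.lower() == ".csv":
            # Take the chosen column from every row that is long enough.
            candidates = (row[column] for row in csv.reader(handle) if len(row) > column)
        else:
            candidates = handle
        for value in candidates:
            value = value.strip()
            # Keep only values that look like web addresses.
            # This also skips blank lines and a CSV header row.
            if value.startswith(("http://", "https://")):
                urls.append(value)
    # Remove duplicates while preserving the original order.
    unique = list(dict.fromkeys(urls))
    if len(unique) > MAX_URLS:
        raise ValueError(f"{len(unique)} URLs exceeds the limit of {MAX_URLS}")
    return unique
```

Calling `load_urls("products.csv")` (a hypothetical file name) would return a de-duplicated list that is ready to be saved and imported into a task.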

Moreover, the new URL Generator feature generates a list of URLs based on a specific pattern. A straightforward example is a set of URLs in which only the page number changes.
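To make the idea of pattern-based URL generation concrete, here is a minimal Python sketch. It is not Octoparse's implementation; the function name `generate_urls`, the `{page}` placeholder and the example.com address are all assumptions chosen for illustration.

```python
def generate_urls(pattern, start, stop, step=1):
    """Return URLs built by substituting each page number into `pattern`.

    `pattern` must contain the literal placeholder "{page}" (our own convention).
    Both `start` and `stop` are included in the range.
    """
    if "{page}" not in pattern:
        raise ValueError("pattern must contain a {page} placeholder")
    return [pattern.replace("{page}", str(n)) for n in range(start, stop + 1, step)]


# Hypothetical listing pages where only the page number changes.
urls = generate_urls("https://example.com/products?page={page}", 1, 5)
for url in urls:
    print(url)
```

The same approach extends to any pattern with one changing part, such as an offset that grows by a fixed step.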

Possible use cases include:

  • Scraping from a large list of URLs
  • Scraping large numbers of products from e-commerce sites. Getting product URLs and product details separately can greatly improve the efficiency and consistency of the scrapes, while also reducing the chance of getting blocked or missing data.
  • Scraping sites that block scrapers easily. Tasks running on a list of URLs can be assigned to run on various servers and thus make better use of IP resources to avoid getting banned.
  • Scraping a large number of different pages from a particular website. Use the URL generator to quickly generate all the page URLs and scrape all the pages simultaneously. No need to go through the pages one by one.
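The idea of spreading a URL list across several servers can be pictured with a short Python sketch. This only illustrates the general principle; how Octoparse actually assigns tasks to its servers is not described in this post, and the name `split_into_batches` is hypothetical.

```python
def split_into_batches(urls, workers):
    """Distribute `urls` round-robin into `workers` batches of near-equal size."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Batch i takes every `workers`-th URL, starting at position i.
    return [urls[i::workers] for i in range(workers)]


# Hypothetical list of seven page URLs shared between three workers.
pages = [f"https://example.com/item/{n}" for n in range(1, 8)]
for index, batch in enumerate(split_into_batches(pages, 3)):
    print(index, len(batch))
```

Because each batch can be fetched from a different machine, no single IP address has to request every page.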

Improved Dashboard

Compared to the Dashboard in version 7.0, the improved Dashboard layout is more informative, customizable and efficient.

The new version offers two dashboard layouts to choose from based on your preference: tasks arranged by date created or by task group. You can also choose which task information you would like to see in the dashboard, including scraping status, time used, number of runs, next run (if scheduled) and scraping completion time.

Upgraded Anti-blocking mechanism

  • Auto switch browser (User agent)
  • Auto clear cookies

Two more anti-blocking options have been added to help reduce the chance of getting blocked by scraping-sensitive websites. In version 7.1, Octoparse can automatically switch the user agent (UA) and clear cookies for you.
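For readers curious about what switching the user agent and clearing cookies looks like in practice, here is a small sketch using only Python's standard library. It is an illustration of the general technique, not Octoparse's own mechanism; the class name `RotatingSession` and the user agent strings are assumptions made for the example.

```python
import itertools
import urllib.request
from http.cookiejar import CookieJar

# Illustrative user agent strings; any realistic browser strings would do.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14) ExampleBrowser/2.0",
]


class RotatingSession:
    """Clears cookies and switches the User-Agent header before every request."""

    def __init__(self, user_agents):
        if not user_agents:
            raise ValueError("at least one user agent is required")
        self._agents = itertools.cycle(user_agents)
        self.cookies = CookieJar()
        self._opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cookies)
        )

    def build_request(self, url):
        # Forget cookies from earlier responses, then pick the next user agent.
        self.cookies.clear()
        return urllib.request.Request(url, headers={"User-Agent": next(self._agents)})

    def fetch(self, url, timeout=10):
        # Performs a real network request; the response body is returned as bytes.
        with self._opener.open(self.build_request(url), timeout=timeout) as response:
            return response.read()
```

A call such as `RotatingSession(USER_AGENTS).fetch("https://example.com/")` would then send each request with a fresh cookie jar and the next user agent in the list.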

Need more details? Check out the official post What’s New in Octoparse 7.1.

The Next Step…

According to Octoparse, there are two things they aim to include in their products: ease of use and robustness. In this regard, the team promises to keep working to provide the most accessible web scraping experience. And of course, they would love to hear your feedback on the new features.

To get a 40% discount in the Octoparse Black Friday Sale, click here.
