Web scraping is the process of using bots to extract information from a website. In recent years, the web scraping debate has grown more complex as business intelligence and data privacy issues arise.
Web scraping has been practiced for almost as long as there have been websites. To be fair, there is “good” web scraping that is, in fact, part of the foundation of the internet. Here are some examples of “good” web scraping:
- Legitimate search engine crawlers crawl websites to index, analyze and rank their content.
- Price comparison sites deploy bots to automatically collect product prices and descriptions from affiliated seller websites, allowing consumers to compare prices of goods and services and make more informed purchasing choices.
- Market research companies use web scrapers to mine data from forums and social media to gauge public opinion (e.g., to report on “what’s trending”).
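One common trait of well-behaved crawlers like those above is that they check a site's robots.txt file before fetching pages. The following is a minimal sketch of that check using Python's standard `urllib.robotparser` module. The robots.txt content, the `example-bot` user agent, the `example.com` URLs, and the helper names `build_parser` and `may_fetch` are all hypothetical and chosen purely for illustration; they do not describe how any particular crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an imaginary site (an assumption
# for this sketch, not taken from any real website).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory, so no network access is needed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# A public product page is allowed; a path under /private/ is not.
print(may_fetch(parser, "example-bot", "https://example.com/products/widget"))
print(may_fetch(parser, "example-bot", "https://example.com/private/report"))

# The site also asks crawlers to wait this many seconds between requests.
print(parser.crawl_delay("example-bot"))
```

Under these assumed rules, the script prints `True`, `False` and `5`. Honoring robots.txt is a voluntary convention, which is part of why site owners turn to the legal and technical measures discussed in the rest of this article.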
This, however, is where the good part of the web scraping story ends. According to the Imperva Bad Bots Report 2022, bad bots accounted for 27.7% of all web, mobile and API traffic, an increase of 2.1% over the previous year. These bots retrieve content from a website with the intention of using it for purposes beyond the control of the site owner. Apart from web scraping, cyber criminals use bad bots to carry out various harmful activities including denial of service attacks, competitive data mining, online fraud, account takeover, data theft, intellectual property theft, unauthorized vulnerability scans, spam and digital ad fraud.
The two main ways malicious actors abuse web scraping are undercutting competitors' prices to gain an unfair competitive advantage and stealing copyrighted content and intellectual property. The question remains: is it illegal?
The case of LinkedIn and hiQ Labs
In the summer of 2017, LinkedIn sued San Francisco-based startup hiQ Labs. hiQ scrapes publicly available LinkedIn profiles to offer clients, according to its website, “a crystal ball that helps you identify skill gaps or turnover risks months in advance.”
The idea that your public LinkedIn profile could be used against you by your employer is quite troubling. However, on August 14, 2017, a judge ruled that everything was fine. Judge Edward Chen of the U.S. District Court in San Francisco accepted hiQ’s claim in a lawsuit that Microsoft-owned LinkedIn violated antitrust laws by blocking the startup from accessing that data. He ordered LinkedIn to remove the barriers within 24 hours. LinkedIn appealed.
The decision goes against previous court rulings that favored cracking down on web scraping. It also raises myriad questions about the privacy of social media users and the right of businesses to protect themselves against data breaches. There is also the issue of fairness. LinkedIn has spent years creating something of real value. Why should it have to hand that over to hiQ, paying for servers and bandwidth to host all that bot traffic on top of its own human users, just so hiQ can scrape LinkedIn?
The final word has yet to be spoken in the legal battle between LinkedIn and hiQ Labs, which describes itself as a “data science company, informed by public data sources, applied to human capital.” LinkedIn is attempting to prevent hiQ from scraping personal information from users’ public profiles. After the Ninth Circuit Court of Appeals ruled in favor of allowing bots to scrape publicly available content, LinkedIn petitioned for Supreme Court review in March 2020. In June 2021, the Supreme Court gave LinkedIn another chance to stop hiQ. The Court did not take up the case itself, however. Instead, it ordered the appeals court to rehear the case in light of its recent ruling, which found that a person does not violate the Computer Fraud and Abuse Act (CFAA) by improperly accessing data on a computer they are authorized to use.
That’s not the only legal battle LinkedIn is currently waging. In February 2022, LinkedIn filed a lawsuit against Singapore-based data scraper group Mantheos Pte. Ltd., Jeremiah Tang, Yuxi Chew and Stan Kosyakov. The complaint alleges that they illegally profit from scraping data from LinkedIn’s website, in violation of its terms of service and to the detriment of its users. The case continues.
What’s the verdict on web scraping?
As discussed here, the legality of web scraping remains unsettled, and website owners continue to pursue legal action to prevent their sites from being scraped. While the courts work to determine the legality of web scraping, your data could be stolen and your website’s business logic abused. Instead of relying on legal remedies to overcome this technological challenge, consider solving it with advanced bot protection and anti-scraping technology today.
Imperva Advanced Bot Protection protects your websites, mobile apps, and APIs from automated attacks without affecting the flow of business-critical traffic.
*** This is a syndicated blog from the Security Bloggers Blog Network written by Bruce Lynch. Read the original post at: https://www.imperva.com/blog/is-it-illegal-to-scrape-a-website-for-content/