Or Lenchner, CEO of Bright Data

Or Lenchner, CEO of Bright Data, has led the market-leading web data collection platform since 2018, driving its expansion, innovation, and growth to over USD 100 million in annual revenue. Bright Data enables Fortune 500 corporations, leading businesses, renowned universities, and public sector entities to access public web data in real time and at scale. Lenchner is a strong advocate for keeping public web data open and accessible, emphasizing its critical role in driving innovation.

What inspired your journey into the world of data and AI, and since becoming CEO in 2018, how have you shaped Bright Data’s mission and vision?

I’ve always been fascinated by the power of data, particularly how it can drive decisions and fuel innovation. Used well, data can also drive transparency in business. Becoming CEO of Bright Data in 2018 gave me an opportunity to help shape how AI researchers and businesses source and use public web data.

What are the key challenges AI teams face in sourcing large-scale public web data, and how does Bright Data address them?

Scalability remains one of the biggest challenges for AI teams. Since AI models require massive amounts of data, efficient collection is no small task. And since AI models are only as good as the data they are trained on, ensuring teams have access to fresh, high-quality data is a constant challenge. This is especially true as the web evolves in real time.

Another major concern is compliance. Data privacy laws and requirements continuously evolve, so AI teams need to always be aware of those changes. They also have to understand how to deal with websites that enforce anti-bot mechanisms, which can complicate the data gathering process.

The platform that we’ve built at Bright Data takes care of these challenges. We provide scalable, automated data collection that delivers structured real-time data. Our AI-driven tools clean and validate data to ensure accuracy. We have strict measures in place to ensure legal and ethical data collection for compliance. The idea is to empower AI teams to focus on building great models, while we handle the complexities of data sourcing.

How does high-quality web data contribute to AI model performance, and what are the best practices for ensuring data accuracy?

High-quality data means data that is complete, free from biases, and most importantly, accurate. If data is lacking or mired in inconsistencies and mistakes, the resulting AI model won’t perform according to expectations.

To achieve accuracy, it’s best to source data from a variety of public sources that have established reliability. Relying on only a few sources or, worse, a single one leads to problems such as incompleteness. Having multiple sources makes it possible to cross-reference data and build a more balanced and well-represented dataset. Additionally, organizations should consider automated data validation and cleansing to efficiently remove erroneous and inconsistent data.
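As an editorial illustration of the cross-referencing and cleansing idea described above, here is a minimal Python sketch. It is not Bright Data's implementation: the function name `cross_reference`, the `min_agreement` threshold, and the sample records are all hypothetical, and majority agreement is only one simple way to validate a value against several sources.

```python
from collections import Counter


def cross_reference(records_by_source, min_agreement=2):
    """Keep, per key, the most common value if enough sources agree on it.

    records_by_source maps a source name to a dict of {key: raw_value}.
    All names and the threshold are illustrative assumptions.
    """
    votes = {}
    for records in records_by_source.values():
        for key, value in records.items():
            # Basic cleansing: drop missing values, normalise whitespace and case.
            if value is None:
                continue
            cleaned = str(value).strip().lower()
            if not cleaned:
                continue
            votes.setdefault(key, Counter())[cleaned] += 1

    validated = {}
    for key, counter in votes.items():
        value, count = counter.most_common(1)[0]
        # Validation: accept a value only when enough sources report it.
        if count >= min_agreement:
            validated[key] = value
    return validated


# Hypothetical sample data: three sources reporting prices for the same items.
sources = {
    "source_a": {"sku-1": "19.99", "sku-2": "5.00", "sku-3": None},
    "source_b": {"sku-1": "19.99 ", "sku-2": "5.50", "sku-3": "7.25"},
    "source_c": {"sku-1": "19.99", "sku-2": "5.00"},
}
result = cross_reference(sources)
```

With a single source, no value could ever reach the agreement threshold, which is the incompleteness problem the answer points to. In this sample, `sku-3` is dropped because only one source reports a usable value for it.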

At Bright Data, we take all of these factors into account. We provide AI teams with structured and real-time data that has been validated for accuracy. That way, they can train models with confidence.

What are the biggest ethical concerns in public web data collection today?

Privacy remains one of the biggest concerns in public web data collection. People worry about their data being exposed to abuse and misuse. To make sure that data remains private, it is vital to emphasize transparency. Organizations that accumulate data must be upfront about the data they collect. It is important to assure the public that their data is used under strict ethical guidelines.

One other major concern is monopolization. Certain large companies have control over a vast amount of data, which creates an uneven playing field wherein only a select few have access to information necessary to train AI models and drive innovation. This is not how things should be. Public web data should remain accessible to businesses, researchers, and developers. That way, AI development is not concentrated in the hands of just a few major players.

Ethics are not an afterthought at Bright Data. They’re embedded into every decision we make. We do not just follow industry standards – we set them. We lead the data collection industry in defining ethical standards. We want to ensure that public web data is accessed responsibly, transparently, and in full compliance with global regulations.

How does Bright Data ensure compliance with global data privacy regulations while still enabling large-scale data collection?

Our organization is committed to adhering to global legal and regulatory requirements on data gathering and utilization. We comply with the requirements of GDPR, CPRA, CCPA, and other relevant regulations. Importantly, we strictly follow Know Your Customer (KYC) protocols so that only legitimate businesses and researchers can access our platform and data solutions.

Our Acceptable Use Policy also clearly defines what data can and cannot be collected, and how collected data may be used responsibly. We have a dedicated compliance team that continuously monitors regulations to ensure that we stay up to date with the latest legal and regulatory requirements.

At the same time, we believe that public web data should remain accessible. Our goal is to provide AI teams with the data they need while ensuring compliance with privacy and legal standards.

How do you balance business growth with maintaining ethical data collection practices?

We do not see ethics and growth as mutually exclusive. The trust of our customers and the relationships we build with them are paramount. We understand that we can only achieve long-term success if we collect data under transparent terms and in accordance with applicable laws.

That is why we have put in place a strict vetting protocol for our users, designed to ensure that the data we collect is used ethically. We allocate time, effort, and resources to compliance and security to protect our customers and the general public. By practicing ethical data collection, we succeed as a business while contributing to a transparent and responsible AI ecosystem.

How does Bright Data stay ahead of regulatory changes in data privacy?

We understand that our data use processes and policies inevitably have to change to reflect changes in relevant laws and regulations. As such, we regularly consult legal experts and communicate with regulatory bodies. We also engage in discussions with legislators and others involved in policymaking, providing input into the crafting of meaningful data regulations. We aim to strike a balance between innovation and data privacy.

Our data collection and use framework evolves as new laws are issued and regulations revised. We have a compliance team that proactively updates our data use policies to make sure that our platform is always fully compliant. Moreover, we operate customer education initiatives to promote ethical data use.

What are the emerging trends in AI data collection that companies should be aware of?

Real-time data collection is becoming a must for today’s AI models. They need access to the freshest data to deliver a high level of accuracy and provide better user experiences.

Another notable trend is the reliance on synthetic data used for data augmentation, wherein AI generates data that supplements datasets gathered from real-world scenarios.

I’m also seeing strong interest in explainable AI. Most AI models today suffer from the black box effect, a lack of transparency in their decision-making processes. Companies are seeking to change this paradigm by creating AI models that can detail how they arrived at their outputs or decisions.

Lastly, companies are aware of growing data privacy concerns. That’s why AI techniques aimed at preserving data privacy, such as federated learning, are in growing demand. Organizations want to maximize AI model training without compromising user data privacy.
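For readers unfamiliar with federated learning, the core idea can be sketched in a few lines of Python. This is an editorial illustration, not something described in the interview: it assumes the common federated averaging scheme, in which each client trains locally and shares only model weights, and the function name `federated_average` and the sample numbers are hypothetical.

```python
def federated_average(client_updates):
    """Combine client model weights into a global model.

    client_updates is a list of (weights, num_examples) pairs, where
    weights is a list of floats trained locally on that client's data.
    Only these weights leave the client, never the raw user data.
    """
    total_examples = sum(n for _, n in client_updates)
    if total_examples == 0:
        raise ValueError("no training examples reported by any client")

    dimension = len(client_updates[0][0])
    averaged = [0.0] * dimension
    for weights, n in client_updates:
        if len(weights) != dimension:
            raise ValueError("all clients must share the same model shape")
        for i, w in enumerate(weights):
            # Clients with more local examples contribute proportionally more.
            averaged[i] += w * n / total_examples
    return averaged


# Hypothetical example: two clients with different amounts of local data.
updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
global_weights = federated_average(updates)
```

The server in this sketch sees only weight vectors and example counts, which is what lets training proceed without centralizing user data. Production systems typically add further protections, such as secure aggregation, that are out of scope here.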

We make sure we are on top of these trends, so we can build solutions that allow AI teams to keep a competitive edge.

How do you see AI-powered agents and automation changing the data collection landscape?

Currently, AI models make use of structured datasets that are mostly collected manually. These datasets also go through preprocessing, cleansing, and other procedures that usually involve human intervention. This is set to change in the near future with the rise of AI agents for autonomous collection and processing of data for AI training. They make it possible to automatically learn from real-time web data at an unprecedented scale.

We have created infrastructure that supports the deployment and evolution of AI agents, enabling smooth access to high-quality, real-time data on the web. This technology allows sophisticated AI systems to continuously interface with dynamic web data, learn from it, and grow bigger and better.

AI agents can transform industries because they allow AI systems to access and learn from constantly changing datasets on the web instead of relying on static, manually processed data. This can lead to banking or cybersecurity AI chatbots, for example, that make decisions reflecting the most recent realities. The result is massive efficiency gains and more areas open to automation.

At Bright Data, we are not only enabling this transformation in the data collection landscape. We believe we are at the forefront, introducing a technology that ushers in the next generation of artificial intelligence. We are excited to assist businesses and AI teams as they harness the full potential of AI agents for their operations.

Thank you for the great interview. Readers who wish to learn more should visit Bright Data.