Brave Search: Industry Revolution or Ad Bar in a Face Mask?

Wednesday 21 July 2021
Bob Leggitt
"Depending on what you search for, you may in fact be getting 100% of the results sourced from Google."
[Image: money signs. Photo by Elena Mozhvilo on Unsplash (image modified)]

Brave Search has been heralded as game-changing, having headlined its post-intro fanfare with a non-Google, non-Microsoft search index. But now that the beta version is public, does Brave Search look like the wise choice it promised to be in the pre-launch posturing? Is this a revolution in web search, or is it basically a 2008 ad bar in a face mask?

One of the problems with the discourse about privacy is that we can get so focused on who is or isn't getting their hands on our data that we lose sight of the bigger issue. Namely, the corruption of information integrity that advertising companies have a lucrative incentive to engineer. And one of the problems with Brave is that however much it screams the word “privacy” into our faces, it's still an advertising company, whose primary goal is to show us ads. Just like Google. Just like Facebook. The methods and data-gobbling capacity may be different, but the funding still comes from people whose only concern is that we buy their shit.

So there are really three questions hovering over Brave Search.

1. Is it what we thought it would be?

2. Given that user privacy and online commerce roundly detest each other, does it really offer good privacy?

3. How much does the company's advertising focus interfere with the integrity of the results?

The impact of the last question can be a lot bigger than that of the other two. Search routines that skimp on organic results while populating the service too heavily with ads risk dropping into the unsavoury zone of adware. And it's a fine line to tread. Especially for a new search engine using its own, limited-size index. We'll explore all of this, but first, does the backbone of Brave Search look the way its pre-launch fanfare portrayed it?…

PROMISE FULFILLED?

Brave Search's stated mission was to serve as a completely independent, privacy-focused search engine, using its own index. In my speculative, pre-launch post, I said that Brave's in-house index - acquired via Tailcat - would not be able to compete with Google for non-populist searches. Brave is now freely admitting that. But the concerted effort to improve results through human input, as announced in the pre-launch posturing, seems to have taken a back seat in favour of: “Let's just sub in the lacking results from Google”. That's a considerably different plan from the one announced.

Brave has claimed that it only subs in Google results if users select the so-called Google Fallback Mixing. From the Privacy Notice:

"As a new search engine and as the ranking index and algorithms for Brave Search are refined it's possible that a search may not return the results you were expecting. In this case, you can choose to enable Google Fallback Mixing. If you choose this, Brave will anonymously check Google search and mix the results in the Brave browser and add the clicked result to Brave's search index."

However, this doesn't apply if you're using search.brave.com within a browser other than Brave. In that context I was served third-party (i.e. Google-sourced) search results without touching any settings. What's more, Brave Image Search reported all of the results as being third-party sourced.

Brave asserts that third parties are queried anonymously, but then, DuckDuckGo, Startpage and Qwant make the same assertion, so if you don't trust them, you really have no more reason to trust Brave at the present time.

I know this is beta, and that things have to start somewhere, but even in intent, Brave Search looks to have veered away from the message sent out in the pre-launch posturing. It queries Google, and you can't, as far as I can see, reliably stop it from querying Google unless you use it within Brave Browser - which I don't intend to do.

So this is not the truly independent alternative to Big Tech we were led to believe it would be. Depending on what you search for, you may in fact be getting 100% of the results sourced from Google.

As with all the other search engines that serve third-party results, the specifics of Brave's deal with Google are not disclosed. We don't know what Google gets out of it, but we do know it's getting something, and we do know that Google's vice is data. We can be 100% sure that Google the data company is not idly reclining in charitable resignation while Brave exploits its search results for profit.

BUT THIS IS ONLY BETA, SO BRAVE WILL SOON STOP INTEGRATING GOOGLE RESULTS, RIGHT?

I very much doubt that. The burden I mentioned in my speculative post - of search engines having to take responsibility for the removal of damaging or violating results - is unlikely to be adopted at scale by Brave for a very long time. It's just too labour-intensive. That means, particularly in areas where takedown requests are rife, there's an incentive to farm in the results from Google.

That way, come any problems, Brave can just say: “Well, the particular result you object to didn't actually come from us. So pop along and speak to Google about that…” Exactly what DuckDuckGo does, except it refers to Microsoft rather than Google. And I sense that Brave is gonna want that get-out-of-jail card on hand for a long time to come.

It sometimes seems that Brave's own index isn't keen on areas of the Web with a high likelihood of copyright breaches or other violations. Indeed, as I said, it appears the whole of Image Search is currently third-party sourced, and that conveniently means Brave does not have to respond to media takedown orders.

Brave Search attaches a percentage to each search, which reveals to the user the proportion of results from Brave's own index, versus the proportion from third parties. And in my tests, for subdomain-specific searches relating to UGC sites like WordPress, Tumblr and Blogspot, the percentage of third-party search results usually ran in excess of 70% - sometimes over 85%. But this could just be down to the fact that subdomain pages are typically buried deeper on the Web, and so would be less likely to be picked up by an indie crawler anyway.

I do think high-volume removal notices are something Brave knows it absolutely MUST avoid at present. But whether the brand is deliberately guarding against those notices by subbing in Google results whether it needs to or not, or is just currently enjoying a happy natural consequence of having a small index, I can't say.

QUALITY AND VOLUME OF BRAVE'S OWN RESULTS?

Donald Trump's Twitter is still on the results page for the single-word term “Twitter”.

Obviously, finding a political Twitter account that has been suspended for six months among what should be the flagship results for a leading, universally-recognised website doesn't inspire much confidence in Brave's crawl regularity or impartiality. But actually, the general quality of Brave results in most searches looks acceptable, if almost totally devoid of anything at all from smaller sites.

Incidentally, if you're wondering why I said “the results page”, as opposed to “page one of the results”, that's because there is, as far as I can establish, only one results page for any given search. I suspect the reason for this is that Brave wants its own index to appear, in percentage terms, as the dominant results provider.

If every search only has one page to fill, Brave can dominate the results with a small index. Extend that up to 100 pages of results per search, however, and inevitably, 99%+ of those results will have to come through third parties. Limiting search results to a single page is a clever way to make Brave's index look a lot bigger than it really is.
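To put rough numbers on that, here is a minimal sketch. The figures are assumptions picked purely for illustration, not measurements of Brave: ten results per page, and an index able to supply ten relevant results of its own for a given query.

```python
def own_index_share(own_results: int, pages: int, per_page: int = 10) -> float:
    """Percentage of displayed result slots that the engine's own index can fill.

    own_results and per_page are illustrative assumptions, not Brave figures.
    """
    slots = pages * per_page
    return 100.0 * min(own_results, slots) / slots


one_page = own_index_share(own_results=10, pages=1)
hundred_pages = own_index_share(own_results=10, pages=100)

print(f"1 page:    {one_page:.0f}% from own index")
print(f"100 pages: {hundred_pages:.0f}% from own index")
```

Under those assumptions the same ten in-house results fill the whole of a single page, but only 1% of a hundred pages, leaving the other 99% to third parties.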

The limited size of Brave's index is not at all unexpected. The tragedy is that there's no sign of a system that will exponentially expand it, or dismantle Google's stifling stranglehold over online creativity. As Brave has now clarified, the plan is to serve Google results where Brave's current resources are insufficient, monitor which ones get clicked, and add those to the Brave index. Lame, deeply restrictive, and painfully slow. And what happens when those results suddenly lose relevance? I don't see any contingency for dumping off "dead" results, and the Trump Twitter inclusion suggests there isn't one.

Brave's current approach is to try to be Google. Not reshape the search landscape. Assuming Google will serve Brave its top results, if Brave only delivers one page then nothing that doesn't appear on Page One of Google can be added to Brave. So this is really just cherry-picking Google's top SERP - which we know is as corrupt as hell and packed full of publishers who've blagged, bribed or bought their way into the most visible positions with backlink deals. This is no alternative to Big Tech. It IS Big Tech.

PRIVACY

Brave Search does raise some privacy concerns.

For one, your search queries appear in the URL of the Brave Search results page. This practice leaves potentially sensitive data in Brave's analytics (and let's not enter the fairyland of pretending they don't log visited URLs). [UPDATE: Note also in relation to this, that Brave Search has been found to be running on an Amazon server - which would mean Amazon could collect all the search data too. Very considerably worse than Google collecting it in my opinion.]
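As a rough illustration of why a query in the URL matters, the sketch below shows how trivially a search term can be read back out of any logged address. The `/search` path, the `q` parameter name and the example query are assumptions for the sake of the example. Nothing here describes what Brave, or any host, actually logs.

```python
from urllib.parse import parse_qs, urlsplit


def query_from_logged_url(url: str) -> str:
    """Recover the search term from a results-page URL.

    Assumes the term travels in a 'q' query parameter (an assumption
    made for this sketch).
    """
    params = parse_qs(urlsplit(url).query)
    return params.get("q", [""])[0]


# A hypothetical line from an access log or analytics record.
logged = "https://search.brave.com/search?q=divorce+lawyer+near+me"
print(query_from_logged_url(logged))
```

Anything that records the full URL, whether an analytics tool, a server log or browser history, ends up holding the query in readable form.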

On search.brave.com, telemetry and location tracking are set to ON by default. On a supposedly privacy-focused search engine, why are these obvious privacy invasions set up as opt-out rather than opt-in?

As documented by Brave, Telemetry (described as “Usage Metrics”) includes:

  • Number of weekly/monthly visits [How Brave would consider itself not to be tagging and tracking users in gathering this data I have no idea].
  • Number of returning visits [Again, obvious tagging/tracking].
  • Number of search queries per day [Aaaand, more tagging/tracking].
  • Average query length.
  • Whether you actually click the search results.
  • Whether you left feedback.
  • Which operating system and browser you use.

I repeat, this is ON BY DEFAULT at search.brave.com.

Brave goes on to claim, about the data it collects: "It will never identify you or the machine you’ve accessed from."

I mean, in all seriousness, tell me how you track a returning visit without identifying a specific user? It's just not possible. I get weary from trying to find the words to express the farce of these plain-sight Eth Tech contradictions. It's like: "Here's how we track you… But remember, we don't track you."
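Here's a minimal sketch of that argument, assuming the counting is done on the server. The class, the tokens and the request labels are all hypothetical, and none of this is Brave's actual mechanism, which isn't documented in that detail.

```python
class VisitCounter:
    """Server-side tally of visits arriving with a token already seen before."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.returning = 0

    def record(self, token: str) -> None:
        if token in self.seen:
            self.returning += 1   # same token again, so it's a returning visit
        else:
            self.seen.add(token)  # first sighting of this token


# With a stable per-client token (hypothetical, e.g. held in a cookie),
# the third request is recognised as a return.
tracked = VisitCounter()
for token in ("token-a", "token-b", "token-a"):
    tracked.record(token)

# With nothing stable to key on, every request looks like a first visit.
untracked = VisitCounter()
for request_number in range(3):
    untracked.record(f"request-{request_number}")

print(tracked.returning, untracked.returning)
```

In this setup, the returning-visit count only moves when something stable ties one request to an earlier one, and that stable something is what I'd call tagging.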

In the light of such contradictions, we would have to be very stupid to trust anything an Eth Tech company said about privacy. And it is Eth Tech in general - not just Brave. Provided the products can be used via Tor - as search.brave.com currently can - they're not getting our fingerprint [UPDATE: This may not be true. DuckDuckGo is involved in running Tor, and other Eth Tech brands may be too]. But that shouldn't be the point with a company whose marketing continually screeches "PRIVACY!"

ADVERTISING

So far, we have a search engine displaying very limited organic results, and that's dangerous ground if Brave is going to start pushing ads into the mix. I haven't seen Brave Search identify an ad, but there are already some distortions to the expected organic results, which are extremely hard to explain by anything other than paid insertion.

This magnifies the problem considerably. Very limited result volume, and we don't know how much of it is paid promotion. We're staring at a Welcome to Adware signpost, and not really sure whether we've actually crossed the border. Let me give you an example of what I consider to be unmarked advertising…

[Image: Brave Search suggestions for the letter “t”]

Have a look at the above set of search suggestions. I've only typed in the letter “t”, and with all due respect to the clothing retailer topping the suggestions, it's clearly not there on organic merit. It's the 3,367th most popular website, whereas Twitch is 36th and Twitter is 42nd. Which suggests, as far as I'm concerned, that the top suggestion is a paid insertion. An ad. And this was far from the only instance of a top result looking suspiciously non-organic. Some sets of search suggestions looked to have multiple distortions of priority in them. The compilation below shows a couple more examples…

[Image: compilation of Brave Search suggestions]

I mean, whether or not these top suggestions actually are paid ads, they blatantly don't represent a logical, information-based response, do they? They represent the kind of response you'd get from some Zango-style search bar from the 2000s. A perfect illustration of what I meant by the term "corruption of information integrity" in relation to advertising companies.

The greater issue is, if the search suggestions contain unmarked ads, how many of the actual results are unmarked ads?

In order for this search engine to have any integrity whatsoever, we need to be told what's a paid promotion and what isn't. Indeed, we need a lot more information all round. And if the above items are paid ads, it's worrying that we wouldn't be told - not just because of the dire ethics of covert advertising, but because it raises the suspicion that Brave is so riddled with paid insertions that it's embarrassed to identify them. It makes us wonder if this is a piece of adware, because that's exactly what adware search toolbars do. Pump a very limited results pool full of ads, and pretend it's all organic.

CONCLUSION

Ignoring the search suggestions and focusing purely on the actual results, I don't believe Brave Search currently fits the description of adware. And we have to remind ourselves that it's still in beta. But that's not a reason for us to say “Everything's fine”, when there are some pretty significant concerns.

My greatest disappointment is that Brave has chosen to take the easy route and do what the rest of the “private search engines” do: cement a deal with a data company and then make like that data company somehow doesn't want any data. DATA COMPANIES WANT DATA. It's in their DNA. And as is evident in the usage metrics list, Brave itself wants data too. In fact I'd go as far as to say that it's desperate for data.

As long as Brave Search will work via Tor and it continues to serve useful results that match the queries, it will be an option for me - same as DDG, Startpage and Qwant. It's not, at the time of writing, an ad bar in a face mask, but sadly, it looks more like that than it looks like a search industry revolution. And given Brave's heavy money focus, I can't dispel the feeling that it could start to look more like an ad bar in a face mask as time goes on.

[UPDATE 2 August 2021: Brave Search is also run on an Amazon CloudFront server. See its Traceroute here.]