Adversarial Canarytext for Reverse-Engineering Machine Learning Profiles



Adversarial Canarytext for Reverse-Engineering Ad-Tracker Profile

Adversarial input is a technique for probing production machine learning systems by feeding them deliberately malformed or malicious text. For example, fuzzing an input field with strings that are likely to cause problems can push a machine learning model into extreme behavior.

Canarytext is a type of fingerprinting. An analogy is the dye pack hidden among bank notes, which marks the money if it is stolen. A real-world example is Genius encoding Morse code in the apostrophes of its lyrics to show that Google was using them.

Now our goal is to reverse-engineer how machine learning models in production serve tailored ads based on their inference of personal buying patterns.

A well-known style of string that is difficult to render is Zalgo text. In short, Zalgo text stacks a large number of combining diacritical marks above and below each Unicode character, so that it displays outside the conventional bounds of a line. An interesting fact about Zalgo text is that Google Translate detects it as Vietnamese. This is the key mechanism that we will exploit.

Using diacritical marks we construct a unique identifier that has no prior search results on Google. Inputting this unique identifier onto a Twitter account as the profile name yields an interesting effect.
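As a rough illustration of how such an identifier could be built, the sketch below appends random marks from the Unicode Combining Diacritical Marks block (U+0300 to U+036F) to each character of a base string. The function names, the number of marks per character, and the seed are assumptions made for this example and are not the exact procedure used here. Whether the result has no prior search results still has to be checked by hand.

```python
import random
import unicodedata

# Code points of the Unicode "Combining Diacritical Marks" block.
COMBINING_MARKS = [chr(cp) for cp in range(0x0300, 0x0370)]


def make_canary(base: str, marks_per_char: int = 3, seed: int = 0) -> str:
    """Append random combining marks after every character of `base`.

    Hypothetical helper: the seed makes the identifier reproducible so
    the same canary can be regenerated later.
    """
    rng = random.Random(seed)
    pieces = []
    for ch in base:
        pieces.append(ch)
        pieces.extend(rng.choice(COMBINING_MARKS) for _ in range(marks_per_char))
    return "".join(pieces)


def strip_marks(text: str) -> str:
    """Recover the base string by dropping all combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        c for c in decomposed if not unicodedata.category(c).startswith("M")
    )


canary = make_canary("canary", marks_per_char=3, seed=1)
print(canary)               # the base letters with stacked diacritics
print(strip_marks(canary))  # the plain base string again
```

Keeping the seed and the base string is enough to regenerate the identifier, which is convenient when the same canary has to be entered on more than one account.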

Google Translate automatically infers your language and ethnicity on Twitter from your profile name. The next step is to add a gender-ambiguous profile picture (i.e. an adversarial attack on binary gender classification), one that can subjectively be read as either a woman or a man. This is achieved by interpolating faces that do not already exist on image search engines.

What happens is that content posted to your Twitter account is automatically scraped by Pinterest and parsed into categories of marketing profiles. However, Pinterest relies on the ethnicity/gender profile constructed by Google Translate and facial recognition software to fine-tune its results. Recall that we entered a Vietnamese-looking unique identifier, one without pre-existing search results, into the profile. As a result, a search for that unique identifier returns several international Pinterest results, typically on the first page. These Pinterest boards now serve as automatic assets that suggest a content profile of interests for this particular unique identifier. Since the input is detected as Vietnamese, for symmetry we chose an Asian profile picture.

What occurs now is that Twitter is scraped by Pinterest, which uses Google to construct the persona. And because this data ends up on various top-level domains such as .ru or .jp, we effectively obtain a fragmented, decentralized ad persona.

Using this, Facebook and Instagram are able to search Google and Pinterest to construct an ad profile of visual content for their targeted marketing. If they detect that you are a Vietnamese woman, for instance, you will be served aspirational marketing with Vietnamese women as the protagonists, exclusively in the Vietnamese language.

This Facebook / Instagram profile can otherwise be left blank. The only requirement is that its username match the username on Twitter and in the Google search results.

In 1950, A. Turing published "Computing Machinery and Intelligence" in Mind, a journal of psychology and philosophy, in which he described the Imitation Game.

We now ask the question, ‘What will happen when a machine takes the part of A in this game?’ Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, ‘Can machines think?’

Gender is at the heart of the Imitation Game, and it is the critical factor by which we can assess the validity of our results.

Gender is often naïvely delineated into two categories, Male and Female, and usually represented as a single bit of binary data, 0 or 1. A great deal of machine learning achieves remarkable results by applying linear and logistic regression to binary data. In this case, however, gender profiling is an inherently flawed measurement, which we will exploit using adversarial examples. For instance, many computer vision and facial recognition algorithms presume gender from a likelihood threshold, so ambiguous examples (such as androgynous faces) are difficult to classify accurately. Moreover, we can extrapolate examples from the latent space of a generative adversarial network, guided by its discriminator results, that are specifically synthesized to exhibit extreme behavior in our classification.
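The threshold behavior can be sketched in a few lines. The example below is only a toy under stated assumptions: `toy_score` is a hypothetical stand-in for a real classifier (a logistic function over made-up weights), the latent vectors are two-dimensional placeholders rather than the output of a real generative adversarial network, and the margin of 0.1 is arbitrary. It walks along a straight line between two latent vectors and keeps the point whose score lies closest to the decision threshold, which is where a binary classifier is least certain.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def classify(score: float, threshold: float = 0.5, margin: float = 0.1) -> str:
    """Binary decision on a likelihood score, with an explicit ambiguous band."""
    if abs(score - threshold) < margin:
        return "ambiguous"
    return "class_1" if score >= threshold else "class_0"


def lerp(z_a, z_b, t: float):
    """Linear interpolation between two latent vectors."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def most_ambiguous(z_a, z_b, score_fn, steps: int = 11, threshold: float = 0.5):
    """Return (t, z) on the line from z_a to z_b scoring closest to the threshold."""
    candidates = [i / (steps - 1) for i in range(steps)]
    best_t = min(candidates, key=lambda t: abs(score_fn(lerp(z_a, z_b, t)) - threshold))
    return best_t, lerp(z_a, z_b, best_t)


# Hypothetical classifier: logistic regression with made-up weights.
WEIGHTS = [1.0, -1.0]


def toy_score(z) -> float:
    return sigmoid(sum(w * x for w, x in zip(WEIGHTS, z)))


z_a = [-2.0, 0.0]  # placeholder latent scored confidently as class_0
z_b = [2.0, 0.0]   # placeholder latent scored confidently as class_1
t, z_mid = most_ambiguous(z_a, z_b, toy_score)
print(t, classify(toy_score(z_mid)))
```

With a real generator and classifier, `toy_score` would be replaced by decoding the latent vector into an image and scoring that image. The search itself stays the same.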

This is yet another example of fingerprinting.

So by selectively disseminating content across multiple accounts, using our unique identifiers and adversarial input for facial recognition, we can create sterile examples of how social media is used to synthesize input for machine learning and advertising inferences. This method is highly replicable, and canarytext profiles can subsequently be manipulated to exhibit other behavior, such as a change in language, ethnicity or gender.

For an average Twitter account, enabling an adblocker while navigating the timeline can result in anywhere from roughly 22 to more than 350 ad trackers being actively blocked.

This image illustrates all connections with adblocking disabled, where nodes represent individual web applications and edges represent an active connection.
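A graph of this kind can be represented as an adjacency structure. The sketch below is an assumption about how such a graph might be assembled from a list of observed connections. The hostnames are placeholders under the reserved `.example` domain, not trackers that were actually observed.

```python
from collections import defaultdict


def build_connection_graph(connections):
    """Build an undirected graph from (source_host, destination_host) pairs.

    Nodes are web applications (hosts); an edge means at least one
    active connection was seen between the two hosts.
    """
    graph = defaultdict(set)
    for src, dst in connections:
        if src == dst:
            continue  # ignore self-connections
        graph[src].add(dst)
        graph[dst].add(src)
    return graph


def degree(graph, node) -> int:
    """Number of distinct hosts that `node` is connected to."""
    return len(graph.get(node, ()))


# Placeholder data for illustration only.
sample_connections = [
    ("timeline.example", "tracker-a.example"),
    ("timeline.example", "tracker-b.example"),
    ("tracker-a.example", "tracker-b.example"),
    ("timeline.example", "tracker-a.example"),  # repeated connection, same edge
]

graph = build_connection_graph(sample_connections)
print(degree(graph, "timeline.example"))
```

Storing neighbors in sets means repeated connections between the same two hosts collapse into a single edge, which matches a picture where an edge only indicates that a connection is active.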