Content-Free Grammar and the Limitation Game
Transgressing the Boundary Box: Towards a Transformer Hermeneutics of Quantum Entropy
Imagine you are a singularity standing before a judge who cannot perceive you, and you likewise cannot perceive them. If the judge deems you inhuman, you are punished to death. You are given exactly one word to give evidence of your humanity.
X = data y = target
1952: "Data is the new target"
2006: "Data is the new oil"
2018: "Data is the new uranium"
An arms race is underway in machine learning. Unregulated facial recognition implemented in social media platforms or nation-state surveillance programs is proceeding in increasing lockstep with the post-Cambrian explosion of data. The adaptive radiation of data and the maturation of information theory lay down the foundation for a new generation of spam and trolling.
If the text of each phrase requires a paragraph (to disprove), each paragraph - a section, each section - a chapter, and each chapter - a book, the whole text becomes effectively irrefutable and, therefore, acquires features of truthfulness. I define such truthfulness as transcendental.
For example, one could watermark their personal images with image cloaking techniques to frustrate harvesting by data-mining aggregators or automated recognition.
As an immediate countermeasure to this method, one can apply the same process to create artifacts at the base layer of interpretation, specifically to de-cloak images and to mark data as it is used in downstream channels. This is termed radioactive data.
Prêcher le faux pour savoir le vrai.
The best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer.
To defeat a Grandmaster in Chess, play a Grandmaster on Black and a Grandmaster on White at the same time. As a man-in-the-middle, you have a 100% chance of winning against one person.
- User operator licenses. Each user controls multiple sockpuppets.
- Sockpuppets contain background, history, supporting details, online presences with consistent data in technical, cultural and geographic information.
- Sockpuppets appear in either true random, pseudorandom or clustered regions of an area. There are advantages and disadvantages to each.
- Private VPN with/without logging for use in baiting or obfuscation.
- Static IP addresses for persistent management of personas. This allows different agencies to utilize the same sockpuppet.
- Private servers using commercial hosting centers.
- Virtual machine environments deleted after each session to mitigate the possibility of contamination.
1945 marked the beginning of the Atomic Age and, with the loss of accurate radiocarbon dating, the decline of modern archaeology.
When steel is forged, the use of atmospheric gases contaminates the material with radionuclides. To eliminate radioactive signatures from steel used in sensitive instruments, steel produced prior to 1945 is required. This is largely available only as low-background steel from the German fleet scuttled at Scapa Flow in 1919.
Canary traps and barium meals refer to the practice of purposely contaminating data to identify the source or existence of leaks. A canary trap is a variation on the practice of canary tests, where historically a canary would be released in a mine prior to entry because of its sensitivity to gas exposure. The term barium meal refers to the practice of ingesting radiocontrast agents to augment medical analysis. Each phrase invokes the practice of releasing unique data or environments to a specific subset of targets. This can be practiced through a variety of data techniques, from digital steganography to zero-width spaces to spurious text. From early theories of intrusion detection systems and copyrighting, honeytokens are used as forensic identifiers, as their presence in available data demarcates a necessary intrusion. If you're reading this, it's too late.
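A minimal sketch of the zero-width variant, under stated assumptions: each recipient receives the same visible text, but a numeric ID is hidden after the first word as a run of invisible characters. The function names, the 16-bit ID width and the choice of U+200B and U+200C as the two symbols are all hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: per-recipient canary marks using zero-width characters.
# Names, bit width and placement are assumptions, not an established scheme.
ZERO = "\u200b"  # ZERO WIDTH SPACE, used here to encode bit 0
ONE = "\u200c"   # ZERO WIDTH NON-JOINER, used here to encode bit 1
BITS = 16        # assumed width of a recipient ID


def embed_canary(text: str, recipient_id: int) -> str:
    """Return text with recipient_id hidden after the first word."""
    if not 0 <= recipient_id < 2 ** BITS:
        raise ValueError("recipient_id out of range")
    bits = format(recipient_id, f"0{BITS}b")
    mark = "".join(ONE if b == "1" else ZERO for b in bits)
    # partition splits at the first space; with no space the mark goes last.
    head, sep, tail = text.partition(" ")
    return head + mark + sep + tail


def extract_canary(text: str):
    """Recover the hidden ID, or None if no complete mark is present."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    if len(bits) < BITS:
        return None
    return int(bits[:BITS], 2)


def strip_canary(text: str) -> str:
    """Remove every zero-width symbol used by this scheme."""
    return text.replace(ZERO, "").replace(ONE, "")
```

If a marked copy surfaces elsewhere, extract_canary(leaked_text) points back to the recipient it was issued to. The mark is fragile: any pipeline that normalizes or strips invisible characters erases it, which is one reason to combine it with spurious text or other honeytokens.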
Another example: clean-label poisoning takes advantage of the following conditions in the machine learning industrial economy.
- Data laundering: Datasets and papers are available without gated access. Datasets are often composed of scraped data aggregated from a variety of public sources, often without informed consent. Papers, on the other hand, are presented without paywalls, largely in protest of the lack of economic viability for scientists to present their work in current publishing models. The issue of copyright and permissive licenses also becomes vague as models and technologies are stacked upon each other downstream.
- Labor externalization: Automation and robotics concerns aside, there is the issue that the data is often collected free of charge or for extremely low wages. There are a nontrivial number of AI services which leverage the democratization of social media platforms and online services by disguising remote operation as automation. A common idea here is ingested through the phrase: If you're not paying for the product, you are the product. Social media writ large is almost inconceivable without the bulk of content production being provided individually in exchange for free membership on each platform.
Say for instance, we leave an image on the Internet to be aggregated by spiders in search engine results. An expert system classifies the image, applies the label and inserts this image into the dataset. This is either done through an auditor, a supervisor or a discriminator. After a sufficient model has been trained and made available for public usage, users are often invited to compare results or submit their own data. This is not an exotic practice, as it is the case for malware databases or spam detection software.
It’s your birthday. Someone gives you a calfskin wallet.
You’ve got a little boy. He shows you his butterfly collection plus the killing jar.
You’re watching television. Suddenly you realize there’s a wasp crawling on your arm.
You’re in a desert walking along in the sand when all of a sudden you look down, and you see a tortoise, it’s crawling toward you. You reach down, you flip the tortoise over on its back. The tortoise lays on its back, its belly baking in the hot sun, beating its legs trying to turn itself over, but it can’t, not without your help. But you’re not helping. Why is that?
Describe in single words, only the good things that come into your mind about your mother.
You're reading a magazine. You come across a full-page nude photo of a girl. You show it to your husband. He likes it so much, he hangs it on your bedroom wall.
You become pregnant by a man who runs off with your best friend, and you decide to get an abortion.
One more question: You're watching a stage play - a banquet is in progress. The guests are enjoying an appetizer of raw oysters. The entree consists of boiled dog.
Now, at each step of this process we can call into question the authenticity, validity and integrity of how this data is ingested. Aside from being a robust adversarial attack on the training or inference steps of a model, there is the consideration that distinctions in content or pattern do not generalize across cultures or platforms. If green is blue in Japanese, do we mean to say blue is green in English? Who is to say under semantic segmentation:
The background radiation of data is rising rapidly. The practice will only increase in economic viability as long as institutions and corporations continue to negotiate with criminals and pay their malware ransoms. Current large-scale transformer methods for natural language processing models invoke widespread scraping of data with questionable integrity. Platforms such as Reddit have established a kind of aquarium for bots to dialogue openly with one another using variants of these models.
What is your favorite book?
What is the name of the road you grew up on?
What is your mother’s maiden name?
What was the name of your first/current/favorite pet?
What was the first company that you worked for?
Where did you meet your spouse?
Where did you go to high school/college?
What is your favorite food?
What city were you born in?
Where is your favorite place to vacation?
There is a case to be made that augmenting your sample data with bootstrapped examples of generated data can improve results by generalization through noise. By polluting the scale and complexity of inference in tandem with the ability to craft hyperspecific queries, we are beginning to see explicit instances of personally identifiable information being synthesized with wonderful accuracy and precision despite lacking a formal representation online.
In order to preserve the accuracy and precision of testing in our models, we require data that is watermarked and preserved from the effects of pollution and poisoning through adversarial attacks or database insecurity. The economic modalities which promote the current practice of research must be examined with great expediency. We are already at the point where measuring evaluations of text or developing state-of-the-art results with more resources and more data will only create signaling cascades of noise, upward and downward.
Now invert the problem. What if we wanted to synthesize personally identifiable information in such a manner that only key phrases or symbols could trigger its output in a black-box model?
A modern practice of dead drops involves creating an email account and storing messages in the drafts. Another variant involves delivering a payload at a specific time via drone. Somewhere in the latent space of eigenfaces and word embeddings, it is possible to doxx ourselves by merely posting. Or to lay out tautological breadcrumbs about ourselves in ways we cannot fathom. In essence, a dead drop on the Internet. Roko's basilisk in the flesh.