Plato’s cave – knowledge represented

These last months, the concept of Plato’s cave seems to be having a revival. In the story, prisoners are chained in a cave, seeing only shadows on a wall and believing those shadows are the whole world. When one prisoner escapes and sees the real world outside, he realizes the shadows were mere representations of reality. The story is part of the conversations in The Republic in which Plato explores what justice is and what an ideal society should look like. It is a seminal work for many disciplines, including epistemology (the study of what can be known), ontology (what things exist) and ethics (what is good). In that sense, it is not surprising that Plato’s cave has become popular again, in a time when these branches of knowledge are challenged by changing technology.

One of the places I encountered Plato’s cave again was the very valuable podcast Knowledge Graph Insights by Larry Swanson. He was interviewing Torrey Podmajersky, who recounted that Plato’s cave was her bedtime story: ‘Geez, dad, just fine. Plato in the cave. We don’t really know anything. All we have is facsimiles and representations of meaning and representations of reality, and through that we construct meaning. And I feel like that’s all we’re ever doing is using language to construct meaning based on our inability to fully perceive reality’.

Now, for many years – centuries even – we in the West have used text to build up logical arguments: presenting our facts in ways that support theories and using the scientific method to test these buildings of logic for sturdiness. We have retained what stood these tests and discarded what did not (as with Lombroso’s theory that head size was related to criminality). These arguments were contained in books and journals, and libraries created metadata to make these titles accessible. The high-quality long-form content held in libraries was one of the first resources Google used to enhance its search engine, as Ben Lewis showed in the documentary Google and The World Brain (2013).

In the meantime, publication was democratized by the web, and at this point it seems that Googling is giving way to chatting. Large language models mimic human use of language, though they do so only on the basis of probabilistic statistics (as I reason elsewhere).

So perhaps it is time to reframe our thinking about knowledge and about storing and sharing it. Long-form content is a bulky and linear way to represent knowledge. It is created for man, not machine. César Hidalgo argues that knowledge is the connection between entities – as in ‘Rembrandt’ ‘has as profession’ ‘painter’. Knowledge can thus be reduced to statements, which are themselves entities that can be identified, used, and reused. This is the basis of the Linked Data concept that Tim Berners-Lee proposed, in which the fifth and last star is won by connecting your data to the data of others (‘my Einstein is the same as your Einstein’).
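This entity–relation view maps directly onto RDF-style (subject, predicate, object) triples. A minimal sketch in plain Python, without a real RDF library – the `ex:` URIs are illustrative placeholders I made up, and `wd:Q937` stands for the Wikidata identifier of Albert Einstein:

```python
# Knowledge reduced to statements: (subject, predicate, object) triples.
# The ex: identifiers are illustrative, not real vocabulary terms.
triples = [
    ("ex:Rembrandt", "ex:hasProfession", "ex:Painter"),
    # The fifth Linked Data star: connect your entities to others' data.
    # 'my Einstein is the same as your Einstein'
    ("ex:Einstein", "owl:sameAs", "wd:Q937"),
]

def objects_of(subject, predicate, graph):
    """Look up all objects for a given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]

print(objects_of("ex:Rembrandt", "ex:hasProfession", triples))
# → ['ex:Painter']
```

The point is that each statement is itself an addressable thing: once ‘Rembrandt has as profession painter’ is a triple with stable identifiers, anyone else’s data can link to it.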

Knowledge can be represented in many ways, from symbolic systems like ontologies, taxonomies, and knowledge graphs that explicitly define concepts and their relationships, to newer statistical and neural methods, such as embeddings and large language models, which capture meaning through patterns in data rather than formal logic. Increasingly, hybrid approaches combine these strengths, using structured knowledge to ground AI models and using AI to extract or enrich structured representations.
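The contrast can be made concrete. A symbolic system states a relation explicitly; an embedding approach infers relatedness from geometry. A toy sketch with hand-made three-dimensional vectors – real embeddings are learned from data and have hundreds of dimensions, so these numbers are purely illustrative:

```python
import math

# Toy 3-d "embeddings"; real ones are high-dimensional and learned.
vectors = {
    "painter": [0.9, 0.1, 0.0],
    "artist":  [0.8, 0.2, 0.1],
    "banker":  [0.1, 0.9, 0.2],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# In this space 'painter' sits closer to 'artist' than to 'banker',
# a relation no one wrote down explicitly anywhere.
print(cosine(vectors["painter"], vectors["artist"]) >
      cosine(vectors["painter"], vectors["banker"]))  # → True
```

A knowledge graph would encode the same insight as an explicit, inspectable statement; the embedding only suggests it, which is exactly the gap the hybrid approaches try to close.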

Canadian researchers systematically reviewed 77 studies on the interaction between knowledge graphs and large language models in their peer-reviewed article Knowledge Graphs and Their Reciprocal Relationship with Large Language Models. They conclude that LLMs can effectively support the construction and enrichment of knowledge graphs, while knowledge graphs can improve the accuracy and reliability of LLMs. They highlight the promise of this two-way integration but note ongoing challenges in scalability, domain adaptation, and ensuring transparency and fairness.

Combining symbolic AI (knowledge graphs, ontologies) and semantic/statistical AI (LLMs) to gain the benefits of both is an interesting way forward for libraries, which have traditionally mediated knowledge. Can we use the clustering of LLMs—where words and concepts gravitate toward each other in abstract semantic space—to identify relationships that are meaningful to humans and turn those into usable representations of knowledge? And in the opposite direction, if we analyse the high-quality corpora stored in our collections, could we extract what might be called “consensus science” and convert it into machine-readable knowledge? This might allow us to build an “archaeology of knowledge,” revealing where ideas branched off from reliable sources or followed erroneous paths. Such a structured, aggregated representation—the closest we can get to a “sum of human knowledge”—could then serve as a guardrail for LLMs, reducing hallucinations by keeping their outputs within the bounds of what humanity broadly accepts as true.

What this asks us to acknowledge is that the form of knowledge representation is changing. From a book-and-letter based world, we moved to journal articles and data stories – and now the question is whether humans will remain the primary target audience for knowledge. Do we want to embed knowledge in technology, making it easily accessible, but less embodied? What would that mean for humans? Would it enhance us – not having to spend time finding out what is already known, but moving beyond the known into the realm of the unknown – being creative in a fully human way? Or will we be the human in the loop – setting norms and evaluating outcomes? Will we still be able to do that if we lose the hard work of making sense of the world by argument? Will evolving chat-like systems make us dumber, or smarter?

One thing we do know at this point is that there is a strong Matthew effect, in which advantages tend to accumulate for those who already have them. The term comes from a verse in the Gospel of Matthew: “Those who already have will receive even more, and they will have plenty. But those who have little will lose even what they have”. This raises questions of power. Who builds these systems, and to what end? Who maintains them, and with what resources? Who gets access, and at what price? In AI and governance I explore what that means for us.

César Hidalgo, in his brilliant Why Information Grows, calls humans unique in their ability to store information outside their bodies. He calls these embodied works of the mind ‘crystallized imagination’. If more and more of what we call “knowledge” lives outside of us – in graphs, algorithms, embeddings – what remains inside? Perhaps our role will be to cultivate judgment, imagination, and the ethical courage to question the systems we build. Or perhaps we risk becoming spectators to our own intellectual inheritance, sitting like Plato’s prisoners before a new wall of glowing shadows. Knowledge has always been more than a stack of facts; it is an attitude toward the world, a willingness to look again, to doubt, to argue, to be changed. The path we choose will determine not just what we know, but who we are.