Large language models (LLMs) have catalyzed a new era of unprecedented innovation, leading to new products and the enhancement or reinvention of established ones for both businesses and the general public. LLMs have become integral to our daily personal and professional routines, helping us query information or summarize extensive data.
However, as impressive as these capabilities are, they come with imperfections, notably concerning the reliability and accuracy of the content LLMs generate. Because these models are known to sometimes produce "hallucinations," they need to be used with care, particularly for tasks where accuracy is critical.
At Cisco Research, where we are committed to the principles of responsible AI, we have championed both academic and internal AI research initiatives aimed at addressing LLM hallucination and reliability. This effort has produced numerous scholarly papers and open-source contributions featured at premier AI conferences.
Recently, we held a PI (Principal Investigator) summit featuring four distinguished NLP (natural language processing) researchers, who presented their latest research on detecting and mitigating LLM hallucinations. Below, I briefly delve into the insights presented by our panel of experts.
William Wang from the University of California, Santa Barbara led with a presentation covering two of his notable research initiatives. First, he explored the complexities of in-context learning and prompt engineering algorithm design. In a recent NeurIPS 2023 publication [i], the authors examine in-context learning through the lens of latent variables, positing that LLMs inherently serve as topic models. In the latter segment of his talk, he shifted focus to the Logic-LM project [ii], highlighted at EMNLP 2023, which underscores the potential of symbolic reasoning to enhance the reasoning capabilities and truthfulness of LLMs in selected domains. The team adopted techniques such as logic programming, first-order logic, constraint satisfaction problem (CSP) solvers, and satisfiability (SAT) solvers: the LLM reframes a problem in a structured symbolic language, and a solver suited to that problem domain then carries out the actual reasoning.
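To make the solver-in-the-loop idea concrete, here is a minimal sketch, not the authors' Logic-LM implementation: it assumes the LLM has already translated a small natural-language puzzle into propositional constraints, and it uses the Z3 solver to check whether a candidate answer is actually entailed by the premises.

```python
# Minimal sketch of a solver-backed reasoning step (illustrative, not Logic-LM's code).
# Assumption: an LLM has translated "If it rains, the street is wet. It rains."
# into the symbolic premises below; Z3 then performs the reasoning.
from z3 import Bool, Implies, Not, Solver, unsat

rain, wet = Bool("rain"), Bool("wet")
premises = [Implies(rain, wet), rain]

def entails(premises, conclusion):
    """Premises entail the conclusion iff premises AND NOT(conclusion) is unsatisfiable."""
    solver = Solver()
    solver.add(*premises)
    solver.add(Not(conclusion))
    return solver.check() == unsat

print(entails(premises, wet))       # True: "the street is wet" is logically grounded
print(entails(premises, Not(wet)))  # False: the contradictory answer is rejected
```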
Kai Shu from the Illinois Institute of Technology brought a new perspective to the conversation about hallucinations in LLMs. Kai highlighted recent research [iii] shared at the EMNLP conference on combating misinformation in the age of LLMs, underscoring the potential for malicious use to provoke AI-generated hallucinations. He and his co-authors argue that LLMs are a double-edged sword in the misinformation domain: capable of detecting misinformation and, unfortunately, also of generating it.
With growing concern over the misuse of LLMs to craft sophisticated, hard-to-detect misinformation, the team addressed three main aspects: detection, mitigation, and source identification of misinformation. They proposed creative solutions to tackle these intricate challenges.
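As a rough illustration of the detection aspect only, here is a hypothetical sketch of using an LLM as a fact-checking classifier; the `complete_fn` callable and the prompt wording are placeholders of my own, not the paper's pipeline.

```python
# Hypothetical LLM-as-detector sketch (not the paper's method).
# `complete_fn` is a placeholder for any text-completion function (str -> str).
def detect_misinformation(claim: str, evidence: str, complete_fn) -> str:
    """Ask an LLM to label a claim against retrieved evidence."""
    prompt = (
        "You are a fact-checking assistant.\n"
        f"Evidence: {evidence}\n"
        f"Claim: {claim}\n"
        "Answer with exactly one word: SUPPORTED, REFUTED, or UNVERIFIABLE."
    )
    return complete_fn(prompt).strip().upper()
```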
Danqi Chen from Princeton University recently showcased her group's research on generating and evaluating citations at the EMNLP conference [iv]. A key enhancement to the utility of LLM-assisted search is the integration of dependable citations throughout the different segments of the model's output. This not only eases the process of verifying a response but is also crucial for its reliability.
Tackling the challenge of evaluating the merit of such citations is no simple feat. To address this, Danqi introduced a benchmark named ALCE [v] (Automatic LLMs' Citation Evaluation). This benchmark promises to standardize the assessment of citation quality and, as a bonus, paves the way for more sophisticated citation mechanisms in LLM-driven search.
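To give a flavor of what such an assessment might involve, here is a simplified sketch in the spirit of citation recall and precision, not the benchmark's actual scoring code; the `nli_entails` function is a hypothetical placeholder for an entailment checker such as an NLI model.

```python
# Simplified citation-quality scoring sketch (illustrative, not ALCE's implementation).
# `nli_entails(premise, hypothesis)` is a placeholder that returns True when the
# premise supports the hypothesis (e.g., backed by an NLI model).
def citation_scores(statements, nli_entails):
    """statements: list of (claim_text, [cited_passage_texts]) pairs."""
    recall_hits, precision_hits, total_citations = 0, 0, 0
    for claim, passages in statements:
        # Recall-style check: the cited passages together should support the claim.
        if passages and nli_entails(" ".join(passages), claim):
            recall_hits += 1
        # Precision-style check: each individual citation should support the claim.
        for passage in passages:
            total_citations += 1
            precision_hits += int(nli_entails(passage, claim))
    recall = recall_hits / max(len(statements), 1)
    precision = precision_hits / max(total_citations, 1)
    return recall, precision
```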
Finally, Huan Sun from The Ohio State University brought a refreshing perspective to the table regarding the phenomenon of hallucination in large language models. Contrary to common belief, she argued that hallucination shouldn't be seen as a glitch in LLMs but rather as an inherent feature that, when leveraged appropriately, can offer significant value. She likened it to human perception, which can be viewed as a kind of controlled hallucination.
Further into her presentation, Sun introduced two of her team's projects: Mind2Web [vi], the first dataset for developing and evaluating generalist agents for the web, and SeeAct [vii], which focuses on providing visual grounding for web agents.
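For intuition about what a generalist web agent does at each step, here is a hypothetical sketch, not the released Mind2Web or SeeAct code: the model, stood in for by a `choose_fn` placeholder, is shown the task, the action history, and candidate page elements, and picks the next operation.

```python
# Hypothetical single step of a web agent (illustrative; not the released
# Mind2Web/SeeAct code). `choose_fn` is a placeholder for the (multimodal) LLM.
from dataclasses import dataclass

@dataclass
class Action:
    element_id: str   # which page element to act on
    operation: str    # e.g. "CLICK", "TYPE", "SELECT"
    value: str = ""   # text to type or option to select, if any

def agent_step(task, candidate_elements, history, choose_fn) -> Action:
    """Ask the model for the next action toward completing `task`."""
    prompt = (
        f"Task: {task}\n"
        f"Previous actions: {history}\n"
        "Candidate elements (id: description):\n"
        + "\n".join(f"{e['id']}: {e['text']}" for e in candidate_elements)
        + "\nReply as: <element_id> <operation> [value]"
    )
    element_id, operation, *rest = choose_fn(prompt).split(maxsplit=2)
    return Action(element_id, operation, rest[0] if rest else "")
```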
You can find in-depth coverage of these presentations and the lively panel discussion that concluded the summit on Outshift’s YouTube channel.
To wrap up, here are some personal insights and reflections:
Subscribe to Outshift’s YouTube channel for more featured Cisco Research summits.
References