AI Ecosystem Intelligence Explorer

LLM

21 of 199 articles

GitHub - NVIDIA/garak: the LLM vulnerability scanner

The LLM vulnerability scanner.

LLM
Cybersecurity
 
11/18/2024

NuExtract 1.5 - Multilingual, Infinite context, still small, and better than GPT-4o!

We introduce NuExtract 1.5, the new version of our foundation model for structured extraction. NuExtract 1.5 is multilingual, can handle arbitrarily long documents, and outperforms GPT-4o in English while being 500 times smaller. As usual, we release it under MIT license.

LLM
 
11/17/2024

Introducing Multimodal Llama 3.2

Complete this Guided Project in under 2 hours. Join our new short course, Introducing Multimodal Llama 3.2, and learn from Amit Sangani, Senior Director of…

LLM
Applied AI
AI Fundamentals
 
11/15/2024

Improve your prompts in the developer console

Today, we're introducing the ability to improve prompts and manage examples directly in the Anthropic Console.

LLM
Prompting
 
11/15/2024

The Effect of Sampling Temperature on Problem Solving in Large Language Models

In this research study, we empirically investigate the effect of sampling temperature on the performance of Large Language Models (LLMs) on various problem-solving tasks. We created a multiple-choice question-and-answer (MCQA) exam by randomly sampling problems from standard LLM benchmarks. Then, we used nine popular LLMs with five prompt-engineering techniques to solve the MCQA problems while increasing the sampling temperature from 0.0 to 1.6. Despite anecdotal reports to the contrary, our empirical results indicate that changes in temperature from 0.0 to 1.0 do not have a statistically significant impact on LLM performance for problem-solving tasks. In addition, these results appear to generalize across LLMs, prompt-engineering techniques, and problem domains. All code, data, and supplemental materials are available on GitHub at: https://github.com/matthewrenze/jhu-llm-temperature

LLM
Prompting
AI Fundamentals
 
11/6/2024

IBM's Granite foundation model: A detailed look at its training data

While many AI model developers publicly release research papers and their data training approaches, we'll focus on one model in particular: IBM's Granite model, for which IBM has gone one step further and released its specific training data.

LLM
AI Fundamentals
 
11/5/2024

From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization

Although many studies have investigated and reduced hallucinations in large language models (LLMs) for single-document tasks, research on hallucination in multi-document summarization (MDS) tasks remains largely unexplored. Specifically, it is unclear how the challenges arising from handling multiple documents (e.g., repetition and diversity of information) affect model outputs. In this work, we investigate how hallucinations manifest in LLMs when summarizing topic-specific information from multiple documents. Since no benchmarks exist for investigating hallucinations in MDS, we use existing news and conversation datasets, annotated with topic-specific insights, to create two novel multi-document benchmarks. When evaluating 5 LLMs on our benchmarks, we observe that on average, up to 75% of the content in an LLM-generated summary is hallucinated, with hallucinations more likely to occur towards the end of the summaries. Moreover, when summarizing non-existent topic-related information, gpt-3.5-turbo and GPT-4o still generate summaries about 79.35% and 44% of the time, respectively, raising concerns about their tendency to fabricate content. To understand the characteristics of these hallucinations, we manually evaluate 700+ insights and find that most errors stem from either failing to follow instructions or producing overly generic insights. Motivated by these observations, we investigate the efficacy of simple post-hoc baselines in mitigating hallucinations but find them only moderately effective. Our results underscore the need for more effective approaches to systematically mitigate hallucinations in MDS. We release our dataset and code at github.com/megagonlabs/Hallucination_MDS.

LLM
AI Fundamentals
 
10/29/2024

Detecting when LLMs are Uncertain

A deep dive into a new reasoning technique called Entropix.

LLM
AI Fundamentals
 
10/26/2024

Large Language Models Reflect the Ideology of their Creators

Large language models (LLMs) are trained on vast amounts of data to generate natural language, enabling them to perform tasks like text summarization and question answering. These models have become popular in artificial intelligence (AI) assistants like ChatGPT and already play an influential role in how humans access information. However, the behavior of LLMs varies depending on their design, training, and use. In this paper, we uncover notable diversity in the ideological stance exhibited across different LLMs and languages in which they are accessed. We do this by prompting a diverse panel of popular LLMs to describe a large number of prominent and controversial personalities from recent world history, both in English and in Chinese. By identifying and analyzing moral assessments reflected in the generated descriptions, we find consistent normative differences between how the same LLM responds in Chinese compared to English. Similarly, we identify normative disagreements between Western and non-Western LLMs about prominent actors in geopolitical conflicts. Furthermore, popularly hypothesized disparities in political goals among Western models are reflected in significant normative differences related to inclusion, social inequality, and political scandals. Our results show that the ideological stance of an LLM often reflects the worldview of its creators. This raises important concerns around technological and regulatory efforts with the stated aim of making LLMs ideologically 'unbiased', and it poses risks for political instrumentalization.

Ethics, Governance and Policy
LLM
 
10/26/2024