20 Sep


The theory of questioning emphasises that while answering existing questions deepens understanding of a topic, it often leads to new questions.

To initiate this dynamic process, STORM simulates a conversation between a Wikipedia writer and a topic expert.

In each round of conversation, the LLM-powered writer generates a question based on the topic, its assigned perspective, and the conversation history.

This history helps the LLM update its understanding and formulate follow-up questions. The conversation is capped at a preset maximum number of rounds.
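The round-limited exchange can be sketched in plain Python. This is only an illustration, not the STORM or LangChain code: `run_conversation`, `stub_writer`, `stub_expert`, and `MAX_ROUNDS` are hypothetical names, and the two stubs stand in for the LLM-backed writer and the grounded expert.

```python
from typing import Callable, List, Tuple

# Hypothetical round limit; the real value is a configuration choice.
MAX_ROUNDS = 3

Turn = Tuple[str, str]  # (question, answer)


def run_conversation(
    topic: str,
    perspective: str,
    ask_question: Callable[[str, str, List[Turn]], str],
    answer_question: Callable[[str], str],
    max_rounds: int = MAX_ROUNDS,
) -> List[Turn]:
    """Alternate writer questions and expert answers for at most max_rounds."""
    history: List[Turn] = []
    for _ in range(max_rounds):
        # The writer sees the topic, its perspective, and everything said so far.
        question = ask_question(topic, perspective, history)
        answer = answer_question(question)
        history.append((question, answer))
    return history


# Stub "writer" and "expert" so the sketch runs without an LLM.
def stub_writer(topic: str, perspective: str, history: List[Turn]) -> str:
    return f"[{perspective}] Question {len(history) + 1} about {topic}?"


def stub_expert(question: str) -> str:
    return f"Answer to: {question}"


conversation = run_conversation("RAG", "engineer", stub_writer, stub_expert)
```

Because the history is passed back to the writer on every round, each new question can build on earlier answers instead of repeating them.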

To ensure the conversation remains factual, trusted online sources are used to ground each answer.

If a question is complex, the LLM first breaks it down into simpler search queries. The search results are then evaluated with a rule-based filter to exclude unreliable sources.
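A rough sketch of these two steps, under stated assumptions: `split_into_queries` is a naive string-based stand-in for the LLM decomposition step, and `UNRELIABLE_DOMAINS` is a made-up blocklist using reserved `.example` domains. The actual rules STORM applies are not shown in this post.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical blocklist; a real filter would use a curated list of sources.
UNRELIABLE_DOMAINS = {"rumours.example", "content-farm.example"}


def split_into_queries(question: str) -> List[str]:
    """Stand-in for the LLM step: split on ' and ' to get simpler queries."""
    parts = [part.strip(" ?") for part in question.split(" and ")]
    return [part for part in parts if part]


def is_reliable(url: str) -> bool:
    """Rule-based check: reject blocklisted domains and their subdomains."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return False  # not a usable URL
    if host.startswith("www."):
        host = host[4:]
    return not any(
        host == domain or host.endswith("." + domain)
        for domain in UNRELIABLE_DOMAINS
    )


def filter_results(results: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only search results whose URL passes the rule-based check."""
    return [result for result in results if is_reliable(result["url"])]


queries = split_into_queries(
    "What is RAG and how do long context windows affect it?"
)
results = [
    {"url": "https://en.wikipedia.org/wiki/Retrieval-augmented_generation", "content": "..."},
    {"url": "https://blog.rumours.example/post", "content": "..."},
]
kept = filter_results(results)
```

Keeping the filter rule-based rather than LLM-based makes it cheap and deterministic: the same URL is always accepted or rejected.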

Finally, the LLM synthesises information from trustworthy sources to generate the answer, which is also added to the references for the full article.
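One way to picture the reference bookkeeping is a dictionary keyed by source URL. This is an assumption made for illustration: `add_to_references` and the `url`/`content` fields are hypothetical, not names taken from STORM.

```python
from typing import Dict, List


def add_to_references(
    references: Dict[str, str], sources: List[Dict[str, str]]
) -> Dict[str, str]:
    """Merge the sources behind one answer into the article-wide references.

    Keys are URLs, so a source cited in several answers is stored once.
    """
    merged = dict(references)
    for source in sources:
        # setdefault keeps the first stored content for a repeated URL.
        merged.setdefault(source["url"], source["content"])
    return merged


references: Dict[str, str] = {}
references = add_to_references(
    references,
    [{"url": "https://a.example/1", "content": "first source"}],
)
references = add_to_references(
    references,
    [
        {"url": "https://a.example/1", "content": "duplicate of first source"},
        {"url": "https://b.example/2", "content": "second source"},
    ],
)
```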

I ran the LangChain implementation in a notebook, using the GPT-3.5-Turbo model.

The notebook also requires two search tools: DuckDuckGo and the Tavily Python client (tavily-python).

The only modification required was adding the command pip install -U duckduckgo-search to the notebook so that the DuckDuckGo search tool works.

Below is an example of a prompt within the LangChain implementation of STORM…

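from langchain_core.prompts import ChatPromptTemplate

# The system message sets the writer role; the user message carries the topic.
direct_gen_outline_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a Wikipedia writer. Write an outline for a Wikipedia page about a user-provided topic. Be comprehensive and specific.",
        ),
        ("user", "{topic}"),
    ]
)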

And here is an example topic prompt…

example_topic = "Impact of million-plus token context window language models on RAG"

Below is the gen_related_topics_prompt prompt…

from typing import List

from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

gen_related_topics_prompt = ChatPromptTemplate.from_template(
    """I'm writing a Wikipedia page for a topic mentioned below. Please identify and recommend some Wikipedia pages on closely related subjects. I'm looking for examples that provide insights into interesting aspects commonly associated with this topic, or examples that help me understand the typical content and structure included in Wikipedia pages for similar topics.

Please list as many subjects and urls as you can.

Topic of interest: {topic}
"""
)


# Schema for the structured output: a flat list of related subjects.
class RelatedSubjects(BaseModel):
    topics: List[str] = Field(
        description="Comprehensive list of related subjects as background research.",
    )


# fast_llm is assumed to be the chat model defined elsewhere in the notebook.
expand_chain = gen_related_topics_prompt | fast_llm.with_structured_output(
    RelatedSubjects
)

And the gen_perspectives_prompt prompt…

gen_perspectives_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You need to select a diverse (and distinct) group of Wikipedia editors who will work together to create a comprehensive article on the topic. Each of them represents a different perspective, role, or affiliation related to this topic.\
You can use other Wikipedia pages of related topics for inspiration. For each editor, add a description of what they will focus on.

Wiki page outlines of related topics for inspiration:
{examples}""",
        ),
        ("user", "Topic of interest: {topic}"),
    ]
)

And the gen_qn_prompt prompt…

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

gen_qn_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an experienced Wikipedia writer and want to edit a specific page. \
Besides your identity as a Wikipedia writer, you have a specific focus when researching the topic. \
Now, you are chatting with an expert to get information. Ask good questions to get more useful information.

When you have no more questions to ask, say "Thank you so much for your help!" to end the conversation.\
Please only ask one question at a time and don't ask what you have asked before.\
Your questions should be related to the topic you want to write.
Be comprehensive and curious, gaining as much unique insight from the expert as possible.\

Stay true to your specific perspective:

{persona}""",
        ),
        # Prior conversation turns are injected here, if there are any.
        MessagesPlaceholder(variable_name="messages", optional=True),
    ]
)
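The sign-off phrase in this prompt gives the surrounding code a simple stop signal. Below is a minimal sketch of such a check, assuming messages are plain dictionaries with `role` and `content` keys; `should_end` and `MAX_TURNS` are hypothetical names, not the notebook's own routing function.

```python
from typing import Dict, List

SIGN_OFF = "Thank you so much for your help!"
MAX_TURNS = 5  # hypothetical cap on the number of expert answers


def should_end(messages: List[Dict[str, str]], max_turns: int = MAX_TURNS) -> bool:
    """Return True when the interview should stop."""
    # Stop once the expert has answered the maximum number of times.
    expert_turns = sum(1 for m in messages if m["role"] == "expert")
    if expert_turns >= max_turns:
        return True
    # Otherwise stop only if the latest writer message ends with the sign-off.
    for message in reversed(messages):
        if message["role"] == "writer":
            return message["content"].strip().endswith(SIGN_OFF)
    return False


ongoing = [
    {"role": "writer", "content": "What is RAG?"},
    {"role": "expert", "content": "Retrieval-augmented generation is ..."},
]
finished = ongoing + [
    {"role": "writer", "content": "That covers it. Thank you so much for your help!"},
]
```

Checking both conditions matters: the phrase lets the writer stop early, while the cap prevents an endless interview if the model never says it.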

Strictly speaking, this implementation is built with LangGraph, the graph-based orchestration library from the LangChain team. Below are the two flows that are used in this setup:

Lastly, below is the final article, featuring comprehensive and well-balanced content. It is neatly organised and includes a structured table of contents for easy navigation.


