How to Build a Multi-Round Deep Research Agent with Gemini, DuckDuckGo API, and Automated Reporting?

We begin this tutorial by designing a modular deep research system that runs directly on Google Colab. We configure Gemini as the core reasoning engine, integrate DuckDuckGo’s Instant Answer API for lightweight web search, and orchestrate multi-round querying with deduplication and delay handling. We emphasize efficiency by limiting API calls, parsing concise snippets, and using structured prompts to extract key points, themes, and insights. Every component, from source collection to JSON-based analysis, allows us to experiment quickly and adapt the workflow for deeper or broader research queries. Check out the FULL CODES here.

import os
import json
import time
import requests
from typing import List, Dict, Any
from dataclasses import dataclass
import google.generativeai as genai
from urllib.parse import quote_plus
import re

We start by importing the essential Python libraries for system operations, JSON processing, web requests, and typed data structures. We also bring in Google’s Generative AI SDK and `quote_plus` for URL encoding, so the research system can build well-formed search requests.
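To see what `quote_plus` actually does to a query before it reaches the DuckDuckGo endpoint, here is a quick illustration (the sample query is our own):

```python
from urllib.parse import quote_plus

# quote_plus replaces spaces with '+' and percent-encodes reserved
# characters, which is exactly what the search URL construction relies on.
query = "deep research agents & LLMs"
encoded = quote_plus(query)
print(encoded)  # deep+research+agents+%26+LLMs
```

Without this encoding, characters such as `&` would be misread as query-string separators by the API.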

@dataclass
class ResearchConfig:
   gemini_api_key: str
   max_sources: int = 10
   max_content_length: int = 5000
   search_delay: float = 1.0


class DeepResearchSystem:
   def __init__(self, config: ResearchConfig):
       self.config = config
       genai.configure(api_key=config.gemini_api_key)
       self.model = genai.GenerativeModel('gemini-1.5-flash')


   def search_web(self, query: str, num_results: int = 5) -> List[Dict[str, str]]:
       """Search web using DuckDuckGo Instant Answer API"""
       try:
           encoded_query = quote_plus(query)
           url = f"https://api.duckduckgo.com/?q={encoded_query}&format=json&no_redirect=1"


           response = requests.get(url, timeout=10)
           data = response.json()


           results = []


           if 'RelatedTopics' in data:
               for topic in data['RelatedTopics'][:num_results]:
                   if isinstance(topic, dict) and 'Text' in topic:
                       results.append({
                           'title': topic.get('Text', '')[:100] + '...',
                           'url': topic.get('FirstURL', ''),
                           'snippet': topic.get('Text', '')
                       })


           if not results:
               results = [{
                   'title': f"Research on: {query}",
                   'url': f"https://search.example.com/q={encoded_query}",
                   'snippet': f"General information and research about {query}"
               }]


           return results


       except Exception as e:
           print(f"Search error: {e}")
           return [{'title': f"Research: {query}", 'url': '', 'snippet': f"Topic: {query}"}]


   def extract_key_points(self, content: str) -> List[str]:
       """Extract key points using Gemini"""
       prompt = f"""
       Extract 5-7 key points from this content. Be concise and factual:


       {content[:2000]}


       Return as numbered list:
       """


       try:
           response = self.model.generate_content(prompt)
           # Split on real newlines and drop blank lines from the model output.
           return [line.strip() for line in response.text.split('\n') if line.strip()]
       except Exception as e:
           print(f"Extraction error: {e}")
           return ["Key information extracted from source"]


   def analyze_sources(self, sources: List[Dict[str, str]], query: str) -> Dict[str, Any]:
       """Analyze sources for relevance and extract insights"""
       analysis = {
           'total_sources': len(sources),
           'key_themes': [],
           'insights': [],
           'confidence_score': 0.7
       }


       all_content = " ".join([s.get('snippet', '') for s in sources])


       if len(all_content) > 100:
           prompt = f"""
           Analyze this research content for the query: "{query}"


           Content: {all_content[:1500]}


           Provide:
           1. 3-4 key themes (one line each)
           2. 3-4 main insights (one line each)
           3. Overall confidence (0.1-1.0)


           Format as JSON with keys: themes, insights, confidence
           """


           try:
               response = self.model.generate_content(prompt)
               # Extract the first JSON object from the response (Gemini may
               # wrap it in markdown fences) and populate the analysis fields.
               match = re.search(r'\{.*\}', response.text, re.DOTALL)
               if match:
                   parsed = json.loads(match.group(0))
                   analysis['key_themes'] = parsed.get('themes', analysis['key_themes'])
                   analysis['insights'] = parsed.get('insights', analysis['insights'])
                   analysis['confidence_score'] = float(parsed.get('confidence', analysis['confidence_score']))
           except Exception as e:
               print(f"Analysis parsing error: {e}")


       return analysis


   def generate_comprehensive_report(self, query: str, sources: List[Dict[str, str]],
                                   analysis: Dict[str, Any]) -> str:
       """Generate final research report"""


       sources_text = "\n".join([f"- {s['title']}: {s['snippet'][:200]}"
                                 for s in sources[:5]])


       prompt = f"""
       Create a comprehensive research report on: "{query}"


       Based on these sources:
       {sources_text}


       Analysis summary:
       - Total sources: {analysis['total_sources']}
       - Confidence: {analysis['confidence_score']}


       Structure the report with:
       1. Executive Summary (2-3 sentences)
       2. Key Findings (3-5 bullet points)
       3. Detailed Analysis (2-3 paragraphs)
       4. Conclusions & Implications (1-2 paragraphs)
       5. Research Limitations


       Be factual, well-structured, and insightful.
       """


       try:
           response = self.model.generate_content(prompt)
           return response.text
       except Exception as e:
           return f"""
# Research Report: {query}


## Executive Summary
Research conducted on "{query}" using {analysis['total_sources']} sources.


## Key Findings
- Multiple perspectives analyzed
- Comprehensive information gathered
- Research completed successfully


## Analysis
The research process involved systematic collection and analysis of information related to {query}. Various sources were consulted to provide a balanced perspective.


## Conclusions
The research provides a foundation for understanding {query} based on available information.


## Research Limitations
Limited by API constraints and source availability.
           """


   def conduct_research(self, query: str, depth: str = "standard") -> Dict[str, Any]:
       """Main research orchestration method"""
       print(f"Starting research on: {query}")


       search_rounds = {"basic": 1, "standard": 2, "deep": 3}.get(depth, 2)
       sources_per_round = {"basic": 3, "standard": 5, "deep": 7}.get(depth, 5)


       all_sources = []


       search_queries = [query]


       if depth in ["standard", "deep"]:
           try:
               related_prompt = f"Generate 2 related search queries for: {query}. One line each."
               response = self.model.generate_content(related_prompt)
               # Split on real newlines and keep at most two non-empty queries.
               additional_queries = [q.strip() for q in response.text.split('\n') if q.strip()][:2]
               search_queries.extend(additional_queries)
           except Exception as e:
               print(f"Query expansion error: {e}")


       for i, search_query in enumerate(search_queries[:search_rounds]):
           print(f"Search round {i+1}: {search_query}")
           sources = self.search_web(search_query, sources_per_round)
           all_sources.extend(sources)
           time.sleep(self.config.search_delay)


       unique_sources = []
       seen_urls = set()
       for source in all_sources:
           if source['url'] not in seen_urls:
               unique_sources.append(source)
               seen_urls.add(source['url'])


       print(f"Analyzing {len(unique_sources)} unique sources...")


       analysis = self.analyze_sources(unique_sources[:self.config.max_sources], query)


       print("Generating comprehensive report...")


       report = self.generate_comprehensive_report(query, unique_sources, analysis)


       return {
           'query': query,
           'sources_found': len(unique_sources),
           'analysis': analysis,
           'report': report,
           'sources': unique_sources[:10]
       }

We define a ResearchConfig dataclass to manage parameters like API keys, source limits, and delays, and then build a DeepResearchSystem class that integrates Gemini with DuckDuckGo search. We implement methods for web search, key point extraction, source analysis, and report generation, allowing us to orchestrate multi-round research and produce structured insights in a streamlined workflow.
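The URL-based deduplication that `conduct_research` performs can be sketched in isolation; the sample sources below are made up for illustration:

```python
def dedupe_by_url(sources):
    """Keep only the first source seen for each URL, preserving order."""
    unique, seen = [], set()
    for source in sources:
        if source['url'] not in seen:
            unique.append(source)
            seen.add(source['url'])
    return unique

# Hypothetical sources: the second entry repeats the first URL and is dropped.
sources = [
    {'title': 'A', 'url': 'https://example.com/a'},
    {'title': 'A (dup)', 'url': 'https://example.com/a'},
    {'title': 'B', 'url': 'https://example.com/b'},
]
print(len(dedupe_by_url(sources)))  # 2
```

Preserving insertion order matters here because earlier search rounds use the original query, so their results should rank first.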

def setup_research_system(api_key: str) -> DeepResearchSystem:
   """Quick setup for Google Colab"""
   config = ResearchConfig(
       gemini_api_key=api_key,
       max_sources=15,
       max_content_length=6000,
       search_delay=0.5
   )
   return DeepResearchSystem(config)

We create a setup_research_system function that simplifies initialization in Google Colab by wrapping our configuration in ResearchConfig and returning a ready-to-use DeepResearchSystem instance with custom limits and delays.
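Because `ResearchConfig` is a plain dataclass, it is easy to derive depth-specific presets from a base configuration; the preset values below are our own suggestion, not part of the original code:

```python
from dataclasses import dataclass, replace

@dataclass
class ResearchConfig:
    gemini_api_key: str
    max_sources: int = 10
    max_content_length: int = 5000
    search_delay: float = 1.0

base = ResearchConfig(gemini_api_key="YOUR_KEY")

# dataclasses.replace builds a modified copy without mutating the original,
# which makes it easy to keep one preset per research depth.
deep = replace(base, max_sources=20, search_delay=1.5)

print(base.max_sources, deep.max_sources)  # 10 20
```

This keeps a single source of truth for defaults while letting each research depth override only the fields it cares about.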

if __name__ == "__main__":
   API_KEY = "Use Your Own API Key Here"


   researcher = setup_research_system(API_KEY)


   query = "Deep Research Agent Architecture"
   results = researcher.conduct_research(query, depth="standard")


   print("="*50)
   print("RESEARCH RESULTS")
   print("="*50)
   print(f"Query: {results['query']}")
   print(f"Sources found: {results['sources_found']}")
   print(f"Confidence: {results['analysis']['confidence_score']}")
   print("\n" + "="*50)
   print("COMPREHENSIVE REPORT")
   print("="*50)
   print(results['report'])


   print("\n" + "="*50)
   print("SOURCES CONSULTED")
   print("="*50)
   for i, source in enumerate(results['sources'][:5], 1):
       print(f"{i}. {source['title']}")
       print(f"   URL: {source['url']}")
       print(f"   Preview: {source['snippet'][:150]}...")
       print()

We add a main execution block where we initialize the research system with our API key, run a query on “Deep Research Agent Architecture,” and then display structured outputs. We print research results, a comprehensive report generated by Gemini, and a list of consulted sources with titles, URLs, and previews.
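In Colab it is often useful to persist the returned dict rather than only print it; here is a minimal sketch using a stand-in results dict (the filenames and sample values are our choice):

```python
import json
from pathlib import Path

# Stand-in for the dict returned by conduct_research.
results = {
    'query': 'Deep Research Agent Architecture',
    'sources_found': 2,
    'report': '# Research Report\n\nExample body.',
}

# Save the Markdown report and the full structured results separately,
# so the report is directly readable and the data can be reloaded later.
Path('report.md').write_text(results['report'], encoding='utf-8')
Path('results.json').write_text(json.dumps(results, indent=2), encoding='utf-8')
```

In Colab, the saved files can then be downloaded from the file browser or copied to mounted Drive storage.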

In conclusion, we see how the entire pipeline consistently transforms unstructured snippets into a structured, well-organized report. We successfully combine search, language modeling, and analysis layers to simulate a complete research workflow within Colab. By using Gemini for extraction, synthesis, and reporting, and DuckDuckGo for free search access, we create a reusable foundation for more advanced agentic research systems. This notebook provides a practical, technically detailed template that we can now expand with additional models, custom ranking, or domain-specific integrations, while still retaining a compact, end-to-end architecture.




Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
