How to scrape CiteSeerX with AgentQL

Looking for a better way to scrape CiteSeerX? Say goodbye to fragile XPath and DOM selectors that break whenever a website is updated. AI-powered AgentQL is designed to keep scraping consistent on CiteSeerX, or on any other website, even when the UI changes.

Not just for scraping CiteSeerX

Smart selectors work anywhere

https://citeseerx.ist.psu.edu

URL

Input any webpage.

{
  papers[] {
    title
    authors[]
    year
  }
}

Query

Describe data in natural language.

{
  "papers": [
    {
      "title": "A Novel Approach to Text Summarization",
      "authors": [
        "John Smith",
        "Alice Johnson"
      ],
      "year": 2023
    },
    {
      "title": "Deep Learning for Natural Language Processing",
      "authors": [
        "Bob Williams",
        "Eva Davis"
      ],
      "year": 2022
    }
  ]
}

Returns

Receive accurate output in seconds.
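
As a minimal sketch of how the returned structure could be consumed, the Python below loads the sample response shown above and flattens it into one row per paper. The `flatten_papers` helper is hypothetical and uses only the standard library; it is not part of AgentQL.

```python
import json

# Sample response copied from the "Returns" example above.
SAMPLE_RESPONSE = """
{
  "papers": [
    {
      "title": "A Novel Approach to Text Summarization",
      "authors": ["John Smith", "Alice Johnson"],
      "year": 2023
    },
    {
      "title": "Deep Learning for Natural Language Processing",
      "authors": ["Bob Williams", "Eva Davis"],
      "year": 2022
    }
  ]
}
"""


def flatten_papers(response):
    """Return one (title, comma-joined authors, year) tuple per paper.

    Hypothetical helper: missing fields fall back to "" or None
    instead of raising, since scraped data may be incomplete.
    """
    rows = []
    for paper in response.get("papers", []):
        title = paper.get("title", "")
        authors = ", ".join(paper.get("authors") or [])
        year = paper.get("year")
        rows.append((title, authors, year))
    return rows


if __name__ == "__main__":
    for title, authors, year in flatten_papers(json.loads(SAMPLE_RESPONSE)):
        print(f"{year}: {title} ({authors})")
```

Because the response mirrors the shape of the query, the same loop should apply to any `papers[]` result, whatever page it came from.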

How to use AgentQL on CiteSeerX

1

Install the SDK

Install commands for JS and Python

npm install agentql

pip3 install agentql

2

Test and refine

Use the query debugger

3

Run your script

Commands to initialize and run the script

agentql init

python example.py
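
The page does not show what `example.py` contains. As a rough sketch only, assuming the AgentQL Python SDK's Playwright integration (`agentql.wrap` and `query_data`, which should be checked against the current SDK documentation) and an API key that is already configured, a script for the query above might look something like this. The names `PAPERS_QUERY`, `scrape_papers`, and `main` are hypothetical.

```python
import json

# Query copied from the example above.
PAPERS_QUERY = """
{
  papers[] {
    title
    authors[]
    year
  }
}
"""

URL = "https://citeseerx.ist.psu.edu"


def scrape_papers(url=URL):
    """Open the page in a headless browser and run the query against it."""
    # Imported lazily so this module still loads if the SDK is not installed.
    import agentql
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser = playwright.chromium.launch(headless=True)
        try:
            # Assumption: agentql.wrap() adds query methods to a Playwright page.
            page = agentql.wrap(browser.new_page())
            page.goto(url)
            # Assumption: query_data() returns a dict shaped like the query.
            return page.query_data(PAPERS_QUERY)
        finally:
            browser.close()


def main():
    """Print the scraped papers as indented JSON."""
    print(json.dumps(scrape_papers(), indent=2))
```

Calling `main()` at the end of the file would turn this into a script that can be started with `python example.py`.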

Get started

AgentQL holds no opinions on the what or the how. Build whatever makes sense to you.