Adding a List of URLs
Managing an Extractor's URL List
From the Inputs tab of an Extractor, you can manage the list of URLs that the Extractor uses when it starts a crawl run. You can add URLs manually, import them from a file, or extract them from other pages with Chained Extractors.
Elements of the Inputs View
- Input source: Dropdown that sets whether the Extractor uses an explicit list of URLs that you provide or URLs extracted by another Extractor.
- Clear All: Removes all the URLs from the list to start over.
- Remove Duplicate Rows: Removes any duplicate URLs from the list.
- Cleanup URLs: Removes invalid URLs and empty rows from the list.
- Download Inputs: Downloads the list of URLs in CSV, Excel, JSON, or NDJSON format.
- Import Inputs: Imports a list of URLs from a CSV or Excel (XLSX) file.
- Generate URLs: Opens the URL generator to create URLs from an example URL.
- Add input row: Adds a blank row to the list of inputs.
- Reset to saved inputs: Resets the URL list to the saved inputs, discarding unsaved changes.
- List view: Shows all of the URLs currently added.
- TextBox: Lets you add URLs manually by typing or pasting them into the textbox.
- Save: Saves any changes made to the URL list. When you add, remove, or update URLs, the changes are not saved until you click Save.
- Run Inputs: Starts a new crawl run. If you have unsaved changes, this button will be disabled until you save your changes.
- Total Inputs: Displays a count of the URLs in the list. This is also the number of queries a crawl run will use with that list of URLs (if screen capture is enabled, the total number of queries is doubled).
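
If you prepare a URL list outside the Inputs view before importing it, the effect of Cleanup URLs, Remove Duplicate Rows, and Total Inputs can be approximated locally. The Python sketch below is only an illustration: the function names are hypothetical, and the validity rule (an http or https scheme plus a host) and the exact-match duplicate check are assumptions, since the rules the Inputs view actually applies are not described here.

```python
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Assumed validity rule: an http(s) scheme and a non-empty host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def cleanup_urls(rows):
    """Drop empty rows and rows that fail the assumed validity check."""
    stripped = (row.strip() for row in rows)
    return [url for url in stripped if url and is_valid_url(url)]


def remove_duplicates(urls):
    """Drop exact repeats, keeping the first occurrence and the original order."""
    return list(dict.fromkeys(urls))


def estimate_queries(total_inputs: int, screen_capture: bool = False) -> int:
    """One query per URL, doubled when screen capture is enabled."""
    return total_inputs * 2 if screen_capture else total_inputs


# Example input with an empty row, an invalid row, and a duplicate.
rows = [
    "https://example.com/page/1",
    "",
    "not a url",
    "https://example.com/page/2",
    "https://example.com/page/1",
]

urls = remove_duplicates(cleanup_urls(rows))
print(urls)                                              # two URLs remain
print(estimate_queries(len(urls)))                       # 2
print(estimate_queries(len(urls), screen_capture=True))  # 4
```

With two URLs left after cleanup, a crawl run would use 2 queries, or 4 with screen capture enabled, which matches the doubling described under Total Inputs.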