Create URL File Upload

When you want to crawl URLs from a predefined list, you need to create a URL File Upload. It's a two-step process. First, call the createSignedUrlFileUpload mutation and retrieve the signedS3UploadUrl:

mutation createSignedUrlFileUpload($input: CreateSignedUrlFileUploadInput!) {
  createSignedUrlFileUpload(input: $input) {
    signedS3UploadUrl
    urlFileUpload {
      id
      fileName
      status
    }
  }
}

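For reference, here is a minimal sketch of sending this mutation with curl. The endpoint URL and API token are placeholders, and the projectId and fileName input fields are assumptions; consult the schema for the exact shape of CreateSignedUrlFileUploadInput:

# The endpoint URL, Authorization header, and input fields are assumptions;
# check the schema for the exact shape of CreateSignedUrlFileUploadInput.
curl -X POST https://api.example.com/graphql \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_TOKEN" \
  -d '{
    "query": "mutation createSignedUrlFileUpload($input: CreateSignedUrlFileUploadInput!) { createSignedUrlFileUpload(input: $input) { signedS3UploadUrl urlFileUpload { id fileName status } } }",
    "variables": { "input": { "projectId": "YOUR_PROJECT_ID", "fileName": "url_list.txt" } }
  }'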

Initially, your URL File Upload is in the Draft state: it is awaiting the upload of the file, which is the second step. The signedS3UploadUrl is a pre-signed Amazon S3 URL that allows you to upload your file so that we can process it. Assuming the file is named "url_list.txt" and you are in the same directory as it, you can upload it with a curl command. Note that the URL must be quoted in the shell, since it contains & characters that the shell would otherwise interpret:

curl -X PUT --data-binary @url_list.txt "https://devops-infra-s3-urlfileuploads-resources-prod-use1.s3.us-east-1.amazonaws.com/UrlFileUploads/42319/url_list.txt?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=CREDENTIAL&X-Amz-Date=20230621T073841Z&X-Amz-Expires=900&X-Amz-Security-Token=VERY_LONG_SECURITY_TOKEN&X-Amz-Signature=SIGNATURE&X-Amz-SignedHeaders=host&x-id=PutObject"
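
If you script both steps, you can extract the signed URL from the mutation response and pass it straight to the upload command. A minimal sketch, assuming the mutation response was saved to response.json and jq is available:

# Pull the pre-signed URL out of the saved mutation response
SIGNED_URL=$(jq -r '.data.createSignedUrlFileUpload.signedS3UploadUrl' response.json)

# Quoting the variable keeps the shell from splitting the URL on & characters
curl -X PUT --data-binary @url_list.txt "$SIGNED_URL"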

The signedS3UploadUrl is valid for 15 minutes, so make sure you upload your file within that period. Afterwards, we will process your file, count the URLs inside, and change the URL File Upload status to Processed. You can check the status with the following query:

query getProjectAndUrlFileUploads($projectId: ObjectID!, $fileName: String!) {
  getProject(id: $projectId) {
    urlFileUploads(filter: { fileName: { eq: $fileName } }) {
      nodes {
        id
        fileName
        status
        totalRows
      }
    }
  }
}

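Since processing is asynchronous, you may want to poll this query until the status changes. A rough sketch with curl and jq, again assuming a placeholder endpoint and token:

# Poll the query until the upload reaches the Processed status
while true; do
  STATUS=$(curl -s https://api.example.com/graphql \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $API_TOKEN" \
    -d '{
      "query": "query getProjectAndUrlFileUploads($projectId: ObjectID!, $fileName: String!) { getProject(id: $projectId) { urlFileUploads(filter: { fileName: { eq: $fileName } }) { nodes { status } } } }",
      "variables": { "projectId": "YOUR_PROJECT_ID", "fileName": "url_list.txt" }
    }' | jq -r '.data.getProject.urlFileUploads.nodes[0].status')
  [ "$STATUS" = "Processed" ] && break
  sleep 10
done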

Now your file can be used for crawling.