Creating a crawler from a JDBC connection
This is an end-to-end scenario describing the different operations to create a crawler from a connection.
List the JDBC connections
About this task
First, you need the ID of the connection you want to create a crawler on. To do this, you need to list all the existing JDBC connections.
method: GET
endpoint: https://api.<env>.cloud.talend.com/connections
headers: {
"Accept": "application/json",
"Authorization": "Bearer <your_personal_access_token>"
}
Procedure
- Select GET from the Method list and, in the adjacent field, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections
- Click Add header to add a row and enter the following key:value pairs:
  Accept: application/json
  Authorization: Bearer <your_personal_access_token>
- Send the request.
Results
The details about connections are displayed in the BODY area and the status code 200 is returned.
If you want to create a crawler from one of the connections, make sure its typeLabel value is Database.
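As a minimal sketch of this step in Python, the functions below build the GET request with the standard library and keep only the connections whose typeLabel is Database. The sketch assumes the response body is a JSON array of connection objects. The source only confirms the typeLabel field, so the parsing may need adjusting to the payload you actually receive. The function names are illustrative and not part of the API.

```python
import json
import urllib.request


def build_list_connections_request(env, token):
    """Build (but do not send) the GET request that lists connections."""
    return urllib.request.Request(
        f"https://api.{env}.cloud.talend.com/connections",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="GET",
    )


def database_connections(connections):
    """Keep only the connections a crawler can be created from."""
    # Assumption: each connection is a dict with a "typeLabel" key.
    return [c for c in connections if c.get("typeLabel") == "Database"]


def list_database_connections(env, token):
    """Send the request and return the Database connections."""
    request = build_list_connections_request(env, token)
    with urllib.request.urlopen(request) as response:
        # Assumption: the body is a JSON array of connection objects.
        return database_connections(json.load(response))
```

Keeping the request construction separate from the network call makes it possible to inspect the URL and headers without contacting the service.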
Check the crawler of a JDBC connection
This operation allows you to check if an existing JDBC connection already has a crawler.
About this task
You can have only one crawler per connection. If you try to create a crawler with a connection that already has a crawler, the creation will fail. That is why it is recommended to check the connection beforehand.
Before you begin
Make sure the connection is already created and the user issuing API calls knows the ID of this connection.
method: GET
endpoint: https://api.<env>.cloud.talend.com/connections/crawlers?connectionId={connectionId}
headers: {
"Accept": "application/json",
"Authorization": "Bearer <your_personal_access_token>"
}
Procedure
- Select GET from the Method list and, in the adjacent field, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/crawlers?connectionId={connectionId}
- Replace the placeholder with the correct value:
  Parameter: connectionId
  Value: ID of the connection for which you want to check if there is already a crawler. You can find the connection ID with a GET request on https://api.<env>.cloud.talend.com/connections. It is also available in Talend Cloud applications that use connections, in the URL of the connection's overview page, after /connection/.
- Click Add header to add a row and enter the following key:value pairs:
  Accept: application/json
  Authorization: Bearer <your_personal_access_token>
- Send the request.
Results
The BODY area is updated and the status code 200 is returned. The response for a connection without a crawler should look like this:
{
"data": [],
"offset": 0,
"limit": 0,
"total": 0
}
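The check can be sketched in Python as follows. The empty response shape comes from the example above. The shape of a non-empty response is an assumption (at least one entry in data, or a total greater than 0), and the function names are hypothetical.

```python
import urllib.parse


def crawler_lookup_url(env, connection_id):
    """URL that lists the crawlers attached to one connection."""
    query = urllib.parse.urlencode({"connectionId": connection_id})
    return f"https://api.{env}.cloud.talend.com/connections/crawlers?{query}"


def has_crawler(body):
    """Return True when the parsed response reports at least one crawler."""
    # An empty "data" list with "total" set to 0 means no crawler exists yet.
    return body.get("total", 0) > 0 or len(body.get("data", [])) > 0
```

Calling has_crawler on the parsed body before creating a crawler avoids a creation attempt that would fail because the connection already has one.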
Scan the JDBC connection
This operation allows you to scan the content of a JDBC connection.
About this task
Before creating a crawler, you need to identify the tables and views you want to retrieve. The scan will look for all the tables and views on this connection. However, the tables and views found by this scan are not returned in the response payload. In order to get them, you need to call this endpoint with the GET method.
Before you begin
Make sure the connection is already created and the user issuing API calls knows the ID of this connection.
method: POST
endpoint: https://api.<env>.cloud.talend.com/connections/scan/{connectionId}
headers: {
"Authorization": "Bearer <your_personal_access_token>"
}
Procedure
- Select POST from the Method list and, in the adjacent field, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/scan/{connectionId}
- Replace the placeholder with the correct value:
  Parameter: connectionId
  Value: ID of the connection you want to scan. You can find the connection ID with a GET request on https://api.<env>.cloud.talend.com/connections. It is also available in Talend Cloud applications that use connections, in the URL of the connection's overview page, after /connection/.
- Click Add header to add a row and enter the following key:value pair:
  Authorization: Bearer <your_personal_access_token>
- Send the request.
Results
The scan is launched and the status code 204 is returned.
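A minimal Python sketch of the scan call, using only the standard library, could look like this. The function names are illustrative. The only facts taken from this scenario are the endpoint, the Authorization header, and the 204 status code.

```python
import urllib.request


def build_scan_request(env, token, connection_id):
    """Build (but do not send) the POST request that launches a scan."""
    return urllib.request.Request(
        f"https://api.{env}.cloud.talend.com/connections/scan/{connection_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


def launch_scan(env, token, connection_id):
    """Send the scan request and report whether it returned 204."""
    request = build_scan_request(env, token, connection_id)
    with urllib.request.urlopen(request) as response:
        # The scan returns no payload; success is signalled by 204.
        return response.status == 204
```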
Retrieve the tables and views of a connection
This operation allows you to retrieve the tables and views held by a connection based on the last connection scan.
Before you begin
- Make sure you first launched a scan of the connection.
- Make sure the connection is already created and the user issuing API calls knows the ID of this connection.
About this task
method: GET
endpoint: https://api.<env>.cloud.talend.com/connections/scan/{connectionId}
headers: {
"Accept": "application/json",
"Authorization": "Bearer <your_personal_access_token>"
}
Procedure
- Select GET from the Method list and, in the adjacent field, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/scan/{connectionId}
- Click Add header to add a row and enter the following key:value pairs:
  Accept: application/json
  Authorization: Bearer <your_personal_access_token>
- Send the request.
Results
The connection tables and views are returned in the BODY area and the status code 200 is returned. Whenever an element has been added to the connection, launch a new scan with the POST call before retrieving the results with the GET call. Otherwise, the GET call alone is enough.
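The POST-then-GET sequence can be sketched as a small Python helper. Here send is a hypothetical callable supplied by the caller that performs the HTTP call and returns a (status code, parsed body) pair. The sketch also assumes the scan results are available as soon as the POST call returns, which this scenario does not state explicitly.

```python
def refresh_tables(send, connection_id, rescan=True):
    """Return the tables and views of a connection.

    send(method, path) is a hypothetical callable that performs the
    HTTP call and returns (status_code, parsed_body).
    """
    path = f"/connections/scan/{connection_id}"
    if rescan:
        # POST launches a new scan; its response carries no payload.
        status, _ = send("POST", path)
        if status != 204:
            raise RuntimeError(f"scan failed with status {status}")
    # GET returns the result of the last scan.
    _, body = send("GET", path)
    return body
```

Passing rescan=False skips the POST call, which matches the case where nothing has been added to the connection since the last scan.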
Retrieve the users or groups to share datasets with
This operation allows you to retrieve the list of users and groups with whom a specific entity type can be shared. The list of available users and groups changes depending on which permissions they are assigned in Talend Management Console. You need this list to retrieve the user and group IDs to be able to share the datasets with them once the crawler is created.
method: GET
endpoint: https://api.<env>.cloud.talend.com/sharing/sharings/eligibles/dataset
headers: {
"Accept": "application/json",
"Authorization": "Bearer <your_personal_access_token>"
}
Procedure
- Select GET from the Method list and, in the adjacent field, enter the endpoint to be used: https://api.<env>.cloud.talend.com/sharing/sharings/eligibles/dataset
- Click Add header to add a row and enter the following key:value pairs:
  Accept: application/json
  Authorization: Bearer <your_personal_access_token>
- Send the request.
Results
The BODY area displays the names and IDs of the users and groups, and the status code 200 is returned.
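Once the IDs are known, they end up in the sharings array of the crawler creation payload. The Python sketch below shows one way to build the endpoint URL and a single sharing entry. The helper names are hypothetical, and the user, group, OWNER and READER values used with them come from the examples in this scenario; other values may exist.

```python
def eligibles_url(env):
    """URL listing the users and groups that datasets can be shared with."""
    return f"https://api.{env}.cloud.talend.com/sharing/sharings/eligibles/dataset"


def sharing_entry(scim_type, scim_id, level):
    """Build one entry of the "sharings" array sent at crawler creation."""
    # Strip stray whitespace so a copy-pasted ID is sent exactly as issued.
    return {"scimType": scim_type, "scimId": scim_id.strip(), "level": level}
```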
Create a crawler
This operation allows you to create a crawler on a specific JDBC connection. You can create the crawler now that you know:
- the users and/or groups you want to share the datasets with,
- the tables and/or views you want to retrieve.
There are two crawling modes that you can use depending on your use case:
- The dynamic selection to retrieve all tables that match a specific filter, regardless of the content of your data source at a given time.
- The manual selection to manually select the tables to retrieve from the current state of your data source.
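The difference between the two modes can be sketched as a Python helper that builds the request body. The field names come from the payload used in this scenario. The helper name and the rule that the presence of filters selects the dynamic mode are assumptions made for this sketch, not documented behavior.

```python
def crawler_payload(connection_id, name, description, sharings,
                    tables=(), filters=()):
    """Build the request body for a manual or dynamic crawler."""
    # Assumption: supplying filters means the dynamic selection is wanted.
    dynamic = len(filters) > 0
    if not dynamic and not tables:
        raise ValueError("manual selection needs at least one table")
    payload = {
        "connectionId": connection_id,
        "name": name,
        "description": description,
        "selectedDatasets": list(tables),
        "dynamic": dynamic,
        "sharings": list(sharings),
    }
    if dynamic:
        payload["filters"] = list(filters)
    return payload
```

For example, passing tables=["TABLE1", "TABLE2"] yields a manual crawler, while passing a filter such as {"field": "name", "values": ["RETAIL"], "operator": "startsWith"} yields a dynamic one.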
method: POST
endpoint: https://api.<env>.cloud.talend.com/connections/crawlers
headers: {
"Content-Type": "application/json",
"Accept": "application/json",
"Authorization": "Bearer <your_personal_access_token>"
}
payload [{
"connectionId": "<connectionId>",
"name": "<name>",
"description": "<description>",
"selectedDatasets": [
"<table>",
"<table>"
],
"dynamic": true,
"filters": [
{
"field": "<field>",
"values": ["<values>"],
"operator": "<operator>"
}
],
"sharings": [
{
"scimType": "<type>",
"scimId": "<userId>",
"level": "<role>"
},
{
"scimType": "<type>",
"scimId": "<groupId>",
"level": "<role>"
}
]
}]
Procedure
- Select POST from the Method list and, in the adjacent field, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/crawlers
- Click Add header twice to add two rows and enter the following key:value pairs:
  Content-Type: application/json
  Accept: application/json
  Authorization: Bearer <your_personal_access_token>
- In the BODY area, enter the information about the crawler, depending on the crawling mode you want to use.
  Example for a manual selection:
  {
    "connectionId": "d54a8f03-7906-4930-a7cc-4eb90e968f89",
    "name": "My crawler",
    "description": "This is a description",
    "selectedDatasets": ["TABLE1", "TABLE2"],
    "dynamic": false,
    "sharings": [
      { "scimType": "user", "scimId": "9d733659-9312-46f9-b39f-abb3e35215fe", "level": "OWNER" },
      { "scimType": "group", "scimId": "e053dffc-e7d1-415e-857f-60ffe4d42c12", "level": "READER" }
    ]
  }
  Example for a dynamic selection:
  {
    "connectionId": "d54a8f03-7906-4930-a7cc-4eb90e968f89",
    "name": "My crawler",
    "description": "This is a description",
    "selectedDatasets": ["TABLE1", "TABLE2"],
    "dynamic": true,
    "filters": [
      { "field": "name", "values": ["RETAIL"], "operator": "startsWith" }
    ],
    "sharings": [
      { "scimType": "user", "scimId": "9d733659-9312-46f9-b39f-abb3e35215fe", "level": "OWNER" },
      { "scimType": "group", "scimId": "e053dffc-e7d1-415e-857f-60ffe4d42c12", "level": "READER" }
    ]
  }
- Send the request.
Results
The crawler has been created and the status code 201 is returned. The response is the ID of the new crawler.
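As a hedged sketch, the creation call could be issued from Python with the standard library as shown below. The function names are illustrative. This scenario does not specify how the returned crawler ID is encoded (plain text or JSON), so the sketch returns the raw response body.

```python
import json
import urllib.request


def build_create_crawler_request(env, token, payload):
    """Build (but do not send) the POST request that creates a crawler."""
    return urllib.request.Request(
        f"https://api.{env}.cloud.talend.com/connections/crawlers",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def create_crawler(env, token, payload):
    """Send the request and return the raw response body on 201."""
    request = build_create_crawler_request(env, token, payload)
    with urllib.request.urlopen(request) as response:
        if response.status != 201:
            raise RuntimeError(f"unexpected status {response.status}")
        # Assumption: the body holds the new crawler ID; return it undecoded
        # beyond UTF-8 because its exact format is not specified here.
        return response.read().decode("utf-8")
```

Serializing the payload with json.dumps guarantees well-formed JSON, which avoids syntax slips such as a missing comma between fields.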
Run a crawler
This operation allows you to run a crawler, once it is created.
About this task
When you call this endpoint, the crawler relies on its configuration to retrieve all the selected tables and views and turn them into datasets. Once the datasets are created, the crawler also retrieves their samples.
You can launch the crawler as many times as you want. The first run creates the datasets. Any later run only refreshes the samples of the existing datasets.
Before you begin
Make sure the crawler is already created and the user issuing API calls knows the ID of this crawler.
method: POST
endpoint: https://api.<env>.cloud.talend.com/connections/crawlers/{crawlerId}
headers: {
"Authorization": "Bearer <your_personal_access_token>"
}
Procedure
- Select POST from the Method list and, in the adjacent field, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/crawlers/{crawlerId}
- Replace the placeholder with the correct value:
  Parameter: crawlerId
  Value: ID of the crawler you want to run. You can find the crawler ID with a GET request on https://api.<env>.cloud.talend.com/connections/crawlers.
- Click Add header to add a row and enter the following key:value pair:
  Authorization: Bearer <your_personal_access_token>
- Send the request.
Results
The crawler is launched and the status code 202 is returned.
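A minimal Python sketch of the run call follows. The function names are hypothetical. The endpoint, the header, and the 202 status code are the ones given in this scenario.

```python
import urllib.request


def build_run_crawler_request(env, token, crawler_id):
    """Build (but do not send) the POST request that runs a crawler."""
    return urllib.request.Request(
        f"https://api.{env}.cloud.talend.com/connections/crawlers/{crawler_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_crawler(env, token, crawler_id):
    """Send the run request and report whether it was accepted (202)."""
    request = build_run_crawler_request(env, token, crawler_id)
    with urllib.request.urlopen(request) as response:
        # 202 means the run was accepted, not that it has finished.
        return response.status == 202
```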
Retrieve the status of a crawler
This operation allows you to check the status of a running crawler.
Before you begin
Make sure the crawler is already created and running, and the user issuing API calls knows the ID of this crawler.
About this task
Once the crawler is launched, you can check its state to see whether it has finished. The time a crawler takes to finish is proportional to the number of tables and views selected. It can take anywhere from a few minutes to several hours.
method: GET
endpoint: https://api.<env>.cloud.talend.com/connections/crawlers/{crawlerId}
headers: {
"Authorization": "Bearer <your_personal_access_token>"
}
Procedure
- Select GET from the Method list and, in the adjacent field, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/crawlers/{crawlerId}
- Replace the placeholder with the correct value:
  Parameter: crawlerId
  Value: ID of the crawler whose status you want to check. You can find the crawler ID with a GET request on https://api.<env>.cloud.talend.com/connections/crawlers.
- Click Add header to add a row and enter the following key:value pair:
  Authorization: Bearer <your_personal_access_token>
- Send the request.
Results
The status code 200 is returned. In the BODY area, the runStatus value gives you the status of the crawler.
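Because a run can last from minutes to hours, the status check is typically repeated until the crawler finishes. The Python sketch below polls at a fixed interval. fetch_status is a hypothetical callable that performs the GET call and returns the runStatus value. The possible runStatus values are not listed in this scenario, so the caller has to supply the ones that mean the run is over.

```python
import time


def wait_for_crawler(fetch_status, terminal_statuses, interval=30.0,
                     timeout=3600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll runStatus until it reaches a terminal value or time runs out.

    fetch_status: hypothetical callable returning the current runStatus.
    terminal_statuses: values meaning "finished", supplied by the caller.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in terminal_statuses:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"crawler still {status!r} after {timeout}s")
        # Wait before asking again to avoid flooding the API.
        sleep(interval)
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting; in normal use the defaults are enough.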
Retrieve the statuses of datasets related to a crawler
This operation retrieves the statuses of datasets related to a crawler. Once the crawler is done, you can check the datasets created by the crawler.
Before you begin
Make sure a crawler is already created and the user issuing API calls knows the ID of this crawler.
About this task
method: GET
endpoint: https://api.<env>.cloud.talend.com/connections/crawlers/{crawlerId}/datasets
headers: {
"Accept": "application/json",
"Authorization": "Bearer <your_personal_access_token>"
}
Procedure
- Select GET from the Method list and, in the adjacent field, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/crawlers/{crawlerId}/datasets
- Replace the placeholder with the correct value:
  Parameter: crawlerId
  Value: ID of the crawler whose datasets you want to check. You can find the crawler ID with a GET request on https://api.<env>.cloud.talend.com/connections/crawlers.
- Click Add header to add a row and enter the following key:value pairs:
  Accept: application/json
  Authorization: Bearer <your_personal_access_token>
- Send the request.
Results
The status code 200 is returned. In the BODY area, you get the status of the different datasets created.
For more details about a dataset, you can use this endpoint with a GET method: https://api.<env>.cloud.talend.com/datasets/{datasetId}.
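To follow up on each dataset, the two endpoints can be combined as in the Python sketch below. The function names are hypothetical. The sketch assumes each entry of the response exposes its dataset ID under a key named id, which this scenario does not confirm, so the key is configurable.

```python
def crawler_datasets_url(env, crawler_id):
    """URL returning the statuses of the datasets created by a crawler."""
    return (f"https://api.{env}.cloud.talend.com"
            f"/connections/crawlers/{crawler_id}/datasets")


def dataset_url(env, dataset_id):
    """URL returning the details of one dataset."""
    return f"https://api.{env}.cloud.talend.com/datasets/{dataset_id}"


def dataset_detail_urls(env, datasets, id_field="id"):
    """Map each dataset entry to the URL of its detail endpoint."""
    # Assumption: each entry stores its dataset ID under id_field.
    return [dataset_url(env, entry[id_field]) for entry in datasets]
```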