Creating a crawler from a JDBC connection

This end-to-end scenario describes the operations required to create a crawler from a JDBC connection.

List the JDBC connections

About this task

First, you need the ID of the connection you want to create a crawler on. To do this, you need to list all the existing JDBC connections.

method: GET
endpoint: https://api.<env>.cloud.talend.com/connections
headers: {
 "Accept": "application/json",
 "Authorization": "Bearer <your_personal_access_token>"
}

Procedure

Select GET from the Method list and, in the adjacent field, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections.
Click Add header to add a row and enter the following key:value pairs:
- Accept : application/json
- Authorization : Bearer <your_personal_access_token>
Send the request.

Results

The details about connections are displayed in the BODY area and the status code 200 is returned. If you want to create a crawler from one of the connections, make sure the typeLabel value is Database.
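
For readers who prefer scripting the call, here is a minimal Python sketch using only the standard library. It is not an official client: the helper names are hypothetical, and the assumption that the response body is a JSON array of objects carrying a `typeLabel` field goes beyond what this page states, so adapt the parsing to the payload you actually receive.

```python
import json
import urllib.request


def build_list_connections_request(env: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request that lists the connections."""
    return urllib.request.Request(
        url=f"https://api.{env}.cloud.talend.com/connections",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="GET",
    )


def database_connections(connections: list) -> list:
    """Keep only the connections whose typeLabel is Database.

    Assumption: each item is a dict with a "typeLabel" key.
    """
    return [c for c in connections if c.get("typeLabel") == "Database"]


def list_database_connections(env: str, token: str) -> list:
    """Send the request and return the connections a crawler can be created on."""
    request = build_list_connections_request(env, token)
    with urllib.request.urlopen(request) as response:
        return database_connections(json.loads(response.read()))
```

Keeping the request construction and the filtering in separate functions makes both easy to check without calling the API.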

Check the crawler of a JDBC connection

This operation allows you to check if an existing JDBC connection already has a crawler.

About this task

You can have only one crawler per connection. If you try to create a crawler with a connection that already has a crawler, the creation will fail. That is why it is recommended to check the connection beforehand.

Before you begin

Make sure the connection is already created and the user issuing API calls knows the ID of this connection.

method: GET
endpoint: https://api.<env>.cloud.talend.com/connections/crawlers?connectionId={connectionId}
headers: {
 "Accept": "application/json",
 "Authorization": "Bearer <your_personal_access_token>"
}

Procedure

Select GET from the Method list and, in the adjacent field, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/crawlers?connectionId={connectionId}.

Replace the placeholder with the correct values:

Parameter	Value
`connectionId`	Connection for which you want to check if there is already a crawler. You can find the connection ID with a `GET` request on `https://api.<env>.cloud.talend.com/connections`. It is also available in Talend Cloud applications that use connections, in the URL of the connection’s overview page, after `/connection/`.

Click Add header to add a row and enter the following key:value pairs:
- Accept : application/json
- Authorization : Bearer <your_personal_access_token>
Send the request.

Results

The BODY area is updated and the status code 200 is returned. The response for a connection without a crawler should look like this:

{
    "data": [],
    "offset": 0,
    "limit": 0,
    "total": 0
}
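
The check can be automated along these lines. This is a sketch under assumptions: the helper names are hypothetical, and the idea that a connection with a crawler yields a non-empty `data` array or a non-zero `total` is inferred from the empty response shown above, not documented on this page.

```python
import json
import urllib.parse
import urllib.request


def build_check_crawler_request(env: str, token: str, connection_id: str) -> urllib.request.Request:
    """Build the GET request that lists the crawlers of one connection."""
    query = urllib.parse.urlencode({"connectionId": connection_id})
    return urllib.request.Request(
        url=f"https://api.{env}.cloud.talend.com/connections/crawlers?{query}",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="GET",
    )


def has_crawler(body: dict) -> bool:
    """Return True when the response lists at least one crawler.

    Assumption: a non-empty "data" list or a "total" above zero means
    the connection already has a crawler.
    """
    return bool(body.get("data")) or body.get("total", 0) > 0


def connection_has_crawler(env: str, token: str, connection_id: str) -> bool:
    """Send the request and interpret the response body."""
    request = build_check_crawler_request(env, token, connection_id)
    with urllib.request.urlopen(request) as response:
        return has_crawler(json.loads(response.read()))
```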

Scan the JDBC connection

This operation allows you to scan the content of a JDBC connection.

About this task

Before creating a crawler, you need to identify the tables and views you want to retrieve. The scan will look for all the tables and views on this connection. However, the tables and views found by this scan are not returned in the response payload. In order to get them, you need to call this endpoint with the GET method.

Before you begin

Make sure the connection is already created and the user issuing API calls knows the ID of this connection.

method: POST
endpoint: https://api.<env>.cloud.talend.com/connections/scan/{connectionId}
headers: {
 "Authorization": "Bearer <your_personal_access_token>"
}

Procedure

Select POST from the Method list and, in the adjacent field, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/scan/{connectionId}.

Replace the placeholder with the correct values:

Parameter	Value
`connectionId`	Connection you want to scan. You can find the connection ID with a `GET` request on `https://api.<env>.cloud.talend.com/connections`. It is also available in Talend Cloud applications that use connections, in the URL of the connection’s overview page, after `/connection/`.

Click Add header to add a row and enter the following key:value pair:
- Authorization : Bearer <your_personal_access_token>
Send the request.

Results

The scan is launched and the status code 204 is returned.

Retrieve the tables and views of a connection

This operation allows you to retrieve the tables and views held by a connection based on the last connection scan.

Before you begin

Make sure you first launched a scan of the connection.
Make sure the connection is already created and the user issuing API calls knows the ID of this connection.

About this task

method: GET
endpoint: https://api.<env>.cloud.talend.com/connections/scan/{connectionId}
headers: {
 "Accept": "application/json",
 "Authorization": "Bearer <your_personal_access_token>"
}

Procedure

Select GET from the Method list and, in the adjacent field, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/scan/{connectionId}, replacing {connectionId} with the ID of the connection you scanned.
Click Add header to add a row and enter the following key:value pairs:
- Accept : application/json
- Authorization : Bearer <your_personal_access_token>
Send the request.

Results

The connection tables and views are returned in the BODY area and the status code 200 is returned. Each time an element is added to the connection, you need to launch a new scan with the POST call before retrieving the result with the GET call. Otherwise, the GET call alone is enough.
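
The scan and retrieval calls share one endpoint and differ only by method, which the following Python sketch illustrates. The helper names are hypothetical. This page does not say whether the scan has finished by the time the POST call returns, so a real script may need to wait or retry before the GET call; the sketch does not handle that.

```python
import json
import urllib.request


def build_scan_request(env: str, token: str, connection_id: str, method: str) -> urllib.request.Request:
    """Build a request on the scan endpoint.

    method="POST" launches a scan; method="GET" reads the last scan result.
    """
    if method not in ("POST", "GET"):
        raise ValueError("method must be 'POST' or 'GET'")
    headers = {"Authorization": f"Bearer {token}"}
    if method == "GET":
        # Only the GET call returns a JSON body.
        headers["Accept"] = "application/json"
    return urllib.request.Request(
        url=f"https://api.{env}.cloud.talend.com/connections/scan/{connection_id}",
        headers=headers,
        method=method,
    )


def scan_then_list(env: str, token: str, connection_id: str):
    """Launch a scan, then return the parsed tables and views payload."""
    with urllib.request.urlopen(build_scan_request(env, token, connection_id, "POST")):
        pass  # the scan call is not expected to return a body
    with urllib.request.urlopen(build_scan_request(env, token, connection_id, "GET")) as response:
        return json.loads(response.read())
```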

Retrieve the users or groups to share datasets with

This operation allows you to retrieve the list of users and groups with whom a specific entity type can be shared. The list of available users and groups changes depending on which permissions they are assigned in Talend Management Console. You need this list to retrieve the user and group IDs to be able to share the datasets with them once the crawler is created.

method: GET
endpoint: https://api.<env>.cloud.talend.com/sharing/sharings/eligibles/dataset
headers: {
 "Accept": "application/json",
 "Authorization": "Bearer <your_personal_access_token>"
}

Procedure

Select GET from the Method list and, in the adjacent field, enter the endpoint to be used: https://api.<env>.cloud.talend.com/sharing/sharings/eligibles/dataset.
Click Add header to add a row and enter the following key:value pairs:
- Accept : application/json
- Authorization : Bearer <your_personal_access_token>
Send the request.

Results

The BODY area displays the names and IDs of the users and groups, and the status code 200 is returned.

Create a crawler

This operation allows you to create a crawler on a specific JDBC connection. You can create the crawler now that you know:

the users and/or groups you want to share the datasets with,
the tables and/or views you want to retrieve.

There are two crawling modes that you can use depending on your use case:

The dynamic selection to retrieve all tables that match a specific filter, regardless of the content of your data source at a given time.
The manual selection to manually select the tables to retrieve from the current state of your data source.

method: POST
endpoint: https://api.<env>.cloud.talend.com/connections/crawlers
headers: {
 "Content-Type": "application/json",
 "Accept": "application/json",
 "Authorization": "Bearer <your_personal_access_token>"
}
payload: {
    "connectionId": "<connectionId>",
    "name": "<name>",
    "description": "<description>",
    "selectedDatasets": [
      "<table>",
      "<table>"
    ],
    "dynamic": true,
    "filters": [
      {
        "field": "<field>",
        "values": ["<values>"],
        "operator": "<operator>"
      }
    ],
    "sharings": [
      {
        "scimType": "<type>",
        "scimId": "<userId>",
        "level": "<role>"
      },
      {
        "scimType": "<type>",
        "scimId": "<groupId>",
        "level": "<role>"
      }
    ]
}

Procedure

Select POST from the Method list and, in the adjacent field, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/crawlers.
Click Add header to add rows and enter the following key:value pairs:
- Content-Type : application/json
- Accept : application/json
- Authorization : Bearer <your_personal_access_token>

In the BODY area, enter the different information about the crawler depending on the crawling mode you want to use. Example for a manual selection:

{
 "connectionId": "d54a8f03-7906-4930-a7cc-4eb90e968f89",
 "name": "My crawler",
 "description": "This is a description",
 "selectedDatasets": [
   "TABLE1",
   "TABLE2"
 ],
 "dynamic": false,
 "sharings": [
   {
     "scimType": "user",
     "scimId": "9d733659-9312-46f9-b39f-abb3e35215fe",
     "level": "OWNER"
   },
   {
     "scimType": "group",
     "scimId": "e053dffc-e7d1-415e-857f-60ffe4d42c12",
     "level": "READER"
   }
 ]
}

Example for a dynamic selection:

{
  "connectionId": "d54a8f03-7906-4930-a7cc-4eb90e968f89",
  "name": "My crawler",
  "description": "This is a description",
  "selectedDatasets": [
    "TABLE1",
    "TABLE2"
  ],
  "dynamic": true,
  "filters": [
    {
      "field": "name",
      "values": ["RETAIL"],
      "operator": "startsWith"
    }
  ],
  "sharings": [
    {
      "scimType": "user",
      "scimId": "9d733659-9312-46f9-b39f-abb3e35215fe",
      "level": "OWNER"
    },
    {
      "scimType": "group",
      "scimId": "e053dffc-e7d1-415e-857f-60ffe4d42c12",
      "level": "READER"
    }
  ]
}

Send the request.

Results

The crawler has been created and the status code 201 is returned. The response is the ID of the new crawler.
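
Building the body programmatically avoids hand-written JSON mistakes such as a missing comma. The Python sketch below mirrors the two example bodies from this section; the helper names are hypothetical, and whether the API requires `selectedDatasets` in dynamic mode is not stated here, so the sketch simply reproduces the structure of the examples.

```python
import json
import urllib.request
from typing import Optional


def build_crawler_payload(connection_id: str, name: str, description: str,
                          tables: list, sharings: list,
                          filters: Optional[list] = None) -> dict:
    """Return the crawler body; passing filters switches to dynamic mode."""
    payload = {
        "connectionId": connection_id,
        "name": name,
        "description": description,
        "selectedDatasets": list(tables),
        "dynamic": filters is not None,
    }
    if filters is not None:
        payload["filters"] = list(filters)
    payload["sharings"] = list(sharings)
    return payload


def build_create_crawler_request(env: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates the crawler."""
    return urllib.request.Request(
        url=f"https://api.{env}.cloud.talend.com/connections/crawlers",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

For a dynamic selection like the example above, pass `filters=[{"field": "name", "values": ["RETAIL"], "operator": "startsWith"}]`; omit `filters` for a manual selection.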

Run a crawler

This operation allows you to run a crawler, once it is created.

About this task

When you call this endpoint, the crawler relies on its configuration to retrieve all the selected tables and views and turn them into datasets. Once the datasets are created, the crawler also retrieves their samples.

You can launch the crawler as many times as you want. Running a crawler once will create the datasets. Running a crawler again will only refresh the samples of the existing datasets.

Before you begin

Make sure the crawler is already created and the user issuing API calls knows the ID of this crawler.

method: POST
endpoint: https://api.<env>.cloud.talend.com/connections/crawlers/{crawlerId}
headers: {
 "Authorization": "Bearer <your_personal_access_token>"
}

Procedure

Select POST from the Method list and, in the adjacent field, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/crawlers/{crawlerId}.
Replace the placeholder with the correct values:

Parameter	Value
`crawlerId`	Crawler you want to run. You can find the crawler ID with a `GET` request on `https://api.<env>.cloud.talend.com/connections/crawlers`.

Click Add header to add a row and enter the following key:value pair:
- Authorization : Bearer <your_personal_access_token>
Send the request.

Results

The crawler is launched and the status code 202 is returned.
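
As a scripted equivalent, the following Python sketch launches a crawler and reports whether the documented 202 status came back. The function names are hypothetical, and error handling is left out: `urllib` raises `urllib.error.HTTPError` for 4xx and 5xx responses.

```python
import urllib.request


def build_run_crawler_request(env: str, token: str, crawler_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that runs a crawler."""
    return urllib.request.Request(
        url=f"https://api.{env}.cloud.talend.com/connections/crawlers/{crawler_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_crawler(env: str, token: str, crawler_id: str) -> bool:
    """Send the request; return True when the API answers with status 202."""
    request = build_run_crawler_request(env, token, crawler_id)
    with urllib.request.urlopen(request) as response:
        return response.status == 202
```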

Retrieve the status of a crawler

This operation allows you to check the status of a running crawler.

Before you begin

Make sure the crawler is already created and running, and the user issuing API calls knows the ID of this crawler.

About this task

Once the crawler is launched, you may want to know its state and check whether it has finished. The time a crawler takes to finish is proportional to the number of tables and views selected. It can take anywhere from a few minutes to several hours.

method: GET
endpoint: https://api.<env>.cloud.talend.com/connections/crawlers/{crawlerId}
headers: {
 "Authorization": "Bearer <your_personal_access_token>"
}

Procedure

Select GET from the Method list and, in the adjacent field, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/crawlers/{crawlerId}.
Replace the placeholder with the correct values:

Parameter	Value
`crawlerId`	Crawler whose status you want to check. You can find the crawler ID with a `GET` request on `https://api.<env>.cloud.talend.com/connections/crawlers`.

Click Add header to add a row and enter the following key:value pair:
- Authorization : Bearer <your_personal_access_token>
Send the request.

Results

The status code 200 is returned. In the BODY area, the runStatus value gives the status of the crawler.
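
Because a run can last from minutes to hours, a script usually polls this endpoint at intervals. The Python sketch below shows one way to do that. The function names are hypothetical, `runStatus` is assumed to be a top-level field of the response, and the possible `runStatus` values are not listed on this page, so the caller must supply the values that mean the run is over.

```python
import json
import time
import urllib.request
from typing import Callable, Collection


def fetch_run_status(env: str, token: str, crawler_id: str) -> str:
    """Read runStatus from the crawler details (assumed to be a top-level field)."""
    request = urllib.request.Request(
        url=f"https://api.{env}.cloud.talend.com/connections/crawlers/{crawler_id}",
        headers={"Accept": "application/json", "Authorization": f"Bearer {token}"},
        method="GET",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["runStatus"]


def wait_for_crawler(fetch_status: Callable[[], str],
                     terminal_statuses: Collection[str],
                     interval_seconds: float = 60.0,
                     max_polls: int = 120,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the status is one of terminal_statuses, then return it."""
    for attempt in range(max_polls):
        status = fetch_status()
        if status in terminal_statuses:
            return status
        if attempt < max_polls - 1:
            sleep(interval_seconds)  # wait before the next poll
    raise TimeoutError(f"crawler still running after {max_polls} polls")
```

A call could look like `wait_for_crawler(lambda: fetch_run_status(env, token, crawler_id), terminal_statuses)`, where `terminal_statuses` holds the end-of-run values observed in your own responses.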

Retrieve the statuses of datasets related to a crawler

This operation retrieves the statuses of datasets related to a crawler. Once the crawler is done, you can check the datasets created by the crawler.

Before you begin

Make sure a crawler is already created and the user issuing API calls knows the ID of this crawler.

About this task

method: GET
endpoint: https://api.<env>.cloud.talend.com/connections/crawlers/{crawlerId}/datasets
headers: {
 "Accept": "application/json",
 "Authorization": "Bearer <your_personal_access_token>"
}

Procedure

Select GET from the Method list and, in the adjacent field, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/crawlers/{crawlerId}/datasets.
Replace the placeholder with the correct values:

Parameter	Value
`crawlerId`	Crawler whose datasets you want to check. You can find the crawler ID with a `GET` request on `https://api.<env>.cloud.talend.com/connections/crawlers`.

Click Add header to add a row and enter the following key:value pairs:
- Accept : application/json
- Authorization : Bearer <your_personal_access_token>
Send the request.

Results

The status code 200 is returned. In the BODY area, you get the status of the different datasets created. For more details about a dataset, you can use this endpoint with a GET method: https://api.<env>.cloud.talend.com/datasets/{datasetId}.
