Creating a crawler from a JDBC connection

This end-to-end scenario describes the operations required to create a crawler from a JDBC connection.

List the JDBC connections

About this task

First, you need the ID of the connection you want to create a crawler on. To do this, you need to list all the existing JDBC connections.

method: GET
endpoint: https://api.<env>.cloud.talend.com/connections
headers: {
 "Accept": "application/json",
 "Authorization": "Bearer <your_personal_access_token>"
}

Procedure

  1. Select GET from the Method list and in the field next to it, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections.

  2. Click Add header to add a row and enter the following key:value pairs:

    • Accept : application/json
    • Authorization : Bearer <your_personal_access_token>
  3. Send the request.

Results

The details about connections are displayed in the BODY area and the status code 200 is returned. If you want to create a crawler from one of the connections, make sure the typeLabel value is Database.
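
The same request can be prepared from a script. The following is a minimal Python sketch, not an official client: the function names are hypothetical, the `eu` environment value is only an illustration, and the layout of the sample response (a list of objects with `id`, `name` and `typeLabel`) is an assumption made for the example. Only the endpoint, the headers and the typeLabel check come from the steps above.

```python
import urllib.request


def build_list_connections_request(env: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request that lists the connections."""
    return urllib.request.Request(
        url=f"https://api.{env}.cloud.talend.com/connections",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="GET",
    )


def database_connections(connections: list[dict]) -> list[dict]:
    """Keep only the connections whose typeLabel value is Database."""
    return [c for c in connections if c.get("typeLabel") == "Database"]


# Made-up response content; the field layout is an assumption.
sample = [
    {"id": "conn-1", "name": "Sales DB", "typeLabel": "Database"},
    {"id": "conn-2", "name": "Bucket", "typeLabel": "Amazon S3"},
]
request = build_list_connections_request("eu", "<your_personal_access_token>")
print(request.get_method(), request.full_url)
print([c["id"] for c in database_connections(sample)])
```

To actually send the request, pass it to `urllib.request.urlopen` and decode the body with `json.loads`.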

Check the crawler of a JDBC connection

This operation allows you to check if an existing JDBC connection already has a crawler.

About this task

You can have only one crawler per connection. If you try to create a crawler with a connection that already has a crawler, the creation will fail. That is why it is recommended to check the connection beforehand.

Before you begin

Make sure the connection is already created and the user issuing API calls knows the ID of this connection.

method: GET
endpoint: https://api.<env>.cloud.talend.com/connections/crawlers?connectionId={connectionId}
headers: {
 "Accept": "application/json",
 "Authorization": "Bearer <your_personal_access_token>"
}

Procedure

  1. Select GET from the Method list and in the field next to it, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/crawlers?connectionId={connectionId}.

  2. Replace the placeholder with the correct values:

    Parameter Value
    connectionId Connection for which you want to check if there is already a crawler. You can find the connection ID with a GET request on https://api.<env>.cloud.talend.com/connections. It is also available in Talend Cloud applications that use connections, in the URL of the connection’s overview page, after /connection/.
  3. Click Add header to add a row and enter the following key:value pairs:

    • Accept : application/json
    • Authorization : Bearer <your_personal_access_token>
  4. Send the request.

Results

The BODY area is updated and the status code 200 is returned. The response for a connection without a crawler should look like this:

{
    "data": [],
    "offset": 0,
    "limit": 0,
    "total": 0
  }
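
A script can apply the same check to the response body. The sketch below is an illustration under assumptions: `has_crawler` is a hypothetical helper, and it assumes that a connection with a crawler returns a non-empty "data" list or a "total" above zero, which mirrors the empty response shown above.

```python
import json


def has_crawler(response_body: str) -> bool:
    """Return True when the crawler listing for the connection is not empty."""
    payload = json.loads(response_body)
    return bool(payload.get("data")) or payload.get("total", 0) > 0


# Body returned for a connection without a crawler, as shown above.
empty = '{"data": [], "offset": 0, "limit": 0, "total": 0}'
print(has_crawler(empty))  # False
```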

Scan the JDBC connection

This operation allows you to scan the content of a JDBC connection.

About this task

Before creating a crawler, you need to identify the tables and views you want to retrieve. The scan will look for all the tables and views on this connection. However, the tables and views found by this scan are not returned in the response payload. In order to get them, you need to call this endpoint with the GET method.

Before you begin

Make sure the connection is already created and the user issuing API calls knows the ID of this connection.

method: POST
endpoint: https://api.<env>.cloud.talend.com/connections/scan/{connectionId}
headers: {
 "Authorization": "Bearer <your_personal_access_token>"
}

Procedure

  1. Select POST from the Method list and in the field next to it, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/scan/{connectionId}.

  2. Replace the placeholder with the correct values:

    Parameter Value
    connectionId Connection you want to scan. You can find the connection ID with a GET request on https://api.<env>.cloud.talend.com/connections. It is also available in Talend Cloud applications that use connections, in the URL of the connection’s overview page, after /connection/.
  3. Click Add header to add a row and enter the following key:value pair:

    • Authorization : Bearer <your_personal_access_token>
  4. Send the request.

Results

The scan is launched and the status code 204 is returned.
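
As an illustration, the scan request could be prepared in Python as follows. This is a sketch under assumptions: the helper names are hypothetical, the `eu` environment value is only an example, and the connection ID is the sample value used later in this scenario.

```python
import urllib.parse
import urllib.request


def build_scan_request(env: str, token: str, connection_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that launches a scan."""
    # Percent-encode the ID so that it is always a valid path segment.
    safe_id = urllib.parse.quote(connection_id, safe="")
    return urllib.request.Request(
        url=f"https://api.{env}.cloud.talend.com/connections/scan/{safe_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


def scan_was_launched(status_code: int) -> bool:
    """204 is the status code documented above for a launched scan."""
    return status_code == 204


request = build_scan_request(
    "eu", "<your_personal_access_token>", "d54a8f03-7906-4930-a7cc-4eb90e968f89"
)
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` returns a response object whose `status` attribute can be passed to `scan_was_launched`.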

Retrieve the tables and views of a connection

This operation allows you to retrieve the tables and views held by a connection based on the last connection scan.

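Before you begin

Make sure the connection has already been scanned and the user issuing API calls knows the ID of this connection.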

About this task

method: GET
endpoint: https://api.<env>.cloud.talend.com/connections/scan/{connectionId}
headers: {
 "Accept": "application/json",
 "Authorization": "Bearer <your_personal_access_token>"
}

Procedure

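  1. Select GET from the Method list and in the field next to it, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/scan/{connectionId}.

  2. Replace the placeholder with the correct values:

    Parameter Value
    connectionId Connection whose tables and views you want to retrieve. You can find the connection ID with a GET request on https://api.<env>.cloud.talend.com/connections.
  3. Click Add header to add a row and enter the following key:value pairs:

    • Accept : application/json
    • Authorization : Bearer <your_personal_access_token>
  4. Send the request.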

Results

The connection tables and views are displayed in the BODY area and the status code 200 is returned. Each time an element is added to the connection, launch the scan again with the POST call before retrieving the results with the GET call. Otherwise, the GET call alone is enough.
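
The rule about when to scan again can be sketched in Python. This is an illustration only: `list_tables_and_views` and the `send` transport are hypothetical names, the transport is injected so the logic can be tried without network access, and any 2xx status is treated as a success. Whether the scan results are available immediately after the POST call returns is not stated here, so a real script may need to wait before issuing the GET call.

```python
from typing import Callable, Tuple

# A transport takes (method, url) and returns (status_code, body).
Transport = Callable[[str, str], Tuple[int, str]]


def list_tables_and_views(
    send: Transport, base_url: str, connection_id: str, rescan: bool
) -> str:
    """Optionally launch a new scan, then return the body of the GET call."""
    url = f"{base_url}/connections/scan/{connection_id}"
    if rescan:
        # Something was added to the connection: scan before reading.
        status, _ = send("POST", url)
        if not 200 <= status < 300:
            raise RuntimeError(f"scan failed with status {status}")
    status, body = send("GET", url)
    if not 200 <= status < 300:
        raise RuntimeError(f"retrieval failed with status {status}")
    return body


# Offline demonstration with a fake transport and a made-up body.
calls = []


def fake_send(method: str, url: str) -> Tuple[int, str]:
    calls.append(method)
    return (204, "") if method == "POST" else (200, "[]")


body = list_tables_and_views(
    fake_send, "https://api.eu.cloud.talend.com", "my-connection-id", rescan=True
)
print(calls)  # ['POST', 'GET']
```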

Retrieve the users or groups to share datasets with

This operation allows you to retrieve the list of users and groups with whom a specific entity type can be shared. The list of available users and groups changes depending on which permissions they are assigned in Talend Management Console. You need this list to retrieve the user and group IDs to be able to share the datasets with them once the crawler is created.

method: GET
endpoint: https://api.<env>.cloud.talend.com/sharing/sharings/eligibles/dataset
headers: {
 "Accept": "application/json",
 "Authorization": "Bearer <your_personal_access_token>"
}

Procedure

  1. Select GET from the Method list and in the field next to it, enter the endpoint to be used: https://api.<env>.cloud.talend.com/sharing/sharings/eligibles/dataset.

  2. Click Add header to add a row and enter the following key:value pairs:

    • Accept : application/json
    • Authorization : Bearer <your_personal_access_token>
  3. Send the request.

Results

The BODY area displays the names and IDs of the users and groups, and the status code 200 is returned.

Create a crawler

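This operation allows you to create a crawler on a specific JDBC connection. Now that you know:

    • The ID of the connection, and that it does not already have a crawler
    • The tables and views held by the connection
    • The IDs of the users and groups to share the datasets with

You can create the crawler.

There are two crawling modes that you can use depending on your use case:

    • Manual selection, with "dynamic" set to false
    • Automatic selection, with "dynamic" set to true and a list of "filters"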

method: POST
endpoint: https://api.<env>.cloud.talend.com/connections/crawlers
headers: {
 "Content-Type": "application/json",
 "Accept": "application/json",
 "Authorization": "Bearer <your_personal_access_token>"
}
payload [{
    "connectionId": "<connectionId>",
    "name": "<name>",
    "description": "<description>",
    "selectedDatasets": [
      "<table>",
      "<table>"
    ],
    "dynamic": true,
    "filters": [
      {
        "field": "<field>",
        "values": ["<values>"],
        "operator": "<operator>"
      }
    ],
    "sharings": [
      {
        "scimType": "<type>",
        "scimId": "<userId>",
        "level": "<role>"
      },
      {
        "scimType": "<type>",
        "scimId": "<groupId>",
        "level": "<role>"
      }
    ]
   }]

Procedure

  1. Select POST from the Method list and in the field next to it, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/crawlers.

  2. Click Add header three times to add three rows and enter the following key:value pairs:

    • Content-Type : application/json
    • Accept : application/json
    • Authorization : Bearer <your_personal_access_token>
  3. In the BODY area, enter the different information about the crawler depending on the crawling mode you want to use. Example for a manual selection:

     {
     "connectionId": "d54a8f03-7906-4930-a7cc-4eb90e968f89",
     "name": "My crawler",
     "description": "This is a description",
     "selectedDatasets": [
       "TABLE1",
       "TABLE2"
     ],
     "dynamic": false,
     "sharings": [
       {
         "scimType": "user",
         "scimId": "9d733659-9312-46f9-b39f-abb3e35215fe",
         "level": "OWNER"
       },
       {
         "scimType": "group",
         "scimId": "e053dffc-e7d1-415e-857f-60ffe4d42c12",
         "level": "READER"
       }
     ]
    }
    

    Example for an automatic selection:

      {
      "connectionId": "d54a8f03-7906-4930-a7cc-4eb90e968f89",
      "name": "My crawler",
      "description": "This is a description",
      "selectedDatasets": [
        "TABLE1",
        "TABLE2"
      ],
      "dynamic": true,
      "filters": [
        {
          "field": "name",
          "values": ["RETAIL"],
          "operator": "startsWith"
        }
      ],
      "sharings": [
        {
          "scimType": "user",
          "scimId": "9d733659-9312-46f9-b39f-abb3e35215fe",
          "level": "OWNER"
        },
        {
          "scimType": "group",
          "scimId": "e053dffc-e7d1-415e-857f-60ffe4d42c12",
          "level": "READER"
        }
      ]
    }
    
  4. Send the request.

Results

The crawler has been created and the status code 201 is returned. The response is the ID of the new crawler.
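
Building the body programmatically avoids JSON syntax slips such as a missing comma or a stray space in an ID. The following Python sketch is an illustration: `build_crawler_payload` is a hypothetical helper, and the rule it applies ("dynamic" is true exactly when filters are supplied) is an assumption drawn from the two examples above, not a documented constraint.

```python
import json
from typing import Optional


def build_crawler_payload(
    connection_id: str,
    name: str,
    description: str,
    tables: list[str],
    sharings: list[dict],
    filters: Optional[list[dict]] = None,
) -> dict:
    """Assemble the crawler body; passing filters switches to automatic selection."""
    payload = {
        "connectionId": connection_id.strip(),
        "name": name,
        "description": description,
        "selectedDatasets": list(tables),
        "dynamic": filters is not None,
        # Strip accidental whitespace around the user and group IDs.
        "sharings": [{**s, "scimId": s["scimId"].strip()} for s in sharings],
    }
    if filters is not None:
        payload["filters"] = list(filters)
    return payload


sharings = [
    {"scimType": "user", "scimId": "9d733659-9312-46f9-b39f-abb3e35215fe ", "level": "OWNER"},
    {"scimType": "group", "scimId": "e053dffc-e7d1-415e-857f-60ffe4d42c12", "level": "READER"},
]
manual = build_crawler_payload(
    "d54a8f03-7906-4930-a7cc-4eb90e968f89", "My crawler",
    "This is a description", ["TABLE1", "TABLE2"], sharings,
)
automatic = build_crawler_payload(
    "d54a8f03-7906-4930-a7cc-4eb90e968f89", "My crawler",
    "This is a description", ["TABLE1", "TABLE2"], sharings,
    filters=[{"field": "name", "values": ["RETAIL"], "operator": "startsWith"}],
)
print(json.dumps(manual, indent=2))
```

The string produced by `json.dumps` can be pasted into the BODY area or sent as the request body with the Content-Type header set to application/json.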

Run a crawler

This operation allows you to run a crawler, once it is created.

About this task

When calling this endpoint, the crawler will rely on its configuration in order to retrieve all the selected tables and views and turn them into datasets. Once the datasets are created, the crawler also retrieves their samples.

You can launch the crawler as many times as you want. Running a crawler for the first time creates the datasets. Running it again only refreshes the samples of the existing datasets.

Before you begin

Make sure the crawler is already created and the user issuing API calls knows the ID of this crawler.

method: POST
endpoint: https://api.<env>.cloud.talend.com/connections/crawlers/{crawlerId}
headers: {
 "Authorization": "Bearer <your_personal_access_token>"
}

Procedure

  1. Select POST from the Method list and in the field next to it, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/crawlers/{crawlerId}.

  2. Replace the placeholder with the correct values:

    Parameter Value
    crawlerId Crawler you want to run. You can find the crawler ID with a GET request on https://api.<env>.cloud.talend.com/connections/crawlers.
  3. Click Add header to add a row and enter the following key:value pair:

    • Authorization : Bearer <your_personal_access_token>
  4. Send the request.

Results

The crawler is launched and the status code 202 is returned.

Retrieve the status of a crawler

This operation allows you to check the status of a running crawler.

Before you begin

Make sure the crawler is already created and running, and the user issuing API calls knows the ID of this crawler.

About this task

Once the crawler is launched, you want to know its state and check if it has ended. The time a crawler takes to finish is proportional to the number of tables and views selected. It can take anywhere from a few minutes to several hours.

method: GET
endpoint: https://api.<env>.cloud.talend.com/connections/crawlers/{crawlerId}
headers: {
 "Authorization": "Bearer <your_personal_access_token>"
}

Procedure

  1. Select GET from the Method list and in the field next to it, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/crawlers/{crawlerId}.

  2. Replace the placeholder with the correct values:

    Parameter Value
    crawlerId Crawler whose status you want to check. You can find the crawler ID with a GET request on https://api.<env>.cloud.talend.com/connections/crawlers.
  3. Click Add header to add a row and enter the following key:value pair:

    • Authorization : Bearer <your_personal_access_token>
  4. Send the request.

Results

The status code 200 is returned. In the BODY section, if you check the runStatus value, you get the status of the crawler.
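
Because a crawler can run for a long time, a script would typically poll this endpoint at intervals. The sketch below shows one way to do it, under stated assumptions: the function names are hypothetical, the body is assumed to be a JSON object with a top-level runStatus field, and the status values used in the demonstration (RUNNING, FINISHED, FAILED) are placeholders, since the possible values are not listed here. The caller supplies the set of values it treats as final.

```python
import json
import time
from typing import Callable, Collection


def run_status_from_body(body: str) -> str:
    """Extract the runStatus value from the response body."""
    return json.loads(body)["runStatus"]


def wait_for_crawler(
    fetch_body: Callable[[], str],
    terminal_statuses: Collection[str],
    poll_seconds: float = 60.0,
    max_polls: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until runStatus is one of terminal_statuses, then return it."""
    for attempt in range(max_polls):
        status = run_status_from_body(fetch_body())
        if status in terminal_statuses:
            return status
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"crawler still running after {max_polls} polls")


# Offline demonstration: canned bodies with placeholder status values.
bodies = iter([
    '{"runStatus": "RUNNING"}',
    '{"runStatus": "RUNNING"}',
    '{"runStatus": "FINISHED"}',
])
final = wait_for_crawler(
    lambda: next(bodies), {"FINISHED", "FAILED"}, sleep=lambda seconds: None
)
print(final)
```

In a real script, `fetch_body` would send the GET request described above and return the decoded response body.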

Retrieve the datasets of a crawler

This operation retrieves the statuses of datasets related to a crawler. Once the crawler is done, you can check the datasets created by the crawler.

Before you begin

Make sure a crawler is already created and the user issuing API calls knows the ID of this crawler.

About this task

method: GET
endpoint: https://api.<env>.cloud.talend.com/connections/crawlers/{crawlerId}/datasets
headers: {
 "Accept": "application/json",
 "Authorization": "Bearer <your_personal_access_token>"
}

Procedure

  1. Select GET from the Method list and in the field next to it, enter the endpoint to be used: https://api.<env>.cloud.talend.com/connections/crawlers/{crawlerId}/datasets.

  2. Replace the placeholder with the correct values:

    Parameter Value
    crawlerId Crawler whose datasets you want to check. You can find the crawler ID with a GET request on https://api.<env>.cloud.talend.com/connections/crawlers.
  3. Click Add header to add a row and enter the following key:value pairs:

    • Accept : application/json
    • Authorization : Bearer <your_personal_access_token>
  4. Send the request.

Results

The status code 200 is returned. In the BODY area, you get the status of the different datasets created. For more details about a dataset, you can use this endpoint with a GET method: https://api.<env>.cloud.talend.com/datasets/{datasetId}.