Pagination: changes while parsing data?

Hi there,

I’ve written a recursive function to pull all data from certain API endpoints of my company’s Pipedrive account, which we need for ETL purposes. For Deals it works perfectly fine; for Activities, however, I sometimes encounter a problem:

We have 33k+ activities and growing. When pulling them with a limit of 500 per request, it takes some time to pull them all. During this time, the underlying data may change. Now, this is not so bad per se if one or two activities update during the process. However, it seems that the order of the activities changes between requests, leading to an error when loading the data into a data warehouse where the activity id is the primary key.

The important question to me now is which one of the two statements is true (or both?):

  1. when a new activity is created, it becomes the first activity to be pulled in a request
  2. when an existing activity is altered/updated in any way, it becomes the first activity to be pulled in a request

If neither is true, what is the logic for the order of activities (and, by extension, all Pipedrive data pulled via API)?

My main concern is that I’m missing existing data. I don’t care about either new activities or changes to existing activities, but I DO care about missing existing data, since this compromises my data consistency. So if number 2 is true, how can I pull a “snapshot” of all activities at once?

Best,
Bijan

Hey Bijan,

Activities should actually be pulled by their “due_date” and “due_time”.
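
Since that ordering can shift while you are paging (a changed due date moves an activity to a different position), the same activity can show up on two pages. One way to keep the primary key happy is to collect records by id before loading them. Here is a minimal sketch, not an official client: `fetch_page` is a hypothetical callable you would write yourself to wrap the actual API request and return the list of records for a given offset and limit.

```python
from typing import Callable, Dict, List


def pull_all(
    fetch_page: Callable[[int, int], List[dict]],
    limit: int = 500,
) -> Dict[int, dict]:
    """Page through a collection and keep exactly one record per id.

    fetch_page(start, limit) is a hypothetical wrapper around the real
    API call; it must return a list of dicts that each carry an "id".
    """
    records: Dict[int, dict] = {}
    start = 0
    while True:
        page = fetch_page(start, limit)
        if not page:
            break  # nothing left at this offset
        for item in page:
            # If the order shifted and an id appears twice,
            # the copy fetched later simply overwrites the earlier one.
            records[item["id"]] = item
        start += limit
    return records
```

This only removes duplicates. If a record moves into a page you have already fetched, or another record is deleted while you are paging, something can still be skipped, so running a second pass and comparing the ids is probably the safest check for missing data.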

Understood! That makes a lot of sense.

Now, when a new activity is created, what is the default due_date?

It will default to the day the Activity was created (no specific time will be added).

I have a similar question in relation to mail threads. What are mail threads pulled by? update_time?