How do you keep large amounts of data in sync?

thomax · September 9, 2019, 12:24pm

I work at a SaaS business and we have the following data types which we want to keep in sync in Pipedrive:

Users (maps to Pipedrive Persons)
Projects (maps to Pipedrive Deals)

Both these are in the five-digit range, and growing steadily every month. Depending on which plan a particular Project is on, it has a max quota on various resources (API requests, bandwidth etc), and a current usage number for those resources.

We’re trying to enable the following workflow:

A Project which spends any resource is flagged and queued for import into Pipedrive
Once updated in Pipedrive, a workflow detects when e.g. current_usage_bandwidth > quota_bandwidth and sends an email to the Person which belongs to that Deal

In short, we want to mirror some of our own data into Pipedrive and use the Workflow automation to react sensibly to new states.

Over the last week we’ve been working on two things:

A way to bulk-import all our Users/Persons and Projects/Deals into Pipedrive
A way to keep existing Persons and Deals up to date, and create new stuff as new users keep coming

We’ve accomplished both these tasks, but because of the sheer number of our Users + Projects, combined with the fact there’s no support for bulk create/update of Persons or Deals in Pipedrive, we’re blowing through the 10k daily rate limit.

Here’s some quick math which explains why. Say we have a Project and we want to make sure there’s a current matching Deal in Pipedrive. We then have to:

1. Look up the Person which should be assigned on this Deal
2. Look up the Deal
3a. If the Deal exists, update it
3b. If the Deal doesn’t exist, create it

Thus, for each Project we want to keep current in Pipedrive, we need to perform 3 requests. If we had 20k active Projects, that would mean 60k requests per day. Then there are the Users/Persons we want to keep in sync as well…
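The arithmetic above can be sketched in a few lines. This is only an illustration of the numbers quoted in the post: the function name and the `DAILY_LIMIT` constant are made up for the example, and the figure of 3 requests per Project is the estimate given above.

```python
import math

DAILY_LIMIT = 10_000          # daily limit quoted in the post
REQUESTS_PER_PROJECT = 3      # person lookup + deal lookup + create/update


def daily_requests(projects: int, per_project: int = REQUESTS_PER_PROJECT) -> int:
    """Total requests needed to touch every Project once."""
    return projects * per_project


total = daily_requests(20_000)
print(total)                                  # 60000
print(math.ceil(total / DAILY_LIMIT))         # 6 (days' worth of quota)
```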

So I guess this is an elaborate way of asking other heavy users of Pipedrive out there: How do you deal with automatically keeping large amounts of data in sync?

TimMunro · September 9, 2019, 8:07pm

Hey Thomax, to minimize the number of API calls you can maintain a cache of relevant PD records in your middleware (the PD data extract APIs support batch operations). Then perform your “per-Person” and “per-Deal” lookups against the cache rather than the API. That way, when you run a sync, you only need to extract changed records from PD. Then (in your middleware) detect which Persons/Deals should be created/updated and write only to those PD records. If you still have more than 10k changes to apply, that will be a problem. See the Recents endpoint for info on how to extract changed records: https://developers.pipedrive.com/docs/api/v1/#!/Recents
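A minimal sketch of the "detect which records need writing" step, assuming the middleware keeps both the source data and the cached PD records as dictionaries keyed by some shared ID. The names `plan_writes`, `source`, and `cache` are hypothetical, and no Pipedrive API call is made here; the function only decides what would need to be written.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def plan_writes(
    source: Dict[str, Record], cache: Dict[str, Record]
) -> Tuple[List[str], List[str]]:
    """Compare source records against the local cache of PD records.

    Returns (ids to create, ids to update). Unchanged records are skipped,
    so they cost no write request at all.
    """
    to_create = [key for key in source if key not in cache]
    to_update = [
        key for key in source if key in cache and source[key] != cache[key]
    ]
    return to_create, to_update


source = {
    "p1": {"usage": 10, "quota": 100},
    "p2": {"usage": 120, "quota": 100},   # usage changed since last sync
    "p3": {"usage": 5, "quota": 50},      # new Project, not yet in PD
}
cache = {
    "p1": {"usage": 10, "quota": 100},
    "p2": {"usage": 80, "quota": 100},
}

creates, updates = plan_writes(source, cache)
print(creates)  # ['p3']
print(updates)  # ['p2']
```

Only `p2` and `p3` would result in a write; `p1` is unchanged and is skipped.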

thomax · September 10, 2019, 7:29am

Thanks a lot for the constructive response @TimMunro! Implementing our own cache would indeed cut down on the number of requests we’d have to do. However, just creating or updating the Persons+Deals that need to be persisted will consume more than the daily 10k requests.

thomax · September 10, 2019, 11:27am

Ah! It was just brought to my attention that GET requests do not count towards the daily rate limit (only POST/PUT/DELETE count). Thus, the caching solution would regrettably not be of any help; we still need to perform considerably more POST/PUT requests than 10k.
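To illustrate the point, here is a small sketch that counts only the requests that would go against the limit, taking the rule as described in this post (GET is free; POST/PUT/DELETE count). The names `COUNTED_METHODS` and `counted_requests` are invented for the example.

```python
from typing import Iterable

# Methods that count towards the daily limit, as described in the post.
COUNTED_METHODS = {"POST", "PUT", "DELETE"}


def counted_requests(methods: Iterable[str]) -> int:
    """Number of requests in `methods` that count towards the limit."""
    return sum(1 for method in methods if method.upper() in COUNTED_METHODS)


# Per Project: two lookups (GET) plus one create or update (POST/PUT).
per_project = ["GET", "GET", "PUT"]

print(counted_requests(per_project))            # 1
print(counted_requests(per_project * 20_000))   # 20000
```

With 20k Projects, the lookups drop out of the count, but the 20k writes alone are still twice the 10k limit.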

thomax · September 10, 2019, 12:02pm

@David perhaps you can suggest if/how we can get around this limitation?

Sorry for the blatant tagging, but if there’s a guru around here it seems like it might be you

andoitz · April 13, 2020, 1:41am

@David @thomax did you guys find a solution? Let’s say I have 30k organizations and I want to add a new custom field. Can I update those 30k in a batch to avoid 30k calls?

I think we need some batch implementations to update stuff, otherwise people with many items will have issues.

David · April 15, 2020, 11:07am

Hi @andoitz!

Unfortunately, there’s currently no other way to do that, as we don’t have bulk update for organizations or fields in the API.

One of our engineers did suggest sending multiple requests at the same time, but this will depend on your plan’s rate limit. It might speed up your script.
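A rough sketch of what "multiple requests at the same time" could look like, using a bounded thread pool from the Python standard library. The `fake_send` function is a stand-in for whatever performs one real update request; it is hypothetical and makes no network call. The `max_workers` value is an arbitrary assumption and would need to be chosen to fit your plan's rate limit.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_concurrently(
    items: Iterable[int], send: Callable[[int], str], max_workers: int = 4
) -> List[str]:
    """Call `send` for every item, with at most `max_workers` in flight.

    Results come back in the same order as `items`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, items))


def fake_send(org_id: int) -> str:
    """Placeholder for a single update request (hypothetical)."""
    time.sleep(0.01)  # simulate network latency
    return f"updated {org_id}"


results = run_concurrently(range(1, 9), fake_send, max_workers=4)
print(len(results))  # 8
print(results[0])    # updated 1
```

Note that this only shortens the wall-clock time of the script. The number of requests, and therefore the quota consumed, stays the same.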