Matching PolygonIO 'Grouped Daily (Bars)' with historical trades

Discussion in 'Data Sets and Feeds' started by Zuiquan, Jul 20, 2021.

  1. Zuiquan

    Hi,

    Has anyone succeeded in reproducing the Grouped Daily (Bars) close from the trades history?

    I tried many rules, including using the conditions defined here (https://polygon.io/glossary/us/stocks/conditions-indicators).

    But I did not succeed in establishing a set of rules that matches every active ticker in a day.
     
  2. guru

    Looks like a common problem with every data provider since electronic trading began, unrelated to Polygon. Such questions get repeated every so often.
    Even if you match Polygon’s data, it will not match someone else’s data, a broker’s data, an exchange’s data or the actual data, so what’s the point? When brokers include all the data, people complain that they see too many error spikes from dark pools and late-reported trades. When they clean those up, people complain that something doesn’t match. So everyone has their own data cleaning approach and your data won’t match anyone else’s, including Polygon’s.
    And what’s the point anyway, besides overfitting a system to every tick, so that you won’t be able to trade in the actual world where things get random and messy all the time? If you’re in the trading biz, you have to plan for randomness.
     
  3. Sounds a lot like your other thread. The answer is no. Nobody reproduces bars from any source. It is a fool's errand to try, and worthless even if you succeed.

    You are still in this newbie fallacy that there is one correct way to interpret trade data and you assume that Polygon must be doing it The Right Way, because they are professionals, or something. Therefore, reproducing them would be valuable, because that means you are also doing it The Right Way.

    That's not how this stuff works. There's no reason to think that Polygon's bars are anything special, or those of any other data provider. Even if you correctly constructed bars based on the rules in that link for updating high/low/last, there are a million reasons why they might not match, including the possibility that (gasp) their code might have bugs in it.

    I would go so far as to bet that even if they told you how they think their bars are built, you would probably not reproduce them, because they wouldn't be able to describe every detail without actually giving you the source code. So if reproducing them is really what you want, that's what you should do: Ask Polygon to give you their source code for building the bars.
     
    Last edited: Jul 21, 2021
  4. Zuiquan

    > You are still in this newbie fallacy that there is one correct way to interpret trade data and you assume that Polygon must be doing it The Right Way

    No!
    I understood from you and others that no data provider or platform provides any "right" answer.
    But as I rely on Polygon's aggregates while doing some other processing using their raw stream,
    I just want to keep things coherent,
    whether they are true/accurate or not.

    Hence, my question is really Polygon oriented, and not related to any other data provider.

    > I would go so far as to bet that even if they told you how they think their bars are built, you would probably not reproduce them

    This is exactly what I face currently.
    And it is the reason why I asked here, in case someone had already done this kind of reverse engineering.
    I started doing it myself, and got down to only 40 mismatches over a set of 5663 tickers (on 1 trading day).
    For now, I am doing it with this.
    But if I can reduce the number of mismatches, that would be better.
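
    For reference, the mismatch count can be produced along these lines. This is only a minimal sketch: the function name, the tolerance and the tickers "AAA"/"BBB" with their prices are made up for illustration, and the two dictionaries stand in for the Grouped Daily closes and the closes rebuilt from raw trades.

```python
import math

def find_mismatches(bar_close, rebuilt_close, tol=1e-6):
    """Return {ticker: (bar close, rebuilt close)} for every ticker that differs."""
    mismatches = {}
    for ticker, expected in bar_close.items():
        got = rebuilt_close.get(ticker)  # None if no eligible trade was found
        if got is None or not math.isclose(got, expected, abs_tol=tol):
            mismatches[ticker] = (expected, got)
    return mismatches

# Hypothetical data, not taken from Polygon.
bar_close = {"AAA": 10.00, "BBB": 20.01}
rebuilt_close = {"AAA": 10.00, "BBB": 20.0063}

bad = find_mismatches(bar_close, rebuilt_close)
print(sorted(bad))            # ['BBB']
print(f"{40 / 5663:.2%}")     # 0.71% (the mismatch rate quoted above)
```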

    > Ask Polygon to give you their source code for building the bars.

    I may come to this ;)
    Not sure to be successful, though...
     
  5. OK, but why? You can't rely on them being coherent anyways. Even if you manage to eliminate every mismatch now, that doesn't mean that more mismatches won't appear in the future if Polygon changes or if the underlying data changes. Having been in this game for about a decade now I know it happens all the time. New exchanges, new conditions, etc. It's a moving target and you have to design accordingly to be robust to unexpected data.
     
  6. Polygon.io

    Polygon.io Sponsor

    Hi Zuiquan!

    As the other replies have stated, there really isn't a de facto source for this data. We believe that following the SIP's guidelines for calculating these values is the correct way to do so, but it really is up to each provider.

    To shed some light on how we determine our close values: we use the last trade after 4:00:00 that is eligible to update the 'Last' value.

    Most of the time, trades that don't update 'Last' have conditions 12 (Form T), 13 (late), or 37 (odd lot, aka < 100 shares), but in general any condition listed here that does not update Last according to the consolidated processing guidelines (which is what we use) will make a trade ineligible to update the close.

    I hope this helps. Please let me know if I can clear anything else up.
     
  7. Zuiquan

    Hi Quinton!

    Thank you very much for your message.
    I had tried using the condition list, but I was using it the wrong way.

    I think the condition list is misleading because there are conditions that are not listed as Yes or No.

    And so, I was assuming a trade was eligible to change the 'Last' only if all of its conditions (if there were any) were associated with a Yes.
    This is not true.
    For example, under that rule a condition 41 made the trade ineligible.

    And, it might also be better to specify that this works based on exclusion, not inclusion.
    Meaning that even if a condition says Yes, if there is another that says No, then it makes the trade not eligible.

    When I focus on the No entries in the condition list and exclude a trade as soon as one of its conditions is a No, I get an almost perfect match.
    Note: I search for the last eligible trade, so I search backward through the raw trades history.
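
    A minimal sketch of this exclusion rule, not Polygon's actual code. The field names p, t and conds follow my printout below; the set of No codes contains only the three conditions named in this thread (12, 13, 37), so the real list is longer, and the sample trades are made up.

```python
# Condition codes that block an update of 'Last'. Only the three codes named
# in this thread are listed here; the full "No" list is longer (assumption).
NO_UPDATE_LAST = frozenset({12, 13, 37})

def is_eligible(trade, excluded=NO_UPDATE_LAST):
    """Exclusion rule: eligible unless at least one condition is a 'No'."""
    return not any(cond in excluded for cond in trade.get("conds", []))

def last_eligible_trade(trades, excluded=NO_UPDATE_LAST):
    """Walk the trades from newest to oldest and return the first eligible one."""
    for trade in sorted(trades, key=lambda tr: tr["t"], reverse=True):
        if is_eligible(trade, excluded):
            return trade
    return None  # no eligible trade in the input

# Hypothetical trades, for illustration only.
trades = [
    {"p": 10.00, "t": 1, "conds": []},
    {"p": 10.05, "t": 2, "conds": [14, 41]},  # no 'No' code, so eligible
    {"p": 10.50, "t": 3, "conds": [12]},      # Form T, skipped
    {"p": 10.40, "t": 4, "conds": [41, 37]},  # a single 'No' is enough, skipped
]
print(last_eligible_trade(trades)["p"])  # 10.05
```

    Conditions without a Yes or No in the list are simply ignored here, which is the difference from my earlier inclusion approach.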

    Still, I have these 3 cases not matching (for 2021-07-09):

    ticker ' ARCC' market close price:
    - DailyBars=20.010000
    - Raw trade selected: p= 20.00630 t=1625860847731262844 conds=[] x=4 s= 179

    ticker ' STNE' market close price:
    - DailyBars=64.300000
    - Raw trade selected: p= 64.00000 t=1625861065311822384 conds=[ 14, 32, 41] x=4 s= 56100

    ticker ' VXUS' market close price:
    - DailyBars=65.610000
    - Raw trade selected: p= 65.54000 t=1625861438406984899 conds=[] x=4 s= 100

    Could you please tell me what causes these 3 tickers not to match?
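
    To read the t values above as wall-clock times, here is a small sketch. It assumes t is a Unix timestamp in nanoseconds and uses a fixed UTC-4 offset, which holds for Eastern Daylight Time on this July date only; the helper name is my own.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-4 offset: valid for Eastern Daylight Time in July (assumption).
# A real time-zone database is needed for dates across DST changes.
EDT = timezone(timedelta(hours=-4))

def ns_to_eastern(ts_ns):
    """Split a nanosecond Unix timestamp into (EDT datetime, leftover nanoseconds)."""
    seconds, nanos = divmod(ts_ns, 1_000_000_000)  # integer math, no float rounding
    return datetime.fromtimestamp(seconds, tz=EDT), nanos

selected = [
    ("ARCC", 1625860847731262844),
    ("STNE", 1625861065311822384),
    ("VXUS", 1625861438406984899),
]
for ticker, ts in selected:
    dt, nanos = ns_to_eastern(ts)
    print(ticker, dt.strftime("%Y-%m-%d %H:%M:%S"), f"+{nanos} ns")
# ARCC 2021-07-09 16:00:47 +731262844 ns
# STNE 2021-07-09 16:04:25 +311822384 ns
# VXUS 2021-07-09 16:10:38 +406984899 ns
```

    Under these assumptions, all three selected trades fall after 4:00:00 PM Eastern.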
     
    Last edited: Jul 23, 2021
  8. Polygon.io

    Polygon.io Sponsor

    This actually ended up being a bug on our end. We'll backfill the data tonight, which should correct these discrepancies. Apologies for the confusion.