
Capturing the reaction to insider purchases

Published in Automated Trader Magazine Issue 40 Q3 2016

Visibility into insider trading (or director dealing) is important to market participants. Studies have consistently shown insiders' trades are informed and followed by significant abnormal returns. Consequently, market participants race to follow insiders who make unexpected moves.

In the US, the Securities and Exchange Commission (SEC) disseminates regulatory filings using the EDGAR system, which stands for Electronic Data Gathering, Analysis and Retrieval. This is used to automatically collect, validate, index, accept and forward form submissions by SEC-regulated entities. Within this system, the Form-4 filing is the "statement of changes in beneficial ownership of securities." It is the form used to report legal insider trading. Insiders must file a Form-4 within two business days of acquiring or disposing of company securities. Every company director and officer is required to report trades in this way. Additionally, any person owning more than 10% of a class of equity securities must also file.

Greg Harris


Greg Harris is about to complete his PhD in Computer Science, with an emphasis in machine learning, at the University of Southern California. Prior to this, he developed quantitative trading strategies for Black River Asset Management. He also holds a Master of Financial Mathematics degree from the University of Minnesota. For around a year, he was the fastest Form-4 trader on the block.

Insiders are subject to many trading restrictions. For example, they aren't allowed to profit from short-term price swings (any period less than 6 months). If they buy stock, they are committed to holding it for a while. They aren't allowed to trade while in possession of non-public information. In fact, most companies have policies barring any trading during the month before an earnings release. The only exception is Rule 10b5-1, which is "designed to cover situations in which a person can demonstrate that the material non-public information was not a factor in the trading decision." For example, a CEO can arrange with a broker to sell a small number of shares on his or her behalf on a monthly schedule. Those trades are legal, even if they occur before an earnings announcement. As a side note, this creates a legal loophole. Only trading can be illegal; refraining from trading is not illegal, even if the decision was based on material non-public information. So it is legal to cancel the pre-arranged sale right before a positive earnings announcement.


The SEC provides compressed archives of filings on their FTP site. The archives are nearly complete, missing only a small number of retracted filings. Unfortunately for backtesting purposes, the archives include only the filing date, and not the timestamp. That isn't enough information to tell whether a filing was submitted before or after the market closed, which is necessary for backtests based on daily data. The solution is to extract timestamps from filing headers, which are found on the SEC's website. This can take weeks, because the SEC asks the public not to perform bulk requests between 6:00 and 21:00 Eastern time on business days. The URL for each header is derived from the company identifier (the CIK) and the accession number (the ID of the specific filing), both of which are available in daily index files on the FTP site. The 'Links' section at the end of this article contains an example header URL. Out of 10.7 million archived filings, timestamps can be retrieved for around 9 million of them. The timestamps have one-second resolution, but filings can actually be made public up to a minute before or after the recorded time. One final problem is that corrected filings show the timestamp of the correction, and not the original.
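As a rough illustration of the timestamp step, the sketch below builds a header URL from a CIK and accession number, pulls the acceptance timestamp out of the header text, and classifies the filing as before or after the close. The URL layout, the `<ACCEPTANCE-DATETIME>` tag format and the assumption that the timestamp is in Eastern time should all be checked against the example header in the Links section; the CIK, accession number and sample header here are made up.

```python
import re
from datetime import datetime, time

# Assumed URL layout for a filing's header file (verify before relying on it).
HEADER_URL = (
    "https://www.sec.gov/Archives/edgar/data/{cik}/{folder}/{accession}.hdr.sgml"
)

MARKET_CLOSE = time(16, 0)  # regular US equity session close, Eastern time


def header_url(cik, accession):
    """Build the header URL from a CIK and a dashed accession number."""
    return HEADER_URL.format(
        cik=int(cik), folder=accession.replace("-", ""), accession=accession
    )


def acceptance_time(header_text):
    """Extract the acceptance timestamp (assumed format: YYYYMMDDHHMMSS)."""
    match = re.search(r"<ACCEPTANCE-DATETIME>(\d{14})", header_text)
    if match is None:
        raise ValueError("no acceptance timestamp found in header")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def filed_after_close(accepted):
    """True if the filing arrived at or after the 16:00 close."""
    return accepted.time() >= MARKET_CLOSE


# Made-up identifiers and header text, for illustration only.
url = header_url(1234567, "0001234567-06-000123")
sample_header = "<SEC-HEADER>\n<ACCEPTANCE-DATETIME>20060315163012\n"
accepted = acceptance_time(sample_header)
after_close = filed_after_close(accepted)
```

Because a filing can become public up to a minute either side of the recorded time, anything stamped within a minute of the close is best treated as ambiguous in a daily backtest.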

Electronic filings began to be available back in 1995. However, the Form-4 became really useful for backtesting in 2003, when a requirement was put in place that such filings be submitted in XML format instead of plain text. The XML format is machine-readable, although it still has a few fields with unstructured text (e.g. officer title, footnotes). Unfortunately, the issuer trading symbol (ticker) is one such unstructured text field, and some filers have input interesting values that are difficult to map to price data. For academics at subscribing schools, Wharton Research Data Services provides a mapping table which links the issuer CIK and the company identifiers used by CRSP and TAQ.
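To give a feel for the XML, here is a minimal parsing sketch using Python's standard library. The element names follow my understanding of the Form-4 ownership document schema and should be verified against a real filing; the sample document and the `clean_ticker` heuristic are purely illustrative and will not handle every odd value filers enter.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, trimmed to the fields used below (not a real filing).
SAMPLE = """<ownershipDocument>
  <issuer>
    <issuerCik>0001234567</issuerCik>
    <issuerTradingSymbol> nyse: abc </issuerTradingSymbol>
  </issuer>
  <nonDerivativeTable>
    <nonDerivativeTransaction>
      <transactionCoding><transactionCode>P</transactionCode></transactionCoding>
      <transactionAmounts>
        <transactionShares><value>2000</value></transactionShares>
        <transactionPricePerShare><value>25.50</value></transactionPricePerShare>
      </transactionAmounts>
      <ownershipNature>
        <directOrIndirectOwnership><value>D</value></directOrIndirectOwnership>
      </ownershipNature>
    </nonDerivativeTransaction>
  </nonDerivativeTable>
</ownershipDocument>"""


def clean_ticker(raw):
    """Best-effort cleanup of the free-text ticker field (hypothetical rule)."""
    text = (raw or "").strip().upper()
    if ":" in text:  # drop an exchange prefix such as "NYSE:"
        text = text.split(":", 1)[1].strip()
    return text


def parse_form4(xml_text):
    """Return one dict per non-derivative transaction in the filing."""
    root = ET.fromstring(xml_text)
    cik = root.findtext("issuer/issuerCik")
    ticker = clean_ticker(root.findtext("issuer/issuerTradingSymbol"))
    rows = []
    for tx in root.iterfind("nonDerivativeTable/nonDerivativeTransaction"):
        amounts = "transactionAmounts/"
        shares = float(tx.findtext(amounts + "transactionShares/value") or 0)
        price = float(tx.findtext(amounts + "transactionPricePerShare/value") or 0)
        rows.append({
            "cik": cik,
            "ticker": ticker,
            "code": tx.findtext("transactionCoding/transactionCode"),
            "ownership": tx.findtext(
                "ownershipNature/directOrIndirectOwnership/value"),
            "shares": shares,
            "price": price,
            "dollar_value": shares * price,
        })
    return rows


rows = parse_form4(SAMPLE)
```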

I myself store all filings in a single large binary archive. I use ZLIB to compress each filing and append the compressed bytes to the end of the archive. For each filing, I store the file offset and compressed size in a traditional database. This compression strategy keeps the archive relatively small (fits on a 1 TB SSD), yet still gives me the random access needed to decompress a single filing at a time. Using an SSD keeps the seek time low. If you don't want to go down this route, the Form-4 is sufficiently structured to process and store directly in a relational database.
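The storage scheme described above can be sketched in a few lines. This is one possible reading of it, not the actual implementation: SQLite stands in for the "traditional database", and the table layout, function names and accession numbers are assumptions made for the example.

```python
import io
import sqlite3
import zlib


def open_index(path=":memory:"):
    """Create the (hypothetical) index table of offsets and sizes."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS filings ("
        "accession TEXT PRIMARY KEY, offset INTEGER, size INTEGER)"
    )
    return db


def append_filing(archive, db, accession, raw_bytes):
    """Compress one filing, append it to the archive, and index it."""
    blob = zlib.compress(raw_bytes)
    archive.seek(0, io.SEEK_END)  # always write at the end of the archive
    offset = archive.tell()
    archive.write(blob)
    db.execute(
        "INSERT INTO filings VALUES (?, ?, ?)", (accession, offset, len(blob))
    )
    db.commit()


def read_filing(archive, db, accession):
    """Seek straight to one filing and decompress only that filing."""
    row = db.execute(
        "SELECT offset, size FROM filings WHERE accession = ?", (accession,)
    ).fetchone()
    if row is None:
        raise KeyError(accession)
    archive.seek(row[0])
    return zlib.decompress(archive.read(row[1]))


# An in-memory buffer stands in for the large archive file on the SSD.
archive = io.BytesIO()
db = open_index()
append_filing(archive, db, "0000000001-06-000001", b"<first filing>" * 50)
append_filing(archive, db, "0000000002-06-000002", b"<second filing>" * 50)
second = read_filing(archive, db, "0000000002-06-000002")
```

With a real file, the archive would be opened in binary read/write mode (for example `open(path, "r+b")`) and the same functions would apply unchanged.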


The market reacts much more strongly to insider purchases than to insider sales. Insiders can sell for a variety of uninformative reasons, such as diversifying their portfolio or paying for a child's tuition, but there is only one reason to buy: they believe the stock is undervalued.

Over time, I've tried a variety of regression techniques to model the response to insider buying using features from the Form-4. Figure 01 shows how well several of them identify the most important filings. This shows that the highest-predicted 1% of insider buys has an average intraday response of more than 2% return. These results are estimated using 10-fold cross-validation, which should minimize the effect of over-fitting. Models with parameters underwent an additional 10-fold cross-validation step in an inner loop to estimate optimal parameter values. The best performing model is RegENDER. I did not tune its parameters, but instead used the recommended values from its authors' paper, which interested readers can find in the references section. PRIMER is one of my own algorithms, designed with an emphasis on model interpretability. It is a regression-rule learning system for intervention optimization.
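The evaluation procedure (outer 10-fold cross-validation, with an inner 10-fold loop to pick parameters) can be sketched as follows. The model here is a deliberately trivial one-parameter ridge slope, standing in for the real learners; it is not RegENDER or PRIMER, and the data and parameter grid are synthetic placeholders.

```python
import random


def k_folds(indices, k, seed):
    """Shuffle indices and deal them into k roughly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def fit_slope(xs, ys, lam):
    """Ridge-style slope through the origin; lam is the tunable parameter."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_error(xs, ys, idx, lam, k, seed):
    """Mean held-out squared error of one lam value over k folds of idx."""
    total = 0.0
    for held_out in k_folds(idx, k, seed):
        held = set(held_out)
        train = [i for i in idx if i not in held]
        slope = fit_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        total += sum((ys[i] - slope * xs[i]) ** 2 for i in held_out) / len(held_out)
    return total / k


def nested_cv_predictions(xs, ys, lam_grid, k=10, seed=0):
    """Out-of-fold predictions; lam is tuned on each training part only."""
    all_idx = list(range(len(xs)))
    preds = [None] * len(xs)
    for held_out in k_folds(all_idx, k, seed):
        held = set(held_out)
        train = [i for i in all_idx if i not in held]
        best_lam = min(
            lam_grid, key=lambda lam: cv_error(xs, ys, train, lam, k, seed + 1)
        )
        slope = fit_slope([xs[i] for i in train], [ys[i] for i in train], best_lam)
        for i in held_out:
            preds[i] = slope * xs[i]
    return preds


def top_fraction_mean(preds, ys, fraction=0.01):
    """Average realised response among the highest-predicted fraction."""
    n_top = max(1, int(len(preds) * fraction))
    ranked = sorted(range(len(preds)), key=lambda i: preds[i], reverse=True)
    return sum(ys[i] for i in ranked[:n_top]) / n_top


# Synthetic data: response grows with the feature, plus a small wiggle.
xs = [i / 100 for i in range(200)]
ys = [0.5 * x + 0.01 * ((i * 37) % 11 - 5) for i, x in enumerate(xs)]
preds = nested_cv_predictions(xs, ys, lam_grid=[0.1, 1.0, 10.0])
top_mean = top_fraction_mean(preds, ys, fraction=0.01)
```

The key property is that every prediction is made by a model that never saw that filing, either when fitting or when choosing its parameter, so the top-1% average is an out-of-sample estimate.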

Figure 02: Response to Insider Buying by Year

The market reaction to insider buying has become stronger and faster over time. The response is increasingly front-loaded, with very little effect remaining after the first few minutes. Figure 02 shows the intraday response to insider buying for filings that pass this filter:

  • Transaction code "P" for purchase
  • Direct beneficial ownership
  • Transaction dollar value > 30,000 USD
  • Market capitalization > 200 million USD
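Expressed as code, the filter above might look like the sketch below. The field names are hypothetical, "D" is assumed to be the Form-4 value for direct ownership, and market capitalization has to come from a separate price data source since it is not part of the filing.

```python
from dataclasses import dataclass


@dataclass
class InsiderTransaction:
    code: str            # Form-4 transaction code, "P" for purchase
    ownership: str       # "D" for direct, "I" for indirect (assumed values)
    dollar_value: float  # shares multiplied by price per share, in USD
    market_cap: float    # issuer market capitalization in USD (external data)


def passes_filter(tx):
    """Apply the four conditions listed above, all of which must hold."""
    return (
        tx.code == "P"
        and tx.ownership == "D"
        and tx.dollar_value > 30_000
        and tx.market_cap > 200_000_000
    )


example = InsiderTransaction("P", "D", 51_000.0, 750_000_000.0)
keep = passes_filter(example)
```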

In general, the following attributes are associated with larger market reactions:

  • High dollar value of purchased shares
  • Small company market capitalization
  • Purchase was for direct beneficial ownership
  • No other recent insider purchases
  • Insider was an officer


For about one year, back in 2006, I was the fastest Form-4 trader. My strategy was simply to buy stock as soon as possible after an insider buy was reported. I held my shares for three minutes before automatically beginning to sell portions over the next few minutes. I didn't hedge, and I traded as much as I thought each stock could handle. I used rule-based filters to identify profitable opportunities. I traded primarily mid-caps, because large-caps didn't reliably react to the filings and small-caps were too expensive to trade. I excluded the entire banking sector, because banks had a constant stream of insider buys that didn't affect the stock price. Finally, I only bought stocks that hadn't had another insider buy within the last month.
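Rule-based filters of this kind are simple to express. The sketch below is only an illustration: the mid-cap bounds, the sector label and the 30-day reading of "within the last month" are assumptions, since the exact thresholds are not given.

```python
from datetime import date, timedelta

MID_CAP_RANGE = (2_000_000_000, 10_000_000_000)  # hypothetical USD bounds
EXCLUDED_SECTORS = {"Banking"}                   # hypothetical sector label
QUIET_PERIOD = timedelta(days=30)                # "within the last month"


def is_tradeable(market_cap, sector, filing_date, previous_buy_dates):
    """True if the filing passes the mid-cap, sector and recency rules."""
    low, high = MID_CAP_RANGE
    if not low <= market_cap <= high:
        return False
    if sector in EXCLUDED_SECTORS:
        return False
    # Reject if any earlier insider buy falls inside the quiet period.
    return all(filing_date - d > QUIET_PERIOD for d in previous_buy_dates)


ok = is_tradeable(5e9, "Technology", date(2006, 6, 15), [date(2006, 3, 1)])
too_recent = is_tradeable(5e9, "Technology", date(2006, 6, 15), [date(2006, 6, 1)])
```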

My setup was unsophisticated. During business hours, I hit the SEC's website as fast as I could with a single program thread, looking for new filings. When I found a new insider buy, I bought a predetermined number of shares using a market order. Had I used a limit order, I would have first needed to ask my broker for the current order book to set the number of shares and the limit price. That would have taken too much time. Instead, I ran a screen each morning for tickers with sufficient liquidity, and I used the last closing price to pre-determine how many shares to buy. Using market orders wasn't ideal, however. Sometimes my order was too small, and I left money on the table. Sometimes my order was too big, and I blew through the shares offered, getting a bad price.
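The morning pre-sizing step amounts to building a lookup table before the open, so that nothing needs to be fetched when a filing arrives. A minimal sketch, assuming a fixed dollar budget per trade and an average-daily-dollar-volume liquidity screen (both hypothetical; the actual sizing rule is not described):

```python
def presize_orders(universe, dollar_budget, min_dollar_volume):
    """Map each sufficiently liquid ticker to a fixed share count for the day.

    universe maps ticker -> (last_close, avg_daily_dollar_volume).
    """
    table = {}
    for ticker, (last_close, dollar_volume) in universe.items():
        if last_close <= 0 or dollar_volume < min_dollar_volume:
            continue  # skip bad prices and illiquid names
        shares = int(dollar_budget // last_close)
        if shares > 0:
            table[ticker] = shares
    return table


# Made-up closing prices and volumes.
universe = {
    "ABC": (20.0, 5_000_000.0),
    "XYZ": (5.0, 100_000.0),
}
order_sizes = presize_orders(universe, dollar_budget=5_000, min_dollar_volume=1_000_000)
```

Because the size is fixed from the previous close, the order cannot adapt to the book at the moment of the filing, which is exactly the too-small or too-big problem described above.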

Although some stocks showed no response to the news, I was able to make money nearly every day. The strategy was very much capacity constrained, which made it suitable only for a personal account. I made so many trades that my tax forms that year were about an inch thick.

My system was fast, but not by today's standards. Once, during a move, I even successfully ran the trading system on a netbook with a Verizon Wireless AirCard. The competition must have been equally unsophisticated. There was competition though, and I couldn't understand why they persisted in running the trade when they were only left with table scraps.


One day I wasn't the fastest anymore. It happened pretty quickly: I suddenly started breaking even, or even losing money, on some days. I tried experimenting with ways to be faster, but then the SEC banned my IP address for banging on their website so much. I contacted them, and they kindly agreed to unblock me, but I had to stop polling so frequently.

There is a two-tiered system for disseminating filings. I had been using the free website, so I decided to try the paid dissemination system. At the time, it was a program called FastCopy, provided by Keane Federal Systems. It cost about 25,000 USD per year for the service, but they assured me it was faster than the website. So, I signed the contract, hoping to be able to recoup the cost and even improve my system. They told me at the time there were only a handful of subscribers: Bloomberg, Reuters, Dow Jones, and me. FastCopy only ran on Windows, and it would place all new filings in a folder. I wrote software to hook into the Windows event system to be notified of new filings as they arrived. Somehow, I still couldn't reliably make money. I scraped the SEC's website for one more day, just to compare arrival times.

The results were mixed: 54% of the filings could be downloaded from the website more quickly than FastCopy could deliver them. One filing came a full 48 seconds after I downloaded it from the SEC's website! Keane had only recently won the contract to disseminate the filings, and they didn't know too much about it. I was the first person to test their speed. I worked with their engineers for a while, showing them various charts. They had me run some additional tests and were able to make a little progress with the speed by changing some settings on their end. Eventually, though, I had to give up. As it was, I would still have had to hit the SEC's website just to know whether the FastCopy filings were stale. Keane let me cancel my contract, since they weren't fulfilling their promise of faster dissemination. The compliance officer at my place of work was happy to have me stop trading in my personal account.
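A comparison like this boils down to matching filings across the two feeds and looking at the sign and size of the lag. The sketch below shows one way to compute the two headline numbers; the arrival times are made up and are not the measurements reported above.

```python
def compare_arrivals(website, fastcopy):
    """Compare arrival times (in seconds) for filings seen on both feeds.

    Returns the share of filings the website delivered first and the
    largest number of seconds by which the paid feed trailed the website.
    """
    common = [key for key in website if key in fastcopy]
    if not common:
        raise ValueError("no filings in common")
    lags = [fastcopy[key] - website[key] for key in common]
    website_first = sum(1 for lag in lags if lag > 0)
    return website_first / len(lags), max(lags)


# Made-up arrival times, in seconds since the start of the test.
website = {"a": 0.0, "b": 10.0, "c": 20.0, "d": 30.0}
fastcopy = {"a": 2.0, "b": 9.0, "c": 68.0, "d": 29.5}
share_website_first, worst_lag = compare_arrivals(website, fastcopy)
```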

So, that was the end of the trade. It lasted only about a year, but it was my favorite trading strategy ever. If you're interested, in April this year a new paper came out about the difference in paid/free dissemination speeds. It looks like there continues to be variability in the delivery times. I often wonder what techniques the current-fastest filing trader uses. How do they do it without essentially launching a DDoS attack on the SEC?


Dembczynski, K., Kotlowski, W., Slowinski, R. (2008) Solving regression by learning an ensemble of decision rules, Artificial Intelligence and Soft Computing-ICAISC 2008, 533-544

Fidrmuc, J., Novák, J., Contreras, H. (2015) Do Insiders Trade on Mispricing After Earnings Announcements?

Harris, G., Panangadan, A., Prasanna, V. (2016) PRIMER - A Regression-Rule Learning System for Intervention Optimization, Proceedings of the 10th International Web Rule Symposium (RuleML 2016), Stony Brook, New York, forthcoming

Rogers, J., Skinner, D., Zechman, S. (2016) Run EDGAR Run: SEC Dissemination in a High-Frequency World


SEC EDGAR System Tutorial


SEC EDGAR System Sample Document