As an AI enthusiast, I headed into the weekend planning to try out the amazing new ChatGPT capabilities OpenAI announced early last week:
- OpenAI finally allows ChatGPT complete internet access (see here).
- DALL·E 3 is now available via ChatGPT Plus and Enterprise (see here).
Then Friday happened. Okta disclosed an unfortunate incident: unauthorized access to their support system (see here).
So I spent a few hours on Saturday using ChatGPT’s new features to see what it could tell me about Okta’s unauthorized access and how much I could dig up about it while limited to the ChatGPT interface ONLY.
I recorded the sessions, so this blog contains some screen recordings and screenshots, along with steps you can try yourself. To do so, though, you will need a paid ChatGPT Plus or Enterprise account, since the features above are not available in the free public account.
RAG takes LLMs to information riches
GPT-4’s training data cuts off well before this incident, so by itself it was pretty useless regarding the Okta news. But with the new feature from last Tuesday, ‘Browse with Bing,’ it was able to provide some useful data. Internet access doesn’t suddenly mean that the model is up to date with current information. It simply means that the model can retrieve specific pages it gets pointed to and use that new information as context for its answers. The technique is called Retrieval-Augmented Generation (RAG). Let’s enable the capability exposed via ‘Browse with Bing’:
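For the curious, here is roughly what that retrieve-then-augment loop looks like in code. This is a minimal sketch, not what ChatGPT actually runs: it assumes the `openai`, `requests`, and `beautifulsoup4` Python packages, an `OPENAI_API_KEY` in the environment, and a placeholder URL.

```python
# Minimal RAG sketch: fetch a page, then pass its text to the model as
# context alongside the question. Illustrative only; the URL and model
# choice are placeholders, not OpenAI's internal implementation.
import requests
from bs4 import BeautifulSoup
from openai import OpenAI

def fetch_page_text(url: str) -> str:
    """Download a page and reduce it to plain text."""
    html = requests.get(url, timeout=30).text
    return BeautifulSoup(html, "html.parser").get_text(" ", strip=True)

def answer_with_context(question: str, url: str) -> str:
    """Retrieve first, then generate: the fetched text augments the prompt."""
    context = fetch_page_text(url)[:12000]  # keep the prompt within limits
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer_with_context(
    "What indicators of compromise does this post report?",
    "https://example.com/okta-security-post",  # placeholder URL
))
```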
Getting ChatGPT to process Okta’s unauthorized access news
After enabling the above beta feature, some simple questioning was all it took. ChatGPT shows its processing steps as it loads websites, reads them, and learns from them. Here is a short video (note that I have cut out some of ChatGPT’s longer browsing and processing delays):
Getting it to show the IOCs reported by Okta
As you can see in the video above, ChatGPT browsed and processed the Okta blog post where the IOCs are reported and then showed me the IOCs. Pretty impressive! (Note: the screenshot is curtailed for length, as there are more IOCs than shown below.)
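If you want to pull indicators like these out of an advisory yourself, a simple regular expression over the pasted text gets you most of the way for IPv4 addresses. The sample text below uses documentation/placeholder IPs, not the actual IOCs Okta published.

```python
# Hypothetical helper: extract IPv4-style indicators from text copied out of
# an advisory or blog post. The sample uses RFC 5737 documentation IPs, not
# the actual IOCs Okta published.
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

pasted_text = """
Suspicious requests were observed from 198.51.100.23 and 203.0.113.77
during the review period ...
"""

iocs = sorted(set(IPV4_RE.findall(pasted_text)))
print(iocs)  # ['198.51.100.23', '203.0.113.77']
```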
Can it search for these IOCs in other sources?
Apparently not. (Query can!)
I tried to see if it could read all the CISA advisories and tell me whether it saw any of these IOCs in any other advisory. Unfortunately, that didn’t really work out.
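Outside the ChatGPT-only constraint, the brute-force version of what I was asking for is straightforward. Here is a sketch that assumes you have already downloaded the advisories as text or HTML files into a local `./advisories` folder; the path and IPs are placeholders.

```python
# Rough equivalent of what I asked ChatGPT to do: check whether any of the
# IOC IPs appear in advisories saved locally. Folder path and IPs below are
# placeholders for illustration.
from pathlib import Path

iocs = ["198.51.100.23", "203.0.113.77"]  # placeholder IOCs
advisory_dir = Path("./advisories")

for path in sorted(advisory_dir.glob("**/*")):
    if not path.is_file():
        continue
    text = path.read_text(errors="ignore")
    hits = [ip for ip in iocs if ip in text]
    if hits:
        print(f"{path.name}: {', '.join(hits)}")
```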
Getting it to enrich the IOCs
The next step for the analyst in me was to see what I could find out about the above IPs while staying constrained to the ChatGPT interface. I wasn’t getting anywhere until I found a somewhat useful plugin from ne.tools that could do WHOIS queries. Unfortunately, you can’t use a plugin in the same session you are browsing in, which seems to be a big limitation; maybe OpenAI will address it later. So I had to manually extract and copy-paste the IPs:
The plugin above gave me the geolocation information (curtailed for length):
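For completeness, the same enrichment can be reproduced outside a plugin with the standard `whois` command-line tool. A minimal sketch, assuming `whois` is installed locally and using placeholder IPs rather than the published IOCs:

```python
# Quick WHOIS enrichment outside ChatGPT by shelling out to the standard
# `whois` CLI (assumed to be installed). The IPs are placeholders.
import subprocess

def whois_lookup(ip: str) -> str:
    result = subprocess.run(["whois", ip], capture_output=True, text=True, timeout=30)
    return result.stdout

for ip in ["198.51.100.23", "203.0.113.77"]:
    record = whois_lookup(ip)
    # Keep only the fields analysts usually look at first.
    interesting = [line for line in record.splitlines()
                   if line.lower().startswith(("orgname", "org-name", "netname", "country"))]
    print(ip, interesting)
```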
Getting to a map from the above geolocation data
Analysts want to know who owns those IPs and where they are coming from, so I wanted to plot the above on a map. But let’s first convert the data into a more usable format:
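As a reference point outside the ChatGPT constraint, the same conversion is only a few lines of Python. The field names and coordinates below are illustrative assumptions, not the plugin’s exact output.

```python
# Flatten the enrichment results into a CSV for later plotting. Field names
# and values are illustrative, not the plugin's exact schema.
import csv

records = [
    {"ip": "198.51.100.23", "country": "US", "org": "ExampleNet", "lat": 37.75, "lon": -97.82},
    {"ip": "203.0.113.77", "country": "DE", "org": "ExampleHost", "lat": 51.30, "lon": 9.49},
]

with open("okta_iocs_geo.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["ip", "country", "org", "lat", "lon"])
    writer.writeheader()
    writer.writerows(records)
```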
Since my exercise was to stay constrained within the ChatGPT interface, I thought it would be good to give the newly added DALL·E 3 integration a try! Well, since you can only enable one of the beta features at a time, I again had to start a new session to switch features:
DALL·E 3 is good, but didn’t map it correctly
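Since DALL·E generates pictures rather than plots, an accurate map needs a proper plotting library. Here is a minimal sketch using the third-party `folium` package and the illustrative CSV from the earlier step:

```python
# Plot the geolocated IOCs on a real map instead of a generated image.
# Assumes the third-party `folium` package and the illustrative CSV above.
import csv
import folium

world = folium.Map(location=[20, 0], zoom_start=2)

with open("okta_iocs_geo.csv", newline="") as f:
    for row in csv.DictReader(f):
        folium.Marker(
            [float(row["lat"]), float(row["lon"])],
            popup=f'{row["ip"]} ({row["org"]}, {row["country"]})',
        ).add_to(world)

world.save("okta_iocs_map.html")  # open the HTML file in a browser
```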
Further from here
I am sure that some of you can take this much further and get better results with better prompts, so do let me know if you try the above sequence and how far you get. I will say that I got farther today than I did the last time I tried this with ChatGPT; just a few months back, it was very limited. See my prior blog on an attempted log analysis here – Can ChatGPT help query my cybersecurity events data?
Will LLMs play a role in the security investigation process?
Cybersecurity companies are scrambling to include LLMs in their toolsets. There are several risks: data leakage, data privacy, and the integrity of data modified by an LLM. This reminds me of the guiding model of the CIA triad (Confidentiality, Integrity, and Availability), the first two of which would be very hard to preserve if LLMs sit between data and humans.
For now, LLMs are certainly good to experiment with while looking at public data, but be extremely careful about trusting any output. Any use outside of a testing lab environment would be dangerous.
Maybe chatting with an LLM will become a standard process in the near future. AI chat is something we at Query built into our V1 platform, but we found that analysts preferred a more reliable and visually interactive federated search instead. For the above example, you can just paste the IOCs into Query’s federated search interface, and it searches all of your internal and external data sources and then shows normalized, correlated results visually in a graph, with the comfort that your data and model are private to you.