(Not) obvious OSINT on Twitter

Twitter is a fairly old microblogging service that is still popular with a wide audience and actively used by both ordinary users and public figures. The best examples are the official Twitter accounts of politicians, writers, musicians, and actors. Of course, such accounts are often maintained by “specially trained people,” but when it comes to OSINT in, say, the manufacturing sector, monitoring the accounts of ordinary engineers or managers can produce excellent results.

It is also important that working effectively with an array of Twitter data does not require processing every tweet manually, nor rushing to the other extreme of connecting to the API and writing your own software. Although the standard interface of the service offers almost nothing in the way of search or filtering tools, Twitter supports a huge variety of search queries and rules that are barely mentioned anywhere in the standard web interface or apps. These queries are the place to start.

Twitter Dorks

On Twitter, you can search for information along four key dimensions: by content, by full tweet information, by media type, and by user.

“Content” means the text of the tweets themselves or their keywords. This category includes the following queries:

Search query | Search results
watching now | Contains both “watching” and “now”
"happy hour" | Contains the exact phrase “happy hour”
love OR hate | Contains “love” or “hate” (or both)
beer -root | Contains “beer” but does not contain “root”
#haiku | Contains the hashtag #haiku
$twtr | Contains the cashtag $TWTR, used to track stock prices
politics filter:safe | Contains “politics”, excluding tweets marked as potentially sensitive content
puppy url:twitter | Contains the word “puppy” and a URL with “twitter” anywhere in it
traffic ? | Contains the word “traffic” and a question mark
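
All of these operators can be typed straight into the search box, but for repeatable work it is convenient to open them as URLs. Below is a minimal sketch of that idea (my own illustration, not an official Twitter tool); the “f=live” parameter, which switches the results to the “Latest” tab, is an assumption about the current web interface.

    # Build a twitter.com search URL from any of the operators above (sketch)
    from urllib.parse import quote

    def build_search_url(query: str) -> str:
        # URL-encode the query so quotes, hashes and colons survive the trip
        return "https://twitter.com/search?q=" + quote(query) + "&f=live"

    # Example: the exact phrase "happy hour", excluding tweets that mention "root"
    print(build_search_url('"happy hour" -root'))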

The “information” category includes everything from the “content” category plus data such as the tweet date, geolocation, and the source application, and even such complex things as filtering by the number of retweets, replies, and likes.

Search query | Search results
near:london within:15mi | Geolocated near a point, with a radius in mi (miles) or km (kilometers)
superhero since:2015-12-21 | Contains “superhero” and was sent after the specified date “2015-12-21” (year-month-day format)
#brexit until:2016-06-23 | Contains “#brexit” and was sent before the specified date “2016-06-23” (year-month-day format)
puppy source:tweetdeck | Contains “puppy” and was sent using “tweetdeck” (other source values: “twitter_for_iphone”, “twitter_for_android”, “twitter_web_client”)
puppy min_retweets:5 | Contains “puppy” and has at least 5 retweets
puppy min_faves:10 | Contains “puppy” and has at least 10 likes
puppy min_replies:100 | Contains “puppy” and has at least 100 replies
puppy filter:nativeretweets | Shows retweets that contain the word “puppy”
puppy filter:retweets | Old-style retweets (manual “RT” quotes) that contain the word “puppy”
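
Since “since:” and “until:” expect the year-month-day format shown above, these dates are easy to generate programmatically, for example for a rolling “last seven days” query. A small sketch using only the Python standard library:

    # Generate a query for the last seven days in the year-month-day format
    from datetime import date, timedelta

    today = date.today()
    week_ago = today - timedelta(days=7)

    # "superhero" tweets from the past week with at least 10 likes
    query = f"superhero since:{week_ago:%Y-%m-%d} until:{today:%Y-%m-%d} min_faves:10"
    print(query)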

In the “media” category, you can filter tweets by the media they contain, for example, search only for tweets with a video or a photo, or simply find all posts that contain an external URL.

Search query | Search results
kitten filter:media | Contains “kitten” and any type of media (image or video)
kitten filter:native_video | Contains “kitten” and a video uploaded to Twitter, or a video from Periscope
kitten filter:periscope | Contains “kitten” and a link to a Periscope video
kitten filter:images | Contains “kitten” and any links identified as images
kitten filter:twimg | Contains “kitten” and one or more images uploaded to pic.twitter.com
kitten filter:links | Contains “kitten” and any URL

Filtering by “user” is more or less self-explanatory: this set of search operators filters records by account. At the same time, we can watch not just individual tweets but, for example, only the target’s replies to specific users.

Search query | Search results
from:jack | Tweets sent by the user @jack
list:NASA/space-tweets | Tweets from accounts that are members of NASA’s “space-tweets” list
to:NASA | Replies from other users to the @NASA account
@NASA | Tweets that mention @NASA
breaking filter:verified | All tweets with the word “breaking” from verified accounts

It is important to note that any of the “filter:type” operators above can be turned into an exclusion by prefixing it with the “-” symbol. For example, to get all media from NASA except images, you would enter a query like “from:NASA filter:media -filter:images”.
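
If you combine several filters often, it is convenient to assemble such queries from lists of inclusions and exclusions. A hedged sketch of that idea, with a helper name of my own invention:

    # Assemble a combined query like the NASA example above (illustrative helper)
    def combine_filters(base: str, include=(), exclude=()) -> str:
        parts = [base]
        parts += [f"filter:{name}" for name in include]   # inclusive filters
        parts += [f"-filter:{name}" for name in exclude]  # exclusions via "-"
        return " ".join(parts)

    # All media from @NASA except images
    print(combine_filters("from:NASA", include=["media"], exclude=["images"]))
    # -> from:NASA filter:media -filter:images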

This whole set of search commands feels like New Year’s Day in the middle of summer, because it spares us the need to parse with third-party tools, connect to the API, and deal with the other technical tricks and difficulties inherent in searching public data.

In fact, Twitter itself lets us pull out any information we are interested in, across a fairly wide range of parameters that can be combined with one another. Surprisingly, the search queries described above are hardly documented anywhere. In other words, Twitter lets you parse and search through its users’ posts in fine detail, and hardly anyone seems concerned about it. And it is wonderful that we have this capability, because specialists very often suffer from the “overqualification” effect and complicate their lives where they could apply Occam’s razor and find the most obvious, and at the same time the most correct, solution to the task.

Tweetdeck

A well-known application for filtering content by given parameters and categories is Tweetdeck. While the average user sees Tweetdeck as just another incarnation of an RSS feed, for targeted filtering of public data it becomes a very powerful tool.

One of the most obvious scenarios for using Tweetdeck in an OSINT context is reducing the noise level in a working sample of tweets. Out of the box, the service can maintain a blacklist of keywords, and tweets containing them simply will not appear in your stream. If you do OSINT and collect information from public sources, such a feature has a very obvious application.

Let’s take a hypothetical case: Stephen King’s Twitter account. Anyone who has come across the writer’s account knows that 30% of it is love for his dog, 60% is hatred of President Trump, and the remaining 10% is everything else. If we need exactly that last 10% of posts, Tweetdeck will let us filter out most of the garbage we are not interested in and work with a more or less relevant selection. Let’s look at a fragment of King’s feed:

In Tweetdeck we will filter out entries with the keywords “Trump”, “president”, “white house”, and “Obama” to cut most of the political noise out of our sample.

This is trivially easy to do. After muting tweets with those keywords, the feed of the account we are watching looks completely different:

At the same time, Tweetdeck supports all the search queries discussed at the beginning of the article, which makes it a convenient tool for finding information.

By now many of you are probably thinking that, with such a pool of supported search queries, all that remains is to write software that will pull tweets for you and collect them into reports, for example into text documents that you can lazily flip through while sipping a fresh juice. And here again the principle of “the simplest solution is the most correct one” applies. Before you sit down to write software, open Google, and it will tell you: there is no need to write anything, such software already exists and is called Twitter Archiver.

Twitter Archiver

Twitter (Tweet) Archiver is an add-on for Google Sheets that collects data from Twitter based on our search queries and uploads it all to a spreadsheet in Google Docs. The application installs as a Google Sheets add-on and lets you export the search results directly into the document.

When you create your first search rule, Twitter Archiver will ask you to sign in with an existing Twitter account that it will work through. That’s all. We search, parse, export to a spreadsheet, and analyze.

For example, let’s run a search with the widest possible parameters to show what the export will look like:


The output is a table with the following columns: Date, Screen Name, Full Name, Tweet Text, Tweet ID, Link(s), Media, Location, Retweets, Favorites, App, Followers, Follows, Listed, Verified, User Since, Location, Bio, Website, Timezone, Profile Image. Unfortunately, not all of this fit into the screenshot above, since it spans two screens.
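
Once the sheet has filled up, it can also be downloaded as a CSV and processed further. A small sketch, assuming the column names listed above and a file exported by hand as tweets.csv (the filename is mine):

    # Read the exported spreadsheet and keep only well-retweeted tweets (sketch)
    import csv

    with open("tweets.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Column names follow the Twitter Archiver export described above
            if int(row["Retweets"] or 0) >= 100:
                print(row["Date"], row["Screen Name"], row["Tweet Text"])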

The tool has only one weakness: the free version of Twitter Archiver lets you parse by just one rule and only once per hour. The paid version works without restrictions on the number of search rules and refreshes the export every 15 minutes. It is not cheap, though: from $39/year per user to $399/year for an unlimited number of accounts.

Summary

Information can be collected in a semi-automatic mode without any programming skills; Twitter’s own search queries and the Tweetdeck application are enough. The Google Sheets add-on simplifies systematization, but its full version costs money.

There is also a more complicated route, which does not really fit this article’s theme of simple solutions but deserves a mention: obtaining Twitter API access through an official request and then building your own tools. A private individual, however, does not have much chance here: people on specialized forums report that API access is far from granted to everyone, and the rejection rate is very high.
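
For completeness, here is roughly what that route might look like if access is granted. This is only a sketch under assumptions: it uses the third-party tweepy library (v4) against the API v2 recent-search endpoint, whose operators (such as “-is:retweet”) differ slightly from the web search operators described above, and the bearer token is a placeholder.

    # Sketch: querying recent tweets through the API with tweepy, assuming access was granted
    import tweepy

    client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")  # hypothetical placeholder

    # Recent original tweets from @NASA; API operators differ slightly from web search
    response = client.search_recent_tweets(query="from:NASA -is:retweet", max_results=10)
    for tweet in response.data or []:
        print(tweet.id, tweet.text)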

But what if you want to protect your data rather than collect it? How do you counter this kind of data gathering on Twitter? In truth, you hardly can. This is a public microblogging network that offers only two tools for protecting information: account blacklists and restricting access.

The first, obviously, will not stop bots and parsing. The second is more effective, but then tweets will be visible only to the account’s approved followers. If we are talking about some kind of public account, the second option does not apply; besides, an observer can always sneak into the list of readers, since the account owner approves new followers manually.

So the only guaranteed way to reduce the likelihood of OSINT leaks via Twitter to an acceptable level is to not use Twitter.
