When Twitter Likes become your Bookmarks
I confess, I’ve been (mis)using Twitter’s famous Like button since joining this great platform. Thus, if you look at my Twitter Likes you’ll mainly find tweets about distributed systems, Go and Kubernetes.
Yes, I know it’s wrong! But looks like I am not the only one going down this dark path :) Unfortunately, if you have hundreds or thousands of bookmarks, ehm Likes, finding the one you desperately need is almost impossible due to the way the twitter.com UI is designed.
I am not using any native Twitter client on OSX. In case your favorite client (Tweetbot?) already provides that functionality for you, including regular expression support and querying advanced Twitter API fields, nice! You might want to stop reading here. But in case you’re interested in a way to do that programmatically on the command line (CLI) of your operating system of choice with full flexibility and JSON support, read on. You might also learn some neat JSON filters along the way :)
Typically, I need to perform these actions when searching through my Twitter “bookmarks”:
- Filter tweets for a specific string, e.g.
"Kubernetes"
- Look for tweets containing “A” and “B”; order and capitals should not matter, e.g.
"...best practices...Golang"
or"Go...Best Practices"
- Filter tweets containing only Youtube videos of a specific topic or from a specific user; this requires access to some extended Twitter API fields because links in tweets are typically shortened to
https://t.co/...
This and much more can be done with the powerful combination of two CLI tools: tw
and jq
.
tw is a tool I created to easily access the Twitter API from your CLI. At the time of writing this post, only Likes are supported. jq is a lightweight and flexible command-line JSON processor.
Of course, you can also use grep
or similar tools to just filter for a specific string. tw
has a --pretty
option which generates a compact summary of your Twitter Likes (Author, Text, Link) which makes filtering with grep
easy. For advanced queries, tw
’s JSON output from the Twitter API and a processor like jq
is the way to go. Scott S. Lowe provides some alternatives to jq
in this post.
Getting the tools
tw
is a statically compiled Go binary which you can easily install on Windows, Linux and OSX. You can either grab a release, use Homebrew to install it on OSX, use Docker or build from source (see README in the Github repository).
With Homebrew: brew install embano1/tw/tw
With Docker: docker pull embano1/tw
The Docker image contains jq
. If you don’t want to use the Docker image, please install jq
as well to follow along with the examples. You can grab a release here or, e.g. for OSX, install via Homebrew: brew install jq
Accessing the Twitter API
In order to use Twitter’s API one has to authenticate. This is to protect the Twitter service.
As such, tw
needs credentials before it can make requests against the Twitter API. A step-by-step guide is provided in the README section. Please make sure to set this up before proceeding.
(Waiting…)
Ok, welcome back! Let’s show the power of both tools in action.
Examples
We’ll use the Docker image for our examples. The examples assume that you stored your Twitter API credentials in the file ~/auth.json
.
Let’s test if everything is working.
|
|
Thumbs up!
If tw
complains about missing/incorrect credentials, make sure you correctly followed the section “Accessing the Twitter API” in this post. Depending on the number of Likes it might take a while to query the API.
Note: The Twitter API uses rate limiting to protect against (D)DoS attacks. To prevent throttling or timeouts, I recommend dumping your Tweets into a file first and use that for filtering. It’s also much faster, especially when querying thousands of Likes.
Let’s move on with some concrete examples I use every day. We continue to use our Docker container session which started for our test above.
#1 Only print Tweets containing “String” (ignore case)
First, dump all Tweets into a file.
|
|
Since we want to do a case-insensitive search, we cannot use the contains
function in jq
. Fortunately, jq
supports regular expressions, regex for short, with the match
function.
|
|
Breaking down this command:
.[]
# query all array members in the JSON file|
# pipe the left object(s) to the next command, in our case the whole arrayselect(.full_text|match
# use object fieldfull_text
and pipe it to the regex filter (match
)match("handy";"i")
# filter for the string"handy"
(would also match the word “unhandy” for example), ignore case with;"i"
|.full_text
# pipe the result to next command; if there is a match, only output thefull_text
field
If you want to search for the exact word, e.g. excluding “unhandy” from results:
|
|
#2 Only find Tweets with these “two” “Words” (ignore case and order)
Sometimes you want to search for two (or more) words, but don’t know the order. For example, I typically search for “best practices” in combination with a technology, e.g. “Docker” or “Kubernetes”. Regular expressions with lookahead conditions are what we need here.
Let’s filter for “Go” respectively “Golang” related tweets also mentioning “production”, “practice” or “idiomatic”. Again, this regex would also match “practices” since we don’t ask for exact word matching here (not using \b
word boundary checker in regex).
|
|
Breaking down this command:
.[]| select( .full_text |
# we want to apply our filter tofull_text
entries from our tweetsmatch()
# we want regex"(?=.*(\\bGo\\b|golang))(?=.*(production|practic|idiomatic))"
# lookahead syntax for regex; simply speaking order of our two filters (somethingGo OR Golang
andproduction OR practice OR idiomatic
) does not matter- Note that the
|
in regex queries means logic OR and not pipe to next filter ;"i"))|.text
# case-insensive regex; if there is a match, only print tweettext
#3 Only print “Field(s)” we’re interested in
Sometimes all you want is a simple Tweet representation, e.g. by User (incl. full name), Tweet and Link(s), if any. If tw
’s --pretty
option is not enough, using jq
arrays and maps is the right approach. It also keeps the JSON format, so you can do further filtering on the array/map.
|
|
Breaking down this command:
- Outermost
[...]
# create one resulting array for the output of the map/filter specified .[]|{user: .user.screen_name,...
# for each input object create a new map mapping the specified input fields to custom ones (left side of:
)url: [.entities.urls[].expanded_url]}
#.urls[]
could be empty, one or multiple, thus creating an array (with[]
) for the mapped entryurl:
#4 Filter for Tweets with Links to Youtube videos
There’s always an amazing Go or distributed systems talk which I bookmark to watch later, e.g. on these long flights to the US. Filtering for Youtube tweets is not that easy without JSON because Twitter shortens links (t.co...
).
Therefore we need to access an extended JSON field from our tweets (.entities.urls[].expanded_url
).
|
|
We filter whether .entities.urls[].expanded_url
contains the string youtube
(case-sensitive, thus contains
function works here) and pipe that into a map only containing the tweet text and url(s). The outermost brackets will create an array for us. So we could for example only print the first three items with a final filter: |.[:3]
before closing the jq
filter/query with '
.
I hope you enjoyed this post. Yes, jq
filters can be complex and hard to use, especially when not using them on a daily basis. For that reason, I maintain a Gist of filters I use regularly for tw
here.