Bluesky Social is a decentralized social network built on the AT
Protocol. For users, this means it is possible to programmatically
access a broad range of public data from the network. Bluesky organizes
information as a series of records, each defined by a
structured schema known as a lexicon. These lexicons
(e.g., app.bsky.*) specify the types of data (such as
posts, profiles, or social relationships) and the methods for querying
them. The bskyr package provides an R interface to these
endpoints, abstracting the underlying API and returning data in a tidy
format suitable for analysis.
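Each lexicon is identified by a Namespaced Identifier (NSID) such as app.bsky.feed.getAuthorFeed. The leading segments form a reversed domain authority (app.bsky.feed), and the final segment names the specific record type or method (getAuthorFeed). As an illustration only, the base-R helper below splits NSIDs into those two parts. split_nsid() is a hypothetical name, not a bskyr function.

```r
# Hypothetical helper (not part of bskyr): split NSIDs such as
# "app.bsky.feed.getAuthorFeed" into the domain authority (all segments
# but the last) and the name (the last segment).
split_nsid <- function(nsid) {
  parts <- strsplit(nsid, ".", fixed = TRUE)
  data.frame(
    nsid = nsid,
    authority = vapply(parts, function(p) paste(head(p, -1), collapse = "."),
                       character(1)),
    name = vapply(parts, function(p) tail(p, 1), character(1)),
    stringsAsFactors = FALSE
  )
}

split_nsid(c("app.bsky.actor.getProfile", "app.bsky.feed.getAuthorFeed"))
```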
This vignette provides an overview of how to collect public data from
Bluesky using the bskyr package. It focuses on common data
collection tasks such as retrieving user profiles, gathering posts,
exploring threads, accessing social relationships, and examining likes
or reposts.
Begin by loading the bskyr package:
library(bskyr)
Before gathering data, you need to authenticate with Bluesky. (This
requires having a Bluesky account and creating an App Password in your
account settings.) In this vignette, we assume you have already set your
Bluesky username and App Password via set_bluesky_user()
and set_bluesky_pass() (or by storing them in your
.Renviron). If so, you can create an authenticated session with:
auth <- bs_auth(user = get_bluesky_user(), pass = get_bluesky_pass())
Most functions use this authentication internally. Creating a session once at the beginning of your script is recommended to avoid repeated logins; you do not need to repeat it for every function call, since the authentication is cached for the R session and refreshed when it becomes stale.
Bluesky’s API distinguishes between data you can access about your
own account versus public data about others. A few endpoints only allow
fetching your personal data (for managing your account) and not other
users’. For example, you can retrieve your preferences with
bs_get_preferences(), but you cannot retrieve another
user’s preferences. Similarly, blocks, mutes, and notifications can only
be retrieved for your own account.
For instance, to fetch your current preference settings (such as content filtering preferences):
bs_get_preferences()
#> # A tibble: 3 × 2
#> `$type` details
#> <chr> <list>
#> 1 app.bsky.actor.defs#savedFeedsPrefV2 <tibble [1 × 1]>
#> 2 app.bsky.actor.defs#savedFeedsPref <tibble [1 × 4]>
#> 3 app.bsky.actor.defs#bskyAppStatePref <tibble [1 × 1]>
Other self-focused functions include retrieving your blocked accounts
(bs_get_blocks()), your muted lists
(bs_get_muted_lists()), and your notifications
(bs_get_notifications()). These functions correspond to
lexicons in the app.bsky.notification.* or
app.bsky.graph.* namespaces, and they only return data for
the authenticated user.
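The `$type` column in the preferences output is a lexicon reference. The part before `#` is the lexicon NSID (app.bsky.actor.defs), and the part after names a definition inside that lexicon. The base-R sketch below separates the two, using the three values printed above. It is an illustration, not something bskyr requires you to do.

```r
# `$type` values copied from the bs_get_preferences() output above
types <- c(
  "app.bsky.actor.defs#savedFeedsPrefV2",
  "app.bsky.actor.defs#savedFeedsPref",
  "app.bsky.actor.defs#bskyAppStatePref"
)

# Everything before "#" is the lexicon NSID; everything after is the
# name of a definition inside that lexicon document.
lexicon <- sub("#.*$", "", types)
definition <- sub("^[^#]*#", "", types)

data.frame(lexicon, definition)
```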
In the remainder of this vignette, we’ll focus on gathering public data about other users and content on Bluesky. All such data is accessible via the Bluesky API (with your authentication), provided the information is public.
Most bskyr functions are designed to gather data about
other users or content they’ve created. We will demonstrate these using
a sample account: chriskenny.bsky.social. (This is the
Bluesky handle of the bskyr package author, Christopher T.
Kenny.) Using this account as an example, we’ll show how to retrieve
various types of information.
Each subsection below focuses on one of these data types, with example code and notes on the relevant lexicons.
A basic starting point is retrieving a user’s profile. The
bs_get_profile() function queries the
app.bsky.actor.getProfile lexicon to fetch profile details
for one or more users. For example, to get the profile of
chriskenny.bsky.social:
profile <- bs_get_profile('chriskenny.bsky.social')
profile
#> # A tibble: 1 × 13
#> did handle display_name avatar associated viewer created_at description
#> <chr> <chr> <chr> <chr> <list> <list> <chr> <chr>
#> 1 did:plc… chris… Chris Kenny… https… <tibble> <tibble> 2023-09-1… "Postdoc, …
#> # ℹ 5 more variables: indexed_at <chr>, banner <chr>, followers_count <int>,
#> #   follows_count <int>, posts_count <int>
This returns metadata such as the user’s handle, display name, description, and follower, follow, and post counts. To retrieve several profiles at once, pass a vector of handles:
bs_get_profile(actors = c('chriskenny.bsky.social', 'simko.bsky.social'))
#> # A tibble: 2 × 14
#> did handle display_name avatar associated viewer created_at description
#> <chr> <chr> <chr> <chr> <list> <list> <chr> <chr>
#> 1 did:plc… chris… Chris Kenny… https… <tibble> <tibble> 2023-09-1… "Postdoc, …
#> 2 did:plc… simko… Tyler Simko https… <tibble> <tibble> 2023-09-3… "State & l…
#> # ℹ 6 more variables: indexed_at <chr>, banner <chr>, followers_count <int>,
#> #   follows_count <int>, posts_count <int>, labels <list>
To access posts authored by a user, use
bs_get_author_feed(), which queries the
app.bsky.feed.getAuthorFeed lexicon:
feed <- bs_get_author_feed('chriskenny.bsky.social')
feed |>
dplyr::select(uri, like_count, reply_count)
#> # A tibble: 49 × 3
#> uri like_count reply_count
#> <chr> <int> <int>
#> 1 at://did:plc:wpe35pganb6d4pg4ekmfy6u5/app.bsky.feed.p… 4 0
#> 2 at://did:plc:wpe35pganb6d4pg4ekmfy6u5/app.bsky.feed.p… 1 0
#> 3 at://did:plc:wpe35pganb6d4pg4ekmfy6u5/app.bsky.feed.p… 2 0
#> 4 at://did:plc:wpe35pganb6d4pg4ekmfy6u5/app.bsky.feed.p… 0 0
#> 5 at://did:plc:wpe35pganb6d4pg4ekmfy6u5/app.bsky.feed.p… 13 2
#> 6 at://did:plc:wpe35pganb6d4pg4ekmfy6u5/app.bsky.feed.p… 3 0
#> 7 at://did:plc:wpe35pganb6d4pg4ekmfy6u5/app.bsky.feed.p… 1 0
#> 8 at://did:plc:wpe35pganb6d4pg4ekmfy6u5/app.bsky.feed.p… 3 1
#> 9 at://did:plc:wpe35pganb6d4pg4ekmfy6u5/app.bsky.feed.p… 0 0
#> 10 at://did:plc:wpe35pganb6d4pg4ekmfy6u5/app.bsky.feed.p… 2 1
#> # ℹ 39 more rows
Each row in the feed tibble represents a post made by the user. Key
columns include the post content, the number of
likes and replies, and other metadata such as the repost count,
timestamps (created_at), and unique post identifiers
(uri and cid). By default,
bs_get_author_feed() returns the most recent batch of posts
(the Bluesky API typically returns up to 50 posts per request).
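Because the feed comes back as a tibble, engagement summaries are ordinary vector operations. As a small sketch, the values below are copied by hand from the ten rows printed above; in practice you would use feed$like_count and feed$reply_count directly rather than retyping them.

```r
# like_count and reply_count for the first ten posts printed above
likes   <- c(4, 1, 2, 0, 13, 3, 1, 3, 0, 2)
replies <- c(0, 0, 0, 0, 2, 0, 0, 1, 0, 1)

total_likes   <- sum(likes)         # 29
mean_likes    <- mean(likes)        # 2.9
share_replied <- mean(replies > 0)  # 0.3: 3 of the 10 posts drew a reply
top_post      <- which.max(likes)   # 5: the fifth row has 13 likes
```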
If the user has more posts, the cursor returned with each batch can be
passed back to request the next page.
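The cursor pattern itself does not depend on any particular endpoint. The sketch below illustrates it without network access: fetch_page() is a hypothetical stand-in built by the hypothetical make_fake_fetcher() from in-memory pages, and the only assumption is that each response carries a cursor for the next page, or none once the results run out. For the pagination arguments bskyr itself exposes, see ?bs_get_author_feed.

```r
# Hypothetical stand-in for a paginated endpoint: serves pre-built pages and
# returns a cursor for the next page, or NULL on the last page.
make_fake_fetcher <- function(pages) {
  function(cursor = NULL) {
    i <- if (is.null(cursor)) 1L else as.integer(cursor)
    next_cursor <- if (i < length(pages)) as.character(i + 1L) else NULL
    list(items = pages[[i]], cursor = next_cursor)
  }
}

# Generic cursor loop: keep requesting pages until no cursor comes back
# or at least `max_items` items have been collected.
collect_all <- function(fetch_page, max_items = Inf) {
  pages <- list()
  cursor <- NULL
  repeat {
    page <- fetch_page(cursor)
    pages[[length(pages) + 1L]] <- page$items
    cursor <- page$cursor
    if (is.null(cursor) || length(unlist(pages)) >= max_items) break
  }
  items <- unlist(pages)
  if (length(items) > max_items) items <- items[seq_len(max_items)]
  items
}

fetch_page <- make_fake_fetcher(list(1:50, 51:100, 101:120))
all_items <- collect_all(fetch_page)                  # all three pages
first_60  <- collect_all(fetch_page, max_items = 60)  # stops after page two
```

Capping with max_items mirrors a common design choice: stop requesting once enough data has arrived instead of walking an entire, possibly very long, history.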
Social interactions on Bluesky often happen in threads: a post and
its replies (and replies to those replies, and so on). The
bskyr package provides tools to retrieve entire
conversation threads so you can analyze the context of a post.
To get a full thread for a particular post, use bs_get_post_thread(). This function calls the app.bsky.feed.getPostThread lexicon, which returns the specified post, its ancestors (if it was itself a reply), and its replies (including replies to those replies, up to a specified depth). In other words, it fetches the conversation tree around the post.
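Because a thread is a tree of replies, recursive functions are a natural way to summarize one. The sketch below does not call the API. It uses a small hand-built nested list, and its node()/replies layout is an assumption that only loosely mirrors the nested replies in a getPostThread response. It counts all replies below a post and measures the depth of the longest reply chain.

```r
# Hypothetical, hand-built thread: each node holds a post label and a list
# of replies, loosely mirroring the nesting of a getPostThread response.
node <- function(post, replies = list()) list(post = post, replies = replies)

thread <- node("root", list(
  node("reply A", list(
    node("reply A.1"),
    node("reply A.2", list(node("reply A.2.i")))
  )),
  node("reply B")
))

# Total number of replies below a node (the node itself is not counted).
count_replies <- function(x) {
  sum(vapply(x$replies, function(r) 1 + count_replies(r), numeric(1)))
}

# Length of the deepest reply chain (0 for a post with no replies).
max_depth <- function(x) {
  if (length(x$replies) == 0) return(0)
  1 + max(vapply(x$replies, max_depth, numeric(1)))
}

count_replies(thread)  # 5
max_depth(thread)      # 3
```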
Suppose we have a specific post that we want to examine in context. If we know the post’s AT URI or its bsky.app URL, we can retrieve the thread. For example, using an AT URI:
thread <- bs_get_post_thread('at://did:plc:ic6zqvuw5ulmfpjiwnhsr2ns/app.bsky.feed.post/3k7qmjev5lr2s')
thread
#> # A tibble: 1 × 22
#> `$type` uri cid author_did author_handle author_display_name author_avatar
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 app.bs… at:/… bafy… did:plc:i… soubhikbarar… Soubhik Barari https://cdn.…
#> # ℹ 15 more variables: author_associated <list>, author_viewer_muted <lgl>,
#> # author_viewer_blocked_by <lgl>, author_labels <lgl>,
#> # author_created_at <chr>, record <list>, embed <list>, bookmark_count <int>,
#> # reply_count <int>, repost_count <int>, like_count <int>, quote_count <int>,
#> #   indexed_at <chr>, viewer <list>, labels <lgl>
This returns the conversation context around a specific post. If you only
need the post itself, bs_get_posts() retrieves it directly from its URI
or bsky.app URL:
post <- bs_get_posts('https://bsky.app/profile/chriskenny.bsky.social/post/3loagm2phgk2t')
post$record[[1]] |>
dplyr::select(`$type`, text, created_at)
#> # A tibble: 1 × 3
#> `$type` text created_at
#> <chr> <chr> <chr>
#> 1 app.bsky.feed.post #rstats `bskyr` 0.3.0 is now on CRAN. Changelog… 2025-05-0…
Finally, let’s look at how to gather data on post
interactions like likes (and similarly, reposts). Bluesky
treats likes and reposts as separate record types, and the API provides
endpoints to query them.
Posts liked by a user: to fetch the posts an account has liked (the
content of the “Likes” tab on its profile), use bs_get_likes(). This
function calls the app.bsky.feed.getActorLikes lexicon and returns a
tibble of posts. It is essentially the reverse of
bs_get_author_feed(): instead of posts the user authored,
it gives posts the user liked. Note that this endpoint currently only
returns likes for the authenticated account. For example:
liked_posts <- bs_get_likes('bskyr.bsky.social')
liked_posts |>
dplyr::select(author_handle, record_text)
#> # A tibble: 3 × 2
#> author_handle record_text
#> <chr> <chr>
#> 1 bskyr.bsky.social "[vignette] Posting via bs_create_record()"
#> 2 chriskenny.bsky.social "Big quality-of-life update for the #rstats package `b…
#> 3 bskyr.bsky.social "Test quoting from r package `bskyr` via @bskyr.bsky.s…
Each row here is a post that was liked by
bskyr.bsky.social, showing the author of that post
(author_handle) and the post content (record_text). The full data frame
also includes details like the post’s URI, timestamps, and
engagement counts. This is useful for further analysis of liked
content, for instance, to see which topics or accounts a user
engages with.
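For example, counting liked posts by author is a one-line tabulation in base R. The handles below are copied from the three rows printed above; with real data you would tabulate liked_posts$author_handle directly.

```r
# author_handle values from the three liked posts printed above
author_handle <- c("bskyr.bsky.social", "chriskenny.bsky.social", "bskyr.bsky.social")

# Count likes per author, most-liked authors first
likes_by_author <- sort(table(author_handle), decreasing = TRUE)
likes_by_author
```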
Who liked a specific post: Conversely, if you want to see
which users have liked a particular post, you can use
bs_get_post_likes(). This corresponds to the
app.bsky.feed.getLikes lexicon and returns a tibble of
actors. You need to specify the post by its URI or a bsky.app URL. For
example, if we want to find out who liked one of my posts (identified by
a known URI):
bs_get_post_likes('at://did:plc:wpe35pganb6d4pg4ekmfy6u5/app.bsky.feed.post/3lnghukd7vk22')
#> # A tibble: 7 × 19
#> actor_did actor_handle actor_display_name actor_avatar actor_associated_act…¹
#> <chr> <chr> <chr> <chr> <chr>
#> 1 did:plc:z… smachlis.bs… Sharon Machlis https://cdn… followers
#> 2 did:plc:6… transport-t… Dr. U https://cdn… followers
#> 3 did:plc:e… radovanmile… Radovan Miletić https://cdn… followers
#> 4 did:plc:k… benfigura.b… Benjamin J. Figur… https://cdn… followers
#> 5 did:plc:r… fitipaldi.c… agustin https://cdn… followers
#> 6 did:plc:b… fbrady.bsky… fbrady https://cdn… followers
#> 7 did:plc:e… bbolker.bsk… Ben Bolker https://cdn… followers
#> # ℹ abbreviated name:
#> # ¹actor_associated_activity_subscription.allowsubscriptions
#> # ℹ 14 more variables: actor_viewer_muted <chr>, actor_viewer_blocked_by <chr>,
#> # actor_created_at <chr>, actor_description <chr>, actor_indexed_at <chr>,
#> # created_at <chr>, indexed_at <chr>,
#> # actor_associated_chat.allow_incoming <chr>, actor_labels_src <chr>,
#> #   actor_labels_uri <chr>, actor_labels_cid <chr>, actor_labels_val <chr>, …
This returns a tibble of the accounts (handles, display names, etc.) that
have liked that post, with profile details in the actor_* columns and the
time of each like in created_at. By examining this, you can see the
audience engaging with a particular piece of content.
Reposts: Bluesky also has a concept of “reposts” (analogous to retweets),
where a user rebroadcasts someone else’s post. To get the users who
reposted a given post, use bs_get_reposts(), which calls the
app.bsky.feed.getRepostedBy endpoint. Its usage mirrors
bs_get_post_likes(): you provide a post URI or URL,
and it returns a tibble of users who reposted that post. For example:
bs_get_reposts('at://did:plc:wpe35pganb6d4pg4ekmfy6u5/app.bsky.feed.post/3lnghukd7vk22')
#> # A tibble: 1 × 12
#> did handle display_name avatar associated_activity_…¹ viewer_muted
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 did:plc:aqr5h6… econm… econmaett https… followers FALSE
#> # ℹ abbreviated name: ¹associated_activity_subscription_allowsubscriptions
#> # ℹ 6 more variables: viewer_blocked_by <chr>, viewer_followed_by <chr>,
#> #   created_at <chr>, description <chr>, indexed_at <chr>, uri <chr>
This returns the accounts that reposted the specified post.
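These functions accept a post either as an AT URI or as a bsky.app URL, and the two carry the same pieces of information. An AT URI has an authority (a DID or a handle), a collection NSID, and a record key, while a bsky.app post URL carries the handle and the record key. The base-R helpers below are hypothetical (parse_at_uri() and bsky_url_to_at_uri() are not bskyr functions) and only illustrate that correspondence, since bskyr already accepts either form.

```r
# Hypothetical helper: split an AT URI of the form at://<authority>/<collection>/<rkey>
parse_at_uri <- function(uri) {
  parts <- strsplit(sub("^at://", "", uri), "/", fixed = TRUE)[[1]]
  list(authority = parts[1], collection = parts[2], rkey = parts[3])
}

# Hypothetical helper: rewrite a bsky.app post URL as an AT URI. The handle is
# kept as the authority; resolving it to a DID would be a separate lookup.
bsky_url_to_at_uri <- function(url) {
  pattern <- "^https://bsky\\.app/profile/([^/]+)/post/([^/?#]+)"
  m <- regmatches(url, regexec(pattern, url))[[1]]
  if (length(m) == 0) stop("not a bsky.app post URL: ", url)
  sprintf("at://%s/app.bsky.feed.post/%s", m[2], m[3])
}

parse_at_uri("at://did:plc:wpe35pganb6d4pg4ekmfy6u5/app.bsky.feed.post/3lnghukd7vk22")
bsky_url_to_at_uri("https://bsky.app/profile/chriskenny.bsky.social/post/3loagm2phgk2t")
```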
By combining these functions, you can gather a rich set of data from Bluesky Social. For instance, you might collect a user’s profile and posts, then fetch all replies to those posts, the list of users who follow them, and the posts they’ve liked. This opens up many possibilities for analysis: social network analysis (using follows data), content analysis (using posts and threads), and engagement analysis (using likes and reposts).
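As a small example of such engagement analysis, base R set operations can compare who liked a post with who reposted it. The handles below are hypothetical placeholders. In practice they would come from the actor_handle column of bs_get_post_likes() and the handle column of bs_get_reposts(), as seen in the outputs above.

```r
# Hypothetical handles standing in for bs_get_post_likes()$actor_handle
# and bs_get_reposts()$handle
likers    <- c("alice.example", "bob.example", "carol.example")
reposters <- c("bob.example", "dave.example")

both        <- intersect(likers, reposters)  # liked and reposted
any_engaged <- union(likers, reposters)      # everyone who interacted
only_liked  <- setdiff(likers, reposters)    # liked but did not repost
```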
In this vignette, we demonstrated how to use bskyr to
gather various types of public data from the Bluesky Social network. We
covered retrieving user profiles, fetching a user’s posts, exploring
conversation threads, and accessing likes and reposts. All of these
tasks are made possible by Bluesky’s open AT Protocol and its
lexicon-defined API endpoints, which bskyr conveniently wraps in R
functions.
With these tools, R users can treat Bluesky as a data source for
research or development. You can pull data on social connections, user
behavior, and content trends from Bluesky and then apply the vast
ecosystem of R packages for data analysis and visualization. As the
Bluesky network grows and its API evolves, bskyr will aim
to keep up, providing an efficient bridge between the AT Protocol and
R.
DISCLAIMER: This vignette has been written with help from ChatGPT 4o. It has been reviewed for correctness and edited for clarity by the package author. Please note any issues at https://github.com/christopherkenny/bskyr/issues.