Chartmetric’s artist pronoun database is a major step forward for gender equity

Guest post by Michelle Yuen of Chartmetric.

Empowering the music industry through the thoughtful application of data is one of our core values at Chartmetric. Our aim is to make data strategically helpful for creatives, as with our recently updated Cross-Platform Performance (CPP) metric, and to ensure that our figures and analyses measure up to the creatives they represent. For the past year, that’s meant focusing our content on Black representation in the music industry and following through on several actions to continue to promote racial equity. Today, we continue to champion representation from an equally important perspective: gender equity.

Why Pursue Socially Impactful Metadata

Gender is a sensitive and complex area to address. There are a number of reasons for this, chief among them issues regarding labeling and the continued evolution of more gender identities outside of the binary historically assigned at birth. These intricacies have not been properly addressed in terms of data classification, as traditional categorizations of gender data are almost exclusively binary, leading to a lack of representation and inclusion.

These are several of the many complexities we faced when we first explored the possibility of forming our own gender metadata set for artists. As a company that operates in male-dominated sectors of music, data, and technology, we have also had our own internal conversations about how we can consciously do our part to diversify, support, and include more equally talented yet underrepresented persons.

Our vision for this database is to create a platform for gender equity, where record labels, publishers, brands, booking agencies, sync teams, and more can balance their rosters, shows, and other content to uplift voices that are less heard but no less important.

We are always working to make data an empowering and equalizing tool for all and we hope to reach a new level of awareness about how the industry can further progress towards gender equity.

How We Are Building Socially Impactful Metadata

Once we committed to creating this gender metadata set, we started conducting research and reaching out to industry leaders to see if anyone had previously investigated this area. We did not, and still do not, claim to be experts of any kind when it comes to gender. In fact, we encourage this to be an open yet safe and respectful conversation, both in the construction of our metadata set and in the resulting analysis and application of this information. Unsurprisingly, we quickly found that there are few publicly available gender metadata sets for artists. MusicBrainz is probably the most comprehensive and well-known database, yet only around half of the solo artists in this set are gender-differentiated. This meant that, in addition to taking the sensitivities surrounding gender into account, we also had the challenge of essentially creating an entire database from scratch.

The amount of time and effort required to build such a database from square one is considerable. We contemplated trying to gather all of this data directly, such as via surveying and requesting self-reporting, but the overall process would still be highly manual, and we were aiming to have something that would be fairly comprehensive, reliable, and immediately scalable in the near future.

This is when we turned to pronouns. Just like gender, pronouns have their own sensitivities and complexities, not least of which is that they don’t always correspond to gender and should not be pre-assigned. However, artists’ pronoun data do, to a certain extent, exist in the public domain via biographies, interviews, and other such accounts. In particular, artist biographies on streaming platforms are often written by artists themselves, by their team, or are compiled through extensive research. Short of asking each artist individually, this was the closest we could get to pronoun data for artists, and it was a step toward our aim to create a database for gender equity.

Finding the Right Words

Using bios to determine artists’ pronouns still had its own challenges. We had no concrete way of verifying if these accounts were specifically approved by each artist. What we could do was manually create our own shortlist of pronoun-differentiated artists and cross-check this against our automated list to see how confident we could be in the collected data. This led to researching all the pronouns currently available in the English language, including, but not limited to, he/she/they and neopronouns like ze/xe/e, and compiling a manual list of 1K pronoun-differentiated artists. We then ran our automated pronoun script against this manual compilation to determine a confidence level for our methodology.

Pronouns currently used in the English language (as of July 2021):

  • She/her/hers: often associated culturally* with the female gender

  • He/him/his: often associated culturally* with the male gender

  • They/them/theirs: often used in the plural sense to refer to groups of people; this pronoun is also commonly* used in the singular to refer to non-binary individuals or those who have not yet expressed particular pronouns

  • Neopronouns such as Ze/zir, ze/hir, xe/xem, e/em, ae/aer, fae/faer, it, per, etc.: These are just a few examples of neopronouns, which have existed since the 1800s and are often associated culturally* with non-binary identities (e.g., transgender, genderfluid, etc.)

  • Multiple, other, or custom pronouns**: Some people use multiple specific pronouns (e.g. he/she), others are comfortable with any pronouns or none at all, and still others only use their name or custom pronouns (e.g., v)

*While pronouns tend to be associated with gender in many cultures, the pronouns that a person uses do not necessarily equate to, or give any indication of, their gender identity.

**There are many different types because pronouns are self-appointed identifications.
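
The automated pronoun script mentioned above is not published in this post. As a rough illustration of the general idea only, the sketch below counts pronoun forms in an English bio and picks the most frequent set. The pronoun sets, the labels, and the function names `count_pronouns` and `guess_pronouns` are assumptions made for this example, not Chartmetric's actual implementation.

```python
import re
from collections import Counter

# Hypothetical, deliberately short pronoun sets for illustration only.
PRONOUN_SETS = {
    "she/her": {"she", "her", "hers", "herself"},
    "he/him": {"he", "him", "his", "himself"},
    "they/them": {"they", "them", "their", "theirs", "themself", "themselves"},
    "ze/zir": {"ze", "zir", "zirs", "hir", "hirs"},
    "xe/xem": {"xe", "xem", "xyr", "xyrs"},
}

# Whole lowercase words, so "history" is never counted as "his".
WORD_RE = re.compile(r"[a-z]+")


def count_pronouns(bio):
    """Count how often each pronoun set appears in a bio."""
    counts = Counter()
    for word in WORD_RE.findall(bio.lower()):
        for label, forms in PRONOUN_SETS.items():
            if word in forms:
                counts[label] += 1
    return counts


def guess_pronouns(bio):
    """Return the most frequent pronoun set, or None if absent or tied."""
    ranked = count_pronouns(bio).most_common()
    if not ranked:
        return None  # no pronouns found in the bio
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous: leave for manual review
    return ranked[0][0]
```

A simple count like this cannot tell a singular "they" from a "they" that refers to a band, which is one reason a manual cross-check remains necessary.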

Language Can Be Muy Difícil

At this point, we ran into another roadblock. Many artists had non-English bios and so those were being incorrectly classified. We aim to make this dataset as comprehensive as we can, but determining pronouns in all of the languages around the world is neither a small nor simple task. We wanted to produce something that is both reliable and immediately scalable, so we decided to separate out all the bios in non-English languages (180K+, via the Python package langdetect) so that we could at least apply the automated script we did have to the majority (589K+) of the artist bios available to us.
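
The separation step could look roughly like the sketch below. It assumes bios are held in a dictionary keyed by artist ID, and the function name `split_by_language` is hypothetical. The detector is passed in as a parameter so that the `detect` function from langdetect (which returns a language code such as "en") can be plugged in; this is not the exact code we ran.

```python
def split_by_language(bios, detect, target="en"):
    """Split bios into those detected as `target` and everything else.

    bios:   dict mapping artist ID -> bio text (assumed structure)
    detect: callable taking text and returning a language code
    """
    matched, other = {}, {}
    for artist_id, text in bios.items():
        try:
            lang = detect(text)
        except Exception:
            # Detectors may raise on empty or symbol-only text;
            # treat those bios as "not the target language".
            lang = None
        bucket = matched if lang == target else other
        bucket[artist_id] = text
    return matched, other


# Possible usage with langdetect (requires `pip install langdetect`):
# from langdetect import detect, DetectorFactory
# DetectorFactory.seed = 0  # langdetect is non-deterministic without a seed
# english_bios, non_english_bios = split_by_language(bios, detect)
```

Routing undetectable bios to the non-English bucket keeps them out of the English pronoun script, so they end up in manual review rather than being misclassified.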

Being very aware, however, that our vision for this database is equity, we didn’t want to just leave out those artists with non-English bios. Instead, we manually put together another pronoun-differentiated list, this time of the top 1.3K+ artists for the Top 10 non-English languages most commonly used in artists’ bios. This is only the first iteration of this dataset, but it is still a real opportunity for us to contribute to the wider dialogue on social equity, as data can also be a compelling force for representation and diversification.

Having addressed this language barrier, we took another look at our English pronoun script. A second cross-check of this automated list against our manual 1K list showed us that other than artists having bios in different languages, there are also many artists that either don’t have any pronouns in their bios or don’t have bios at all. After removing these artists from our analysis, a third and final cross-check of our automated findings against the manual list resulted in a confidence level greater than 90 percent. Out of the 1K artists, only 40 were matched with incorrect pronouns.
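
To make the cross-check concrete, here is a minimal sketch of how such a confidence level could be computed. The data structures and the name `cross_check` are assumptions for illustration. The post does not state how many of the 1K artists remained after filtering, but with 40 mismatches the accuracy 1 - 40/n stays above 90 percent for any n greater than 400, and would be 96 percent if all 1,000 had remained.

```python
def cross_check(manual, automated):
    """Compare automated pronoun labels against a manually verified list.

    manual:    dict artist ID -> verified pronoun label
    automated: dict artist ID -> label, or None when no pronouns were found
    Artists missing from `automated` (for example, no bio) or labeled None
    are excluded from the comparison, mirroring the filtering described above.
    """
    comparable = [a for a in manual if automated.get(a) is not None]
    wrong = [a for a in comparable if automated[a] != manual[a]]
    accuracy = 1 - len(wrong) / len(comparable) if comparable else None
    return {"compared": len(comparable), "wrong": len(wrong), "accuracy": accuracy}
```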

Checking Our Homework

Overall, this initial phase of our pronoun metadata set is a combination of automated analysis and manual corrections and now includes 454,445 pronoun-differentiated artists. We haven’t forgotten that pronouns do not necessarily reflect gender for all people though, and that the vision of this dataset is to be a platform for gender equity. We just have yet to find a relatively comprehensive, reliable, respectful, and immediately scalable way to determine gender.

American pop singer Demi Lovato publicly shared their non-binary gender identity in May 2021.

In the meantime, we have still put together a separate 2.5K+ list of gender-differentiated artists. This dataset is much smaller than it is for pronouns because it is mainly shortlisted to the top artists in the world, who are much more likely to have public information on gender. As such, it is entirely based upon manual research. We also included 164 artists that have publicly identified themselves as non-binary or otherwise to really do our due diligence to include all gender identities. We will continue to research and update this list respectfully.

Gender identities (as of July 2021):

  • Female: includes women, cisgender women, transgender women, intersex women, girls

  • Male: includes men, cisgender men, transgender men, intersex men, boys

  • Non-binary/transgender/other*: non-binary often refers to any gender identity that does not strictly fall within the traditional Western male-female gender binary; transgender often refers to a gender identity that is different from the one assigned at birth

  • These can include transgender man, transgender woman, trans-feminine, trans-masculine, non-binary, genderqueer, genderfluid, gender non-conforming, two-spirit, agender, androgynous, etc.*

  • Multiple genders: some people identify with multiple genders, either simultaneously or shifting between different gender identities at different points in time

*There are many different gender identities so this list is dynamic and constantly developing.

Anyone Want to Work on Socially Impactful Metadata With Us?

Ideally, the best way to construct these pronoun and gender metadata sets would be to ask each artist to self-report. By extracting the data from 454K+ artists’ bios, we aim to kickstart this process. There are 4.8M+ artists in Chartmetric to date, many of whom are emerging and developing artists who have less public information available about them. As gender equity, diversity, and inclusivity are part of a wider social conversation, we invite artists and their teams to help expand our pronoun and gender metadata sets. With a confidence level of more than 90 percent in our automated pronoun dataset, we have tried to minimize any errors as much as possible, but we know there may still be inconsistencies. Our aim is to work together with our users to get the most comprehensive and reliable picture of this side of the music industry today, so we welcome corrections and updates as long as they are accurate and respectful to the artist in question. We only display public data and we want this conversation to be safe for everyone. As with any requests for data updates, we will do our due diligence and we just ask that users are equally respectful in both updating and applying this data for their own use.

Just as genders and pronouns continue to evolve, our datasets are also not set in stone. Our vision for creating this database is to help advance gender equity, and we hope this starting point and its future evolutions further open up this discourse. We firmly believe that there is space for everyone to be included and represented, and data can be especially powerful for understanding and helping to uncover and obtain the opportunities we each need to succeed equally.