/1

Corpuses — large collections of language data — are very useful for writing and editing language-learning materials. This presentation explores how and why, with some specific examples...

#Tibetan #language #education #corpus #linguistics #materials #development
/2

One key tool is the 'frequency list'. Frequency lists are lists of words ranked by how frequently they appear in a language. They tell us whether any given word is common or rare.
/3

Frequent words are useful. People use them again and again — meaning learners are exposed to them frequently naturally, too. And, the learner has opportunities to use them again and again — giving practice in production.
/4

How are frequency lists made? They're made by collecting large quantities of everyday language. We then use computers to count how many times each word appears.
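
The counting step can be sketched in a few lines of Python. This is only a minimal illustration, not the tooling used for the lists in this thread: it assumes the text has already been split into words (Tibetan isn't written with spaces between words, so a real pipeline needs a segmenter first), and the function name and the tiny corpus are made up for the example.

```python
from collections import Counter

def build_frequency_list(tokenized_sentences):
    """Count every word and return (word, count) pairs, most frequent first."""
    counts = Counter()
    for sentence in tokenized_sentences:
        counts.update(sentence)
    return counts.most_common()

# Toy, made-up data: each sentence is already split into words.
corpus = [
    ["ང", "ཇ", "འཐུང", "གི", "ཡོད"],
    ["ཁོ", "ཇ", "འཐུང", "གི", "འདུག"],
    ["ང", "འགྲོ", "གི", "ཡོད"],
]

frequency_list = build_frequency_list(corpus)
for word, count in frequency_list[:3]:
    print(word, count)
```

With a real corpus, the only change is the size of the input: the same count-and-sort step gives the ranked list.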
/5

An example of the top 10 most-frequent words... Of note here are words like ཨ་ནས and ར (རེད་བ).

These don't generally appear in language-learning materials! They're used all the time, so it's useful for learners to learn them... But no one is teaching them.

That's a problem.
/6

While we can use frequency lists manually like this, we can also leverage them to help us grade texts automatically.

Here, we use the software tool Dakje to highlight non-level words. Our textbook material can be graded by identifying difficult (infrequent) words!
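
The basic idea behind this kind of grading can be sketched as follows. This is not Dakje's actual implementation, just a rough illustration under simple assumptions: the text is already split into words, a "level" is simply the top N words of a frequency list, and the function name and sample data are hypothetical.

```python
def flag_off_level_words(text_tokens, frequency_ranked_words, level_size):
    """Return the words in the text that fall outside the top `level_size` words."""
    level_vocab = set(frequency_ranked_words[:level_size])
    return [word for word in text_tokens if word not in level_vocab]

# Toy, made-up data: a frequency-ranked word list and one lesson sentence.
ranked = ["གི", "ང", "ཇ", "འཐུང", "ཡོད", "ཁོ"]
lesson = ["ང", "ཇ", "མཆོད", "གི", "ཡོད"]

off_level = flag_off_level_words(lesson, ranked, level_size=5)
print(off_level)
```

Any word the function returns is a candidate for replacement with a more frequent alternative.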
/7

Here's an example: The textbook writers have included words that are infrequent!

We can replace these with frequent words that are more useful for the beginners!
/8

Without analysis like this, we are at risk of making intuitive choices. Relying on intuition, however, can lead us to make value judgments that don't reflect reality.

For example, we might base our grammar or communicative goals on western norms ...
/9

...rather than being based on how Tibetan is actually used.

We might be tempted to teach grammatical persons in Tibetan — even though this isn't a feature of the native grammar. Or, we may teach "How are you?" as an informal greeting, while passing over "Where ya headed?"
/10

We might also give priority to prestigious language forms, while ignoring really common ones! For example, we might think teaching honorifics is super important, or that literary or Buddhist terms are more important than common ones.
/11

Why is this problematic?

Especially for the beginner, trying to learn these infrequent language forms first is actually making things more difficult than they need to be...
/12

Infrequent words won't be reinforced outside the classroom; the learner won't have the opportunity to use them; they won't be understood if they try, which is discouraging. All of which means they won't absorb and remember the words, and will have to learn them again anyway!
/13

That means a prestige form like 'rogs-gnang' takes priority in textbooks. This is problematic b/c it means frequent forms are either de-prioritized, or not taught at all.
/14

'rogs-gnang' was found in all the textbooks we looked at. But super frequent grammar, like the present-continuous V + བསྡད + ཡོད, was missing in many, despite the fact that it's 25 times more frequent than the formal request form!
/15

So let's say we decide teaching the present continuous is important. What 'V' verb do we teach? Just any verb?

It turns out this is another way we can use corpuses. We can do a collocate search, and see which verbs come frequently with the formula ...
/16

...Here, some good choices would be འགྲོ / བསྡད / བཤད / བྱས / ཟ / བཟོ / འཐུང་. We might do best to fit it into the curriculum with topics like 'directions' (since 'going' and 'staying' are common collocates) or 'eating out' (since 'eating' and 'drinking' also both appear)!
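
A collocate search like this can be sketched roughly as below. This is a simplified illustration rather than the corpus tool used for the results above: it assumes pre-segmented sentences, only looks at the single word directly before the pattern, and uses a made-up function name and toy data.

```python
from collections import Counter

def collocates_before(tokenized_sentences, pattern):
    """Count the word directly before each occurrence of `pattern` (a list of words)."""
    counts = Counter()
    n = len(pattern)
    for sentence in tokenized_sentences:
        # Start at 1 so there is always a word before the match.
        for i in range(1, len(sentence) - n + 1):
            if sentence[i:i + n] == pattern:
                counts[sentence[i - 1]] += 1
    return counts

# Toy, made-up data: sentences already split into words.
corpus = [
    ["ཁོ", "འགྲོ", "བསྡད", "ཡོད"],
    ["ང", "ཟ", "བསྡད", "ཡོད"],
    ["མོ", "འགྲོ", "བསྡད", "ཡོད"],
]

verb_counts = collocates_before(corpus, ["བསྡད", "ཡོད"])
print(verb_counts.most_common())
```

A real search would likely also filter by part of speech, so that only verbs are counted in the 'V' slot.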
/17

We can also use concordancing to find sample sentences. These are specific ways that the words are used together, and can help us build authentic speech into our dialogs and textbook materials.
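
Concordancing is often shown as "keyword in context" lines. Here is a minimal sketch of the idea, assuming pre-segmented sentences; the function name, the window size and the sample data are all hypothetical, and real concordancers offer much more (sorting, filtering, links back to the source).

```python
def concordance(tokenized_sentences, keyword, window=2):
    """Return (left context, keyword, right context) for each hit of `keyword`."""
    lines = []
    for sentence in tokenized_sentences:
        for i, word in enumerate(sentence):
            if word == keyword:
                left = sentence[max(0, i - window):i]
                right = sentence[i + 1:i + 1 + window]
                lines.append((left, word, right))
    return lines

# Toy, made-up data: sentences already split into words.
corpus = [
    ["ང", "ཇ", "འཐུང", "གི", "ཡོད"],
    ["ཁོ", "ཇ", "འཐུང", "གི", "འདུག"],
    ["ང", "འགྲོ", "གི", "ཡོད"],
]

hits = concordance(corpus, "འཐུང", window=1)
for left, word, right in hits:
    print(" ".join(left), "[" + word + "]", " ".join(right))
```

Each line shows the keyword with its neighbours, which makes it easy to pick out natural sample sentences.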
/18

Another way we can use corpus tools is to analyze levels of meaning.

In general, words have multiple uses: For example, 'bat' is an animal, an object for hitting baseballs, as well as a verb.

If we're teaching a word, it also matters which meaning is most frequent!
/19

For a Tibetan example, we have the word ཆགས. This word has multiple levels of meaning, including 'to become', 'to be situated', and 'to crave'.

Given that ཆགས is a frequent word, which of its meanings is the frequent one?
/20

In popular dictionaries for Tibetan, unlike those for English, the order of the levels of meaning doesn't actually indicate frequency.

A translation dictionary like Rangjung Yeshe doesn't distinguish levels of meaning at all; Monlam does, and puts 'to crave' in the top spot.
/21

But when we take a look at our corpus data, we find that the common usage is 'to become'. Monlam has probably prioritized the meaning 'to crave' due to its religious connections: its use in Buddhist texts.
/22

So sentences like "Where's this place situated?" aren't the frequent way to use ཆགས་. If we teach only that meaning, we risk students forgetting the word, or getting confused when they hear it used as 'to become' but think it means 'to be situated'!
