@fclc@mast.hpc.social 🐘
Oct 19, 2022 · 7 tweets · 4 min read
Something a little different today:
A blog post on the curious case of Alder Lake, the quest for reduced precision on x86, preparing HPC FOSS libraries and documentations, and how a certain vendor made it explicitly harder to support their own hardware.

gist.github.com/FCLC/56e4b3f4a…
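(As a quick aside on the reduced-precision angle: whether the kernel actually reports the extension can be checked from /proc/cpuinfo. A minimal sketch, assuming Linux — `avx512_fp16` is the flag name Linux uses, but the helper itself is mine:)

```python
def has_avx512_fp16(cpuinfo_text: str) -> bool:
    """Return True if a /proc/cpuinfo dump lists the avx512_fp16 feature flag."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The flags line is whitespace-separated feature names
            return "avx512_fp16" in line.split()
    return False

# On a live Linux box you'd feed it the real file:
# has_avx512_fp16(open("/proc/cpuinfo").read())
```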
This is an abridged recounting of months of work that is still ongoing to this day.

Writing this up stemmed from a tangent with @owainkenway about needing special microcode revisions within custom kernels; instead of just an email, it turned into a post of sorts.
Some of the content is a little harsh towards certain entities. I want to be clear to those at said entities that I truly do value the amazing work you do and that I still believe in you longer term.

Heck I'm still open to working with you on fixing some of these problems!
At the same time, I can't exactly let you off the hook for it. If the goal is to have "the best and most vibrant HPC software ecosystem", don't actively work against those of us trying to help you on our own time and our own dime.
If you spot any omissions or mistakes, please do let me know. Wrote this up between supper and movie night 😅

Beyond that, feel free to send comments, questions, or concerns to me publicly or in DMs; mine are open 😊
People who have been helpful along the way: @IanCutress for his initial coverage, @InstLatX64 for the in-depth checks of implementations, several members of the #HPC, #AVX512 and #silicongang communities, and various people at [blank] and others I can't mention.
@IanCutress @InstLatX64 @sramkrishna this is the one I was talking about 😊

More from @FelixCLC_

Nov 30, 2022
Really exciting to see this coming together!

Accelerators as a kernel subsystem, built off of the existing DRM framework!
Now, there's also a *slight* risk of messes down the line.

ex: some SoCs might want to expose the same IP block via multiple interfaces.

Example: a GPU can be thought of as a traditional "display accelerator" in DRM, as before, but it can also be an AI/matrix engine.
Think of an NV GPU wanting to expose tensor cores, RT cores, or any other family of "specialty" hardware that's more suited to specialized acceleration. Direct access is more suited to being exposed via something like /dev/acc1/mat, but you'd also want access via, e.g., /dev/dri1/mat.
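From userspace, that split looks like probing more than one node for the same device. A minimal sketch, assuming Linux: /dev/accel/accel0 follows the naming the new accel subsystem uses, /dev/dri/renderD128 is a typical DRM render node, and the preference order is just my illustration:

```python
from pathlib import Path

def pick_accel_node(candidates=("/dev/accel/accel0", "/dev/dri/renderD128")):
    """Return the first device node that exists, preferring the dedicated
    accelerator interface over the general DRM render node. None if neither
    is present on this machine."""
    for path in candidates:
        if Path(path).exists():
            return path
    return None
```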
Nov 28, 2022
1/3
A little late on this thread, but I can provide some context/info here:

Overall, Mastodon moderation tends to be more active, and not being active is seen as a detriment to the instance itself as well as to the "fediverse" as a whole.
2/3
mastodon.social is considered to be less moderated than ideal: partially because of its scale, and partially because it's where most users land whenever there's a huge influx of new users. It's more moderated than Twitter, but less than others would like.
3/3
Because of this, some instances/servers block interactions with mastodon.social AT THE SERVER LEVEL to avoid the sort of "abusive" posting you'd see on the more toxic sides of Twitter.

As such, there can be no interaction between users of blocked instances.
Nov 28, 2022
Cool little chip I came across while going through ark:
ark.intel.com/content/www/us…

A 1.7 GHz, dual-core Ice Lake chip that peaks at 25 W, has AVX512, etc.
This, with dual FMA units, would be a fascinating little SKU, especially with QAT and the other recent AI accelerators for edge applications.
Because self-nerdsniping is a thing, I'm now thinking about how an SPR variant would play as a Jetson/Pi-CM4-style board installed into something like a @turingpi.

4C8T with 2x 512-bit FMA units per core, 1-2 QuickAssist engines (a cut-down E/F2000?) + GNA3 + VPU + QuickSync, Xe LP, and ~8-16 GB HBM2/3.