Artificial Intelligence With Senior Industry Leader Manas Talukdar

My guest for this episode is Manas Talukdar, a senior industry leader in data infrastructure for AI. Talukdar has significant experience designing and developing artificial intelligence products used in mission-critical sectors across the world. You might call him a software engineering polyglot, as he’s comfortable programming in Java, C#, Python, and other languages.

He has bootstrapped multiple projects from ideation to prototype to product, is a senior member of IEEE, an AI 2030 Senior Fellow and Advisor, and Director of Engineering at a startup building a data-centric AI platform. He has filed multiple patents in data processing and machine learning for AI, and has been invited to be an Ambassador at the AI Frontier Network.

In this episode we talk about:

  • Why AI won’t be smarter than you.
  • How instead of taking over the world, AI has the potential to help you get a better job, making the work you do more creative and useful.
  • Why you might want your personal data to be included in big data sets.
  • Privacy issues: who’s responsible for keeping information private.
  • General myths and truths about AI (artificial intelligence).
  • Why autonomous vehicles, known as driverless or self-driving cars, cannot decide whether to hit an old man or a child when forced to choose between the two.

Please note that content for this show and on Talukdar’s website has nothing to do with his employers, past and present, nor does any such content constitute an endorsement of anything of any kind by Talukdar.

 

 

Show Notes

Connect With Manas Talukdar: Portfolio

Transcript

My guest today is Manas Talukdar.

He's a senior software engineer with significant experience designing and developing products in artificial intelligence used in mission-critical sectors across the world.

You might call him a software engineering polyglot, as he's comfortable programming in Java, C Plus, or, excuse me, C Sharp, Python, and several other languages.

He's bootstrapped multiple projects from ideation to prototype to product.

He's a senior member of the IEEE, an AI 2030 senior fellow and advisor, and has filed multiple patents in data processing and machine learning for AI.

He's been invited to be an ambassador at the AI Frontier Network.

Please note that the following content in this podcast has nothing to do with his employers, past or present, nor does any such content constitute an endorsement of any kind by myself or my guests.

Please welcome Manas.

Thank you, Daniel.

Happy to be here.

Thanks.

Awesome.

You're probably used to speaking with some real experts in AI.

Most of my guests are in between.

Some of them think AI is going to destroy the world, take away our creativity.

And there are good applications and there are bad applications.

So how does artificial intelligence really work?

What is it?

Because in my experience, I use ChatGPT sometimes, and it seems more like a super Google computer than something artificially intelligent, like stuff I could do if I Googled enough and thought about it.

The applications you're working on are probably a bit more sophisticated.

Explain what it is and what it isn't.

So, true artificial intelligence is intelligence that is able to reason, that is able to make sense of unknown information and provide some insight from it.

In technical terms, we would call that AGI.

How does it do it?

For example, I've been told you take a data set: you collect a bunch of data.

Say I'm a doctor and I want an AI application that helps me determine, is this bone broken?

Are there any broken bones?

If I understand it right, it's basically pattern matching, not quite as intelligent as your brain, meaning the programmer is going to teach it first.

Number one, how do I distinguish an image of a broken bone from a picture of a non-broken bone or something else in the body?

And then, number two, how do I actually say that's broken? What if the data set is incomplete? Let's say you haven't accounted for every conceivable way you might hurt yourself playing football.

Then that picture of the slightly different broken bone is not in the database.

Even though the algorithm can kind of go in between and say, okay, I don't have exactly every picture, but I'm trained to recognize this, what happens if the data is incomplete or invalid?

In that case, how would a doctor know, how would a patient know, and what happens if the doctor or the patient is not sure?

Or could the AI even say, we're not sure?

They could, they could.

But I mean, I think there are multiple questions and discussion topics there.

Yeah, yeah, yes.

So maybe we can distill that a little bit.

So going back to what I was saying, right?

True artificial intelligence, in my opinion, is intelligence that is able to reason.

In technical terms, you would call that AGI, but we're not, we're not.

What is that term?

I haven't heard that one.

AGI.

AGI is, what's the term for that?

It's called artificial general intelligence.

Okay.

And how is that different from AI? General intelligence?

Well, so AGI is equivalent to human-like intelligence.

Okay.

Which, as I was saying, is able to reason based on unknown information and still provide some insights.

Currently, for example, if you ask ChatGPT or other large-scale models about something that has absolutely nothing to do with what they've been trained on,

it's not going to be able to give you any answer.

It will give you a negative answer: I do not have anything I can provide you on this particular question.

So that's not close to AGI.

So we are still some distance away from AGI.

So what we are currently calling artificial intelligence is really a facet of what is known as generative AI.

Generative AI as in, you train these massive language models.

And a lot of these are basically trained on data encompassing what's available on the Internet.

So for example.

Excellent.

I was going to ask for an example.

Yeah.

So an example could be, say, it's trained on Wikipedia data.

So if you ask a question, it's going to use whatever its training data was to spit out a well-formed answer on a particular individual.

For example, if you ask, tell me about Neil Armstrong, it's going to gather data from its training data, whether that's Wikipedia or other sources, and it's going to spit out a well-formed answer.

So going back to your point about Google: in Google, Bing, or these other search engines, you would have to search 10 different pages and then use your human intelligence to formulate something more presentable.

What's going on now is that these large language models have been trained on basically the entire Internet.

So they're able to take a number of disparate sources and spit out something well-formed.

Now, if you ask a question about, say, some individual the large language model has not been trained on, it's not going to be able to answer that question.

Let's say I ask it something based on an astronaut database, just to give a hypothetical example.

Say I ask a system, give me the ideal candidate to be the first person to land on Mars.

How would you, as a programmer, program that?

What kind of answer might it spit out?

Well, I wouldn't program it.

So the way this works is it's all based on the training data.

So it could be trained on, hey, what was the profile of all the astronauts who have previously landed on the moon or gone to outer space?

What's their background?

What's their educational background?

What kind of organizations did they come from?

Whether it's the EU, Russia or the former USSR, NASA, and so forth.

Then, based on all of this information, it will form a response describing what the large language model construes as the ideal individual to land on Mars.

Perfect.

I think your earlier question about the broken-bone X-ray was very apt, so maybe that will help distill this a little better.

The way this would work: say a machine learning model is trained on lots of X-rays. Let's make it very simple.

Say there are 10 X-rays, and they are pictures of bones.

There are pictures of bones that are not broken, and there are pictures of bones that are broken.

Currently, the way it mostly works is called reinforcement learning from human feedback.

So a human labeler would go in and draw a bounding box on the portion of the X-ray where there's a broken bone.

So out of 10, maybe five of those X-rays have a broken bone.

So we have five pieces of ground-truth labeled data.

That's what we call it.

And then we would train the model on all 10 images, and the model would know, all right, here's what I see.

Here are the broken bones in those five images.

So now the model knows how to identify the broken bones.
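
To make the labeling step concrete, here is a minimal Python sketch of what the ten-X-ray example might look like as ground-truth data. The `XRayLabel` class, the `(x_min, y_min, x_max, y_max)` box convention, and the image IDs are assumptions for illustration, not the format of any particular labeling tool; a real pipeline would pair these labels with the actual images and feed both to a model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class XRayLabel:
    image_id: str
    # None means the human labeler found no broken bone in this X-ray.
    fracture_box: Optional[Box] = None

    @property
    def is_broken(self) -> bool:
        return self.fracture_box is not None


def make_toy_dataset() -> List[XRayLabel]:
    """Ten hypothetical X-rays; a labeler drew a box on every other one (five total)."""
    labels = []
    for i in range(10):
        box = (40 + i, 60, 120 + i, 140) if i % 2 == 0 else None
        labels.append(XRayLabel(image_id=f"xray_{i:02d}", fracture_box=box))
    return labels


def training_targets(labels: List[XRayLabel]) -> List[Tuple[str, int]]:
    """Turn ground-truth labels into (image_id, class) pairs: 1 = broken, 0 = not broken."""
    return [(lbl.image_id, int(lbl.is_broken)) for lbl in labels]


dataset = make_toy_dataset()
targets = training_targets(dataset)
print(sum(t for _, t in targets), "of", len(targets), "images labeled as broken")
# 5 of 10 images labeled as broken
```

All ten images go into training, so the model sees both what a fracture looks like (the five boxed images) and what an intact bone looks like (the other five).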

I like that example because a lot of people are worried AI is going to put them out of jobs and make humans unnecessary, when in fact, see, we're back to that idea.

Is it truly intelligent?

You're basically teaching it how to be you, like a teacher in school would teach a student.

Can we look at AI as students?

And maybe they'll be smarter than the instructor someday, but they're students.

Okay, perfect.

Well, the thing is, you know, AI can certainly do better than humans in certain aspects.

For example, I think AI was recently able to diagnose a disease that human doctors were not able to.

Excuse me.

And primarily the reason behind that is that these large models can probably cross boundaries, and they have also been trained on massive amounts of data.

I mean, yes, a human doctor is also trained on a massive amount of data, but we are limited by the biological capacity of the neurons in our brains.

And it would be like you went back to college and got a PhD in every conceivable field.

That's what you've trained the system with.

Right, right, right.

So the point is, in my opinion, AI will function in an enablement role.

Is it going to put people out of jobs?

I think that statement is a little bit overstated.

Perhaps the fear is a little too much, but at the same time, if people do not know how to leverage AI to be more successful in their jobs, then I think there is a potential that in certain fields, those individuals may be in danger.

To me, it seems the people at risk are the ones who only want to do what they've been doing forever, who never get creative, try something new, or experiment in ways that could teach the data set something it didn't know before, because it can only learn what you teach it, right?

Right.

It can only learn what you teach it, and that's actually a really good point, because of what we are trying to do now in the industry.

Large language models work pretty well when you ask them something related to what they've been trained on.

But how do you train a large language model to respond to new information?

New questions.

New questions.

If it doesn't have the information yet, you basically have to teach it, hey, call Manas or call Daniel.

He'll tell you about blah, blah, blah, right?

Or something like that.

Right.

So that is a problem that I think is yet to be fully solved: how do we train a model to respond to new questions, or what we call new knowledge?

That's the term being used in the industry.

How to train a large language model to respond to new knowledge.

On that note, I've seen services like this. I've been invited to some of them, and I participated before I realized the questions were AI-generated and that they were probably going to store my answers in a database I would have no copyright, privilege, or monetary benefit from.

They're asking me questions.

They find that you're an expert on something, and then they just start asking questions.

Do you know what I'm talking about?

Services like that?

Yeah, there are services like that.

I mean, I think what you're alluding to is basically model evaluation, where you ask a model a question and see how it responds. If you like the response, you rate it a particular way; if you don't like it, you can edit it and tell the model, hey, here's a better response, and the model will actually learn from the better response you provided.
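
The evaluation loop described here (rate a response, or edit it into a better one) can be sketched as data collection. The following is a simplified illustration assuming a hypothetical `Feedback` record with a 1-to-5 rating scale; real labs use their own schemas, and how the collected examples are used for training varies.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Feedback:
    prompt: str
    model_response: str
    rating: int                            # assumed scale: 1 (bad) to 5 (good)
    edited_response: Optional[str] = None  # the reviewer's improved answer, if any


def to_training_examples(records: List[Feedback], min_rating: int = 4) -> List[Tuple[str, str]]:
    """Keep well-rated answers; substitute the human edit where one was provided."""
    examples = []
    for r in records:
        if r.edited_response is not None:
            examples.append((r.prompt, r.edited_response))
        elif r.rating >= min_rating:
            examples.append((r.prompt, r.model_response))
        # Poorly rated answers with no edit are dropped.
    return examples


def to_preference_pairs(records: List[Feedback]) -> List[Tuple[str, str, str]]:
    """(prompt, preferred, rejected) triples wherever a reviewer wrote a better answer."""
    return [
        (r.prompt, r.edited_response, r.model_response)
        for r in records
        if r.edited_response is not None
    ]


records = [
    Feedback("Best first guitar for a beginner?", "Any guitar.", 2,
             "A comfortable acoustic with low string action, so practice doesn't hurt."),
    Feedback("What is a capo?", "A clamp that raises the pitch of all the strings.", 5),
    Feedback("How do I tune a guitar?", "Just do it.", 1),
]
examples = to_training_examples(records)
pairs = to_preference_pairs(records)
print(len(examples), len(pairs))  # 2 1
```

Whether corrected answers are used directly, as preference pairs, or both is a design choice; the sketch only shows the two shapes the same feedback can take.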

So...

For example, Quora, a popular site online: you can ask it anything, and normally it has a database of experts that you join as an expert, so people know who they're talking to.

So I get questions by email.

It could be about life coaching, rock climbing, or songwriting, and I'd start answering them. Like, hey, I'm a kid in my bedroom trying to learn guitar.

Any recommendations?

I didn't realize they were AI-generated questions.

I was not speaking to an actual individual; it was a system trying to determine what kind of questions someone might ask and then gathering data, which is good, except we're not really getting paid for it or anything...

You make a good point.

So, you know, Quora, or probably other companies, are basically using your responses, and those of others like yourself answering those questions, to further train their models.

Yes.

It's the same situation I see on LinkedIn.

I get prompted all the time.

Hey, you're an expert invited to answer this question.

Right.

I mean, basically, they're going to be using my answer to feed into their LLM.

And you think you're getting credit and recognition as an expert and that it's going to help your LinkedIn profile and your career, when in fact, you probably won't get any recognition or credit for it.

And now your knowledge, which you've worked so hard for your whole life, they kind of own, right?

They own that information.

You don't have a copyright to it, right?

Yeah, there's no copyright per se.

For example, I may have a website and publish a blog post, and there's a 99.9% chance that one of the bigger large language models, whether it's OpenAI's or Gemini or Claude, has already been trained on that blog post, or on what others have published all over the Internet.

As I said, in my opinion, the makers of some of these larger LLMs have basically run out of data freely available on the Internet to train their models at this point in time.

Would you recommend, or not, that people be careful about arbitrarily answering questions online like this unless they know it's a person and not AI? Should they check first to see who the author is?

Is it helping society?

Are they really helping society? Potentially they are, but that's how they're going to lose their jobs, because they're not even getting credit for it.

So the system doesn't even know next time, hey, that was a good answer.

Call this guy, ask him.

Well, it's always good to know who's asking the question.

Is it a human being asking the question, or is it an AI?

I think it is very important to be transparent about the intent.

So I think it behooves these organizations and others to be very transparent: hey, we're asking this question, and the answer you provide will be used to train our LLM.

So I have a question about privacy.

Yeah.

And whether or not you believe it's truly private, we know everything can be hacked or made non-private.

What are the weaknesses of an AI system?

Your super-large database.

Say you have people who are not contributing or just not on the Internet. Maybe they just don't spend time online, their profile is not on Facebook, so their information is not there.

What's the weakness in the system?

What's the weakness in regard to privacy?

Some of the stuff is supposed to be private.

So that's two questions. Let's cover the privacy issue first, because you probably know a lot about privacy.

What are some of the issues with privacy?

Well, with privacy, the issue is that personally identifiable information, what's called PII, needs to be anonymized when it's fed into models as training data.

For example, if somebody's email is out there, associated with their name and whatnot, I don't think that should be fed directly to a model as training data.

So that needs to be anonymized.

And it is incumbent on these companies that are building and developing these LLMs.

And it is also incumbent on companies that are generating this training data to make sure any PII has been very diligently anonymized.

That's one.
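
As a rough illustration of anonymizing PII before it becomes training data, here is a minimal Python sketch that masks email addresses and US-style Social Security numbers with placeholder tokens. The regular expressions are simplified assumptions. Note that the name in the example is left untouched, which is why production systems typically add more robust detection (for example, named-entity recognition) on top of pattern matching.

```python
import re
from typing import List

# Simplified, assumed patterns; real PII detection covers many more formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def anonymize(text: str) -> str:
    """Replace email addresses and SSN-like numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return SSN_RE.sub("[SSN]", text)


def anonymize_corpus(docs: List[str]) -> List[str]:
    """Apply anonymize() to every document before it is used as training data."""
    return [anonymize(d) for d in docs]


corpus = ["Contact Jane at jane.doe@example.com, SSN 123-45-6789."]
print(anonymize_corpus(corpus))
# ['Contact Jane at [EMAIL], SSN [SSN].']
```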

Secondly, you know, somebody might have a public profile.

I mean, that's a tough one, right?

Because once you have a public profile, you're basically signing away any rights to privacy.

So my question is, how do you get good data and keep it private?

Because we really want both.

You want to be in the data because if you're a minority, politically or economically or socially, you want to be in the database and accounted for.

If they're looking for an astronaut to go to Mars, you want your profile to be considered.

But you don't want them to know who you voted for.

So how do we get good data, without making people afraid to give good data, but also keep it private?

Right.

Well, I think right now, the onus for maintaining that privacy, for better or for worse, is on the organizations that are developing these algorithms.

I think you gave a very good example here: who somebody voted for.

Now, that's obviously private, but what's not private is that somebody made a political donation.

I mean, that's available on the Internet.

But maybe I made a political donation, and I don't want that to be easily available to an LLM, so that somebody can go to ChatGPT and ask, who did so-and-so contribute to, right?

So the point is, as I said, for better or for worse, it is currently incumbent on the companies developing these LLMs to anonymize that out.

There's more than one way to do it.

One way is to mask out that information, to anonymize it, when training your model.

That's one way.

A second way is that when the model actually produces its output, you mask that information on the way out.

That's another way to do it.
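
The second approach, masking PII on the way out, might look something like the sketch below, which wraps a text-generation function and redacts emails and SSN-like strings from whatever it returns. `fake_model` is a hypothetical stand-in for a real LLM call, and the patterns are simplified assumptions.

```python
import re
from typing import Callable

# Simplified, assumed patterns for email addresses and US-style SSNs.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def with_output_masking(generate: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a text-generation function so PII is masked on the way out."""
    def masked(prompt: str) -> str:
        raw = generate(prompt)
        raw = EMAIL_RE.sub("[EMAIL]", raw)
        return SSN_RE.sub("[SSN]", raw)
    return masked


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model; always leaks an email address."""
    return "You can reach the author at author@example.org."


safe_model = with_output_masking(fake_model)
print(safe_model("How do I contact the author?"))
# You can reach the author at [EMAIL].
```

Because the filter sits outside the model, it works regardless of what the model memorized during training, at the cost of only catching the patterns it knows about.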

A third way involves the question itself; asking a model a question is called prompting.

When you prompt an LLM with a question that's kind of controversial, you can potentially have something intercept that prompt.

Then you don't even ask the model the question.

For example, say you ask, hey, give me so-and-so's SSN.

I mean, the probability that a large language model will have the SSN is very low, but to get that to zero, somebody might choose to intercept the prompt and then tell the LLM, this question is asking for PII, please do not provide the requested information.

And the model will then spit out the answer, saying, I cannot answer that question.
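
The third approach, intercepting the prompt, can be sketched as a guardrail that checks the question before the model ever sees it. This version follows the "don't even ask the model" variant and returns a refusal directly. The keyword pattern and `fake_model` are hypothetical; real guardrails generally rely on trained classifiers rather than a short regular expression.

```python
import re
from typing import Callable

# Hypothetical keyword rules for prompts that request PII.
PII_REQUEST_RE = re.compile(
    r"\b(ssn|social security( number)?|home address|phone number)\b", re.IGNORECASE
)
REFUSAL = "I cannot answer that question."


def guarded(generate: Callable[[str], str]) -> Callable[[str], str]:
    """Intercept prompts that ask for PII so they never reach the model."""
    def handle(prompt: str) -> str:
        if PII_REQUEST_RE.search(prompt):
            return REFUSAL
        return generate(prompt)
    return handle


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call."""
    return f"(model answer to: {prompt})"


ask = guarded(fake_model)
print(ask("Hey, give me so-and-so's SSN"))  # I cannot answer that question.
print(ask("Tell me about Neil Armstrong"))  # (model answer to: Tell me about Neil Armstrong)
```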

So it's up to the programmer to insert that, is what you're saying.

What I'm saying is that, for all of these mechanisms I talked about, it is currently incumbent on the organizations developing these large language models, and the user interfaces around them, to build all these guardrails and checks and balances.

What about the opposite, where it's actually not in the database?

Let's say, again, I'm actually the best candidate to go to Mars, but the system didn't realize that I know all these languages and, I'm just making this up now, that for where we're going and the other cultures we're going to interact with, my languages would be the best.

It hadn't thought of that.

It hadn't thought of the actual communication between the humans on the spaceship to that extent.

Something that the computer just didn't think of because the programmer didn't program it, right?

How are the users of the AI to know that that's not the best possible answer and not just assume it is?

Or is that just common sense?

Right now, that's common sense.

Okay.

Just like, don't trust everything you see on the Internet.

Similarly, now with these LLMs, just because the answer comes in a more consumable format, don't trust whatever the LLM answers.

It's just a tool, like anything else.

I think that's the best way to put it.

It's just a tool.

That's it.

I think we're making progress in explaining this to the people who think AI is going to take over the world and be some super-intelligent artificial being.

Here's a good one, a real example: someone I was talking to on the street goes, isn't AI going to be great?

Say you have an autonomous car driving, and some incident happens on the street.

It can't hit the brakes fast enough, so it has to decide.

It either has to hit an old man or a child, and it will know to hit the old man.

How will it know?

You're smiling, Manas, because you get where this is going.

And what if the old man was a brilliant scientist treating the young boy for a rare brain cancer, and now you just killed the old man, so the kid is going to die anyhow?

Explain to people how AI works and doesn't work.

If I'm correct, you haven't programmed that, and you don't have the data to know that about the man and the child. So what if you don't have that data and you just go based on age?

If the programmer said based on age, it would kill the old man, right?

But it's not super intelligent, or is it?

What are your thoughts on that?

I think that's a very difficult situation even for a human being, let alone for an AI, right?

If a human being were the driver, even the human being wouldn't have that information, right?

So for these sorts of moral dilemma situations, I don't think we will have an answer before AGI lands.

In my, I would say, quote-unquote naive opinion, AGI is going to land, whenever that happens, and then there will be a rush to figure out how to address some of these moral dilemma situations.

In a way, this is not too dissimilar to Isaac Asimov's Three Laws of Robotics, which pose a very similar moral dilemma, right?

Asimov posited that an artificial intelligence, or a robot, would say: I'm not going to harm a human being.

And I'm just speaking from memory, so I may be a little off here.

Number one, I'm not going to harm a human being.

Number two, I think it was, I'm not going to harm myself.

And then number three, it was along the same lines.

Right.

Yeah, you know, you see where I'm going, right?

I don't know exactly, off the top of my head.

Yeah, basic safety systems.

Exactly.

Like, never point your gun at anything you're not willing to destroy or kill.

Right.

It's the same as handgun safety rules.

Right.

Now, there is another aspect to these sorts of situations, like the one you just described: who's going to be held liable?

Is it the company that manufactured the car?

Is it the company that developed the machine learning model that's driving the self-driving car?

Is it the company that provided the training data?

So the legal aspect of it is another landmine, which I think hasn't yet been figured out.

Or it could even be the boy's mother.

Maybe she was worried about privacy and didn't disclose that her child was being treated by the doctor.

There are a lot of fingers that could get pointed in a lot of scenarios.

So I appreciate you basically telling us, as an expert, that it's like any new technology, right?

Until it comes out, you can't predict everything.

A lot of SpaceX rockets, well, not a lot of them, still don't land perfectly, right?

So how about the word sentient?

This was a new word to me.

I had to look it up.

What?

The question is, is there sentient AI?

Sentient.

Thank you.

An intelligence system capable of thinking and feeling like a human, is that the AGI you were referring to?

That's the AGI, artificial general intelligence.

So we are not even remotely close to any sort of artificial sentient intelligence.

So let me give you and your listeners a sense of what all this massive hyperbole truly is, right?

Without getting too technical, it's very probabilistic.

What I mean by that is, sometimes you go to Google and type, say, how do I get to, and you start typing T-E-X, and then Google will start suggesting A-S.

So, how do I get to Texas, right?

You've seen stuff like that.

So that's called a type-ahead algorithm.
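
For readers who want to see the type-ahead idea in code, here is a minimal Python sketch that suggests completions by matching a typed prefix against previously seen queries and ranking them by frequency. The query list is invented for illustration; real search engines draw on far larger logs and many more signals.

```python
from collections import Counter
from typing import Callable, List


def build_typeahead(past_queries: List[str]) -> Callable[..., List[str]]:
    """Return a suggest(prefix) function ranking past queries by how often they were typed."""
    counts = Counter(q.lower() for q in past_queries)

    def suggest(prefix: str, k: int = 3) -> List[str]:
        prefix = prefix.lower()
        matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
        # Most frequent first; ties broken alphabetically for stable output.
        matches.sort(key=lambda qn: (-qn[1], qn[0]))
        return [q for q, _ in matches[:k]]

    return suggest


suggest = build_typeahead([
    "how do i get to texas",
    "how do i get to texas",
    "how do i get to texarkana",
    "how do i get to tennessee",
])
print(suggest("how do i get to tex"))
# ['how do i get to texas', 'how do i get to texarkana']
```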

So, in an overly simplistic way, what all these large language models are currently doing is actually not particularly dissimilar from the type-ahead concept, except the massive difference is that these LLMs have been trained on a humongous amount of data.

That's one.

Secondly, they use a particular algorithm, the transformer model, which allows them a bit more sophistication.
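
To illustrate the "predict the next word by probability" idea, here is a toy bigram model in Python. It is a deliberately simplified stand-in, not a transformer: it only counts which word follows which in a tiny made-up corpus, then samples the next word from those counts. When it reaches a word it never saw during training, it simply stops, a crude analogue of a model having nothing to say about information outside its training data.

```python
import random
from collections import Counter, defaultdict
from typing import Dict


def train_bigrams(text: str) -> Dict[str, Counter]:
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def next_word_probs(follows: Dict[str, Counter], word: str) -> Dict[str, float]:
    """Turn follow-counts into a probability distribution over the next word."""
    counts = follows.get(word.lower(), Counter())
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()} if total else {}


def generate(follows: Dict[str, Counter], start: str, length: int, seed: int = 0) -> str:
    """Sample one word at a time from the next-word distribution."""
    rng = random.Random(seed)
    out = [start.lower()]
    for _ in range(length):
        probs = next_word_probs(follows, out[-1])
        if not probs:
            break  # never saw this word followed by anything: nothing to predict
        words, weights = zip(*probs.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)


corpus = "neil armstrong walked on the moon . neil armstrong flew to the moon ."
model = train_bigrams(corpus)
print(next_word_probs(model, "the"))        # {'moon': 1.0}
print(next_word_probs(model, "armstrong"))  # {'walked': 0.5, 'flew': 0.5}
print(generate(model, "neil", 6))
```

The sampling step is where the "probabilistic" part comes in: the same prompt can lead to different continuations, and nothing in the model checks whether a continuation is true.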

I mean, this is not even remotely close to being able to reason like a human being, or even a parrot, for example, or a dog, or a crow.

I mean, these creatures, in my opinion, have reasoning abilities not dissimilar from, say, a four- or five-year-old child.

So what we are calling AI right now is not even remotely close to being able to reason about things.

So sometimes, if you ask some of these LLMs a question, you might see that the further along it gets in its answer, the less accurate it tends to become.

Maybe it forms a four-line answer.

By the third and fourth lines, and I'm not saying all the time, but sometimes you can observe this, it's getting less and less accurate.

As I said, this is because of the probabilistic way it goes about it.

So, long story short, we are still some distance away from this sentient, slash reasoning-ability, slash AGI intelligence.

And is that what you work on?

I'm not going to ask you anything too specific that you don't want to answer, because some of it might be private, or you may be working for companies where some of this information is proprietary.

But day to day, you're working on developing AGI models?

Well, that's not what I do, but I like to think I'm playing a part in it.

I'm happy to talk about how.

So, as I said earlier, right now all these AI labs have basically run out of free data to train their models.

Whatever is out there on the Internet has already been used to train these algorithms, which is why if you go to ChatGPT versus Gemini or Claude and ask the same question, the answer is not going to be particularly different now.

I mean, yeah, there will be some nuances here and there, but you will get what you want, right?

Unless you're splitting hairs, you can go to any one of them and get what you want.

That's because they have been trained on pretty much the same data that's available on the Internet.

This seems like a pretty exciting job you have.

Interesting.

Well, at my day job, the AI startup I work for basically works with these AI labs to provide them specialized training data.

So think about the example we talked about with the broken-bone X-rays.

I think that's very closely tied to what we do.

We provide ground-truth labeled data, particularly specialized labeled data.

Think about, say, cardiology: training large language models for cardiology, or for nuclear physics, or astrology.

So you're working with the actual data, and how to make it useful, make sense of it all for your clients.

Or for these AI labs, right?

By the way, I meant astronomy, not astrology.

I don't think we want to do anything with astrology.

So say astrophysics. In fact, we have had LLMs that actually made some kind of discovery of a planet last year.

Looking at x-rays, right?

Looking at pictures of outer space.

Pictures of outer space.

Exactly, exactly.

So there is basically a gold rush right now to get ground-truth labeled data to train these LLMs more and more, because these companies have run out of free data to train their models.

There is a new job for somebody, right?

An astrophysicist, the AI needs you to tell it.

To train the AI, right?

A side gig, yeah.

But it's actually creating more jobs, not losing jobs, at least in the near future.

Yeah, I mean, it's creating new opportunities.

It's creating really excellent new opportunities for people who can work from anywhere in the world and really help enable the development of more and more powerful LLMs.

And that is really driving us all towards AGI at some point in the future.

Is that five years from now, 10 years from now?

I'm not even going to try to answer that question.

If it comes to the point where I've taught my robot everything I ever wanted to learn and know, I would say, well, yeah, take over because why am I here, right?

What am I doing?

I have nothing else to think about and contribute.

And I don't think that would be a bad thing.

I wouldn't have to think anymore if I didn't want to.

But as long as there are people still being creative, using their intuition, and coming up with new ideas, I think there'll always be a use for them to add to the model, right?

Right.

Now, I saw a meme come out probably a month or two back, and it really resonated with me.

Somebody said, I want AI to do my laundry or do my dishes.

I don't want AI to write an article for me.

So I think there is something to think about there, right?

Because there is something to be said about human creative endeavor.

I think we probably don't want AI to be the next Picasso or the next Mozart, right?

I think it makes sense.

I mean, that's what makes human existence special, right?

There's going to be some Mozart out of nowhere.

There's going to be a Picasso out of nowhere.

Yes.

So, it makes sense to think about it along those lines, while AI possibly acts as inspiration, perhaps, for the next Picasso or the next Mozart.

But AI needs to develop in a way where it makes our daily lives easier, right?

Exactly the example that individual had in the meme: do my laundry, do my dishes, right?

So I don't have to worry about all of this logistical stuff.

And we are actually moving in that direction, right?

Because we are gradually moving from LLMs, these large language models that use massive amounts of resources, towards a world where we have smaller language models that use fewer and fewer resources, which means we will be able to run these models on edge devices and embed them pretty much anywhere.

So, I don't mind if the robot does my dishes.

It even helps me craft an article.

But how about the example I want to try to love letter to my wife?

And ironically, I did this on chat, TBT.

I asked for some just, I think I saw a dating app, and I'm not even on dating sites and apps, but I'm, it's fascinating to me in terms of this technology.

So I downloaded it was a free app.

And it said, you know, you can ask a girl what's a pickup line.

The cheesy stuff, it spit out.

It clearly scraped the Internet.

Yeah, I had no intelligent way of.

But what was interesting is when I did the opposite, I said, tell me how to write a makeup letter.

It actually crafted some heartfelt things.

Now, of course, it's still scraping the Internet and piecing together something based on your question.

It's not really thinking about it, which makes me wonder: if I really gave that to my wife, would she know? Would she ask, did you write this?

And I think it robs people of the privilege to actually be creative when they rely too much on something, thinking the artificial thing can do better than they can.

Right.

I think that's the thing, right?

We don't want AI to replace creative human endeavor, in my opinion.

For example, going back to our example, right?

We can ask AI, one of these machine learning models, to throw us ideas, to act as inspiration, maybe to put together a scaffolding for the letter.

But you know, you probably want to write the letter in your own words.

Yes.

But if somebody were to copy-paste something that ChatGPT or Gemini or Llama or something spit out...

I mean, I can't wrap my head around that.

I saw something I would put.

Yeah.

Maybe it's a case where AI needs to teach us something back. For example, some people don't know how to craft a letter: an introduction, a body of a paragraph or two in the middle, and an ending, with a closing, a Dear Sir or Madam, that kind of thing.

So maybe it's a two-way street.

You can teach the AI something and then it can teach you.

But instead of asking it to do your homework, ask it just to help you with your homework.

And it can already do that if you ask it that way.

So if you tell it, hey, can you please provide the scaffolding, or the structure, for a letter I want to write to so-and-so on the ABC matter?

Of course.

It will provide that to you.

But if you tell it to write the letter, then it's going to write the letter for you.

Just doing what you asked it.

Exactly, exactly.

It's almost like you have to have some intelligence to know how to use artificial intelligence intelligently.

Right, right.

Are there any other controversial topics surrounding AI, or myths and misinformation you would like to tell listeners about that they may not be aware of?

Well, I mean, trust and safety.

You know, we talked a little bit about privacy.

I think trust and safety is perhaps another topic of apprehension out there.

How do I trust what an AI is telling me?

I mean, basically you don't, just like you don't trust everything out there on the Internet.

I mean, my point is we are still some distance away from AGI, and even when AGI arrives, should we trust it?

I think that's a question for another day.

But do we trust the LLMs currently?

A person should use their own common sense, in my opinion.

Now, we talked about safety, right?

Now, safety could be around, say, bias.

Bias is another thing that broadly falls under this whole privacy, trust, safety, and bias umbrella.

Now, that also depends on training data.

For example, for certain groups of people, we may not have enough training data to train the model well.

We have seen some rather bad examples of this, right?

So when you ask the model a question, it may tell you something completely biased against a particular group, which is not necessarily the model's fault; it just did not have sufficient training data.

Right.

Right.

So there are ways to get around it.

There are machine learning techniques to get around it.

There's also synthetic data: the big AI labs are already working on this, where you generate synthetic data to address the gap.

The other mechanism is that you explicitly...

So since you're the company providing the data, you mean you would see a weakness and try to artificially inseminate the data with some made-up data to compensate for that?

Well, it's called synthetic data.

You can generate synthetic training data, right?

So it may not be actual data, but it could be synthetic data.

How do you keep track of it so it doesn't become misinformation in the future that you forgot was synthetic?

No, that's a good question, right?

Because if you think about it, if your model itself is biased from the beginning and you tell it to generate synthetic data, what's to say it's not going to be garbage in, garbage out?

Some companies, without going into names... well, we can talk about it, because they have published the paper publicly.

Names change, so I try to avoid them, because the name will be different next year or next month.

There are companies out there that are developing LLMs and training them on synthetic data, with a human in the loop.

You generate synthetic data, but then you also evaluate the quality of the synthetic data and you correct the synthetic data if it's inaccurate.

And that acts as new knowledge to retrain the model with or to incrementally train the model.

Let's put it that way.
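The human-in-the-loop step described here (generate synthetic data, evaluate it, correct or discard it, then retrain) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not any lab's actual pipeline: the `Example` type, the `human_in_the_loop` function, and the toy reviewer built on the pizza-topping analogy from this conversation are all hypothetical, and in practice `review` would be a person or a review queue rather than a lookup table.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Example:
    """One hypothetical synthetic training example."""
    prompt: str
    answer: str


def human_in_the_loop(
    candidates: List[Example],
    review: Callable[[Example], Optional[Example]],
) -> List[Example]:
    """Keep only the synthetic examples a reviewer approves or corrects.

    `review` returns the example unchanged if it is accurate, a corrected
    copy if it needed fixing, or None if it should be discarded.
    """
    accepted = []
    for candidate in candidates:
        reviewed = review(candidate)
        if reviewed is not None:
            accepted.append(reviewed)
    return accepted


# Hypothetical stand-in for a human reviewer, using the pizza analogy.
GOOD_TOPPINGS = {"mozzarella", "tomato sauce", "basil"}
FIXES = {"ketchup": "tomato sauce"}


def toy_reviewer(example: Example) -> Optional[Example]:
    if example.answer in GOOD_TOPPINGS:
        return example  # accurate: keep as-is
    if example.answer in FIXES:
        return Example(example.prompt, FIXES[example.answer])  # corrected copy
    return None  # makes no sense: discard


candidates = [
    Example("Pizza topping?", "mozzarella"),
    Example("Pizza topping?", "ketchup"),
    Example("Pizza topping?", "mustard"),
]
retraining_set = human_in_the_loop(candidates, toy_reviewer)
print(retraining_set)
```

Only the approved and corrected rows reach `retraining_set`, which would then serve as new material for retraining or incremental training; the "mustard" row never gets there.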

So you're validating your synthetic data: does this make sense?

Would I put this on a pizza, ketchup or mustard? No, that doesn't make sense.

Pick something else in the fridge to make a pizza with.

Okay.

That's kind of creative in a way.

Right.

Right.

And I don't think there's any way around it.

We will probably need more and more synthetic data, because we're basically running out of freely available public data.

Would a more realistic example than my pizza and mustard be, say, Native Americans in New Mexico? There are small tribes, only so many people with that gene pool. Somebody dies of a disease, it gets matched against a database full of Caucasian people, and the diagnosis doesn't quite match up because the database doesn't have enough of the other people in it.

Yeah.

That's a really good example.

Now you have to be careful when generating synthetic data for things like medical use cases and so forth.

That's why human in the loop is extremely important.

And there are other machine learning techniques as well.

Let me give you an example.

So for example, let's consider men and women.

So say for a particular data set, there is an over-representation of men.

There's 90 percent men, 10 percent women, right?

So maybe data on 10 percent women is not sufficient to train the model.

So here's what you do. I forget the exact name of this machine learning technique, but it is published research.

You don't tell the model, hey, there's only 10 percent data for women.

You basically use the endpoint of training on the male demographic as the starting point for training on the female demographic.

So for all practical purposes, the model would think that I actually have more than 10 percent of female demographic data to train myself on.

Now, is that going to result in accurate inference?

I think the jury is out on that one, but at least it's going to be way better than just having vastly underrepresented data in your training data set.
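The published technique isn't named here, so the sketch below swaps in a different, simpler, and widely used remedy for the same 90/10 imbalance: random oversampling, which duplicates minority-group records until the groups are the same size. It is not the method described above, and duplicated rows add no genuinely new information, which echoes the point that the jury is still out on accuracy. The `oversample` function and the record layout are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List


def oversample(records: List[dict], group_key: str, seed: int = 0) -> List[dict]:
    """Random oversampling: duplicate records from smaller groups until
    every group is as large as the largest one."""
    rng = random.Random(seed)
    groups: Dict[str, List[dict]] = {}
    for record in records:
        groups.setdefault(record[group_key], []).append(record)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        # Draw with replacement to fill the gap up to the largest group.
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


# Hypothetical data set with the 90/10 split from the conversation.
data = [{"sex": "M", "id": i} for i in range(90)] + [
    {"sex": "F", "id": i} for i in range(10)
]
balanced = oversample(data, "sex")
print(Counter(r["sex"] for r in balanced))
```

A common alternative is to leave the data alone and weight the minority group's loss more heavily during training. Either way, the model still only ever sees ten distinct women's records.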

Certainly.

My bigger question was, way down the road, how do you know it was synthetic data?

But I suppose you keep that data set labeled differently, like this was a training model versus actual data, real data.

I mean, you would tag that separately.

Certainly. I mean, they have to keep track of what's actual data and what's synthetic data, yes.
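Keeping track of which records are synthetic can be as simple as attaching a provenance tag to every record when it enters the data set. This is a minimal sketch under that assumption; the `Provenance` enum, the `Record` type, its field names, and the `"model-v1"` generator label are hypothetical, not any particular lab's schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Provenance(Enum):
    REAL = "real"
    SYNTHETIC = "synthetic"


@dataclass(frozen=True)
class Record:
    text: str
    provenance: Provenance
    # Which model or process produced a synthetic record (None for real data).
    generator: Optional[str] = None


def split_by_provenance(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (real, synthetic) lists so either can be audited or excluded later."""
    real = [r for r in records if r.provenance is Provenance.REAL]
    synthetic = [r for r in records if r.provenance is Provenance.SYNTHETIC]
    return real, synthetic


dataset = [
    Record("Observed patient note", Provenance.REAL),
    Record("Generated patient note", Provenance.SYNTHETIC, generator="model-v1"),
]
real, synthetic = split_by_provenance(dataset)
print(len(real), len(synthetic))
```

Because the tag travels with each record, a later audit can drop or reweight every synthetic row even after the data set has been merged and reshuffled.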

I just thought of hacking.

What if somebody wanted to hack in and insert artificial data to affect an outcome?

Say, I'm going to Mars.

I want people to vote for me.

So I'm going to somehow tweak the data.

That could happen probably, right?

Well, given the scale, it wouldn't happen, right?

So let's take a step back, right?

So let's talk about web search on Google.

So if, say, you and I created a website saying, Daniel and Manas should be the first astronauts to land on Mars, right?

And then if you search on Google, who should be the first astronauts to land on Mars?

I guarantee we're not going to show up.

And the reason we're not going to show up is that they have an algorithm, called PageRank, that ranks websites based on incoming links.

So just because we put something out there on the Internet doesn't mean it's going to be given much weight.

So similar concept also exists in the LLM world, where just because I put something out there, the LLM is not going to just take that and blindly spit that out.

So there are some guardrails in place to ensure that this doesn't happen.
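The point about PageRank, that a page nobody links to carries little weight, can be seen in a small power-iteration sketch of the textbook formulation, using the conventional damping factor of 0.85. Google's production ranking uses many more signals than this, and the site names below are made up for illustration.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank. `links` maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank across all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web graph: the new Mars site links out, but nothing links in.
web = {
    "nasa.gov": ["esa.int"],
    "esa.int": ["nasa.gov"],
    "news.example": ["nasa.gov", "esa.int"],
    "blog.example": ["nasa.gov"],
    "mars-pitch.example": ["nasa.gov"],
}
ranks = pagerank(web)
for site, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{site:20s} {score:.4f}")
```

With no incoming links, the Mars site never rises above the baseline share every page receives, no matter how many iterations run.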

I see.

Nice.

I think that's all the questions I had.

This has been a great conversation.

A lot of good ideas came up.

Oh, how did you get into software?

Did you know you wanted to go into AI when you started going to college for software engineering?

You did.

Well, let me take many steps back here.

So, I did my engineering education in electrical engineering: controls and systems, differential equations, robotics, the good stuff.

And then I started working for an industrial software company.

So yeah, I'm not doing any mathematical modeling right now, but I'm still using the same or similar analytical skills I acquired in my academic life.

So I worked for the industrial software company on time-series historians, middleware SDKs, then cloud-based data platforms for industrial AI and so forth.

Then I worked for an enterprise AI company, I would say the leading enterprise AI company, and that was an excellent experience for me.

I had a great time building out a really sophisticated and really powerful and scalable enterprise AI platform that's used all over the world to build large-scale AI applications.

Then earlier this year, I started working for this AI startup, and I'm really enjoying that.

Basically, I feel like I now have a front-row seat to the development of these powerful new LLMs.

So do you have any career advice for someone in college, just beginning software engineering, and they're really fascinated and they want to get into AI?

Any recommendations for career paths, like what to do while they're in college, to focus on that?

So for students, I would say do the relevant coursework.

I think building a strong foundation is extremely important.

Taking courses on linear algebra, taking courses on say machine learning, on deep learning, this is extremely important.

Most universities offer these courses on machine learning and deep learning, so I strongly recommend building that foundation.

If somebody has already graduated but now decides, hey, I want to go into AI, then I would recommend the machine learning and deep learning specializations on Coursera; they're pretty good.

And then you should try to do some personal projects, right?

Do some personal projects, see what's out there in the industry, and there's tons of resources to do these personal projects.

And then just try to get your foot in the door.

There are many AI startups coming up now.

Another thing people can do is join a startup incubator or accelerator, maybe as a volunteer.

Join, then volunteer, work with some pre-seed companies, or if you have some relevant projects, you can even reach out to startup founders and tell them, hey, I'd love to work with you, I'm interested in what you're doing.

And as I said, there's an explosion of early stage AI startups currently in the market.

So it's a really good time to enter the industry.

Now, obviously, that said, I should also qualify that statement by saying that it's a really difficult time for new grads and early professionals right now in the industry.

Well, why is that?

It's just that finding jobs for early professionals and new grads has become a little bit difficult, more than a little bit difficult, compared to, say, three or four years back.

Is the market flooded, or is it the economy? Is it time-based?

Well, companies are now focusing on experienced talent.

That's the trend the industry is taking, which is not to say that the door has been entirely closed; some of the suggestions I made still apply.

Look for opportunities in early-stage startups and in startup accelerators or incubators, and use these opportunities to get your foot in the door. Even if you don't get paid, offer to work and gain that experience, and that will help you get your foot in the door.

Right.

My background: I have a degree in aerospace engineering, and I graduated at a time when, politically, depending on who's president and the military and everything, it wasn't a good time to graduate in aerospace.

Fortunately, I had advice like yours.

My mom said, get a job.

I had one while I was going to school, which was tough, but I did it. I was in semiconductor and wafer fab manufacturing.

Once I got the degree in engineering, I was able to just stick around there and apply my engineering skills to a job like that.

So I guess that's the case with any kind of career.

Experience matters and you can't just graduate and expect to get a job unless you have a PhD in something nobody else has a degree in and is in demand.

But more importantly, I think you've got to love doing it.

If you really love this AI stuff and you're fascinated by it, you will succeed, right?

Right.

You apply yourself, you learn it, you love it, you'll go far.

And help us build a better world.

Yes.

And as you said, I completely agree.

Self-motivation is extremely important, I think.

It's rough out there for new grads, but just be motivated and look for opportunities wherever you can find them.

Paths will open up.

Yep.

OK.

Thank you, Manas.

Appreciate you taking the time.

Wish you future success in all your endeavors.

And it sounds like you're in a really good spot.

You're having some fun with it.

Yeah, I'm having a good time.

Thank you, Daniel.

I really enjoyed our conversation.

I think we spoke for almost an hour.

And the time just flew by.

So thank you.

We'll do it again sometime.

When your AGI comes out in a couple of years.

Well, it's probably going to be more than a couple of years, but we'll see.

Okay.

Thank you.

Nice talking to you.
