Data is the new oil – or is it skilled workforce ?
I just came back from the 2023 ASMC in Saratoga Springs, which was packed with 15 technical sessions and lots of great presentations. One topic was in the air throughout all the sessions – will the semiconductor industry have enough skilled operators, technicians and engineers ? Almost all keynotes brought this point up, and the more I look at it, the more I think the industry’s biggest problem in the next 5-10 years is the lack of skilled people.
Below are a few takeaways from the presentations:
keynote Dr. Thomas Morgenstern, Infineon
keynote Thomas Sonderman, SkyWater Technology
panel discussion, Rick Glasmann, The MAX Group
keynote Robert Maire, Semiconductor Advisors
Robert Pearson, RIT
As a matter of fact, Robert’s presentation sums up the situation perfectly and I got his permission to post it here:
Even the panel participants displayed a mirror image of the workforce situation. 3 out of the 4 panelists were seasoned Equipment Engineering / FAB veterans in contrast to the young AI / data expert. It really seems that the mechanical / electrical hands on work is slowly going extinct.
As one panelist shared with the audience: ” … I do have 3 kids, none of them want to work in the semiconductor industry …” and, when asked why: ” … dad, look at you, you are always late back home, you are always stressed, the phone never stops ringing – why do I want to choose a life like this ? …”
The topic is serious – I think existentially serious. The semiconductor industry is extremely capital intensive and will only survive if the equipment in the FAB is running 24/7. Based on the numbers – showing the needed additional skilled workforce – it seems there will be many, if not all factories facing output and efficiency losses. But how much ?
Most forecasts show a delta of about 200,000 workers over a base of about 300,000 existing workers – that is a 40% gap. Depending on the specific field where the workforce is missing the impact will be different:
- missing operators in manual FABs will have massive direct impact – tools will not be loaded / unloaded in time and therefore there is direct loss in capacity and cycle time
- missing process technicians will have impact on hold lot release and overall process stability and therefore impact yield and reliability
- missing maintenance technicians and engineers will directly lead to less equipment uptime and lower equipment stability, both directly impacting FAB output, yield and reliability
- missing process engineers will lead to reduced process improvement work as well as to less stable manufacturing processes
- the list goes on and on
What will be the economical impact of all this ?
The total US semiconductor industry revenue in 2022 was in the neighborhood of $275 billion. If I just assume that a 40% shortage in skilled workforce will have a 10% overall impact (which I think is an extremely conservative estimate), that would mean a loss of 5 x $27.5 billion, or about $137.5 billion, over the next 5 years.
Even if my back of the napkin calculation is wrong by a factor of 2 or 3, this number is mind boggling – and it appears “nobody” really seems to take serious action – Why is that ?????
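For anyone who wants to play with the assumptions, here is the napkin math as a small Python sketch. All inputs are the rough numbers quoted in this post; the variable names are made up for illustration, and reading the 40% gap as "missing workers divided by total needed workers" is an assumption.

```python
# Back-of-the-napkin estimate; every input is a rough figure quoted in the post.
existing_workers = 300_000
missing_workers = 200_000

# Assumption: the 40% gap is measured against the total workforce needed.
workforce_gap = missing_workers / (existing_workers + missing_workers)

annual_revenue_busd = 275.0   # US semiconductor revenue 2022, in billion $
assumed_impact = 0.10         # assumed overall impact of the shortage
years = 5

annual_loss_busd = annual_revenue_busd * assumed_impact
total_loss_busd = annual_loss_busd * years

print(f"workforce gap: {workforce_gap:.0%}")
print(f"loss per year: ${annual_loss_busd:.1f} billion")
print(f"loss over {years} years: ${total_loss_busd:.1f} billion")
```

Changing `assumed_impact` by a factor of 2 or 3 in either direction shows how sensitive the result is to that single guess.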
I guess the factors are ” it will not affect my company, since so far it has not been a major problem …” or ” .. if worst comes to worst, we can always raise salaries and get people from across the street …”
Even if some companies are less impacted, that means others face an even bigger shortage. Neither scenario is good for the overall industry. The magic question is how to make jobs in semiconductor attractive again ?
” … dad, look at you, you are always late back home, you are always stressed, the phone never stops ringing – why do I want to choose a life like this ? …”
Being a long-time semiconductor addict myself, I can fully relate to that. It might be easy to say ” the young folks nowadays don’t like to work hard anymore …” – but true or false, that will not change things. The semiconductor industry will only become more attractive for new technicians and engineers if we change, within the industry, what the next generations see as the problem. The companies that react and change first will have the best chances to attract people again.
Let me throw out a few possibly controversial thoughts here:
you are always late back home
Long hours have been seen as a sign of hard work for way too long. If people regularly need to work long hours, that’s a sign of understaffing or of a badly organized / trained organization. Unfortunately, reducing headcount is seen as the easiest way to reduce cost. Too often, the quarterly hunt for good numbers – to keep Wall Street happy – leads to cuts which are counterproductive in the mid and long run. Frequent downsizings – which are not uncommon in the semiconductor industry – are not a strong signal to attract the next generation of technicians and engineers.
–> rethink overall human resource strategies and become much more people centric (vs. pure head count efficiency, short term thinking)
–> incorporate impact of missing or not well trained workforce in all business model calculations to put hard $ numbers behind the effects (vs. assuming, people will be there when needed and set availability to 100%)
you are always stressed
Stress is typically generated when people feel under pressure because they cannot control their job, but are instead controlled by overwhelming tasks and timelines. Apart from the general shortage of people, the key reasons are not enough know-how, training and resources to do the job successfully.
–> massively invest in training and standardization, people need to know what and how to do it
the phone never stops ringing
This is another “evil” of modern times: always on, always connected, and no clear rules for protecting employees’ personal time. This might be part of the general understaffing problem, but it also comes from not having enough experienced people who can share the burden of on-call and critical problem escalation support. During COVID people realized that there is also a life outside of work. Enjoying time with family becomes more and more important. If employers do not react, people will leave or not even join to begin with.
–> how about guaranteed personal time with no contact from work and possibly a general 4 day work week ?
(imagine the company across the street starts to offer 4 day work week to attract people)
I still think the semiconductor industry can be very exciting to work in: there is plenty of fancy high tech “stuff” to be proud of and to be involved in, for all levels of education. Salary needs to be at least somewhat competitive. If people get rock solid training and a clear career path, there should be no reason why people do not choose a career in semiconductors. Employment will be almost guaranteed in the next 10-15 years, looking at all the shortages.
I think these are the main levers to make semiconductor industry attractive again:
- massive image campaigns to the greater public and schools
- create opportunities for young people to understand what it means to work in a FAB
- community colleges and universities to offer the classes needed for these jobs
– input and funding to come from the semi industry
- seriously care about your people and get rid of the 24/7 grind with rules and appropriate staffing
- define semiconductor industry wide accepted job standards, which describe skills sets needed and certification levels
- training, training, training and clear career path visibility
This will all only happen if it is driven by the ones who have the problem in the first place – the semiconductor industry itself and the universities that teach semiconductor engineering. It is not that the FABs should or can pay for everything themselves, but they need to start driving activities yesterday. Last but not least, programs like the CHIPS Act clearly need to fund workforce development with significant amounts, since otherwise all the new FABs will not run as productively as planned. The result will be that the attempt to bring semiconductor manufacturing back into the US will fail.
Super curious what you think about all this – please comment !
2 responses to “Data is the new oil – or is it skilled workforce ?”
How true, how true.
It remains exciting. And the question of personnel and other resources remains open.
I would be glad if we could see each other again soon.
Best regards
Very true. Many companies have not yet realized that it will become ever harder to get skilled employees.
Good point that new hires are not 100% up to speed immediately. For a maintenance technician the learning curve is 2 years. We need to factor this into financial considerations about “head count”. It is not only heads, it is skill. Lack of skill might become “apparent” only indirectly through yield excursions or too long maintenance times.
The Advanced Semiconductor Manufacturing Conference in Saratoga Springs is just one month away !
I’m looking forward to meeting many industry experts and discussing Factory Physics topics in person. Here are some of my personal agenda highlights:
- Keynote: Salvatore Coffa from STMicroelectronics – Silicon Carbide
- Keynote: Thomas Sonderman from SkyWater Technology – US Chips Act
- Panel Discussion: Unintended Consequences of Government Subsidies on Moore’s Law and the Future of Semiconductors
and of course plenty of interesting technical presentations – these are some of my favorites:
- How to Teach Semiconductor Manufacturing and Why it is so Difficult
- Impact of Effective Technical Training in a Semiconductor Manufacturing Facility
- Automation in R&D: complying with contradictory constraints of seemingly incompatible world
- Granularity of Processing Times Modeling in Semiconductor Manufacturing
- Deploying an Integrated Framework of Fab-wide and Toolset Schedulers To Improve Performance in a Real Large-scale Fab
Hope to see you in person in beautiful Saratoga Springs, NY !
Product Mix and FAB performance
Semiconductor manufacturing is a highly complex and dynamic industry that requires constant innovation and adaptation to meet the changing needs of customers. One of the biggest challenges in this industry is managing changes to the product mix, which can have a significant impact on the overall efficiency and effectiveness of the manufacturing process.
Product mix changes can occur for a variety of reasons, such as changes in market demand, new product introductions, and shifts in technology. These changes can have a ripple effect throughout the entire manufacturing process, impacting everything from sourcing and procurement, to production and logistics.
One of the major challenges of product mix changes is the need to quickly and efficiently re-configure the manufacturing process to accommodate the new products. This requires significant coordination and collaboration across multiple departments, including engineering, procurement, and production. Additionally, changes to the product mix may also require the use of new or different equipment, which can add further complexity to the process.
Another challenge is managing inventory and supply chain logistics. With changes in product mix, the demand for certain raw materials and components may fluctuate, requiring adjustments to inventory levels and procurement strategies. Additionally, changes in product mix can also impact shipping and delivery schedules, requiring careful coordination with customers and suppliers.
Overall, managing product mix changes in semiconductor manufacturing is a complex and dynamic process that requires a high degree of coordination and collaboration across multiple departments. It is important for companies in this industry to have a robust and flexible manufacturing process that can quickly adapt to changing product mix requirements, in order to remain competitive and meet the changing needs of customers.
O.k. – the text block above was generated by OpenAI’s ChatGPT.
I asked it (?) to “… write a short article about the complexity of product mix changes in semiconductor manufacturing …”
Pretty impressive, especially because it took only about 10 seconds – the next text block was its second try:
Semiconductor manufacturing is a complex and constantly evolving industry, with a wide range of products that are produced using a variety of different processes and techniques. One of the most significant challenges faced by manufacturers in this industry is the complexity of product mix changes.
When a manufacturer wants to change the product mix of its semiconductors, it must consider a variety of factors, including the specific processes and techniques required to produce the new products, the availability of equipment and materials, and the costs and risks associated with the change.
One major challenge in changing the product mix is the need to retool and reconfigure manufacturing equipment. This can be a costly and time-consuming process, as manufacturers must purchase new equipment and train employees on how to use it. Additionally, the process of retooling can also lead to downtime and disruptions in production, which can have a significant impact on the bottom line.
Another major challenge is the need to secure and maintain a steady supply of materials, as well as the need to dispose of any remaining materials that are no longer needed. This can be a difficult and costly process, as manufacturers must navigate a complex supply chain and ensure that they have the necessary materials on hand to produce the new products.
Finally, manufacturers must also consider the costs and risks associated with changing the product mix. This can include the costs of retooling and reconfiguring equipment, as well as the potential loss of revenue if the new products do not sell as well as expected. Additionally, manufacturers must also be prepared to address any potential issues that may arise during the transition, such as quality control problems or supply chain disruptions.
Overall, changing the product mix in semiconductor manufacturing is a complex and challenging process that requires careful planning and execution. Manufacturers must weigh the costs and risks associated with the change against the potential benefits, and be prepared to address any issues that may arise during the transition.
I promise, the following contains only 100% of my own thoughts …
One of the more tricky aspects of optimizing factory performance is understanding the true impact caused by product mix changes. The key problem here is the level of complexity to:
- calculate and/or estimate
- drive actions to reduce the impact
Unfortunately there is no “one size fits all” rule to deal with this – other than:
The higher the product mix is in a FAB – the more likely there will be performance loss.
Almost 20 years ago I was confronted with the problem of increasing product mix for the first time. I visited a factory which had recently gone from an almost monoculture FAB to a factory running 10+ different products. FAB cycle times increased and output went down – all this at the same wafer starts level as in the years before – and the management wanted to understand what was going on … (it seems that ChatGPT today has more “understanding” about this than the management 20 years ago)
Experienced factory physics practitioners will likely smile about this, but without a good understanding of the effects of a changing product mix, plan and actual results will very likely not be in sync.
Let’s try to dig into the topic a bit. First we need to define what I mean by a product. In the real world there are various levels of difference between one product and another. Unfortunately, different companies have different naming conventions for that. I will use this definition: A product is different from another one if at least one processing or metrology / test step is different.
For example: if the manufacturing of a wafer leads to the exact same chips at the end of the wafer FAB processing, and they just get a different frequency capability assigned after testing (like CPUs), I would not call them different products. As soon as there are steps with different recipes in the flow – these are 2 different products.
The size of the impact scales with the number of differences: 2 products with 500 processing steps each that differ in the recipe on only 12 steps are much more similar to each other than 2 products that differ on 100 of the 500 steps.
What are the key effects of the differences:
- higher product mix will likely lead to smaller cascades on a process equipment since a recipe change might need additional setup times
- different recipes at the same equipment will likely have a different number of qualified tools (in the planned world but especially on the FAB floor)
- capacity planning no longer will work with simple static modeling approaches to reflect the real effective equipment and/or chamber dedication/availability
- batch building at batch tools will either lead to smaller batch sizes or to longer batch formation times
- higher number of products will likely increase the number of Litho reticles and therefore also increase reticle logistics and risk of “reticle not available when needed” scenarios
- back up reticle sets are likely less available for products with smaller WIP in the line
- small volume products tend to have higher metrology sampling rates (more often measured) since there is not enough “volume” statistics available
- achieving very high on-time delivery percentage is harder for very small volume products (if 1 lot gets scrapped, is this 33% of the total WIP ?)
- products with significantly different number of process steps (or mask layers) will likely create dynamic bottlenecks throughout the line due to different WIP arrival times
- higher number of different recipes on a tool might cause higher re-qualification time needs, which can impact tool availability negatively
- frequent changes in the product mix wafer starts will likely amplify the negative impacts of all the points above
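To put a rough number on the batch formation point from the list above, here is a minimal Python sketch. It assumes that lots arrive at a batch tool at a steady rate, that the arrivals are split evenly across all recipes, and that only lots with the same recipe can share a batch. The function name and the example numbers (6 lots per hour, batch size 4) are hypothetical and only for illustration.

```python
def batch_formation_hours(lots_per_hour, n_recipes, batch_size):
    """Average time to collect a full batch of one recipe.

    Assumes evenly split, steady arrivals and same-recipe batching only.
    """
    rate_per_recipe = lots_per_hour / n_recipes
    # The first lot starts the clock; batch_size - 1 more lots have to arrive.
    return (batch_size - 1) / rate_per_recipe


for n_recipes in (1, 2, 5, 10):
    hours = batch_formation_hours(lots_per_hour=6.0, n_recipes=n_recipes, batch_size=4)
    print(f"{n_recipes:2d} recipes -> about {hours:.1f} h to fill a batch of 4")
```

Under these simplified assumptions the formation time grows linearly with the number of recipes, so the FAB either waits longer for full batches or runs smaller ones.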
How to deal with all of these complexities – especially if your FAB is going down the path of increasing product complexity ?
There are basically only 2 practical approaches to that:
A) – reduce general FAB loading and see what happens (with no guarantee what will be the outcome)
B) – invest in IE systems and experts to be able to quantify, calculate and plan for the expected effects
I have seen both in the real world, with B having significantly better chances to hit cycle time, output and on-time delivery goals.
Rework and FAB cycle time
In today’s post I will discuss the impact of rework on the overall FAB cycle time.
Rework can happen for various reasons and at different process steps. Most commonly it occurs after a lot has been processed at a photo step. The picture below shows a typical scenario:
After a lot was processed at the photo step “A”, it will typically be measured at a metro(logy) step to see if the photo step was done within the desired specifications. If not, the whole lot or some wafers of the lot will be sent into the “rework loop” depicted in red.
There are different implementations of this in the real world. In some FABs the rework wafers will be physically split into a new carrier and the good wafers will wait at point “B” until the rework wafers are back. If this happens there will be additional sorter steps to execute the split and later the merge. These are not in the picture above, but both will consume wait time and process time, in addition to the time the rework wafers need until they get back to point “B”.
Another scenario in more advanced FABs will keep all the wafers in the same carrier and send the full lot through the rework loop. In this approach there are 2 possible execution flows:
- one where only the “bad” wafers get processed in the rework loop steps
- one where no matter what, the complete lot (all wafers) get the rework process.
Obviously all these different versions will have different cycle times – and therefore a different impact on the overall FAB cycle time.
In general any rework is a hint of missing stability in the overall process. Any rework move will consume capacity on the involved processing equipment. Typically this is most impactful on the Photo equipment itself. For example, if the average rework rate in a FAB on all Photo steps is 3%, this adds 3% of tool utilization and likely a few more cascade breaks to the dispatch list or schedule. Since Litho tools are usually some of the highest utilized tools in the factory, this will drive their utilization even higher and therefore the average cycle time of all lots at the photo steps will go up. This effect was already discussed in earlier posts.
On top of this “higher Utilization drives higher cycle time” effect on the photo tools themselves, the effect might be there also on all other involved tools. The impact there is likely less, since the base utilization of the non photo tools is probably lower.
So, how much does the rework processing loop add to the FAB cycle time ?
To calculate this in detail, we would need the exact data from the FAB of interest, but here is a simple formula which should work for a decent estimation:
Let’s put a few example numbers together:
Let’s assume the FAB of interest has 30 mask layers (photo steps) and a base cycle time of 60 days (2 days per mask layer). We see an average rework rate of 3%.
How much is the typical time a lot spends in the rework loop ?
This depends, as discussed, on the exact form of the rework and the logistics around it. I think typical numbers for a resist removal will be in the 10 – 20 minute range. Clean steps depend on batch or single wafer clean, but let’s assume another 30 minutes. Additional metrology and possibly sorter steps might add another 30 minutes of process time.
To keep it simple, let’s assume all the rework route related process steps accumulate 1.5h of processing time. The key missing part is how much wait time the average rework lot will accumulate at each step.
This depends heavily on the priority the rework lots get (and of course on the overall utilization of the rework tools). Most FABs I have seen use a rather high priority for rework lots – so let’s assume they run with an x factor of 2. This will lead to a cycle time of about 3h for the rework loop, plus the second time through photo and the following metro step.
A good scenario could be an adder of about 5 – 6h for each rework round per lot. If the priority for rework lots is not very high, it can easily be 8 to 10h. As a matter of fact, I have seen rework route cycle times greater than 12h …
Let’s apply these assumptions to the red part of the formula:
3% x 30 x 6h –> 5.4 hours
The additional cycle time due to higher photo tool utilization is likely anything between 15 and 30 minutes per photo layer, so a total of possibly around 10h in my example factory. If we use the formula above, the original overall 60 days FAB cycle time will be increased by 0.5 .. 1.0 days with the given assumptions. Of course the impact will change if the rework rate is higher or the rework loop cycle time is significantly extended.
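The estimate can also be written down as a few lines of Python. This is only a sketch of one plausible reading of the example numbers: base cycle time, plus a rework loop part (rework rate x photo layers x loop time), plus a utilization part (photo layers x extra wait per layer). The function and parameter names are hypothetical, and this is not a validated FAB model.

```python
def rework_adder_hours(rework_rate, photo_layers, loop_hours, util_adder_min_per_layer):
    """Return (rework loop part, utilization part) of the cycle time adder in hours."""
    # Average time a lot spends in rework loops over the whole flow.
    loop_part = rework_rate * photo_layers * loop_hours
    # Extra wait at every photo layer caused by the higher photo tool utilization.
    util_part = photo_layers * util_adder_min_per_layer / 60.0
    return loop_part, util_part


BASE_CYCLE_TIME_DAYS = 60

# 15 and 30 minutes are the range from the text, 20 minutes is a value in between.
for minutes in (15, 20, 30):
    loop_part, util_part = rework_adder_hours(0.03, 30, 6.0, minutes)
    total_days = (loop_part + util_part) / 24.0
    print(f"{minutes} min/layer: loop {loop_part:.1f} h + utilization {util_part:.1f} h "
          f"= {total_days:.2f} days on top of {BASE_CYCLE_TIME_DAYS} days")
```

Raising `rework_rate` or `loop_hours` in the call shows how quickly the adder grows once rework is no longer "reasonably low".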
Summary: As long as rework rates are reasonably low and the time lots spend in the rework loop is short there is a small impact.
If you are interested in the topic of FAB cycle time reduction – I strongly recommend to head over to
Their newsletter (volume 23, No. 6) discusses an excellent “FAB Cycle Time Improvement Framework” – great read !
Since this will be very likely the last post in 2022 –
I wish all my readers a few quiet days to recharge and a successful 2023 !
Equipment Uptime and FAB speed, part 3
My last post closed with a poll on achieved M-ratio values. Here are the results:
Unfortunately, not too many readers participated – so the statistics of the result are a bit weak. To some extent the data reflects my personal experience – what I have seen in various FABs. There is a significant number of FABs (40% in the poll results) which have M-ratios below 1. In other words, these FABs experience more unscheduled downtime than scheduled downtime. The majority of the FABs in the poll (60%) show an M-ratio greater than 1 – they have more scheduled than unscheduled downtime.
Interestingly, it looks like there are no M-ratios greater than 5 – which means at least 16.7% of all downtime is unscheduled. Compared to other industries this does not look too good. Imagine for example your car had an M-ratio of only 5 …
Similar to semiconductor equipment, a car is nowadays a complex piece of machinery, but for sure there is not a lot of unscheduled downtime – at least not mission critical break downs.
With that in mind and knowing that cars of course have humans to transport and therefore the focus on safety and maintenance is (obviously) very different, I came up years ago with the picture below to define M-ratio classes for the semiconductor FAB world:
The reasons for having an M-ratio below 1 might be plentiful, but a key for that is for sure the general maintenance strategy of a FAB. M-ratios below 1 indicate in general a “run to fail” strategy. Often the reason for that is the cost aspect of a dedicated Preventive Maintenance set up, since man power, parts and meticulously executed scheduled maintenance are not easy and cheap to have. Another reason might be the age of the equipment and the availability of spare parts.
The M-ratio is not a good indicator to use as the one and only goal – since the real goal of equipment maintenance is to enable the highest possible uptime with “no-surprise” down times – but knowing the M-ratio of your factory might help to identify improvement opportunities.
Often the equipment organization in a FAB is measured (and valued) by indicators which only cover the pure equipment aspect. Therefore all optimization is focused on making these equipment-centric indicators “look good”. However, equipment performance, and especially equipment going down unplanned, can have a massive impact on the WIP flow of the factory, and this impact on the FAB cycle time and, in the worst case, on on-time delivery is not always taken into consideration.
In all my years I have seen a few typical “strategies” to deal with that problem:
- “run to fail” least cost maintenance approach (from a pure maintenance cost point of view)
- “zero unscheduled downtime” as an overarching end goal – to be fully in control, possibly even at the expense of lower overall uptime
- “predictive maintenance” – interrupt the tool only when needed
While “run to fail” might be the easiest strategy to execute, it is also a completely reactive way of taking care of the equipment and often not good for the overall FAB performance. Aiming for almost no unscheduled downtime in the traditional way needs a very systematic and disciplined Preventive Maintenance program, which has been demonstrated to be doable but it comes at the expense of sometimes taking tools down for a scheduled maintenance, when it would be not really needed.
“Predictive Maintenance” seems to be the best solution of both worlds since it would only take an equipment down if it is really needed. The key here is to define “really needed” and to avoid running to fail. It would need to be detected early enough, so the needed action can be planned – for example “some time in the next few days” when it fits to the overall FAB and WIP situation.
I have seen papers and presentations about predictive maintenance for many years and it seems it was always the best thing to do.
The process and metrology equipment in a FAB represent a significant part of the total FAB investment and define to a large part what the overall FAB capacity and FAB speed will be. So one would assume that maintenance must have high priority, but M-ratio values in 2022 do not always support this assumption.
If you want to get more insight on the M-ratio indicator – a great read would be James P. Ignizio’s book
In chapter 8 of his book – titled
“Factory Performance Metrics: The Good, The Bad, and The Ugly”
M-ratio is discussed as well as other uptime related indicators.
Equipment Uptime and FAB speed, part 2
To illustrate which uptime pattern from the last post might be more favorable I will add some tool utilization to the same 3 charts:
the combined charts for the 3 scenarios are these:
This example assumes that the productive usage every single day is a flat 80% of the total time, which is a very optimistic assumption, since in many factories with moderate to high product mix the WIP arrival is highly variable for various reasons.
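To make the effect of the flat 80% usage a bit more tangible, here is a small Python sketch that compares the needed productive time with the available uptime day by day. Tool groups A and B follow the patterns from the last post (90% every day, and alternating 80% / 100%). The day-by-day numbers for tool group C are hypothetical, chosen only so that the average is again 90%; the names `DEMAND` and `daily_load` are made up for this example.

```python
DEMAND = 0.80  # productive usage needed per day, as a fraction of total time

uptime_patterns = {
    "A": [0.9] * 10,
    "B": [0.8, 1.0] * 5,
    # Hypothetical pattern: a few bad days, average uptime still 90%.
    "C": [1.0, 1.0, 0.6, 1.0, 1.0, 0.9, 0.5, 1.0, 1.0, 1.0],
}


def daily_load(uptime, demand=DEMAND):
    """Return (utilization of the available time, demand that could not be served)."""
    if uptime <= 0:
        return 1.0, demand
    utilization = min(1.0, demand / uptime)
    shortfall = max(0.0, demand - uptime)
    return utilization, shortfall


for name, days in uptime_patterns.items():
    loads = [daily_load(u) for u in days]
    peak_utilization = max(util for util, _ in loads)
    total_shortfall = sum(short for _, short in loads)
    print(f"tool group {name}: peak utilization {peak_utilization:.0%}, "
          f"unserved demand {total_shortfall:.1f} tool-days")
```

In this simplified view tool group A always keeps some slack, B has no slack at all on its 80% days, and C cannot serve the full demand on its bad days, so WIP piles up and has to be worked off later.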
In my opinion: the tool group with less variability in the uptime pattern is in general a preferable situation – with one big exception: If all of the tool group downtime is scheduled downtime and therefore could be planned. The downtimes could be – at least in theory – perfectly synchronized to the WIP arrival patterns, which would reduce significantly the impact of downtime on the WIP flow and therefore also on the cycle time.
There is a very interesting indicator out there that tries to measure exactly this: how much of the total downtime of a tool group is planned vs. how much is unplanned. It is actually a ratio:
M – Ratio ( or Maintenance ratio)
To calculate the M ratio of a tool group of interest – just sum up all scheduled down time hours and divide them by the unscheduled down time hours. The data collection timeframe needs to be big enough to capture all typical down events, so I recommend to use data at least from 3 months of history or more.
In the example above the result is 1, or in other words, this tool group has the same amount of scheduled and unscheduled down time. With respect to the idea of synchronizing downtime to WIP arrival, it would of course be good to have a better (higher) M ratio. For example: an M ratio of 2 would indicate that only 33% of the downtime is unscheduled.
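As a quick sketch, the M ratio calculation and the conversion into "share of unscheduled downtime" look like this in Python. The function names and the example hours are hypothetical; the only relationship used is M = scheduled hours / unscheduled hours, from which the unscheduled share follows as 1 / (1 + M).

```python
def m_ratio(scheduled_down_hours, unscheduled_down_hours):
    """Scheduled downtime divided by unscheduled downtime for one tool group."""
    if unscheduled_down_hours == 0:
        return float("inf")  # no unscheduled downtime at all
    return scheduled_down_hours / unscheduled_down_hours


def unscheduled_share(m):
    """Fraction of the total downtime that is unscheduled for a given M ratio."""
    return 1.0 / (1.0 + m)


# Hypothetical 3 months of data for one tool group, in hours.
m = m_ratio(scheduled_down_hours=120.0, unscheduled_down_hours=60.0)
print(f"M ratio {m:.1f} -> {unscheduled_share(m):.0%} of downtime is unscheduled")
```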
Before I dig deeper into the M ratio concept and how it can help, I would like to hear from the uptime experts out there: What are typical M ratio numbers your tool groups achieve ? Or should I ask: What M ratio numbers do your equipment teams achieve ?
Very curious to see the feedback. I will discuss the results in my next post
Equipment Uptime and FAB speed, part 1
I would like to resume the posts on the topic of FAB cycle time drivers. As mentioned in an earlier post – these are some of the key drivers for factory cycle time:
- overall factory size (number of equipment available to run a certain step)
- overall equipment uptime and uptime stability
- rework rate
- product mix
- number and lengths of queue time restricted steps in the process flow
- lot hold rate and hold times
- degree of automation of material transport
- degree of optimization in the dispatching / scheduling solution
I covered the factory size topic already – so here are a few thoughts on equipment uptime. I think everybody knows and agrees that equipment uptime is a very key parameter. Equipment Uptime has direct implications on the factory capacity and even in the most simple, Excel based capacity planning model, you will find a column for planned equipment uptime. But of course the impact on capacity is only one aspect.
One could ask: Capacity – at what factory cycle time ? Based on the earlier discussed operating curve, a FAB has different capacity at different cycle times. This is where equipment capacity comes into the picture. The ideal FAB achieves the planned uptime on each tool group in real life as well.
But what does it mean, if a tool group achieves its planned uptime ? We need to look a bit closer. Over what period of time does the tool group achieve for example 90% ? The capacity planners typically assume an average uptime number and somewhere in the fine-print you can find if this is meant to be for a 1 week, 4 weeks, 13 weeks or another time frame. For the real FAB these timeframes of interest are usually much shorter – if a few key tools are down right now for let’s say 2 hours – that might create already a lot of attention.
A deviation from the average planned uptime has the potential to impact the FABs cycle time. Assuming that the incoming WIP to a tool group is somewhat constant over time (which is already an optimistic assumption) higher or lower average uptime will result in higher or lower effective tool utilization and that means the wait time of lots will be different:
If we zoom in a bit more, tool groups with the same average uptime might have different impact on the lots wait time based on how the day to day, shift to shift or even hour to hour uptime looks like.
Below are 3 “constructed” uptime day to day cases to illustrate that.
Tool group A has every single day 10% downtime (red) and 90% uptime (yellow) – no big surprise that the average uptime is 90%
Tool group B has alternating days with 80% or 100% uptime – which will result in the same 90% average uptime for the full time frame
Tool group C has a very different downtime pattern, but the average will again be 90%. To convince yourself, visually take all the red blocks in excess of 10% and fill them into the 100% uptime days, and you get the picture from tool group A.
If your capacity planning team is using average uptime values for capacity planning, these 3 tool groups are treated exactly the same. For static capacity planning purposes this will be fine, but if you would also like to calculate/estimate/forecast the overall factory cycle time, these 3 tool groups will very likely impact the WIP flow differently and therefore the lot cycle time will be different as well.
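To make the point about identical averages concrete, here is a minimal Python sketch. The daily values for tool groups A and B follow the description above; the values for tool group C are invented for illustration, as is the choice of standard deviation and worst day as variability measures.

```python
from statistics import mean, pstdev

# Daily uptime in percent over 10 days. A and B follow the patterns described
# above; the exact values for C are invented for illustration.
uptime = {
    "A": [90] * 10,
    "B": [80, 100] * 5,
    "C": [100, 100, 70, 100, 100, 60, 100, 100, 100, 70],
}

def summarize(days):
    """Return average, day-to-day spread and worst day of an uptime series."""
    return {"avg": mean(days), "spread": pstdev(days), "worst": min(days)}

summary = {name: summarize(days) for name, days in uptime.items()}
for name, s in summary.items():
    print(f"tool group {name}: avg={s['avg']:.0f}%  spread={s['spread']:.1f}  worst day={s['worst']}%")
```

All three tool groups report the same average, so a planning model based on averages cannot tell them apart. Only the spread and the worst day differ.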
Once this point of general understanding is reached the obvious next questions are:
1. Which uptime pattern is better for my factory A, B or C ?
(better as in: enables more stable and lower cycle time)
2. How do I change the not so good ones to look more like the best one ?
I will discuss this a bit more in the next post.
Summertime (Blog) Blues
It has been a while, but summer in upstate NY is too beautiful to not enjoy it as much as possible. Downside is (at least for the blog) – I spend less time on the computer to write new posts.
So instead of posting about Factory Physics and Automation topics, I do more of this:
Later in fall there will be more time again for writing – for the time being just a very short post today.
There were 2 very nice articles written, referring to earlier posts from my blog and I think they are well worth the time reading:
Factory Size and how to benefit from it using advanced MES functionality – an article from Critical Manufacturing: LINK
Flexciton looks a bit more into the challenge of load port scheduling and load port utilization as a performance indicator: LINK
Enjoy the summer !
19th Innovation Forum for Automation
Last week I had the honor to present a keynote at the 19th Innovation Forum for Automation in Dresden, Germany. After 2 years of virtual conferences, this year it was a full in person event again.
Here are the slides of my talk:
Update: here are the video recordings of both days (these are 3-5h videos):
Wafer FABs – how many are there ?
Today only a super short post. If you have ever wondered, how many Semiconductor Wafer FABs are there – here is a great article on that topic from Daniel Nenni on SemiWiki: LINK
Fab cycle time and capacity planning
I did not discuss the results of the last poll yet. This post will focus on that.
Unfortunately, not many readers participated. The data is statistically on the weak side, but I think the outcome is in line with what I was expecting:
It seems that 70% of the voters use a rather simple method to define the maximum allowed tool group utilization. This matches with what I have experienced in a lot of FABs.
Given the massive implications the FAB capacity profile has on the FAB cycle time, it is surprising that in today’s heavily data-driven world more advanced methods are not used. I wonder what the reason for that is ? Here is some speculation from my end:
- not enough resources to manage the significant amount of data
- input data quality for capacity planning is limited due to grouping of products and averaged assumptions
- real factory performance data is highly dynamic and hard to forecast for the next 2…3 months
- planning scenarios change so frequently that more detailed planning would take longer than the time until the next scenario request rolls in
- decision makers are used to simple rules like a flat 85%, since these have worked to some extent for the last 20 years; more advanced methods are “black magic”, and capital-intensive decisions will not be based on “black magic”
- FAB cycle time is more a high level target, the operations/engineering department needs to figure out in the daily business how to get the cycle time down
- or maybe the FABs which use more advanced methods simply did not vote here
I would love to hear feedback on these topics.
Over the next weeks my posting activity will slow down a bit due to a lot of travel on my side. One special highlight is coming up with my visit in Dresden, Germany to participate at the
where I will have the honor to give a talk. Check it out here (LINK) and maybe we can meet in Dresden in person. After the event I will post the slides here.
US Semiconductor Ecosystem Outlook
I recently attended the Advanced Semiconductor Manufacturing Conference (ASMC) in Saratoga Springs, NY and listened to a very interesting presentation.
Bill Wiseman from McKinsey & Company spoke about the future of the US semiconductor ecosystem and a few fundamental challenges which will have significant impact.
Bill is a long-term insider of the semiconductor business and I got his permission to post his slide deck here:
View into a FAB
Happy Easter everybody !
Today only a very short post – since I’m discussing here normally how semiconductor FABs work in terms of cycle time and output, I thought why not have a quick look inside a FAB ?
This post was triggered by a recent YouTube post of an Intel FAB walk, which nicely explains what a modern FAB looks like. Please see below a few links to see inside a FAB:
Intel in Israel: LINK
Intel in US: LINK
Bosch in Germany: LINK
Micron in US: LINK
GlobalFoundries in Singapore: LINK
TSMC in Taiwan: LINK
Vishay (200mm) in Germany: LINK
Infineon (200mm) in Germany: LINK
Bosch (200mm) in Germany: LINK
Equipment load port utilization vs. FAB speed
I received an interesting comment on one of the older posts:
The topic is indeed very interesting. Most modern semiconductor processing equipment comes with 4 load ports. The main reason for that is to ensure the process chambers can be utilized as much as possible and do not sit idle during the exchange of lots. A simple 3 chamber tool with 4 load ports might look like this:
Individual wafers will be removed from the carrier on the load port and travel through the various equipment modules depending on the actual process recipe sequence. An example is shown below:
Let’s assume there are lots with 5 wafers processed on this tool. The wafer will sit in different chambers at different times. Below is a simplified example, which ignores the time a wafer spends in the transfer chamber for handling.
The reason why tools have more than 1 load port is illustrated in the picture below. For example: let’s assume load ports 2, 3 and 4 are down and only load port 1 can be used.
Lot 2 can only be loaded after lot 1 has finished and was unloaded from load port 1. This will lead to idle process chambers:
Having more than 1 load port available allows the next lot to be loaded while the 1st lot is still processing. These chamber idle times can therefore be prevented, since the 1st wafer from lot 2 can be processed immediately after wafer 5 from lot 1:
To load lot 3 early enough to prevent chamber idle times between lot 2 and lot 3, load port 1 is available again:
At least in the example above, 2 load ports would be more than enough to keep the process chambers busy all the time. Why do equipment vendors deliver most of the tools with 4 load ports ?
How many load ports are really needed depends on a lot of factors. In my basic example above I ignored most of them. As always the devil is in the detail, but here are some factors which influence the need for more load ports:
- number of process chambers on the tool
- process times of the individual chambers
- time a wafer spends for transfer between chambers
- wafer flow logic through the tool (serial, parallel)
- load port down times
In my experience most of the tools with lot processing times greater than 15 … 20 minutes can easily be fully utilized with 2 load ports, since there is enough time to transport the next lot to the tool, while the current lot is still processing.
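As a rough illustration of this rule of thumb, here is a small Python sketch. It rests on a simplified model that is my assumption, not a vendor formula: lots process one after another, and while one lot processes, each of the other load ports needs a fixed exchange time (unload, transport, load) before its next lot is ready. The function name and the 15 minute exchange time are hypothetical.

```python
import math

def load_ports_needed(lot_process_min: float, lot_exchange_min: float) -> int:
    """Smallest number of load ports that keeps the chambers fed.

    Assumed model: while one lot processes, every other load port can be
    emptied and reloaded. An exchange (unload, transport, load) takes
    lot_exchange_min, so (n - 1) * lot_process_min must cover it.
    """
    if lot_process_min <= 0:
        raise ValueError("lot process time must be positive")
    return 1 + math.ceil(lot_exchange_min / lot_process_min)

# Hypothetical exchange time of 15 minutes per lot
for process_min in (60, 20, 5, 3):
    print(process_min, "min lots ->", load_ports_needed(process_min, 15), "load ports")
```

Under these assumptions, lots with long processing times need only 2 load ports, while very short lot processing times push the requirement to 4 load ports or beyond.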
But let’s go back to the comment which initiated this post:
I completely agree, that having always all load ports loaded will lead to higher wait times. Here is the theory behind that. To illustrate the effect I will use a simple FAB with 4 process steps, running on 4 different process tools. Each process tool has 4 load ports and the lot process time is 1h on each step. Lots have 25 wafers each:
In this scenario – with all 4 load ports always loaded – these would be the factory performance data:
Let’s look at a second scenario, which only has WIP on 2 of the 4 load ports:
Since there is now significantly less WIP in the factory, the overall factory cycle time is much faster – at the same FAB output:
Here is my take on this: Having multiple load ports on a processing tool is for sure very beneficial since it will enable maximum possible equipment utilization. Having all tool load ports loaded with lots all the time is in most cases not needed to achieve maximum factory output and is clearly a sign of a relatively slow factory. As a matter of fact, one can easily estimate the overall factory X factor by this logic:
Average number of lots waiting per tool equals the FAB X factor.
This ignores that lots might have different numbers of wafers and different processing times for different products at different steps, but if a FAB always has all 4 load ports on all tools loaded with lots and possibly 2 more lots waiting in stockers, this FAB will not run faster than an X factor of 6.
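The rule of thumb can be checked with Little's law (cycle time = WIP / throughput) for the simple 4 step FAB from above. This is a minimal sketch: it assumes every tool is always busy, and it counts the lot currently in process as one of the lots sitting at the tool. The function and constant names are my own.

```python
def fab_cycle_time_hours(lots_per_tool: float, tools: int, lots_out_per_hour: float) -> float:
    """Little's law: cycle time = WIP / throughput."""
    wip_lots = lots_per_tool * tools
    return wip_lots / lots_out_per_hour

TOOLS = 4               # 4 process steps on 4 different tools
PROCESS_HOURS = 1.0     # lot process time per step
RAW_PROCESS_HOURS = TOOLS * PROCESS_HOURS
LOTS_OUT_PER_HOUR = 1 / PROCESS_HOURS   # each tool finishes one lot per hour

for lots_per_tool in (4, 2, 6):
    ct = fab_cycle_time_hours(lots_per_tool, TOOLS, LOTS_OUT_PER_HOUR)
    print(f"{lots_per_tool} lots per tool: cycle time {ct:.0f} h, X factor {ct / RAW_PROCESS_HOURS:.0f}")
```

The output per hour is identical in all three cases; only the amount of WIP, and with it the cycle time, changes.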
Another interesting fact is that there are a few tools (mostly with very fast processing times) where even 4 load ports are not enough to always feed the tools with wafers fast enough.
A last statement on the topic of load port utilization: I have seen in multiple cases that the manufacturing departments use load port utilization as a metric – mainly with the interpretation that idle load ports are “bad”.
I think this is driven by the general desire to have tools fully utilized and have enough WIP on the tool for the next hours, so even in case upstream tools have a problem or lot transportation is slow, the tool group of interest can still process “full steam”
Wafer FAB – size does matter !
In one of the earlier blog posts (LINK) I received interesting feedback on what “acceptable” FAB cycle times are. The results showed big differences and I think this is mainly based on the voters’ professional experience. There are a lot of factors which influence a factory’s capability to achieve a certain cycle time. If we assume 2 factories are running the exact same technologies and process flows but have different actual factory cycle times, the difference will not come from the process times of the lots, but mainly from different wait times. Key drivers for wait times are:
- overall factory size (number of equipment available to run a certain step)
- overall equipment uptime and uptime stability
- rework rate
- product mix
- number and lengths of queue time restricted steps in the process flow
- lot hold rate and hold times
- degree of automation of material transport
- degree of optimization in the dispatching / scheduling solution
Let’s dig into a few of them in more detail.
One of the biggest drivers for factory cycle time is the size of the factory itself. The key reason for this is that processing equipment does not have 100% uptime. In a very simple way: If a factory has only 1 tool to process a certain step and that tool is down, there is no path for the lots and they have to wait until the tool is back up. If there is more than 1 tool available, lots have a path to progress and there will be less waiting time. This effect can be seen very well with the help of operating curves:
Having more than one tool available to run lots will massively reduce the average lot wait time, if all other parameters of the tools are the same. Everyone in manufacturing knows this effect and for that very reason avoids having these “one of a kind” situations.
It also can be seen, that the effect going from 2 to 3 tools is smaller than going from 1 to 2 tools. I think a golden rule in capacity planning for semiconductor FABs is: “… avoid one of a kind tools as much as possible – or plan with very low tool utilization for these situations …”
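The shrinking benefit of each additional tool can be reproduced with a textbook queueing model. The sketch below uses the Erlang C formula for an M/M/c queue (random arrivals, exponentially distributed process times, identical tools, no downtime). This is a swapped-in model and not the one behind the operating curves in this post, so the absolute numbers will not match the charts. Only the trend is the point.

```python
from math import factorial

def x_factor_mmc(tools: int, utilization: float) -> float:
    """X factor (cycle time / process time) of an M/M/c queue.

    Textbook Erlang C model: random arrivals, exponential process times,
    identical tools, no downtime. This is NOT the model behind the charts.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    offered = tools * utilization                      # offered load in Erlangs
    tail = offered**tools / factorial(tools) / (1 - utilization)
    head = sum(offered**k / factorial(k) for k in range(tools))
    p_wait = tail / (head + tail)                      # Erlang C: probability a lot waits
    wait_in_process_times = p_wait / (tools * (1 - utilization))
    return 1 + wait_in_process_times

for tools in (1, 2, 3, 4, 10):
    print(f"{tools:2d} tools at 80% utilization: X factor {x_factor_mmc(tools, 0.8):.2f}")
```

Even in this idealized model, the step from 1 to 2 tools removes far more wait time than the step from 2 to 3, and a large tool set stays close to an X factor of 1 at the same utilization.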
For example if there is no way around a one of a kind tool and you still need to achieve cycle times around a X-factor of 3 – in the given setting – the maximum allowed tool utilization would be 44% !
The real interesting thing here is that of course each tool set’s operating curve is shaped differently and in order to understand the impact on the total factory cycle time, one needs to know and understand the operating curves of all tool sets in the factory. A second aspect to keep in mind is: How many times will a lot come back to a tool set – since this has big impact on the factory cycle times as well.
Example: The operating curve shows an X factor of 3 and the processing time at the tool group is 1 hour, which means there will be 2 hours of average wait time for each lot. Here is the impact on the overall factory cycle time based on the number of passes (the number of times this tool set is in the flow):
Look at the last column – the impact can be massive !
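The arithmetic behind this example is easy to sketch in Python. The pass counts below are picked for illustration and are not taken from the original table.

```python
def wait_hours_per_lot(x_factor: float, process_hours: float, passes: int) -> float:
    """Total wait a lot collects at one tool set over all its passes."""
    wait_per_pass = (x_factor - 1) * process_hours   # X factor = cycle time / process time
    return passes * wait_per_pass

for passes in (1, 5, 10, 20):   # pass counts chosen for illustration
    hours = wait_hours_per_lot(x_factor=3, process_hours=1.0, passes=passes)
    print(f"{passes:2d} passes: {hours:4.0f} h wait = {hours / 24:.1f} days")
```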
Having all these effects in mind, I think it is easy to feel “relatively safe” if a FAB has at least 3 or 4 tools available for each process step. In my opinion it is a much better situation than having 1 or 2 tools available, but often the “name plate capacity” number of tools available in a capacity planning model is one thing.
How many tools or chambers are really available (and not inhibited, temporarily disqualified or for other reasons not used) is often a different picture. Also, based on my experience, the actual number of tools available on the floor is seldom bigger than in the capacity (and cycle time) planning model.
improvement potential: Frequently check the real available number of tools vs. your capacity plan !!!
Back to the statement “… having 3 or 4 tools available is relatively safe …” – the positive effect of having more tools is of course also there at greater numbers of tools. As in the picture below, a tool set with 4 tools runs nicely at an X factor of 3, but look what happens to cycle time if we had significantly more tools:
This is the real reason why the big players in the semiconductor wafer FAB business build MEGA FABs. Having a very large number of tools in parallel allows a FAB to run at very fast FAB speeds and still utilize the expensive tools much more highly. Now take into account that tool pricing will also be lower if a FAB orders 10 tools instead of 2 or 3, and the whole thing becomes really desirable.
Of course, building a very large FAB requires a lot of upfront capital and also the expected demand needs to be big enough to fill a bigger FAB, but if you have the choice and are in doubt – always GO BIG, it will pay benefits for many years to come.
To close this post – I’m curious how the industry is dealing with the effects described above – specifically for planning purposes. In your capacity planning model: How do you define the maximum allowed tool utilization for a tool group ? I’m assuming that 100% planned utilization is not a legit assumption to avoid extremely high FAB cycle times. How is this modeled based on your experience ?
The value of 1 day of factory cycle time
Thank you to everyone who participated in the last poll. Participation was significantly down despite the fact that there were plenty of post viewers. My interpretation is that the readers are not too sure about the actual value of 1 day of cycle time. This observation is also in line with my personal experiences from working in semiconductor wafer FABs. It seems that everybody acknowledges that fast cycle time is a good thing and that it would be valuable to work on it, but there is no clear understanding of what the actual value is. The results of the poll itself look accordingly:
the same data sorted by the $ value:
The majority of voters pointed towards a few hundred thousand dollars but 33% said it is a million dollars or more !
I think one of the reasons why the real value of cycle time is not clearly defined is the lack of an accepted and standardized model for calculating, or at least estimating, it. I have seen a few different approaches from very simple to very complex and, what is worse, different models will generate different results, which does not really help to build confidence in the numbers.
One very simple model is the following:
If we look at our factory operating curve and assume we are running at our voters’ favorite operating point:
800 wafer starts per day at a X factor of 3 (or 60 days cycle time)
we can extrapolate the value of 1 day of cycle time by the following logic:
- 800 wafer starts per day = 800 x 365 = 292,000 wafers per year
- 292,000 wafers per year times $1,150 selling price = $335.8 million revenue
If we now use the factory’s operating curve and “look” to the left and right of the current operating point, we can do a very simple estimation of the value of 1 day of cycle time:
Since the operating curve is non linear there is a difference if we look towards lower or higher utilization – but if we assume only small changes around the current point – we can ignore this.
Towards higher utilization:
plus 50 wafer starts per day will lead to 20 days more cycle time or in a simple ratio:
2.5 wafers per 1 day of cycle time.
2.5 wafers x 365 days x $1,150 = ~ $1 million revenue
Towards lower utilization:
minus 100 wafer starts per day will lead to 20 days less cycle time or in a simple ratio:
5 wafers per 1 day of cycle time.
5 wafers x 365 days x $1,150 = ~ $2 million revenue
This is a big difference between the 2 numbers – but even if we use the smaller one to be on the safe side – $1 million is a serious number. Keep in mind, all the other benefits of faster cycle time are ignored in this simple model.
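The simple model above fits in a few lines of Python. The numbers are the ones used in this post; the function and variable names are my own, and like the model itself the sketch ignores yield and all other benefits of faster cycle time.

```python
DAYS_PER_YEAR = 365
SELLING_PRICE = 1_150          # $ per wafer
WAFER_STARTS_PER_DAY = 800

annual_revenue = WAFER_STARTS_PER_DAY * DAYS_PER_YEAR * SELLING_PRICE

def value_of_one_day(delta_starts_per_day: float, delta_cycle_time_days: float) -> float:
    """Annual revenue tied to 1 day of cycle time, from a local slope of the operating curve."""
    starts_per_day_of_ct = delta_starts_per_day / delta_cycle_time_days
    return starts_per_day_of_ct * DAYS_PER_YEAR * SELLING_PRICE

print(f"annual revenue:             ${annual_revenue:,.0f}")
print(f"towards higher utilization: ${value_of_one_day(50, 20):,.0f} per day of cycle time")
print(f"towards lower utilization:  ${value_of_one_day(100, 20):,.0f} per day of cycle time")
```

Reading the two slopes from your own operating curve and plugging them in gives a first order estimate for any other factory.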
Another model – significantly more complex – which takes into account:
- revenue gain due to faster cycle time versus falling selling price
- revenue gain due to faster yield learning
It was developed by professor Robert Leachman. He teaches this method at the University of California, Berkeley. The complete coursework can be found here: LINK
I will not dig in more into the “fascinating world” of models to calculate the value of cycle time – instead will discuss a bit more the practical application of the value of speed.
Clearly the value of speed depends also on the overall market situation. In very high demand situation customers might be willing to tolerate higher cycle times, if they just can get enough supply. Factories tend to start more wafers in these conditions and simply “cash in”.
Still, if the engineering team could implement measures to reduce the factory cycle time by let’s say 1 day, management could “use” this gained 1 day of FAB capability to start a few more wafers – driving the FAB back to the previous speed, but delivering more wafers = more revenue.
In this scenario the question is:
1 day of cycle time is worth $1 million. How much will the engineering team be allowed to spend to enable this 1 day of cycle time reduction ?
This comes down to how ROI is handled in the company, but there is a path to calculate this. It makes it possible to justify spending on cycle time with a model that supports the decision whether a measure or change is worth implementing or not.
In my next post I will start discussing a few more details around the operating curve and most important – what can be done to improve.
Chip shortage and FAB performance, part 3
Today only a very short post !
Very interesting poll results ! Of course the poll left a lot of things open to free interpretation and assumptions, but as expected voters had different opinions. Here is the feedback chart:
The same data in the context of the operating curve:
If I ignore the outliers at the 250 and 500 wafer starts per day mark the largest group of voters would trade
1 X factor of FAB speed for 100 additional wafer starts per day
and start 800 wafers per day instead of 700. Let’s try to convert this into more understandable numbers:
1 X factor = 1 raw process time – for a “typical FAB” this could mean anything between 10-25 days of FAB cycle time. On the other hand, what do +100 wafer starts per day mean financially ?
Let’s assume there is an average profit of $150 per wafer. An additional 100 wafer starts per day would add up to 100 x 365 = 36,500 more wafers per year or $5,475,000 more profit per year. This seems like an absolute no brainer.
How about another $5.5 million of profit if we go to 900 wafer starts per day, or $11 million additional profit in total ? Now the cycle time penalty looks very different. We would pay with 4 X factors or 40 – 80 days more FAB cycle time – still a no brainer ?
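The profit side of this trade is simple arithmetic. As a quick sketch in Python, using the $150 per wafer from above and 700 wafer starts per day as the baseline (the function name is my own, and yield losses are ignored):

```python
PROFIT_PER_WAFER = 150   # $ average profit per wafer, as assumed above
DAYS_PER_YEAR = 365

def extra_profit_per_year(extra_starts_per_day: int) -> int:
    """Additional yearly profit from starting more wafers per day."""
    return extra_starts_per_day * DAYS_PER_YEAR * PROFIT_PER_WAFER

for starts in (800, 900):
    extra = extra_profit_per_year(starts - 700)   # 700 starts/day is the baseline
    print(f"{starts} starts/day: +${extra:,} per year")
```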
Here is a table for how this would look – assumption FAB raw process time = 20 days
I think it comes down to the famous question:
What is the value of FAB cycle time – the value in $$ ?
Based on the table above it seems like shorter FAB cycle times are not really desirable, but at some point customers will turn away to order wafers from someone else – if lead times between order placement and actual wafer delivery are too long …
Experienced wafer FAB practitioners know that shorter overall factory cycle times can have a lot of positive effects:
- faster learning cycles to improve yield
- lower overall FAB WIP – lower overall inventory cost
- lower overall FAB WIP – lower risk of excursion impact
- faster detection of possible process issues
- faster reaction capability on demand / product mix changes
- likely better on-time delivery for low volume products
The question really is: Up to which point is it more beneficial to run the FAB faster (lower cycle time), because the benefits of being fast outweigh the benefits of higher profit due to higher FAB output ?
I cannot resist putting up another poll here to see what the readers think is the value of 1 day of shorter (or longer) FAB cycle time. For our example factory above – running at these parameters:
- 800 wafer starts per day
- X factor of 3 or 60 days total FAB cycle time
- about 47,500 wafers of total FAB WIP
- 98% manufacturing yield (wafer scrap based yield)
- 90% die yield (electrical yield)
- cost per wafer of $1,000
- selling price per wafer of $1,150
- selling price more or less stable (chip shortage driven)
- high product mix in the FAB (more than 250 different products, running on more than 100 routes)
I can’t wait to see the results. To give a few more readers a chance to vote, I will keep this poll open for about 3 weeks – so the next post will be some time end of February.
Chip shortage and FAB performance, part 2
Reflecting on the wide spread of acceptable wait times, and therefore acceptable FAB cycle times, in the poll results, I was wondering: Why do people have these different opinions ? I think it has to do with the actual factory conditions the individual voters have experienced in their professional careers.
To have fast cycle times is an obvious goal, just how fast is “good” or possible ? The expectation must be influenced by real world experience, else everyone would have voted for the “less 30 minutes” bucket.
This leads to the question: Why do different FABs have different cycle times or different X factors ?
Absolute FAB cycle times are of course depending also on the raw processing times (RPT) of the products in the factory. For example:
|           | RPT     | wait time | cycle time | X factor |
|-----------|---------|-----------|------------|----------|
| Factory 1 | 10 days | 20 days   | 30 days    | 3        |
| Factory 2 | 20 days | 40 days   | 60 days    | 3        |
If we just look at the absolute cycle time, it seems that factory 2 is much slower, but in terms of wait time compared to processing time (aka X factor) both factories perform similarly.
To really be able to “judge” or compare FAB speeds, cycle times or X factors need to be normalized to the overall factory loading or factory utilization. To explain why this is important I will use the picture of a 3 lane highway.
Imagine you use this highway for your daily commute to work and let’s assume these basic data:
- distance from your home to work = 30 miles
- speed limit on the highway = 60 miles per hour
- you are not driving faster than the speed limit
- your “raw driving time” = “raw processing time” = 30 minutes
- the highway (the factory) is the same every day: it has 3 lanes and a speed limit of 60 mph
Let’s try to answer this question: How long does it take to get to work?
I think everybody will agree that there will be very different driving times (cycle times) for the different days and times – all happening on the exact same highway (factory). The difference is the utilization of the highway. Now let’s assume the same highway, but we throw in a lane closure, which actually means the highway now has reduced capacity:
The table below shows some assumed drive times (think cycle times):
The point of this example is that the drive time on the highway depends on how much the highway is utilized. Also important: the highway capacity has an impact on the highway utilization and therefore on the drive time as well.
If we plot the data points in a chart it will look like this:
If we translate this picture into a semiconductor wafer FAB, there are a few interesting points to note:
- the FAB itself has a certain capacity
- the capacity of the FAB will not be stable if things like number of tools, tool uptime or product mix change
- the utilization of the FAB is a result of a decision – how many wafers to start
- the very same factory can have completely different cycle times depending on the FAB utilization
I personally think this behavior, which is famously known as the operating curve, is one of the biggest challenges in the semiconductor manufacturing world (assuming that process stability and yields are under control).
Each FAB has such a curve describing the factory’s ability in terms of what average cycle time can be achieved at which utilization level. Very important: the operating curves of different factories are very likely to differ in shape.
The factory operations management team has “only” 3 tasks here:
- to know what the FAB’s operating curve looks like (aka: what cycle time can be expected at which FAB loading or utilization level)
- make a decision, how many wafers to start to achieve a desired cycle time and FAB output level
- execute daily operations and constantly improve the factories operating curve
To close today’s post, I would like to ask again for your input. If you were the FAB manager of the factory below, what would be your wafer starts decision – assuming you have enough orders to start even 1000 or more wafers per day ?
Results will be discussed in the next post.
Chip shortage and FAB performance, part 1
To be open: I could not resist using the trendy “chip shortage” term to generate some interest. Everything I will discuss in this post series is of course fully applicable even in times without a chip shortage.
Let’s start with the results of my last poll:
The spread of the answers is bigger than I expected, but it makes sense to some extent. Let’s chart the same data in a different way, sorted by the wait time buckets:
What this means is: For the same assumption of a “fully loaded FAB”, wait times ranging from below 30 minutes up to more than 4 hours are seen as acceptable. Let this sink in …
How does it impact FAB performance ? It will result in significantly different total factory cycle times.
In order to illustrate that, let me put a few assumptions down to estimate what these wait times really mean:
- about 80% of all steps of a product flow typically fall into the category “processing time 30 – 60 min.”
- the remaining 20% of the steps are shorter or longer – let’s assume it will average out to 30 – 60 min. as well
- for the estimation I set the 30 – 60 minutes range to a fixed 45 minute processing time
The cycle time of a single step in the product flow will be always calculated as (ignoring any lot on hold times):
Based on that we can easily calculate the cycle time of a step, given different wait times. For the wait times from my poll it would look like that:
Another very common indicator to measure and compare cycle time is “X factor”.
Here is the definition of “X factor”:
The same cycle time table from above, now including the X factor:
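Both definitions are simple enough to sketch in Python. I am assuming here that the step cycle time is wait time plus processing time (hold times ignored, as stated above) and that the X factor is cycle time divided by processing time. The wait times in the loop are example values, not the exact poll buckets.

```python
PROCESS_MIN = 45   # fixed processing time per step, as set above

def step_cycle_time_min(wait_min: float) -> float:
    """Cycle time of one step: wait time plus processing time (no hold time)."""
    return wait_min + PROCESS_MIN

def x_factor(wait_min: float) -> float:
    """X factor: cycle time divided by processing time."""
    return step_cycle_time_min(wait_min) / PROCESS_MIN

for wait_min in (30, 60, 120, 240):   # example wait times, not the exact poll buckets
    print(f"wait {wait_min:3d} min: cycle time {step_cycle_time_min(wait_min):3.0f} min, X factor {x_factor(wait_min):.2f}")
```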
The true implication of the differences in what is an acceptable wait time comes to light if we scale this up to full factory level. For illustration purposes let’s assume the following FAB parameters:
- typical products have 40 mask layers
- average of 15 steps per mask layer or 40 x 15 = 600 steps in the flow
- basic assumption of 45 minutes average process time per step (as discussed above)
With these input parameters the total acceptable cycle time of this FAB would look like this:
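Scaling the step level numbers up to the full flow can be sketched like this. The FAB parameters are the ones listed above; the wait times are again example values and not the exact poll buckets, and every step is assumed to have the same wait and process time.

```python
MASK_LAYERS = 40
STEPS_PER_LAYER = 15
PROCESS_MIN = 45

def fab_cycle_time_days(wait_min_per_step: float) -> float:
    """Total cycle time if every step has the same wait and process time."""
    steps = MASK_LAYERS * STEPS_PER_LAYER          # 600 steps in the flow
    total_min = steps * (wait_min_per_step + PROCESS_MIN)
    return total_min / (60 * 24)

print(f"raw process time: {fab_cycle_time_days(0):.1f} days")
for wait_min in (30, 60, 120, 240):   # example wait times, not the exact poll buckets
    print(f"wait {wait_min:3d} min per step: {fab_cycle_time_days(wait_min):5.1f} days")
```

With 600 steps in the flow, every additional hour of average wait per step adds 25 days of cycle time.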
Different factories with different “acceptable wait time” assumptions would have cycle times that differ by multiple months for the same type of product.
I’m very sure, that FAB management with actual 80 days cycle time would really love to get down to 50 or 40 days – not to talk about 30 days. The magic question is: How ?
In my next post I will start looking into that.
Bottlenecks – download
here is the full bottleneck series as downloadable PDF file:
Bottlenecks, final part
Happy New Year !
This will be the last part of the Bottleneck discussion. As mentioned in part 3 – I think the most objective and telling indicator to see what is the true factory bottleneck is:
highest average lot wait time at a tool group
Wait time, or cycle time in general, is one of the very few indicators which can not be easily manipulated or “adjusted” by using different methods of calculation or aggregation. Time never stops, and the wait time is simply the time between the moment a lot logically arrives at a step and the moment it starts processing at the step (on an equipment). These are 2 simple time stamps which are typically recorded in the MES of the factory. For example:
lot arrived at the step: 01/02/21 4am
lot started processing: 01/02/21 10am
The wait time of the lot is super simple –> 6 hours.
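In code this is a single subtraction. A minimal Python sketch, with variable names of my own choosing rather than fields of any particular MES:

```python
from datetime import datetime

# The two MES time stamps from the example, read as January 2, 2021
# (the month/day order is an assumption; the result is the same either way).
arrived_at_step = datetime(2021, 1, 2, 4, 0)
started_processing = datetime(2021, 1, 2, 10, 0)

wait = started_processing - arrived_at_step
wait_hours = wait.total_seconds() / 3600
print(f"lot wait time: {wait_hours:.0f} hours")
```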
The beauty of this metric is that no other information is needed – just these 2 time stamps. It will cover any possible reason why the lot waited 6 hours – no matter what:
- equipment was not available due to down time
- equipment was not available since it was busy running another lot
- lot was not started due to missing recipe
- lot was not started due to no operator available
- lot was not started since operator chose to run another lot
- lot was not started due to too much WIP in time link zone
- lot was not started due to schedule had it planned starting at 10am
- lot was not started due to … “name your reason here”
One key part of the FAB Performance metrics – as discussed in part 2 – is:
- deliver enough wafers in time –> customer point of view –> cycle time of the FAB
In other words once the decision was made to start a lot into the factory it has some kind of target date/time by when this lot needs to be finished or shipped. Any wait time is by nature now a “not desired state” especially if the wait time is “very long”. That means tool groups which generate the highest average lot wait time will be very likely the biggest problem or bottleneck.
Let’s have a look at some example data to illustrate that:
The chart above shows the average lot wait times per step of our complete factory. Some steps have 1h wait time, others have up to 6h.
Since this chart shows the data by step in the order of the route or flow, it does not immediately tell which tool groups the various steps are running on.
The same data – including the tool group context – will tell this better:
If we now aggregate and sort this by tool group instead of step we have our bottleneck chart:
From this chart tool group 7 clearly has the greatest average lot wait time of all tool groups. An interesting version of this chart is the “total wait time contribution” chart which shows the sum of the individual step wait times.
For example tool group 7 has 3 steps in the route and on average a lot waits 6h at each of these steps. If we plot the same data as a “total wait time contribution” chart, we do not average the wait times of the individual steps but add them: tool group 7 will show 6h + 6h + 6h = 18h of total wait time for each lot.
Note that the sort order of the tool groups is now different. For example tool group 1, which on average has the lowest wait time (1h), is now ranked as number 4. From an overall “is this tool group a problem for the factory ?” point of view I say no, since lots are barely waiting there; it just happens that tool group 1 has a lot of steps in the flow. I strongly lean towards the average chart for the overall definition of the FAB bottleneck, but recommend always having a look at the cumulative chart as well.
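The two aggregations can be sketched in a few lines. The step records below are made-up data (only the 3 steps at 6h for tool group 7 and the 1h average for tool group 1 follow the text), so the resulting ranks differ from the charts in this post.

```python
from collections import defaultdict

# Hypothetical (step, tool_group, average wait in hours) records.
step_waits = [
    (1, "TG1", 1.0), (2, "TG7", 6.0), (3, "TG1", 1.0), (4, "TG3", 4.0),
    (5, "TG1", 1.0), (6, "TG7", 6.0), (7, "TG1", 1.0), (8, "TG3", 4.0),
    (9, "TG1", 1.0), (10, "TG7", 6.0), (11, "TG1", 1.0), (12, "TG1", 1.0),
    (13, "TG1", 1.0), (14, "TG1", 1.0), (15, "TG1", 1.0),
]

def wait_by_tool_group(records):
    """Return {tool_group: (average_wait, total_wait)} over all its steps."""
    waits = defaultdict(list)
    for _step, tool_group, wait_hours in records:
        waits[tool_group].append(wait_hours)
    return {tg: (sum(w) / len(w), sum(w)) for tg, w in waits.items()}

summary = wait_by_tool_group(step_waits)

# Bottleneck chart: sorted by average wait time per step.
by_average = sorted(summary, key=lambda tg: summary[tg][0], reverse=True)
# Total wait time contribution chart: sorted by the sum over all steps.
by_total = sorted(summary, key=lambda tg: summary[tg][1], reverse=True)

print("by average:", by_average)
print("by total:  ", by_total)
```

In this made-up data set tool group 1 is last by average wait time but moves up in the total contribution ranking, simply because it has many steps in the flow.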
In part 2 of the Bottleneck blog series I discussed the “Factory Utilization Profile chart”. I think this chart, enhanced with the wait time data from above, will give the “complete view” of what is going on in the factory and will spark enough questions to dig in deeper at the right tool groups.
The chart below shows the data sorted by the highest average cycle time:
An obvious question is: Why is there so much wait time on tool group 7 at such low utilization ? Or asked differently: half of the time the tool group is idle, so why do lots wait on average for 6 hours ?
Or another one: How is tool group 1 able to achieve such low wait time ?
At this point I would like to stop for a second and point you to an excellent source of additional discussion on the topic of bottlenecks and cycle time:
If you subscribe to the newsletter, you will have access to past editions as well !
Let me get back to the statement: Any wait time is by nature now a “not desired state” especially if the wait time is “very long”
Given the nature of the wafer FAB the ideal case of zero wait time at all steps is not very realistic since there are too many sources of variability in a factory. Therefore experienced capacity planners and production control engineers typically set an expected wait time target per step (and therefore by tool group). Using these expected wait times, the definition of “very long” becomes easier.
For example if
tool group A has an expected wait time of 2 hours
tool group B has an expected wait time of 5 hours
An actual achieved wait time of 6 hours would be kind of tolerable on tool group B but clearly seen as very high on tool group A.
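One simple way to make “very long” comparable across tool groups is the ratio of actual to expected wait time. This is only a sketch of that idea; the dictionary names are hypothetical and the numbers are the ones from the example above.

```python
# Expected (target) and actually achieved wait times in hours.
expected_wait = {"A": 2.0, "B": 5.0}
actual_wait = {"A": 6.0, "B": 6.0}

# Ratio > 1 means lots wait longer than planned; the larger, the worse.
wait_ratio = {tg: actual_wait[tg] / expected_wait[tg] for tg in expected_wait}

for tool_group, ratio in sorted(wait_ratio.items(), key=lambda kv: kv[1], reverse=True):
    print(f"tool group {tool_group}: {ratio:.1f}x the expected wait time")
```

The same 6 hours are 3 times the target on tool group A but only 1.2 times the target on tool group B.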
Setting expected wait times per step and/or tool group depends on a lot of parameters, like:
- planned tool group utilization
- number of tools in the tool group
- duration of process time
- batch tool / batching time
- lot arrival time variability
- many others
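Several of these parameters appear in a textbook queueing approximation for the average wait time in front of a tool group (Kingman's equation with the usual multi-server extension). The sketch below is that general approximation, not the target-setting method of any particular factory; the function and parameter names are assumptions, and it ignores batching, time links and many of the “other” effects.

```python
import math

def expected_queue_time(utilization, process_time_h, n_tools=1, ca2=1.0, ce2=1.0):
    """Approximate average wait time (hours) in front of a tool group.

    utilization    -- planned tool group utilization, 0 <= u < 1
    process_time_h -- average process time per lot in hours
    n_tools        -- number of identical tools in the tool group
    ca2, ce2       -- squared coefficients of variation of the lot
                      inter-arrival times and the process times
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    variability = (ca2 + ce2) / 2.0
    util_term = utilization ** (math.sqrt(2.0 * (n_tools + 1)) - 1.0) / (
        n_tools * (1.0 - utilization)
    )
    return variability * util_term * process_time_h

# Same utilization and process time, different number of tools.
for tools in (1, 2, 4):
    wait = expected_queue_time(0.9, 1.0, n_tools=tools)
    print(f"{tools} tool(s) at 90% utilization: about {wait:.1f} h wait")
```

Even this rough model shows why one expected wait time for all tool groups does not fit: at the same utilization, a tool group with more tools or less variability should be expected to wait noticeably less.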
I’m curious what the readers of this blog think would be an acceptable average wait time for non bottleneck steps in a fully loaded factory.
Let’s assume that most steps in the factory have processing times of 30 – 60 minutes, running on non-batch tools, and the factory is fully loaded = the capacity planners tell you that you cannot start more wafers. What would be an acceptable average lot wait time for these steps in your opinion ?
Please vote below, what you would see as good / o.k. / acceptable:
I will share and review the results in my next post.
Bottlenecks, part 3
Merry Christmas and Happy Holidays !
I hope everybody is having a good time with friends and family and, after a lot of good food, is ready to sit down and discuss more details about factory bottlenecks. In today’s post I will start zooming in on the 3 metrics that are not grayed out in the poll results picture below:
To disclose my personal opinion upfront: I think that “highest average lot wait time” (or metrics that are derived from this) is the most objective way to measure and define what is the true factory bottleneck. But let’s discuss all 3 of the metrics a bit.
highest miss of daily moves vs. target
I think every factory in the world is measuring and reporting in some way the number of “Moves” – the number of wafers which were processed/completed on a step in a day, a shift, an hour, for the whole FAB or departments and down to individual process flow steps and grouped by equipment or equipment groups.
“Moves” is a very attractive and popular metric for a lot of reasons:
- Moves can be easily measured and aggregated in all kind of reporting dimensions
- based on the numbers of steps in a process flow (route) it is clear, how many Moves a wafer needs to complete, to be ready to be shipped
- Moves is a somewhat intuitive metric – humans like to count
- target setting seems to be pretty straightforward – “more is better”
I personally think that measuring a FAB via “Moves” as the universal speedometer can be very misleading and might drive behaviors which are actually counterproductive for the overall FAB performance. At the very least, a well thought through and dynamic target setting is needed to steer a factory which is mainly measured by the number of Moves. The danger of Moves as the key metric might be smaller in fully automated factories, since the actual decision making is done by algorithms which usually incorporate a lot of other metrics and targets. There, Moves are more an outcome of the applied logic and less an overarching input and driver.
In manually operated factories, where operators and technicians decide which lot to run next and on what equipment, a purely Moves driven mindset can do more harm than good to the overall FAB performance.
I think a lot has been written and published on this topic and there are strong and different schools of thought out there, but I’m fully on board with James P. Ignizio’s view in his book
In chapter 8 of his book – titled
“Factory Performance Metrics: The Good, The Bad, and The Ugly”
“Moves” get a nice talk – in the “Bad and Ugly” department – for the very reason, that Moves can drive counter productive behavior. If you are interested in this topic – I strongly recommend reading the book.
Before I jump to the next metric, I just want to say that I think Moves are important to understand and are a useful indicator if used within the right context, but not “blindly” as the most important indicator which drives all decision making.
highest amount of WIP behind a tool group
Almost one third of the voters picked this metric. Similar to Moves there are a lot of advantages to measure WIP:
- WIP can be easily measured and aggregated in all kind of reporting dimensions
- using “Little’s law” it is easy to define WIP targets
- WIP is a very intuitive metric, especially in manual factories – is my WIP shelf full or empty ?
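Little's law states that average WIP = throughput × cycle time. A WIP target per step or tool group can be sketched from that as below; the function name and the example numbers are assumptions for illustration only.

```python
def wip_target_lots(throughput_lots_per_hour: float, target_cycle_time_hours: float) -> float:
    """Little's law: average WIP = throughput x cycle time."""
    return throughput_lots_per_hour * target_cycle_time_hours

# Hypothetical step: 2 lots per hour need to pass through, and the
# target cycle time (wait + process) at this step is 3 hours.
target = wip_target_lots(2.0, 3.0)
print(f"WIP target: {target} lots")
```

Read the other way round, the same relation tells how much cycle time a given WIP level costs at a given throughput.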
In general, for daily operations, having a lot of WIP is seen as problematic, since it might lead to lots not moving, starvation of downstream steps and tools, and long lot wait times before they can be processed. So high WIP is not a desirable status and very high WIP must for sure be a problem. I think here as well: it depends. For example, it depends on what the target WIP is for the given context (like a tool group). Just trying to lower the WIP as much as possible (“at all cost”) might generate WIP waves in the factory and lead to underutilization and lost capacity.
Why do I not 100% subscribe to “highest WIP = the bottleneck” ? Simply because the tool group with the highest WIP does not necessarily have the worst impact on the FAB performance. Here are some data points for this:
Let’s assume we have a very small factory running a very short route – with only 30 steps. If we plot a chart showing the WIP (in lots) per step for each step and sort the steps in the order of the process flow – meaning lot start on the very left and lot ship on the very right – we get what is typically called a line profile chart.
In the picture below our factory is perfectly balanced (if we define balanced as lots per step, another great topic to talk about), because on each step there are currently 3 lots waiting or processing.
If we look a bit closer, different steps are of course processed on different tool groups. If we add this detail, the same factory profile looks like this:
For example tool group 2 has 2 steps in the flow and tool group 9 has 3 steps. Our bottleneck metric is the aggregation of the WIP by tool group (“highest WIP behind a tool group”). To find out which tool group this is, we simply aggregate the same data from the line profile by tool group instead of per step:
Tool group number 1 has the highest WIP of all tool groups in this FAB, so it clearly must be the number 1 bottleneck ? I do not think so. As discussed earlier, more context is needed. For example, if tool group 1 is a scrubber process, which typically appears in the flow many times and is an uncomplicated, very fast process, having the overall highest number of lots there is not necessarily the biggest problem of the factory. Yes, one can argue it would still be nice to have less WIP sitting at a scrubber tool set, but this is already part of the missing context I mentioned earlier.
Measuring and reporting WIP is an absolute must in a semiconductor factory, but interpreting WIP levels and assigning them attributes like “high”, “normal” or “low” needs a very good reference or target value. Setting WIP targets should be done via math and science, to reflect what is the overall factory desired WIP distribution – in order to achieve the best possible FAB performance.
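The aggregation from line profile to “WIP by tool group” can be sketched as follows. The assignment of the 30 steps to tool groups is made up for illustration; only the 3 lots per step, the 2 steps of tool group 2 and the 3 steps of tool group 9 follow the example above.

```python
from collections import Counter

LOTS_PER_STEP = 3  # perfectly "balanced" line: 3 lots at every step

# Hypothetical tool group for each of the 30 steps, in route order.
route_tool_groups = [
    1, 2, 3, 1, 4, 5, 9, 1, 6, 7,
    2, 1, 8, 3, 9, 4, 1, 5, 10, 6,
    7, 3, 1, 9, 8, 4, 5, 6, 7, 10,
]

# Aggregate the line profile by tool group instead of per step.
wip_by_tool_group = Counter()
for tool_group in route_tool_groups:
    wip_by_tool_group[tool_group] += LOTS_PER_STEP

# Highest WIP first, as in the "WIP behind a tool group" chart.
for tool_group, lots in wip_by_tool_group.most_common():
    print(f"tool group {tool_group}: {lots} lots")
```

Because every step holds the same number of lots, the ranking only reflects how often a tool group appears in the route, which is exactly why it needs a target value before it says anything about a bottleneck.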
Before I close this topic for today, let me say: my simple “perfectly balanced” line from the pictures above might not be balanced at all if we incorporate things like:
- different steps / different tool groups have very likely different capacities
- different raw processing times
- might be batch or single wafer tools
- might sit inside a nested time link (queue time) chain
At this point I will pause and hope that I could stimulate some thinking and of course would love to hear feedback from the readers out there. The next post will be fully dedicated to the last open metric …
Bottlenecks, part 2
A big thank you to everyone who voted in my little poll, here are the results:
I kind of expected a picture like this – but what does this mean ? Here is my interpretation:
Bottlenecks are widely known as the one thing one should work on 1st to improve the overall FAB performance. But it seems we have different opinions how to measure and therefore to define what is the bottleneck.
For a real existing FAB, that would mean if different people or groups use a different definition, they would very likely identify different tool groups as the bottleneck – for the very same factory ! Of course we did not yet discuss what type of bottleneck we are talking about: a short term current one, a long term planned bottleneck or any other definition. Nevertheless people would identify very likely different tool groups as the key FAB problem …
Before we discuss this a bit more, I think we need to clarify what is the meaning of “bottleneck for the FAB”. In my opinion the purpose of a FAB is to make money and in order to do this wafers need to be delivered to customers in a way that the overall cost is lower than the selling price. Selling price also means one needs to have someone to sell them to – the CUSTOMER. For the purpose of this bottleneck discussion I exclude topics like yield and quality, assuming these are “o.k. and in control”. I will just focus on the 2 other key metrics for “FAB performance”:
- deliver enough wafers in time –> customer point of view –> cycle time of the FAB
- manufacture enough wafers –> total cost / manufactured wafers –> cost per wafer –> FAB output
So in my opinion, a bottleneck is a tool or tool group which negatively impacts the cycle time of the FAB and therefore the FAB output in general, but more specific the output of the right wafers (products) for the right customers at the right time (aka on-time delivery)
With that in mind, I think we need to define the metric in a way that it measures the impact on these 2 parameters. In a semiconductor FAB the typical unit to track wafer progress through the line is a “lot”. Hence, in order to measure how well or badly a tool group impacts the flow of lots through the line, we need to look at a lot-related indicator. This disqualifies the grey marked ones in the picture below and leaves us with 3 potential candidates:
Let’s have a look at the greyed out metrics.
highest planned tool group utilization
It is very tempting to pick this metric since very high tool utilization signals, to some extent, that we might reach capacity limits soon. It is also widely known that tool groups with high utilization tend to generate high cycle times. So there is a good chance that the true FAB bottleneck has a high or the highest utilization, but there is no guarantee that this is the case. This also very much depends on the overall utilization profile of the factory.
Another interesting topic to discuss in a future post is: What do “high” utilization and “high” cycle time mean ? Similarly, how to define “FAB capacity” is something I will also discuss in a later post.
highest actual tool group utilization
Everything I wrote above for the planned utilization is valid for the actual utilization as well. I would just like to add at this point that comparing actual and planned tool group utilization should be a frequent routine, to understand how closely the capacity model is able to follow the actual FAB performance, or should I say how closely the actual FAB is able to follow the capacity model ? You guessed it, an interesting topic for another post …
Before we move on to the next metrics, I would like to spend a few thoughts on the topic of the factory utilization profile. The factory utilization profile is a chart of all tool groups showing their average utilization (planned or actual, for a selected time frame like the last 4 or 8 weeks), sorted so that the tool group with the highest utilization is on the left and the one with the lowest utilization is on the right. A theoretical example is shown below:
Different factories will have different utilization profiles. Even the very same factory will have different utilization profiles over time if things like wafer starts, product mix, uptime or cycle time change. So I always thought it is a very good idea, to keep an eye on that and also compare the profile planned data vs. actual data. An example of comparison (with dummy data) is below.
For example: Look at tool group number 3 ! How likely is it that #3 will become a problem in FAB A vs. in FAB B ?
I think you get the general idea, but there is much more interesting stuff to read out of FAB utilization profiles. Before we go there: have you lately checked / seen your FAB’s utilization profile ?
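For readers who want to build such a profile from their own numbers, here is a minimal sketch. The tool group names and utilization values are dummy data, and the variable names are assumptions.

```python
# Dummy average utilization per tool group over the selected time frame.
planned = {"TG1": 0.85, "TG2": 0.70, "TG3": 0.92, "TG4": 0.55}
actual = {"TG1": 0.80, "TG2": 0.78, "TG3": 0.88, "TG4": 0.50}

# Utilization profile: highest planned utilization on the left.
profile = sorted(planned, key=planned.get, reverse=True)

# Planned vs. actual: a positive gap means the tool group is loaded
# higher than the capacity model assumed.
gaps = {tg: round(actual[tg] - planned[tg], 2) for tg in profile}

for tg in profile:
    print(f"{tg}: planned {planned[tg]:.0%}, actual {actual[tg]:.0%}, gap {gaps[tg]:+.2f}")
```

Plotting both series in the planned sort order makes it easy to spot tool groups where plan and reality drift apart.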
most often discussed tool group
This metric has some advantages, since it does not focus on one specific indicator, and if a tool group is very often in focus it for sure has some problematic impact on the overall line performance. I would rather choose a metric based on real data, but for FABs with less developed automatic data generation and data analytics capabilities it is a usable starting point. What I also like about this approach: once used for some time, it will inherently drive the demand for a more data-based approach to find out why a tool group is discussed so often and where to start with improvement activities, which in today’s manufacturing world is an absolute must in my opinion.
highest OEE value
It feels like OEE had its peak time, when a lot of people talked about it, but lately the topic seems to have become a bit quieter. The OEE method itself has its value if used on the right tool groups with the right intentions. If applied solely to increase the name plate OEE value of every tool group in the FAB, it can quickly become counterproductive and hinder the overall FAB performance (at least if FAB performance is defined and measured as proposed in this post). In my active days as a FAB Industrial Engineer I often used the slogan:
“… if the OEE method is used the right way, its target should be not to increase the OEE value of the tool group, but increase the tool groups idle time …”
If OEE projects are aiming in that direction, they will for sure help to improve the overall FAB performance, but I would not recommend using OEE as the key metric to identify the biggest bottleneck.
lowest uptime or availability
As mentioned above, uptime is a tool or tool group focused metric and for sure a very important one in every FAB. While low uptime is absolutely not desirable, it is not a good indicator of whether the tool group is indeed a factory bottleneck, since without other information it will not tell us anything about the actual impact on the FAB.
At this point I will stop for today. In my next post I will spend a bit more time on the 3 remaining, lot related indicators and will also share which one I think is the most useful. As always, I would love to hear feedback from you via a comment. One last thing: I will eventually stop announcing every new post via LinkedIn, so if you want to get notified when there is new content here, please use the email subscription form below.
Happy Holidays !
Bottlenecks, part 1
Almost 15 years ago I had the opportunity to attend a 4 day seminar with the authors of the well known book “Factory Physics”.
In the opening session we talked about what is limiting factory performance and sure enough bottlenecks came up. The question was asked what can be done to improve a bottleneck. After a lively discussion between all attendees about what they have done or what they think should be done, Dr. Mark Spearman stated:
“… I propose you walk on the factory floor and look at the tool or tool group and see if it is indeed running (at full speed and efficiency) …”
I had a pretty big “aha !” moment and I remember this, like it was yesterday. But this proposal comes with another interesting challenge:
How do we know what is the factory bottleneck ???
I think answering this question correctly is the foundation for a lot of things. In its simplest form, the correct answer would lead the folks who actually want to see the bottleneck on the floor to walk to the right tool/tool group. Obviously, there is much more connected to that, for example:
- where to spend resources for improvement activities
- if the bottleneck capacity is used to define the overall FAB capacity, it would be great, if the correct tool/tool group was identified
- where to spend capital to buy another tool
How do we find out what the factory bottleneck tool group is ? One obvious answer is: let’s look into data. But what data, and how do we know it is indeed the bottleneck ? The answer quickly becomes “… it depends …”
It depends on what is the definition metric and I have seen a few of them so far:
- highest tool utilization as per capacity planning numbers
- highest tool utilization as per actual numbers (daily, last week , 4 weeks ?)
- highest amount of WIP behind tool group
- highest average lot wait time at the tool group
- highest miss of daily moves vs. target
- frequency / intensity a tool group is discussed in morning meeting as a “problem kid”
- lowest tool group uptime ( or availability)
- highest OEE value
I’m pretty sure all of these metrics have some value if used in the right context. I do have my own opinion about what I would select as the key metric to declare the FAB bottleneck, but I would really like to get some discussion going here. Therefore I would like to run a little poll to see what the majority would select as the key metric:
I can’t wait to see the results. I’m fully aware that the answer selection is not that straightforward without more context, so if you would like to provide thoughts, please use the comment functionality at the bottom.
I will share and discuss the results in my next post, sometime before the holidays.
I finally decided to start my own blog. It will be all about – surprise –
Factory Physics and Factory Automation
Why am I doing this ?
Over the years I had the chance to work very closely in different companies and their semiconductor factories and I found that especially in the non leading edge companies/FABs a lot of folks are very interested in these topics – but often even basic principles are not known or understood. This was often true for all levels throughout the organization, from operators up to the senior level leadership.
Throughout my professional career I enjoyed learning about these principles and using them for active decision making. I also realized that I liked sharing thoughts about those principles.
To keep this going in the future as well, I will start posting topics, questions and more at a loose frequency. I hope you will get something out of it for your daily business and also contribute to a fruitful discussion and exchange.
Stay tuned for more and if you have suggestions for topics, please let me know, I will for sure give them a try.